A Universal Database Definition

Architecture

Every data system consists of a storage capability (data source) and an address mechanism (key) to access a location where data can be stored and retrieved.  The implementation details vary, but this principle applies to diverse approaches such as file servers, databases, and more recently, cloud computing.  Ideally, in an OO environment, the implementation details are obscured, and only these essentials are exposed at the interface.  My specific objective is to migrate from another data storage mechanism, so this consistent, uniform interface is essential.
Based on this approach, PL2SQL's interface is essentially this single method:
  $object = __PACKAGE__->sqlobject( $datasource, $id ) ;
Changes to the object are automatically written to the database when the object is destroyed.
One of my favorite features of Perl is the ability to override the assignment (=) operator using the TIE function.  In PL2SQL, all of the data storage capabilities are implemented using this override.  Basically, a RHS (right hand side) assignment initiates a SELECT SQL request, and a LHS assignment initiates an INSERT or UPDATE request.
In this implementation, the SQL instructions are batched at the beginning and end of the object's lifecycle, and the assignment override initiates a more complicated caching mechanism.  However, the effect is the same: Every LHS assignment commits a change to the database.  The interface consists of a couple additional methods to override this behavior: sqlclone creates an untied copy of the object.  And sqlrollback sets a flag that disables the SQL writes on destruction.
The source code contains definitions for the following packages.  Based on the interface, these packages are mostly private from a visibility standpoint.  None should be directly instantiated:
 
  • TQIS::SQLObject
  • DBI::db
  • TQIS::mysql::table
  • TQIS::mysql::node
  • TQIS::mysql::object
  • pl2sql
  • perldata
TQIS::SQLObject
SQLObject should be defined as a base class of any stored object so its methods are inherited:
push @__PACKAGE__::ISA, qw( TQIS::SQLObject ) ;
$o = __PACKAGE__->sqlobject( $datasource, $id ) ;
Inside the database table, a pl object spans a cluster of records that share a single key based on an ID assigned by the caller when insertobject is invoked.  This module does not ensure that the ID's are unique. (A duplicate ID will indeterminately either clobber an existing object or be dropped.) Currently, any class extending TQIS::SQLObject is responsible for externally associating the ID's and ensuring each is unique.  Since this design still needs to accommodate duplicate ID's assigned by different classes, the data table is keyed on the numeric ID combined with the class name of the stored object.  With the above technique, the class name is automatically passed as the first argument.
DBI::db
I added a few methods to this universal interface, mostly for personal convenience.  However, DBI::db->table( $table ) is an essential component to the interface.
TQIS::mysql::table
One of PL2SQL's advantages is that object data is consolidated within a single database table.  Consequently, the table name must be part of the data source definition.  The datasource is instantiated indirectly using the DBI::db->table method.
New objects are marshalled using this object:
  DBI::db->table( $table )->insertobject( $key, $object ) ;
This object is also the datasource argument to the sqlobject and sql2pl methods in TQIS::SQLObject.
TQIS::mysql::node
node objects correlate directly with the table records.  Each node's sql member is an associative array that matches the data structure of a table record.  The sql method generates the actual SQL message as a string, and optionally transmits the message to the database server. node objects are created and used for write operations.
These objects also have an xml member that is an internal pointer to an xml structure.  The node object enforces the correspondence of each xml node to table record. pl2xml creates some redundant nodes that wrap other container nodes (presumably so that all sibling nodes have the same tag type). TQIS::mysql::table->combine goes through the entire set and combines these with their children.
TQIS::mysql::object
The object class contains the TIE methods that override the assignment operators used by pl object members.  These objects are created when a pl object is read from the database and the pl object members are assembled from their tied references.  Every pl object member is either a scalar, a reference to a container (hash or array), or a scalar reference.  PL2SQL represents each of these as table records, and also supports references to other members.
A new object is instantiated for every LHS assignment (including unshift, push, and splice). If the assignment is an object or hash/array set, that object or set is wrapped inside a single object. At the end of the lifecycle, objects are converted into node objects: A single node object is sufficient to represent a scalar or existing container.  Otherwise, multiple node objects are generated for wrapped containers and their elements.
pl2sql
The module name, and hence this package name, acknowledges pl2xml, which, for descriptiveness, is extremely well named. pl2xml also forms the basis for this design.
This package contains a single public method.  Based on convention, this method would be named new if it returned a single object.  However, as the name implies, the method takes a single object input and returns a set of SQL statements as node objects.  Currently, I've settled on the name pl2sql::factory. Better suggestions are welcome.
pl2sql::factory, while public, should be called indirectly from TQIS::mysql::table->insertobject instead.
While I owe a debt to pl2xml, XML::Dumper::pl2xml returns a string output.  I'm using one that I rolled myself for the time being.
perldata
pl2xml uses a limited number of tag types that either define scalar data (scalar, scalarref, or item) or a container (hashref or arrayref). PL2SQL mirrors this tag name in the reftype table field. pl2xml marshalls a pl object as one of the container types, and then wraps everything inside a perldata tag.  Although redundant, this node is saved as a data table record. SQLObject relies on the fact that, for a given ID and class name, the perldata value should point to a unique record and always refer to the top node.
Each TQIS::mysql::object element contains a perldata member to access the table records, and is used to traverse child or sibling nodes.  The perldata functions rely on this data set for their operations.
Table Fields
The universal table structure is shown as part of the Interface. Here is some descriptive detail.  As described above, each record represents a data node that essentially corresponds to XML output.  Nodes are either data nodes or container nodes. scalarref members are either data nodes or container nodes that reference another shared data node.
 
id
This autoincremented value is referred to as the record number.
objectid
This ID needs to be unique for an object, but is common to all records that comprise an object.
objecttype
The class name of the object is combined with the objectid as the unique key that identifies a stored object.  In the top node, the objecttype equals the blesstype.
stringrepr
Data nodes only. The stringrepr value is never used.  It contains a copy of the data, and may be truncated.
intdata
Data nodes only. For compatibility, PL2SQL preserves the primitive type of simple values.  If intdata is non-null, the original member was an integer.  For convenience, a string equivalent is stored in stringrepr.
doubledata
Data nodes only. Like intdata, doubledata is intended to preserve the original member's primitive data type.
stringdata
Data nodes only. stringdata is a 512 varchar.  Longer strings are split among multiple records.
textkey
Data nodes only. Hash table elements are treated as NVP's (name value pairs) within a single record.  Implentations may vary, but as currently defined, textkey stores the name as a 40 varchar
intkey
Data nodes only. intkey maintains the sequence order of array elements.  These values are determined automatically.
blesstype
When an object is saved, its class name is represented as the objecttype. Object members can also be objects, whose class names are represented as blesstype. blesstype is only used for container nodes and scalar references.
reftype
reftype reflects the original tag value assigned by pl2xml, to one of the following:
  • arrayref
  • hashref
  • item
  • scalar
  • scalarref
  • string
Naturally, arrayref and hashref values represent container node records.

item
represents data nodes.  Similarly, scalar is used by pl2xml to distinguish objects that are simple scalars.(The interface documentation will probably recommend using a blessed scalar reference instead.) In either case the object is contained within a single data node.

If a scalarref reftype has a refto value, it is a container node, or data node otherwise.

The string value is not derived from pl2xml. It designates the additional records used to store segments of long strings.  The value in the first segment record reflects the original pl2xml tag type.
item
item is one of 3 values assigned record numbers as pointers to other records.  Whether a container is a hash (associative array) or array, the contained elements are a set of records connected as a linked list, connected by the item value pointers.  The item value is a sibling pointer.
refto
Container nodes only. The refto value contains a pointer to a child, which is the first element of a linked list set.  A scalar reference to another object member is also considered a container, with a pointer to the referral target.
chainedstring
Data nodes only. Long strings are split into 512 character segments to accomodate the length of the varchar stringdata fields.  These segments are connected as a linked list.  The chainedstring values are the connecting pointers.
defined
Data nodes only. defined is a flag that distinguishes zero length strings from undefined values.
deleted
When a container is deleted, its elements' records are deleted via SQL. (nodes are considered deleted if they are orphaned after reassignment.) However, deleted records in a set are merely flagged using the deleted value to avoid rebuilding the linked lists.  If a deleted data node is the referral target of a scalar reference (eg has a ref count), it will also be flagged instead in order to preserve its value.
Comments Leave a comment
  • Release changes
    Nov 3 2012
    TQIS Jim
    0 Replies
  • SQL in TQIS::mysql::Node
    Oct 8 2012
    TQIS Jim
    0 Replies