Record Schemas and DODA Resources

The c-tree File Handler is a record-oriented file manager. However, to support intelligent key value operations, and to support higher level systems applications such as report generation, user interfaces, and ODBC, c-tree provides support for individual field definitions.

The definition of each field comprising a record can be stored in the data file itself if the data file has not disabled RESOURCE support. The d-tree component of the FairCom ToolBox automatically stores the record schema in the data file, or you can do it yourself by calling PutDODA(). The advantage of storing the schema is that applications can use this information without any outside information. Further, c-tree can use field numbers to specify key segments if the schema is stored in the file, enabling new and more flexible key segment definitions.

This section discusses the structure of the Record Schema Resource, the DODA, how it can be used to define Key Segments, and how you can create this resource via low-level c-tree function calls.


Field Types

c-tree recognizes the following field types:

  • Signed and unsigned chars.
  • Signed and unsigned 2-byte, 4-byte, and 8-byte integers.
  • Single, double and extended precision floats.
  • Scaled BCD numbers.
  • Arbitrary fixed-length data.
  • Fixed- and variable-length strings delimited by a specified field delimiter.
  • Fixed- and variable-length strings whose actual contents are specified by 1-byte, 2-byte or 4-byte length counts which occur at the beginning of the field.

The field delimiter defaults to a null byte. However, the SetVariableBytes() function can be used to change the delimiter to a value other than the default.

Following the Pascal string storage convention, the value stored in the length count byte(s) is the length of the string and does not include the length count bytes themselves. However, the length count byte(s) should be included in the length specified in the DODA flen element. For example, a 10-byte string of type CT_F4STRING has a length of 14 bytes in the flen element of the DODA, but the length stored in the byte count portion of the data record is 10.
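The relationship between flen and the stored length count can be sketched in plain C. This is a minimal illustration, not c-tree code: pack_f4string() and f4string_len() are hypothetical helper names, and the sketch assumes the 4-byte count is stored in the machine's native byte order at the very start of the field.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a c-tree function): store `len` bytes of `src`
 * in a fixed-length field that starts with a 4-byte length count.
 * `flen` is the DODA field length and includes the 4 count bytes.
 * Assumes the count is kept in native byte order.
 * Returns 0 on success, -1 if the data does not fit. */
int pack_f4string(unsigned char *field, size_t flen,
                  const void *src, uint32_t len)
{
    if (flen < sizeof len || len > flen - sizeof len)
        return -1;
    memcpy(field, &len, sizeof len);            /* count excludes itself */
    memcpy(field + sizeof len, src, len);       /* string contents       */
    memset(field + sizeof len + len, 0,
           flen - sizeof len - len);            /* clear the unused tail */
    return 0;
}

/* Hypothetical helper: read back the stored length count. */
uint32_t f4string_len(const unsigned char *field)
{
    uint32_t len;
    memcpy(&len, field, sizeof len);
    return len;
}
```

With a 14-byte field, packing a 10-byte string succeeds and f4string_len() reports 10; an 11-byte string is rejected because only flen minus 4 bytes are available for contents.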

Caution: When using the CT_F2STRING and CT_F4STRING types, the byte count can be affected by the alignment of the compiler.

A fixed-length field with varying length contents means that the field occupies a fixed number of bytes in the record structure, but that a field delimiter or a length count specifies the actual amount of data in the field.

The constants used to represent these field types (found in ctport.h) are:

Symbolic Constant  Interpretation

CT_BOOL            1-byte Boolean.
CT_CHAR            Signed char.
CT_CHARU           Unsigned char.
CT_INT2            Signed 2-byte integer.
CT_INT2U           Unsigned 2-byte integer.
CT_INT4            Signed 4-byte integer.
CT_INT4U           Unsigned 4-byte integer.
CT_INT8            Signed 8-byte integer.
CT_INT8U           Unsigned 8-byte integer.
CT_MONEY           Signed 4-byte integer interpreted as number of pennies up to a precision of 9.
CT_DATE            Unsigned 4-byte integer interpreted as date.
CT_TIME            Unsigned 4-byte integer interpreted as time.
CT_SFLOAT          4-byte floating point.
CT_DFLOAT          8-byte floating point.
CT_EFLOAT          Extended precision floating point (not supported as a key segment).
CT_TIMES           8-byte floating point.
CT_ARRAY           Arbitrary fixed length data.
CT_RESRVD          Reserved for future use.
CT_FSTRING         Fixed length field delimited data.
CT_FPSTRING        Fixed length data with 1-byte length count.
CT_F2STRING        Fixed length data with 2-byte length count.
CT_F4STRING        Fixed length data with 4-byte length count.
CT_STRING          Varying length field delimited data. (NUL terminated.)
CT_PSTRING         Varying length data with 1-byte length count.
CT_2STRING         Varying length data with 2-byte length count.
CT_4STRING         Varying length data with 4-byte length count.
CT_UNICODE         A variable-length field containing a UTF16 encoded, null terminated string.
CT_2UNICODE        A variable-length field that begins with a 2-byte integer specifying the number of bytes in the following UTF16 encoded string.
CT_FUNICODE        A fixed length field containing a UTF16 encoded, null terminated string.
CT_F2UNICODE       A fixed length field that begins with a 2-byte integer specifying the number of bytes in the following UTF16 encoded string.
CT_UNIXTIME_T      A 4-byte UNIX timestamp that maps to the SQL TIMESTAMP type.
CT_UNIXTIME64_T    An 8-byte UNIX timestamp that maps to the SQL TIMESTAMP type.


 

Create the Record Schema with PutDODA

The definition of each of the fields comprising a record can be stored in the data file itself if the data file has not disabled RESOURCE support. The advantage of storing the schema is that applications can use this information without any outside information. Further, c-tree can use field numbers to specify key segments if the schema is stored in the file, enabling flexible key segment definitions.

An application can store the schema in a file directly by calling the PutDODA() function. A DODA is a data object definition array. Each element of the array is a structure of type DATOBJ, which is defined by a typedef in ctport.h. Only three of the first four fields of the DATOBJ (fsymb, ftype, and flen) are required for the PutDODA() call. DATOBJ is defined as follows:

typedef struct {
  pTEXT     fsymb;  /* ptr to symbol name            */
  pTEXT     fadr;   /* adr of field in record buffer */
  UCOUNT    ftype;  /* type indicator                */
  UCOUNT    flen;   /* field length in bytes         */
  ...
} DATOBJ;

  • fsymb points to a symbolic name for the field and should not be NULL.
  • fadr is not used by the c-tree PutDODA() call and its value is ignored.
  • ftype is one of the field types specified in the “Field Types” table.
  • flen is set to the field’s length in bytes for fixed length fields, or the known maximum for varying length fields with a known maximum length in bytes, or zero for varying length fields without known maximum. If the field type has an intrinsic length, which is true for types CT_CHAR through CT_DFLOAT, a zero length is automatically replaced by the intrinsic length.

Given a data record with structure:

typedef struct {
  TEXT      zipcode[10]; /* Zip code           */
  LONG      ssn;         /* social security #  */
  TEXT      name[50];    /* name               */
} DATA_FORMAT;

The corresponding DODA would be defined as:

DATOBJ doda[] = {
    {"ZipCode",        NULL, CT_FSTRING, 10},
    {"SocialSecurity", NULL, CT_INT4},
    {"Name",           NULL, CT_STRING,  50}
};

Note: zipcode is considered to take up a fixed space in the record, CT_FSTRING, while name takes up a variable amount of space, CT_STRING, up to a maximum of 50 bytes.

Notes when using a DATOBJ with the r-tree Report Generator

  • The r-tree Report Generator requires the first four fields of the DATOBJ structure to be defined, and the second field, fadr, must be filled in.
  • For the r-tree Report Generator report() function the DODA is terminated by an entry of the form:

{NULL,NULL,0,0,-1}

A DODA in this form can be used by PutDODA(). However, the terminating entry shown above is not counted as one of the fields unless accounted for in the last parameter of the PutDODA() function. See PutDODA for details.

 

Key Segments

ISAM level applications define key values by specifying where to find, and how to treat each data field, or part of a data field, comprising the key. Each part of a key is called a key segment.

Once you define a Record Schema for a file, you can use it to easily define key segments. See Key Segment Modes (/doc/ctreeplus/30863.htm) for more information.

When key segment mode SCHSEG (12) or any of its variants (USCHSEG, VSCHSEG, or UVSCHSEG) is used, the interpretation of the data is based on the underlying type found in the schema. Based on the above DODA example, a triple of the form below implies the usage of the second field, social security number (the first field is schema field zero). Since it is defined as CT_INT4, a signed 4-byte integer, it will be treated as such by the index routines.

segment position = 1
segment length   = 4
segment mode     = 12 (SCHSEG)

The use of VSCHSEG instead of SCHSEG is rather subtle. If a fixed-length field, such as the zipcode field in the above DODA example, has a field delimiter or a length count, specifying VSCHSEG indicates that if the actual data is shorter than the segment length, pad the contents out to the full segment length. The default padding is an ASCII space character, but the SetVariableBytes() function can be used to change the padding byte. If SCHSEG is used instead, then delimiters or length counts will not be used to determine how to pad the key segment. Whatever is in the fixed-length field will be used; even if it causes “garbage” to be used beyond the actual contents. Of course, if a fixed-length field is carefully prefilled or padded, then the use of SCHSEG will not cause any problem.
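The difference between the two modes can be illustrated with a small stand-alone sketch. It is not c-tree's implementation, only a model of the behavior described above; build_segment() and its parameters are hypothetical names, and the field is assumed to hold at least seglen bytes.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative model only (not c-tree code).
 * Build a key segment of `seglen` bytes from a fixed-length field whose
 * contents end at a delimiter byte.
 *   use_delim != 0  models VSCHSEG: bytes from the delimiter onward are
 *                   replaced by the padding byte.
 *   use_delim == 0  models SCHSEG: the raw field bytes are copied as-is,
 *                   including anything after the delimiter.
 * Assumes `field` holds at least `seglen` bytes. */
void build_segment(unsigned char *key, size_t seglen,
                   const unsigned char *field,
                   int use_delim, unsigned char delim, unsigned char pad)
{
    size_t n = seglen;

    if (use_delim) {
        const unsigned char *end = memchr(field, delim, seglen);
        if (end != NULL)
            n = (size_t)(end - field);   /* actual content length */
    }
    memcpy(key, field, n);
    memset(key + n, pad, seglen - n);    /* no-op when n == seglen */
}
```

For a 10-byte zipcode field holding "12345", a null delimiter, and leftover bytes, the VSCHSEG-style call yields "12345" followed by five spaces, while the SCHSEG-style call carries the leftover bytes into the key.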

The segment mode value can be modified to permit the use of alternative collating sequences or to collate the key segment in descending order. Note that these modifications apply to the segment, not the entire key. OR-ing DSCSEG into the segment mode (equivalent to adding 16) indicates descending order. OR-ing ALTSEG into the segment mode (equivalent to adding 32) indicates the use of an alternative collating sequence. Both modifications can be used at the same time.
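As a quick check of the arithmetic, the sketch below combines the modifiers with a base mode. The numeric values are the ones quoted in the text; the SEG_ prefixed names and make_segment_mode() are hypothetical, chosen so they do not collide with the real constants, which an application should take from the c-tree headers.

```c
/* Values as quoted in the text above. The SEG_ names are local to this
 * sketch; real code should use the constants from the c-tree headers. */
enum {
    SEG_SCHSEG = 12,   /* schema-based segment            */
    SEG_DSCSEG = 16,   /* descending order modifier       */
    SEG_ALTSEG = 32    /* alternative collating sequence  */
};

/* Hypothetical helper: apply the optional modifiers to a base mode. */
int make_segment_mode(int base_mode, int descending, int alt_sequence)
{
    int mode = base_mode;

    if (descending)
        mode |= SEG_DSCSEG;
    if (alt_sequence)
        mode |= SEG_ALTSEG;
    return mode;
}
```

Because 12, 16, and 32 share no bits, OR-ing and adding give the same result: SCHSEG descending is 28, SCHSEG with an alternative sequence is 44, and both together is 60.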

Alternative collating sequences are assigned to an index file with SetAlternateSequence().

 

Record Schema Internals

The Record Schema consists of a schema map header, followed by a number of schema field blocks, which are in turn followed by a number of schema name details. The structure is described here for reference only. You will not have to create this information directly, because PutDODA() prepares it automatically.

Schema Map

A schema map is stored in the form of a map header defined by the ConvMap typedef:

typedef struct convmap {
  UTEXT     flavor;  /* 1-68000  2-8086  3-pdp               */
  UTEXT     align;   /* 1-byte 2-word 4-dbl word 8-quad word */
  UTEXT     flddelm; /* field delimiter byte                 */
  UTEXT     padding; /* field padding byte                   */
  VRLEN     maplen;  /* total length of map including header */
  VRLEN     nbrflds; /* number of fields in map              */
  VRLEN     nbrblks; /* number of field blocks (which may be
                        less than nbrflds due to repeat
                        counts)                              */
} ConvMap;

The map header is followed by a specified number of field blocks, nbrblks, defined by the ConvBlk typedef shown below:

typedef struct convblk {
  UCOUNT    len;    /* field length / or maximum length   */
  UTEXT     kind;   /* c-tree field type (see later)      */
  UTEXT     repcnt; /* repeat count                       */
} ConvBlk;

A repcnt of zero means no repeat fields; a repeat count of 1 means one more field with the same definition, and so on.
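The relationship between nbrblks and nbrflds follows directly from the repeat count: each block contributes one field plus repcnt repeats. The sketch below shows the calculation, assuming a local Blk structure that mirrors ConvBlk with standard C types; Blk and count_fields() are hypothetical names used only here.

```c
#include <stddef.h>

/* Local mirror of ConvBlk using standard C types (assumption: UCOUNT is
 * an unsigned 2-byte integer and UTEXT is an unsigned char). */
typedef struct {
    unsigned short len;     /* field length or maximum length */
    unsigned char  kind;    /* field type                     */
    unsigned char  repcnt;  /* repeat count                   */
} Blk;

/* Hypothetical helper: number of fields described by `nbrblks` blocks.
 * Each block stands for 1 + repcnt fields. */
unsigned long count_fields(const Blk *blks, size_t nbrblks)
{
    unsigned long fields = 0;
    size_t i;

    for (i = 0; i < nbrblks; i++)
        fields += 1UL + blks[i].repcnt;
    return fields;
}
```

For example, two blocks with repeat counts of 0 and 2 describe four fields in total.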

Schema Name Details

Each field can be assigned a symbolic name. The names (which are not counted in the maplen parameter) associated with each field are stored consecutively in the form:

  • 1-byte length count (which includes length count and terminator)
  • n-byte name
  • 1-byte null terminator

Therefore, the symbolic name “FIRST” will be stored as:

7FIRST0

If no name is associated with a field, then a single byte of value 1 is stored. For example, if the first and third fields are named “FIRST” and “THIRD” respectively, and the second field has no name, the names will be stored as:

7FIRST017THIRD0

There is a single name for each field block. If there is a non-zero repeat count, only one name is stored. There is not a name stored for each repeated field.
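The name layout described above can be reproduced with a short encoder. PutDODA() does this work for you, so the sketch is purely illustrative; encode_field_name() is a hypothetical helper, and the sketch treats a NULL or empty name as "no name".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (PutDODA() normally builds this for you).
 * Append one schema name entry to `out`, which has room for `cap` bytes:
 *   named field:   [count][name bytes][0], count = strlen(name) + 2
 *   unnamed field: a single byte of value 1
 * Returns the number of bytes written, or 0 if the entry does not fit
 * in the buffer or in a 1-byte count. */
size_t encode_field_name(unsigned char *out, size_t cap, const char *name)
{
    size_t n = (name != NULL) ? strlen(name) : 0;

    if (n == 0) {
        if (cap < 1)
            return 0;
        out[0] = 1;                        /* count byte only */
        return 1;
    }
    if (n + 2 > 255 || n + 2 > cap)
        return 0;
    out[0] = (unsigned char)(n + 2);       /* includes count and terminator */
    memcpy(out + 1, name, n);
    out[1 + n] = 0;                        /* null terminator */
    return n + 2;
}
```

Encoding "FIRST", an unnamed field, and "THIRD" in sequence produces 15 bytes: 7, F, I, R, S, T, 0, then 1, then 7, T, H, I, R, D, 0, which matches the layout shown above.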

Record Structure

While c-tree can support virtually any organization of data fields, there are some guidelines which promote maximum support by c-tree and FairCom’s other products. We strongly recommend following these guidelines unless there are compelling reasons not to do so.

The field types described in “Field Types” can be grouped into fixed and varying length fields. For this grouping, “fixed” refers to whether the space occupied by the field is fixed, not whether the contents are fixed.

When defining your record structure, the varying length fields will either have a known maximum length or be unlimited (within the bounds of the length count bytes if any). For example, you may have a varying length description field that is null terminated and guaranteed not to be longer than 1024 bytes. This is a varying length field with a known maximum. Conversely, you may have a 2-byte length count field that is varying in length, but there is no specified maximum length. This is a varying length field without a known maximum. Therefore, each field can be classified into one of three groups:

  • Fixed Length
  • Varying Length Max
  • Varying Length No Max

We recommend that fields be placed in the record in the order shown above.

Note: Both limited and unlimited variable-length fields require the application to properly pack and unpack the record structure. See the variable-length record discussion covered in Data and Index Files for additional information on building variable-length records.
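As a rough illustration of packing, the sketch below lays out the earlier example record with its fixed-length fields first and the varying-length name last. It is a minimal sketch under stated assumptions, not the packing routine c-tree applications must use: pack_record(), ZIP_LEN, and NAME_MAX are hypothetical names, fields are packed with no alignment padding, the integer is stored in native byte order, and the null delimiter is counted against the 50-byte maximum.

```c
#include <stdint.h>
#include <string.h>

#define ZIP_LEN   10   /* fixed-length, delimited zip code field */
#define NAME_MAX  50   /* known maximum for the varying name     */

/* Hypothetical helper: pack a record as fixed fields followed by one
 * varying-length, null-terminated field. Assumes no alignment padding
 * between fields and native byte order for the integer.
 * Returns the packed length, or 0 if the name exceeds its maximum or
 * the buffer is too small. */
size_t pack_record(unsigned char *buf, size_t cap,
                   const char *zipcode, int32_t ssn, const char *name)
{
    size_t name_len = strlen(name) + 1;              /* with delimiter */
    size_t total = ZIP_LEN + sizeof ssn + name_len;

    if (name_len > NAME_MAX || total > cap)
        return 0;
    strncpy((char *)buf, zipcode, ZIP_LEN);          /* zero-padded    */
    memcpy(buf + ZIP_LEN, &ssn, sizeof ssn);         /* fixed 4 bytes  */
    memcpy(buf + ZIP_LEN + sizeof ssn, name, name_len);
    return total;
}
```

Only the bytes actually used by the name are written, so a three-character name produces an 18-byte record rather than the 64 bytes a fully fixed layout would need.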