Partitioned Files (ISAM for C)

Partitioned Files

The FairCom Database Engine supports a unique feature known as Partitioned Files. A partitioned file logically appears to be one file (or more accurately one data file and its associated index files), but is actually a set of files whose contents are partitioned by the value of the partition key. Both the data files and index files are partitioned. This permits data with a defined range of values for the partition key to be rapidly purged or archived (instead of having to delete record-by-record each record within this range).

Use partitioned files to:

  • Maintain data in separate c-tree data/index files, while enjoying access from a single host file.
  • Rapidly purge or archive individual member files.
  • Implement partitioning at the ISAM and FairCom DB API development levels with a single additional API call.
  • Easily partition an existing linked file with SQL.
  • Use all standard FairCom DB data searches on the entire file or directly on a member file.

 

Implementation

Implementing partitioned files requires a FairCom DB library with partition capabilities and Extended files with the ctPARTAUTO extended file mode. To implement partitioned files:

  1. Activate the partitioned files capabilities at compile time with the ctPARTITION define. This define is on by default, provided the ctHUGEFILE, RESOURCE, CTS_ISAM, and ctCONDIDX defines are in place. Check ctree.mak/ctoptn.h and ctopt2.h for the current settings.
  2. To create a partitioned file (using the default partition naming, partition rule, and maximum number of partitions), simply create an Extended file with ctPARTAUTO in the x8mode parameter of the Xtd8 extended file creation block. Partitioned files do not have to be HUGE unless the logical file size will exceed the 2GB/4GB limit, but they do require the extended header, i.e., they are Extended files.

 

Partition Naming

The partition file name is the base file name with the 3-digit raw partition number as the file extension. Only automatic naming is available at this time.

 

Maximum Partition Number vs File Size

By default, 16 bits of the 64-bit record offset are used to reference the raw partition number, allowing each partitioned file to support up to 65535 member files over its lifetime. This can be adjusted at create time using the callparm parameter of the extended file creation block, where a value of 0 defaults to 16 bits, values less than 4 bits default to 4 bits (maximum 15 member files), and 32 bits is the maximum value (4,294,967,295 member files). The number of bits determines the total number of raw partitions for the entire life of the host file. This is not the number of partitions active at one time. Raw partitions are not reassigned.
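
The bit-count rules above can be worked through with a little arithmetic. The following is a minimal sketch, not FairCom code: the helper names are hypothetical, and clamping requests above 32 to 32 is an assumption (the text only states that 32 bits is the maximum).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: map a requested bit count (the callparm value)
 * to the bit count actually used, following the rules stated above. */
static unsigned effective_partition_bits(unsigned requested)
{
    if (requested == 0)
        return 16;          /* 0 selects the 16-bit default */
    if (requested < 4)
        return 4;           /* values below 4 are raised to 4 */
    if (requested > 32)
        return 32;          /* assumption: clamp to the documented maximum */
    return requested;
}

/* Largest raw partition number that fits in `bits` bits, which is also
 * the number of member files available over the life of the host file. */
static uint64_t max_member_files(unsigned bits)
{
    assert(bits >= 4 && bits <= 32);
    return (UINT64_C(1) << bits) - 1;
}

/* Bits of the 64-bit record offset not used for the partition number. */
static unsigned offset_bits_remaining(unsigned bits)
{
    assert(bits >= 4 && bits <= 32);
    return 64u - bits;
}
```

For example, max_member_files(effective_partition_bits(0)) gives 65535, and the 16-bit default leaves 48 bits of the record offset for everything else.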

 

Rules

In V11 and later, user-defined conditional expressions can be used for partition rules, as described in the next section.

Legacy Versions

The partition key can be set when the file is created using the prtkey parameter of the extended file creation block. This value defaults to 0, indicating the first key associated with the data file. Set this value to the relative key number for the desired index if the default is not appropriate for your application.

The default rule, developed for testing purposes only, uses a simple algorithm to divide added records into partitions based on the first byte of the selected key as a test of the partition capability. See the function kprawno() in ctpart.c for the sample algorithm.

Cautions and Restrictions

See "Partition Ordering and Range Query" in Raw Partition Numbers for information on number generating rules.

 

User-Defined Partitioned File Conditional Expressions

In FairCom DB V11 and later, user-defined conditional expressions can be used for partition rules.

 

User-Defined Conditional Expressions for Easy Partitioned File Creation

Partitioned files are easily created with user-defined conditional expressions. Partitioned tables are fully supported via both SQL and FairCom DB ISAM applications.

A partitioned file logically appears to be one data file (and its associated index files). It is actually a set of files whose contents are physically partitioned by the value of a partition key.

Partitioned files are intended for applications requiring fast purging and archiving of large amounts of data at once. As both data and index files are partitioned, this permits data within a defined range of partition key values to be rapidly purged or archived (rather than deleting each record within this range).

Consider rapidly acquired log data. Frequently, this data has a lifespan of days to months. After this time it is usually purged or archived for permanent storage. Purging record by record can take considerable time, especially while large amounts of new data continue to arrive. Rather than running lengthy key searches for expired data over the entire data set, it is far more efficient to keep all related data in a single silo and remove it with a single operation. Partitioned files give you this ability.

For background information regarding partitioned files, refer to Partitioned Files in the FairCom DB Programmer's Reference and Function Reference Guide.

 

Conditional Expressions and Partition Rules

Conditional expressions make it easy to build a partition rule. A partition rule is a conditional expression ultimately evaluating to a number. That number associates data with a specific partition within the file.

FairCom DB expression parsing can now evaluate an expression into a numeric representation to be used for the partition rule. Many built-in conditional expression functions exist for flexible rule generation. Time and date based functions, numeric functions, and string manipulation functions are all available. FairCom DB expression syntax can even reference complex functions via an external shared library (DLL) in calculating partition numbers.

If the table has an embedded DODA resource and a conditional expression references schema segments, an expression can refer to index key fields explicitly by DODA name. Together with field names, complex expression rules can be crafted.

Prior to FairCom DB V11, partitioned files required these rules to be hard-coded at compile time (specified in ctpart.c) and did not allow run-time flexibility in adapting rules for unique deployed environments.

By combining existing FairCom DB support for conditional expressions with partitioned files, applications can now supply exact expression rules directly when creating partitioned files. In addition, existing partitioned files can have their partition rule changed and then be rebuilt according to the new rule, allowing applications to grow and adapt to customer needs.

Partitioned files are supported directly from the FairCom DB ISAM API, FairCom DB API and FairCom DB SQL.

Performance

Additional processing of advanced expression parsing can potentially impact performance to a small degree. Should performance become an issue, you might consider an external DLL to efficiently evaluate partition rules directly.

 

Partitioned Files in FairCom DB SQL

In FairCom DB SQL, a table is partitioned when a partition index is created. To create a partition index with conditional expressions for partitioning, use the following syntax:

CREATE INDEX .... STORAGE_ATTRIBUTES 'partition=<expression>'

To create a partition index forcing hard-coded rules (pre-V11), the following is supported:

CREATE INDEX .... STORAGE_ATTRIBUTES 'partition'

To change a partition rule on an existing partitioned file, call ALTER INDEX with your new rule in the STORAGE_ATTRIBUTES clause.

Example

This script demonstrates partitioned file rules in SQL. It creates a table, prtest, with an index on the integer field f7. The storage_attributes clause defines a partition rule in which each record is stored in the partition whose number equals the value of field f7 plus 1:

create table prtest (f1 integer, f2 char(50), f3 char (50), f4 char(50), f5 timestamp, f6 varchar(50), f7 integer, f8 time);
create index pridx on prtest (f7) storage_attributes 'partition=f7+1';

With a rich assortment of conditional expression functions available, much more complex rules can be created.

For example, a table containing a field "invoice_date" and requiring monthly partitions can be created with this simple expression:

month(invoice_date)

Note: month() alone yields the same values every year, so this rule assigns partitions in increasing key order only while the data stays within a single calendar year. See Partition Ordering and Range Query for the ordering requirement on partition rules.

For syntax details, refer to Conditional Expression Parser topics in the FairCom DB Programmer's Reference and Function Reference Guide.

Example

This example uses the TIMET2CTDATE function to convert a Unix time_t field to a c-tree date inside a partition expression, so that records are partitioned by the number of months since January 2010:


CREATE TABLE unixtest (name CHAR(10), u_date INTEGER)
    or
CREATE TABLE unixtest (name CHAR(10), u_date BIGINT)

CREATE INDEX unixtest_date_idx ON unixtest (u_date) STORAGE_ATTRIBUTES 'partition=( ( YEAR( TIMET2CTDATE( u_date) ) -2010) * 12) + MONTH ( TIMET2CTDATE(u_date))'

    Date String                      Unix Date     Partition created
    Mon, 23 Feb 2015 11:01:32 GMT    1424689292    62
    Sat, 23 Jan 2016 11:01:32 GMT    1453546892    73
    Tue, 23 Feb 2016 11:01:32 GMT    1456225292    74
    Wed, 23 Mar 2016 11:01:32 GMT    1458730892    75

For syntax details for the TIMET2* functions, see C Language Equivalents.
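
The arithmetic behind the partition numbers in the table can be reproduced in plain C. This is a sketch only: months_since_2010 is a hypothetical helper that mirrors the SQL expression using the standard gmtime() function, and it assumes the timestamp is interpreted in UTC. It does not call TIMET2CTDATE or any other FairCom function.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Months elapsed since the start of 2010, counting January 2010 as 1.
 * Mirrors (YEAR(d) - 2010) * 12 + MONTH(d) from the SQL rule above.
 * Assumption: the timestamp is interpreted in UTC. */
static int months_since_2010(int64_t unix_seconds)
{
    time_t t = (time_t)unix_seconds;
    struct tm *utc = gmtime(&t);

    assert(utc != NULL);
    /* tm_year counts from 1900 and tm_mon from 0. */
    return (utc->tm_year + 1900 - 2010) * 12 + (utc->tm_mon + 1);
}
```

Passing 1424689292 (23 Feb 2015) yields (2015 - 2010) * 12 + 2 = 62, matching the first row of the table. Because the result never decreases as the timestamp increases, the rule keeps partitions in key order.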

 

FairCom DB API Partition File API Support

Partitioned file support is extended to the FairCom DB API. While creating a file, you can call ctdbSetTablePartitionIndexNbr to set the partition index, ctdbSetTablePartitionNumberBits to set the number of bits reserved for partition numbers, and ctdbSetTablePartitionRule to set the partition rule.

On existing tables, after calling the above functions, call ctdbAlterTable, forcing a CTDB_ALTER_FULL action.

ctdbSetTablePartitionRule

This FairCom DB API function sets partition rules:

ctdbEXPORT CTDBRET ctdbDECL ctdbSetTablePartitionRule(CTHANDLE Handle, pTEXT expr);
  • expr - The expression, expr, will be evaluated against the key for the partition index. It must evaluate to an integer.

The partition rule uses standard c-tree expression syntax.

 

FairCom DB ISAM Usage

Partitioned files are created with standard c-tree APIs. A partitioning conditional expression is defined with the PTADMIN Partition Administration API call and the ptADMINrule parameter. Specify an extended (Xtd8) ctPARTAUTO mode in the x8mode parameter of an extended file creation block. While partitioned files require an extended (Xtd8) header they do not have to be HUGE files unless the logical file size will exceed the 2GB/4GB limit.

A partition key is set when the file is created using the prtkey parameter of the extended file creation block. The default is 0 (the first key associated with the data file). Set this value to the relative key number for the desired index if the default is not appropriate for your application.

A partition file name is the base data file name with a 3-digit raw partition number as the file extension. Only automatic naming is available in this mode.

Note: Alternative file naming is possible with custom modifications to the ctpart.c module and recompiling your FairCom Database Engine.

By default, 16 bits of the 64-bit record offset reference the raw partition number, allowing each partitioned file to support up to 65,535 member files over its lifetime. This also limits overall file size: the more bits used for partition numbering, the smaller the overall size of the file can be. The maximum value is 32 bits (4,294,967,295 member files). This numbering vs. size tradeoff is set at file create time using the callparm parameter of the extended file creation block (Xtd8), where a value of 0 defaults to 16 bits. Values less than 4 bits default to 4 bits (maximum 15 member files).

Note: Default partition naming, partition rules, and maximum number of partitions are used when they are not defined by the Xtd8 extended parameter block and the PTADMIN API call.

 

Operation

For now, we make two major assumptions:

  1. The FairCom Database Engine assigns data records to a partition by applying the data file’s partition rule to the partition key; and
  2. Partitions are assigned in increasing order of the partition key values. That is, if KeyValue2 > KeyValue1, then the partition assigned to KeyValue2 will be the same as or after the partition assigned to KeyValue1.

Neither of these two assumptions is absolutely critical, but the second assumption does permit much more efficient key searches when the relationship between key values and partitions is well ordered.

Once the host file is created, the operation of partitioned files should be invisible to the application. The functions that add, update, and delete records are the same for both partitioned and non-partitioned files. However, functions requiring a record offset must use the ctSETHGH() and ctGETHGH() functions, even if the partitioned files are not HUGE, to ensure the high-order bytes are included, as described below.

 

Raw Partition Numbers

The raw partition numbers must be 1 or greater. When passing a file position that includes a partition number to a routine, the partition number is encoded in the high-order bits of the high-order word. Ordinarily, the application will only get such information from a call to CurrentFileOffset() followed by a call to ctGETHGH().

Partition numbers are stored in the high-order bytes of the 64-bit record offset. This allows the ISAM API calls to remain unchanged. Simply change the parameters of your file creation call, and your application is ready to use partitioned files. For this reason, functions requiring a record offset must use the ctSETHGH() and ctGETHGH() functions, even if the partitioned files are not HUGE, to ensure these high-order bytes are included.
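
To make the layout concrete, the sketch below packs a raw partition number into the top bits of a 64-bit value and splits the result into high-order and low-order 32-bit words. It illustrates the idea only and is not FairCom's internal encoding: all function names are hypothetical, and placing the partition number in the topmost `bits` bits is an assumption based on the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only: the top `bits` bits of a 64-bit offset carry
 * the raw partition number and the remaining bits carry the position
 * inside the member file. */
static uint64_t pack_offset(uint32_t partition, uint64_t position, unsigned bits)
{
    unsigned shift;

    assert(bits >= 4 && bits <= 32);
    shift = 64u - bits;
    assert(partition >= 1);                                  /* raw numbers start at 1 */
    assert((uint64_t)partition <= (UINT64_C(1) << bits) - 1);
    assert(position < (UINT64_C(1) << shift));
    return ((uint64_t)partition << shift) | position;
}

/* Recover the raw partition number from a packed offset. */
static uint32_t partition_of(uint64_t offset, unsigned bits)
{
    assert(bits >= 4 && bits <= 32);
    return (uint32_t)(offset >> (64u - bits));
}

/* The two 32-bit halves of the 64-bit offset. */
static uint32_t high_word(uint64_t offset) { return (uint32_t)(offset >> 32); }
static uint32_t low_word(uint64_t offset)  { return (uint32_t)(offset & UINT32_MAX); }
```

With 16 partition bits, packing partition 3 at position 0x1000 leaves the low word at 0x1000 and puts the partition number in the top half of the high word (0x00030000). An application that keeps only the low word loses the partition number, which is why the high-order word must always travel with the offset.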

Partition Ordering and Range Query

Partitions are assigned in increasing order of the partition key values. That is, if KeyValue2 > KeyValue1, then the partition assigned to KeyValue2 will be the same as or after the partition assigned to KeyValue1.

We allow any user-defined expression that evaluates to a numeric value to be used as a partition rule. However, our partition search logic requires that a partition rule assign partitions in increasing order of the partition key values. That is, the partition function is required to be monotonically increasing (non-decreasing): for any two partition key values A and B, if A > B then the partition rule must output values p(A) and p(B) such that p(A) >= p(B). Different key values may share a partition, but a higher key value must never map to an earlier partition.

We do not currently check that a user-defined partition rule meets the monotonically increasing property. If a rule is supplied that does not have this property, partition queries will return incorrect results, such as not finding key values that exist in the table. One example of a function that does not meet this requirement is partitionRule = (partitionKeyValue MOD 12). Note that the values of this function increase and then decrease again rather than always increasing as the partition key value increases.

It is up to the developer to be aware of this requirement and to only use partition rules that meet this requirement.
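
Since the engine does not verify the ordering requirement, one option is to test a candidate rule against a sorted sample of key values before deploying it. The sketch below is a hypothetical test harness in plain C, not part of any FairCom API. It models a rule as a C function and reports whether any higher key is assigned to an earlier partition. Passing the check on a sample does not prove the property for all keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*partition_rule_fn)(int64_t key);

/* Returns 1 if `rule` never assigns a lower partition to a higher key
 * across the sample. `keys` must be sorted in ascending order. */
static int rule_is_ordered(partition_rule_fn rule, const int64_t *keys, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++) {
        assert(keys[i - 1] <= keys[i]);
        if (rule(keys[i]) < rule(keys[i - 1]))
            return 0;
    }
    return 1;
}

/* Wraps around every 12 keys, like the MOD 12 example above.
 * Assumes non-negative keys. */
static uint32_t rule_mod12(int64_t key) { return (uint32_t)(key % 12) + 1; }

/* Groups 12 consecutive keys per partition and never decreases.
 * Assumes non-negative keys. */
static uint32_t rule_div12(int64_t key) { return (uint32_t)(key / 12) + 1; }
```

With the sample keys 0, 5, 11, 12, 13, 30, rule_div12 passes the check, while rule_mod12 fails at key 12, where the partition number drops from 12 back to 1.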

Partition Number Base

Use the PartitionAdmin() function to increase or decrease the lowest permitted partition number, called the “base” partition number. The system enforces an absolute lowest value for the base of one (1), but PartitionAdmin() can be used to change the base as long as it is one or greater. However, when changing this base value, PartitionAdmin() ensures no inconsistencies will arise. For example, one cannot increase the base value if it would eliminate any active or archived partitions (however it can eliminate purged partitions).
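
The consistency rule for changing the base can be modeled as a simple check. This is a sketch of the rule as described above, not the PartitionAdmin() implementation: the types and the base_change_allowed function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum member_status { MEMBER_PURGED, MEMBER_ACTIVE, MEMBER_ARCHIVED };

struct member {
    unsigned raw_number;          /* raw partition number, 1 or greater */
    enum member_status status;
};

/* Returns 1 if `new_base` would be acceptable under the rules above. */
static int base_change_allowed(const struct member *members, size_t n,
                               unsigned new_base)
{
    size_t i;

    if (new_base < 1)
        return 0;                 /* the base can never drop below 1 */
    for (i = 0; i < n; i++) {
        assert(members[i].raw_number >= 1);
        if (members[i].raw_number < new_base &&
            members[i].status != MEMBER_PURGED)
            return 0;             /* would eliminate an active or archived member */
    }
    return 1;
}
```

Given members 1 (purged), 2 (archived), and 3 (active), raising the base to 2 is allowed because only a purged member falls below it, while raising it to 3 is rejected because archived member 2 would be eliminated.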

 

Unique Keys

Unique indexes are managed using the host index files to maintain values across all partitions for unique non-partition keys. The values in the host index use their partition number (plus 1) as their associated values. The partition index contains the key value with the associated record position within the data partition. This permits the partitioned files to be treated as a self-contained set of ISAM files, and still enforces unique keys across all partitions. Of course, there is a performance penalty associated with indexing the non-partition unique keys twice.

A duplicate key flag value of 2 enforces unique keys within each partition, but no check is made for global uniqueness across all partitions.

 

Serial Segments (SRLSEG)

SRLSEG is managed by using the data file host header to maintain the serial numbers used by key segments across all partitions. SRLSEG key segments can be used to make a key value unique, and this still works with partitioned files.

FairCom DB V11.0 and later supports disabling/enabling SRLSEG with a PUTIFIL() call and during compact and rebuild when the updateIFIL option is specified. A file that contains an extended header now supports turning the SRLSEG or SCHSRL segment mode on or off by:

  1. A call to PUTIFIL().
    or
  2. A call to the rebuild and compact function with the updateIFIL option specified in the IFIL's tfilno field.

Note: A call to rebuild or compact will fail with error IAIX_ERR (608) if the key segment definitions specified by the caller change the serial number attributes (either turning serial number on or off or changing its location in the record) when the updateIFIL option is not specified in the tfilno field of the IFIL structure.

 

Transaction Processing

A partitioned file that supports transaction processing must be a transaction-dependent (TRANDEP) file. This is automatically enforced.

 

Set maximum active partitions for auto-purge feature

The FairCom Database Engine supports setting a maximum number of active partitions on a partitioned file. When a new partition is created, if the new number of active partitions exceeds the limit, the oldest partitions are purged.

This feature is well-suited for a time-based or incrementing sequence number partition key. In this situation, the partition number matches the time order of the partition creation, and so the automatic purge provides a way to keep the N most recent partitions.

Support for setting the maximum number of partitions has been implemented in the FairCom Low-Level API function PUTHDR(). This feature is also available through the FairCom DB API, C++, and REST APIs.

To set this value: After creating the partition host, call PUTHDR(partition_host_data_file_number, max_partition_members, ctMAXPARTMBRhdr).

A value of zero for max_partition_members is the default, and means no maximum number of partitions.

Note that c-tree itself supports up to 65535 active partitions per partition host, so this PUTHDR() call fails with error PBAD_ERR if a max_partition_members value larger than 65535 is specified.

Typical Scenario

The following example clarifies how the partition purge behaves:

  1. Create the partition host and set maximum partition members to 5 for the partition host file.
  2. Add records whose partition keys cause partitions 1, 2, 3, 4, and 5 to be created.
  3. Add a record whose partition key causes partition 6 to be created. Partition 1 is automatically purged, leaving active partitions 2, 3, 4, 5, and 6.
  4. Add a record whose partition key causes partition 7 to be created. Partition 2 is purged, leaving active partitions 3, 4, 5, 6, and 7.
  5. Add a record whose partition key causes partition 9 to be created. Partitions 3 and 4 are purged, leaving active partitions 5, 6, 7, and 9.

Note: The number of active partitions is calculated based on the highest-numbered active partition. When a new highest-numbered partition N comes into existence, only that partition and max_partition_members - 1 consecutive partitions preceding that partition are kept.
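
The window rule in the note can be simulated in a few lines of C. This is a sketch of the rule as described, not FairCom code: the partition_set type and both functions are hypothetical. It only models partitions created in increasing order, as in the scenario, and assumes the new partition is not already active. It ignores the asynchronous purge thread and the active-transaction case.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 64

struct partition_set {
    unsigned active[MAX_TRACKED];   /* active raw partition numbers */
    size_t count;
};

/* Record creation of `new_partition` and drop members outside the window.
 * max_members == 0 means no limit. Returns the number of members purged. */
static size_t create_partition(struct partition_set *set,
                               unsigned new_partition, unsigned max_members)
{
    unsigned highest = new_partition;
    size_t i, kept = 0, purged;

    assert(set->count < MAX_TRACKED);
    set->active[set->count++] = new_partition;

    for (i = 0; i < set->count; i++)
        if (set->active[i] > highest)
            highest = set->active[i];

    if (max_members == 0 || highest <= max_members)
        return 0;                   /* window still reaches back to partition 1 */

    /* Keep only partitions numbered above highest - max_members. */
    for (i = 0; i < set->count; i++)
        if (set->active[i] > highest - max_members)
            set->active[kept++] = set->active[i];

    purged = set->count - kept;
    set->count = kept;
    return purged;
}

/* Returns 1 if `partition` is currently active in the set. */
static int is_active(const struct partition_set *set, unsigned partition)
{
    size_t i;

    for (i = 0; i < set->count; i++)
        if (set->active[i] == partition)
            return 1;
    return 0;
}
```

Replaying the scenario with a limit of 5: creating partitions 1 through 5 purges nothing, creating 6 purges 1, creating 7 purges 2, and creating 9 purges 3 and 4, leaving 5, 6, 7, and 9. Only four members remain because the window spans partition numbers 5 through 9 and partition 8 was never created.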

Limitations:

  1. If a partition that is to be purged has been updated by a transaction that is still active, the purge of that partition is skipped. Although the partition is not purged at that time, it can be purged later by the creation of a new partition after the transaction that last updated that partition has committed.
  2. Because the purging is done asynchronously, by a separate thread, it is possible that the number of partitions might exceed the specified limit. This can happen if the rate of creating partitions is faster than the rate of purging partitions.

API Changes:

The PUTHDR() function now accepts a mode of ctMAXPARTMBRhdr, and the specified value sets the maximum number of partitions on a partitioned host file.

The ctFILBLK() function now accepts a new mode bit, ctFBfailIfTranUpdated. When this mode bit is used, if a file that is being blocked has been updated in a transaction that is still active, instead of aborting that transaction the file block fails with error code 1136 (FBTU_ERR): "File block failed because the file has been updated in an active transaction and the caller requested that the file block should fail in this situation instead of aborting the transaction."

 

Encryption

Encryption is supported with partitioned files.

Archived partitions introduce an important complication. Encryption creates a random key for each individual physical file, and that data encryption key is in turn encrypted with the master key. This is why a single master key can decrypt an entire database while each file is still protected by a unique "private" key. Partitioned files are collections of many physical files. Suppose you archive a set of those files and physically remove them, then change (rotate) your master key, as you should over time. If you later bring the archived partitions back online, you can no longer decrypt their contents, because their data encryption keys were encrypted with the original master key, which is no longer available.

 

Partitioned File Security - File password support

File passwords are supported for partitioned files. A partition created on-the-fly is assigned the same security information as the host file.

 

Partition Administration Function

The Partition Administration function, PartitionAdmin(), allows on-the-fly adjustment to the partitions associated with a given host file. This includes the capability to:

  • Add, remove, or archive partition(s)
  • Modify lower limit of the raw partition number
  • Modify limit on the number of partitions
  • Reuse the raw partition number of a purged member
  • Activate archived member(s)
  • Return a member file status

For additional information on PartitionAdmin(), see PartitionAdmin.

 

Managing Partitions

Each FairCom DB data partition and index pair resides as an independent data set. Their purpose is for ease and speed of data management. They can easily be purged, archived, re-activated, and even rebuilt as individual data silos with a single operation. Administration is done through the FairCom DB PTADMIN API function.

SQL Administration

FairCom DB SQL includes built-in procedures to administer partitions directly from your SQL application. The fc_ptadmin_num() procedure provides access to many basic administration functions.

call fc_ptadmin_num('admin', 'custmast', 'archive', 123)

To identify current lowest and highest active partition numbers, the built-in SQL procedure fc_get_partbounds() returns this information.

call fc_get_partbounds('admin', 'custmast')