
OCCURS Table Row Filtering

OCCURS tables are secondary tables generated by the Databridge Client when OCCURS clauses for items (or GROUPs) are not flattened, which is the Client's default behavior. The Client creates a separate row in these tables for each occurrence of the item (or GROUP); the keys of the primary table record are duplicated in each row, and an additional column named index1, which contains the occurrence number (starting at 1), is added. Nested OCCURS clauses produce two tables. The first of these can be suppressed when it contains nothing but keys (for example, when a GROUP within an OCCURS clause contains only another GROUP that also has an OCCURS clause). The second table has two added columns, index1 and index2, which hold the occurrence numbers of the corresponding items (or GROUPs) within the two OCCURS clauses.
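For illustration, consider a hypothetical data set customer with key cust_id and an item phone_number declared with OCCURS 3 TIMES; the table and column names and data types below are assumptions used only to show the shape of the generated tables, not the output of an actual define command.

create table customer (
    cust_id       int not null         -- key of the primary table
    -- remaining non-OCCURS items of the data set follow
);

create table customer_phone_number (
    cust_id       int not null,        -- parent key duplicated in every row
    index1        int not null,        -- occurrence number, starting at 1
    phone_number  char(10)             -- one occurrence of the item
);

Each customer record produces three rows in customer_phone_number, one for each of the index1 values 1, 2 and 3.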

Not all of the rows in such tables contain meaningful data; for this reason it is sometimes desirable to discard the rows whose data is meaningless. There are several advantages to doing this:

  • It saves storage, as these secondary tables are quite expensive, particularly when the item with the OCCURS clause is a single item.

  • The users of the database do not have to discard unwanted data when they fetch data from the secondary table.

  • The number of updates is significantly reduced, resulting in better performance. This can be improved further by setting the optimize_updates parameter to True, which makes the Client apply updates only to rows that have actually changed. Avoiding redundant updates in this way can greatly improve performance (see the example below).

The process of discarding rows that do not contain meaningful data works by defining a set of filters for such tables that describe the conditions under which rows should be discarded. This requires access to both the before and after images of an update, as a change in the data can affect whether a row is to be filtered or not. Since the before and after images are already available when filtering, enabling optimize_updates adds no overhead beyond comparing the before-image and after-image data to determine whether anything changed, which is much quicker than executing a redundant update (that is, SQL that does not change anything).
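For reference, optimize_updates is an ordinary Client configuration parameter; a minimal sketch of enabling it is shown below, assuming it is placed alongside the other processing parameters in the data source's Client configuration file.

optimize_updates = True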


Filter Source File

The implementation of row filtering for secondary tables derived from items with OCCURS clauses does not involve changing any configuration file parameters. All you need to do is create a text file that specifies the filtering conditions for all such tables that need to be filtered. We refer to this text file as the filter source file. It normally resides in the config sub-directory of the data source's working directory.

The filter source file, which is formatted in a manner somewhat similar to the row filtering sections of GenFormat, defines the filters for the various secondary tables using SQL-like statements. This file is then compiled using a utility called makefilter, which is included in the Client files. The makefilter utility checks the syntax of the filter source file and validates all the specified table and column names. It then creates a binary file named "dbfilter.cfg" in the config sub-directory of the Client's working directory. The Client looks for this file at the start of a process or clone command and, when it is present, loads it and binds it to the corresponding data tables and data items. The binding process replaces column numbers with pointers to the structures that hold the corresponding DATAITEMS control table entries.

The Client uses a general purpose filtering procedure that interprets the filter pseudo-code using the DMSII data buffer for the update and returns a result indicating whether or not the row should be discarded. The Client can thus determine whether or not to insert (or load, in the case of data extraction) or update a row in the table. In the case of a delete we do not bother with filtering; we simply delete all rows that have the keys of the parent table record (i.e. for all values of index1). To make the Client run efficiently, we made it use host variables for these sorts of operations, which we refer to as DELETE_ALL operations (when using stored procedures, the z_tablename stored procedure is used for this purpose). This means that, besides INSERT, DELETE and UPDATE statements, we also have compound DELETE statements for OCCURS tables (i.e. delete from tabname where key1 = val1 and ... keyn = valn; without specifying a value for index1).
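To make this concrete, here is a sketch of the kinds of statements involved for the hypothetical customer_phone_number table introduced earlier; the key values are invented, and the SQL the Client actually generates (or the equivalent stored procedure calls) may differ in detail.

-- per-occurrence statement affecting a single row of the OCCURS table
update customer_phone_number set phone_number = '5551234567'
    where cust_id = 1001 and index1 = 2;

-- compound DELETE (DELETE_ALL): removes every occurrence for the parent
-- record, without specifying a value for index1
delete from customer_phone_number where cust_id = 1001;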


The Filter File

The filter source file, which is modeled after the row filtering in GenFormat, uses a syntax that defines the conditions under which a row is to be discarded, rather than the conditions under which it is to be selected. The statements are free format and can extend over multiple lines, but they must be terminated by a semicolon. You can add comments using "// ..."; the scanner stops scanning the line when it reaches the two slashes, so the remainder of the line is ignored.

By using delete statements instead of select statements, we make the "where" clause define the conditions under which a row is filtered out rather than selected. The reason for doing this is that it is easier to follow (there is no need to apply De Morgan's law). An example of a filter source file follows.

Sample Filter File
delete from customer_hold_information where hold_type = 0 or hold_type = 4;
delete from customer_account_abbr where account_abbr = " ";
delete from meter_readings where amount_read = NULL;
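To see why the delete form is easier to follow, consider the first statement above: it discards the rows where hold_type is 0 or 4. Expressing the same filter as a selection of the rows to keep would require the negated condition that De Morgan's law produces, which is arguably harder to read:

// condition on the rows that are kept, shown for comparison only (not valid filter syntax)
// hold_type != 0 and hold_type != 4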

The makefilter program converts these filters into a list of tokens that contain all the information required to process them using the general purpose filtering procedure, which acts like a virtual machine executing the filter pseudo-code.

Any table that is not specified in the filter file will have no filter and will be treated normally. Filtering is limited to secondary tables derived from items with OCCURS clauses (a.k.a. OCCURS tables). Testing for NULL is allowed using "column_name = NULL" or "column_name != NULL" (or "column_name <> NULL"), which is not proper SQL. If the item is ALPHA, the fact that NULL is not in quotes is enough to distinguish it from the value "NULL". Unlike in relational databases, NULL in DMSII is an actual value (typically all high values for numeric items and all low values for ALPHA items). All constants are stored in the data area of the filter using their corresponding DMSII representations. Character constants are automatically space padded, and numeric constants have leading zeroes added.
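A few hypothetical statements illustrating these rules (the table and column names are invented):

delete from customer_remarks where remark_text = NULL;      // unquoted: the DMSII NULL value of the ALPHA item
delete from customer_remarks where remark_text = "NULL";     // quoted: the character string NULL, space padded to the column length
delete from meter_readings where amount_read <> NULL;        // <> and != are both accepted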

The three types of tokens involved in these expressions are variables (i.e. column names), constants and operators. A constant consists of a data type, which matches the type of the first operand (the first operand must be a column name), and an offset into the filter's data area; its length is the same as that of the associated column. We do not need the declared length, as all comparisons work at the character or digit level (we already do this when testing for NULL). Operators also include an end-of-statement indicator, which corresponds to the semicolon in the pseudo-SQL statements of the filter source file. All comparisons must start with a column name, and the second operand must be a constant or the word NULL. Comparing two columns as a condition for filtering is not allowed. All object names are case sensitive and must be typed in lower case; keywords and the word NULL are not case sensitive.

Important

String constants must be enclosed in double quotes (the use of single quotes is not currently supported).

In the case of a DMSII structural reorganization, the filters must be recompiled if any of the data sets that have filters for secondary tables are affected by the reorganization. The Client automatically takes care of this by initiating a recompile of the filter at the end of define and redefine commands, or of an Administrative Console Configure command run, when there is a filter file present in the config directory.

The changes to the Client itself are straightforward and involve running the filter routine on the image to determine whether it gets discarded or not. The Client handles the situation where a row that was not stored needs to be stored after an update (in this case the Client does an INSERT). Similarly, it handles the situation where a row that was being stored needs to be discarded after an update (in this case the Client does a DELETE). The remaining cases are handled normally: if the row was discarded and still needs to be discarded, nothing is done; if the row was stored and still needs to be stored, it is updated, unless optimize_updates is True, in which case the update is skipped when the values of all the columns are unchanged.

The following table summarizes the supported operators and their relative precedence (level 1 is the highest precedence, so comparisons bind more tightly than AND, which binds more tightly than OR).

Level   Operators
1       =, >, <, >=, <=, != (or <>)
2       AND
3       OR

The use of parentheses is allowed, but usually not necessary. There is no limit to the number of items that can be specified in the where clause, other than the actual number of non-key data items contained in the table.
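For example, given the precedence above, the two hypothetical statements below are not equivalent; the parentheses in the second override the default grouping:

delete from customer_orders where status = 0 or status = 9 and amount = 0;
// parsed as: status = 0 or (status = 9 and amount = 0)

delete from customer_orders where (status = 0 or status = 9) and amount = 0;
// rows are discarded only when amount is 0 and status is 0 or 9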

The use of DMSII items whose data type is REAL is restricted to tests for NULL and 0 in filters. Items that are not nullable in DMSII cannot be tested for NULL. When using items whose data type is BOOLEAN, you must use 0 or 1 in the constants (the use of TRUE and FALSE is currently not supported).
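Hypothetical examples of the allowed forms (the table and column names are invented):

delete from meter_readings where raw_value = NULL or raw_value = 0;    // REAL item: only tests for NULL and 0 are allowed
delete from customer_flags where is_active = 0;                        // BOOLEAN item: use 0 or 1, not FALSE or TRUE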

The makefilter program has two commands, import and display. The import command compiles the filter source file, which can be specified using the -f option, to create the binary filter file dbfilter.cfg. If no filter file is specified, the command tries to use the file dbfilter.txt in the config subdirectory of the data source's working directory. The display command produces a report that describes the content of the binary filter file. All makefilter log output is written to the file "prefix_flt_yyyymmdd[_hhmmss].log", keeping it separate from the Client log files.
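Typical invocations might look like the following, run from the data source's working directory and assuming the default filter source file name; the exact command-line form and any additional options are not shown here.

makefilter import                       compiles dbfilter.txt in the config subdirectory into dbfilter.cfg
makefilter import -f myfilters.txt      compiles the named filter source file instead
makefilter display                      reports the content of the binary filter file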