Compression

File compression can be used on indexed files to save disk space. The Vision file system supports compression, but not all file systems do. Compression is enabled by specifying the WITH COMPRESSION phrase in the ASSIGN clause of a file's SELECT statement. To have any effect, compression must be specified when the file is first created. However, the vutil -rebuild option lets you apply or remove compression when a file is rebuilt. See Rebuilding Files for more information.

File compression uses a simple run-length compression scheme. This replaces runs of identical bytes with a shorter sequence. Files using compression may contain any type of data.
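To illustrate the general idea of run-length compression, here is a minimal sketch in Python. Vision's actual on-disk encoding is not described here, so this sketch substitutes PackBits, a well-known run-length variant: a header byte of 0-127 means "copy the next n+1 bytes literally", and a header byte of 129-255 means "repeat the next byte 257-n times". The function names are hypothetical.

```python
def packbits_encode(data: bytes) -> bytes:
    """Run-length encode data using the PackBits scheme (illustration only)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # Measure the run of identical bytes starting at i (max 128).
        run = 1
        while i + run < n and run < 128 and data[i + run] == data[i]:
            run += 1
        if run >= 2:
            out.append(257 - run)   # header 129..255 encodes a run of 128..2
            out.append(data[i])     # the repeated byte
            i += run
        else:
            # Gather literal bytes until a run of 2+ begins (max 128 literals).
            start = i
            i += 1
            while i < n and i - start < 128 and not (i + 1 < n and data[i] == data[i + 1]):
                i += 1
            out.append(i - start - 1)  # header 0..127 encodes 1..128 literals
            out += data[start:i]
    return bytes(out)


def packbits_decode(data: bytes) -> bytes:
    """Reverse packbits_encode."""
    out = bytearray()
    i = 0
    while i < len(data):
        h = data[i]
        i += 1
        if h < 128:                      # literal block
            out += data[i:i + h + 1]
            i += h + 1
        elif h > 128:                    # repeated byte
            out += bytes([data[i]]) * (257 - h)
            i += 1
        # h == 128 is a no-op in PackBits
    return bytes(out)


# An 80-byte text record padded with trailing spaces.
record = b"SMITH" + b" " * 75
packed = packbits_encode(record)
print(len(record), "->", len(packed))
```

In this sketch the 80-byte record shrinks to 8 bytes: a 6-byte literal block for "SMITH" plus a 2-byte run for the 75 spaces. Data with no repeated bytes gains a small amount of overhead instead, which is why results vary from file to file.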

Some files will compress better than others. Generally speaking, files that contain text compress the best due to repeated space characters. Results can vary significantly, however. Experimentation is the best way to tell how much space may be saved.

Each compressed record usually retains some extra, unused space for future expansion. This is especially advisable if records are changed frequently. A compression factor lets you specify how much of the space saved by compression is retained for future growth. When no compression factor is specified, WITH COMPRESSION uses the default compression factor of 70. The following paragraphs explain how the factor is used.

A compression factor other than the default may be specified via the COMPRESSION CONTROL VALUE IS clause in the SELECT statement. The factor must be a numeric literal in the range zero (no compression) to 100 (maximum compression). A factor of one (1) causes Vision to examine the COMPRESS_FACTOR configuration variable; if COMPRESS_FACTOR is not set, the default compression factor (70) is used.
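The rules for choosing a file's compression factor can be summarized as a small decision function. This is a hypothetical Python sketch, not part of Vision: `effective_factor` and its parameters are illustrative names, and how Vision handles an out-of-range COMPRESS_FACTOR value is not covered here, so the sketch simply rejects such values.

```python
from typing import Optional

DEFAULT_FACTOR = 70


def effective_factor(control_value: Optional[int], compress_factor: Optional[int]) -> int:
    """Hypothetical helper: the factor a new file would be created with.

    control_value   -- the COMPRESSION CONTROL VALUE IS literal, or None if omitted
    compress_factor -- the COMPRESS_FACTOR configuration value, or None if unset
    """
    if control_value is None or control_value == 1:
        # Omitted or 1: consult COMPRESS_FACTOR, then fall back to the default.
        factor = DEFAULT_FACTOR if compress_factor is None else compress_factor
    else:
        factor = control_value
    # Assumption: anything other than 0 or 2..100 is treated as an error here.
    if factor == 1 or not 0 <= factor <= 100:
        raise ValueError(f"invalid compression factor: {factor}")
    return factor


print(effective_factor(None, None))  # clause omitted, variable unset
print(effective_factor(1, 50))       # factor 1 defers to COMPRESS_FACTOR
print(effective_factor(80, 50))      # an explicit factor wins
```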

For factors from two through 100, the factor is treated as a percentage: it specifies how much of the space saved by compression is actually removed from the record. For example, suppose an 80-byte record compresses to 30 bytes, saving 50 bytes. The compression factor determines how much of those 50 bytes is removed from the record. A compression factor of 70 means that 70% of the 50 bytes (35 bytes) is removed. This leaves 15 bytes for future expansion, for a compressed record size of 45 bytes (30 bytes of compressed data plus 15 bytes for growth). The larger the compression factor, the more of the saved space is removed. A compression factor of 100 removes all saved space and is advisable only if the file is rarely updated.
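The arithmetic in the example above can be written as a short Python function. `stored_size` is a hypothetical name, and because the text does not say how Vision rounds fractional bytes, the sketch assumes the removed amount is rounded down. Note that a factor of 0 naturally yields the original size, while a factor of 1 must first be resolved through COMPRESS_FACTOR before calling this function.

```python
def stored_size(original: int, compressed: int, factor: int) -> int:
    """Bytes a record occupies after applying a compression factor (0 or 2..100)."""
    saved = original - compressed          # space saved by compression
    removed = saved * factor // 100        # assumption: fractional bytes round down
    return original - removed              # = compressed + space kept for growth


print(stored_size(80, 30, 70))   # the example from the text
print(stored_size(80, 30, 100))  # all saved space removed
print(stored_size(80, 30, 0))    # no compression
```

For the 80-byte record that compresses to 30 bytes, this gives 45 bytes at factor 70, 30 bytes at factor 100, and 80 bytes at factor 0.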

An alternate way to specify the compression factor is to set the COMPRESS_FACTOR configuration variable. COMPRESS_FACTOR is used when the COMPRESSION CONTROL VALUE IS clause is either omitted or set to a value of one. See the entry for COMPRESS_FACTOR for more information. As noted earlier, the compression factor for a file is established when the file is created. Subsequent changes to COMPRESS_FACTOR do not affect existing files.

The selection of the compression factor should be based on the amount of updating that the file undergoes. If rewrites and deletes are rarely or never done on the file, then a high compression factor is most efficient. We recommend 100 for files that are rarely updated, 70 for average files, and 50 (or less) for files that are frequently updated.
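To see why frequently updated files benefit from a lower factor, the following Python sketch computes how many spare bytes each recommended factor leaves in the 80-byte record from the earlier example. The `RECOMMENDED` table restates the recommendations above; `growth_headroom` is a hypothetical helper, and rounding the removed bytes down is an assumption.

```python
# Recommended factors from the text, keyed by how often records are rewritten or deleted.
RECOMMENDED = {"rarely updated": 100, "average": 70, "frequently updated": 50}


def growth_headroom(original: int, compressed: int, factor: int) -> int:
    """Bytes of saved space left in the record for future expansion."""
    saved = original - compressed
    return saved - saved * factor // 100  # assumption: removed bytes round down


for usage, factor in RECOMMENDED.items():
    spare = growth_headroom(80, 30, factor)
    print(f"{usage:>18}: factor {factor:3d} -> {spare} spare bytes")
```

Under these assumptions, factor 100 leaves no room for growth, 70 leaves 15 bytes, and 50 leaves 25 bytes, so a record that grows on rewrite is more likely to fit in place at the lower factors.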