Understanding Data Compression

Transformation Hub compression settings affect data in two places: communication (data in transit) and storage (data at rest on disk, in Kafka topic partitions).

Data Consumers

There is no property that controls data compression on consumers. Consumers read metadata from each message, which indicates the correct decompression algorithm to use. Since this is evaluated on a message-by-message basis, the consumer's behavior does not depend on which topic it is consuming from. A single topic might contain messages which have been compressed with different compression algorithms (also referred to as compression types or codecs).
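The per-message behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Kafka client code: the message dictionary, its "codec" field, and the encode and decode helpers are hypothetical names invented for illustration. Only gzip and uncompressed data are modelled, because the sketch is limited to the Python standard library.

```python
import gzip


def encode(payload: bytes, codec: str) -> dict:
    """Build a toy message whose metadata records the codec that was used."""
    if codec == "gzip":
        body = gzip.compress(payload)
    elif codec == "none":
        body = payload
    else:
        raise ValueError(f"codec not modelled in this sketch: {codec}")
    return {"codec": codec, "body": body}


# Map each codec name to the function that reverses it.
DECODERS = {
    "none": lambda body: body,
    "gzip": gzip.decompress,
}


def decode(message: dict) -> bytes:
    """Choose the decompressor from the message's own metadata."""
    return DECODERS[message["codec"]](message["body"])


# A single (hypothetical) topic holding messages written with different codecs.
topic = [encode(b"event-1", "gzip"), encode(b"event-2", "none")]
print([decode(m) for m in topic])  # [b'event-1', b'event-2']
```

Note that decode takes no topic name and no consumer setting: everything it needs travels with the message, which is why one topic can safely mix codecs.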

Data Storage (Data at Rest)

The algorithm used to compress stored data is determined by the topic configuration. All Transformation Hub topics, except th-arcsight-avro and mf-event-avro-enriched, currently use the default compression type, producer. With this setting, the topic retains whatever compression the producer applied. Leaving the choice to the producer gives it the flexibility to send either compressed data (using any supported codec) or uncompressed data.

The mf-event-avro-enriched topic is an exception because the database scheduler reads from this topic but does not yet support reading messages compressed with the ZStandard (zstd) algorithm. This topic therefore has a specific out-of-the-box value, which ensures that the database scheduler can read it regardless of the over-the-wire compression used.

Topic                                                          Compression Type    Transformation Hub Version Support
All topics except th-arcsight-avro and mf-event-avro-enriched  producer (default)  3.4.0 and earlier (3.5.0 and earlier for mf-event-avro-enriched)
th-arcsight-avro                                               gzip                3.4.0 and 3.3.0
th-arcsight-avro                                               uncompressed        3.2.0 and earlier
mf-event-avro-enriched                                         gzip                3.5.0
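
The storage rule in this section comes down to one decision: a topic set to producer keeps the producer's codec, and any other topic value replaces it at rest. The following is a minimal sketch of that rule, assuming a simplified model. The function name stored_codec and its parameters are hypothetical, and it is not Transformation Hub or Kafka code.

```python
def stored_codec(topic_compression_type: str, producer_codec: str) -> str:
    """Return the codec a message ends up stored with (simplified model).

    topic_compression_type: the topic-level setting, for example
        "producer", "gzip", or "uncompressed".
    producer_codec: the codec the producer used over the wire
        ("none" when the producer sent uncompressed data).
    """
    if topic_compression_type == "producer":
        # Default: the topic retains whatever the producer sent.
        return producer_codec
    # Any other value overrides the producer's choice for stored data.
    return topic_compression_type


print(stored_codec("producer", "zstd"))  # zstd
print(stored_codec("producer", "none"))  # none
print(stored_codec("gzip", "zstd"))      # gzip
```

The last call mirrors the mf-event-avro-enriched case: even if a producer sends zstd over the wire, a topic pinned to gzip stores gzip, so a reader without zstd support can still consume it.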

Configuring Compression

There are two places in the Kafka architecture where compression can be configured: the producer and the topic.
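
As a rough illustration of the two configuration points, the sketch below uses plain Python dictionaries to stand in for a producer configuration and a topic configuration. The property name compression.type and the defaults (none for a producer, producer for a topic) are standard Kafka settings. The broker address, the dictionary layout, and the compression_settings helper are assumptions made for this example, and how Transformation Hub exposes these properties is not shown here.

```python
# Hypothetical producer settings; only "compression.type" matters here.
producer_config = {
    "bootstrap.servers": "th-kafka.example.com:9093",  # hypothetical address
    "compression.type": "zstd",  # codec applied before messages are sent
}

# Hypothetical topic-level settings.
topic_config = {
    "compression.type": "producer",  # keep whatever codec the producer used
}


def compression_settings(producer_cfg: dict, topic_cfg: dict) -> dict:
    """Report the compression setting at each of the two configuration points."""
    return {
        # The producer setting governs data in transit; Kafka's default is "none".
        "in_transit": producer_cfg.get("compression.type", "none"),
        # The topic setting governs data at rest; Kafka's default is "producer".
        "at_rest": topic_cfg.get("compression.type", "producer"),
    }


print(compression_settings(producer_config, topic_config))
# {'in_transit': 'zstd', 'at_rest': 'producer'}
```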

Compression Types

While Kafka supports a handful of compression types, Transformation Hub implements only two types: gzip and zstd.
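
A configuration script could guard against an unsupported codec with a check like the one below. This is a sketch under stated assumptions: the set of Kafka codecs (gzip, snappy, lz4, zstd) reflects upstream Kafka rather than this document, and the constant names and the classify_codec helper are hypothetical.

```python
# Codecs that upstream Kafka can compress with (an assumption from Kafka itself).
KAFKA_CODECS = {"gzip", "snappy", "lz4", "zstd"}

# Codecs this document says Transformation Hub implements.
TRANSFORMATION_HUB_CODECS = {"gzip", "zstd"}


def classify_codec(codec: str) -> str:
    """Classify a codec name against the two sets above."""
    if codec in TRANSFORMATION_HUB_CODECS:
        return "supported by Transformation Hub"
    if codec in KAFKA_CODECS:
        return "Kafka codec, not implemented by Transformation Hub"
    return "unknown codec"


for name in ("gzip", "zstd", "lz4", "brotli"):
    print(name, "->", classify_codec(name))
```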