Advanced distribution mode is a generic term for modes where DIH processes the incoming data to generate new data files for each child. Each data file contains only the documents that the child server must index.
Hash-Based Distribution Modes. Distribute documents to multiple child servers based on a checksum hash of the value in a specified field.
Value-Based Distribution Modes. Distribute documents to multiple child servers based on a specified value in a specified field.
In hash-based distribution modes, DIH calculates a hash of each document's value in a specified field. It then divides the hash by the number of child servers; the remainder determines which child server receives the document.
There are two options for this method:
Distribute by reference distributes documents by the reference field. In this mode, DIH can also distribute DREREPLACE index actions to the appropriate child server, according to the reference in the data.
Distribute by fields distributes documents by a specified field or fields. This mode does not process DREREPLACE index data, so you can use this method when you use different fields to update targets and duplicates.
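The hash-and-remainder routing described above can be sketched as follows. This is a minimal illustration, not DIH's implementation: CRC-32 stands in for whatever checksum DIH actually uses, and the function names `child_for` and `distribute`, the sample documents, and the `reference` key are all hypothetical.

```python
import zlib


def child_for(value: str, num_children: int) -> int:
    """Return the index of the child server that should receive a field value."""
    # Stand-in checksum: CRC-32 is an assumption, not necessarily DIH's hash.
    checksum = zlib.crc32(value.encode("utf-8"))
    # The remainder after dividing by the number of children selects the server.
    return checksum % num_children


def distribute(documents: list[dict], field: str, num_children: int) -> list[list[dict]]:
    """Split documents into one bucket (one data file) per child server."""
    buckets = [[] for _ in range(num_children)]
    for doc in documents:
        buckets[child_for(doc[field], num_children)].append(doc)
    return buckets


# Hypothetical sample data: two versions of document "a" and one document "b".
documents = [
    {"reference": "http://example.com/a", "title": "A, version 1"},
    {"reference": "http://example.com/b", "title": "B"},
    {"reference": "http://example.com/a", "title": "A, version 2"},
]
buckets = distribute(documents, "reference", num_children=3)
```

Both versions of document "a" share the same reference, so they hash to the same child. This is why deduplication continues to work as long as every version of a document has the same value in the distribution field.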
Advantages:
You can use all standard deduplication options, as long as all versions of the document have the same value in the distribution field.
Each child receives only the documents it must index, which reduces network load and I/O.
Disadvantages:
Changing the number of child servers alters the document-to-server mappings, so you cannot easily add or remove child servers.
Other limitations:
Generally, with this method you can only deduplicate by the value of a single field.
Do not add a DIH in a hash-based distribution mode under a DIH operating in simple distribution mode.
You should not tier DIHs that distribute on the hash of the same field.
If you add or remove child servers, you must export and reimport all data into a clean system through the updated DIH.
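To see why changing the number of child servers forces a full reimport, the sketch below counts how many documents would map to a different child when the count goes from 4 to 5. As before, CRC-32 is only an assumed stand-in for DIH's hash, and `child_for`, `moved_fraction`, and the generated references are hypothetical.

```python
import zlib


def child_for(value: str, num_children: int) -> int:
    """Map a field value to a child index by hash remainder (CRC-32 is an assumption)."""
    return zlib.crc32(value.encode("utf-8")) % num_children


def moved_fraction(references: list[str], old_count: int, new_count: int) -> float:
    """Fraction of references whose child changes when the child count changes."""
    moved = sum(
        1 for ref in references
        if child_for(ref, old_count) != child_for(ref, new_count)
    )
    return moved / len(references)


# Hypothetical references, generated only for this illustration.
references = [f"doc-{i}" for i in range(10_000)]
fraction = moved_fraction(references, old_count=4, new_count=5)
print(f"{fraction:.0%} of documents map to a different child")
```

With a uniformly distributed hash, a document keeps its child only when the hash leaves the same remainder for both 4 and 5, which happens for 4 of every 20 hash values. Roughly 80% of documents would therefore sit on the wrong child after the change, so the index has to be rebuilt through the updated DIH.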
In value-based distribution, DIH distributes information between child servers according to a particular value. The following options are available:
In distribute by field values, you can also explicitly specify the value of a field to distribute to a particular child server. This method is usually required only in very specialized situations.
Advantages:
These methods provide fine control over document distribution for advanced users.
Disadvantages:
Misconfiguration can easily result in a skewed distribution of documents or query load.
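As a rough illustration of value-based routing and of how a skewed distribution can arise, the sketch below maps explicit field values to child servers and counts the documents per child. The mapping, the `language` field, and the function names are hypothetical. How DIH handles a document whose value is not mapped is not described here, so this sketch simply reports such documents under `None`.

```python
from collections import Counter

# Hypothetical mapping from a field value to a child server index.
VALUE_TO_CHILD = {"en": 0, "fr": 1, "de": 2}


def child_for_value(doc: dict, field: str, mapping: dict):
    """Return the child index for a document, or None if its value is unmapped."""
    return mapping.get(doc.get(field))


def distribution_counts(documents: list[dict], field: str, mapping: dict) -> Counter:
    """Count how many documents each child server would receive."""
    return Counter(child_for_value(doc, field, mapping) for doc in documents)


# Hypothetical corpus dominated by one value.
documents = [
    {"language": "en"},
    {"language": "en"},
    {"language": "en"},
    {"language": "fr"},
]
counts = distribution_counts(documents, "language", VALUE_TO_CHILD)
```

Here child 0 receives three of the four documents and child 2 receives none, which is the kind of imbalance that a poorly chosen value mapping can produce. Checking the expected counts per child before deploying a mapping is one way to catch this early.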