DIH Distribution Mode Features

This section lists the features that are available in each DIH distribution mode, and the data volumes that you can expect in each mode. Use the two tables on this page to help you decide which distribution mode to use.

Features Available in DIH Distribution Modes

The following table describes the features that are available in various DIH distribution modes.

Feature | Simple Distribute | Simple Distribute with DistributeSendMinimal | Batch Mode | DistributeByReference (hash) | DistributeByFields (hash) | Consistent Hashing | Round Robin
Note (Consistent Hashing): UseConsistentHashing is available in hash-based modes (DistributeByReference and DistributeByFields) only.
KillDuplicates (Note: Deduplication can occur only if you turn on RespectDocumentDate and the duplicates have the same datestamp as the original document.)
KillDuplicates on any field (Note: Deduplication can occur only if you turn on RespectDocumentDate and the duplicates have the same datestamp as the original document.)
KillDuplicates with KeepExisting (Note: Deduplication can occur only if you turn on RespectDocumentDate and the duplicates have the same datestamp as the original document.)
Can use PreserveDREADD or ConvertToDREADD
Quick document parsing (IDX only) N/A N/A (Note: This option is available only if you turn on RespectDocumentDate. Otherwise, documents pass through to the current server without the need for parsing.)
Can distribute DREREPLACE data (Note: For DistributeByReference mode used with UseConsistentHashing only.) (Note: This option is possible only if the DREREPLACE data contains a datestamp for target documents in the #DREDATE field.)
Change the number of children (Note: This indicates that you can change the number of child servers without affecting deduplication, and without needing to reindex content.) (Note: DIH minimizes the number of documents that need reindexing, and performs this operation automatically.)
Automatic redistribution
Redistribute when children are down (Note: Enabling redistribute might mean that deduplication does not occur properly for documents in the offline child server.)
Weighted children
Update-only children
Respect child fullness
Take children up or down for query
Clear data in oldest child
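
The "Change the number of children" feature is associated with consistent hashing. The following Python sketch illustrates the general idea only; it is not DIH's implementation. It compares plain modulo hashing with a generic consistent-hash ring and measures what fraction of documents would map to a different child server when a fifth child is added to four. The function and class names, the use of MD5, and the number of virtual nodes are all assumptions made for illustration.

```python
import bisect
import hashlib


def stable_hash(key: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


def modulo_child(reference: str, num_children: int) -> int:
    """Plain hash distribution: the child index is the hash modulo the child count."""
    return stable_hash(reference) % num_children


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative, not DIH's algorithm)."""

    def __init__(self, num_children: int, vnodes: int = 200):
        points = []
        for child in range(num_children):
            for v in range(vnodes):
                points.append((stable_hash(f"child-{child}-vnode-{v}"), child))
        points.sort()
        self._keys = [p[0] for p in points]
        self._children = [p[1] for p in points]

    def child_for(self, reference: str) -> int:
        # Pick the first ring point clockwise from the document's hash, wrapping around.
        i = bisect.bisect_right(self._keys, stable_hash(reference)) % len(self._keys)
        return self._children[i]


def moved_fraction(assign_before, assign_after, references) -> float:
    """Fraction of references whose assigned child differs between two assignments."""
    moved = sum(1 for ref in references if assign_before(ref) != assign_after(ref))
    return moved / len(references)


references = [f"doc-{i}" for i in range(20000)]

# Grow from 4 to 5 children under each scheme.
modulo_moved = moved_fraction(
    lambda ref: modulo_child(ref, 4), lambda ref: modulo_child(ref, 5), references
)
ring4, ring5 = HashRing(4), HashRing(5)
ring_moved = moved_fraction(ring4.child_for, ring5.child_for, references)

print(f"modulo hashing:     {modulo_moved:.0%} of documents change child")
print(f"consistent hashing: {ring_moved:.0%} of documents change child")
```

In this sketch, modulo hashing reassigns most documents when the child count changes, whereas the ring reassigns only the documents that now belong to the new child. That is the property that makes it practical to change the number of children with a minimal amount of reindexing.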

Data Transfer Volumes for DIH Distribution Modes

The following table describes the data transfer volumes for the different DIH distribution modes, expressed as a fraction of the number of documents in the incoming IDX or XML. These fractions correspond approximately to data file size (ignoring marginal overheads such as XML headers). However, if your documents differ widely in size or you index in small batches, these values emerge only as long-term averages.

In the table:

Amount of Data | Simple Distribute | Simple Distribute with ConvertToDREADD | Simple Distribute with DistributeSendMinimal | Batch Mode | Hash-Based Modes | Hash-Based Modes with Consistent Hashing | Round Robin | Round Robin with RespectDocumentDate
Saved to incoming | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Read by the DIH (parsing) | - | - | 1 | - | 1 | 1 | - | 1
Written by the DIH (parsing) | - | - | 1 [1] | - | M/N | (r+1)M/N [2] | - | M/N
Total read or sent for indexing to child servers | M | - | M/N [1] | M/N | M/N | (r+1)M/N [2] | M/N | M/N
Volume received per child for one job | 1 | - | 1/N [1] | 0 or 1 [3] | 1/N | (r+1)N [2] | 0 or 1 | 0 or 1 [4]
Long-term average value per child | 1 | - | 1/N [1] | 1/N | 1/N | (r+1)N [2] | 1/N | 1/N

Notes:

[1] In DistributeSendMinimal mode, DIH sends each child server the full content of the documents that it must index, plus a minimal representation of documents that it must deduplicate, containing only the necessary reference fields for deduplication. For reasonably large documents, these minimal documents make a negligible contribution to the volume of data sent to the child, which will be a little more than 1/N. For very small documents, the size of the minimal representation might become an appreciable fraction of the original, and the observed savings in transmitted data might be correspondingly lower.

[2] When consistent hashing is turned on, the DIH inserts a tracking field into each document to allow it to redistribute data. For all but the very smallest of documents, the contribution of the additional field to the overall volume of data is negligible.

[3] In Batch Mode, each child server receives all or none of the incoming data, with the target server rotating between index actions.

[4] This value assumes that most documents are new additions and do not carry a historic data field value.
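
The difference between the "0 or 1" per-job value and the 1/N long-term average for Batch Mode can be illustrated with a short Python sketch. It assumes that N is the number of child servers, that every index job is the same size, and that the target child rotates strictly in order. These are simplifying assumptions for illustration, and the function names are hypothetical rather than part of DIH.

```python
from fractions import Fraction


def batch_mode_per_job(num_children: int, num_jobs: int) -> list[list[Fraction]]:
    """For each job, the fraction of that job's data sent to each child (0 or 1).

    Assumes strict rotation: job k goes entirely to child k mod N.
    """
    jobs = []
    for job in range(num_jobs):
        target = job % num_children
        jobs.append(
            [Fraction(1) if child == target else Fraction(0) for child in range(num_children)]
        )
    return jobs


def long_term_average(per_job: list[list[Fraction]]) -> list[Fraction]:
    """Average fraction received by each child over all jobs (equal-sized jobs assumed)."""
    num_jobs = len(per_job)
    num_children = len(per_job[0])
    return [sum(job[child] for job in per_job) / num_jobs for child in range(num_children)]


# With 4 children and 8 jobs, every child averages exactly 1/4 of the data.
even = long_term_average(batch_mode_per_job(num_children=4, num_jobs=8))

# With only 6 jobs, the rotation has not completed evenly, so the shares differ.
uneven = long_term_average(batch_mode_per_job(num_children=4, num_jobs=6))

print([str(share) for share in even])
print([str(share) for share in uneven])
```

The second case shows why the table values emerge only as long-term averages: after a small number of jobs, some children have received more than 1/N of the data and others less.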
