This section describes the most easily configured methods for distributing data to multiple child servers.
Mirror Mode. Configure identical child servers for failover and load balancing.
Simple Distribution Mode. Distribute content between multiple servers.
DistributeSendMinimal. Distribute content between multiple servers, and send only the minimal content that each child server needs to index and deduplicate.
Batch Mode. Distribute content between multiple servers by alternating the server that receives data.
In mirror mode, DIH copies all index actions to all child servers, to produce multiple identical copies of the index.
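As a rough illustration of the behavior described above, the following Python sketch models mirror mode as a function that copies every index action, unchanged, to every child server. The function name `mirror`, the child names, and the action strings are hypothetical placeholders chosen for this example; they are not DIH configuration or real index action syntax.

```python
from typing import Dict, List


def mirror(actions: List[str], children: List[str]) -> Dict[str, List[str]]:
    """Give every child server its own copy of every index action, in order."""
    return {child: list(actions) for child in children}


# Placeholder action strings (hypothetical, not real index action syntax).
actions = ["add batch-1", "add batch-2", "delete ref-7"]
queues = mirror(actions, ["child0", "child1"])

# Both children receive the same actions, so their indexes end up identical.
print(queues["child0"] == queues["child1"])
```

Because each child receives the full stream, any one of them can answer for the others, which is what makes this arrangement suitable for failover and load balancing.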
Advantages:
This method is useful for failover and load balancing.
DIH does not parse the document data, which means you can use the PreserveDREADD setting to send the original index action on to the child servers, reducing the network load.
Disadvantages:
This method does not distribute the data between multiple servers.
You can often eliminate mirror mode from a tiered architecture by using server groups.
In simple distribution mode, mirror mode is turned off, and no other distribution options are turned on.
DIH copies all index actions to all child servers, but modifies the add actions so that each child server indexes only a portion of the documents. The child server does not index the other documents, but it does use them to deduplicate with existing content, according to your KillDuplicates options.
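The following Python sketch models this behavior under simplifying assumptions. The `Child` class and `distribute_simple` function are hypothetical names. The sketch assigns documents to children by position in the file and deduplicates by reference, replacing any existing copy; the source does not state how DIH selects each child's portion, and real deduplication follows your KillDuplicates options.

```python
from typing import Dict, List, Tuple


class Child:
    """A toy child server that stores documents by reference."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.index: Dict[str, str] = {}


def distribute_simple(docs: List[Tuple[str, str]], children: List[Child]) -> None:
    """Show every document to every child; only the owner indexes it."""
    for position, (reference, content) in enumerate(docs):
        # Assumption: the owning child is chosen by document position.
        owner = position % len(children)
        for number, child in enumerate(children):
            # Every child sees the document, so it can drop an older copy.
            child.index.pop(reference, None)
            if number == owner:
                child.index[reference] = content


first, second = Child("child0"), Child("child1")
first.index["doc-1"] = "old text"  # an existing document on child0

distribute_simple([("doc-2", "new doc"), ("doc-1", "updated text")], [first, second])
print(first.index)   # doc-1 was removed from child0 as a duplicate
print(second.index)  # child1 now holds the new copy of doc-1
```

The sketch also shows why this mode suits only deduplication methods that replace the existing document: the new copy might be indexed on a different child from the one that holds the old copy.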
Advantages:
You can remove duplicates according to the content of any field.
You can add children at any time.
DIH does not parse the document data, which means you can use the PreserveDREADD setting to send the original index action on to the child servers, reducing the network load.
You can weight child servers, or specify that a child server must only update existing documents (UpdateOnly mode).
You can automatically treat servers that have reached a configured number of documents as UpdateOnly, and optionally choose to fill up a fixed number of servers at a time (rather than spreading data evenly across all non-full servers).
You can use redistribution options to index only to child servers that are active.
When IDOL Server redistributes actions, normal deduplication might not occur.
Disadvantages:
You cannot use deduplication methods that keep the existing document.
By default, DIH posts the entire data file to every child server, which can create a network or I/O bottleneck.
In DistributeSendMinimal mode, DIH distributes documents in the same way as simple distribution mode. Each server receives a modified add index action that instructs it to index only a portion of the documents received. However, unlike simple distribution mode, DIH parses and modifies the data that it sends to the child servers. For documents that a child server must index, DIH sends the full text of the document. For all other documents, DIH sends only a minimal representation, which contains only the reference fields that the server must use to remove any existing documents that are now duplicates.
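To make the difference from simple distribution mode concrete, the Python sketch below builds a separate payload for each child server. The function `build_payloads`, the field names `reference` and `content`, and the position-based allocation are assumptions made for this illustration; they do not describe the actual DIH data format.

```python
from typing import Dict, List

Document = Dict[str, str]


def build_payloads(docs: List[Document], child_count: int,
                   reference_field: str = "reference") -> List[List[Document]]:
    """Build one payload per child: full text for owned documents,
    and only the reference field for all other documents."""
    payloads: List[List[Document]] = [[] for _ in range(child_count)]
    for position, doc in enumerate(docs):
        owner = position % child_count  # assumption: allocation by position
        for child in range(child_count):
            if child == owner:
                payloads[child].append(dict(doc))
            else:
                payloads[child].append({reference_field: doc[reference_field]})
    return payloads


docs = [
    {"reference": "doc-1", "content": "full text of the first document"},
    {"reference": "doc-2", "content": "full text of the second document"},
]
payloads = build_payloads(docs, child_count=2)

for number, payload in enumerate(payloads):
    print(number, payload)
```

Each child still learns the reference of every document, so it can remove existing duplicates, but it receives the full text only for the documents it must index. Building these per-child copies is the extra processing and temporary storage cost that this mode trades for lower network volume.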
Advantages:
There is a large potential saving on the volume of data that DIH sends to its child servers, compared to simple distribution mode.
Most features of simple distribution mode are still available, such as weighting, respecting child fullness, and UpdateOnly child server mode (see the disadvantages section for exceptions).
Disadvantages:
DIH requires more processor time and temporary disk space when it prepares the per-child copies of incoming data.
The PreserveDREADD option is not available.
In batch mode, DIH rotates add index actions between child servers, so that each action goes to a single child server.
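The rotation can be pictured with the following minimal Python sketch. The function `rotate_batches` and the batch and child names are hypothetical. The sketch assumes a plain rotation and ignores the fullness, UpdateOnly, and redistribution options that can change which child server receives an action.

```python
from itertools import cycle
from typing import List, Tuple


def rotate_batches(batches: List[str], children: List[str]) -> List[Tuple[str, str]]:
    """Pair each add action with the next child server in rotation."""
    targets = cycle(children)
    return [(next(targets), batch) for batch in batches]


assignments = rotate_batches(["batch-1", "batch-2", "batch-3"], ["child0", "child1"])
for child, batch in assignments:
    print(child, batch)
```

Each action is sent whole to exactly one child server, so nothing needs to be parsed or copied.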
Advantages:
This mode is the most lightweight distribution mode. DIH does not parse any data, and all data is streamed to a single child server.
DIH does not parse the document data, which means you can use the PreserveDREADD setting to send the original index action on to the child servers, reducing the network load.
You can allocate documents according to how full the child servers are, or specify that a child server must only update existing documents.
You can use redistribution options to index only to child servers that are active.
When IDOL Server redistributes actions, you cannot deduplicate.
Disadvantages:
You cannot deduplicate documents across all your servers.
This method might be unsuitable if you index documents infrequently, or in particularly large batches of widely differing sizes. In these cases, data might not be evenly spread.
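A small worked example shows how uneven the spread can become. The Python sketch below assumes a plain rotation with no fullness-based allocation; the function name `documents_per_child` and the batch sizes are invented for illustration.

```python
from typing import List


def documents_per_child(batch_sizes: List[int], child_count: int) -> List[int]:
    """Total the documents each child receives when whole batches are rotated."""
    totals = [0] * child_count
    for position, size in enumerate(batch_sizes):
        totals[position % child_count] += size
    return totals


# Large and small batches alternate, so one child receives nearly everything.
print(documents_per_child([10000, 50, 8000, 20], 2))  # [18000, 70]
```

With many batches of similar size, the same rotation gives each child server roughly the same number of documents.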