You can configure IDOL Server to prevent duplicate documents from being indexed. This section describes the settings that you can use to decide how duplicates are processed. You must also consider deduplication when you choose a distribution mode for DIH.
See Also: Deduplicate Indexed Content
The KillDuplicates parameter is available as a configuration option, or as an option during indexing. If you want to prevent duplicate documents, you can use one of the following options for deduplication.
Option | Description |
---|---|
REFERENCE | A new document is a duplicate if the DREREFERENCE field contains the same value as an existing document. |
REFERENCEMATCHN | A new document is a duplicate if N or more percent of the content is the same as an existing document. |
FieldName | A new document is a duplicate if the FieldName reference field contains the same value as an existing document. |
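
The three option forms in the table can be told apart mechanically. The following Python sketch is illustrative only: `parse_kill_duplicates` is a hypothetical helper, not part of IDOL Server, and it assumes that the N in REFERENCEMATCHN is written as an integer percentage directly after the keyword (for example, REFERENCEMATCH80).

```python
import re


def parse_kill_duplicates(option):
    """Classify a KillDuplicates option value (hypothetical helper).

    Returns a (mode, detail) tuple:
      ("reference", "DREREFERENCE")  compare the DREREFERENCE field
      ("content_match", N)           compare content, N percent threshold
      ("field", name)                compare the named reference field
    """
    value = option.strip()
    if not value:
        raise ValueError("empty KillDuplicates option")

    # Assumption: the percentage is appended to the keyword, e.g. REFERENCEMATCH80.
    match = re.fullmatch(r"REFERENCEMATCH(\d{1,3})", value, flags=re.IGNORECASE)
    if match:
        percent = int(match.group(1))
        if percent > 100:
            raise ValueError("percentage must be between 0 and 100")
        return ("content_match", percent)

    if value.upper() == "REFERENCE":
        return ("reference", "DREREFERENCE")

    # Anything else is treated as the name of a reference field.
    return ("field", value)
```

For example, `parse_kill_duplicates("MYREF")` classifies the value as a field-based check on a (hypothetical) reference field called MYREF.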
By default, when IDOL Server detects a duplicate, it indexes the new document and deletes the original. As a result, when your repository content is updated and your connector retrieves the updates, IDOL Server indexes the updated version in place of the old one.
If you are sure that you do not want to overwrite an existing document, you can use the KeepExisting parameter for an index action. In this case, IDOL Server uses the existing version of the document, and discards the new version. This option saves time during indexing.
You can use this option if your content does not change much, or if it does not matter if users search against slightly older versions of the document.
On a blog, a user might modify a post to add a few corrections or clarifications. Most of the content does not change, so it does not matter if IDOL Server searches against an earlier version of the document.
On a news website, the content of each article is updated regularly as new information becomes available. However, the topic of the story does not change significantly, so the same searches retrieve either the earlier or the later version of the document. In this case, it might not matter whether IDOL Server has indexed the latest version.
In a legal search, there is a requirement that the search always matches against the latest version of the document. In this case, you would not use KeepExisting.
When you use KeepExisting, IDOL Server still indexes new documents as normal. You might also schedule indexing so that only new documents are indexed at peak times, and existing content is updated during periods of low search activity.
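
The difference between the default behavior and KeepExisting can be modeled with a small in-memory sketch. This is not IDOL Server code: the `index_document` function, the dictionary used as an index, and the use of DREREFERENCE as the duplicate key are all assumptions made for illustration.

```python
def index_document(index, doc, keep_existing=False, key_field="DREREFERENCE"):
    """Add doc to a dict-based index, mimicking duplicate handling.

    Returns "added" for a new document, "replaced" when a duplicate
    overwrites the original (the default), or "discarded" when the
    existing version is kept (keep_existing=True).
    """
    key = doc[key_field]
    is_duplicate = key in index

    if is_duplicate and keep_existing:
        # Keep the existing version and drop the incoming one.
        return "discarded"

    # New documents are always indexed; duplicates replace the original.
    index[key] = doc
    return "replaced" if is_duplicate else "added"


index = {}
first = {"DREREFERENCE": "post-1", "DRECONTENT": "original text"}
update = {"DREREFERENCE": "post-1", "DRECONTENT": "corrected text"}

index_document(index, first)                       # new document is added
index_document(index, update, keep_existing=True)  # update is discarded
```

In this model the discarded update costs only a key lookup, which reflects why skipping the overwrite can save time during indexing.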
You can use other index action parameters to send the duplicate document to another database. For more information, refer to the IDOL Server Reference.
When IDOL Server detects a duplicate document, you can add the content of some of the fields from one version to the other. IDOL Server overwrites the old document, but copies the specified fields to the new document.
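
As a rough illustration of this field-copying behavior, the following sketch merges selected fields from the old version into the new one. The `replace_preserving_fields` function and the field names are hypothetical, and the sketch assumes that a preserved field from the old document takes precedence over a field of the same name in the new document, which the text above does not specify.

```python
def replace_preserving_fields(old_doc, new_doc, preserve_fields):
    """Return the document that replaces old_doc in the index.

    The result is a copy of new_doc with each field listed in
    preserve_fields copied across from old_doc, when old_doc has it.
    """
    merged = dict(new_doc)
    for name in preserve_fields:
        if name in old_doc:
            # Assumption: the old value wins for preserved fields.
            merged[name] = old_doc[name]
    return merged


old = {"DREREFERENCE": "doc-7", "DRECONTENT": "old text", "RATING": "5"}
new = {"DREREFERENCE": "doc-7", "DRECONTENT": "new text"}

result = replace_preserving_fields(old, new, ["RATING"])
# result keeps the new content and carries over the old RATING field.
```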