AutoDetectLanguagesAtIndex

Set AutoDetectLanguagesAtIndex to True to automatically detect document languages and encodings during indexing.

For accurate language detection, documents must include several sentences. You can change the amount of text that IDOL Content Component analyzes to detect languages by changing the MaxLanguageDetectTerms configuration parameter.
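
As a rough illustration, the following configuration fragment enables detection and changes the amount of text analyzed. The value 200 is an arbitrary placeholder, not a recommended or default setting, and placing MaxLanguageDetectTerms in the [Server] section is an assumption; check the reference entry for that parameter before you use it.

```ini
[Server]
// Detect document languages and encodings during indexing.
AutoDetectLanguagesAtIndex=True
// Illustrative value only; the section placement is assumed, not confirmed here.
MaxLanguageDetectTerms=200
```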

Use DiscardUnconfiguredLanguagesAtIndex and DiscardUnknownLanguagesAtIndex to configure how IDOL Content Component handles documents when it detects a language type that is not defined in the configuration file (an unconfigured language), or when it cannot recognize the language at all (an unknown language).

By default, if IDOL Content Component detects a language type that is not configured, it indexes the document to the equivalent General language type for the encoding, if one exists. It also logs a warning message in the index log so that you can add an appropriate language type to the configuration file.

Unknown languages are also indexed to the General language type for the encoding, if it exists. If the encoding is unknown, the document is indexed to the default language type.
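
If you prefer not to index such documents to a fallback language type, you might set the two discard parameters as in the following sketch. It assumes that both are Boolean parameters in the [Server] section and that a value of True discards the affected documents; confirm this against the reference entry for each parameter.

```ini
[Server]
AutoDetectLanguagesAtIndex=True
// Assumed Boolean: discard documents whose detected language type is not configured.
DiscardUnconfiguredLanguagesAtIndex=True
// Assumed Boolean: discard documents whose language cannot be recognized.
DiscardUnknownLanguagesAtIndex=True
```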

NOTE: You can use AutoDetectLanguagesAtIndex only if it is included in your IDOL Content Component license.

Type: Boolean
Default: False
Required: No
Configuration Section: Server
Example: AutoDetectLanguagesAtIndex=True
See Also: DiscardUnconfiguredLanguagesAtIndex, DiscardUnknownLanguagesAtIndex, LangDetectType, LangDetectUTF8, MaxLanguageDetectTerms