LangDetectUTF8

Whether automatic language detection identifies files that contain only 7-bit ASCII as UTF-8, rather than as ISO-8859-1 (8-bit ASCII).

Automatic language detection uses the contents of the LangDetectType fields to detect the language of a document. By default, if these fields contain only 7-bit ASCII characters, IDOL Content Component detects the document's encoding as UTF-8. To group these documents with documents that use 8-bit ASCII (ISO-8859-1) instead, set LangDetectUTF8 to False.
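The following Python sketch models the 7-bit check described above. It is a hypothetical illustration, not IDOL's implementation: the function names and the returned labels "UTF8" and "ASCII" (standing in for ISO-8859-1) are assumptions. Content that contains 8-bit bytes is left to IDOL's normal encoding detection, which this sketch does not model.

```python
from typing import Optional


def is_7bit_ascii(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range (0x00-0x7F)."""
    return all(b < 0x80 for b in data)


def encoding_for_7bit(data: bytes, lang_detect_utf8: bool = True) -> Optional[str]:
    """Hypothetical model of the LangDetectUTF8 decision.

    Returns "UTF8" or "ASCII" (meaning ISO-8859-1) for 7-bit content,
    or None when the content contains 8-bit bytes, because LangDetectUTF8
    only affects content that is entirely 7-bit ASCII.
    """
    if not is_7bit_ascii(data):
        # Outside the scope of LangDetectUTF8; not modeled here.
        return None
    # True (the default): 7-bit content is treated as UTF-8.
    # False: 7-bit content is grouped with 8-bit ASCII (ISO-8859-1).
    return "UTF8" if lang_detect_utf8 else "ASCII"


print(encoding_for_7bit(b"plain text"))                          # UTF8
print(encoding_for_7bit(b"plain text", lang_detect_utf8=False))  # ASCII
```

A setting is needed because 7-bit ASCII bytes are identical in UTF-8 and ISO-8859-1, so the content alone cannot distinguish the two encodings.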

After IDOL Content Component detects the language of a document, it identifies the encoding by checking it against the encoding options that you configure for that language (see Language Configuration). If none of the configured encodings is compatible, IDOL assigns the default language type. To ensure that documents in a language are detected as UTF-8, you must include UTF8 as one of that language's encoding options.
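The encoding lookup can be sketched in Python as follows. This is a hypothetical model, not IDOL's implementation: the `configured` dictionary stands in for the per-language encoding options from Language Configuration, and the language type names (`englishASCII`, `englishUTF8`, `general`) are invented for illustration rather than taken from a real configuration.

```python
from typing import Dict


def resolve_language_type(
    language: str,
    encoding: str,
    configured: Dict[str, Dict[str, str]],
    default_type: str,
) -> str:
    """Map a detected language and encoding to a language type.

    `configured` maps each language to its encoding options, for example
    {"English": {"ASCII": "englishASCII"}}. If the detected encoding is not
    among the options for that language, the default language type is used.
    """
    options = configured.get(language, {})
    return options.get(encoding, default_type)


# English is configured with ASCII only, so UTF-8 English content falls back
# to the default language type.
config = {"English": {"ASCII": "englishASCII"}}
print(resolve_language_type("English", "UTF8", config, "general"))  # general

# Adding UTF8 as an encoding option lets the same content resolve to UTF-8.
config["English"]["UTF8"] = "englishUTF8"
print(resolve_language_type("English", "UTF8", config, "general"))  # englishUTF8
```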

Type: Boolean
Default: True
Required: No
Configuration Section: LanguageTypes
Example: LangDetectUTF8=True
See Also: AutoDetectLanguagesAtIndex