The LangDetectUTF8 configuration parameter allows you to classify files that contain 7-bit ASCII as UTF-8.

Automatic Language Detection uses the contents of the LangDetectType fields to determine the language of the document. If these fields contain only 7-bit ASCII characters, IDOL Server detects the document as ASCII. If other fields in the document contain UTF-8, those fields might be converted incorrectly.
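
The following Python sketch illustrates the problem described above. It is not IDOL Server's detection code: the function classify_encoding and the sample field values are hypothetical, and the Latin-1 decoding is only one assumed way in which UTF-8 bytes might be misread once a document has been classified as ASCII.

```python
def classify_encoding(detection_fields):
    """Hypothetical stand-in for the detection step (not IDOL's implementation).

    Returns "ASCII" when every detection field holds only 7-bit ASCII bytes,
    otherwise "UTF-8" (a simplification for this example).
    """
    if all(value.isascii() for value in detection_fields):
        return "ASCII"
    return "UTF-8"


# The fields used for detection contain only 7-bit ASCII characters.
detection_fields = ["Quarterly report".encode("utf-8")]

# Another field in the same document contains non-ASCII UTF-8 data.
other_field = "Café".encode("utf-8")

detected = classify_encoding(detection_fields)
print(detected)

# Assumption: a document classified as ASCII has its remaining bytes read
# with a single-byte encoding such as Latin-1, which garbles the "é".
print(other_field.decode("latin-1"))

# Reading the same bytes as UTF-8 preserves the text.
print(other_field.decode("utf-8"))
```

Because every 7-bit ASCII byte sequence is also valid UTF-8, treating such a document as UTF-8 loses nothing for the ASCII-only fields and keeps the other fields intact.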

If you know that your documents are generally in UTF-8, set LangDetectUTF8 to True to classify these documents as UTF-8. For example, when you retrieve content using connectors, the connectors output most data in UTF-8.
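
As a minimal sketch, the setting might look like the following in the IDOL Server configuration file. The [LanguageTypes] section name is an assumption that this page does not confirm; check the parameter reference for your version before you use it.

```
[LanguageTypes]
// Assumed section placement; verify against your configuration reference.
LangDetectUTF8=True
```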