
Configure OCR

OCR has many configuration options that allow you to fine-tune its operation to improve accuracy. This section describes the basic settings that you need to consider before running OCR.

Use the Languages parameter to specify every language that you expect the text to contain. Media Server restricts its identification attempts to characters that are used by the specified languages. You can add extra characters to this character list (for example, rarer punctuation) using the WhiteList parameter. You can also restrict the possible character choices further, for example to a single case or to digits only, using the CharacterTypes parameter.

In many cases, you know in advance that only a limited subset of characters will occur in the images. For example, many forms use only upper case, digits, and limited punctuation for clarity. In this situation, reducing the list of characters that Media Server considers improves both accuracy and speed.
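As an illustration of how these three parameters might fit together, the following is a minimal sketch of an OCR task section for a form that contains only English upper-case text, digits, and a few punctuation marks. The parameter names Languages, WhiteList, and CharacterTypes come from the description above. The section name, the comment syntax, and every value shown are assumptions made for the example, so check the exact accepted values in the Media Server reference before using them.

```
// Hypothetical section name; use the name of your own OCR task.
[MyOcrTask]
Type=ocr
// Assumed value format: a language code for each expected language.
Languages=en
// Assumed value names: restrict recognition to upper case and digits.
CharacterTypes=uppercase,digit
// Assumed value format: extra characters to allow in addition to the above.
WhiteList=-/.
```

Keeping the character list this small follows the guidance above: the fewer candidate characters Media Server has to consider, the less room there is for misidentification.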

Images

The parts of an image that are likely to be text depend on the context. To reflect this, Media Server provides several OCR modes for processing images. Specify the mode using the OcrMode parameter in the configuration file.

Video

Media Server supports two types of subtitle. By default, Media Server searches for single-color text against a plain, single-color background. You can also configure Media Server to search for black-bordered white letters that are superimposed directly onto the background TV image, which is a widely used type of subtitling. The Media Server configuration file refers to this type of subtitle as 'hollow text'.

