OCR has many configuration options that allow you to fine-tune its operation to improve accuracy. This section describes the basic settings that you need to consider before running OCR.
Use the Languages parameter to specify all the languages that you expect the text to be in. Media Server restricts its identification attempts to characters that are used by the specified languages. You can add extra characters to this character list (for example, rarer punctuation marks) using the WhiteList parameter. You can also further restrict the possible characters, for example to a single case or to digits only, using the CharacterTypes parameter. In many cases, you know in advance that only a limited subset of characters will occur in the images (for example, many forms use only upper case letters, digits, and limited punctuation for clarity). In this situation, reducing the list of characters that Media Server considers improves both accuracy and speed.
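The following Python sketch illustrates how these three settings might combine to narrow the set of candidate characters. It is a conceptual model only, not Media Server's implementation. The alphabets in `LANGUAGE_CHARS`, the type labels in `TYPE_FILTERS`, and the function `candidate_characters` are hypothetical. The order in which the whitelist and the character-type restriction are applied is also an assumption.

```python
import string

# Hypothetical, heavily simplified alphabets. Media Server's real
# per-language character tables are not described in this section.
LANGUAGE_CHARS = {
    "en": set(string.ascii_letters + string.digits + ".,;:!?'\"-()"),
    "de": set(string.ascii_letters + string.digits + "äöüÄÖÜß.,;:!?'\"-()"),
}

# Hypothetical character-type labels (not the real CharacterTypes values).
TYPE_FILTERS = {
    "uppercase": str.isupper,
    "lowercase": str.islower,
    "digit": str.isdigit,
}


def candidate_characters(languages, whitelist="", character_types=None):
    """Return the set of characters the OCR step would consider (toy model)."""
    # 1. Start from the union of the characters used by each language.
    allowed = set()
    for lang in languages:
        allowed |= LANGUAGE_CHARS[lang]
    # 2. Optionally keep only characters matching at least one requested type.
    if character_types:
        filters = [TYPE_FILTERS[t] for t in character_types]
        allowed = {c for c in allowed if any(f(c) for f in filters)}
    # 3. Assumption: whitelisted characters survive any type restriction.
    allowed |= set(whitelist)
    return allowed


full = candidate_characters(["en"])
form = candidate_characters(["en"], whitelist="/-",
                            character_types=["uppercase", "digit"])
print(len(full), len(form))  # 73 38
```

In this toy model, restricting English text to upper case letters and digits, plus two whitelisted punctuation marks, shrinks the candidate set from 73 to 38 characters. That smaller set is what makes the form use case faster and less error-prone.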
The parts of an image that are likely to contain text depend on the context. To reflect this, Media Server provides different OCR modes for processing different kinds of images. Specify the mode using the OcrMode parameter in the configuration file.
Media Server supports two types of subtitles. By default, Media Server searches for single-color text against a plain, single-color background. You can also configure Media Server to search for black-bordered white letters that have been superimposed directly onto the background TV image, which is a widely used type of subtitling. The Media Server configuration file refers to this type of subtitle as 'hollow text'.