Optical Character Recognition (OCR)

Media Server can perform OCR to extract text (in UTF-8 format) from images and video containing text, such as scanned documents or TV footage that has subtitles.

Media Server first determines which parts of an image have 'text-like' properties, for example by searching for similarly sized ink blobs grouped into 'word-like' sequences. It then compares the properties of these blobs with known character properties and selects the most probable characters. Media Server bases its selection on the appearance of the text and on language context information and dictionaries. For example, character selections that produce known dictionary words are favored over selections that produce random-looking sequences of letters. Combining the individual character selections produces a UTF-8 representation of the text in the image.
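To illustrate how dictionary context can override appearance alone, the following Python sketch scores every combination of candidate characters and boosts combinations that form a known word. It is a toy model, not Media Server's actual algorithm: the function name `select_text`, the candidate probabilities, and the `dictionary_bonus` weighting are all assumptions made for illustration.

```python
import math
from itertools import product

# Hypothetical appearance scores: for each ink blob, the candidate
# characters and how well each one matches the blob's shape.
CANDIDATES = [
    [("h", 0.90), ("b", 0.10)],
    [("e", 0.80), ("c", 0.20)],
    [("1", 0.55), ("l", 0.45)],
    [("1", 0.55), ("l", 0.45)],
    [("0", 0.60), ("o", 0.40)],
]


def select_text(candidates, dictionary, dictionary_bonus=10.0):
    """Pick the character sequence with the best combined score.

    The score is the product of the per-character appearance scores,
    multiplied by dictionary_bonus (an assumed weighting) when the
    sequence is a known dictionary word.
    """
    best_text, best_score = "", float("-inf")
    for combo in product(*candidates):
        text = "".join(ch for ch, _ in combo)
        score = math.prod(p for _, p in combo)
        if text.lower() in dictionary:
            score *= dictionary_bonus
        if score > best_score:
            best_text, best_score = text, score
    return best_text


# Appearance alone prefers the digits; the dictionary tips the balance.
appearance_only = select_text(CANDIDATES, dictionary=set())
with_dictionary = select_text(CANDIDATES, dictionary={"hello"})
print(appearance_only, with_dictionary)
```

With an empty dictionary the sketch picks the best-looking character for each blob independently, so digits that resemble letters win. Supplying a dictionary lets a slightly less likely reading win when it forms a real word.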

By default, Media Server returns each line of detected text as a string of characters (UTF-8 encoding), together with an overall confidence score and the bounding box of the text on the page. Alternatively, you can configure OCR tasks to return the output broken down into individual words or even individual characters, together with their own confidence scores and bounding boxes.
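One way to picture the line, word, and character levels of output is as a nested structure in which every item carries its own text, confidence score, and bounding box. The Python sketch below is a hypothetical model for client-side handling only: the class names, field names, confidence scale, and sample values are assumptions and do not reflect Media Server's actual output schema.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class BoundingBox:
    # Hypothetical fields: position and size in pixels.
    left: int
    top: int
    width: int
    height: int


@dataclass
class TextResult:
    text: str            # UTF-8 text, held here as a Python str
    confidence: float    # assumed scale: 0 to 100
    box: BoundingBox
    # Finer-grained results (words within a line, characters within a word).
    children: List["TextResult"] = field(default_factory=list)


def leaves(result: TextResult) -> Iterator[TextResult]:
    """Yield the finest-grained results available under `result`."""
    if not result.children:
        yield result
    else:
        for child in result.children:
            yield from leaves(child)


def low_confidence(result: TextResult, threshold: float) -> List[str]:
    """Return the text of every leaf result scoring below `threshold`."""
    return [r.text for r in leaves(result) if r.confidence < threshold]


# Made-up sample: one line broken down into two words.
line = TextResult(
    "Hello world", 85.0, BoundingBox(10, 20, 200, 30),
    children=[
        TextResult("Hello", 95.0, BoundingBox(10, 20, 90, 30)),
        TextResult("world", 75.0, BoundingBox(110, 20, 100, 30)),
    ],
)
print(low_confidence(line, threshold=80.0))
```

Breaking the output down to words or characters makes it possible to flag only the uncertain parts of a line for review, rather than accepting or rejecting the whole line on its overall score.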

