OCR Results

This section describes the format of the results produced by an OCR analysis task.

Results by Line

The following XML shows records from the Result track of an OCR task. The analysis engine produces one record for each line of text in the analyzed image or video frame.

If you are processing a document, some records in the Result track might represent text that was extracted from text elements in the document, unless you have set ProcessTextElements=FALSE.

<record>
    ...
    <trackname>OCR.Result</trackname>
    <OCRResult>
        <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id>
        <text>New rover discovers life on Mars</text>
        <region>
            <left>35</left>
            <top>21</top>
            <width>290</width>
            <height>15</height>
        </region>
        <confidence>99</confidence>
        <angle>0</angle>
        <block>0</block>
        <source>image</source>
    </OCRResult>
</record>
<record>
    ...
    <trackname>OCR.Result</trackname>
    <OCRResult>
        <id>e17ee583-e980-4d07-92c1-579657f46c3e</id>
        <text>Some more text</text>
        <region>
            <left>89</left>
            <top>66</top>
            <width>140</width>
            <height>15</height>
        </region>
        <confidence>99</confidence>
        <angle>0</angle>
        <block>1</block>
        <source>image</source>
    </OCRResult>
</record>
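As an illustration of how these records might be consumed, the following Python sketch reads the text and region from each OCRResult element. It is a minimal example under stated assumptions: the records are wrapped in a single root element (the records element here is hypothetical and not part of the Media Server output), the fields elided as ... above are left out, and the function name parse_line_results is invented for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copies of the two records above, wrapped in a hypothetical
# <records> root so that the sample is a single well-formed document.
SAMPLE = """
<records>
  <record>
    <trackname>OCR.Result</trackname>
    <OCRResult>
      <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id>
      <text>New rover discovers life on Mars</text>
      <region><left>35</left><top>21</top><width>290</width><height>15</height></region>
      <confidence>99</confidence>
      <angle>0</angle>
      <block>0</block>
      <source>image</source>
    </OCRResult>
  </record>
  <record>
    <trackname>OCR.Result</trackname>
    <OCRResult>
      <id>e17ee583-e980-4d07-92c1-579657f46c3e</id>
      <text>Some more text</text>
      <region><left>89</left><top>66</top><width>140</width><height>15</height></region>
      <confidence>99</confidence>
      <angle>0</angle>
      <block>1</block>
      <source>image</source>
    </OCRResult>
  </record>
</records>
"""


def parse_line_results(xml_text):
    """Return one dict per OCR.Result record found in xml_text."""
    root = ET.fromstring(xml_text)
    lines = []
    for record in root.iter("record"):
        # Skip records that belong to other tracks (WordResult, CharResult, ...).
        if record.findtext("trackname") != "OCR.Result":
            continue
        result = record.find("OCRResult")
        region = result.find("region")
        box = tuple(
            int(region.findtext(name))
            for name in ("left", "top", "width", "height")
        )
        lines.append({
            "id": result.findtext("id"),
            "text": result.findtext("text"),
            "box": box,  # (left, top, width, height)
            "confidence": float(result.findtext("confidence")),
            "block": int(result.findtext("block")),
            "source": result.findtext("source"),
        })
    return lines


lines = parse_line_results(SAMPLE)
for line in lines:
    print(line["block"], line["box"], line["text"])
```

Keeping the id alongside the text is useful because the same id values are referenced from other tracks, such as TableResult.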

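Each record contains the following information, as shown in the example above: a record identifier (id), the recognized text (text), the position and size of the text (region, with left, top, width, and height), and the confidence, angle, block, and source values.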

Results by Word

OCR also produces a WordResult output track. This track contains a record for each recognized word. The following XML shows an example record.

NOTE: Text that is extracted from a text element in a document is not output to the WordResult or WordData tracks.

<record>
    ...
    <trackname>OCR.WordResult</trackname>
    <OCRResult>
        <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id>
        <text>New</text>
        <region>
            <left>35</left>
            <top>21</top>
            <width>39</width>
            <height>15</height>
        </region>
        <confidence>99</confidence>
        <angle>0</angle>
        <block>0</block>
        <source>image</source>
    </OCRResult>
</record>

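Each record contains the following information, as shown in the example above: a record identifier (id), the recognized word (text), the position and size of the word (region, with left, top, width, and height), and the confidence, angle, block, and source values.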

Results by Character

When you analyze an image or document (but not video), OCR produces a CharResult output track. This track contains a record for each line of text. However, the records in this track provide detail about individual characters rather than the whole line. The following XML shows an example record.

NOTE: Text that is extracted from a text element in a document is not output to the CharResult track.

<record>
    ...
    <trackname>OCR.CharResult</trackname>
    <OCRDetail>
        <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id>
        <angle>0</angle>
        <block>0</block>
        <character>
            <text>N</text>
            <region>
                <left>35</left>
                <top>21</top>
                <width>12</width>
                <height>15</height>
            </region>
        </character>
        <character>
            <text>e</text>
            <region>
                <left>49</left>
                <top>25</top>
                <width>10</width>
                <height>11</height>
            </region>
        </character>
        ...
    </OCRDetail>
</record>
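To show how the per-character detail could be used, the following Python sketch rebuilds the text of a line from its character elements and computes a box that encloses them. It is only a sketch: the function names parse_char_record and enclosing_box are invented for this example, the sample contains just the two characters shown above, and it assumes that the character elements appear in reading order.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the CharResult record above (first two characters only).
SAMPLE_CHAR_RECORD = """
<record>
  <trackname>OCR.CharResult</trackname>
  <OCRDetail>
    <id>c0cf6d75-ad43-4fce-8589-e2a297923996</id>
    <angle>0</angle>
    <block>0</block>
    <character>
      <text>N</text>
      <region><left>35</left><top>21</top><width>12</width><height>15</height></region>
    </character>
    <character>
      <text>e</text>
      <region><left>49</left><top>25</top><width>10</width><height>11</height></region>
    </character>
  </OCRDetail>
</record>
"""


def parse_char_record(xml_text):
    """Return (record id, [(text, left, top, width, height), ...])."""
    record = ET.fromstring(xml_text)
    detail = record.find("OCRDetail")
    chars = []
    for character in detail.findall("character"):
        region = character.find("region")
        left, top, width, height = (
            int(region.findtext(name))
            for name in ("left", "top", "width", "height")
        )
        chars.append((character.findtext("text"), left, top, width, height))
    return detail.findtext("id"), chars


def enclosing_box(chars):
    """Smallest (left, top, width, height) box that contains every character."""
    left = min(c[1] for c in chars)
    top = min(c[2] for c in chars)
    right = max(c[1] + c[3] for c in chars)
    bottom = max(c[2] + c[4] for c in chars)
    return left, top, right - left, bottom - top


record_id, chars = parse_char_record(SAMPLE_CHAR_RECORD)
line_text = "".join(c[0] for c in chars)
print(record_id, repr(line_text), enclosing_box(chars))
```

In the examples in this section the CharResult record has the same id as the corresponding Result record, so the id returned here could be used to look up the whole-line text and confidence.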

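Each record includes the following information, as shown in the example above: a record identifier (id), the angle and block values for the line, and one character element for each character, which contains the character (text) and its position and size (region, with left, top, width, and height).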

Tables

OCR can identify tables that occur in images. When OCR processes a table, the record IDs in the Result, WordResult, and CharResult tracks represent table cells rather than lines of text.

When OCR recognizes text that appears to be arranged in a table, it also produces a TableResult track. This track contains a record for each table that was identified. Each record includes enough structure information to reconstruct the table. The records in the TableResult track do not include the recognized text. Instead, they include record IDs that match the records in the Result, WordResult, and CharResult tracks. For example:

<record>
    ...
    <trackname>OCR.TableResult</trackname>
    <OCRTableResult>
        <id>6596a664-b69a-4a33-b9fc-8adb2be6c37f</id>
        <region>
            <left>256</left>
            <top>166</top>
            <width>1213</width>
            <height>362</height>
        </region>
        <block>0</block>
        <columnCount>9</columnCount>
        <rowCount>10</rowCount>
        <row>
            <cell>
                <columnSpan>1</columnSpan>
            </cell>
            <cell>
                <columnSpan>2</columnSpan>
                <OCRResultID>2240914c-440c-40cc-9254-c3c59727953e</OCRResultID>
            </cell>
            <cell>
                <columnSpan>3</columnSpan>
            </cell>
            <cell>
                <columnSpan>3</columnSpan>
                <OCRResultID>9ea804cc-a1a2-4d31-99f8-d1d96a3a1c9e</OCRResultID>
            </cell>
        </row>
        ...
    </OCRTableResult>
</record>

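Each record contains the following elements, as shown in the example above: a record identifier (id), the position and size of the table (region), the block value, the number of columns and rows (columnCount and rowCount), and one row element for each row. Each row contains cell elements, and each cell has a columnSpan and, where text was recognized in the cell, an OCRResultID.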

NOTE: Tables contained in a text element in a document are not output to the TableResult track. OCR only detects tables that are included as images.

Media Server includes an example session configuration and XSL transform, named Table.cfg and toHTMLTable.xsl, that use the information in the TableResult track to output HTML tables.
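As a rough illustration of the same idea outside XSL, the following Python sketch joins a TableResult record to recognized text by ID and writes a simple HTML table. It is not the logic of toHTMLTable.xsl. It makes several assumptions: the cell text values in TEXT_BY_ID are invented (the real text would come from the Result track records with matching IDs), each cell carries at most one OCRResultID, only the columnSpan element shown above is handled, and the function names table_rows and to_html are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Trimmed copy of the TableResult record above (first row only).
SAMPLE_TABLE = """
<OCRTableResult>
  <id>6596a664-b69a-4a33-b9fc-8adb2be6c37f</id>
  <columnCount>9</columnCount>
  <rowCount>10</rowCount>
  <row>
    <cell><columnSpan>1</columnSpan></cell>
    <cell>
      <columnSpan>2</columnSpan>
      <OCRResultID>2240914c-440c-40cc-9254-c3c59727953e</OCRResultID>
    </cell>
    <cell><columnSpan>3</columnSpan></cell>
    <cell>
      <columnSpan>3</columnSpan>
      <OCRResultID>9ea804cc-a1a2-4d31-99f8-d1d96a3a1c9e</OCRResultID>
    </cell>
  </row>
</OCRTableResult>
"""

# Invented cell text, keyed by record ID. In practice this mapping would be
# built from the Result track records that have the same IDs.
TEXT_BY_ID = {
    "2240914c-440c-40cc-9254-c3c59727953e": "Quarter",
    "9ea804cc-a1a2-4d31-99f8-d1d96a3a1c9e": "Revenue",
}


def table_rows(xml_text, text_by_id):
    """Return the table as a list of rows, each a list of (text, span) pairs."""
    table = ET.fromstring(xml_text)
    rows = []
    for row in table.findall("row"):
        cells = []
        for cell in row.findall("cell"):
            span = int(cell.findtext("columnSpan", default="1"))
            result_id = cell.findtext("OCRResultID")
            # Cells without an OCRResultID are treated as empty.
            text = text_by_id.get(result_id, "") if result_id else ""
            cells.append((text, span))
        rows.append(cells)
    return rows


def to_html(rows):
    """Render rows as a minimal HTML table, using colspan for wide cells."""
    parts = ["<table>"]
    for cells in rows:
        parts.append("  <tr>")
        for text, span in cells:
            attr = f' colspan="{span}"' if span > 1 else ""
            parts.append(f"    <td{attr}>{escape(text)}</td>")
        parts.append("  </tr>")
    parts.append("</table>")
    return "\n".join(parts)


rows = table_rows(SAMPLE_TABLE, TEXT_BY_ID)
html_table = to_html(rows)
print(html_table)
```

In the example row, the column spans add up to the columnCount value of 9, which gives a simple consistency check when reconstructing a table.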