Media Server connection

You can use your existing OpenText IDOL Media Server implementation to perform optical character recognition (OCR) on documents managed by Fusion.

Fusion includes a default OCR configuration file that defines a Media Server analysis task for basic OCR and uses a custom XSL transform to reduce the Media Server response to only what Fusion requires. This configuration does not apply orientation correction, cropping, or template or region detection. You can create custom Media Server configurations to perform OCR analysis tasks.

If you plan to connect to Media Server to facilitate OCR, additional tasks are required.

Requirements

Prior to beginning the connection tasks, you must have the following in place.

  • Fusion processing agent fully installed and configured, following practices and procedures defined in Agent installation and configuration.

    NOTE: Each processing agent host can connect to a single Media Server host.

  • Supported version of Media Server fully installed and configured, following standard Media Server practices and procedures. For supported versions of Media Server, see Support matrix > Integrated Resources.

  • Fusion agents and any Media Server host machines they interact with must be on the same network and cannot have a proxy between them.

  • Ensure the Media Server host machine IP address or server name is resolvable from the Fusion processing agent host machine it will be connected to.

  • If you plan to use existing or new Media Server configurations to perform OCR analysis tasks, ensure that these configurations have been verified and tested in Media Server and conform to the necessary format for compatibility with Fusion. For configuration format information, see Custom Media Server configurations for OCR task analysis.

Update Media Server

You must update your Media Server implementation to include the necessary components to interact with Fusion.

Update Fusion

You must update your Fusion processing agent implementation to include the necessary components to use Media Server for OCR.

Custom Media Server configurations for OCR task analysis

If you choose to create your own Media Server configurations for OCR task analysis, follow all procedures, guidelines, and suggestions for OCR as documented in the Media Server documentation. The Media Server Administration Guide contains the necessary details and includes information for improving OCR and optimizing performance for object recognition.

Fusion expects an engine of type response to facilitate the processing of the extracted metadata. The response engine must have the following format:

[ResponseEngineName]
Type = response
Input = OCREngineName.Result

If you define regions for OCR, you can use Media Server Visual Training to develop a template. A template consists of a series of regions, each of which contains a name and a set of region coordinates (in pixels). Each region defined in Media Server must have two respective engines in the processing configuration: a SetRectangle type engine and an ocr type engine. Neither engine is required to have exactly the same name as the respective Media Server metadata field, but the ocr type engine must follow a naming convention to work correctly with the Fusion agent. During document ingestion, Fusion uses the name of each ocr type engine to define the respective value.

The name of each ocr type engine must use the format <OBJECT_TYPE>-<REGION_NAME>-<REGION_TYPE>. The <OBJECT_TYPE> portion must always exist and come first. A hyphen '-' must separate each portion, so a name contains a minimum of one hyphen (<OBJECT_TYPE>-<REGION_TYPE>) and a maximum of two (<OBJECT_TYPE>-<REGION_NAME>-<REGION_TYPE>).
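To check engine names against this convention before deploying a configuration, a small script can help. The following Python sketch is illustrative only and is not part of Fusion or Media Server. The function name parse_ocr_engine_name and the sample engine names are hypothetical, and the sketch assumes that no portion of the name contains a hyphen of its own.

```python
from typing import NamedTuple, Optional


class OcrEngineName(NamedTuple):
    """Portions of an ocr type engine name (illustrative structure only)."""
    object_type: str
    region_name: Optional[str]  # None when the name has only two portions
    region_type: str


def parse_ocr_engine_name(name: str) -> OcrEngineName:
    """Split a name of the form <OBJECT_TYPE>-<REGION_NAME>-<REGION_TYPE>.

    Accepts one hyphen (<OBJECT_TYPE>-<REGION_TYPE>) or two hyphens
    (<OBJECT_TYPE>-<REGION_NAME>-<REGION_TYPE>) and rejects anything else.
    """
    parts = name.split("-")
    if any(part == "" for part in parts):
        raise ValueError(f"empty portion in engine name: {name!r}")
    if len(parts) == 2:
        return OcrEngineName(parts[0], None, parts[1])
    if len(parts) == 3:
        return OcrEngineName(parts[0], parts[1], parts[2])
    raise ValueError(f"expected one or two hyphens in engine name: {name!r}")


if __name__ == "__main__":
    # Hypothetical engine names, used only to exercise the check.
    for candidate in ("Passport-Surname-Text", "Passport-Text", "Passport"):
        try:
            print(candidate, "->", parse_ocr_engine_name(candidate))
        except ValueError as error:
            print(candidate, "-> invalid:", error)
```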

If you use a template or region-based configuration for grammar value extraction and the metadata fields within the OCR results need to be in a language context other than English, the ocr engine names must be URL encoded throughout the configuration file. As an example, a Turkish driver's license includes "Sürücü belgesi", which translates to "driver's license" in English. The engine must be declared with the non-English characters in "Sürücü" URL encoded as follows:

Engine12 = DriversLicense-S%C3%BCr%C3%BCc%C3%BC_belgesi
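The encoded portion of the name can be produced with any standard URL encoder. The following Python sketch uses urllib.parse.quote from the standard library, which percent-encodes the UTF-8 bytes of each non-ASCII character. The function name encode_engine_name is hypothetical, and replacing the space with an underscore is an assumption inferred from the example above rather than a documented rule.

```python
from urllib.parse import quote


def encode_engine_name(object_type: str, region_label: str) -> str:
    """Build an engine name with the region label URL encoded.

    Assumption: spaces in the label become underscores, as in the
    DriversLicense example. quote() leaves ASCII letters, digits, and
    the characters _ . - ~ unchanged and percent-encodes everything else
    as UTF-8 bytes.
    """
    label = region_label.replace(" ", "_")
    return f"{object_type}-{quote(label, safe='')}"


if __name__ == "__main__":
    print(encode_engine_name("DriversLicense", "Sürücü belgesi"))
    # prints: DriversLicense-S%C3%BCr%C3%BCc%C3%BC_belgesi
```

Use the same encoded name everywhere the engine is referenced in the configuration file, so that the declaration and any references to it match.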

The LuaScript file of the SetRectangle type engine includes the actual name of the respective Media Server metadata field, and therefore serves as the mapping between the configuration and the Media Server metadata region field. The region coordinates defined in Media Server are used to obtain the region rectangle on the image, which is cropped on the fly, and the subsequent ocr engine uses this rectangle to scan only the contents of that region.

If fields in the layout of the card, template, or image generally need to be together as one field for successful grammar value extraction, such as forename and surname, you can group these fields. Within the configuration, group these fields together into a single field (or type engine) and use the group output as a single input entry for the Media Server response.

For the sub-grouping to be picked up and handled correctly, copy the updated toAgentDataResponse.xsl file from the agent host (<agentInstallDir>\mediaserver\xsl\response_templates\toAgentDataResponse.xsl) and paste it into the \response_templates directory on each Media Server host managed by this agent (<MediaServerInstallDir>\configurations\xsl\response_templates), overwriting the existing file if necessary. Within this XSL transform, the detected OCR text value for each item in a group is concatenated in the order specified in the respective group or type engine (input0, input1…inputN). An average confidence value for all the engine results in the group is also generated for the single record output.
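The effect of the grouping on the output record can be illustrated outside of XSL. The following Python sketch is not the toAgentDataResponse.xsl transform itself. It only mirrors the behavior described above: text values joined in input order and a mean confidence computed across the group. The function name merge_group, the sample values, and the single-space separator are assumptions made for illustration.

```python
from typing import List, Tuple


def merge_group(results: List[Tuple[str, float]],
                separator: str = " ") -> Tuple[str, float]:
    """Combine grouped OCR results into a single record.

    results holds (text, confidence) pairs in input order
    (input0, input1, ... inputN). The separator is an assumption;
    the actual transform may join values differently.
    """
    if not results:
        raise ValueError("a group must contain at least one result")
    text = separator.join(value for value, _ in results)
    confidence = sum(score for _, score in results) / len(results)
    return text, confidence


if __name__ == "__main__":
    # Hypothetical forename and surname results with confidence values.
    print(merge_group([("John", 90.0), ("Smith", 80.0)]))
    # prints: ('John Smith', 85.0)
```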

For further guidance on custom Media Server configuration, contact Support.