Process external metadata

When processing items in file system datasets, you can choose to have OpenText Core Data Discovery & Risk Insights capture external metadata that is not normally indexed. You can process external metadata for items regardless of whether you store the item's content.

OpenText Core Data Discovery & Risk Insights uses a PowerShell script to instruct the agent how to process external metadata. This script, the external metadata file processor, is specified in the primary capture rules of the file system dataset. You can either capture external metadata dynamically, which scans every item in the dataset, or read it from extracted metadata files stored alongside the original items in your file system.

When creating a file system dataset, note the following:

  • The external metadata file processor must be a PowerShell script (.ps1 file extension) located in the path defined by Scripts Directory in the Agent Admin UI under Advanced Settings > Run Script. By default, this is the \Agent\Scripts directory of the OpenText Core Data Discovery & Risk Insights installation path (for example, C:\Program Files\OpenText\Core Data Discovery and Risk Insights\Agent\Scripts). When creating the dataset, you will specify the file name of the metadata file processor.

    A sample file, ExternalMetadata.ps1, is provided with OpenText Core Data Discovery & Risk Insights and is located in the \Agent\Scripts directory of the OpenText Core Data Discovery & Risk Insights installation path (default is C:\Program Files\OpenText\Core Data Discovery and Risk Insights\Agent\Scripts).

  • Select Dynamic as the external metadata processing type to scan each item in the dataset and gather metadata according to the configuration defined by the external metadata file processor. This option does not require extraction ahead of time, but may require additional processing time.

  • Select From file as the external metadata processing type to read metadata from files that were extracted ahead of time, and define the extension of the files that contain the extracted metadata. The external metadata files are not processed as part of the dataset primary capture rules.

    For example, suppose you use an application to extract content from images, scanned documents, audio files, and video files. This content is saved in metadata files that are directly associated with the original items, such as image001.tiff (original item) and image001.tiff.idx (extracted metadata file). When you view a document with processed external metadata in Analyze or Manage, the metadata appears in the content view panel in a section labeled "Metadata Text Content".

    • The external metadata files that contain the metadata information must be located in the same directory as the original document from which the metadata was extracted. When creating the dataset, you will define the file extensions for the files that contain the external metadata.

    • The external metadata files should provide well-formatted, human-readable text. The external metadata text is used as is for identifying and extracting grammar values, tagging, keyword searching, and document preview. Well-formatted text, such as key-value pairs, produces better processing and search results.
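The From file requirements above (same directory as the original item, a dedicated file extension, readable key-value text) can be sketched from the side of the extraction application. This is a minimal illustration, not part of the product: the helper names sidecar_path, write_sidecar, and missing_sidecars are hypothetical, and the "Key: value" line layout and UTF-8 encoding are assumptions. The .idx extension follows the image001.tiff.idx example.

```python
from __future__ import annotations

from pathlib import Path


def sidecar_path(original: Path, extension: str = ".idx") -> Path:
    """Return the metadata file path: same directory, extension appended.

    Example: image001.tiff -> image001.tiff.idx
    """
    return original.with_name(original.name + extension)


def write_sidecar(original: Path, metadata: dict[str, str],
                  extension: str = ".idx") -> Path:
    """Write one 'Key: value' pair per line next to the original item.

    The 'Key: value' layout is an assumed format, chosen only because
    key-value pairs are easy to read and search.
    """
    target = sidecar_path(original, extension)
    lines = [f"{key}: {value}" for key, value in metadata.items()]
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


def missing_sidecars(directory: Path, extension: str = ".idx") -> list[Path]:
    """List original items in a directory that have no metadata file beside them."""
    return sorted(
        item
        for item in directory.iterdir()
        if item.is_file()
        and item.suffix != extension  # skip the metadata files themselves
        and not sidecar_path(item, extension).exists()
    )
```

Running missing_sidecars on a dataset directory before processing is one way to find items whose extracted metadata file is absent or was saved in a different directory.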

Sample external metadata file processor

OpenText Core Data Discovery & Risk Insights includes a sample external metadata file processor, ExternalMetadata.ps1. The sample file includes configurations to read the metadata information from the external metadata files and provides guidance on how to capture external metadata dynamically. Modify the sample script to suit your needs, or create a new processor file using the sample as a base. The script adds the metadata to the index for the associated original documents.

When "External metadata capture" is enabled for a file system dataset, the OpenText Core Data Discovery & Risk Insights processing agent looks for and reads the external metadata file processor script and takes action accordingly.