Image Description

The image description analysis task uses an AI model to generate a textual description of an image or video frame.

IMPORTANT: Image description can take a significant amount of processing time when running on a CPU. With the pre-trained model ImageDescription_LLaVA-v1.6-Mistral-7B-HF.dat, producing a description for a single image takes several minutes.

IMPORTANT: Image description requires a significant amount of memory. OpenText recommends that your Media Server has at least 64 GB of RAM when you use the pre-trained model ImageDescription_LLaVA-v1.6-Mistral-7B-HF.dat.

Configuration Parameter    Description
GPUDeviceID                The device ID of the GPU to use.
Input                      The image track to process.
MaxOutputTokens            Limits the length of the text produced by the AI model.
Model                      The name of the model to use to generate image descriptions.
Region                     A region of the image or video to restrict analysis to.
SampleInterval             The interval at which frames are selected to be analyzed.
SyncDatabase               Specifies whether to synchronize with the training database before beginning the analysis task.
Type                       The analysis engine to use. Set this parameter to ImageDescription.
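
As an illustration of how these parameters fit together, the following is a minimal sketch of a task section in a Media Server session configuration. The section name (ImageDescriptionTask), the input track name (Image_1), and the MaxOutputTokens value are assumptions chosen for the example, and it is also an assumption that Model takes the model file name shown in the notes above. Check the parameter reference for the accepted values before using it.

```ini
// Hypothetical task section; the section name is arbitrary.
[ImageDescriptionTask]
// Selects the image description analysis engine (from the table above).
Type=ImageDescription
// Assumed name of the image track produced by the ingest engine.
Input=Image_1
// Assumed to be the file name of the pre-trained model.
Model=ImageDescription_LLaVA-v1.6-Mistral-7B-HF.dat
// Illustrative limit on the length of the generated description.
MaxOutputTokens=100
```

GPUDeviceID, Region, SampleInterval, and SyncDatabase are omitted here so that the task runs with whatever defaults the server applies.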

Output Tracks

Output track      Description
Data              Contains a record, with a textual description, for each processed image or video frame.
DataWithSource    The same as the Data track, but each record also includes the source frame.
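
A downstream engine would typically consume one of these tracks by naming it as its input. The fragment below is only a sketch under stated assumptions: it assumes the analysis task is configured in a section named ImageDescriptionTask, that tracks are referenced as TaskName.TrackName, and that an XML output engine with the Type and XMLOutputPath parameters shown is available. The output path is a placeholder.

```ini
// Hypothetical output engine section; the section name is arbitrary.
[WriteDescriptions]
// Assumed XML output engine type.
Type=XML
// Assumed track reference: task section name, then the output track name.
Input=ImageDescriptionTask.Data
// Placeholder output location.
XMLOutputPath=./output/descriptions.xml
```

Choosing Data rather than DataWithSource keeps the output small, because the records then carry only the text and not the source frame.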