When a text transcript corresponding to an audio file is available, transcript alignment can match the two files and generate the time positions of each word in a final transcript. The text transcript does not have to match the audio file exactly.
Transcript alignment involves the following steps:
The transcript alignment module compares the speech-to-text output and the normalized transcript text to generate the alignment and produce the time codes.
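The comparison described above can be illustrated with a minimal sketch. It is not the product's implementation: it assumes the speech-to-text output is available as a list of `(word, start_seconds, end_seconds)` tuples, uses a simple hypothetical `normalize` helper, and substitutes Python's standard `difflib.SequenceMatcher` for the alignment module. Transcript words that have no matching recognized word keep `None` time codes.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the transcript and strip punctuation (hypothetical normalization)."""
    words = []
    for token in text.lower().split():
        cleaned = "".join(ch for ch in token if ch.isalnum() or ch == "'")
        if cleaned:
            words.append(cleaned)
    return words


def align(transcript_text, recognized):
    """Copy time codes from recognized words onto matching transcript words.

    recognized: list of (word, start_seconds, end_seconds) tuples (assumed format).
    Returns a list of (transcript_word, start, end); unmatched words get None times.
    """
    transcript_words = normalize(transcript_text)
    recognized_words = [word.lower() for word, _, _ in recognized]

    # autojunk=False stops difflib from discarding very frequent words.
    matcher = SequenceMatcher(a=transcript_words, b=recognized_words, autojunk=False)

    aligned = [(word, None, None) for word in transcript_words]
    for block in matcher.get_matching_blocks():
        for offset in range(block.size):
            _, start, end = recognized[block.b + offset]
            index = block.a + offset
            aligned[index] = (transcript_words[index], start, end)
    return aligned


# Example: the recognizer misheard "world" as "word".
recognized = [
    ("hello", 0.0, 0.4), ("word", 0.5, 0.9), ("this", 1.0, 1.2),
    ("is", 1.2, 1.3), ("a", 1.3, 1.4), ("test", 1.4, 1.9),
]
for entry in align("Hello, world. This is a test.", recognized):
    print(entry)
```

Because the transcript does not have to match the audio exactly, the sketch leaves unmatched words without time codes rather than guessing; a fuller implementation might interpolate them from neighboring matches.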
You might need to configure speech-to-text for best results. Micro Focus recommends that you build a ‘transcript’ custom language model to use in speech-to-text (see Transcript Language Models).
Transcript alignment uses the same operation that is used to score speech-to-text results. The alignment function also reports the precision and recall rates for the alignment. Measuring transcription success rates in this way is consistent with standard industry practice for evaluating speech-to-text systems.
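As a rough illustration of the precision and recall rates, the sketch below uses a common word-level definition: precision is the fraction of recognized words that match the transcript, and recall is the fraction of transcript words that were matched. The exact definitions used by the alignment function are not stated here, so treat this as an assumption. The function name `precision_recall` is hypothetical, and `difflib.SequenceMatcher` is a heuristic matcher, not the product's scoring operation.

```python
from difflib import SequenceMatcher


def precision_recall(reference_words, hypothesis_words):
    """Word-level precision and recall of a hypothesis against a reference.

    reference_words: words from the normalized transcript.
    hypothesis_words: words from the speech-to-text output.
    """
    matcher = SequenceMatcher(a=reference_words, b=hypothesis_words, autojunk=False)
    # Total number of words that appear, in order, in both sequences.
    matched = sum(block.size for block in matcher.get_matching_blocks())

    precision = matched / len(hypothesis_words) if hypothesis_words else 0.0
    recall = matched / len(reference_words) if reference_words else 0.0
    return precision, recall


reference = "the cat sat on the mat".split()
hypothesis = "the cat sat on a mat today".split()
precision, recall = precision_recall(reference, hypothesis)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this example five words match in order, out of seven recognized words and six transcript words.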