Iterative Alignment

HPE recommends iterative alignment if the alignment quality is poor or if large sections of audio have not been aligned. This situation can arise when aligning very long audio. In iterative alignments, alignment occurs over two or more steps:

  1. Use TranscriptCheck to produce approximate timings for the transcript.

  2. (Optional) Perform alignment at the word level.

    1. Use the TranscriptAlign task with the MatchType configuration parameter set to words to align the audio. Retrieve the alignment output in .ctm file format.

    2. Convert the .ctm file to use it as the transcript text.

  3. Perform alignment at the prons level, using as the transcript either the approximate transcript time output file from TranscriptCheck or the modified output from the optional word-level alignment.

Converting the .ctm file involves normalization and ensuring that there is only one word on each line, for example:

Article
one
All
human
beings
are
born
free
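
As an illustration only, the following Python sketch extracts one word per line from .ctm-style text. It assumes whitespace-separated columns with the word in the fifth column (the common NIST-style layout of file, channel, start, duration, word), which might not match the layout your server produces; check your .ctm output and adjust word_column accordingly. The function name ctm_to_word_lines is hypothetical, and the sketch does not perform any text normalization.

```python
def ctm_to_word_lines(ctm_text, word_column=4):
    """Return the words from CTM-style text, one word per line.

    Assumption: columns are whitespace-separated and the word sits at
    zero-based index `word_column` (NIST-style: file, channel, start,
    duration, word, ...). Adjust for the actual .ctm layout.
    """
    words = []
    for line in ctm_text.splitlines():
        line = line.strip()
        # Skip blank lines and NIST-style comment lines.
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        # Skip lines that are too short to contain a word column.
        if len(fields) <= word_column:
            continue
        words.append(fields[word_column])
    return "\n".join(words)
```

For example, passing two lines whose fifth fields are Article and one returns those two words on separate lines.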

You can optionally follow words with a pair of numbers that specify the earliest start time and latest end time in seconds at which the word can appear in the aligned output, for example:

Article 0.000 1.000
one 0.000 1.000
All 0.000 1.000
human 0.500 1.500
beings 0.500 1.500
are 1.000 2.000
born 1.000 2.000
free 1.000 2.000

This example indicates that the word Article must appear between 0.000 and 1.000 seconds in the aligned output, human must appear between 0.500 and 1.500 seconds, and so on.

HPE IDOL Speech Server cannot perform this conversion automatically. When you convert the file, HPE recommends that you subtract a small amount of time from each word start position generated by the initial alignment, and add the same amount to each word end position. The wider window allows the second alignment stage to make small adjustments to the word start and end points.
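
The widening described above can be sketched in Python as follows. This is a minimal example, not part of the product: the function name pad_word_windows and the margin value are assumptions, and the input is assumed to be a list of (word, start, end) tuples in seconds that you have already read from the first alignment output.

```python
def pad_word_windows(words, margin=0.25):
    """Build transcript lines of the form 'word earliest latest'.

    `words` is a list of (word, start_seconds, end_seconds) tuples taken
    from the initial alignment. `margin` (an illustrative value) is
    subtracted from each start and added to each end; starts are clamped
    so that they never fall below zero.
    """
    lines = []
    for word, start, end in words:
        earliest = max(0.0, start - margin)
        latest = end + margin
        lines.append(f"{word} {earliest:.3f} {latest:.3f}")
    return "\n".join(lines)


aligned = [("Article", 0.2, 0.6), ("one", 0.6, 0.9)]
print(pad_word_windows(aligned))
# Article 0.000 0.850
# one 0.350 1.150
```

A larger margin gives the second alignment stage more freedom to move word boundaries; a smaller one keeps it closer to the first alignment.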

