Prepare the Transcription Data

Before you run the adaptation process, you must convert the verbatim transcription files into aligned transcription .ctm files. These files contain timestamps that mark the point in time at which each word occurs in the audio. File conversion in IDOL Speech Server is a four-step process:

  1. Normalize the transcription files (see Run Text Normalization).
  2. Create a language model based on the normalized transcription files (see Build the Language Model).
  3. Run speech-to-text on the audio data, using the language model created in the previous step to optimize performance (see Speech-to-Text).
  4. Run the scorer task on the text, using the speech-to-text output produced in the previous step (see Run the Scorer). This task produces a score file, which can reveal discrepancies between the transcript and what is actually said in the audio, and an aligned .ctm file. The .ctm file is used as input in the training process, as discussed below.
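
To illustrate what an aligned file gives you, the following minimal Python sketch reads word timings from CTM-style text. It is not part of IDOL Speech Server. It assumes the NIST-style column layout (file, channel, start time, duration, word, optional confidence), and the layout that Speech Server writes may differ, so check a real output file before relying on it. The names CtmWord, parse_ctm, and SAMPLE, and the sample data itself, are hypothetical.

```python
from typing import List, NamedTuple


class CtmWord(NamedTuple):
    """One aligned word (hypothetical helper type, not a Speech Server API)."""
    start: float     # seconds from the start of the audio
    duration: float  # seconds
    word: str


def parse_ctm(text: str) -> List[CtmWord]:
    """Parse CTM-style text into a list of aligned words.

    ASSUMPTION: NIST-style columns, in this order:
        file channel start duration word [confidence]
    Lines starting with ';;' are treated as comments.
    """
    words = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"line {line_number}: expected at least 5 fields")
        words.append(CtmWord(float(fields[2]), float(fields[3]), fields[4]))
    return words


# Made-up sample data for illustration only.
SAMPLE = """\
;; hypothetical aligned transcript
audio1 1 0.00 0.35 before
audio1 1 0.35 0.20 you
audio1 1 0.55 0.30 run
"""

if __name__ == "__main__":
    for entry in parse_ctm(SAMPLE):
        end = entry.start + entry.duration
        print(f"{entry.start:.2f}-{end:.2f}  {entry.word}")
```

A quick check like this can help confirm that every word in the transcript received a timestamp before you pass the .ctm file to the training process.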
