Overview

If a transcript is available for an audio recording, you can use the TranscriptAlign task to assign a time position to each word in the transcript. For example, you can use this task to align subtitles to audio or video files.

The transcript does not need to match the speech data exactly. The transcript aligner can tolerate a small number of errors in the transcript, as well as audio conditions that make recognition harder, such as background noise and music.

The transcript aligner can also place metadata tags in the transcript to allow you to easily identify sections. These metadata tags do not affect the alignment process. For more information, see Metadata Tag Syntax.

The alignment process works by using speech-to-text to identify words, sounds, or characters from the transcript within the audio, and assigning them a time location.
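
The idea of matching recognized words against the transcript can be illustrated with a minimal sketch. It is not the product's implementation. It assumes a hypothetical speech-to-text output in the form of (word, start, end) tuples and uses Python's standard difflib module to find the words that both sequences share. The function name align_transcript is invented for this illustration.

```python
from difflib import SequenceMatcher


def align_transcript(transcript_words, recognized):
    """Copy timings from recognized words onto matching transcript words.

    transcript_words: list of words from the transcript.
    recognized: list of (word, start_sec, end_sec) tuples. This format is
        an assumption for this sketch, not the product's output format.
    Returns a list of (transcript_word, (start_sec, end_sec) or None).
    """
    ref = [w.lower() for w in transcript_words]
    hyp = [w.lower() for w, _, _ in recognized]
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)

    times = [None] * len(transcript_words)
    # Each matching block is a run of identical words in both sequences.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = recognized[block.b + k]
            times[block.a + k] = (start, end)
    return list(zip(transcript_words, times))


recognized = [("the", 0.0, 0.2), ("quick", 0.2, 0.5),
              ("crown", 0.5, 0.9), ("fox", 0.9, 1.2)]
aligned = align_transcript(["The", "quick", "brown", "fox"], recognized)
# "brown" was recognized as "crown", so it receives no timing here;
# the other three words keep the timings of their recognized matches.
```

Words that the recognizer gets wrong stay untimed in this sketch. A real aligner would typically estimate their positions from the surrounding matched words.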

The accuracy of the speech-to-text process affects the accuracy of the final alignment. For best results, run speech-to-text using a custom language model built from the transcript text. The custom language model captures the words in the transcript text and makes them much more likely to appear in the speech-to-text output.
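
To show why a transcript-derived model makes transcript words more likely, the following toy sketch mixes a unigram model built from the transcript with a background model. This is an illustration only. Production language models are usually n-gram or neural models, and the function names, the background probabilities, and the 0.8 weight are all assumptions made for this example.

```python
from collections import Counter


def build_unigram_model(text):
    """Return word -> relative frequency for the given text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(transcript_lm, background_lm, weight=0.8):
    """Linearly mix two unigram models; weight favors the transcript."""
    vocab = set(transcript_lm) | set(background_lm)
    return {
        word: weight * transcript_lm.get(word, 0.0)
        + (1.0 - weight) * background_lm.get(word, 0.0)
        for word in vocab
    }


# Hypothetical background model in which "alignment" is fairly rare.
background = {"the": 0.5, "cat": 0.25, "alignment": 0.25}
transcript_lm = build_unigram_model("alignment alignment the")
mixed = interpolate(transcript_lm, background)
# "alignment" is now more probable than in the background model alone,
# while "cat", which is absent from the transcript, becomes less probable.
```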

The following diagram describes the transcript alignment workflow.

The transcript alignment process includes the following steps:

  1. Normalize the transcript, for example so that numbers written in numeric form can be identified. See Normalize the Transcript.
  2. Build a transcript language model with the normalized text. See Build the Transcript Language Model.
  3. Run the speech-to-text task using the custom language model from Step 2. See Run Speech-to-Text.
  4. If your audio and transcript are likely to contain misaligned sections, or if you have very large files and speed is important, run a check transcript task using the output from the speech-to-text task. This task provides rough time estimates and information on how well the audio matches the transcript. These time estimates make the alignment process faster and more accurate. See Check Transcript.
  5. Run the transcript alignment task using the information from the check transcript task in Step 4.
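
As a rough illustration of Steps 1 and 4, the sketch below spells out small numbers so that they can match spoken words, then computes a coarse similarity score between the normalized transcript and a recognized word sequence. The normalization rules and the scoring used by the actual tasks are not described in this section, so the functions normalize and match_ratio, and the limit of 0 to 999, are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n):
    """Spell out an integer from 0 to 999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("" if ones == 0 else " " + ONES[ones])
    hundreds, rest = divmod(n, 100)
    tail = "" if rest == 0 else " " + number_to_words(rest)
    return ONES[hundreds] + " hundred" + tail


def normalize(text):
    """Lowercase, spell out numbers below 1000, and strip punctuation."""
    def spell(match):
        value = int(match.group())
        # Larger numbers are left as digits in this sketch.
        return number_to_words(value) if value < 1000 else match.group()

    text = re.sub(r"\d+", spell, text.lower())
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return text.split()


def match_ratio(transcript_words, recognized_words):
    """Coarse 0..1 score of how well the two word sequences agree."""
    matcher = SequenceMatcher(a=transcript_words, b=recognized_words,
                              autojunk=False)
    return matcher.ratio()


normalized = normalize("Chapter 21: The 3 Bears!")
# normalized is ["chapter", "twenty", "one", "the", "three", "bears"]
score = match_ratio(normalized,
                    ["chapter", "twenty", "one", "the", "three", "pears"])
```

A low score for a section would suggest that the transcript and audio do not correspond there, which is the kind of information Step 4 describes.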
