Acoustic Adaptation Diagnostics

The AmTrain task can produce a diagnostics file that contains information about the word alignments that it uses to update the models. Any significant errors in these word alignments can severely impair the quality of the adaptation.

To return diagnostic information

You can open the diagnostics file directly, or you can issue the GetResults action with the Label parameter set to diag. The following is an example of the diagnostic content:

[ADAPTFILE] C:\data\test1.plh, beam: 5000
[ALIGN] 0 2.06 <s>
[ALIGN] 2.06 2.15 Okay
[ALIGN] 2.15 4.98 <s>
[ALIGN] 4.98 5.18 this
[ALIGN] 5.18 5.3 is
[ALIGN] 5.3 5.5 it
[ALIGN] 5.5 6.79 <s>
[ALIGN] 6.79 6.86 You
[ALIGN] 6.86 7.01 sure
[ALIGN] 7.01 7.11 you
[ALIGN] 7.11 7.28 wanna
[ALIGN] 7.28 7.4 do
[ALIGN] 7.4 7.7 this
[ALIGN] 7.7 8.97 <s>
...
[ALIGN] 112.3 112.97 Well
[ALIGN] 112.97 115.27 <s>
[ALIGN] 115.27 115.48 hello
[ALIGN] 115.48 116.97 <s>
[SUCCESS] - Adaptation pass succeeded, updating accumulates

This example shows the start and end time positions of each word, as estimated during the adaptation process. The diagnostics file also marks the start and end of processing for each adaptation file. In the example, the file was processed successfully. If processing fails, the diagnostics file reports the failure and indicates whether HPE IDOL Speech Server made a subsequent attempt with a higher pruning beam.
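If you want to inspect the diagnostics programmatically, the following is a minimal Python sketch that reads [ALIGN] lines in the format shown in the example above. It is not part of HPE IDOL Speech Server: the names parse_diagnostics and Alignment are hypothetical, the line layout is inferred only from the sample output, and treating <s> as a non-word marker is an assumption.

```python
import re
from typing import NamedTuple


class Alignment(NamedTuple):
    start: float  # start time position of the word
    end: float    # end time position of the word
    word: str     # word label as written in the diagnostics


# Layout inferred from the sample output: "[ALIGN] <start> <end> <word>"
ALIGN_RE = re.compile(r"^\[ALIGN\]\s+(\S+)\s+(\S+)\s+(.+)$")


def parse_diagnostics(text):
    """Return (alignments, succeeded) for one adaptation file's diagnostics."""
    alignments = []
    succeeded = False
    for line in text.splitlines():
        line = line.strip()
        match = ALIGN_RE.match(line)
        if match:
            start, end, word = match.groups()
            alignments.append(Alignment(float(start), float(end), word))
        elif line.startswith("[SUCCESS]"):
            succeeded = True
    return alignments, succeeded


sample = r"""[ADAPTFILE] C:\data\test1.plh, beam: 5000
[ALIGN] 0 2.06 <s>
[ALIGN] 2.06 2.15 Okay
[ALIGN] 2.15 4.98 <s>
[SUCCESS] - Adaptation pass succeeded, updating accumulates"""

alignments, ok = parse_diagnostics(sample)
# Assumption: "<s>" entries are not transcript words.
words = [a.word for a in alignments if a.word != "<s>"]
```

A parsed list like this makes it easier to spot suspicious alignments, such as words with very short or very long durations.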

Exclude Low-Quality Adaptation Data

Poor word alignments can occur for several reasons.

To improve the model, you can choose to exclude data in poorly aligned sections from the adaptation.

HPE IDOL Speech Server scores each word alignment to show how closely the word recognized in the audio resembles the word in the transcript. The score represents the differences between the two versions of the word: a score of 0 (zero) means that there were no differences, whereas a score of 4 means that the words differ by a large amount. You can instruct HPE IDOL Speech Server to classify poorly-aligned (high-scoring) words as 'junk'. The adaptation process uses the location of junk words in the data but ignores the words themselves.
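The thresholding idea can be illustrated with a short Python sketch. This is only an illustration of the behavior described above, not the server's implementation: the ScoredWord type, the label_junk function, the "<junk>" placeholder, and the scores in the sample data are all hypothetical.

```python
from typing import NamedTuple


class ScoredWord(NamedTuple):
    start: float
    end: float
    word: str
    score: float  # 0 = no differences; higher values = poorer alignment


JUNK_LABEL = "<junk>"  # hypothetical placeholder, not a Speech Server token


def label_junk(words, threshold):
    """Relabel words scoring above the threshold, keeping their time positions."""
    return [
        w._replace(word=JUNK_LABEL) if w.score > threshold else w
        for w in words
    ]


# Illustrative data only; the scores are invented for this sketch.
scored = [
    ScoredWord(4.98, 5.18, "this", 0.0),
    ScoredWord(5.18, 5.30, "is", 3.5),
    ScoredWord(5.30, 5.50, "it", 1.0),
]
labeled = label_junk(scored, threshold=2.0)
```

The relabeled word keeps its start and end positions, which mirrors the description that the location of a junk word is used while the word itself is ignored.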

Words can have a duration of zero seconds. This usually means that the word occurs in the user-supplied transcript but not the audio; however, it might be because of an error during alignment. If you believe the alignment might be compromised by some of the problems previously listed, you might choose to classify zero-duration words as 'junk'. If the words are likely to be in the audio, and HPE IDOL Speech Server has incorrectly assigned them an empty duration, you can keep these words to ensure proper internal alignment during adaptation. HPE IDOL Speech Server keeps zero-duration words by default.
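Before you decide how to treat zero-duration words, it can help to see how many there are. The following Python sketch flags them in a list of (start, end, word) tuples. The function name zero_duration_words, the tolerance value, and the sample data are assumptions made for illustration.

```python
def zero_duration_words(alignments, tolerance=1e-9):
    """Return the (start, end, word) entries whose duration is effectively zero."""
    return [
        (start, end, word)
        for (start, end, word) in alignments
        if abs(end - start) <= tolerance
    ]


# Illustrative data only; the zero-duration entry is invented for this sketch.
alignments = [
    (7.01, 7.11, "you"),
    (7.11, 7.11, "wanna"),
    (7.11, 7.40, "do"),
]
flagged = zero_duration_words(alignments)
```

If the flagged words are clearly audible in the recording, that suggests an alignment error rather than a transcript error, and keeping them might be the better choice.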

To exclude low-quality adaptation data, you can either modify the configuration file, or send additional parameters with the AddTask action.

To exclude low-quality data from the adaptation process (by modifying the configuration file)

  1. Open the HPE IDOL Speech Server tasks configuration file with a text editor.
  2. In the AmTrain task configuration section, add the following parameters to the amadaptadddata module:

    JunkEnabled         Set to True to label junk word alignments.
    JunkWordThresh      The alignment score threshold. Word alignments scoring above this value are labeled as junk.
    ZeroDurationWords   Whether to label zero-duration words as junk.
  3. Save and close the configuration file.
  4. Restart HPE IDOL Speech Server.

When you next perform the AmTrain task, HPE IDOL Speech Server identifies and labels junk word alignments and does not use them in the acoustic adaptation process.

To exclude low-quality data from the adaptation process (when sending the action)

