The AmTrain task can produce a diagnostics file that contains information about the word alignments that it uses to update the models. Any significant errors in these word alignments can severely impair the quality of the adaptation.
To return diagnostic information
When you send the AddTask action to run the task, include the Diag and DiagFile parameters. Set Diag to True and DiagFile to the name of the file to write the diagnostic information to.
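As a rough sketch, the request for this step could be assembled as follows. Only the AddTask action and the Diag and DiagFile parameters come from the step above; the host, port, file name, and the Type parameter used here to select the task are assumptions for illustration, and any other parameters that your AmTrain task requires are omitted.

```python
from urllib.parse import urlencode


def build_diag_request(host, port, diag_file):
    """Build an AddTask request URL that asks for a diagnostics file."""
    params = {
        "Type": "AmTrain",      # assumption: how the task is selected
        "Diag": "True",         # enable diagnostics (from the step above)
        "DiagFile": diag_file,  # file to write diagnostics to (from the step above)
    }
    # urlencode escapes any characters in the values that are unsafe in a URL.
    return f"http://{host}:{port}/action=AddTask&{urlencode(params)}"


# Hypothetical host, port, and file name.
url = build_diag_request("localhost", 13000, "amtrain_diag.txt")
print(url)
```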
You can open the diagnostics file directly, or you can issue the GetResults action with the Label parameter set to diag. The following is an example of the diagnostic content:
[ADAPTFILE] C:\data\test1.plh, beam: 5000
[ALIGN] 0 2.06 <s>
[ALIGN] 2.06 2.15 Okay
[ALIGN] 2.15 4.98 <s>
[ALIGN] 4.98 5.18 this
[ALIGN] 5.18 5.3 is
[ALIGN] 5.3 5.5 it
[ALIGN] 5.5 6.79 <s>
[ALIGN] 6.79 6.86 You
[ALIGN] 6.86 7.01 sure
[ALIGN] 7.01 7.11 you
[ALIGN] 7.11 7.28 wanna
[ALIGN] 7.28 7.4 do
[ALIGN] 7.4 7.7 this
[ALIGN] 7.7 8.97 <s>
...
[ALIGN] 112.3 112.97 Well
[ALIGN] 112.97 115.27 <s>
[ALIGN] 115.27 115.48 hello
[ALIGN] 115.48 116.97 <s>
[SUCCESS] - Adaptation pass succeeded, updating accumulates
This example shows the time positions (start and end) for each word as estimated during the adaptation process. It also marks the start and end of processing for each adaptation file. In the example, the file was processed successfully. If the process fails, the diagnostics file indicates this, along with whether IDOL Speech Server made a subsequent attempt at a higher pruning beam.
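If you want to inspect the alignments programmatically, a minimal parsing sketch might look like the following. It assumes that the file follows the layout of the example above (an [ALIGN] tag followed by a start time, an end time, and a single word); the function names are hypothetical, and the sketch checks only for the [SUCCESS] marker because the format of a failure entry is not shown here.

```python
import re

# One [ALIGN] entry: start time, end time, then a single whitespace-free word.
ALIGN_RE = re.compile(r"\[ALIGN\]\s+(\S+)\s+(\S+)\s+(\S+)")


def parse_alignments(text):
    """Return a (start, end, word) tuple for every [ALIGN] entry in text."""
    return [(float(s), float(e), w) for s, e, w in ALIGN_RE.findall(text)]


def pass_succeeded(text):
    """Return True if the diagnostics text contains a [SUCCESS] marker."""
    return "[SUCCESS]" in text


sample = (
    "[ADAPTFILE] C:\\data\\test1.plh, beam: 5000\n"
    "[ALIGN] 0 2.06 <s>\n"
    "[ALIGN] 2.06 2.15 Okay\n"
    "[ALIGN] 2.15 4.98 <s>\n"
    "[SUCCESS] - Adaptation pass succeeded, updating accumulates\n"
)

for start, end, word in parse_alignments(sample):
    print(f"{word}: {start} to {end} seconds")
```

Because the pattern does not depend on line breaks, it also works if the entries appear on a single line.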
Poor word alignments can occur for several reasons.
You can choose to exclude data from poorly-aligned sections from the adaptation, to improve the model.
IDOL Speech Server scores each word alignment to show how closely the word recognized in the audio resembles the word in the transcript. The score represents the differences between the two versions of the word: a score of 0 (zero) means that there were no differences, whereas a score of 4 means that the words differ by a large amount. You can instruct IDOL Speech Server to classify poorly-aligned (high-scoring) words as 'junk'. The adaptation process uses the location of junk words in the data but ignores the words themselves.
Words can have a duration of zero seconds. This usually means that the word occurs in the user-supplied transcript but not the audio; however, it might be because of an error during alignment. If you believe the alignment might be compromised by some of the problems previously listed, you might choose to classify zero-duration words as 'junk'. If the words are likely to be in the audio, and IDOL Speech Server has incorrectly assigned them an empty duration, you can keep these words to ensure proper internal alignment during adaptation. IDOL Speech Server keeps zero-duration words by default.
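The labeling rules described above can be illustrated with a small sketch. This is not IDOL Speech Server's implementation: the WordAlignment type, the is_junk helper, the example scores, and the threshold of 2.0 are all hypothetical, and the sketch assumes that a zero-duration word is one whose start and end times are equal.

```python
from dataclasses import dataclass


@dataclass
class WordAlignment:
    start: float
    end: float
    word: str
    score: float  # 0 = no difference from the transcript; higher = worse


def is_junk(alignment, junk_thresh, junk_zero_duration=False):
    """Apply the two rules: score above threshold, or (optionally) zero duration."""
    if junk_zero_duration and alignment.end == alignment.start:
        return True
    return alignment.score > junk_thresh


# Hypothetical alignments and scores, for illustration only.
words = [
    WordAlignment(4.98, 5.18, "this", 0.0),  # well aligned
    WordAlignment(5.18, 5.18, "is", 1.0),    # zero duration
    WordAlignment(5.3, 5.5, "it", 3.5),      # poorly aligned
]

keep_zero = [is_junk(w, junk_thresh=2.0) for w in words]
junk_zero = [is_junk(w, junk_thresh=2.0, junk_zero_duration=True) for w in words]
print(keep_zero, junk_zero)
```

With zero-duration words kept (the default behavior described above), only the high-scoring word is labeled as junk; with the zero-duration rule enabled, the second word is labeled as junk as well.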
To exclude low-quality adaptation data, you can either modify the configuration file, or send additional parameters with the AddTask action.
To exclude low-quality data from the adaptation process (by modifying the configuration file)
In the AmTrain task configuration section, add the following parameters to the amadaptadddata module:
JunkEnabled | Set to True to label junk word alignments.
JunkWordThresh | The alignment score threshold. Word alignments scoring above this value are labeled as junk.
ZeroDurationWords | Whether to label zero-duration words as junk.
When you next perform the AmTrain task, IDOL Speech Server identifies and labels junk word alignments and does not use them in the acoustic adaptation process.
To exclude low-quality data from the adaptation process (when sending the action)
When you send the AddTask action to run the task, include the following parameters:
Junk | Set to True to label junk word alignments.
JunkThresh | The alignment score threshold. IDOL Speech Server labels word alignments that score above this value as junk.
ZeroDurWords | Whether to label zero-duration words as junk.
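A request that carries these parameters might be assembled as in the following sketch. The Junk, JunkThresh, and ZeroDurWords names come from the table above; the host, port, threshold value, and the Type parameter used to select the task are assumptions for illustration, and the sketch assumes that ZeroDurWords takes a True or False value.

```python
from urllib.parse import urlencode


def build_junk_request(host, port, junk_thresh, junk_zero_duration):
    """Build an AddTask request URL that enables junk labeling."""
    params = {
        "Type": "AmTrain",                       # assumption: how the task is selected
        "Junk": "True",                          # label junk word alignments
        "JunkThresh": str(junk_thresh),          # scores above this are junk
        "ZeroDurWords": str(junk_zero_duration), # assumption: Boolean value
    }
    return f"http://{host}:{port}/action=AddTask&{urlencode(params)}"


# Hypothetical host, port, and threshold.
junk_url = build_junk_request("localhost", 13000, 2.5, True)
print(junk_url)
```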