Acoustic Models

Audio quality is affected by many factors, including the properties and position of the audio capture device, channel distortions (in the case of phone calls), and speaker population factors, such as dialect, accent, and timbre. When the recognized words poorly resemble the spoken words, it usually indicates that the acoustic models are a poor match for the audio data being processed.

NOTE:

In the 10.7 release of IDOL Speech Server, you could use acoustic adaptation to adapt the Gaussian Mixture Model (GMM) acoustic models to match an audio domain. To improve speech-to-text accuracy, IDOL Speech Server now includes Deep Neural Network (DNN) acoustic modeling. DNNs are not currently adaptable, but typically outperform even adapted GMM acoustic models. As a result, Micro Focus does not generally recommend acoustic adaptation. However, in certain scenarios (for example, in cases where the language packs do not have a DNN, or where you are working with a very specific domain and believe that DNN recognition could be improved upon), acoustic adaptation can still be useful. Use the following instructions to perform this process.

IDOL Speech Server provides tools for adapting acoustic models. However, adapting models is not easy and must be done with great care.

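To adapt acoustic models, you need audio data that represents the domain you want to process, and accurate text transcripts of that audio.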

Micro Focus recommends that you use 5-20 hours of audio data to adapt an acoustic model, although there is no upper limit to the amount of data you can use.
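Before you start, it can help to check how much audio you actually have against this recommendation. The following is a minimal sketch that assumes your adaptation audio is stored as uncompressed PCM WAV files in one directory tree. It uses only Python's standard `wave` module, which cannot read compressed formats. The helper names are hypothetical and are not part of IDOL Speech Server.

```python
import wave
from pathlib import Path

# Recommended range from the guidance above: 5-20 hours of audio.
MIN_HOURS = 5.0
MAX_RECOMMENDED_HOURS = 20.0


def wav_duration_seconds(path):
    """Return the duration of one PCM WAV file, in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_hours(directory):
    """Sum the durations of all *.wav files under a directory, in hours.

    Note: the glob pattern is case-sensitive on most file systems,
    so files named *.WAV are not counted.
    """
    seconds = sum(wav_duration_seconds(p) for p in Path(directory).rglob("*.wav"))
    return seconds / 3600.0


def check_adaptation_data(hours):
    """Classify an amount of audio against the 5-20 hour recommendation."""
    if hours < MIN_HOURS:
        return "below recommended minimum"
    if hours <= MAX_RECOMMENDED_HOURS:
        return "within recommended range"
    return "above range (allowed; no upper limit)"
```

For example, `check_adaptation_data(total_hours("/path/to/adaptation/audio"))` reports where a (hypothetical) audio directory falls. More data above 20 hours is not flagged as an error because the guidance sets no upper limit.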

You must prepare the transcript text according to the transcription guidelines (see Audio Transcript Requirements).

The underlying algorithm used in adapting acoustic models is iterative; as a result, adaptation requires multiple processing passes. The steps involved are:

  1. Create front-end feature vector data files.
  2. Normalize and prepare the transcript text files. You need to perform steps 1 and 2 only once.
  3. Run the adaptation algorithm and save the crunched data.
  4. Finalize the new adapted acoustic model.
  5. Measure the speech-to-text success rates (see Measure Speech-to-Text Success Rates).
  6. Repeat steps 3 and 4 as required, depending on the speech-to-text success rates.
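The iterative part of this process (steps 3 to 6) can be sketched as a driver loop that stops once the success rate stops improving. This is a minimal sketch under several assumptions. The three callables are hypothetical placeholders for the IDOL Speech Server operations that run an adaptation pass, finalize the model, and measure accuracy; they are not real IDOL APIs. Word error rate (WER) is used here as one common accuracy measure, which may differ from the measure described in Measure Speech-to-Text Success Rates. The `min_gain` stopping rule is also an illustrative choice. Steps 1 and 2 run only once, so they sit outside the loop.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / len(reference),
    computed with a word-level Levenshtein distance."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    n, m = len(reference), len(hypothesis)
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[m] / n


def adapt_until_converged(
    run_pass: Callable[[int], None],      # hypothetical: step 3, one adaptation pass
    finalize: Callable[[int], str],       # hypothetical: step 4, returns a model id
    measure_wer: Callable[[str], float],  # hypothetical: step 5, WER on test audio
    baseline_wer: float = float("inf"),   # WER of the unadapted model, if known
    max_passes: int = 10,
    min_gain: float = 0.001,
) -> Tuple[Optional[str], float, List[float]]:
    """Repeat adaptation passes (step 6) until WER improves by less than min_gain.

    Returns (best_model, best_wer, wer_history). best_model is None when no
    adapted model beat the baseline, meaning you keep the original model.
    """
    best_model: Optional[str] = None
    best_wer = baseline_wer
    history: List[float] = []
    for iteration in range(1, max_passes + 1):
        run_pass(iteration)
        model = finalize(iteration)
        wer = measure_wer(model)
        history.append(wer)
        gain = best_wer - wer
        if wer < best_wer:
            best_model, best_wer = model, wer
        if gain < min_gain:
            break  # little or no improvement: stop iterating
    return best_model, best_wer, history
```

Passing the WER of the unadapted model as `baseline_wer` makes the comparison explicit. Because DNN models often outperform adapted GMM models, it is worth confirming that adaptation actually helps before you deploy the result.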

Adapting the acoustic model can also improve other operations that involve acoustic models, such as phonetic phrase search.

