Language Models

IDOL Speech Server uses N-gram statistical language models when performing speech-to-text. An N-gram language model works out the probability of particular sub-sequences of words occurring in a longer word sequence (usually a sentence). The language model is estimated from text that is representative of the language as it is spoken.

As an example of how the language model works, consider the sentence “The color of the car is red”. For an N-gram model with n=3 (a trigram model), the sequence is split into every possible sub-sequence that contains up to three consecutive words. So for this sentence, the trigram fragments are:

“The”
“The color”
“The color of”
“color of the”
“of the car”
“the car is”
“car is red”
“is red.”

The probability of observing the entire sentence is computed from these fragments: the model multiplies together the probability of each word given the (up to) two words that precede it.
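The fragment list above can be generated mechanically. The following sketch produces the same sub-sequences for the example sentence; the function name is illustrative, not part of IDOL Speech Server:

```python
def trigram_fragments(words):
    """Return every sub-sequence of up to three consecutive words,
    mirroring the fragment list above. The sentence probability is
    then the product of P(last word of fragment | preceding words)."""
    fragments = []
    for i in range(len(words)):
        start = max(0, i - 2)  # at most the two preceding words
        fragments.append(words[start:i + 1])
    return fragments

words = "The color of the car is red".split()
for fragment in trigram_fragments(words):
    print(" ".join(fragment))
```

Running this prints the fragments in order, from “The” through “car is red”.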

IDOL Speech Server allows you to build new language models. However, the language models included in the language packs are trained with billions of text samples across a wide range of topics, and should be sufficient for most deployments.

NOTE:

Any language model has a finite vocabulary. As a result, some of the spoken words might not be in the vocabulary for the speech-to-text. For most deployments, the out-of-vocabulary rates are low enough not to cause any concern.

IDOL Speech Server lets you supplement the standard language model with smaller, focused language models that are customized for the specific speech-to-text task. Micro Focus recommends that you build a custom language model if you want to improve speech-to-text success rates.

The following scenarios might require a custom language model:

Measure the Effectiveness of Language Models

Before you build a custom language model, Micro Focus recommends that you work out whether the base language model provided with the language pack is sufficient for your purpose. There are two ways to measure effectiveness:

To measure the vocabulary coverage of a language model

  1. Prepare a list of the words that you want to verify are present in the language model.
  2. Normalize the list (see Run Text Normalization).
  3. Check that the words are in the vocabulary (see Look Up Vocabulary).
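The steps above can be sketched in a few lines. The normalization here (lowercasing and stripping punctuation) is a simplification of what the Run Text Normalization task performs, and the vocabulary is a toy set standing in for a language pack's real word list:

```python
import string

def normalize(word):
    """Simplified stand-in for IDOL's text normalization."""
    return word.strip(string.punctuation).lower()

def missing_words(word_list, vocabulary):
    """Return the normalized words that the vocabulary does not cover."""
    return sorted({normalize(w) for w in word_list} - vocabulary)

vocab = {"the", "color", "of", "car", "is", "red"}   # toy vocabulary
to_check = ["Color", "hatchback", "red,", "torque"]  # words to verify
print(missing_words(to_check, vocab))  # → ['hatchback', 'torque']
```

Words reported as missing are candidates for inclusion in a custom language model.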

To measure the perplexity of a language model

  1. Prepare text transcript files or text files that contain sentences from typical conversations that you expect in the audio data.
  2. Normalize the text files (see Run Text Normalization).
  3. Use the IDOL Speech Server tool for calculating perplexity, which also reports out-of-vocabulary words in the text (see Calculate Perplexity).

Perplexity values around or below 100 are typical and acceptable for call-center-style conversations. Aim for this value when you process telephone data.

Perplexity values around or below 250 are typical and acceptable for broad coverage content, such as news. Aim for this value when you process such audio data.
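To make the guideline values concrete, perplexity is the exponentiated average negative log probability per word: lower values mean the model finds the text more predictable. The sketch below shows the arithmetic only; the per-word probabilities are invented for illustration, whereas the Calculate Perplexity task derives them from a real language model:

```python
import math

def perplexity(word_probabilities):
    """Perplexity = exp(-(1/N) * sum(ln P(word_i | context_i)))."""
    n = len(word_probabilities)
    log_sum = sum(math.log(p) for p in word_probabilities)
    return math.exp(-log_sum / n)

# A model that assigns every word probability 0.01 yields a perplexity
# of about 100 — the guideline value for call-center conversations.
print(perplexity([0.01] * 7))  # ≈ 100
```

Intuitively, a perplexity of 100 means the model is, on average, as uncertain as if it were choosing uniformly among 100 equally likely next words.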

Source Text for Building Custom Language Models

To create an effective custom language model, you must use sample text that is strongly representative of the speech data that you want to process. For example, you could source text from:

You might have to clean up the text to meet the requirements for presentation to IDOL Speech Server:

Build Custom Language Models

You can build custom language models in IDOL Speech Server (see Create Custom Language Models). In general, the more text that you use to build a language model, the better the performance of that model. However, the use of inappropriate text can diminish performance.
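The core of building a language model is counting word sequences in training text and converting the counts to probabilities. The toy maximum-likelihood bigram trainer below illustrates the idea; the Create Custom Language Models task does this at much larger scale and applies smoothing for unseen word sequences, which this sketch omits:

```python
from collections import Counter

def train_bigram_model(sentences):
    """Estimate P(w2 | w1) = count(w1 w2) / count(w1) from raw text.
    Illustrative only: no smoothing, so unseen pairs get no probability."""
    bigrams, unigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

model = train_bigram_model(["the car is red", "the car is fast"])
print(model[("is", "red")])  # → 0.5, since "is" is followed by "red" half the time
```

This also shows why inappropriate training text harms performance: every sentence shifts the probability estimates toward its own word patterns.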

After you build a custom language model, you can check its suitability and performance. For instructions, see Measure the Effectiveness of Language Models.
