Overview

IDOL Speech Server requires language packs to perform speech processing tasks. Several language packs are available (see Supported Resources). A language pack contains a language model and an acoustic model. The key components of the language model are:

The language model covers a broad vocabulary, reflecting the general spoken language. However, you might want to process speech data that covers specialized topics, such as financial or medical topics. The standard language model might not cover such specialized vocabulary or sentence structures (relating to N-gram patterns). In such cases, you can build custom language models with specialized vocabulary for IDOL Speech Server to use when processing this audio.
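As a rough way to judge whether a standard vocabulary covers your specialized data, you can measure how many words in a sample of the text fall outside a known word list. The following Python sketch is illustrative only: the tokenizer, the oov_rate function, and the sample vocabulary are hypothetical and are not part of IDOL Speech Server.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def oov_rate(text, vocabulary):
    """Return the fraction of tokens in text that are missing from vocabulary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


# Hypothetical general-purpose word list and a specialized sentence.
general_vocabulary = {"the", "patient", "has"}
sample = "The patient has tachycardia"
print(oov_rate(sample, general_vocabulary))
```

A high out-of-vocabulary rate on representative text suggests that a custom language model might be worthwhile.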

When you build a language model, you can control:

You must also decide whether to treat the training text as a ‘closed’ set or an ‘open’ set. Most language models are built with the assumption that the training text is part of an ‘open’ set, meaning that it does not represent the entire set of sentences expected from the language. A closed set contains all the sentences that occur in the data to be processed. An example of a closed set of text is a transcript language model (see Transcript Language Models).

Building a new language model requires a lot of text, on the order of millions or billions of words. The standard language packs are usually built with many billions of words of text. Therefore, the best way to customize a language model is to build a small custom language model that uses the specialized text, and then combine it with the standard language model when you perform speech-to-text.
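One common way to combine two language models is linear interpolation of their probabilities. The following Python sketch illustrates that idea only. It assumes interpolation, which might not match how IDOL Speech Server combines models internally, and the function name, weight, and probability values are hypothetical.

```python
def interpolate(p_custom, p_standard, custom_weight):
    """Blend two probability estimates for the same word sequence.

    custom_weight is the share given to the custom model; the standard
    model receives the remainder.
    """
    if not 0.0 <= custom_weight <= 1.0:
        raise ValueError("custom_weight must be between 0 and 1")
    return custom_weight * p_custom + (1.0 - custom_weight) * p_standard


# Hypothetical probabilities of a specialized term under each model.
blended = interpolate(p_custom=0.02, p_standard=0.0001, custom_weight=0.3)
print(blended)
```

With a weighting like this, a term that is rare in general speech but frequent in the specialized text receives a higher combined probability, while the broad coverage of the standard model is retained.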

You can use an IDOL Content component as a source of text data when you build a language model, either in addition to, or instead of, local text data. The data in the IDOL Content component index must be appropriate for building the language model. After you retrieve the text, you can normalize it before you use it to build the language model.
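To show what normalizing retrieved text can involve, the following Python sketch strips markup, lowercases the text, and removes punctuation. It is a minimal illustration under assumed rules, not the normalization that IDOL Speech Server performs; the normalize_line function and the sample input are hypothetical.

```python
import re
import unicodedata


def normalize_line(line):
    """Return a lowercase, punctuation-free version of one line of text."""
    text = unicodedata.normalize("NFKC", line)   # unify Unicode variants
    text = re.sub(r"<[^>]+>", " ", text)         # drop markup tags
    text = text.lower()
    text = re.sub(r"[^\w\s']", " ", text)        # replace punctuation with spaces
    return re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace


# Hypothetical document content retrieved from an index.
print(normalize_line("<p>Q3 revenue rose 12%, the CFO said.</p>"))
```

Real normalization for speech usually also expands numbers and abbreviations into spoken forms, which this sketch does not attempt.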

For more details about the IDOL Content component configuration, refer to the IDOL Server Administration Guide.

