Speech-to-Text

IDOL Speech Server’s approach to speech-to-text is motivated by an information-breakdown view of speech. Most leading speech technologists use this state-of-the-art approach.

This approach means that the speech-to-text engine requires a language pack that contains:

The following diagram shows the inputs and resources that the speech-to-text engine receives.

Diagram labels:

Models of fundamental sound patterns
    Telephone models: 8 kHz
    Broadcast models: 16 kHz or above
Pronunciation dictionary with vocabulary
Base language models and customized models
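
The sampling-rate split between telephone and broadcast models can be illustrated with a small check on an audio file. The following is a minimal sketch in Python, not part of IDOL Speech Server: the function names and the "telephone"/"broadcast" labels are hypothetical. It also assumes that rates other than 8 kHz and 16 kHz or above are not covered, because the diagram names only those two cases.

```python
import wave

# Sampling rates taken from the diagram labels.
TELEPHONE_RATE_HZ = 8000
BROADCAST_MIN_RATE_HZ = 16000


def model_family_for_rate(sample_rate_hz: int) -> str:
    """Return a hypothetical model-family label for a sampling rate."""
    if sample_rate_hz == TELEPHONE_RATE_HZ:
        return "telephone"
    if sample_rate_hz >= BROADCAST_MIN_RATE_HZ:
        return "broadcast"
    # Assumption: the source names no model family for other rates.
    raise ValueError(f"no model family listed for {sample_rate_hz} Hz")


def model_family_for_wav(path: str) -> str:
    """Read the sampling rate from a WAV header and classify it."""
    with wave.open(path, "rb") as wav_file:
        return model_family_for_rate(wav_file.getframerate())
```

For example, `model_family_for_rate(8000)` returns `"telephone"` and `model_family_for_rate(44100)` returns `"broadcast"`. Audio at an unlisted rate would need resampling or a different model before recognition. How IDOL Speech Server itself handles that case is not stated here.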
