Micro Focus recommends the following machine specifications for a machine running a single IDOL Speech Server:

- Memory. Requirements depend on the number of language packs, the number of simultaneous decode tasks, and the operating system. Allow 600 MB for each language pack (shared across multiple channels).
- CPU. Micro Focus recommends that you assign each speech-to-text task to a single 2 GHz core.
IDOL Speech Server has the following memory requirements:
- Each language pack requires approximately 500 MB of RAM to load.
- Broadband and telephony versions of the same language pack count as separate language packs internally. Similarly, if you load the same language pack without a DNN, with a small DNN (for example, for real-time processing), and with the standard DNN, this counts as three separate language packs.
- Each IDOL Speech Server task requires additional memory; Micro Focus recommends the approximate values in the following table. Memory requirements for simultaneous tasks are cumulative: for example, two concurrent tasks that each use 250 MB require 500 MB in total. (A worked sizing example follows the table.)
Task type | Memory |
---|---|
Speech-to-text | 150 MB¹ |
Language model building | 300 MB² |
Acoustic model adaptation | 750 MB |
Speaker identification | 100 MB |
Transcript alignment | 250 MB |
Language identification | 250 MB |
Language identification training | 300 MB |
Language identification optimization | 10 MB |

¹ Stereo speech-to-text uses two stt modules and therefore uses twice the memory of a standard speech-to-text task.
² If the training texts contain unusually large vocabularies, language model building tasks might require more memory.
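
The following Python sketch illustrates the sizing arithmetic, using approximately 500 MB per loaded language pack and the per-task figures from the table above. It is an illustration only; the example workload (two language packs, four concurrent speech-to-text tasks) is hypothetical.

```python
# Rough memory estimate for a planned IDOL Speech Server workload, using the
# approximate per-task figures from the table above and about 500 MB for each
# loaded language pack. Values and the example workload are illustrative only.

LANGUAGE_PACK_MB = 500

TASK_MEMORY_MB = {
    "speech_to_text": 150,            # stereo speech-to-text uses 2 x 150 MB
    "language_model_building": 300,
    "acoustic_model_adaptation": 750,
    "speaker_identification": 100,
    "transcript_alignment": 250,
    "language_identification": 250,
    "language_identification_training": 300,
    "language_identification_optimization": 10,
}

def estimate_memory_mb(num_language_packs: int, concurrent_tasks: dict) -> int:
    """Return the approximate total (in MB) for loaded packs plus running tasks."""
    packs_mb = num_language_packs * LANGUAGE_PACK_MB
    tasks_mb = sum(TASK_MEMORY_MB[task] * count for task, count in concurrent_tasks.items())
    return packs_mb + tasks_mb

# Hypothetical example: two language packs (remember that DNN and non-DNN loads
# of the same pack count separately) and four concurrent speech-to-text tasks.
print(estimate_memory_mb(2, {"speech_to_text": 4}))  # 2*500 + 4*150 = 1600 MB
```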
To ensure smooth operation, you can use the speechserver.cfg configuration file to limit the number of concurrent actions on the server. Take similar care when you set the maximum number of language models that the server can load.
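
One way to choose those limits is to work backwards from the memory and cores available on the machine, as in the sketch below. This is an illustration under stated assumptions (the 1024 MB operating-system headroom is an assumption, not an IDOL figure); the actual limits are set with the relevant parameters in speechserver.cfg, so check the IDOL Speech Server Reference for the parameter names.

```python
# Illustrative sizing helper: given the machine's RAM and core count, estimate
# how many concurrent speech-to-text tasks fit alongside the loaded language
# packs. The 1024 MB OS headroom is an assumption; the other values come from
# the recommendations in this section.

LANGUAGE_PACK_MB = 600   # recommended allowance per language pack
STT_TASK_MB = 150        # approximate memory per speech-to-text task
OS_HEADROOM_MB = 1024    # assumed reserve for the OS and other processes

def max_concurrent_stt(total_ram_mb: int, num_language_packs: int, cpu_cores: int) -> int:
    """Estimate a safe ceiling for concurrent speech-to-text tasks."""
    free_mb = total_ram_mb - OS_HEADROOM_MB - num_language_packs * LANGUAGE_PACK_MB
    by_memory = max(free_mb // STT_TASK_MB, 0)
    # One 2 GHz core per speech-to-text task is recommended, so CPU is
    # usually the tighter constraint.
    return min(by_memory, cpu_cores)

# Hypothetical 16 GB, 8-core machine with three language packs loaded.
print(max_concurrent_stt(total_ram_mb=16384, num_language_packs=3, cpu_cores=8))  # -> 8
```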
IDOL Speech Server also has the following general requirements that you might need to take into account when you set up your system:

- For speech-to-text in live or relative mode, you can use Dynamic Neural Network (DNN) acoustic modelling only if your DNN files are smaller than a certain size. In addition, you must use Intel (or compatible) processors that support the SIMD extensions SSSE3 and SSE4.1. For more information, see Use Live Mode for Streaming.
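
If you are unsure whether a host's processor supports those extensions, a quick check is often enough. The following sketch assumes a Linux machine and reads the kernel's feature flags from /proc/cpuinfo; on other operating systems, use the CPU vendor's tools instead.

```python
# Quick Linux-only check for the SSSE3 and SSE4.1 CPU extensions that
# live/relative-mode DNN speech-to-text requires, by reading /proc/cpuinfo.

def cpu_flags() -> set:
    """Return the CPU feature flags reported by the Linux kernel."""
    with open("/proc/cpuinfo") as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

required = {"ssse3", "sse4_1"}  # flag names as the kernel reports them
missing = required - cpu_flags()
print("CPU supports SSSE3 and SSE4.1" if not missing
      else f"Missing CPU extensions: {sorted(missing)}")
```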