Run the Task

HPE IDOL Speech Server provides four preconfigured speech-to-text tasks:

You can set Punctuation to True in any of these tasks to perform speech-to-text that includes simple sentence-forming punctuation (for example, full stops and initial capital letters) in the .CTM file. The speech-to-text task estimates where each sentence starts and ends. These boundaries are a best guess only, so expect some of them to be misplaced.

NOTE:

The Punctuation parameter should be used only for languages that use the Latin alphabet.

You can use the SpeedBiasLevel parameter in any speech-to-text task to quickly set the balance between speed and accuracy in the decoder. By default, SpeedBiasLevel is set to 0, which leaves the underlying parameter settings untouched (that is, quick configuration of relevant parameters is disabled). To enable the speed configuration, set SpeedBiasLevel to a value between 1 (slowest) and 6 (fastest). The default speech-to-text parameters are equivalent to a speed bias of 2.

NOTE:

You can use the SpeedBiasLevel functionality only when the speech-to-text mode is fixed (see Control Speech-to-Text Process Speed), and with a DNN-based language resource.
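The value rules described above for Punctuation and SpeedBiasLevel can be sketched as a small client-side check. This is a minimal illustration only: the helper name optional_stt_params is hypothetical and not part of HPE IDOL Speech Server, and whether you supply these parameters in the action or in the task configuration depends on your setup.

```python
def optional_stt_params(punctuation=False, speed_bias_level=0):
    """Return optional speech-to-text parameters as a dict of strings.

    Hypothetical helper: it only checks the ranges described in this
    section and does not contact the server.
    """
    if isinstance(speed_bias_level, bool) or not isinstance(speed_bias_level, int):
        raise TypeError("speed_bias_level must be an integer")
    if not 0 <= speed_bias_level <= 6:
        raise ValueError(
            "speed_bias_level must be 0 (disabled) or 1 (slowest) to 6 (fastest)"
        )

    params = {}
    if punctuation:
        # Intended only for languages that use the Latin alphabet.
        params["Punctuation"] = "True"
    if speed_bias_level != 0:
        # 0 is the default and leaves the underlying settings untouched,
        # so the parameter is omitted in that case.
        params["SpeedBiasLevel"] = str(speed_bias_level)
    return params
```

For example, optional_stt_params(punctuation=True, speed_bias_level=4) produces a dictionary with Punctuation set to True and SpeedBiasLevel set to 4, while a level of 7 is rejected before any action is sent.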

You can also use the PunctuateCtm task to add punctuation to any .CTM file. For more information, see the HPE IDOL Speech Server Reference.

To run speech-to-text on an audio file

For example:

http://localhost:13000/action=AddTask&Type=WavToText&File=C:/myData/Speech.wav&Out=SpeechTranscript.ctm&Lang=ENUS

This action instructs the HPE IDOL Speech Server that is located on the local machine and listens on port 13000 to perform the WavToText task on the Speech.wav file and write the results to the SpeechTranscript.ctm file. The Lang value ENUS indicates that the Speech.wav file contains U.S. English speech.
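If you send the action from a script, you can assemble the same URL programmatically. The following is a minimal sketch that uses only the Python standard library. The function name build_action_url is hypothetical, and the host, port, and parameter values are those from the example above.

```python
from urllib.parse import urlencode


def build_action_url(action, params, host="localhost", port=13000):
    """Build an action URL in the form used by the example above.

    Hypothetical helper: the action and its parameters follow the port
    directly after a single slash, as in the documented example.
    """
    # safe=":/" keeps the drive letter colon and the path slashes readable;
    # other reserved characters are percent-encoded.
    query = urlencode({"action": action, **params}, safe=":/")
    return f"http://{host}:{port}/{query}"


url = build_action_url(
    "AddTask",
    {
        "Type": "WavToText",
        "File": "C:/myData/Speech.wav",
        "Out": "SpeechTranscript.ctm",
        "Lang": "ENUS",
    },
)
print(url)
```

To send the request, you can pass the resulting URL to urllib.request.urlopen or to any other HTTP client. The response contains the token that the action returns.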

If you are using a lattice file and want to reduce the lattice output size by including only one sample of each word in a specific window size, you can also set the LatWinSize parameter. See Use a Lattice File and the HPE IDOL Speech Server Reference for more information.

This action returns a token. You can use the token to:

Speech-to-text is very CPU-intensive. When you use HPE IDOL Speech Server to process multiple data streams or files at the same time, the server might not have enough CPU or memory to process all of them at once. To check whether a server has sufficient resources to run a WavToText task, send a CheckResources action. See Check Available Resources.

