IDOL Speech Server can analyze streamed audio data, in addition to audio files.
The exact task names and action parameters to use depend on the configuration in the IDOL Speech Server tasks configuration file.
Streamed audio must conform to a required format. For more information, see Streamed Audio.
Send an AddTask action with the Type parameter set to SpeechToText, and the InputType parameter set to Stream. For example:
http://localhost:13000/action=AddTask&Type=SpeechToText&Lang=ENUK&Out=Transcript1.ctm&InputType=Stream
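A minimal Python sketch of this request follows. It builds the AddTask URL with the standard library and submits it. The function names are hypothetical, and the token parsing assumes that the response is XML containing a token element; check the actual response from your server before relying on it.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def add_task_url(host: str, port: int, lang: str, out: str,
                 task_type: str = "SpeechToText") -> str:
    """Build an AddTask URL that requests a task with streamed input."""
    params = {"Type": task_type, "Lang": lang, "Out": out, "InputType": "Stream"}
    return f"http://{host}:{port}/action=AddTask&" + urllib.parse.urlencode(params)


def parse_token(response_xml: bytes) -> str:
    """Extract the task token from an AddTask response.

    Assumption: the token is the text of the first <token> element in the XML.
    """
    element = ET.fromstring(response_xml).find(".//token")
    if element is None or not element.text:
        raise ValueError("no token found in AddTask response")
    return element.text.strip()


def submit_add_task(url: str, timeout: float = 30.0) -> str:
    """Send the AddTask request and return the token for the new task."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_token(response.read())
```

For example, submit_add_task(add_task_url("localhost", 13000, "ENUK", "Transcript1.ctm")) reproduces the request above and returns the token that the stream header needs.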
After this request, the server initializes the resources required for the task. When the server is ready to process the audio, the status of the task changes to WAITING_CONNECTIONS.
Wait until the WAITING_CONNECTIONS status appears before you begin audio streaming.
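This section does not say how to read the task status, so the following Python sketch leaves that to a caller-supplied function (for example, one that queries the server for the token returned by AddTask). It polls until the status is WAITING_CONNECTIONS or a deadline passes. The function name and the default interval and deadline are illustrative assumptions.

```python
import time
from typing import Callable


def wait_for_connections(get_status: Callable[[], str],
                         poll_interval: float = 1.0,
                         max_wait: float = 300.0,
                         sleep: Callable[[float], None] = time.sleep,
                         clock: Callable[[], float] = time.monotonic) -> None:
    """Poll get_status() until it returns WAITING_CONNECTIONS.

    Raises TimeoutError if the status has not appeared within max_wait seconds.
    """
    deadline = clock() + max_wait
    while True:
        status = get_status()
        if status == "WAITING_CONNECTIONS":
            return  # the task is ready; start streaming now
        if clock() >= deadline:
            raise TimeoutError(
                f"task not ready after {max_wait} s (last status: {status!r})")
        sleep(poll_interval)
```

The sleep and clock parameters default to the real time functions; they are exposed only so that the loop can be exercised without waiting.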
Connect to the binary data port specified in the BinaryDataPort parameter in the [Server] section of the configuration file.
The action ends if no connection is made within 30 minutes of the task becoming ready for connections, or if no data is received for more than 30 seconds after the connection is made.
To configure how long IDOL Speech Server waits for data before it times out, set the StreamReadWarning and StreamReadTimeout parameters. For more information about these parameters, see the IDOL Speech Server Reference.
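The following Python sketch connects to the binary data port and sends the stream header followed by the audio in fixed-size chunks, so that writes follow each other closely rather than leaving long gaps. It assumes that header already holds the binary stream header for the task and that audio is already in the required sample format. The function name, the chunk size, and the optional real-time pacing are illustrative assumptions, not server requirements stated in this section.

```python
import socket
import time


def stream_audio(host: str, port: int, header: bytes, audio: bytes,
                 sample_rate: int, channels: int = 1, chunk_ms: int = 100,
                 realtime: bool = False,
                 connect=socket.create_connection, sleep=time.sleep) -> int:
    """Send the stream header, then the audio, over one TCP connection.

    Returns the number of audio bytes sent (excluding the header).
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    # 2 bytes per sample per channel; chunks end on whole sample frames.
    frame_bytes = 2 * channels
    frames_per_chunk = max(1, sample_rate * chunk_ms // 1000)
    chunk_bytes = frames_per_chunk * frame_bytes
    sent = 0
    with connect((host, port)) as sock:
        sock.sendall(header)  # the header must come first on the connection
        for start in range(0, len(audio), chunk_bytes):
            chunk = audio[start:start + chunk_bytes]
            sock.sendall(chunk)
            sent += len(chunk)
            if realtime:
                # Optional pacing, e.g. when relaying a live source (assumption).
                sleep(chunk_ms / 1000)
    return sent
```

Here, port is the value of BinaryDataPort from the [Server] section, not the ACI port used for AddTask.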
Send the audio stream over TCP. Start the stream with a header, which IDOL Speech Server uses to associate the audio stream with a particular task. Send all multiple-byte integers in little-endian format.
The TCP stream header must have the following binary format.
| Number of bytes | Content |
|---|---|
| Variable | The token returned by the AddTask action, in ASCII. It must be NULL terminated. |
| 4 | The fixed integer constant 0x4D525453. |
| 4 | The stream version number (currently 1). |
| 4 | The stream sampling rate in Hz (8000 or 16000). |
| 4 | The number of channels (1 or 2). |
| Remaining | Audio data samples as 2-byte, little-endian integers. |
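The header layout in this table can be built with Python's struct module, as in the following sketch. The function names are hypothetical, and it assumes that the samples are signed 16-bit PCM values; for two-channel audio, check the Streamed Audio format requirements for how samples are ordered. Sent in little-endian order, the constant 0x4D525453 appears on the wire as the ASCII bytes STRM.

```python
import struct

MAGIC = 0x4D525453  # fixed constant from the header table


def build_stream_header(token: str, sample_rate: int, channels: int,
                        version: int = 1) -> bytes:
    """Build the binary header that precedes the audio samples."""
    if sample_rate not in (8000, 16000):
        raise ValueError("sample rate must be 8000 or 16000 Hz")
    if channels not in (1, 2):
        raise ValueError("channels must be 1 or 2")
    # Variable-length part: the AddTask token in ASCII, NULL terminated.
    token_bytes = token.encode("ascii") + b"\x00"
    # Fixed part: four 4-byte little-endian integers.
    fields = struct.pack("<IIII", MAGIC, version, sample_rate, channels)
    return token_bytes + fields


def pack_samples(samples) -> bytes:
    """Pack samples as 2-byte little-endian signed integers (signedness assumed)."""
    return struct.pack(f"<{len(samples)}h", *samples)
```

The bytes to send after connecting to the binary data port are build_stream_header(token, rate, channels) followed by the packed samples.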
If you use a lattice file and want to reduce the size of the lattice output by including only one sample of each word within a specified window size, you can also set the LatWinSize parameter. For more information, see Use a Lattice File and the IDOL Speech Server Reference.
To filter out any sections of the resulting .CTM file that are categorized as music, use the SpeechToTextFilter task instead of the SpeechToText task. This task combines the SpeechToText task with the SpeechSILClassification audio preprocessing task in a single step.