Run Speech-to-Text on Live Audio

HPE IDOL Speech Server can analyze streamed audio data, in addition to audio files.

NOTE:

The exact task names and action parameters to use depend on the configuration in the HPE IDOL Speech Server tasks configuration file.

Streamed audio must conform to a required format. For more information, see Streamed Audio.

To run speech-to-text on an audio stream

  1. Send an AddTask action with the Type parameter set to StreamToText. For example:

    http://localhost:13000/action=AddTask&Type=StreamToText&Lang=ENUK&Out=Transcript1.ctm

    After this request, the server initializes the resources required for the task. When the server is ready to process the audio, the status of the task changes to WAITING_CONNECTIONS.

    CAUTION:

    Wait until the WAITING_CONNECTIONS status appears before you begin audio streaming.

  2. Connect to the binary data port specified in the BinaryDataPort parameter in the [Server] section of the configuration file.

    The task ends if no connection is received within 30 minutes of the task becoming ready for connections, or if no data is received for more than 30 seconds after the connection is made.

    NOTE:

    To configure the length of time that HPE IDOL Speech Server waits for data before it times out, set the StreamReadWarning and StreamReadTimeout parameters. For more information about these parameters, see the HPE IDOL Speech Server Reference.

  3. Send the audio stream using TCP. The stream must begin with a header, which HPE IDOL Speech Server uses to associate the audio stream with a particular task. All multibyte integers must be sent in little-endian format.

    The TCP stream header must have the following binary format.

    Number of Bytes   Content
    ---------------   -------
    Variable          The token returned by the AddTask action (in ASCII). This must be NULL terminated.
    4                 The fixed integer constant 0x4D525453.
    4                 The stream version number (currently 1).
    4                 The stream sampling rate in Hz.
    4                 The number of channels (1 or 2).
    Remaining         Audio data samples as 2 byte, little-endian integers.

  4. Speech-to-text finishes when the connection closes.
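
The header layout and streaming steps above can be sketched in Python as follows. This is a minimal illustration, not a reference client: the function names, the 16000 Hz default sampling rate, the chunk size, and the raw PCM input file are assumptions made for the example. The integers are packed as unsigned 32-bit values, which is also an assumption, although for the non-negative values used here the bytes are the same as for signed values.

```python
import socket
import struct

STREAM_MAGIC = 0x4D525453   # fixed integer constant from the header table
STREAM_VERSION = 1          # stream version number (currently 1)


def build_stream_header(token: str, sample_rate_hz: int, channels: int) -> bytes:
    """Pack the stream header: NULL-terminated ASCII token, then four
    4-byte little-endian integers (constant, version, rate, channels)."""
    if channels not in (1, 2):
        raise ValueError("channels must be 1 or 2")
    fields = struct.pack("<4I", STREAM_MAGIC, STREAM_VERSION, sample_rate_hz, channels)
    return token.encode("ascii") + b"\x00" + fields


def stream_pcm(host, port, token, pcm_path,
               sample_rate_hz=16000, channels=1, chunk_size=4096):
    """Send the header, then raw 2-byte little-endian samples from pcm_path.

    port is the BinaryDataPort value from the [Server] configuration section.
    Leaving the 'with' block closes the connection, which ends speech-to-text.
    """
    with socket.create_connection((host, port)) as sock, open(pcm_path, "rb") as audio:
        sock.sendall(build_stream_header(token, sample_rate_hz, channels))
        while chunk := audio.read(chunk_size):
            sock.sendall(chunk)
```

Call stream_pcm only after the task status has changed to WAITING_CONNECTIONS, and pass the token that the AddTask action returned. The input file is assumed to contain headerless sample data; a WAV file would first need its own file header removed.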

If you use a lattice file, you can reduce the size of the lattice output by setting the LatWinSize parameter, so that the output includes only one sample of each word within a specified window size. For more information, see Use a Lattice File and the HPE IDOL Speech Server Reference.

Remove Areas Categorized as Music or Noise

To filter out areas of the resulting .CTM file that are categorized as music, use the StreamToTextMusicFilter task instead of the StreamToText task. This task combines the StreamToText task with the SpeechSILClassification audio preprocessing task in a single step.
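
As a rough sketch, the AddTask request for this task can be built the same way as the StreamToText example in the procedure above, changing only the Type value. The helper name below is hypothetical, and the example assumes that StreamToTextMusicFilter accepts the same Lang and Out parameters as StreamToText; check the tasks configuration file for the parameters that apply to your installation.

```python
from urllib.parse import urlencode


def build_add_task_url(task_type, host="localhost", port=13000, **params):
    """Build an AddTask action URL in the same form as the documented example."""
    query = urlencode({"Type": task_type, **params})
    return f"http://{host}:{port}/action=AddTask&{query}"


# Lang and Out are carried over from the StreamToText example (assumption).
url = build_add_task_url("StreamToTextMusicFilter", Lang="ENUK", Out="Transcript1.ctm")
print(url)
```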

