SpeechToTextTelephony

The SpeechToTextTelephony task converts a telephony audio file or stream into a text transcript. In addition to transcribing speech, the task recognizes and transcribes dial tones, including DTMF tones. Sections of the audio that the task identifies as music are filtered out and omitted from the transcript.

Parameters

| Parameter | Description | Required |
| --- | --- | --- |
| Type | The task name. Specify SpeechToTextTelephony. | Yes |
| AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
| ClassWordFile | The path to a list of new words and weightings to add to the language model at load time. | |
| Conf | Whether to generate word confidence scores. | |
| CustomLm | The custom language model to use. | |
| Diag | Whether to generate diagnostic information. | |
| DiagFile | The alignment diagnostics file to generate. | |
| DnnFile | The DNN file to use. | |
| DnnScale | The DNN output acoustic score scaling factor. | |
| DoDialTones | The type of dial tone to identify. | |
| EndTime | The end of an audio section to process. | |
| File | The audio file to process. | Yes, if InputType is File. |
| ForceRecompoundOff | Whether to prevent recompounding. | |
| ForceRecompoundOn | Whether to force recompounding. | |
| FrameDupl | An integer value which allows for greater time efficiency with only a minimal loss of recognition accuracy. | |
| InputType | The type of audio to process (file, binary data, or stream). | |
| Lang | The language pack to use. | Yes |
| LatFile | The name of the lattice file that contains word hypotheses. | |
| LatScale | The depth of the lattice. | |
| LatWinSize | The size (in seconds) of the lattice output window. | |
| LatWordFile | A list of words to find. | |
| Mode | The algorithm mode for the speech-to-text process. | |
| ModeValue | The value of the parameter associated with the speech-to-text algorithm mode. | |
| Out | The file to write the transcription to. | Yes |
| PronFile | A file to use to either replace or add alternative pronunciations of words at language load time. | |
| Punctuation | Whether to add punctuation to the word data. | |
| SilThresh | The threshold between what the module identifies as silence and non-silence. | |
| SpeechBias | Whether to bias towards speech (rather than music, noise, or silence) in the identification of audio segments. | |
| SpeedBiasLevel | The balance between speed and accuracy in the decoder. | |
| StartTime | The beginning of an audio section to process. | |
| SugdInputChannels | The channel layout of the input media file. This parameter does not apply to input streams. | |
| SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply to input streams. | |
| WordBar | Switches on word barring. | |
| WordBarList | The location of a list of words to be barred. | |
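
The Required column above can be checked before a request is sent. The following Python sketch is illustrative only and is not part of IDOL Speech Server: the helper name missing_parameters is hypothetical, parameter names are compared case-insensitively as an assumption, and an omitted InputType is assumed to behave like File.

```python
# Parameters marked "Yes" in the Required column of the table above.
ALWAYS_REQUIRED = ("Type", "Lang", "Out")


def missing_parameters(params):
    """Return the names of required parameters that are absent or empty.

    Hypothetical client-side helper; the server performs its own checks.
    """
    # Assumption: parameter names are matched case-insensitively.
    present = {name.lower(): value for name, value in params.items()}
    missing = [name for name in ALWAYS_REQUIRED if not present.get(name.lower())]

    # File is required when InputType is File.
    # Assumption: an omitted InputType is treated the same as "File".
    input_type = str(present.get("inputtype") or "File")
    if input_type.lower() == "file" and not present.get("file"):
        missing.append("File")
    return missing
```

For example, calling missing_parameters with only Type and Lang set reports that Out and File still need to be supplied.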

Example

http://localhost:13000/action=AddTask&Type=SpeechToTextTelephony&File=C:/myData/tel.wav&Out=TelTranscript.ctm&Lang=ENUS

This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to perform the SpeechToTextTelephony task on the tel.wav file and write the results to the TelTranscript.ctm file. The tel.wav file contains U.S. English dialect speech.
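
An action URL of the form shown above can also be assembled programmatically. The following Python sketch only builds the URL string and does not contact a server. The function name build_add_task_url is hypothetical, and percent-encoding the parameter values (while leaving "/" and ":" intact so that file paths stay readable) is an assumption about what the server accepts.

```python
from urllib.parse import quote


def build_add_task_url(host, port, **params):
    """Assemble an AddTask action URL in the same form as the example above.

    Hypothetical helper: it only formats a string and sends nothing.
    """
    parts = ["action=AddTask"]
    for name, value in params.items():
        # Assumption: values are percent-encoded, keeping "/" and ":" as-is.
        encoded = quote(str(value), safe="/:")
        parts.append(name + "=" + encoded)
    return "http://{}:{}/{}".format(host, port, "&".join(parts))


url = build_add_task_url(
    "localhost",
    13000,
    Type="SpeechToTextTelephony",
    File="C:/myData/tel.wav",
    Out="TelTranscript.ctm",
    Lang="ENUS",
)
print(url)
```

Keyword arguments keep their order, so the parameters appear in the URL in the order in which they are passed.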

