The SpeechToTextTelephony task converts a telephony audio file or stream into a text transcript. In addition to transcribing speech, the task recognizes and transcribes dial tones, including DTMF tones. Sections of the audio that the task identifies as music are filtered out and excluded from the transcript.
Parameter | Description | Required |
---|---|---|
Type | The task name. Specify SpeechToTextTelephony. | Yes |
AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
AudioUpsampling | Whether to allow audio upsampling if the input audio has a sample rate too low for the task. | |
ClassWordFile | The path to a list of new words and weightings to add to the language model at load time. | |
Conf | Whether to generate word confidence scores. | |
CustomLm | The custom language model to use. | |
Diag | Whether to generate diagnostic information. | |
DiagFile | The alignment diagnostics file to generate. | |
DnnFile | The DNN file to use. | |
DnnScale | The DNN output acoustic score scaling factor. | |
DoDialTones | Whether to include dial tones in addition to DTMF tones, if tone detection is enabled. | |
EndTime | The end of an audio section to process. | |
File | The audio file to process. | Yes, if InputType is File. |
ForceRecompoundOff | Whether to prevent recompounding. | |
ForceRecompoundOn | Whether to force recompounding. | |
FrameDupl | An integer value that improves processing speed with only a minimal loss of recognition accuracy. | |
InputType | The type of audio to process (file, binary data, or stream). | |
Lang | The language pack to use. | Yes |
LatFile | The name of the lattice file that contains word hypotheses. | |
LatScale | The depth of the lattice. | |
LatWinSize | The size (in seconds) of the lattice output window. | |
LatWordFile | A list of words to find. | |
Mode | The algorithm mode for the speech-to-text process. | |
ModeValue | The value of the parameter associated with the speech-to-text algorithm mode. | |
Out | The file to write the transcription to. | Yes |
PronFile | A file to use to either replace or add alternative pronunciations of words at language load time. | |
Punctuation | Whether to add punctuation to the word data. | |
SilThresh | The threshold between what the module identifies as silence and non-silence. | |
SpeechBias | Whether to bias towards speech (rather than music, noise, or silence) in the identification of audio segments. | |
SpeedBiasLevel | The balance between speed and accuracy in the decoder. | |
StartTime | The beginning of an audio section to process. | |
SugdInputChannels | The channel layout of the input media file. This parameter does not apply to input streams. | |
SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply to input streams. | |
WordBar | Whether to enable word barring. | |
WordBarList | The location of a list of words to be barred. | |
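The Required column reduces to a small rule set: Type, Lang, and Out are always required, and File is required when InputType is File. The following is a minimal client-side sketch of that rule set. The function name `missing_required` is hypothetical, and the sketch assumes that an omitted InputType behaves like File (as the example action, which sets File but not InputType, suggests) and that InputType values can be compared case-insensitively; neither assumption is stated in the table.

```python
# Hypothetical pre-flight check for SpeechToTextTelephony parameters,
# transcribed from the "Required" column of the parameter table.
REQUIRED_ALWAYS = ("Type", "Lang", "Out")


def missing_required(params: dict[str, str]) -> list[str]:
    """Return the names of required parameters that are absent or empty."""
    missing = [name for name in REQUIRED_ALWAYS if not params.get(name)]
    # ASSUMPTION: an omitted InputType is treated as File, and the value
    # is compared case-insensitively. Neither is confirmed by the table.
    input_type = params.get("InputType", "File")
    if input_type.lower() == "file" and not params.get("File"):
        missing.append("File")
    return missing
```

For the example action, `missing_required` returns an empty list; with Lang and File both omitted it returns `["Lang", "File"]`. The server performs its own validation, so this check only catches obvious omissions before a request is sent.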

Example:

http://localhost:15000/action=AddTask&Type=SpeechToTextTelephony&File=C:/myData/tel.wav&Out=TelTranscript.ctm&Lang=ENUS

This action performs the SpeechToTextTelephony task on the tel.wav file and writes the results to the TelTranscript.ctm file. The tel.wav file contains speech in the U.S. English dialect, so the action sets Lang to ENUS.
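When the action is generated from a script, the parameters can be URL-encoded instead of being concatenated by hand. The following is a minimal sketch that uses only the Python standard library. The helper name `build_add_task_url` is hypothetical. The host, port, and parameter values come from the example above, and the `/action=...` path form mirrors the example URL. The sketch only builds the string and does not contact a server.

```python
from urllib.parse import quote, urlencode


def build_add_task_url(host: str, port: int, params: dict[str, str]) -> str:
    """Build an AddTask action URL in the same form as the documented example."""
    # Keep ":" and "/" unescaped so Windows paths such as C:/myData/tel.wav
    # stay readable; quote (not quote_plus) encodes spaces as %20.
    query = urlencode({"action": "AddTask", **params}, safe=":/", quote_via=quote)
    return f"http://{host}:{port}/{query}"


url = build_add_task_url(
    "localhost",
    15000,
    {
        "Type": "SpeechToTextTelephony",
        "File": "C:/myData/tel.wav",
        "Out": "TelTranscript.ctm",
        "Lang": "ENUS",  # U.S. English language pack, as in the example
    },
)
print(url)
```

Because dictionaries preserve insertion order, the printed URL matches the example action exactly. To submit it to a running server, pass it to a standard HTTP client such as `urllib.request.urlopen`.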