The SpeechToText task converts an audio file or stream into a text transcript.
| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Specify SpeechToText. | Yes |
| AudioUpsampling | Whether to allow audio upsampling if the input audio has a sample rate too low for the task. | |
| ClassWordFile | The path to a list of new words and weightings to add to the language model at load time. | |
| Conf | Whether to generate word confidence scores. | |
| CustomLm | The custom language model to use. | |
| Diag | Whether to generate diagnostic information. | |
| DiagFile | The alignment diagnostics file to generate. | |
| DnnFile | The DNN file to use. | |
| DnnScale | The DNN output acoustic score scaling factor. | |
| EndTime | The end of an audio section to process. | |
| File | The audio file to process. | Yes, if InputType is File. |
| ForceRecompoundOff | Whether to prevent recompounding. | |
| ForceRecompoundOn | Whether to force recompounding. | |
| FrameDupl | An integer value which allows for greater time efficiency with only a minimal loss of recognition accuracy. | |
| InputType | The type of audio to process (file, binary data, or stream). | |
| Lang | The language pack to use. | Yes |
| LatFile | The name of the lattice file that contains word hypotheses. | |
| LatScale | The depth of the lattice. | |
| LatWinSize | The size (in seconds) of the lattice output window. | |
| LatWordFile | A list of words to find. | |
| Mode | The algorithm mode for the speech-to-text process. | |
| ModeValue | The value of the parameter associated with the speech-to-text algorithm mode. | |
| Out | The file to write the transcription to. | Yes |
| PronFile | A file to use to either replace or add alternative pronunciations of words at language load time. | |
| Punctuation | Whether to add punctuation to the word data. | |
| SpeedBiasLevel | The balance between speed and accuracy in the decoder. | |
| StartTime | The beginning of an audio section to process. | |
| SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
| SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
| WordBar | Switches on word barring. | |
| WordBarList | The location of a list of words to be barred. | |

http://localhost:15000/action=AddTask&Type=SpeechToText&File=C:/myData/Speech.wav&Out=SpeechTranscript.ctm&Lang=ENUS

This action performs the SpeechToText task on the Speech.wav file and writes the results to the SpeechTranscript.ctm file. The Speech.wav file contains U.S. English speech.

http://localhost:15000/action=AddTask&Type=StreamToText&Lang=ENUK&Out=Transcript1.ctm&InputType=Stream

This action transcribes the audio stream using the ENUK language pack and writes the results to the Transcript1.ctm file.
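
The example actions above share one shape: the AddTask action followed by the task parameters from the table. As a minimal sketch, the following Python snippet checks the parameters that the table marks as required and then assembles a URL of that shape. The helper names `missing_required` and `build_add_task_url` are hypothetical and are not part of the server. The snippet also assumes that a request without InputType is treated as file input, which the first example suggests but the table does not state.

```python
from urllib.parse import urlencode

# Parameters the table marks as always required.
ALWAYS_REQUIRED = ("Type", "Lang", "Out")


def missing_required(params):
    """Return the required parameter names that are absent from params."""
    missing = [name for name in ALWAYS_REQUIRED if name not in params]
    # Assumption: a request without InputType is treated as file input,
    # as in the first example above, so File is expected in that case too.
    input_type = params.get("InputType", "File")
    if input_type.lower() == "file" and "File" not in params:
        missing.append("File")
    return missing


def build_add_task_url(params, host="localhost", port=15000):
    """Build an AddTask URL in the same shape as the examples above."""
    missing = missing_required(params)
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    # Keep ':' and '/' unescaped so Windows-style paths stay readable.
    query = urlencode(params, safe=":/")
    return f"http://{host}:{port}/action=AddTask&{query}"


file_task = {
    "Type": "SpeechToText",
    "File": "C:/myData/Speech.wav",
    "Out": "SpeechTranscript.ctm",
    "Lang": "ENUS",
}
print(build_add_task_url(file_task))
```

The snippet only builds the URL string and does not send the request. Optional parameters from the table, such as StartTime or Punctuation, can be added to the dictionary in the same way.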