The SpeechSilClassification task segments an audio file or stream by content, classifying each segment as speech, non-speech, or music.
Parameter | Description | Required |
---|---|---|
Type | The task name. Set to SpeechSilClassification. | Yes |
AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
EndTime | The end of an audio section to process. | |
File | The audio file to process. | Yes, if InputType is File. |
InputType | The type of audio to process (file, binary data, or stream). | |
MaxSegSize | The maximum audio class size in frames. | |
MinSegSize | The minimum segment size in frames. | |
Out | The file to write the results to. | Yes |
Sfreq | The sample frequency of the audio file to process. | |
SilThresh | The threshold between what the task identifies as silence and non-silence. | |
SpeechBias | Whether to bias towards speech (rather than music, noise, or silence) in the identification of audio segments. | |
SpeechThresh | The threshold between speech and non-speech (music or noise). | |
StartTime | The beginning of an audio section to process. | |
SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
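
The Required column above can be checked before a task is submitted. The following Python snippet is a minimal sketch of such a check, not part of IDOL Speech Server: the helper name `missing_required` is hypothetical, and the comparison assumes that the InputType value is written exactly as `File`. It makes no assumption about the default InputType when the parameter is omitted.

```python
# Hypothetical helper for illustration; it is not an IDOL Speech Server API.
def missing_required(params):
    """Return the names of required parameters that are absent or empty."""
    # Type and Out are marked "Yes" in the Required column.
    required = ["Type", "Out"]
    # File is required only when InputType is File.
    if params.get("InputType") == "File":
        required.append("File")
    return [name for name in required if not params.get(name)]


task = {
    "Type": "SpeechSilClassification",
    "InputType": "File",
    "Out": "ConfClassification.ctm",
}
print(missing_required(task))  # prints ['File']
```

Adding a `File` entry to the dictionary makes the function return an empty list.
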

http://localhost:13000/action=AddTask&Type=SpeechSilClassification&File=C:\Data\Conference.wav&Out=ConfClassification.ctm

This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to segment and classify the Conference.wav file and to write the results to the ConfClassification.ctm file.
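
An action of this form can also be assembled programmatically. The Python snippet below is a sketch under stated assumptions: the function name `build_add_task_url` is hypothetical, and it percent-encodes parameter values (so the backslashes and colon in the Windows path become `%5C` and `%3A`), which assumes that the server decodes standard URL encoding. It only builds the URL string and does not send the request.

```python
from urllib.parse import quote


# Hypothetical helper for illustration; it is not an IDOL Speech Server API.
def build_add_task_url(host, port, params):
    """Build an AddTask action URL from a dictionary of task parameters."""
    # Percent-encode every value so that characters such as '\' and ':'
    # are safe to place in a URL (assumption: the server decodes them).
    query = "&".join(
        f"{name}={quote(str(value), safe='')}" for name, value in params.items()
    )
    return f"http://{host}:{port}/action=AddTask&{query}"


url = build_add_task_url(
    "localhost",
    13000,
    {
        "Type": "SpeechSilClassification",
        "File": r"C:\Data\Conference.wav",
        "Out": "ConfClassification.ctm",
    },
)
print(url)
# prints http://localhost:13000/action=AddTask&Type=SpeechSilClassification&File=C%3A%5CData%5CConference.wav&Out=ConfClassification.ctm
```
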