SpeechSilClassification

The SpeechSilClassification task segments an audio file or stream by content and classifies each segment as speech, non-speech, or music.

Parameters

| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Set to SpeechSilClassification. | Yes |
| AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
| AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
| EndTime | The end of an audio section to process. | |
| File | The audio file to process. | Yes, if InputType is File. |
| InputType | The type of audio to process (file, binary data, or stream). | |
| MaxSegSize | The maximum audio class size in frames. | |
| MinSegSize | The minimum segment size in frames. | |
| Out | The file to write the results to. | Yes |
| Sfreq | The sample frequency of the audio file to process. | |
| SilThresh | The threshold between what the task identifies as silence and non-silence. | |
| SpeechBias | Whether to bias towards speech (rather than music, noise, or silence) in the identification of audio segments. | |
| SpeechThresh | The threshold between speech and non-speech (music or noise). | |
| StartTime | The beginning of an audio section to process. | |
| SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
| SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
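
The Required column can be checked on the client side before a task is submitted. The following is a minimal sketch, not part of the product: the helper name missing_required is hypothetical, and the case-insensitive comparison of the InputType value is an assumption. A request that omits InputType is not checked for File, because the parameter list does not state the default input type.

```python
def missing_required(params):
    """Return the names of required parameters that are absent from params."""
    # Type and Out are marked "Yes" in the Required column.
    missing = [name for name in ("Type", "Out") if not params.get(name)]

    # File is required only if InputType is File. Comparing the value
    # case-insensitively is an assumption of this sketch.
    input_type = str(params.get("InputType", "")).lower()
    if input_type == "file" and not params.get("File"):
        missing.append("File")
    return missing


if __name__ == "__main__":
    request = {"Type": "SpeechSilClassification", "InputType": "File"}
    print(missing_required(request))
```

An empty list means that every parameter marked as required is present; it does not validate the parameter values themselves.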

Example

http://localhost:13000/action=AddTask&Type=SpeechSilClassification&File=C:\Data\Conference.wav&Out=ConfClassification.ctm

This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to segment and classify the Conference.wav file and to write the results to the ConfClassification.ctm file.
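
The same action URL can be assembled programmatically. The sketch below is illustrative only: build_add_task_url is a hypothetical helper, and percent-encoding the parameter values is an assumption made here to keep characters such as the colon and backslash in a Windows path from being read as URL syntax (the example above sends the path unencoded).

```python
from urllib.parse import quote


def build_add_task_url(host, port, task_params):
    """Return an AddTask action URL for the given task parameters."""
    pairs = ["action=AddTask"]
    for name, value in task_params.items():
        # Percent-encode every reserved character in the value
        # (assumption: the server decodes percent-encoded values).
        pairs.append(f"{name}={quote(str(value), safe='')}")
    return f"http://{host}:{port}/" + "&".join(pairs)


url = build_add_task_url(
    "localhost",
    13000,
    {
        "Type": "SpeechSilClassification",
        "File": r"C:\Data\Conference.wav",
        "Out": "ConfClassification.ctm",
    },
)
print(url)
```

The resulting string can then be sent with any HTTP client, for example urllib.request.urlopen(url) from the Python standard library, provided that the server is running on the stated host and port.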

