SpeechSilClassification

The SpeechSilClassification task segments an audio file or stream by content, classifying each segment as speech, non-speech, or music.

Parameters

| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Set to SpeechSilClassification. | Yes |
| AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
| AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
| AudioUpsampling | Whether to allow audio upsampling if the input audio has a sample rate too low for the task. | |
| DoDialTones | Whether to include dial tones in addition to DTMF tones, if tone detection is enabled. | |
| DoToneClass | Whether to perform DTMF and dial tone identification. | |
| EndTime | The end of an audio section to process. | |
| File | The audio file to process. | Yes, if InputType is File. |
| InputType | The type of audio to process (file, binary data, or stream). | |
| MaxSegSize | The maximum audio class size in frames. | |
| MinSegSize | The minimum segment size in frames. | |
| Out | The file to write the results to. | Yes |
| Sfreq | The sample frequency of the audio file to process. | |
| SilThresh | The threshold between what the task identifies as silence and non-silence. | |
| SpeechBias | Whether to bias towards speech (rather than music, noise, or silence) in the identification of audio segments. | |
| SpeechThresh | The threshold between speech and non-speech (music or noise). | |
| StartTime | The beginning of an audio section to process. | |
| SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
| SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |

Example

http://localhost:15000/action=AddTask&Type=SpeechSilClassification&File=C:\Data\Conference.wav&Out=ConfClassification.ctm

This action segments and classifies the Conference.wav file and writes the results to the ConfClassification.ctm file.
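
The same request can be assembled programmatically. The following is a minimal Python sketch, not part of the product: the helper name build_add_task_url is hypothetical, and percent-encoding the parameter values (for example, the colon and backslashes in a Windows path) is an assumption about what the server accepts. The sketch only builds the URL string and does not send it.

```python
from urllib.parse import quote, urlencode


def build_add_task_url(out, file=None, host="localhost", port=15000, **extra):
    """Return an AddTask URL for the SpeechSilClassification task.

    Hypothetical helper: host, port, and the percent-encoding of values
    are assumptions, not documented server behavior.
    """
    params = {"Type": "SpeechSilClassification"}
    if file is not None:
        # File is required only when InputType is File.
        params["File"] = file
    params["Out"] = out
    # Optional task parameters, for example StartTime or EndTime.
    params.update(extra)
    # quote (rather than quote_plus) encodes spaces as %20 instead of "+".
    query = urlencode(params, quote_via=quote)
    return f"http://{host}:{port}/action=AddTask&{query}"


url = build_add_task_url("ConfClassification.ctm", file=r"C:\Data\Conference.wav")
print(url)
```

Optional parameters from the table can be passed as keyword arguments, for example build_add_task_url("out.ctm", file="a.wav", DoToneClass="True"); the value format shown here is illustrative only.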

