SpeechSilClassification

The SpeechSilClassification task segments an audio file by content and classifies each segment as speech, non-speech, or music.

Parameters

| Parameter | Description | Required |
|-----------|-------------|----------|
| Type | The task name. Set to SpeechSilClassification. | Yes |
| AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
| EndTime | The end of an audio section to process. | |
| File | The audio file to process. | Yes |
| FrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
| MaxSegSize | The maximum audio class size in frames. | |
| MinSegSize | The minimum segment size in frames. | |
| Out | The file to write the results to. | Yes |
| Sfreq | The sample frequency of the audio file to process. | |
| SilThresh | The threshold between what the task identifies as silence and non-silence. | |
| SpeechBias | Whether to bias towards speech (rather than music, noise, or silence) in the identification of audio segments. | |
| SpeechThresh | The threshold between speech and non-speech (music or noise). | |
| StartTime | The beginning of an audio section to process. | |
| SugdInputChannels | The channel layout of the input media file. | |
| SugdInputFrequency | The sampling rate of the input media file. | |
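Because only Type, File, and Out are marked as required, a client script might check for them before submitting a task. The following is a minimal sketch of such a check. The helper name missing_required and the dictionary layout are assumptions made for illustration and are not part of the server API.

```python
# Parameters marked "Yes" in the Required column of the table above.
REQUIRED = ("Type", "File", "Out")


def missing_required(params):
    """Return the required parameter names that are absent or empty in params."""
    return [name for name in REQUIRED if not params.get(name)]


# Hypothetical parameter set that omits the Out parameter.
params = {
    "Type": "SpeechSilClassification",
    "File": r"C:\Data\Conference.wav",
}
print(missing_required(params))  # ['Out']
```

An empty list means that every required parameter has a value. The sketch does not check the values themselves.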

Example

http://localhost:13000/action=AddTask&Type=SpeechSilClassification&File=C:\Data\Conference.wav&Out=ConfClassification.ctm

This action uses port 13000 to instruct HPE IDOL Speech Server, which is located on the local machine, to segment and classify the Conference.wav file and to write the results to the ConfClassification.ctm file.
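The same action can also be assembled in a script. The following is a minimal sketch in Python that builds the URL shown above from its parts. The function name build_add_task_url is hypothetical, and the sketch assumes that the server accepts percent-encoded parameter values (so that characters such as the backslashes in a Windows path are sent safely).

```python
from urllib.parse import quote


def build_add_task_url(host, port, file, out, **optional):
    """Assemble an AddTask URL for the SpeechSilClassification task.

    Any extra keyword arguments are appended as additional task parameters.
    """
    params = {"Type": "SpeechSilClassification", "File": file, "Out": out}
    params.update(optional)
    # Percent-encode each value (assumption: the server decodes these).
    query = "&".join(
        f"{name}={quote(str(value), safe='')}" for name, value in params.items()
    )
    return f"http://{host}:{port}/action=AddTask&{query}"


url = build_add_task_url(
    "localhost", 13000, r"C:\Data\Conference.wav", "ConfClassification.ctm"
)
print(url)
```

The sketch only constructs the URL. Sending it, and handling the response, depends on your HTTP client and is not shown.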

