ClusterSpeech

The ClusterSpeech task clusters wide-band speech into speaker segments. For example, if the task identifies two speaker clusters, it labels them Cluster_0 and Cluster_1.

Parameters

| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Set to ClusterSpeech. | Yes |
| AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
| AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
| EndTime | The end of an audio section to process. | |
| File | The input audio file. | Yes, if InputType is File. |
| FixTime | A fixed size for speaker clusters. | |
| InputType | The type of audio to process (file, binary data, or stream). | |
| Lang | The name of a language pack. | Yes |
| MaxNumSpeakers | The final maximum number of speakers to produce. | |
| MergeThresh | The threshold below which to merge clusters. | |
| MinNumSpeakers | The final minimum number of speakers to produce. | |
| Out | The file that IDOL Speech Server writes task output to. | |
| SilThresh | The threshold between what the task identifies as silence and non-silence. | |
| SpeechThresh | The threshold between speech and non-speech (music or noise). | |
| StartTime | The beginning of an audio section to process. | |
| SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
| SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
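
The Required column above can be checked on the client side before a task is submitted. The following Python sketch is illustrative only and is not part of IDOL Speech Server. The helper name missing_required is hypothetical. The sketch also makes two assumptions: that parameter names are matched case-insensitively (the example action on this page writes lang and out in lowercase), and that an omitted InputType means File (the example action supplies File without InputType).

```python
def missing_required(params, default_input_type="file"):
    """Return the required ClusterSpeech parameters that are absent from params.

    Hypothetical helper. Assumes case-insensitive parameter names and that
    an omitted InputType behaves like File; neither is confirmed here.
    """
    # Ignore parameters that are present but empty.
    normalized = {
        name.lower(): value
        for name, value in params.items()
        if value not in (None, "")
    }

    # Type and Lang are always required.
    missing = [name for name in ("Type", "Lang") if name.lower() not in normalized]

    # File is required only when InputType is File.
    input_type = str(normalized.get("inputtype", default_input_type)).lower()
    if input_type == "file" and "file" not in normalized:
        missing.append("File")

    return missing


print(missing_required({"Type": "ClusterSpeech", "File": "wide.wav", "lang": "ENUK"}))
print(missing_required({"Type": "ClusterSpeech"}))
```

The first call reports nothing missing. The second reports Lang and File, because no language pack or input file was given.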

Example

http://localhost:15000/a=AddTask&Type=ClusterSpeech&File=wide.wav&lang=ENUK&out=outWide

This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to cluster the data in the wide.wav wide-band audio file into speaker segments, and to write the results to the outWide output file.
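
To send the same action from a script, the URL can be assembled from its parts. The Python sketch below only builds the string shown in the example. The function name build_add_task_url is hypothetical, and percent-encoding of parameter values is an assumption about how special characters in file names should be passed.

```python
from urllib.parse import quote


def build_add_task_url(host, port, **params):
    """Build an AddTask action URL in the form used by the example above.

    Hypothetical helper: parameters are emitted in the order given, and
    values are percent-encoded (an assumption, not documented here).
    """
    query = "&".join(
        f"{name}={quote(str(value), safe='')}" for name, value in params.items()
    )
    return f"http://{host}:{port}/a=AddTask&{query}"


url = build_add_task_url(
    "localhost",
    15000,
    Type="ClusterSpeech",
    File="wide.wav",
    lang="ENUK",
    out="outWide",
)
print(url)
```

The resulting string can be passed to any HTTP client, for example urllib.request.urlopen from the Python standard library. The format of the server response is not described in this section, so the sketch does not parse it.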

