ClusterSpeechTel

The ClusterSpeechTel task clusters telephony speech into speaker segments. Each cluster receives a numbered label: for example, if the task identifies two speaker clusters, it labels them Cluster_0 and Cluster_1.
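To show how the Cluster_N labels might be used after processing, the following Python sketch totals the speaking time for each cluster. It assumes the segments have already been read from the results into (label, start_ms, end_ms) tuples. That tuple layout, the millisecond units, and the function name speaking_time_per_cluster are illustrative assumptions. They are not the documented format of the task's output file.

```python
from collections import defaultdict


def speaking_time_per_cluster(segments):
    """Sum the duration of all segments that share a cluster label.

    segments: iterable of (label, start_ms, end_ms) tuples, for example
    ("Cluster_0", 0, 1500). This layout is a hypothetical representation,
    not the format that ClusterSpeechTel writes to its Out file.
    """
    totals = defaultdict(int)
    for label, start_ms, end_ms in segments:
        if end_ms < start_ms:
            raise ValueError(f"segment ends before it starts: {label} {start_ms}-{end_ms}")
        totals[label] += end_ms - start_ms
    return dict(totals)


# Hypothetical segments for a two-speaker call.
segments = [
    ("Cluster_0", 0, 4200),
    ("Cluster_1", 4200, 9000),
    ("Cluster_0", 9000, 12500),
]
print(speaking_time_per_cluster(segments))
# {'Cluster_0': 7700, 'Cluster_1': 4800}
```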

Parameters

| Parameter | Description | Required |
| --- | --- | --- |
| Type | The task name. Set to ClusterSpeechTel. | Yes |
| AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
| AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
| AudioUpsampling | Whether to allow audio upsampling if the input audio has a sample rate too low for the task. | |
| EndTime | The end of an audio section to process. | |
| File | The input audio file. | Yes, if InputType is File. |
| FixTime | A fixed size for speaker clusters. | |
| InputType | The type of audio to process (file, binary data, or stream). | |
| Lang | The name of a language pack. | Yes |
| MaxNumSpeakers | The final maximum number of speakers to produce. | |
| MergeThresh | The threshold below which to merge clusters. | |
| MinNumSpeakers | The final minimum number of speakers to produce. | |
| Out | The file to write the results to. | |
| SilThresh | The threshold between what the task identifies as silence and non-silence. | |
| SpeechThresh | The threshold between speech and non-speech (music or noise). | |
| StartTime | The beginning of an audio section to process. | |
| SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
| SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
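The following Python sketch builds an AddTask action URL for this task and checks only the rules stated in the parameter table: Type is ClusterSpeechTel, Lang is always required, File is required for file input, and SugdInputChannels and SugdInputFrequency do not apply to stream input. It does not validate parameter values. The function name is hypothetical. Treating a missing InputType as File, and comparing InputType values case-insensitively, are assumptions based on the example request, which sets File but not InputType.

```python
from urllib.parse import urlencode


def build_cluster_speech_tel_url(host, port, **params):
    """Build a ClusterSpeechTel AddTask URL in the form used by the example request.

    Assumption: a missing InputType is treated as File, because the example
    request supplies File without setting InputType.
    """
    params = {"Type": "ClusterSpeechTel", **params}
    if params["Type"] != "ClusterSpeechTel":
        raise ValueError("Type must be ClusterSpeechTel")
    if "Lang" not in params:
        raise ValueError("Lang is required")

    input_type = str(params.get("InputType", "File")).lower()
    if input_type == "file" and "File" not in params:
        raise ValueError("File is required when InputType is File")
    if input_type == "stream":
        for name in ("SugdInputChannels", "SugdInputFrequency"):
            if name in params:
                raise ValueError(f"{name} does not apply when InputType is Stream")

    # Percent-encode each value; the action name comes first, as in the example.
    query = urlencode({"action": "AddTask", **params})
    return f"http://{host}:{port}/{query}"


# Rebuilds the example request shown in this section.
url = build_cluster_speech_tel_url(
    "localhost", 15000, File="1h.wav", Lang="ENUK-tel", Out="outTel"
)
print(url)
# http://localhost:15000/action=AddTask&Type=ClusterSpeechTel&File=1h.wav&Lang=ENUK-tel&Out=outTel
```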

Example

http://localhost:15000/action=AddTask&Type=ClusterSpeechTel&File=1h.wav&Lang=ENUK-tel&Out=outTel

This action uses the ENUK-tel language pack to cluster the speech in the 1h.wav telephony audio file into speaker segments, and writes the results to the outTel output file.
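To send the example request from a script instead of a browser, the following Python sketch issues the action URL as an HTTP GET with the standard library. The function name submit_action is hypothetical. Decoding the response body as UTF-8 is an assumption. The sketch returns the raw response text and does not interpret it. The opener parameter lets you test the function without a running server.

```python
import urllib.request


def submit_action(url, opener=urllib.request.urlopen, timeout=30):
    """Send an action URL as an HTTP GET and return the response body as text.

    Assumption: the server responds with UTF-8 encoded text. The opener
    argument defaults to urllib.request.urlopen and can be replaced by a
    stand-in when no server is available.
    """
    with opener(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Usage against a running server (not executed here):
# body = submit_action(
#     "http://localhost:15000/action=AddTask&Type=ClusterSpeechTel"
#     "&File=1h.wav&Lang=ENUK-tel&Out=outTel"
# )
```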

