The ClusterSpeechTel task clusters telephony speech into speaker segments. For example, if two speaker clusters are identified, the output labels are Cluster_0 and Cluster_1 respectively.
Parameter | Description | Required |
---|---|---|
Type | The task name. Set to ClusterSpeechTel. | Yes |
AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
EndTime | The end of an audio section to process. | |
File | The input audio file. | Yes, if InputType is File. |
FixTime | A fixed size for speaker clusters. | |
InputType | The type of audio to process (file, binary data, or stream). | |
Lang | The name of a language pack. | Yes |
MaxNumSpeakers | The final maximum number of speakers to produce. | |
MergeThresh | The threshold below which to merge clusters. | |
MinNumSpeakers | The final minimum number of speakers to produce. | |
Out | The file that IDOL Speech Server writes task output to. | |
SilThresh | The threshold between what the task identifies as silence and non-silence. | |
SpeechThresh | The threshold between speech and non-speech (music or noise). | |
StartTime | The beginning of an audio section to process. | |
SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
http://localhost:13000/action=AddTask&Type=ClusterSpeechTel&File=1h.wav&Lang=ENUK-tel&Out=outTel
This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to cluster the data in the 1h.wav telephony audio file into speaker segments, and to write the results to the outTel output file.
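The example action can also be assembled programmatically. The following is a minimal Python sketch, not part of IDOL Speech Server: the helper name `build_add_task_url` is hypothetical, and the check on Lang only reflects the Required column in the table above. It builds the action URL as a string and does not send it.

```python
from urllib.parse import urlencode


def build_add_task_url(host, port, **params):
    """Build an AddTask action URL for the ClusterSpeechTel task.

    Hypothetical helper for illustration only. Parameter names are passed
    through unchanged, so they must match the table above (File, Lang, Out, ...).
    """
    # Lang is marked as required in the parameter table.
    if "Lang" not in params:
        raise ValueError("Lang is required for ClusterSpeechTel")
    # Type always comes first and is fixed to the task name.
    query = urlencode({"Type": "ClusterSpeechTel", **params})
    return f"http://{host}:{port}/action=AddTask&{query}"


url = build_add_task_url(
    "localhost", 13000, File="1h.wav", Lang="ENUK-tel", Out="outTel"
)
print(url)
```

With these arguments the helper reproduces the example action shown above. Optional parameters such as StartTime or MaxNumSpeakers can be passed as further keyword arguments.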