The ClusterSpeechTel task clusters telephony speech into speaker segments. For example, if the task identifies two speaker clusters, it labels them Cluster_0 and Cluster_1.

Parameter | Description | Required |
---|---|---|
Type | The task name. Set to ClusterSpeechTel. | Yes |
AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
AudioUpsampling | Whether to allow audio upsampling if the input audio has a sample rate too low for the task. | |
EndTime | The end of an audio section to process. | |
File | The input audio file. | Yes, if InputType is File. |
FixTime | A fixed size for speaker clusters. | |
InputType | The type of audio to process (file, binary data, or stream). | |
Lang | The name of a language pack. | Yes |
MaxNumSpeakers | The final maximum number of speakers to produce. | |
MergeThresh | The threshold below which to merge clusters. | |
MinNumSpeakers | The final minimum number of speakers to produce. | |
Out | The file to write the results to. | |
SilThresh | The threshold between what the task identifies as silence and non-silence. | |
SpeechThresh | The threshold between speech and non-speech (music or noise). | |
StartTime | The beginning of an audio section to process. | |
SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
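
The required-parameter rules in the table can be checked before a task is submitted. The following is a minimal sketch, not part of any IDOL client library: the function name `check_cluster_speech_tel_params` and the `STREAM_EXCLUDED` tuple are hypothetical, parameter names are matched case-sensitively, and it assumes that omitting InputType behaves like InputType=File (the table does not state the default).

```python
from typing import Dict, List

# Parameters that the table marks as not applicable to streamed input.
STREAM_EXCLUDED = ("SugdInputChannels", "SugdInputFrequency")


def check_cluster_speech_tel_params(params: Dict[str, str]) -> List[str]:
    """Return the problems found in a ClusterSpeechTel parameter set.

    Hypothetical helper: it encodes only the rules stated in the parameter
    table and is not part of any IDOL client library.
    """
    problems: List[str] = []
    if params.get("Type", "").lower() != "clusterspeechtel":
        problems.append("Type must be ClusterSpeechTel")
    if not params.get("Lang"):
        problems.append("Lang is required")
    # Assumption: when InputType is omitted, treat it as File.
    input_type = params.get("InputType", "File").lower()
    if input_type == "file" and not params.get("File"):
        problems.append("File is required when InputType is File")
    if input_type == "stream":
        # SugdInputChannels and SugdInputFrequency do not apply to streams.
        for name in STREAM_EXCLUDED:
            if name in params:
                problems.append(f"{name} does not apply when InputType is Stream")
    return problems


# Parameters taken from the example action in this section.
example = {"Type": "ClusterSpeechTel", "File": "1h.wav", "Lang": "ENUK-tel", "Out": "outTel"}
print(check_cluster_speech_tel_params(example))  # prints []
```

Returning a list of messages rather than raising on the first problem lets a caller report every missing or inapplicable parameter at once.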

The following example action clusters the data in the 1h.wav telephony audio file into speaker segments and writes the results to the outTel output file:

http://localhost:15000/action=AddTask&Type=ClusterSpeechTel&File=1h.wav&Lang=ENUK-tel&Out=outTel
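
An action URL like the one in the example can also be assembled programmatically. This is a minimal sketch using the standard `urllib.parse.urlencode`; the helper name `add_task_url` is hypothetical, and the host, port, and parameter values are whatever the caller supplies (here, the values from the example). It follows the example's form, with the parameters placed directly after the slash.

```python
from urllib.parse import urlencode


def add_task_url(host: str, port: int, **params: str) -> str:
    """Build an AddTask action URL for a ClusterSpeechTel task.

    Hypothetical helper: extra keyword arguments become task parameters,
    and urlencode percent-encodes any characters that need it.
    """
    query = {"action": "AddTask", "Type": "ClusterSpeechTel"}
    query.update(params)  # keyword arguments keep their call order
    # Same form as the example: parameters directly after the slash.
    return f"http://{host}:{port}/{urlencode(query)}"


url = add_task_url("localhost", 15000, File="1h.wav", Lang="ENUK-tel", Out="outTel")
print(url)
# prints http://localhost:15000/action=AddTask&Type=ClusterSpeechTel&File=1h.wav&Lang=ENUK-tel&Out=outTel
```

To submit the task, the resulting URL could be passed to `urllib.request.urlopen` against a running server. Because `urlencode` percent-encodes characters such as `/` or spaces, file paths containing them are sent in encoded form.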