ClusterSpeech

The ClusterSpeech task clusters wide-band speech into speaker segments. For example, if the task identifies two speaker clusters, the output labels are Cluster_0 and Cluster_1.

Parameters

Parameter           Required  Description
Type                Yes       The task name. Set to ClusterSpeech.
AppDnnBase                    The location of the appResources directory, which contains the DNN and .ian files to use.
File                          The input audio file.
FixTime                       A fixed size for speaker clusters.
FrameDupl                     The balance between performance and speed for audio preprocessing DNN classification.
Lang                Yes       The name of a language pack.
MaxNumSpeakers                The final maximum number of speakers to produce.
MergeThresh                   The threshold below which to merge clusters.
MinNumSpeakers                The final minimum number of speakers to produce.
Out                           The file that HPE IDOL Speech Server writes task output to.
SilThresh                     The threshold between what the task identifies as silence and non-silence.
SpeechThresh                  The threshold between speech and non-speech (music or noise).
SugdInputChannels             The channel layout of the input media file.
SugdInputFrequency            The sampling rate of the input media file.

Example

http://localhost:15000/a=AddTask&Type=ClusterSpeech&File=wide.wav&lang=ENUK&out=outWide

This action sends an AddTask request on port 15000 to HPE IDOL Speech Server, which is located on the local machine. It instructs the server to cluster the data in the wide.wav wide-band audio file into speaker segments, and to write the results to the outWide output file.
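
The same request URL can be assembled programmatically. The following is a minimal Python sketch, not part of the product: the helper name build_cluster_speech_url and its argument layout are hypothetical, and it assumes the server accepts the parameters exactly as they appear in the example action above (host, port, and file names are taken from that example).

```python
from urllib.parse import urlencode


def build_cluster_speech_url(host, port, file, lang, out=None, **optional):
    """Build an AddTask action URL for the ClusterSpeech task.

    Hypothetical helper: it only formats the URL and does not contact the server.
    """
    # Type and Lang are the parameters marked as required in the table above.
    params = {"Type": "ClusterSpeech", "File": file, "lang": lang}
    if out is not None:
        params["out"] = out
    # Any other table parameter (for example MaxNumSpeakers) can be passed by name.
    params.update(optional)
    # The action follows the slash directly, mirroring the example URL (no "?").
    return f"http://{host}:{port}/a=AddTask&{urlencode(params)}"


url = build_cluster_speech_url("localhost", 15000, "wide.wav", "ENUK", out="outWide")
print(url)
```

Optional parameters from the table can be supplied as keyword arguments, for example MaxNumSpeakers=4. The value shown is illustrative only; valid ranges are not stated in this section.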

