ClusterSpeechToTextTel

The ClusterSpeechToTextTel task clusters the speech of the two speakers in a phone call. It then uses the resulting speaker clusters to apply speaker-sided (per-speaker) acoustic normalization, which slightly improves speech-to-text accuracy. Telephony artifacts such as dial tones and DTMF tones are included in the output, interspersed with the recognized words.
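
This page does not describe how the normalization works internally. The following Python sketch illustrates only the general idea. Once each feature frame carries a speaker-cluster label, mean and variance normalization can be applied separately for each speaker. The function name `normalize_per_speaker`, the list-of-floats feature representation, and the choice of mean/variance normalization are assumptions for illustration. They are not HPE IDOL Speech Server's implementation.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence

Frame = List[float]


def normalize_per_speaker(
    frames: Sequence[Sequence[float]],
    labels: Sequence[str],
    eps: float = 1e-8,
) -> List[Frame]:
    """Mean/variance-normalize feature frames separately for each speaker cluster.

    Hypothetical illustration only, not HPE IDOL Speech Server's algorithm.
    frames: one feature vector per audio frame (for example, cepstral coefficients).
    labels: the speaker-cluster label assigned to each frame by a clustering step.
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")

    # Group frame indices by speaker cluster.
    by_speaker: Dict[str, List[int]] = defaultdict(list)
    for i, label in enumerate(labels):
        by_speaker[label].append(i)

    out: List[Frame] = [[] for _ in frames]
    for idx in by_speaker.values():
        dim = len(frames[idx[0]])
        n = len(idx)
        # Per-dimension statistics computed only over this speaker's frames.
        mean = [sum(frames[i][d] for i in idx) / n for d in range(dim)]
        std = [
            sqrt(sum((frames[i][d] - mean[d]) ** 2 for i in idx) / n)
            for d in range(dim)
        ]
        # eps avoids division by zero for constant dimensions.
        for i in idx:
            out[i] = [(frames[i][d] - mean[d]) / (std[d] + eps) for d in range(dim)]
    return out
```

Normalizing each speaker separately removes consistent per-speaker offsets, such as a channel or voice difference between the two sides of the call. A single global normalization over the whole call would blur those offsets together.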

Parameters

Parameter           Required  Description
Type                Yes       The task name. Set to ClusterSpeechToTextTel.
AppDnnBase          No        The location of the appResources directory, which contains the DNN and .ian files to use.
ClassWordFile       No        The path to a list of new words and weightings to add to the language model at load time.
Conf                No        Whether to generate word confidence scores.
CustomLm            No        The custom language model to use.
Diag                No        Whether to generate diagnostic information.
DiagFile            No        The file to write the diagnostic information to.
DnnFile             No        The DNN file to use.
DnnScale            No        The DNN output acoustic score scaling factor.
File                No        The input audio file.
FixTime             No        A fixed size for speaker clusters.
ForceRecompoundOff  No        Whether to prevent recompounding.
ForceRecompoundOn   No        Whether to force recompounding.
FrameDupl           No        The balance between accuracy and speed for audio preprocessing DNN classification.
Lang                Yes       The name of a language pack.
LatFile             No        The name of the lattice file that contains word hypotheses.
LatScale            No        The depth of the lattice.
LatWinSize          No        The size (in seconds) of the lattice output window.
LatWordFile         No        A list of words to find.
MaxNumSpeakers      No        The final maximum number of speakers to produce.
MergeThresh         No        The threshold below which to merge clusters.
MinNumSpeakers      No        The final minimum number of speakers to produce.
Mode                No        The algorithm mode for the speech-to-text process.
ModeValue           No        The value of the parameter associated with the speech-to-text algorithm mode.
Out                 No        The file that HPE IDOL Speech Server writes task output to.
PronFile            No        A file that either replaces word pronunciations or adds alternative pronunciations at language load time.
Punctuation         No        Whether to add punctuation to the word data.
SilThresh           No        The threshold between what the task identifies as silence and non-silence.
SpeedBiasLevel      No        The balance between speed and accuracy in the decoder.
SpeechThresh        No        The threshold between speech and non-speech (music or noise).
SugdInputChannels   No        The channel layout of the input media file.
SugdInputFrequency  No        The sampling rate of the input media file.
WordBar             No        Whether to enable word barring.
WordBarList         No        The location of a list of words to bar.
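
The MergeThresh, MinNumSpeakers, and MaxNumSpeakers parameters suggest a clustering step that merges similar clusters while respecting speaker-count limits. The following Python sketch shows one generic way such settings can interact in greedy agglomerative clustering. It is an assumption-based illustration and does not describe HPE IDOL Speech Server's algorithm. The function `merge_clusters`, the Euclidean distance between cluster centroids, and the exact stopping rule are all hypothetical.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence, Tuple

# A cluster is represented by its centroid and the number of segments it contains.
Cluster = Tuple[List[float], int]


def merge_clusters(
    segments: Sequence[Sequence[float]],
    merge_thresh: float,
    min_speakers: int,
    max_speakers: int,
) -> List[Cluster]:
    """Greedy agglomerative merging of segment feature vectors.

    Hypothetical illustration only: the distance measure and stopping rule
    are assumptions, not HPE IDOL Speech Server's clustering algorithm.
    """
    if not 1 <= min_speakers <= max_speakers:
        raise ValueError("require 1 <= min_speakers <= max_speakers")

    # Start with one cluster per segment.
    clusters: List[Cluster] = [(list(s), 1) for s in segments]

    # Never merge below the minimum number of speakers.
    while len(clusters) > min_speakers:
        # Find the pair of clusters whose centroids are closest.
        candidates = (
            ((a, b), dist(clusters[a][0], clusters[b][0]))
            for a, b in combinations(range(len(clusters)), 2)
        )
        (i, j), d = min(candidates, key=lambda item: item[1])

        # Stop once the closest pair is not below the threshold,
        # unless there are still more clusters than max_speakers allows.
        if d >= merge_thresh and len(clusters) <= max_speakers:
            break

        # Replace the pair with one cluster whose centroid is the weighted mean.
        (ci, ni), (cj, nj) = clusters[i], clusters[j]
        merged = [(x * ni + y * nj) / (ni + nj) for x, y in zip(ci, cj)]
        del clusters[j]  # j > i, so index i is still valid after this deletion
        clusters[i] = (merged, ni + nj)

    return clusters
```

In this sketch, setting both `min_speakers` and `max_speakers` to 2 for a two-speaker phone call forces exactly two clusters, regardless of the threshold, whenever there are at least two segments.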

Example

http://localhost:13000/action=AddTask&Type=ClusterSpeechToTextTel&File=C:/myData/Speech.wav&Out=SpeechTranscript.ctm&Lang=ENUS

This action uses port 13000 to instruct HPE IDOL Speech Server, running on the local machine, to perform the ClusterSpeechToTextTel task on the Speech.wav file and write the results to the SpeechTranscript.ctm file. Speech.wav contains U.S. English speech, so the action sets Lang to ENUS.
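
To script the request instead of typing it into a browser, you can build the same URL programmatically. The following Python sketch uses only the standard library. The helper names `build_add_task_url` and `submit` are hypothetical, and decoding the server response as UTF-8 is an assumption.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import urlopen


def build_add_task_url(host: str, port: int, params: Dict[str, str]) -> str:
    """Build an AddTask action URL in the same form as the example above (hypothetical helper)."""
    # Keep ':' and '/' unescaped so paths such as C:/myData/Speech.wav stay readable.
    query = urlencode(params, safe=":/")
    return f"http://{host}:{port}/action=AddTask&{query}"


def submit(url: str, timeout: float = 30.0) -> str:
    """Send the request and return the raw response body (assumed to be UTF-8 text)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = build_add_task_url(
    "localhost",
    13000,
    {
        "Type": "ClusterSpeechToTextTel",
        "File": "C:/myData/Speech.wav",
        "Out": "SpeechTranscript.ctm",
        "Lang": "ENUS",
    },
)
```

You can add optional parameters from the table, such as MinNumSpeakers or Punctuation, to the same dictionary. This page does not list the values each parameter accepts.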
