WavSidTrain

DEPRECATED:

The WavSidTrain task is deprecated for HPE IDOL Server version 11.3. Use the SpkIdTrainWav task instead.

This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.

The WavSidTrain task creates a speaker template from an audio file containing speech from a single speaker.

Parameters

| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Set to WavSidTrain. | Yes |
| Ast | The speaker classifier file. See Comments. | |
| Diag | Whether to generate diagnostic information. | |
| DiagFile | The file to write the diagnostic information to. | |
| File | The audio file containing sample speech from a single person. Aim to use a minimum of five minutes of speech. | Yes |
| MinFrames | The minimum number of speech audio frames required to train each component of a speaker model. | |
| NewModel | The speaker template file to create. | Yes |
| NMix | The number of components to create in the speaker model. | |
| Rel | The relevance to give to USM model parameters during adaptation. | |
| Sfreq | The sample frequency of the audio file to process. | |
| SidBase | The sid base pack resource to use to determine the base files to use. | |
| Sig | The .sig file to use for speaker identification. | |
| SugdInputChannels | The channel layout of the input media file. | |
| SugdInputFrequency | The sampling rate of the input media file. | |
| USM | The USM file to use. | |
| USMEnabled | Whether to use the USM as a base for speaker training. | |
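
The File parameter recommends at least five minutes of speech. The following is a minimal sketch of a pre-check on the client side, not part of the product. It assumes the audio is an uncompressed PCM WAV file that Python's standard wave module can read, and the function names are illustrative. It measures total audio length, so silence and non-speech audio count toward the result.

```python
import wave

# "Minimum of five minutes" taken from the File parameter description.
MIN_SECONDS = 5 * 60


def wav_duration_seconds(path):
    """Return the total duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def has_enough_audio(path, min_seconds=MIN_SECONDS):
    """Return True if the file is at least min_seconds long."""
    return wav_duration_seconds(path) >= min_seconds
```

For example, has_enough_audio("C:/Data/BobSpeech.wav") could be called before submitting the task, and a False result treated as a warning rather than an error.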

Example

http://localhost:13000/action=AddTask&Type=WavSidTrain&File=C:/Data/BobSpeech.wav&NewModel=Bob.spk

This action uses port 13000 to instruct HPE IDOL Speech Server, which is located on the local machine, to create the Bob.spk template using the BobSpeech.wav file.
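
The same action URL can be assembled in a script. The sketch below only builds the string in the form shown in the example; the helper name build_wavsidtrain_url is hypothetical, and the choice to percent-encode everything except ':' and '/' in parameter values is an assumption, not documented server behavior.

```python
from urllib.parse import quote


def build_wavsidtrain_url(host, port, audio_file, new_model, **optional):
    """Build an AddTask URL for the WavSidTrain task.

    Extra keyword arguments (for example Ast="...") are appended as
    additional task parameters in the order given.
    """
    params = {"Type": "WavSidTrain", "File": audio_file, "NewModel": new_model}
    params.update(optional)
    # Leave ':' and '/' unescaped so file paths match the documented example.
    query = "&".join(
        f"{name}={quote(str(value), safe=':/')}" for name, value in params.items()
    )
    return f"http://{host}:{port}/action=AddTask&{query}"


url = build_wavsidtrain_url("localhost", 13000, "C:/Data/BobSpeech.wav", "Bob.spk")
print(url)
```

The resulting string can then be requested with any HTTP client, such as urllib.request.urlopen from the Python standard library.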

Comments

If you do not specify the Ast parameter, the action uses the base ast file, determined by the SidBase resource. This base file does not contain any speaker information, and cannot identify speakers, but it performs gender detection and speaker segmentation.

