Deprecated: The WavSpeakerId task is deprecated for HPE IDOL Server version 11.2.0. Use the SpkIdEvalWav task instead.
This task is still available for existing implementations, but it might be incompatible with new functionality. The task might be deleted in future.
The WavSpeakerId task segments an audio file by speaker and identifies the speaker in each segment. If the speech does not match a speaker in the classifier file, HPE IDOL Speech Server identifies it as coming from an unknown speaker. HPE IDOL Speech Server also identifies periods of non-speech in the audio file.
To run the WavSpeakerId task, you must first train HPE IDOL Speech Server on the speakers that you want to identify.
Parameter | Description | Required |
---|---|---|
Type | The task name. Set to WavSpeakerId. | Yes |
Ast | The speaker classifier file. | Yes. See Comments. |
CompSelect | The number of components to select for use in scoring. | |
Diag | Whether to generate diagnostic information. | |
DiagFile | The file to write the diagnostic information to. | |
DiscardShort | Exclude segments shorter than a specific duration from further analysis. | |
EndTime | The end of an audio section to process. | |
File | The audio file to analyze. | Yes |
MinNonSpeech | The minimum size in seconds of non-speech segments. | |
MinSpeech | The minimum size in seconds of speech segments. | |
Out | The file to write the speaker identification results to. | Yes |
Sfreq | The sample frequency of the audio file to process. | |
SidBase | The sid base pack resource to use to determine the base files to use. | |
Sig | The .sig file to use for speaker identification. | |
SpkSegCoef | Applies a weight to bias the decision about where speaker boundaries occur. | |
SpkThreshCoef | A fixed value to use to adjust the speaker identification threshold, to trade off false acceptances against rejections. | |
StartTime | The beginning of an audio section to process. | |
SugdInputChannels | The channel layout of the input media file. | |
SugdInputFrequency | The sampling rate of the input media file. | |
USMEnabled | Whether to use the USM for speaker identification. | |
http://localhost:13000/action=AddTask&Type=WavSpeakerId&File=C:\Data\Speech.wav&Ast=C:\training\speakers.ast&Out=SpeakID.ctm
This action uses port 13000 to instruct HPE IDOL Speech Server, which is located on the local machine, to search the Speech.wav file for speakers contained in the speakers.ast classifier file and to write the identification results to the SpeakID.ctm file.
If you do not specify the Ast parameter, the action uses the base ast file, which is determined by the SidBase resource. This base file does not contain any speaker information, so it cannot identify speakers, but it still performs gender detection and speaker segmentation.
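The following Python sketch shows one way to assemble the AddTask action from a script. The function name `build_wav_speaker_id_url` and its arguments are hypothetical and are not part of HPE IDOL Speech Server. The sketch also assumes that the server accepts percent-encoded parameter values, as is usual for HTTP requests. It only builds the URL string and does not send the request.

```python
from urllib.parse import urlencode


def build_wav_speaker_id_url(file, out, ast=None,
                             host="localhost", port=13000, **optional):
    """Build an AddTask URL for the WavSpeakerId task (illustrative helper)."""
    # File and Out are marked as required in the parameter table.
    if not file or not out:
        raise ValueError("File and Out are required parameters")

    params = {"Type": "WavSpeakerId", "File": file}
    # Ast is optional here: when it is omitted, the task falls back to the
    # base ast file, which segments speakers but cannot identify them.
    if ast is not None:
        params["Ast"] = ast
    params["Out"] = out
    # Any other parameters from the table (for example MinSpeech) pass through.
    params.update(optional)

    # The action URL places the parameters directly after "/action=AddTask".
    # urlencode percent-encodes characters such as ":" and "\" in file paths.
    return f"http://{host}:{port}/action=AddTask&{urlencode(params)}"


url = build_wav_speaker_id_url(
    r"C:\Data\Speech.wav",
    "SpeakID.ctm",
    ast=r"C:\training\speakers.ast",
)
print(url)
```

Calling the helper without the `ast` argument produces an action that relies on the base ast file, which matches the behavior described above.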