The IvSpkId task performs iVector-based speaker identification on an audio file, or on any sections of an audio stream where the trained speakers are present.
Parameter | Description | Required |
---|---|---|
Type | The task name. Set to IvSpkId. | Yes |
AllowEmpty | Whether to produce gender labels as output if no speakers are specified. | |
DiagFile | The name of the file to write diagnostic information to. | |
DiagLevel | The level of detail to include in the diagnostic information. | |
DiscardShort | Exclude segments shorter than a specific duration from further analysis. | |
EndTime | The end of an audio section to process. | |
File | The audio file to process. | Yes, if InputType is File. |
FrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
InputType | The type of audio to process (file, binary data, or stream). | |
MaxNonSpeech | The maximum length of non-speech segments. | |
MaxSpeech | The maximum length of speech segments. | |
MinNonSpeech | The minimum length of non-speech segments. | |
MinSpeech | The minimum length of speech segments. | |
Out | The file to write the results to. | |
ScoreMode | The scoring method to use for speaker identification. | |
Sfreq | The sample frequency of the audio stream to process. | |
StartTime | The beginning of an audio section to process. | |
SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
| | The file extension to use for template files. | |
TemplateList | A list file that lists multiple speaker template files to use. | |
TemplatePath | The path to the directory containing the speaker templates. | |
TemplateSet | An audio template set file. | |
ThreshScale | The rate at which to scale the thresholds. | |
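The table states three constraints that can be checked before a task is submitted: Type must be IvSpkId, File is required when InputType is File, and the SugdInputChannels and SugdInputFrequency parameters do not apply when InputType is Stream. The following Python sketch encodes only those rules. The function name `check_ivspkid_params` is hypothetical, the sketch assumes parameter names are spelled exactly as in the table, and it is not part of IDOL Speech Server, which performs its own validation.

```python
# Hypothetical client-side check of the IvSpkId rules stated in the parameter table.
# Assumes parameter names use the exact capitalization shown in the table;
# values are compared case-insensitively.

def check_ivspkid_params(params: dict[str, str]) -> list[str]:
    """Return a list of rule violations found in an IvSpkId parameter set."""
    problems = []

    # Type is required and must name this task.
    if params.get("Type", "").lower() != "ivspkid":
        problems.append("Type is required and must be IvSpkId")

    input_type = params.get("InputType", "").lower()

    # File is required only when the input is a file.
    if input_type == "file" and not params.get("File"):
        problems.append("File is required when InputType is File")

    # The suggested-input parameters describe a media file, not a stream.
    if input_type == "stream":
        for name in ("SugdInputChannels", "SugdInputFrequency"):
            if name in params:
                problems.append(f"{name} does not apply when InputType is Stream")

    return problems
```

An empty list means none of these table rules is violated; other parameters, such as TemplateSet, are not checked.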
http://localhost:15000/action=AddTask&Type=IvSpkId&InputType=File&File=C:\Data\Speech.wav&TemplateSet=speakers.ivs&ClosedSet=False&Out=results.ctm
This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to search the Speech.wav file for speakers based on the template set file speakers.ivs, and to write the identification results to the results.ctm file.
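The AddTask URL for the file example contains a raw Windows path. The Python sketch below builds the same request with the standard `urllib.parse.urlencode` function, which percent-encodes the colon and backslashes in the File value. The helper name `build_addtask_url` is hypothetical, and the sketch assumes that the server URL-decodes parameter values.

```python
from urllib.parse import urlencode


def build_addtask_url(host: str, port: int, params: dict[str, str]) -> str:
    """Build an AddTask action URL (hypothetical helper).

    Parameter values are percent-encoded, for example the ':' and '\\'
    characters in a Windows path, assuming the server decodes them.
    """
    query = urlencode(params)
    return f"http://{host}:{port}/action=AddTask&{query}"


# Reproduces the file-based IvSpkId example.
url = build_addtask_url("localhost", 15000, {
    "Type": "IvSpkId",
    "InputType": "File",
    "File": r"C:\Data\Speech.wav",
    "TemplateSet": "speakers.ivs",
    "ClosedSet": "False",
    "Out": "results.ctm",
})
print(url)
```

To submit the task, the resulting string could be passed to `urllib.request.urlopen`; the response format is not covered here.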
http://localhost:15000/action=AddTask&Type=IvSpkId&InputType=Stream&TemplateSet=speakers.ivs&Out=results.ctm
This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to search the audio stream for speakers based on the iVector-based template set file speakers.ivs, and to write the identification results to the results.ctm file.