The IvSpkId task performs iVector-based speaker identification on an audio file or stream, and identifies the sections of audio in which the trained speakers are present.

| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Set to IvSpkId. | Yes |
| AllowEmpty | Whether to produce gender labels as output if no speakers are specified. | |
| AudioUpsampling | Whether to allow audio upsampling if the input audio has a sample rate too low for the task. | |
| DiagFile | The file to write the diagnostic information to. | |
| DiagLevel | The level of detail to include in the diagnostic information. | |
| DiscardShort | Whether to exclude segments shorter than a specified duration from further analysis. | |
| EndTime | The end of an audio section to process. | |
| File | The audio file to process. | Yes, if InputType is File. |
| FrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
| InputType | The type of audio to process (file, binary data, or stream). | |
| MaxNonSpeech | The maximum length of non-speech segments. | |
| MaxSpeech | The maximum length of speech segments. | |
| MinNonSpeech | The minimum length of non-speech segments. | |
| MinSpeech | The minimum length of speech segments. | |
| Out | The file to write the results to. | |
| ScoreMode | The scoring method to use for speaker identification. | |
| Sfreq | The sample frequency of the audio stream to process. | |
| StartTime | The beginning of an audio section to process. | |
| SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
| SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
| | The file extension to use for template files. | |
| TemplateList | A list file that lists multiple speaker template files to use. | |
| TemplatePath | The path to the directory containing the speaker templates. | |
| TemplateSet | An audio template set file. | |
| ThreshScale | The rate at which to scale the thresholds. | |
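
The conditional requirements in the table (Type is always required, File is required when InputType is File, and the Sugd* parameters do not apply when InputType is Stream) can be checked on the client side before a task is submitted. The following Python sketch is a hypothetical helper, not part of the product: it encodes only those table rules, assumes the parameters are held in a string dictionary, and compares InputType case-insensitively as a precaution (an assumption, not documented behavior).

```python
# Hypothetical client-side helper. It checks an IvSpkId parameter set against
# the rules stated in the parameter table only; it is not a product API.

def check_ivspkid_params(params: dict[str, str]) -> list[str]:
    """Return a list of problems found in an IvSpkId parameter set."""
    problems = []
    # Type is required and must name this task.
    if params.get("Type") != "IvSpkId":
        problems.append("Type must be set to IvSpkId")
    # Assumption: compare InputType case-insensitively.
    input_type = params.get("InputType", "").lower()
    # File is required when InputType is File.
    if input_type == "file" and not params.get("File"):
        problems.append("File is required when InputType is File")
    # SugdInputChannels and SugdInputFrequency do not apply to streams.
    if input_type == "stream":
        for name in ("SugdInputChannels", "SugdInputFrequency"):
            if name in params:
                problems.append(f"{name} does not apply when InputType is Stream")
    return problems


stream_params = {
    "Type": "IvSpkId",
    "InputType": "Stream",
    "TemplateSet": "speakers.ivs",
    "SugdInputFrequency": "16000",
}
print(check_ivspkid_params(stream_params))
# ['SugdInputFrequency does not apply when InputType is Stream']
```

The helper returns every problem rather than stopping at the first, so a caller can report all issues at once. The server remains the authority on which parameter combinations it accepts.
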

`http://localhost:15000/action=AddTask&Type=IvSpkId&InputType=File&File=C:\Data\Speech.wav&TemplateSet=speakers.ivs&ClosedSet=False&Out=results.ctm`

This action searches the Speech.wav file for speakers based on the template set file speakers.ivs, and writes the identification results to the results.ctm file.

`http://localhost:15000/action=AddTask&Type=IvSpkId&InputType=Stream&TemplateSet=speakers.ivs&Out=results.ctm`

This action searches the audio stream for speakers based on the iVector-based template set file speakers.ivs, and writes the identification results to the results.ctm file.
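
When the action URL is generated from a script, the parameter values can be percent-encoded so that characters such as `:` and `\` in Windows paths, or spaces, do not break the request. The following Python sketch builds a URL in the same `/action=AddTask&...` form as the examples above. The function name and its `host` and `port` defaults are hypothetical, and it assumes the server decodes percent-encoded parameter values.

```python
from urllib.parse import quote, urlencode


def build_addtask_url(params: dict[str, str], host: str = "localhost", port: int = 15000) -> str:
    """Build an AddTask action URL in the form used by the examples above."""
    # quote_via=quote with urlencode's default safe='' percent-encodes every
    # reserved character (':' -> %3A, '\' -> %5C, space -> %20).
    # Assumption: the server decodes these values before using them.
    query = urlencode(params, quote_via=quote)
    return f"http://{host}:{port}/action=AddTask&{query}"


stream_params = {
    "Type": "IvSpkId",
    "InputType": "Stream",
    "TemplateSet": "speakers.ivs",
    "Out": "results.ctm",
}
print(build_addtask_url(stream_params))
# http://localhost:15000/action=AddTask&Type=IvSpkId&InputType=Stream&TemplateSet=speakers.ivs&Out=results.ctm
```

For the stream example no value contains a reserved character, so the generated URL matches the documented one exactly. For the file example, the path `C:\Data\Speech.wav` is sent as `C%3A%5CData%5CSpeech.wav`.
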