Identify Speakers in Audio

After you train a set of speaker templates, you can use the IvSpkId task to analyze audio and identify any sections where the trained speakers are present.

To specify the templates to use, set either the TemplateList parameter to a list of templates, or the TemplateSet parameter to a template set. If you do not specify any templates, IDOL Speech Server still performs speaker segmentation and gender identification, but does not assign speaker labels.

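To identify speakers in an audio file

Send an AddTask action to IDOL Speech Server with the Type parameter set to IvSpkId, and set the File, TemplateSet (or TemplateList), ClosedSet, and Out parameters as required.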

For example:

http://localhost:15000/action=AddTask&Type=IvSpkId&File=C:\Data\Speech.wav&TemplateSet=speakers.ivs&ClosedSet=False&Out=results.sid

This action uses port 15000 to instruct IDOL Speech Server, which is located on the local machine, to search the Speech.wav file for speakers based on the template set file speakers.ivs, and to write the identification results to the results.sid file. Because ClosedSet is set to False, the test is open-set: IDOL Speech Server labels any section where no speaker scores above the corresponding threshold as Unknown_.
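The action above can also be assembled programmatically. The following Python sketch is illustrative only: the helper name build_ivspkid_action and its arguments are hypothetical, it only builds the action string and does not send it, and it assumes that the file path can be passed unescaped, as it appears in the example. Whether the server expects stricter percent-encoding, and whether True is the accepted spelling for a closed-set test, are assumptions.

```python
from urllib.parse import quote

# Characters left unescaped so that the result matches the example action.
# Assumption: the server accepts the path in this form, as the example suggests.
_SAFE = ":\\/"


def build_ivspkid_action(host, port, audio_file, template_set, out_file,
                         closed_set=False):
    """Return an AddTask action string for the IvSpkId task (hypothetical helper)."""
    params = [
        ("Type", "IvSpkId"),
        ("File", audio_file),
        ("TemplateSet", template_set),
        # Assumption: "True"/"False" are the accepted values; the example shows "False".
        ("ClosedSet", "True" if closed_set else "False"),
        ("Out", out_file),
    ]
    query = "&".join(f"{name}={quote(value, safe=_SAFE)}" for name, value in params)
    # The action follows the port directly, as in the example (no "?" separator).
    return f"http://{host}:{port}/action=AddTask&{query}"


if __name__ == "__main__":
    print(build_ivspkid_action("localhost", 15000, r"C:\Data\Speech.wav",
                               "speakers.ivs", "results.sid"))
```

Keeping the parameters in an ordered list reproduces the parameter order of the example, which makes the generated action easy to compare against the documented one.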

This action returns a token. You can use the token to:

Format of Speaker Identification Results

IDOL Speech Server supports two speaker identification output formats: CTM and XML.

The following example shows CTM output produced by the IvSpkId task.

1 A 0.000 0.520 Unknown_ NonSpeech_ 0.000
1 A 0.520 10.030 Brown MALE 3.540
1 A 10.550 0.080 Unknown_ NonSpeech_ 0.000
1 A 10.630 9.460 Unknown_ FEMALE 0.000
1 A 20.090 6.150 Smith MALE 6.983

From left to right, the columns in the CTM file contain:

NOTE:

The score for an identified speaker represents how well the processed speech matches the template. Scores can be negative or positive, depending on the type of score normalization used, but in all cases a higher score indicates a closer match to the model.
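As a rough illustration of how the CTM output might be consumed, the following Python sketch parses the example lines shown earlier. The column meanings are assumptions inferred from the example and from the XML output (start time, duration, speaker label, gender, score); the names given to the first two columns (source_id, channel) are guesses, and the CtmSegment class and helper functions are hypothetical. It also assumes that labels contain no spaces.

```python
from dataclasses import dataclass


@dataclass
class CtmSegment:
    source_id: str   # column 1 (assumed meaning)
    channel: str     # column 2 (assumed meaning)
    start: float     # start time in seconds
    duration: float  # duration in seconds
    label: str       # speaker label, or Unknown_
    gender: str      # MALE, FEMALE, or NonSpeech_ in the example
    score: float

    @property
    def end(self) -> float:
        # Rounded to the three decimal places used in the example output.
        return round(self.start + self.duration, 3)


def parse_ctm(text):
    """Parse whitespace-separated CTM lines, skipping blank or malformed ones."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue
        segments.append(CtmSegment(fields[0], fields[1], float(fields[2]),
                                   float(fields[3]), fields[4], fields[5],
                                   float(fields[6])))
    return segments


def identified_speakers(segments):
    """Return (label, start, end, score) for segments with a known speaker."""
    return [(s.label, s.start, s.end, s.score)
            for s in segments if s.label != "Unknown_"]


SAMPLE_CTM = """\
1 A 0.000 0.520 Unknown_ NonSpeech_ 0.000
1 A 0.520 10.030 Brown MALE 3.540
1 A 10.550 0.080 Unknown_ NonSpeech_ 0.000
1 A 10.630 9.460 Unknown_ FEMALE 0.000
1 A 20.090 6.150 Smith MALE 6.983
"""

if __name__ == "__main__":
    for row in identified_speakers(parse_ctm(SAMPLE_CTM)):
        print(row)
```

Computing the end time as start plus duration gives 10.550 for the Brown segment, which agrees with the end value in the XML form of the same record.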

The following example shows XML output with the mode set to default:

<sid_transcript>
	<sid_record>
		<start>0.000</start>
		<end>0.520</end>
		<label>Unknown_</label>
		<gender>NonSpeech_</gender>
		<score>0.000</score>
	</sid_record>
	<sid_record>
		<start>0.520</start>
		<end>10.550</end>
		<label>Brown</label>
		<gender>MALE</gender>
		<score>3.540</score>
	</sid_record>
</sid_transcript>
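The XML form can be read with a standard XML parser. The following Python sketch uses the standard library xml.etree.ElementTree module and assumes only the element names visible in the example above; the parse_sid_xml helper is hypothetical, and any elements that other output modes might add are not handled.

```python
import xml.etree.ElementTree as ET


def parse_sid_xml(xml_text):
    """Return one dict per sid_record element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter("sid_record"):
        records.append({
            "start": float(rec.findtext("start")),
            "end": float(rec.findtext("end")),
            "label": rec.findtext("label"),
            "gender": rec.findtext("gender"),
            "score": float(rec.findtext("score")),
        })
    return records


SAMPLE_XML = """\
<sid_transcript>
  <sid_record>
    <start>0.000</start>
    <end>0.520</end>
    <label>Unknown_</label>
    <gender>NonSpeech_</gender>
    <score>0.000</score>
  </sid_record>
  <sid_record>
    <start>0.520</start>
    <end>10.550</end>
    <label>Brown</label>
    <gender>MALE</gender>
    <score>3.540</score>
  </sid_record>
</sid_transcript>
"""

if __name__ == "__main__":
    for record in parse_sid_xml(SAMPLE_XML):
        print(record)
```

Note that the XML records carry an end time, whereas the CTM lines carry a duration, so code that handles both formats needs to convert one to the other.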
