When identifying the most likely speaker in a section of audio, HPE IDOL Speech Server scores how closely the acoustic properties of the segment match each of the speaker templates. For closed-set operation, the top-scoring speaker is simply taken as the true result.
However, for open-set identification, HPE IDOL Speech Server needs to allow for unknown speakers in the audio. It does this by estimating a score threshold for each speaker; the hit is considered valid only if a template scores above this threshold. If for any audio segment the top-scoring template falls below the threshold associated with that speaker, the segment is assumed to be an unknown speaker.
The score threshold for each speaker template is based on an analysis of the speaker match scores observed for that template against both matching speaker data (true examples), and non-matching speaker data (false examples).
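The closed-set and open-set decision rules described above can be sketched in a few lines of Python. This is only an illustration of the logic, not HPE IDOL Speech Server code: the function name, the score values, and the threshold values for Jones and Smith are hypothetical, and the sketch assumes that a score exactly equal to the threshold does not count as a hit.

```python
from typing import Dict, Optional

UNKNOWN = "Unknown"  # label used here for speakers outside the template set


def identify_speaker(scores: Dict[str, float],
                     thresholds: Optional[Dict[str, float]] = None) -> str:
    """Return the speaker label for one audio segment.

    scores maps each speaker template to its match score for the segment.
    thresholds maps each speaker template to its score threshold; pass None
    for closed-set operation, where the top-scoring speaker always wins.
    """
    best = max(scores, key=scores.get)  # top-scoring template
    if thresholds is None:
        return best                     # closed set: no rejection step
    if scores[best] > thresholds[best]:
        return best                     # open set: hit is valid
    return UNKNOWN                      # open set: below that speaker's threshold


# Hypothetical per-speaker thresholds and segment scores.
thresholds = {"Brown": 1.158, "Jones": 0.9, "Smith": 0.7}
strong_segment = {"Brown": 1.4, "Jones": 0.3, "Smith": -0.2}
weak_segment = {"Brown": 1.0, "Jones": 0.3, "Smith": -0.2}

print(identify_speaker(strong_segment, thresholds))
print(identify_speaker(weak_segment, thresholds))
print(identify_speaker(weak_segment))
```

Note that the comparison uses the threshold of the top-scoring speaker only, which is why each template carries its own threshold.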
The IvSpkIdDevelWav task takes a single audio file, along with the name of the speaker the file is associated with, and generates score statistics for one or more speaker templates. These statistics are then stored in an iVector template development file (.ivd).
If the speaker is not in the set, you can set the speaker name to “Unknown”.
You must run the IvSpkIdDevelWav task once for each audio file to be used in threshold estimation. You can choose to append the scores for each audio file to a single .ivd file (the default method), or to create a separate development file for each audio file. You can use one or more development files when estimating the threshold for each speaker template.
To append the scores to a common development file, you must ensure that the file does not exist before you run the first IvSpkIdDevelWav task.
Creating an individual development file for each audio file ensures that the statistics do not get inadvertently appended to a file that already existed before you ran the first IvSpkIdDevelWav task (for example, from a previous HPE IDOL Speech Server installation). You must specify a unique name for the development file each time that you run the task, to avoid overwriting files.
You can specify the method to use by using the DevAppend configuration parameter in the task's IvDevel module, which you can set by using the Append parameter on the command line. For more information about this parameter, see the HPE IDOL Speech Server Reference.
To generate a development score file
Gather together the required audio files for testing, including:
At least one file for each speaker that contains speech from that speaker only; aim to use a minimum of five minutes of speech for each speaker.
Do not use the same audio that you used to create the speaker templates.
Files that contain unknown speakers (those not in the training set).
It is important to use a substantial amount of unknown speaker data, from a wide range of speakers, to correctly tune the thresholds.
Create a list of the speaker templates. Each list entry must include the name of the speaker, and the name of their template file. Use the format:
speakerLabel;templateFile
For example:
Brown;brown.iv
Jones;jones.iv
Smith;smith.iv
For more information about HPE IDOL Speech Server's list manager, see Create and Manage Lists.
For each audio file, send an AddTask action to HPE IDOL Speech Server, and set the following parameters:
Type | The task name. Set to IvSpkIdDevelWav.
File | The audio file that contains the speaker example speech.
DataLabel | The name of the speaker that the audio is associated with.
TemplateList | The list file that specifies the set of speaker templates to use.
DevFile | The development file (.ivd) to create or update.
For example:
http://localhost:15000/action=AddTask&Type=IvSpkIdDevelWav&File=C:/Data/BrownSpeech4.wav&DataLabel=Brown&TemplateList=ListManager/speakers&DevFile=speakers.ivd
This action uses port 15000 to instruct HPE IDOL Speech Server, which is located on the local machine, to generate match statistics based on audio from the speaker named Brown, in the audio file BrownSpeech4.wav, against all of the speaker templates specified in the speakers list. The results are written to the speakers.ivd development file.
You can set additional parameters. For details of the optional parameters, see the HPE IDOL Speech Server Reference.
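Because the task must be run once per audio file, it can be convenient to generate the AddTask URLs from a list of files. The following Python sketch only builds the URLs and prints them; it does not contact the server. The helper name and every file name other than BrownSpeech4.wav are hypothetical, and the host, port, and list name are taken from the example above.

```python
from urllib.parse import urlencode


def devel_wav_url(audio_file, speaker, template_list, dev_file,
                  host="localhost", port=15000):
    """Build an AddTask URL for one IvSpkIdDevelWav run."""
    params = {
        "Type": "IvSpkIdDevelWav",
        "File": audio_file,
        "DataLabel": speaker,
        "TemplateList": template_list,
        "DevFile": dev_file,
    }
    # safe=":/" keeps Windows drive letters and path separators readable;
    # any other reserved characters are percent-encoded.
    return f"http://{host}:{port}/action=AddTask&" + urlencode(params, safe=":/")


# Hypothetical test set: (audio file, speaker label). Files with speakers
# outside the template set use the label "Unknown".
audio_files = [
    ("C:/Data/BrownSpeech4.wav", "Brown"),
    ("C:/Data/JonesSpeech2.wav", "Jones"),
    ("C:/Data/Visitor1.wav", "Unknown"),
]

for path, label in audio_files:
    # All runs append to the same development file (the default method).
    print(devel_wav_url(path, label, "ListManager/speakers", "speakers.ivd"))
```

To create a separate development file for each audio file instead, pass a unique dev_file value on each call.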
This action returns a token, which you can use to refer to the task in subsequent actions, for example to check its status.
To process streamed audio rather than an audio file, use the ivSpkIdDevelStream task. For more details about this standard task, see the HPE IDOL Speech Server Reference.
To estimate speaker template score thresholds
After you gather both positive and negative score statistics for each of the speaker templates, you can calculate the threshold associated with each speaker. This threshold is stored within the speaker template file.
You can do this for each speaker template individually, or across the whole set at once. The example given here shows the latter approach.
You can specify multiple template development files in a list file, or just a single development file. Again, the latter approach is shown here.
You can use the Bias parameter to bias the calculated threshold towards fewer false positives (at the likely cost of more misses), or the other way around. Increase the value of the Bias parameter to reduce false positives and increase precision; lower it to reduce misses and increase recall.
For details on other options associated with this task, see the HPE IDOL Speech Server Reference.
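The trade-off that the Bias parameter controls can be illustrated with a small Python sketch that measures precision and recall for one template at several candidate thresholds. This is not the calculation that HPE IDOL Speech Server performs, and it does not model how a Bias value maps to a threshold; the function name and all score values are invented for illustration.

```python
def precision_recall(true_scores, false_scores, threshold):
    """Precision and recall for one template at a given score threshold.

    true_scores: scores of the template against its own speaker's audio.
    false_scores: scores of the template against other speakers' audio.
    A score counts as a hit only if it is strictly above the threshold.
    """
    tp = sum(s > threshold for s in true_scores)   # correct hits
    fp = sum(s > threshold for s in false_scores)  # false positives
    fn = len(true_scores) - tp                     # misses
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented development scores for a single template.
true_scores = [1.8, 1.5, 1.2, 0.9, 0.6]
false_scores = [1.0, 0.7, 0.4, 0.1, -0.3]

for threshold in (0.5, 0.8, 1.1):
    p, r = precision_recall(true_scores, false_scores, threshold)
    print(f"threshold={threshold:.1f} precision={p:.2f} recall={r:.2f}")
```

With these invented scores, raising the threshold from 0.5 to 1.1 removes both false positives but also rejects two of the five true examples, which is the same direction of trade-off that a higher Bias value produces.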
To estimate the thresholds for a set of speaker templates, given a single development score file, send an AddTask action to HPE IDOL Speech Server, and set the following parameters:
Type | The task name. Set to IvSpkIdDevelFinal.
DevFile | The input template development file.
TemplateList | A list file that specifies the templates to use.
Bias | The bias setting to use when calculating thresholds.
OutPath | The output path for the updated speaker templates.
OutExt | The file extension for output speaker templates. NOTE: If you do not set either the
For example:
http://localhost:15000/action=AddTask&Type=IvSpkIdDevelFinal&DevFile=speakers.ivd&TemplateList=ListManager/speakers&Bias=0.2
This action uses port 15000 to instruct HPE IDOL Speech Server, which is located on the local machine, to use the development scores in speakers.ivd to calculate thresholds (using a Bias value of 0.2 when balancing recall against precision) for each speaker template specified in the speakers list. HPE IDOL Speech Server updates the template files in place to contain the calculated threshold values.
You can set additional parameters. For details of the optional parameters, see the HPE IDOL Speech Server Reference.
This action returns a token, which you can use to refer to the task in subsequent actions, for example to check its status.
To modify the threshold of a single template
You can use the IvSpkIdTmpEditThresh standard task to modify the threshold of a single template by specifying the template file (.iv).
Send an AddTask action to HPE IDOL Speech Server, and set the following parameters:
Type | The task name. Set to IvSpkIdTmpEditThresh.
TemplateFile | The name of the template to modify.
Thresh | The value to use for the threshold.
For example:
http://localhost:15000/action=AddTask&Type=IvSpkIdTmpEditThresh&TemplateFile=brown.iv&Thresh=0.5
This action uses port 15000 to instruct HPE IDOL Speech Server, which is located on the local machine, to set the threshold of the brown.iv template file to 0.5. HPE IDOL Speech Server updates the template file in place to contain the new threshold value.
You can set additional parameters. For details of the optional parameters, see the HPE IDOL Speech Server Reference.
This action returns a token, which you can use to refer to the task in subsequent actions, for example to check its status.
To retrieve information on a single template
You can use the IvSpkIdTmpInfo standard task to write information on a specified template file to a log file.
Send an AddTask action to HPE IDOL Speech Server, and set the following parameters:
Type | The task name. Set to IvSpkIdTmpInfo.
TemplateFile | The name of the template file to retrieve information for.
Log | The log file to write the information to.
For example:
http://localhost:15000/action=AddTask&Type=IvSpkIdTmpInfo&TemplateFile=brown.iv&Log=brown.log
This action uses port 15000 to instruct HPE IDOL Speech Server, which is located on the local machine, to write information on the brown.iv template file to the log file brown.log.
You can set additional parameters. For details of the optional parameters, see the HPE IDOL Speech Server Reference.
This action returns a token, which you can use to refer to the task in subsequent actions, for example to check its status.
Log File Example
<TEMPLATE_0>
  <NAME> test.atf </NAME>
  <THRESH_ENABLED> Yes </THRESH_ENABLED>
  <THRESH_VALUE> 1.158 </THRESH_VALUE>
  <NCOMPS> 1023 </NCOMPS>
  <SHARE_ICOV> Yes </SHARE_ICOV>
  <SHARE_MEANS> Yes </SHARE_MEANS>
  <SHARE_MEANS_PERCENT> 23.4604 </SHARE_MEANS_PERCENT>
</TEMPLATE>
This file shows some information about how the template was trained and optimized, along with information about the template. The log file includes the following fields:
<TEMPLATE_0> | The start of information on template 0, and so on.
<NAME> | The name associated with the template.
<THRESH_ENABLED> | Whether a score threshold is enabled for this template.
<THRESH_VALUE> | The score threshold that has been estimated for this template.
<NCOMPS> | The number of components used in this template. NOTE: This information is not available for iVector-based templates.
<SHARE_ICOV> | Whether this template shares variance statistics with a base template. NOTE: This information is not available for iVector-based templates.
<SHARE_MEANS> | Whether this template shares mean parameters with a base template. NOTE: This information is not available for iVector-based templates.
<SHARE_MEANS_PERCENT> | The percentage of mean parameter components shared with the base template. NOTE: This information is not available for iVector-based templates.
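If you need to read these fields programmatically, for example to check the estimated threshold after running IvSpkIdDevelFinal, a small parser is enough. The Python sketch below works on the log text shown in the example above. It uses a regular expression rather than an XML parser because, in that example, the opening <TEMPLATE_0> tag is closed by </TEMPLATE>, so the text is not well-formed XML. The function name is hypothetical, and the sketch assumes that each field sits on one line and that the log describes a single template.

```python
import re

# Log text copied from the example above.
LOG_TEXT = """<TEMPLATE_0>
  <NAME> test.atf </NAME>
  <THRESH_ENABLED> Yes </THRESH_ENABLED>
  <THRESH_VALUE> 1.158 </THRESH_VALUE>
  <NCOMPS> 1023 </NCOMPS>
  <SHARE_ICOV> Yes </SHARE_ICOV>
  <SHARE_MEANS> Yes </SHARE_MEANS>
  <SHARE_MEANS_PERCENT> 23.4604 </SHARE_MEANS_PERCENT>
</TEMPLATE>"""

# Matches <TAG> value </TAG> pairs whose opening and closing names agree.
# The <TEMPLATE_0> wrapper is skipped because its closing tag differs.
FIELD_RE = re.compile(r"<(?P<tag>[A-Z_]+)>\s*(?P<value>.*?)\s*</(?P=tag)>")


def parse_template_fields(text):
    """Return a dict of field name to string value for one template."""
    return {m.group("tag"): m.group("value") for m in FIELD_RE.finditer(text)}


fields = parse_template_fields(LOG_TEXT)
if fields.get("THRESH_ENABLED") == "Yes":
    print(fields["NAME"], "threshold:", float(fields["THRESH_VALUE"]))
```

All values are returned as strings, so convert numeric fields such as THRESH_VALUE explicitly before comparing them.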