Audio samples are used to train speaker templates, estimate speaker score thresholds, and identify speakers. The tasks that support these functions can work directly from an audio file (or stream). However, because of the pipeline used to process the audio, each of these tasks accepts only a single audio source as input.
For template training and for threshold estimation, you might want to use multiple audio files in a single task. To do this, create a speaker ID feature file for each audio file that you want to use, and then present the set of feature files to HPE IDOL Speech Server.
To create a speaker ID feature file
Send an AddTask action to HPE IDOL Speech Server, and set the following parameters:
Type | The task name. Set to SpkIdFeature.
File | The audio file that contains sample speech from one person.
Out | The name of the speaker ID feature file to create.
For example:
http://localhost:15000/action=AddTask&Type=SpkIdFeature&File=C:/Data/BrownSpeech1.wav&out=BrownSpeech1.atv
This action uses port 15000 to instruct HPE IDOL Speech Server, which is located on the local machine, to create the BrownSpeech1.atv feature file using the BrownSpeech1.wav file.
This action returns a token. You can use the token to:
You can set additional parameters. For details of the optional parameters, see the HPE IDOL Speech Server Reference.
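Because each SpkIdFeature task takes a single audio file, preparing several files means sending one AddTask action per file. The following Python sketch shows one way to build those action URLs. It only constructs the URLs and does not contact a server. The function name spkid_feature_url, the sample file list, and the convention of naming each output after its audio file with an .atv extension are assumptions made for illustration; the parameter names Type, File, and Out, and the default host and port, are taken from the example above.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def spkid_feature_url(audio_file, host="localhost", port=15000):
    """Build an AddTask URL that requests one speaker ID feature file.

    Hypothetical helper: the output name is derived from the audio file
    name (BrownSpeech1.wav -> BrownSpeech1.atv), which is an assumed
    naming convention, not a server requirement.
    """
    out_name = PurePosixPath(audio_file).with_suffix(".atv").name
    params = [
        ("Type", "SpkIdFeature"),
        ("File", audio_file),
        ("Out", out_name),
    ]
    # Percent-encode unsafe characters such as spaces, but leave '/' and ':'
    # as they are so that paths look like the ones in the example above.
    query = "&".join(f"{name}={quote(value, safe='/:')}" for name, value in params)
    return f"http://{host}:{port}/action=AddTask&{query}"


# Hypothetical sample files, each containing speech from the same person.
audio_files = [
    "C:/Data/BrownSpeech1.wav",
    "C:/Data/BrownSpeech2.wav",
]

# One AddTask action per audio file.
urls = [spkid_feature_url(path) for path in audio_files]
for url in urls:
    print(url)
```

Each URL can then be sent to the server with any HTTP client, and each request returns its own token.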