You must create a speaker template for each speaker that you want to identify.
Micro Focus recommends that you use a minimum of five minutes of speech data for each speaker. The training audio must not contain speech from any other speaker. In general, use good-quality audio samples that contain only the speaker's voice, with no significant background noise. Where possible, however, include examples of the speaker from the typical range of recording situations and environments that the final system is likely to encounter (for example, indoors, outdoors, or in noisy conditions). The spoken content can contain any vocabulary.
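To check a training file against the five-minute guideline before you train, you could read its duration with Python's standard `wave` module. This is a minimal sketch that assumes uncompressed PCM WAV input. The helper names are hypothetical and are not part of IDOL Speech Server. It also measures total file length, so any silence or non-speech audio in the file counts toward the total.

```python
import wave

# Recommended minimum amount of training speech per speaker (five minutes).
MIN_SPEECH_SECONDS = 5 * 60


def wav_duration_seconds(path: str) -> float:
    """Return the length of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def meets_minimum_duration(path: str, minimum: float = MIN_SPEECH_SECONDS) -> bool:
    """Return True if the file is at least `minimum` seconds long.

    Hypothetical helper: it checks total file length only, not the amount
    of actual speech, so silence and noise are counted as well.
    """
    return wav_duration_seconds(path) >= minimum
```

A file that passes this check can still fall short of five minutes of actual speech, so treat the result as a lower-bound sanity check rather than a guarantee.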
There are two ways to train speaker templates:
Single audio file: IDOL Speech Server takes a single audio file that contains speech from a single speaker, and generates a single speaker template file. This is the simpler approach, but if the speech data you want to use is stored in multiple audio files, you must first use a third-party audio editing tool to combine them into a single file.
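For the single-file approach, the combining step can also be scripted. The following is a minimal sketch that uses Python's standard `wave` module, assuming that every input is an uncompressed PCM WAV file with the same channel count, sample width, and sample rate. It performs no resampling or format conversion. `concatenate_wavs` is a hypothetical helper, not an IDOL Speech Server feature.

```python
import wave
from typing import Sequence


def concatenate_wavs(input_paths: Sequence[str], output_path: str) -> None:
    """Append several PCM WAV files, in order, into one output file.

    Hypothetical helper: all inputs must share channel count, sample width,
    and sample rate. A mismatch raises ValueError and may leave a partial
    output file behind.
    """
    if not input_paths:
        raise ValueError("at least one input file is required")

    # Take the audio format from the first file.
    with wave.open(input_paths[0], "rb") as first:
        expected = (first.getnchannels(), first.getsampwidth(), first.getframerate())

    with wave.open(output_path, "wb") as out:
        out.setnchannels(expected[0])
        out.setsampwidth(expected[1])
        out.setframerate(expected[2])
        for path in input_paths:
            with wave.open(path, "rb") as src:
                actual = (src.getnchannels(), src.getsampwidth(), src.getframerate())
                if actual != expected:
                    raise ValueError(f"{path}: audio format differs from {input_paths[0]}")
                # Copy all audio frames from this file to the end of the output.
                out.writeframes(src.readframes(src.getnframes()))
```

If your recordings use different sample rates or compressed formats, convert them to a common PCM format first. An audio editing tool remains the more practical option in that case.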