The following schema describes how to train the speaker identification module with speaker audio data.
[ivSpkIdTrainAudio]
0 = a <- audio(MONO, input)
1 = f1 <- frontend1(_, a)
2 = nf1 <- normalizer1(_, f1)
3 = w <- audiopreproc(A, a, nf1)
4 = f2 <- frontend2(_, a)
5 = nf2 <- normalizer2(BLOCK, f2)
6 = nf3 <- filter(FEAT_INCLUSIVE, nf2, w)
7 = output <- ivfile(CREATE_STREAM, nf3)
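To make the notation concrete, the following is a minimal Python sketch that parses lines of this form into a step graph. The grammar assumed here ("index = output <- module(MODE, inputs)") is inferred from the example above, not taken from a documented schema specification, so treat it as an illustration only.

import re

# Schema text copied verbatim from the ivSpkIdTrainAudio example above.
SCHEMA = """\
0 = a <- audio(MONO, input)
1 = f1 <- frontend1(_, a)
2 = nf1 <- normalizer1(_, f1)
3 = w <- audiopreproc(A, a, nf1)
4 = f2 <- frontend2(_, a)
5 = nf2 <- normalizer2(BLOCK, f2)
6 = nf3 <- filter(FEAT_INCLUSIVE, nf2, w)
7 = output <- ivfile(CREATE_STREAM, nf3)
"""

# Assumed line pattern: "index = output <- module(MODE, input, ...)".
LINE = re.compile(r"(\d+)\s*=\s*(\w+)\s*<-\s*(\w+)\(([^)]*)\)")

def parse_schema(text):
    """Return {step: (output, module, mode, inputs)} for each schema line."""
    graph = {}
    for match in LINE.finditer(text):
        step, output, module, args = match.groups()
        mode, *inputs = [part.strip() for part in args.split(",")]
        graph[int(step)] = (output, module, mode, inputs)
    return graph

for step, (out, mod, mode, ins) in sorted(parse_schema(SCHEMA).items()):
    print(f"step {step}: {mod}({mode}) reads {ins} -> writes {out}")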
Step | Description
-----|------------
0 | The audio module processes the mono audio data.
1 | The frontend1 module converts the audio data from step 0 into front-end frame data.
2 | The normalizer1 module normalizes the frame data from step 1.
3 | The audiopreproc module, in audio classification mode, processes the audio (a) and the normalized frame data (nf1).
4 | The frontend2 module converts the audio data (a) into front-end frame data.
5 | The normalizer2 module normalizes the frame data from step 4.
6 | The filter module filters the output of step 5 (nf2) to include only frames that occur in segments that contain speech.
7 | The ivfile module, in CREATE_STREAM mode, trains the iVector feature files from the filtered frames.
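The schema thus builds two parallel feature paths from the same audio: one (frontend1/normalizer1) drives the speech classification in audiopreproc, and the other (frontend2/normalizer2) supplies the frames that, after speech filtering, feed iVector training. The Python sketch below mirrors only this wiring; every module body is a hypothetical NumPy stand-in (for example, one frontend function is reused for both paths, and the voice-activity decision is a simple energy threshold), not the actual module behavior.

import numpy as np

def audio(pcm):                       # step 0: mono audio input
    return np.asarray(pcm, dtype=np.float32)

def frontend(a, frame=400, hop=160):  # steps 1 and 4: slice audio into frames
    n = max(0, (len(a) - frame) // hop + 1)
    return np.stack([a[i * hop : i * hop + frame] for i in range(n)])

def normalize(frames):                # steps 2 and 5: per-frame mean removal
    return frames - frames.mean(axis=1, keepdims=True)

def audiopreproc(a, nf1):             # step 3: speech/non-speech decision per frame
    # Placeholder: a simple energy threshold stands in for the classifier.
    return nf1.std(axis=1) > 0.01

def speech_filter(nf2, w):            # step 6: keep only frames flagged as speech
    return nf2[w]

def ivfile_create(nf3):               # step 7: stand-in for the iVector training output
    return {"frames": nf3, "stats": nf3.mean(axis=0)}

pcm = np.random.randn(16000)          # one second of fake 16 kHz mono audio
a = audio(pcm)
nf1 = normalize(frontend(a))          # path 1: features for speech detection
w = audiopreproc(a, nf1)
nf2 = normalize(frontend(a))          # path 2: features for iVector training
nf3 = speech_filter(nf2, w)
output = ivfile_create(nf3)
print(f"{len(nf2)} frames, {len(nf3)} kept as speech")

Note that the sketch frames both paths identically so that the speech decisions (w) align one-to-one with the training frames (nf2); in the real schema the two front ends and normalizers are distinct modules with their own modes.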