The following schema describes how to run speaker identification given a set of speaker templates. Each numbered line binds one or more output variables to a module call; the module's first argument is a mode flag (_ where no mode is specified) and its remaining arguments are inputs produced by earlier steps.
[ivSpkId]
0 = a,ts <- audio(MONO, input)
1 = w1 <- speakerid(GENDER_DETECT, a)
2 = f1 <- frontend1(_, a)
3 = nf1 <- normalizer1(_, f1)
4 = w2 <- audiopreproc(A, a, nf1)
5 = f2 <- frontend2(_, a)
6 = nf2 <- normalizer2(SEGMENT_BLOCK, f2, w1)
7 = nf3 <- filter(FEAT_INCLUSIVE, nf2, w2)
8 = sid <- ivScore(SEGMENT, nf3, w1)
9 = output <- sidout(_, sid, ts)
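Read that way, each line is mechanically parseable. Below is a minimal sketch of a line parser in Python; the regex, the parse_step helper, and the returned field names are illustrative assumptions rather than a defined grammar:

import re

# Hypothetical parser for one schema line, e.g.
#   "6 = nf2 <- normalizer2(SEGMENT_BLOCK, f2, w1)"
LINE = re.compile(r"(?P<step>\d+)\s*=\s*(?P<outputs>[\w,\s]+?)\s*<-\s*"
                  r"(?P<module>\w+)\((?P<args>[^)]*)\)")

def parse_step(line):
    """Split a schema line into step number, outputs, module, mode, and inputs."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unparseable schema line: {line!r}")
    args = [a.strip() for a in m["args"].split(",")]
    return {"step": int(m["step"]),
            "outputs": [o.strip() for o in m["outputs"].split(",")],
            "module": m["module"],
            "mode": args[0],       # first argument is the mode flag
            "inputs": args[1:]}    # remaining arguments are input variables

print(parse_step("8 = sid <- ivScore(SEGMENT, nf3, w1)"))
# {'step': 8, 'outputs': ['sid'], 'module': 'ivScore',
#  'mode': 'SEGMENT', 'inputs': ['nf3', 'w1']}

Each step of the schema is described below.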
0: The audio module processes the mono audio input, producing the audio data (a) and timestamps (ts).
1: The speakerid module takes the audio data (a) from 0 and outputs speaker turn segments (w1).
2: The frontend1 module converts the audio data (a) from 0 into front-end frame data (f1).
3: The normalizer1 module normalizes the frame data (f1) from 2.
4: The audiopreproc module, running in audio classification mode, processes the audio (a) and the normalized frame data from 3 (nf1), producing audio classification data (w2).
5: The frontend2 module converts the audio data from 0 into front-end frame data (f2).
6: The normalizer2 module normalizes the frame data from 5 (f2), using the speaker turn segments from 1 (w1); a normalization sketch follows the table.
7: The filter module filters the output from 6 (nf2) to include only frames that occur in segments that contain speech, using the audio classification data from 4 (w2); a filtering sketch follows the table.
8: The ivScore module takes the audio features from 7 (nf3) and the speaker segment information (w1), and produces a set of iVector speaker scores for each segment; a scoring sketch follows the table.
9: The sidout module takes the speaker ID score information (sid) and the timestamps (ts) from 0, and writes this information into a results file.
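The schema does not say what normalization normalizer2 applies in SEGMENT_BLOCK mode; a common choice when speaker turn boundaries are available is mean and variance normalization computed separately over each segment. A minimal sketch under that assumption (the function name, the (start, end) segment format, and the timestamped-frame representation are all hypothetical):

import numpy as np

def normalize_per_segment(frames, frame_times, segments, eps=1e-8):
    """Zero-mean, unit-variance normalization, with statistics computed per segment."""
    out = frames.copy()
    for start, end in segments:
        idx = (frame_times >= start) & (frame_times < end)
        if idx.any():
            block = frames[idx]
            out[idx] = (block - block.mean(axis=0)) / (block.std(axis=0) + eps)
    return out

# Toy usage: two speaker turns over 100 frames at a 10 ms hop
f2 = np.random.randn(100, 20)
times = np.arange(100) * 0.01
nf2 = normalize_per_segment(f2, times, [(0.0, 0.6), (0.6, 1.0)])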
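Step 7's behavior is stated directly: keep only the frames that fall inside segments classified as speech. A sketch, assuming the frames carry center timestamps and that the classification data (w2) reduces to a list of (start, end) speech intervals (both representations are assumptions):

import numpy as np

def filter_speech_frames(frames, frame_times, speech_segments):
    """Keep only the frames whose timestamps fall inside a speech segment."""
    keep = np.zeros(len(frame_times), dtype=bool)
    for start, end in speech_segments:
        keep |= (frame_times >= start) & (frame_times < end)
    return frames[keep]

# Toy usage: speech detected in [0.2, 0.5) and [0.7, 0.9)
nf2 = np.random.randn(100, 20)
times = np.arange(100) * 0.01
nf3 = filter_speech_frames(nf2, times, [(0.2, 0.5), (0.7, 0.9)])
print(nf3.shape)  # (50, 20)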
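For step 8, the source says only that ivScore produces a set of iVector speaker scores per segment against the enrolled templates; the scoring back-end is not specified. Cosine scoring of length-normalized i-vectors is one standard choice (PLDA scoring is another). The sketch below assumes cosine scoring and that an i-vector has already been extracted for each segment; all names are hypothetical:

import numpy as np

def length_normalize(v):
    """Project an i-vector onto the unit sphere, as is usual before cosine scoring."""
    return v / np.linalg.norm(v)

def score_segments(segment_ivectors, templates):
    """Cosine-score each segment's i-vector against every enrolled speaker template."""
    scores = {}
    for seg_id, iv in segment_ivectors.items():
        iv = length_normalize(iv)
        scores[seg_id] = {spk: float(np.dot(iv, length_normalize(t)))
                          for spk, t in templates.items()}
    return scores

# Toy usage with random 400-dimensional i-vectors
rng = np.random.default_rng(0)
segments = {0: rng.standard_normal(400), 1: rng.standard_normal(400)}
templates = {"spk_a": rng.standard_normal(400), "spk_b": rng.standard_normal(400)}
print(score_segments(segments, templates))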