The following schema describes a speech-to-text operation followed by a postprocessing operation that filters the results, replacing any inappropriate words with a specified term (by default, “<BLEEP>”).
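As a rough illustration of the filtering idea, the following Python sketch replaces every blocked word in a list of recognized words with a replacement term that defaults to “<BLEEP>”. The function name filter_words, the blocklist argument, and the case-insensitive matching are hypothetical choices for this sketch; the schema does not say how the postprocessing step decides which words are inappropriate.

```python
DEFAULT_REPLACEMENT = "<BLEEP>"


def filter_words(words, blocklist, replacement=DEFAULT_REPLACEMENT):
    """Return a copy of `words` with blocked words replaced.

    Hypothetical sketch: matching is case-insensitive via str.casefold(),
    and `blocklist` is any iterable of words to suppress.
    """
    blocked = {w.casefold() for w in blocklist}
    return [replacement if w.casefold() in blocked else w for w in words]


if __name__ == "__main__":
    print(filter_words(["well", "Darn", "it"], {"darn"}))
```

Passing a different `replacement` argument corresponds to the “specified term” in the description; leaving it out falls back to the default.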
    [SpeechToTextFilter]
    0 = a,ts <- audio(MONO, input)
    1 = f1 <- frontend1(_, a)
    2 = nf1 <- normalizer1(_, f1)
    3 = w1 <- audiopreproc1(A, a, nf1)
    4 = f2 <- frontend2(_, a)
    5 = nf2 <- normalizer2(_, f2)
    6 = w2 <- stt2(_, nf2)
    7 = w3 <- postproc2(_, w2)
    8 = w4 <- mixer(_, wa:w1, wb:w3)
    9 = output <- wout(_, w4, ts)
    DefaultResults=out
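One way to read each schema line is `step = outputs <- module(mode, inputs...)`, where the first argument (`MONO`, `_`, `A`) looks like a mode and a prefix such as `wa:` labels an input. Under that assumed reading, the Python sketch below parses the numbered steps and reports, for each step, which earlier steps produce its inputs. The parser, the treatment of `input` as the only external input, and the helper names are assumptions; the `DefaultResults=out` line is left out because its role is not described here.

```python
import re

SCHEMA = """\
0 = a,ts <- audio(MONO, input)
1 = f1 <- frontend1(_, a)
2 = nf1 <- normalizer1(_, f1)
3 = w1 <- audiopreproc1(A, a, nf1)
4 = f2 <- frontend2(_, a)
5 = nf2 <- normalizer2(_, f2)
6 = w2 <- stt2(_, nf2)
7 = w3 <- postproc2(_, w2)
8 = w4 <- mixer(_, wa:w1, wb:w3)
9 = output <- wout(_, w4, ts)
"""

# Assumed line shape: "<num> = <out>[,<out>...] <- <module>(<mode>, <inputs>...)"
STEP_RE = re.compile(r"^(\d+)\s*=\s*([\w,]+)\s*<-\s*(\w+)\((.*)\)$")


def parse_steps(text):
    """Parse schema lines into (num, outputs, module, mode, inputs) tuples."""
    steps = []
    for line in text.strip().splitlines():
        m = STEP_RE.match(line.strip())
        if m is None:
            raise ValueError(f"unparseable step: {line!r}")
        num, outs, module, args = m.groups()
        arg_list = [a.strip() for a in args.split(",")]
        mode, inputs = arg_list[0], arg_list[1:]
        # Drop labels such as "wa:" so only the referenced name remains.
        inputs = [a.split(":", 1)[-1] for a in inputs]
        steps.append((int(num), outs.split(","), module, mode, inputs))
    return steps


def step_dependencies(steps, external=("input",)):
    """Map each step number to the earlier steps that produce its inputs."""
    producer = {}  # data name -> step number that produced it
    deps = {}
    for num, outs, module, _mode, inputs in steps:
        deps[num] = []
        for name in inputs:
            if name in external:
                continue
            if name not in producer:
                raise ValueError(f"step {num} ({module}) uses undefined {name!r}")
            deps[num].append(producer[name])
        for name in outs:
            producer[name] = num
    return deps


if __name__ == "__main__":
    for step, sources in step_dependencies(parse_steps(SCHEMA)).items():
        print(step, sources)
```

Because every input must already be defined when its step is reached, the same check also catches a schema whose steps reference data out of order.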
| Step | Description |
|------|-------------|
| 0 | The audio module processes the mono audio input and produces a and ts. |
| 1 | The frontend1 module converts the audio data from 0 (a) into speech front-end frame data. |
| 2 | The normalizer1 module normalizes the frame data from 1 (f1). |
| 3 | The audiopreproc1 module, in audio classification mode, processes the audio (a) and the normalized frame data (nf1). |
| 4 | The frontend2 module converts the audio data from 0 (a) into speech front-end frame data. |
| 5 | The normalizer2 module normalizes the frame data from 4 (f2). |
| 6 | The stt2 module converts the normalized frame data from 5 (nf2) into text. |
| 7 | The postproc2 module can apply punctuation to the text produced by 6 (w2). |
| 8 | The mixer module combines the word outputs from 3 (w1) and 7 (w3) into a single timeline. |
| 9 | The wout module writes the filtered words resulting from 8 (w4), together with ts from 0, to a file. |
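To make step 8 more concrete, the Python sketch below merges two word streams into one timeline ordered by start time. It assumes each stream is a list of words already sorted by start time and that a word carries start and end times; the Word type and merge_timeline function are hypothetical, and the actual mixer module may combine its inputs differently.

```python
import heapq
from typing import List, NamedTuple


class Word(NamedTuple):
    """Hypothetical word record: start/end times in seconds plus the text."""
    start: float
    end: float
    text: str


def merge_timeline(stream_a: List[Word], stream_b: List[Word]) -> List[Word]:
    """Merge two start-time-sorted word streams into one sorted timeline.

    heapq.merge is lazy and stable: on equal start times, words from
    stream_a come before words from stream_b.
    """
    return list(heapq.merge(stream_a, stream_b, key=lambda w: w.start))


if __name__ == "__main__":
    w1 = [Word(0.4, 0.9, "[music]")]  # e.g. output of the classification path
    w3 = [Word(0.0, 0.3, "hello"), Word(1.0, 1.5, "world")]  # e.g. STT path
    for word in merge_timeline(w1, w3):
        print(word.start, word.text)
```

A k-way merge keeps the cost linear in the total number of words, which fits a streaming setting better than concatenating both lists and re-sorting.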