Speech-to-Text with Word Filtering

The following schema describes a speech-to-text operation followed by a postprocessing step that filters the results, replacing any inappropriate words with a specified term (by default, “<BLEEP>”).

[SpeechToTextFilter]
0 = a,ts <- audio(MONO, input)
1 = f1 <- frontend1(_, a)
2 = nf1 <- normalizer1(_, f1)
3 = w1 <- audiopreproc1(A, a, nf1)
4 = f2 <- frontend2(_, a)
5 = nf2 <- normalizer2(_, f2)
6 = w2 <- stt2(_, nf2)
7 = w3 <- postproc2(_, w2)
8 = w4 <- mixer(_, wa:w1, wb:w3)
9 = output <- wout(_, w4, ts)
DefaultResults=out
0 The audio module processes the mono audio.
1 The frontend1 module converts the audio data from 0 (a) into speech front-end frame data.
2 The normalizer1 module normalizes the frame data from 1 (f1).
3 The audiopreproc1 module in audio classification mode processes the audio (a) and normalized frame data (nf1).
4 The frontend2 module converts the audio data from 0 (a) into speech front-end frame data.
5 The normalizer2 module normalizes the frame data from 4.
6 The stt2 module converts the normalized frame data into text.
7 The postproc2 module can apply punctuation to the text produced by 6 (w2).
8 The mixer module combines word outputs from 3 (w1) and 7 (w3) into a single timeline.
9 The wout module writes the filtered words resulting from 8 to file.
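
The word filtering described in the introduction can be illustrated outside the server with a short Python sketch. This is only an approximation of the idea, not the server's implementation: the function name filter_words, the blocked-word set, and the case-insensitive matching are all assumptions made for the example. Only the default replacement term “<BLEEP>” comes from the description above.

```python
# Illustrative sketch only: not the speech server's filter implementation.
# filter_words and the blocked-word set are hypothetical names for this example.

DEFAULT_REPLACEMENT = "<BLEEP>"  # default term named in the description above


def filter_words(words, blocked, replacement=DEFAULT_REPLACEMENT):
    """Return a copy of words with every blocked word replaced.

    Matching is case-insensitive (an assumption made for this sketch).
    """
    blocked_lower = {b.lower() for b in blocked}
    return [replacement if w.lower() in blocked_lower else w for w in words]


if __name__ == "__main__":
    transcript = ["well", "darn", "it", "Darn"]
    print(filter_words(transcript, {"darn"}))
    # prints ['well', '<BLEEP>', 'it', '<BLEEP>']
```

Passing a third argument, for example filter_words(transcript, {"darn"}, "***"), substitutes a different replacement term, mirroring the configurable term mentioned in the introduction.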
