The following schema describes a speech-to-text operation followed by a postprocessing operation that filters the results, replacing any inappropriate words with a specified term (by default, “<BLEEP>”).
[WavToTextFilter]
0 = a ← wav(MONO, input)
1 = f ← frontend(_, a)
2 = nf ← normalizer(_, f)
3 = w1 ← stt(_, nf)
4 = w2 ← postproc(B, w1)
5 = output ← wout(_, w2)
DefaultResults = out
| Step | Description |
|---|---|
| 0 | The wav module processes the mono audio. |
| 1 | The frontend module converts the audio data into speech front-end frame data. |
| 2 | The normalizer module normalizes the frame data. |
| 3 | The stt module converts the normalized frame data into text. |
| 4 | The postproc module replaces barred words in the text with a specified term. |
| 5 | The wout module writes the filtered words resulting from 4 to file. |
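
The word replacement performed in step 4 can be illustrated with a short Python sketch. This is not the postproc module's implementation. The function name `filter_words`, the case-insensitive matching, and the sample word list are assumptions made for illustration. Only the default replacement term “<BLEEP>” comes from the description above.

```python
# Illustrative sketch only: not the postproc module's actual code.
DEFAULT_TERM = "<BLEEP>"  # default replacement term named in the schema description


def filter_words(words, barred, term=DEFAULT_TERM):
    """Return a copy of `words` with every barred word replaced by `term`.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    barred_lower = {b.lower() for b in barred}
    return [term if w.lower() in barred_lower else w for w in words]


# Hypothetical recognizer output (w1) and barred-word list.
transcript = ["well", "that", "was", "a", "darn", "shame"]
filtered = filter_words(transcript, {"darn"})
print(" ".join(filtered))  # well that was a <BLEEP> shame
```

Passing a different `term` argument mirrors the option to specify a replacement term other than the default.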