Audio Speech-to-Text Conversion

IDOL Speech Server can perform real-time speech-to-text conversion on an audio file or an audio stream. The following sample schema configures such a task.

[SpeechToText]
0 = a,ts <- audio(MONO, input)
1 = f <- frontend(_, a)
2 = nf <- normalizer(_, f)
3 = w1 <- stt(_, nf)
4 = w2 <- postproc(_, w1)
5 = output <- wout(_, w2, ts)
DefaultResults=Out

0 The audio module processes the streamed audio data.
1 The frontend module converts audio data into speech front-end frame data.
2 The normalizer module normalizes frame data from 1 (f).
3 The stt module converts normalized frame data from 2 (nf) into text.
4 The postproc module runs any post processing tasks on the results from 3 (w1).
5 The wout module writes the recognized words resulting from 4 (w2) to the output file.
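
To make the data flow between the numbered steps explicit, the following Python sketch parses the schema lines shown above and lists which step consumes the output of which earlier step. It is an illustration only, not part of IDOL Speech Server: the function names parse_schema and trace_dataflow are hypothetical, and the sketch assumes that an argument counts as a data stream only when an earlier step produced a stream of that name (so MONO, input, and _ are treated as plain parameters).

```python
import re

# Schema lines copied from the [SpeechToText] sample above.
SCHEMA = """\
0 = a,ts <- audio(MONO, input)
1 = f <- frontend(_, a)
2 = nf <- normalizer(_, f)
3 = w1 <- stt(_, nf)
4 = w2 <- postproc(_, w1)
5 = output <- wout(_, w2, ts)
"""

# Matches: <index> = <outputs> <- <module>(<args>)
LINE_RE = re.compile(r"^\s*(\d+)\s*=\s*(.+?)\s*<-\s*(\w+)\((.*)\)\s*$")


def parse_schema(text):
    """Split each schema line into index, module, outputs, and arguments."""
    steps = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not a step
        index, outputs, module, args = match.groups()
        steps.append({
            "index": int(index),
            "module": module,
            "outputs": [name.strip() for name in outputs.split(",")],
            "args": [name.strip() for name in args.split(",")],
        })
    return steps


def trace_dataflow(steps):
    """Return (producer_step, consumer_step, stream_name) tuples.

    Assumption: an argument is a data stream only if an earlier step
    listed it as an output; everything else is treated as a parameter.
    """
    producer = {}
    edges = []
    for step in steps:
        for arg in step["args"]:
            if arg in producer:
                edges.append((producer[arg], step["index"], arg))
        for name in step["outputs"]:
            producer[name] = step["index"]
    return edges


if __name__ == "__main__":
    for src, dst, name in trace_dataflow(parse_schema(SCHEMA)):
        print(f"step {src} -> step {dst} via {name}")
```

Running the sketch shows that each step feeds the next one in sequence, and that the ts output of step 0 bypasses the intermediate steps and goes directly to the wout module in step 5.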

You can also use the [SpeechToTextFilter] schema to combine speech-to-text conversion with speech classification in a single step. You can then remove the sections classified as music or noise from the resulting output file.

