Stereo Speech-to-Text Conversion

To perform speech-to-text conversion on stereo audio input, you can process each channel separately. For example:

[SpeechToText]
0 = l,r ← audio(STEREO, input)
1 = f1 ← frontend1(_, a:l) 
2 = nf1 ← normalizer1(_, f1)
3 = w1 ← stt1(_, nf1)
4 = output ← wout1(_, w1)
5 = f2 ← frontend2(_, a:r)
6 = nf2 ← normalizer2(_, f2)
7 = w2 ← stt2(_, nf2)
8 = output ← wout2(_, w2)

0 The audio module reads the input stereo audio file and splits it into left (l) and right (r) audio data.
1 The frontend1 module converts the left audio channel (l) into speech front-end frame data. In this step, the variable form a:l renames the left channel audio data (type l) to audio data (type a).
2 The normalizer1 module normalizes the frame data from step 1 (f1).
3 The stt1 module converts the normalized frame data from step 2 (nf1) into text.
4 The wout1 module writes the recognized words from step 3 (w1) to the output file.
5 The frontend2 module converts the right audio channel (r) into speech front-end frame data. In this step, the variable form a:r renames the right channel audio data (type r) to audio data (type a).
6 The normalizer2 module normalizes the frame data from step 5 (f2).
7 The stt2 module converts the normalized frame data from step 6 (nf2) into text.
8 The wout2 module writes the recognized words from step 7 (w2) to the output file.
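
The channel split in step 0 can be pictured with a short Python sketch. This is only an illustration of the idea, not the audio module's actual implementation: it assumes the input is raw 16-bit PCM in native byte order with samples interleaved as left, right, left, right, and the function name split_stereo is hypothetical.

```python
import array
from typing import Tuple


def split_stereo(pcm_bytes: bytes) -> Tuple[array.array, array.array]:
    """De-interleave 16-bit stereo PCM (L R L R ...) into two mono sample arrays.

    Assumption: samples are signed 16-bit integers in native byte order.
    """
    samples = array.array("h")        # "h" = signed 16-bit integer
    samples.frombytes(pcm_bytes)      # raises ValueError on a partial sample
    if len(samples) % 2 != 0:
        raise ValueError("stereo PCM must contain an even number of samples")
    left = samples[0::2]              # every even-indexed sample
    right = samples[1::2]             # every odd-indexed sample
    return left, right


# Usage: each returned array can then go through its own
# front-end, normalizer, and recognizer chain, as in steps 1-4 and 5-8.
stereo = array.array("h", [10, -10, 20, -20, 30, -30]).tobytes()
left, right = split_stereo(stereo)
```

After the split, the two channels share no state, which is why the configuration can use two independent module chains (frontend1 through wout1, and frontend2 through wout2).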
