Combine Output from Multiple Channels

The following schema uses the mixer module to combine the word output from two speech-to-text modules, one for each channel of a stereo recording, into a single timeline.

[StereoSpeechToText2]
0 = l,r,ts ← audio(STEREO, Input) 
1 = f1 ← frontend1(_, a:l) 
2 = nf1 ← normalizer1(_, f1) 
3 = w1 ← stt1(_, nf1) 
4 = f2 ← frontend2(_, a:r) 
5 = nf2 ← normalizer2(_, f2) 
6 = w2 ← stt2(_, nf2) 
7 = w3 ← mixer(_, wa:w1, wb:w2)
8 = output ← wout(_, w3, ts)
0 The audio module splits the input stereo audio file into left (l) and right (r) audio data.
1 The frontend1 module converts the left audio channel (l) into speech front-end frame data. In this step, the form a:l renames the left channel audio data (type l) to audio data (type a).
2 The normalizer1 module normalizes the frame data from 1 (f1).
3 The stt1 module converts the normalized frame data from 2 (nf1) into text.
4 The frontend2 module converts the right audio channel (r) into speech front-end frame data. In this step, the form a:r renames the right channel audio data (type r) to audio data (type a).
5 The normalizer2 module normalizes the frame data from 4 (f2).
6 The stt2 module converts the normalized frame data from 5 (nf2) into text.
7 The mixer module combines the recognized words resulting from 3 (w1) and from 6 (w2) into a single word output timeline (w3).
8 The wout module writes the recognized words resulting from 7 (w3) to the output file.
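
The schema does not spell out how the mixer orders words from the two channels. The following Python sketch illustrates one plausible reading of step 7: two word streams, each already in time order, are interleaved by start time into a single timeline. The Word record, its field names, and the mix function are hypothetical and used for illustration only; they are not part of the product.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Word(NamedTuple):
    """Hypothetical recognized-word record; field names are assumptions."""
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str      # recognized word
    channel: str   # "l" or "r", the channel the word came from


def mix(wa: Iterable[Word], wb: Iterable[Word]) -> Iterator[Word]:
    """Interleave two time-ordered word streams by start time.

    Assumes each input is already sorted by start time, as the output
    of a single recognizer would be.
    """
    return heapq.merge(wa, wb, key=lambda w: w.start)


# Example input standing in for w1 (left channel) and w2 (right channel).
w1 = [Word(0.0, 0.4, "hello", "l"), Word(1.2, 1.6, "thanks", "l")]
w2 = [Word(0.5, 0.9, "hi", "r"), Word(1.7, 2.1, "welcome", "r")]

# w3 is the combined timeline, analogous to the mixer output.
w3 = list(mix(w1, w2))
for w in w3:
    print(f"{w.start:5.2f} {w.channel} {w.text}")
```

Keeping the channel on each word means the combined timeline still records which speaker side produced each word, which is usually the reason for recognizing the channels separately.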
