The following diagram shows the modules in IDOL Speech Server that enable spoken language identification in a single step.
a is the audio window series.
f is the feature vector series.
nf is the normalized feature vector series.
lf is the language identification feature.
lid is the output time-marked language identification data.
The schema that implements this feature is:
[MyLangId]
a, ts ← audio (MONO, input)
f ← frontend (_, a)
nf ← normalizer (_, f)
lf ← lidfeature (_, nf)
lid ← langid (_, lf)
output ← lidout (_, lid, ts)
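The data flow in this schema can be checked mechanically. The following Python sketch is not part of IDOL Speech Server and does not call any of its APIs. It only parses the schema lines as text and confirms that each module consumes series produced by an earlier step. The helper names parse_schema and check_dataflow are hypothetical, and the tokens MONO, input, and _ are assumed to be externally supplied values whose meaning is not interpreted here.

```python
import re

# The schema steps from the [MyLangId] section, one step per line.
SCHEMA = """\
a, ts ← audio (MONO, input)
f ← frontend (_, a)
nf ← normalizer (_, f)
lf ← lidfeature (_, nf)
lid ← langid (_, lf)
output ← lidout (_, lid, ts)
"""

# outputs ← module (arguments); "<-" is accepted as an ASCII alternative.
STEP = re.compile(
    r"^\s*(?P<outs>[^<←]+?)\s*(?:<-|←)\s*(?P<module>\w+)\s*\((?P<args>[^)]*)\)\s*$"
)


def parse_schema(text):
    """Return a list of (outputs, module, arguments) tuples."""
    steps = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = STEP.match(line)
        if match is None:
            raise ValueError(f"unrecognized schema line: {line!r}")
        outs = [name.strip() for name in match.group("outs").split(",")]
        args = [name.strip() for name in match.group("args").split(",")]
        steps.append((outs, match.group("module"), args))
    return steps


def check_dataflow(steps, external=("MONO", "input", "_")):
    """Verify each argument is external or produced earlier; return module order."""
    known = set(external)  # assumption: these tokens need no producer
    for outs, module, args in steps:
        missing = [name for name in args if name not in known]
        if missing:
            raise ValueError(f"{module} uses undefined series: {missing}")
        known.update(outs)
    return [module for _, module, _ in steps]


steps = parse_schema(SCHEMA)
order = check_dataflow(steps)
print(" -> ".join(order))
```

For the schema above, this prints audio -> frontend -> normalizer -> lidfeature -> langid -> lidout, which matches the chain a, f, nf, lf, lid described earlier, with ts passed from audio directly to lidout.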