Spoken Language Identification

The following diagram shows the modules in IDOL Speech Server that enable spoken language identification in a single step.

[Diagram: IDOL Speech Server modules for single-step spoken language identification]

The audio module reads the audio file or stream and prepares windowed data.

a is the audio window series.
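The window size and overlap are configured in IDOL Speech Server and are not specified in this section. Purely as an illustration (not IDOL code), the following Python sketch frames a mono signal into overlapping windows, assuming typical 25 ms windows with a 10 ms hop:

import numpy as np

def frame_audio(samples, sample_rate, win_ms=25.0, hop_ms=10.0):
    """Split a mono signal into overlapping windows (analogous to the 'a' series)."""
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(samples) - win) // hop)
    frames = np.stack([samples[i * hop : i * hop + win] for i in range(n_frames)])
    # Window start times in seconds, loosely analogous to the schema's 'ts' output.
    ts = np.arange(n_frames) * hop / sample_rate
    return frames, ts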

 

The frontend module takes each window of samples and converts it to a feature vector.

f is the feature vector series.
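The exact acoustic features that the frontend module computes are not described here. As a generic illustration of turning each window of samples into a feature vector, this sketch computes log mel-filterbank energies with numpy; the filter count and FFT size are arbitrary example values, not IDOL settings:

import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale, as in common MFCC recipes.
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, mid):
            fbank[m - 1, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):
            fbank[m - 1, k] = (hi - k) / max(hi - mid, 1)
    return fbank

def extract_features(frames, sample_rate, n_filters=24, n_fft=512):
    """Convert each audio window into a log mel-filterbank feature vector ('f')."""
    window = np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, n=n_fft)) ** 2
    fbank = mel_filterbank(n_filters, n_fft, sample_rate)
    return np.log(power @ fbank.T + 1e-10)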


The normalizer module adjusts the feature vectors to produce normalized feature vectors.

nf is the normalized feature vector series.
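The normalization that IDOL Speech Server applies is not detailed in this section. A common choice in speech systems is per-utterance mean and variance normalization, sketched below only as an illustration of the idea:

import numpy as np

def normalize(features, eps=1e-10):
    """Shift each feature dimension to zero mean and unit variance (the 'nf' series).

    This is an illustrative stand-in for the normalizer module's (unspecified)
    processing, computed over the whole feature series.
    """
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    return (features - mean) / (std + eps)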


The lidfeature module searches the feature vector time series to create phoneme-based language identification features.

lf is the language identification feature.
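How the lidfeature module represents its phoneme-based features is not documented in this section. One common phoneme-based representation for language identification is phonotactic n-gram statistics; the sketch below assumes a phoneme sequence has already been decoded and builds relative bigram frequencies from it (illustrative only, not the IDOL feature format):

from collections import Counter

def lid_features(phonemes):
    """Build phoneme-bigram relative frequencies as illustrative LID features ('lf')."""
    bigrams = Counter(zip(phonemes, phonemes[1:]))
    total = sum(bigrams.values()) or 1
    return {bigram: count / total for bigram, count in bigrams.items()}

# Example with a toy phoneme sequence.
print(lid_features("h e l o w e r l d".split()))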


The langid module analyzes the features and determines the identified languages in the audio data.

lid is the output time-marked language identification data.
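This section does not describe how the langid module scores the features against its language models. A minimal sketch, assuming the hypothetical bigram-frequency features above and a per-language bigram probability model, scores each language by log-likelihood and picks the best match:

import math

def identify_language(lf, language_models, floor=1e-6):
    """Score LID features against per-language bigram models and return the best match.

    'language_models' maps a language label to a dict of bigram probabilities;
    unseen bigrams fall back to a small probability floor. The real langid
    module also emits time-marked results, which this sketch omits.
    """
    scores = {
        lang: sum(freq * math.log(model.get(bigram, floor)) for bigram, freq in lf.items())
        for lang, model in language_models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores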


The lidout module prepares the output language labels and time positions for storage and result reporting.
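The output format written by lidout is configured within IDOL Speech Server and is not shown here. As an illustration of the idea only, the sketch below writes time-marked language labels as tab-separated start time, end time, and label:

def write_lid_output(segments, path):
    """Write (start_seconds, end_seconds, label) segments as tab-separated text."""
    with open(path, "w", encoding="utf-8") as out:
        for start, end, label in segments:
            out.write(f"{start:.2f}\t{end:.2f}\t{label}\n")

# Example with two hypothetical segments.
write_lid_output([(0.00, 4.50, "English"), (4.50, 9.20, "French")], "lid_results.txt")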

The schema that implements this feature is:

[MyLangId]
a, ts ← audio (MONO, input)
f ← frontend (_, a)
nf ← normalizer (_, f)
lf ← lidfeature (_, nf)
lid ← langid (_, lf)
output ← lidout (_, lid, ts)
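Read as a data flow, each schema line assigns the output series on the left to the result of one module and passes it on to the next. The following Python stubs (illustrative placeholders, not the real modules) mirror that flow:

# Stand-in functions that only mirror the schema's data flow.
def audio(source):   return ["window1", "window2"], [0.00, 0.01]           # a, ts
def frontend(a):     return [f"features({w})" for w in a]                  # f
def normalizer(f):   return [f"normalized({x})" for x in f]                # nf
def lidfeature(nf):  return f"lid_features({len(nf)} windows)"             # lf
def langid(lf):      return [("English", 0.00, 0.01)]                      # lid
def lidout(lid, ts): return [f"{s:.2f}-{e:.2f} {lang}" for lang, s, e in lid]

a, ts = audio("input.wav")   # ts is carried through to lidout for time marking
f = frontend(a)
nf = normalizer(f)
lf = lidfeature(nf)
lid = langid(lf)
output = lidout(lid, ts)
print(output)                # ['0.00-0.01 English']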
