Spoken Language Identification

The following diagram shows the modules in HPE IDOL Speech Server that enable spoken language identification in a single step.

 

The wav module reads the audio file and prepares windowed data.

a is the audio window series.
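
As a rough illustration of what "windowed data" means, the sketch below splits a list of samples into overlapping fixed-length windows. It is not the wav module's implementation: the function name frame_signal and the window and hop sizes are hypothetical, chosen only for the example, and this page does not state the values Speech Server uses.

```python
def frame_signal(samples, window_size, hop_size):
    """Split a sequence of samples into overlapping fixed-length windows.

    window_size and hop_size are illustrative parameters (assumptions),
    not documented Speech Server settings.
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    windows = []
    # Step through the signal one hop at a time; drop any trailing partial window.
    for start in range(0, len(samples) - window_size + 1, hop_size):
        windows.append(samples[start:start + window_size])
    return windows


if __name__ == "__main__":
    demo = list(range(10))  # stand-in for audio samples
    for window in frame_signal(demo, window_size=4, hop_size=2):
        print(window)
```

Each window returned here corresponds to one element of the series that the schema calls a.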

 

The frontend module takes each window of samples and converts it to a feature vector.

f is the feature vector series.


The normalizer module adjusts the feature vectors to produce normalized feature vectors.

nf is the normalized feature vector series.


The lidfeature module searches the feature vector time series to create phoneme-based language identification features.

lf is the language identification feature.


The langid module analyzes the features and determines the identified languages in the audio data.

lid is the output time-marked language identification series.


The lidout module prepares the output language labels and time positions for storage and result reporting.

The schema that implements this feature is:

[MyLangId]
a ← wav (MONO, input)
f ← frontend (_, a)
nf ← normalizer (_, f)
lf ← lidfeature (_, nf)
lid ← langid (CUMULATIVE, lf)
output ← lidout (_, lid)
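
To make the data flow in the schema easier to check, the following sketch parses the schema lines as printed above and confirms that each module reads a series produced by an earlier line. It is a reading aid, not part of Speech Server: the helper names parse_schema and check_chain are hypothetical, and treating input as an externally supplied series is an assumption.

```python
import re

# Schema lines copied from this page (section header omitted).
SCHEMA = """\
a ← wav (MONO, input)
f ← frontend (_, a)
nf ← normalizer (_, f)
lf ← lidfeature (_, nf)
lid ← langid (CUMULATIVE, lf)
output ← lidout (_, lid)
"""

# Matches: <out> ← <module> (<mode>, <in>); also accepts the ASCII arrow "<-".
LINE_RE = re.compile(
    r"^\s*(\w+)\s*(?:←|<-)\s*(\w+)\s*\(\s*([^,]+?)\s*,\s*([^)]+?)\s*\)\s*$"
)


def parse_schema(text):
    """Return one dict per schema line: output, module, mode, input."""
    steps = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = LINE_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized schema line: {line!r}")
        out, module, mode, src = match.groups()
        steps.append({"output": out, "module": module, "mode": mode, "input": src})
    return steps


def check_chain(steps, external=("input",)):
    """Verify every step reads a series defined earlier; return module order."""
    defined = set(external)  # assumption: "input" is supplied from outside the schema
    for step in steps:
        if step["input"] not in defined:
            raise ValueError(
                f"{step['module']} reads undefined series {step['input']!r}"
            )
        defined.add(step["output"])
    return [step["module"] for step in steps]


if __name__ == "__main__":
    print(" -> ".join(check_chain(parse_schema(SCHEMA))))
```

Running the check on the schema above confirms the linear chain from wav through lidout, with langid consuming lf and producing lid.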
