Spoken Language Identification

HPE IDOL Speech Server can analyze audio data in many of the world's languages. It contains a universal phoneme decoder that detects a wide range of phonemes across multiple languages. After HPE IDOL Speech Server extracts phoneme information from an audio file, it performs a statistical analysis of the overall phoneme distribution to estimate which language is spoken.

HPE IDOL Speech Server requires a language classifier file for each language that you want to identify. HPE IDOL Speech Server provides a classifier pack that covers the following 20 languages for broadband (16 kHz) audio:

Arabic     Hebrew     Portuguese
Danish     Italian    Romanian
Dutch      Japanese   Russian
English    Korean     Slovak
French     Mandarin   Spanish
German     Persian    Swedish
Greek      Polish

To identify any other languages or dialects, or to process telephony (8 kHz) audio, you must create your own classifiers.

Note: Telephony data quality varies widely, so HPE IDOL Speech Server requires classifiers that are trained on representative data.

Although you can perform language identification tasks out of the box, HPE recommends that you optimize the classifiers on your own audio data. For instructions, see Optimize the Language Identification Set.

For best results, train the language identification system to detect only the languages that you expect to find in the audio. To train the language identification system, you need samples of typical audio for each language, where each sample contains only the labeled spoken language. HPE recommends that you remove any non-speech sections in the audio, such as music. However, there is no need to remove sections of silence.

Note: You do not require audio transcriptions to train the language identification system.

The training process involves performing a phoneme analysis on the training data and then using the analysis results to build a statistical model.
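The idea of building a statistical model from phoneme analysis results can be illustrated with a minimal sketch. This is not the HPE IDOL Speech Server implementation or API. It assumes that phoneme decoding has already produced lists of phoneme labels, and it uses a simple add-one-smoothed bigram model as a stand-in for whatever statistics the server actually computes. The names train_phoneme_model and score are hypothetical.

```python
import math
from collections import Counter


def train_phoneme_model(sequences, phoneme_set):
    """Build an add-one-smoothed bigram model from decoded phoneme sequences.

    Illustrative only: every phoneme in `sequences` must appear in `phoneme_set`.
    Returns a dict that maps (previous, current) pairs to log-probabilities.
    """
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for prev, cur in zip(seq, seq[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1

    vocab_size = len(phoneme_set)
    model = {}
    for prev in phoneme_set:
        for cur in phoneme_set:
            # Add-one smoothing keeps unseen pairs from getting zero probability.
            prob = (pair_counts[(prev, cur)] + 1) / (context_counts[prev] + vocab_size)
            model[(prev, cur)] = math.log(prob)
    return model


def score(model, seq):
    """Average log-probability per bigram of a phoneme sequence under a model."""
    pairs = list(zip(seq, seq[1:]))
    if not pairs:
        return float("-inf")
    return sum(model[pair] for pair in pairs) / len(pairs)
```

In this sketch you would train one model per language from that language's audio samples, then compare the scores that each model assigns to a new phoneme sequence. No transcriptions are involved at any point, only phoneme labels.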

To train the language identifier

  1. Decide which languages you want to detect.
  2. Obtain the audio data. You need:

  3. Train the individual language identification models (see Create Your Own Language Classifiers).
  4. If required, create an ‘other’ language identification model, trained on data from languages outside the set that you want to detect.
  5. Create the combined language identification system by combining all the individual language identification models.
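As a conceptual illustration of the last two steps, the following sketch combines several per-language scorers, including an ‘other’ scorer, into one identifier that returns the best-scoring label. It does not use any HPE IDOL Speech Server API. The class CombinedIdentifier and the helper fraction_scorer are hypothetical, and the toy scoring rule exists only to make the example runnable.

```python
from typing import Callable, Dict, List, Tuple

# A scorer takes a phoneme sequence and returns a score (higher is better).
Scorer = Callable[[List[str]], float]


class CombinedIdentifier:
    """Hypothetical container that holds one scorer per language label."""

    def __init__(self, scorers: Dict[str, Scorer]):
        if not scorers:
            raise ValueError("at least one language model is required")
        self.scorers = dict(scorers)

    def rank(self, phonemes: List[str]) -> List[Tuple[str, float]]:
        """Return (label, score) pairs, best score first."""
        scored = [(label, scorer(phonemes)) for label, scorer in self.scorers.items()]
        return sorted(scored, key=lambda item: item[1], reverse=True)

    def identify(self, phonemes: List[str]) -> str:
        """Return the label of the best-scoring model."""
        return self.rank(phonemes)[0][0]


def fraction_scorer(typical) -> Scorer:
    """Toy scorer (assumption): fraction of phonemes found in a 'typical' set."""
    typical = set(typical)

    def scorer(phonemes: List[str]) -> float:
        if not phonemes:
            return 0.0
        return sum(p in typical for p in phonemes) / len(phonemes)

    return scorer
```

With this layout, the ‘other’ entry competes on equal terms with the individual languages, so audio that matches none of the target languages well can be labeled ‘other’ instead of being forced into the nearest target language.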

Spoken language identification is affected by the same factors that affect speech-to-text and other speech processing.

Spoken language identification is text independent: the result does not depend on which words are spoken.

The data processing flow to perform language identification is:

  1. Process the audio file to identify the phoneme sequences from the universal phoneme set.
  2. Use that information to identify the spoken language.
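The two steps above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the server's actual processing: decode_phonemes is a hypothetical callable that stands in for the universal phoneme decoder, references is a hypothetical mapping from language name to a reference phoneme distribution, and cross-entropy is just one plausible way to compare distributions.

```python
import math
from collections import Counter


def phoneme_distribution(phonemes):
    """Relative frequency of each phoneme in a decoded sequence."""
    if not phonemes:
        raise ValueError("no phonemes were decoded")
    counts = Counter(phonemes)
    total = sum(counts.values())
    return {phoneme: n / total for phoneme, n in counts.items()}


def cross_entropy(observed, reference, floor=1e-6):
    """Cross-entropy of the observed distribution against a reference one.

    Phonemes missing from the reference get a small floor probability.
    """
    return -sum(
        prob * math.log(reference.get(phoneme, floor))
        for phoneme, prob in observed.items()
    )


def identify_language(audio_path, decode_phonemes, references):
    """Run the two-step flow and return the best-matching language name."""
    # Step 1: decode the audio into a phoneme sequence (hypothetical decoder).
    phonemes = decode_phonemes(audio_path)
    # Step 2: pick the language whose reference distribution fits best.
    observed = phoneme_distribution(phonemes)
    return min(references, key=lambda lang: cross_entropy(observed, references[lang]))
```

Passing the decoder in as a parameter keeps the sketch independent of any particular decoding engine, so a stub can be used when experimenting.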
