Troubleshooting

Some common situations can reduce the performance of language identification. For example:

Music and noise in audio

Significant amounts of music and noise in audio can seriously impact the language identification results.

To reduce the impact, you can enable speech detection and then skip all audio that is not identified as speech.

  1. Set DetectSpeech=True in the frontend module configuration that the language identification task uses.
  2. Set ZeroSilFrames=True in the normalizer module configuration section that the language identification task uses.

NOTE: When you train a language classifier, you can set these parameters in the modules used by the LangIdFeature task.
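
As a rough sketch, the two settings from the steps above might look like the following in the configuration file. Only the DetectSpeech and ZeroSilFrames parameters come from this topic. The section names [frontend] and [normalizer] are assumed placeholders for whichever frontend and normalizer module sections your language identification task actually references.

```ini
// Sketch only: the section names below are hypothetical placeholders.
// Use the frontend and normalizer sections that your task references.
[frontend]
DetectSpeech=True

[normalizer]
ZeroSilFrames=True
```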

Poorly balanced language classifiers

If identification performs significantly better for some languages than for others, or IDOL Speech Server consistently misidentifies sections as particular languages (false positive results), the language classifiers might not be optimally balanced.

To resolve this issue, you can repeat the optimization of the language classifier set using extra data (such as the files being processed when you detected the problem) to update the classifier weights.

You can also manually adjust the weighting by lowering the weights of classifiers that are being over-identified (false positives), and raising the weights of classifiers for languages that are frequently not identified (false negatives).

Mismatched audio quality between training, optimization, and identification

The audio that you use for training and optimization must closely match the audio that is processed for identification, in terms of recording type, quality, accents, and so on. Any significant mismatch in the data degrades performance.

Poor detection of language boundaries

You can use several parameters (such as WindowSize, SegmentStepSize, SegmentSmoothWin, and SegmentThreshold) to tune the boundary detection algorithm. For more details about the parameters, see the IDOL Speech Server Reference.

False hits in open set language identification

If too many audio segments from unknown languages are returned as results for known languages, you might want to repeat the optimization stage, providing data that is representative of the data you are using to run identification.

Additionally, if misses (true language sections labeled as unknown) are not important, you can use the ThresholdScale parameter in the langid module configuration to modify the thresholds. Decreasing the threshold scale increases the threshold, which means fewer false hits.

Missing results in open set language identification

If too many audio segments from known languages are returned as unknown, you might want to repeat the optimization stage, providing data that is representative of the data you are using to run identification.

Additionally, you can use the ThresholdScale parameter in the langid module configuration to modify the thresholds. Increasing the threshold scale decreases the threshold, which means fewer misses.
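
As a sketch of the ThresholdScale adjustment for open set identification, the setting might look like the following. The [langid] section name and the value shown are assumptions for illustration only. This topic does not state the default value or the valid range, so check the IDOL Speech Server Reference before you change it.

```ini
// Sketch only: the section name and value are hypothetical.
// Lower values raise the thresholds (fewer false hits, more misses).
// Higher values lower the thresholds (fewer misses, more false hits).
[langid]
ThresholdScale=0.8
```

Because the two adjustments pull in opposite directions, change the value in small steps and check the results against audio that is representative of what you process.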

