Some common situations can reduce the performance of language identification. For example:
Music and noise in audio | Significant amounts of music and noise in audio can seriously degrade language identification results. To reduce the impact, you can enable speech detection and then skip all audio that is not identified as speech. When you train a language classifier, you can set the speech detection parameters in the modules used by the LangIdFeature task. |
Poorly balanced language classifiers | If identification performs significantly better for some languages than for others, or HPE IDOL Speech Server consistently misidentifies sections as particular languages (false positives), the language classifiers might not be optimally balanced. To resolve this issue, you can repeat the optimization of the language classifier set with extra data (such as the files that were being processed when you detected the problem) to update the classifier weights. You can also adjust the weighting manually: lower the weights of classifiers that are over-identified (false positives), and raise the weights of classifiers for languages that are frequently missed (false negatives). |
Mismatched audio quality between training, optimization, and identification | The audio that you use for training and optimization must closely match the audio that is processed for identification, in terms of recording type, quality, accents, and so on. Any significant mismatch in the data degrades performance. |
Poor detection of language boundaries | You can use several parameters (such as WindowSize, SegmentStepSize, SegmentSmoothWin, SegmentThreshold, and so on) to tune the boundary detection algorithm. For more details about the parameters, see the HPE IDOL Speech Server Reference. |
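Two of the remedies in the table above, skipping non-speech audio and manually rebalancing classifier weights, can be illustrated with a minimal Python sketch. This is not HPE IDOL Speech Server code and does not use its API. The segment tuples, the "speech" label, the weight scale, the `step` size, and both function names are assumptions made for illustration only.

```python
def keep_speech_segments(segments):
    """Keep only the segments that a speech detector labelled as speech.

    segments: list of (start_seconds, end_seconds, label) tuples.
    The tuple layout and the "speech" label are hypothetical.
    """
    return [seg for seg in segments if seg[2] == "speech"]


def rebalance_weights(weights, false_pos, false_neg, step=0.05):
    """Return new classifier weights nudged against the observed errors.

    weights:   language -> current classifier weight (hypothetical scale)
    false_pos: language -> count of sections wrongly identified as it
    false_neg: language -> count of its sections that were missed
    step:      maximum relative change per call (an assumed value)
    """
    adjusted = {}
    for lang, weight in weights.items():
        fp = false_pos.get(lang, 0)
        fn = false_neg.get(lang, 0)
        total = fp + fn
        if total == 0:
            # No errors observed for this language: leave the weight alone.
            adjusted[lang] = weight
            continue
        # bias is in [-1, 1]: negative when the language is over-identified,
        # positive when it is mostly missed.
        bias = (fn - fp) / total
        adjusted[lang] = max(0.0, weight * (1.0 + step * bias))
    return adjusted


if __name__ == "__main__":
    segments = [(0.0, 4.2, "music"), (4.2, 9.8, "speech"), (9.8, 11.0, "noise")]
    print(keep_speech_segments(segments))

    weights = {"en": 1.0, "fr": 1.0, "de": 1.0}
    print(rebalance_weights(weights, false_pos={"en": 10}, false_neg={"fr": 5}))
```

In this sketch, a language with only false positives has its weight lowered by at most `step`, and a language with only false negatives has it raised by the same proportion. Repeating the optimization with extra data, as the table describes, remains the more systematic option; manual nudging of this kind is best applied in small increments and checked against held-out audio.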