Run Language Identification

The procedure for running a language identification task is very similar regardless of the mode you use. As such, the bulk of this section focuses on segmented identification, and describes significant differences for the other modes where appropriate.

To identify the languages in streamed audio, use the LangId task, with the LIDMode parameter set to the appropriate language identification mode. For more information about this standard task, see the IDOL Speech Server Reference.

Use the following procedure to identify the languages in an audio file.

To identify languages in an audio file

  1. Create a list that contains the file names (including file extensions) of the classifiers to use.

    For more information about IDOL Speech Server's list manager, see Create and Manage Lists.

  2. Send an AddTask action to IDOL Speech Server, and set the following parameters:

    Type The task name. Set to LangId.
    LIDMode The mode to use. Set to Segmented for segmented mode (the default), Boundary for boundary mode, or Cumulative for cumulative mode.
    File The audio file to process. To restrict processing to a section of the audio file, set the StartTime and EndTime parameters. For more information, see the IDOL Speech Server Reference.
    Out The file to write the language identification results to.
    • If you want to change the audio sample rate, or if you want to use your own custom classifiers, you must also set the ClassList parameter. You might also need to specify the ClassPath parameter, depending on the location of the classifier files. See the IDOL Speech Server Reference for more information.

    • If you use the base classifier pack, set the languages that you want to identify in the LangList configuration parameter in the langid module.
    • If you want to use open set language identification, you must also set the ClosedSet parameter to False. For more information about open set language identification, see Open Set Language Identification and the IDOL Speech Server Reference.

For example:

http://localhost:13000/action=AddTask&Type=LangId&LIDMode=Segmented&File=C:\Data\Speech.wav&ClassList=ListManager\OptClassSet&ClassPath=C:\LangID\&Out=SpeechLang1.ctm

This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to identify languages in the Speech.wav file using the language classifiers specified in the OptClassSet list, and to write the identification results to the SpeechLang1.ctm file.
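If you send requests from a script rather than a web browser, you can assemble the same AddTask call programmatically. The following Python sketch is illustrative only: the helpers build_langid_url and submit_task are hypothetical names, and the sketch assumes that IDOL Speech Server accepts standard percent-encoded parameter values in the same http://host:port/action=... form shown above. It uses only the parameters described in this section.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Modes described in this section; Segmented is the default.
LID_MODES = ("Segmented", "Boundary", "Cumulative")


def build_langid_url(host, port, audio_file, out_file, mode="Segmented",
                     class_list=None, class_path=None, closed_set=None):
    """Hypothetical helper: build an AddTask URL for the LangId task."""
    if mode not in LID_MODES:
        raise ValueError(f"unknown LIDMode: {mode!r}")
    params = {
        "action": "AddTask",
        "Type": "LangId",
        "LIDMode": mode,
        "File": audio_file,
    }
    # Optional parameters are sent only when the caller supplies them.
    if class_list is not None:
        params["ClassList"] = class_list
    if class_path is not None:
        params["ClassPath"] = class_path
    if closed_set is not None:
        # ClosedSet=False requests open set language identification.
        params["ClosedSet"] = "True" if closed_set else "False"
    params["Out"] = out_file
    # urlencode percent-encodes characters such as ':' and '\' in Windows paths.
    return f"http://{host}:{port}/" + urlencode(params)


def submit_task(url, timeout=30.0):
    """Hypothetical helper: send the request and return the response body,
    which includes the token that the action returns."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Reproduces the example request above (printed, not sent).
    print(build_langid_url(
        "localhost", 13000, r"C:\Data\Speech.wav", "SpeechLang1.ctm",
        class_list=r"ListManager\OptClassSet", class_path="C:\\LangID\\",
    ))
```

Passing values through urlencode, rather than concatenating strings, avoids problems with spaces or reserved characters in file paths.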

This action returns a token. You can use the token to retrieve the results of the task.

IDOL Speech Server displays the results in XML format in your web browser. You can also open the .ctm file from the configured IDOL Speech Server temporary directory (or another location if you specified a path in the Out parameter).

The following is an example of the .ctm output produced by the LangId task in Segmented mode.

1 L1 0.00 30.58 English 1.000 1.252
1 L2 0.00 30.58 German 0.686 1.252
1 L3 0.00 30.58 French 0.550 1.252
1 L1 30.58 28.30 German 1.000 1.306
1 L2 30.58 28.30 English 0.562 1.306
1 L3 30.58 28.30 Italian 0.517 1.306
1 L1 58.88 31.12 English 1.000 1.295
1 L2 58.88 31.12 French 0.680 1.295
1 L3 58.88 31.12 German 0.511 1.295

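From left to right, the columns in the .ctm file contain:

  • A numeric field (1 in every line of this example).
  • The rank of the language for the segment (L1 is the most likely language, L2 the second most likely, and so on).
  • The start time of the segment, in seconds.
  • The duration of the segment, in seconds.
  • The language label.
  • The score.
  • The confidence.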

The example shows a 90-second file processed in segments of approximately 30 seconds each. For the first segment, English is identified as the most likely language (L1), followed by German (L2) and French (L3). For the second segment, German has the highest score. For the final segment, English again has the highest score.

In Segmented mode, it is common to see different results for each segment, because the spoken language might change during the file. Cumulative mode assesses the dominant language across the whole file, so you would not expect to see these per-segment changes.
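To post-process Segmented-mode results, you can parse the .ctm file line by line. The following Python sketch assumes that each line contains seven whitespace-separated fields in the layout of the sample above. The column meanings come from comparing the sample with the XML output later in this section, and the meaning of the first column is not stated, so the sketch only stores it. The names LidResult, parse_ctm, best_per_segment, and top_language_durations are hypothetical. The duration summary is a simple illustration computed from segmented output; it is not the method that Cumulative mode uses.

```python
from collections import defaultdict
from dataclasses import dataclass

# Sample taken from the Segmented-mode output shown above.
SAMPLE_CTM = """\
1 L1 0.00 30.58 English 1.000 1.252
1 L2 0.00 30.58 German 0.686 1.252
1 L3 0.00 30.58 French 0.550 1.252
1 L1 30.58 28.30 German 1.000 1.306
1 L2 30.58 28.30 English 0.562 1.306
1 L3 30.58 28.30 Italian 0.517 1.306
1 L1 58.88 31.12 English 1.000 1.295
1 L2 58.88 31.12 French 0.680 1.295
1 L3 58.88 31.12 German 0.511 1.295
"""


@dataclass
class LidResult:
    first_column: str   # meaning not stated here; 1 in the sample
    rank: int           # 1 for L1 (most likely), 2 for L2, ...
    start: float        # segment start time, in seconds
    duration: float     # segment duration, in seconds
    language: str
    score: float
    confidence: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def parse_ctm(text: str) -> list[LidResult]:
    """Parse seven-column LangId .ctm lines (assumed layout)."""
    results = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 7 or not fields[1].startswith("L"):
            raise ValueError(f"unexpected .ctm line: {line!r}")
        first, rank, start, duration, language, score, confidence = fields
        results.append(LidResult(first, int(rank[1:]), float(start),
                                 float(duration), language, float(score),
                                 float(confidence)))
    return results


def best_per_segment(results: list[LidResult]) -> list[LidResult]:
    """Keep only the top-ranked (L1) result of each segment, in time order."""
    return sorted((r for r in results if r.rank == 1), key=lambda r: r.start)


def top_language_durations(results: list[LidResult]) -> dict[str, float]:
    """Total seconds for which each language was ranked first."""
    totals: dict[str, float] = defaultdict(float)
    for r in best_per_segment(results):
        totals[r.language] += r.duration
    return dict(totals)


if __name__ == "__main__":
    results = parse_ctm(SAMPLE_CTM)
    for r in best_per_segment(results):
        print(f"{r.start:6.2f}-{r.end:6.2f}s  {r.language}")
    print(top_language_durations(results))
```

Because the parser computes each segment's end from its own start and duration, it does not depend on segments having a fixed length.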

The following example shows some of the same information displayed in XML format.

<lid_transcript>
	<lid_record>
		<start>0.000</start>
		<end>30.580</end>
		<label>English</label>
		<score>1.000</score>
		<confidence>1.252</confidence>
		<rank>1</rank>
	</lid_record>
	<lid_record>
		<start>0.000</start>
		<end>30.580</end>
		<label>German</label>
		<score>0.686</score>
		<confidence>1.252</confidence>
		<rank>2</rank>
	</lid_record>
</lid_transcript>

This output format is common to the Segmented and Cumulative modes. The output format for Boundary mode is similar, but the time points occur whenever a language change is detected, instead of after a fixed time period.
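To consume the XML results in a script, you can read each lid_record element with a standard XML parser. The following Python sketch uses only the element names shown in the example above. It assumes that the lid_transcript element is either the document root or nested somewhere inside a larger response; that wrapping is an assumption, not documented behavior. The helpers parse_lid_xml and top_label_per_segment are hypothetical names.

```python
import xml.etree.ElementTree as ET

# The two records from the XML example above.
SAMPLE_XML = """<lid_transcript>
  <lid_record>
    <start>0.000</start>
    <end>30.580</end>
    <label>English</label>
    <score>1.000</score>
    <confidence>1.252</confidence>
    <rank>1</rank>
  </lid_record>
  <lid_record>
    <start>0.000</start>
    <end>30.580</end>
    <label>German</label>
    <score>0.686</score>
    <confidence>1.252</confidence>
    <rank>2</rank>
  </lid_record>
</lid_transcript>"""


def _text(record: ET.Element, tag: str) -> str:
    """Return the stripped text of a required child element."""
    value = record.findtext(tag)
    if value is None:
        raise ValueError(f"<lid_record> is missing <{tag}>")
    return value.strip()


def parse_lid_xml(xml_text: str) -> list[dict]:
    root = ET.fromstring(xml_text)
    # Accept the transcript as the root or nested in a larger response (assumption).
    if root.tag == "lid_transcript":
        transcript = root
    else:
        transcript = root.find(".//lid_transcript")
    if transcript is None:
        raise ValueError("no <lid_transcript> element found")
    return [
        {
            "start": float(_text(rec, "start")),
            "end": float(_text(rec, "end")),
            "label": _text(rec, "label"),
            "score": float(_text(rec, "score")),
            "confidence": float(_text(rec, "confidence")),
            "rank": int(_text(rec, "rank")),
        }
        for rec in transcript.findall("lid_record")
    ]


def top_label_per_segment(records: list[dict]) -> dict[tuple[float, float], str]:
    """Map each (start, end) span to the label ranked 1 for that span."""
    return {(r["start"], r["end"]): r["label"] for r in records if r["rank"] == 1}


if __name__ == "__main__":
    records = parse_lid_xml(SAMPLE_XML)
    for (start, end), label in top_label_per_segment(records).items():
        print(f"{start:.3f}-{end:.3f}s: {label}")
```

Because each record carries explicit start and end times, the same code applies to Boundary mode output, where the spans end at detected language changes rather than after a fixed period.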

