Run Language Identification

The process of running a language identification task is very similar regardless of the mode that you use. As such, the bulk of this section focuses on segmented identification, and describes significant differences for the other modes where appropriate.

To identify the languages in streamed audio, use one of the LangIdBndStream, LangIdCumStream, or LangIdSegStream tasks, depending on the mode that you want to use. If you already have language identification feature (.lif) files, use one of the LangIdBndLif, LangIdCumLif, or LangIdSegLif tasks. For details about these standard tasks, see the IDOL Speech Server Reference.

Use the following procedure to identify the languages in an audio file.

To identify languages in an audio file

  1. Create a list that contains the file names (including file extensions) of the classifiers to use.

    For more information about IDOL Speech Server's list manager, see Create and Manage Lists.

  2. Send an AddTask action to IDOL Speech Server, and set the following parameters:

    Type The task name. Set to LangIdSegWav for segmented mode, LangIdBndWav for boundary mode, or LangIdCumWav for cumulative mode.

    File The audio file to process. To restrict processing to a section of the audio file, set the start and end times in the wav module (for information about how to configure the wav module, see the IDOL Speech Server Reference).

    Out The file to write the language identification results to.

    Note: If you want to change the audio sample rate, or if you want to use your own custom classifiers, you must also set the ClassList parameter. You might also need to specify the ClassPath parameter, depending on the location of the classifier files. See the IDOL Speech Server Reference for more information.

    Note: If you use the base classifier pack, set the languages that you want to identify in the LangList configuration parameter in the langid module.

    If you set Type to LangIdBndWav, you must also set the OutB parameter.

    OutB The file to write the boundary point information to.

For example:

http://localhost:13000/action=AddTask&Type=LangIdSegWav&File=C:\Data\Speech.wav&ClassList=ListManager\OptClassSet&ClassPath=C:\LangID\&Out=SpeechLang1.ctm

This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to identify languages in the Speech.wav file using the language classifiers specified in the OptClassSet list, and to write the identification results to the SpeechLang1.ctm file.
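You can also send the same action from a script rather than from a web browser. The following Python sketch is a minimal example of submitting the LangIdSegWav task and capturing the task token from the response; it uses the third-party requests library, and it assumes that the response contains a token element. The file paths and list name are the example values used above.

import re
import requests

SPEECH_SERVER = "http://localhost:13000/"

# Parameters for the LangIdSegWav task shown in the example action above.
params = {
    "action": "AddTask",
    "Type": "LangIdSegWav",
    "File": r"C:\Data\Speech.wav",
    "ClassList": r"ListManager\OptClassSet",
    "ClassPath": "C:\\LangID\\",
    "Out": "SpeechLang1.ctm",
}

response = requests.get(SPEECH_SERVER, params=params)
response.raise_for_status()

# Extract the task token from the XML response (assumes a <token> element).
match = re.search(r"<token>(.*?)</token>", response.text, re.IGNORECASE)
token = match.group(1) if match else None
print("Task token:", token)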

This action returns a token. You can use the token to check the status of the task and to retrieve the results.

When the task is complete, IDOL Speech Server displays the results in XML format in your web browser. You can also open the .ctm file from the configured Speech Server temporary directory (or from another location, if you specified a path in the Out parameter).
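For example, the following Python sketch uses the token to request the results with the GetResults action and prints the XML that the server returns. This is a sketch only; the Token parameter name is an assumption, so check the IDOL Speech Server Reference for the exact GetResults parameters.

import requests

SPEECH_SERVER = "http://localhost:13000/"
token = "..."  # the token returned by the AddTask action

# Request the language identification results for the task.
response = requests.get(
    SPEECH_SERVER,
    params={"action": "GetResults", "Token": token},
)
response.raise_for_status()
print(response.text)  # the results in XML format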

The following is an example of the .ctm output produced by the LangIdSegWav task.

1 L1 0.00 30.58 English 1.000 1.252
1 L2 0.00 30.58 German 0.686 1.252
1 L3 0.00 30.58 French 0.550 1.252
1 L1 30.58 28.30 German 1.000 1.306
1 L2 30.58 28.30 English 0.562 1.306
1 L3 30.58 28.30 Italian 0.517 1.306
1 L1 58.88 31.12 English 1.000 1.295
1 L2 58.88 31.12 French 0.680 1.295
1 L3 58.88 31.12 German 0.511 1.295

From left to right, the columns in the .ctm file contain:

- the channel ID
- the rank label (L1 is the language identified as most likely, L2 the second most likely, and so on)
- the start time of the segment, in seconds
- the duration of the segment, in seconds
- the identified language
- the score
- the confidence value

The example shows a 90-second file being recognized in segments, each approximately 30 seconds in duration. For the first segment, English is the language that is identified as being the most likely (L1), followed by German (L2) and French (L3). For the next segment, German has the highest confidence score. For the final segment, English has the highest confidence score again. In SEGMENTED mode, it is common to see different results for each segment, because the language might change throughout the file. CUMULATIVE mode assesses the most dominant language across the whole file, so you would not expect to see these changes.
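If you want to process the .ctm results in your own code, you can read each line into a structured record. The following Python sketch is a minimal parser for the column layout shown above; the record fields and file name are illustrative only.

from dataclasses import dataclass

@dataclass
class LangIdRecord:
    channel: str
    rank: str        # L1 is the most likely language, L2 the next, and so on
    start: float     # segment start time, in seconds
    duration: float  # segment duration, in seconds
    language: str
    score: float
    confidence: float

def read_langid_ctm(path):
    """Parse a language identification .ctm results file."""
    records = []
    with open(path, encoding="utf-8") as ctm:
        for line in ctm:
            fields = line.split()
            if len(fields) != 7:
                continue  # skip blank or unexpected lines
            channel, rank, start, duration, language, score, confidence = fields
            records.append(LangIdRecord(channel, rank, float(start), float(duration),
                                        language, float(score), float(confidence)))
    return records

# Example: print the most likely language for each segment.
for record in read_langid_ctm("SpeechLang1.ctm"):
    if record.rank == "L1":
        print(f"{record.start:.2f}s: {record.language} (confidence {record.confidence:.3f})")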

The following example shows some of the same information displayed in XML format.

<lid_transcript>
	<lid_record>
		<start>0.000</start>
		<end>30.580</end>
		<label>English</label>
		<score>1.000</score>
		<confidence>1.252</confidence>
		<rank>1</rank>
	</lid_record>
	<lid_record>
		<start>0.000</start>
		<end>30.580</end>
		<label>German</label>
		<score>0.686</score>
		<confidence>1.252</confidence>
		<rank>2</rank>
	</lid_record>
</lid_transcript>
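If you retrieve the results with the GetResults action, you can read the records directly from the XML instead. The following Python sketch extracts the lid_record elements from output like the example above; the file name is illustrative, and the sketch assumes that the lid_transcript element appears somewhere in the document that you pass in.

import xml.etree.ElementTree as ET

def parse_lid_transcript(xml_text):
    """Yield (start, end, language, score, confidence, rank) tuples."""
    root = ET.fromstring(xml_text)
    # The transcript might be the root element or nested in a larger response.
    transcript = root if root.tag == "lid_transcript" else root.find(".//lid_transcript")
    if transcript is None:
        return
    for record in transcript.iter("lid_record"):
        yield (float(record.findtext("start")),
               float(record.findtext("end")),
               record.findtext("label"),
               float(record.findtext("score")),
               float(record.findtext("confidence")),
               int(record.findtext("rank")))

with open("langid_results.xml", encoding="utf-8") as results:
    for start, end, label, score, confidence, rank in parse_lid_transcript(results.read()):
        if rank == 1:
            print(f"{start:.2f}-{end:.2f}s: {label} (confidence {confidence:.3f})")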

This output format is common to the SEGMENTED and CUMULATIVE modes. For the BOUNDARY mode, the output is a little different. The main results file, specified by the Out parameter, is unchanged in format. However, the time points occur whenever a language change is detected, instead of after a fixed time period. The BOUNDARY mode also produces an extra results file, specified by the OutB parameter. To retrieve this file, send a GetResults action that includes the parameter Label=OutB. The file provides information on the boundary change points only. For example:

1 X 21.97 21.97 English_French 1.000
1 X 61.82 61.82 French_English 1.000

From left to right, the columns in this file contain:

- the channel ID
- a label (X in this example)
- the time of the boundary point, in seconds (shown in both the start and end time columns)
- the language change, in the form previousLanguage_newLanguage
- the score

The following example shows the same results displayed in XML format:

<lib_transcript>
	<lib_record>
		<time>21.970</time>
		<from>English</from>
		<to>French</to>
	</lib_record>
	<lib_record>
		<time>61.820</time>
		<from>French</from>
		<to>English</to>
	</lib_record>
</lib_transcript>

The XML shows that the language changed from English to French after 21.970 seconds, and then from French back to English after 61.820 seconds.
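To work with the boundary results in your own code, you can request the OutB file with the GetResults action described above and read the lib_record elements. The following Python sketch is a minimal example; as before, the Token parameter name is an assumption, and the sketch expects the lib_transcript format shown above.

import xml.etree.ElementTree as ET
import requests

SPEECH_SERVER = "http://localhost:13000/"
token = "..."  # the token returned by the LangIdBndWav AddTask action

# Request the boundary results file, as described above (Label=OutB).
response = requests.get(
    SPEECH_SERVER,
    params={"action": "GetResults", "Token": token, "Label": "OutB"},
)
response.raise_for_status()

root = ET.fromstring(response.text)
transcript = root if root.tag == "lib_transcript" else root.find(".//lib_transcript")
if transcript is not None:
    for record in transcript.iter("lib_record"):
        time = float(record.findtext("time"))
        print(f"{time:.2f}s: {record.findtext('from')} -> {record.findtext('to')}")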

