After you have selected and prepared the training text files, you can build the custom language model.
To build the language model
Create a list that contains the file names (including file extensions) of all training text files. You do not have to include the file paths, because you can use the DataPath parameter to specify the directory path in the next step.
For more information about IDOL Speech Server's list manager, see Create and Manage Lists.
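As a minimal sketch of gathering the file names for the list, the following Python collects the names (with extensions, without paths) of the files in a training directory. The function name `training_file_names` and the `.txt` suffix are assumptions made for illustration; loading the names into the list manager is covered in Create and Manage Lists and is not shown here.

```python
from pathlib import Path


def training_file_names(directory, suffix=".txt"):
    """Return sorted file names (with extension, no path) found in `directory`.

    `suffix` is an assumed extension for the training text files.
    """
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == suffix
    )
```

Because only the bare names are returned, the directory itself can be supplied separately through the DataPath parameter.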
Send an AddTask action to IDOL Speech Server, and set the following parameters:
Type | The task name. Set to LanguageModelBuild.
ContentDatabase | The IDOL Content component database to retrieve text data from. This parameter has an effect only if you set ContentHost. If you set ContentHost but do not set this parameter, IDOL Speech Server retrieves text from all databases.
ContentHost | The host name or IP address of the IDOL Content component that you want to retrieve training text data from.
ContentPort | The ACI port of the IDOL Content component that you want to retrieve training text data from. This parameter has an effect only if you set ContentHost. By default, IDOL Speech Server uses port 9100 to contact the IDOL Content component.
ContentTextTag | The IDOL document fields that contain the text that you want to use to train the language model. Separate multiple field names with spaces, commas, or plus symbols (+). By default, IDOL Speech Server uses the content of the DRECONTENT document field as training text.
DataList | The list that specifies the training text files.
DataPath | The path to the directory that contains the files specified in the DataList parameter.
KeepList | The path to a file that contains a list of words that the language model must contain. For more information on the format of the file, see the IDOL Speech Server Reference.
Lang | The language pack to use as a base (for example, ENUK-tel).
NewLanguageModel | The name to give the custom language model that is generated. You must include the file extension (.tlm) in the parameter.
NewLmInfoFile | The output Language Model Information file name. If you set this parameter, you must include the file extension ( If you do not set this parameter, the file has the same name as the generated language model (and is located in the same directory), but with the extension NOTE: You can use the
NewDictionary | The name of the dictionary to generate; usually it is the same value as If you do not set the
DoSmoothing | If you are using a custom language model for a transcript alignment task, set DoSmoothing to False. Otherwise, you can use the default value of True.
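The parameter dependencies described above (the required .tlm extension, and the Content parameters that only take effect together with ContentHost) can be checked before the action is sent. The following is a minimal sketch of such a check; the function name `check_build_params` and the dictionary-based interface are assumptions made for illustration and are not part of IDOL Speech Server.

```python
def check_build_params(params):
    """Return a list of problems found in a LanguageModelBuild parameter dict."""
    problems = []

    # The generated model name must carry the .tlm extension.
    model = params.get("NewLanguageModel")
    if model is not None and not model.endswith(".tlm"):
        problems.append("NewLanguageModel must include the .tlm extension")

    # ContentDatabase and ContentPort have an effect only with ContentHost.
    if "ContentHost" not in params:
        for name in ("ContentDatabase", "ContentPort"):
            if name in params:
                problems.append(f"{name} has no effect unless ContentHost is set")

    return problems
```

An empty list means that none of these particular rules were violated; it does not guarantee that the server accepts the task.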
If the training text files contain Japanese, Korean, Mandarin, or Taiwanese Mandarin languages, set the DoSegment parameter.
DoSegment | Set to True to enable text segmentation.
For example:
http://localhost:13000/action=AddTask&Type=LanguageModelBuild&DataList=ListManager/Langmodel&DataPath=C:\LanguageModelFiles&Lang=ENUK-tel&NewLanguageModel=mymodel.tlm
This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to use the training text specified in the Langmodel list and the ENUK-tel language pack to build a new language model and dictionary file, both named mymodel. This action also calculates a recommended interpolation weight at the end of the language model building process.
The interpolation weight is only a suggestion; you can choose to set other weights.
The new language models are placed in the custom language models folder that is specified by the CustomLmDir parameter in the IDOL Speech Server configuration file.
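The example action can also be assembled programmatically. The following sketch builds the same AddTask request with Python's standard library; the helper name `add_task_url` is hypothetical, and the sketch only constructs the URL string without sending it. Note that `urlencode` percent-encodes characters such as `/`, `:`, and `\` in the values, so the resulting string differs textually from the hand-written example; it is assumed here that the server decodes these values in the usual way.

```python
from urllib.parse import urlencode


def add_task_url(host, port, **params):
    """Build an AddTask URL for a LanguageModelBuild task."""
    query = {"Type": "LanguageModelBuild", **params}
    return f"http://{host}:{port}/action=AddTask&{urlencode(query)}"


# Same values as the example action above.
url = add_task_url(
    "localhost",
    13000,
    DataList="ListManager/Langmodel",
    DataPath=r"C:\LanguageModelFiles",
    Lang="ENUK-tel",
    NewLanguageModel="mymodel.tlm",
)
print(url)
```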
This action returns a token. You can use the token with the GetResults action to retrieve the recommended interpolation weight for the custom language model. See Get Task Results.
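As a rough sketch, a GetResults request for the returned token could be built as follows. The helper name `get_results_url` is hypothetical, and the `Token` parameter name is an assumption that is not confirmed by this section; see Get Task Results for the exact parameters.

```python
from urllib.parse import quote


def get_results_url(host, port, token):
    """Build a GetResults URL for the token returned by AddTask.

    The `Token` parameter name is an assumption; check the reference.
    """
    return f"http://{host}:{port}/action=GetResults&Token={quote(token, safe='')}"
```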