LanguageModelBuild

The LanguageModelBuild task builds a new language model from a set of text files.

Parameters

| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Set to LanguageModelBuild. | Yes |
| BaseDictionary | The base dictionary for the language model. | |
| DataList | The list of training text files. | Yes |
| DataPath | The path to the directory containing the training text files listed in DataList. | Yes |
| DoDctGen | Whether to generate a dictionary. | |
| DoNorm | Whether to perform text normalization. | |
| DoSmoothing | Whether to enable smoothing. | |
| DoSegment | Whether to segment text. | |
| DropList | A list of words to exclude from the vocabulary of the custom language model. | |
| KeepList | A list of words that must appear in the vocabulary of the custom language model. | |
| Lang | The language pack to use as a foundation. | Yes |
| Log | The name of the log file to write. | |
| NewLanguageModel | The custom language model to generate. | Yes |
| NewDictionary | The dictionary to generate. | Yes |
| VocabSize | The maximum size of the vocabulary to include in the custom language model. | |

Example

http://localhost:13000/action=AddTask&Type=LanguageModelBuild&DataList=ListManager/Langmodel&DataPath=C:\LanguageModelFiles&Lang=ENUK-tel&NewLanguageModel=mymodel&NewDictionary=mymodel&DoSmoothing=False

This action uses port 13000 to instruct HPE IDOL Speech Server, which is located on the local machine, to build a new language model and dictionary file, both named mymodel, from the training text specified in the Langmodel list and the ENUK-tel language pack. Smoothing is disabled (DoSmoothing=False). At the end of the build process, the action also calculates a recommended interpolation weight.

Note: The interpolation weight is only a suggestion; you can choose to set a different weight.

The new language models are placed in the custom language models folder, as specified in the HPE IDOL Speech Server configuration file.
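
The example request above can also be assembled programmatically, which helps when values such as Windows paths need percent-encoding. The following is a minimal Python sketch, not part of the product: the helper name build_task_url and the REQUIRED tuple are hypothetical, the required parameters are taken from the table above, and the choice to percent-encode values while leaving "/" unescaped is an assumption about what the server accepts. The sketch only builds the URL; it does not send the request.

```python
from urllib.parse import quote, urlencode

# Parameters marked "Yes" in the Required column of the parameter table
# (Type is filled in by the helper itself).
REQUIRED = ("DataList", "DataPath", "Lang", "NewLanguageModel", "NewDictionary")


def build_task_url(host, port, **params):
    """Hypothetical helper: assemble an AddTask URL for LanguageModelBuild."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    query = {"action": "AddTask", "Type": "LanguageModelBuild", **params}
    # Percent-encode values such as Windows paths; "/" is left unescaped
    # (an assumption about what the server accepts).
    return f"http://{host}:{port}/" + urlencode(query, safe="/", quote_via=quote)


url = build_task_url(
    "localhost",
    13000,
    DataList="ListManager/Langmodel",
    DataPath=r"C:\LanguageModelFiles",
    Lang="ENUK-tel",
    NewLanguageModel="mymodel",
    NewDictionary="mymodel",
    DoSmoothing="False",
)
print(url)
```

Checking the required parameters before building the URL surfaces a missing DataList or Lang value locally instead of as a failed task on the server.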

