The LanguageModelBuild task builds a new language model from a set of text files.
Parameter | Description | Required |
---|---|---|
Type | The task name. Set to LanguageModelBuild. | Yes |
BaseDictionary | The base dictionary for the language model. | |
DataList | The list of training text files. | Yes |
DataPath | The path to the directory containing the training text files listed in DataList. | Yes |
DoDctGen | Whether to generate a dictionary. | |
DoNorm | Whether to perform text normalization. | |
DoSmoothing | Whether to enable smoothing. | |
DoSegment | Whether to segment text. | |
DropList | A list of words to exclude from the vocabulary of the custom language model. | |
KeepList | A list of words that must appear in the vocabulary of the custom language model. | |
Lang | The language pack to use as a foundation. | Yes |
Log | The name of the log file to write. | |
NewLanguageModel | The custom language model to generate. | Yes |
NewDictionary | The dictionary to generate. | Yes |
VocabSize | The maximum size of the vocabulary to include in the custom language model. | |
http://localhost:13000/action=AddTask&Type=LanguageModelBuild&DataList=ListManager/Langmodel&DataPath=C:\LanguageModelFiles&Lang=ENUK-tel&NewLanguageModel=mymodel&NewDictionary=mymodel&DoSmoothing=False
This action uses port 13000 to instruct HPE IDOL Speech Server, which is located on the local machine, to use the training text specified in the Langmodel list and the ENUK-tel language pack to build a new language model and dictionary file, both named mymodel. This action also calculates a recommended interpolation weight at the end of the language model building process.
The interpolation weight is only a suggestion; you can choose to set other weights.
The new language models are placed in the custom language models folder, as specified in the HPE IDOL Speech Server configuration file.
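
As an illustration, the example action above can be assembled programmatically. The following is a minimal Python sketch that only builds the URL string and does not send it. The helper name `build_add_task_url` and the `REQUIRED` set are hypothetical names introduced here; the host, port, and parameter values are taken from the example action, and the required parameters are those marked Yes in the table.

```python
from urllib.parse import urlencode

# Parameters marked "Yes" in the Required column of the table above.
REQUIRED = {"Type", "DataList", "DataPath", "Lang",
            "NewLanguageModel", "NewDictionary"}


def build_add_task_url(host, port, params):
    """Return an AddTask action URL in the same form as the example above.

    Hypothetical helper for illustration; it is not part of the product.
    """
    missing = REQUIRED - params.keys()
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")
    # Keep ':', '\' and '/' unescaped so paths read as in the example action.
    query = urlencode(params, safe=":\\/")
    return f"http://{host}:{port}/action=AddTask&{query}"


params = {
    "Type": "LanguageModelBuild",
    "DataList": "ListManager/Langmodel",
    "DataPath": r"C:\LanguageModelFiles",
    "Lang": "ENUK-tel",
    "NewLanguageModel": "mymodel",
    "NewDictionary": "mymodel",
    "DoSmoothing": "False",
}

url = build_add_task_url("localhost", 13000, params)
print(url)
```

Optional parameters such as VocabSize or KeepList could be added to `params` in the same way. Their accepted value formats are not described in this section, so check them against the server documentation before use.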