The LanguageModelBuild task builds a new language model from a set of text files.
| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Set to LanguageModelBuild. | Yes |
| BaseDictionary | The base dictionary for the language model. | |
| BuildLabel | The build label to use for the language model. | |
| ContentDatabase | The IDOL Content component database to use to retrieve training text. | |
| ContentHost | The host name or IP address of the IDOL Content component to retrieve training text from. | |
| ContentPort | The ACI port of the IDOL Content component to retrieve training text from. | |
| ContentTextTag | The IDOL fields to retrieve text data from. | |
| DataList | The list of training text files. | Yes |
| DataPath | The path to the directory containing the training text files listed in DataList. | Yes |
| DiagFile | The file to write the diagnostic information to. | |
| DiagLevel | The level of detail to include in the diagnostic information. | |
| DoDctGen | Whether to generate a dictionary. | |
| DoNorm | Whether to perform text normalization. | |
| DoSmoothing | Whether to enable smoothing. | |
| DoSegment | Whether to segment text. | |
| DropList | A list of words to exclude from the vocabulary of the custom language model. | |
| KeepList | A list of words that must appear in the vocabulary of the custom language model. | |
| KeepTemp | Whether to keep the temporary text files for diagnostics. | |
| Lang | The language pack to use as a foundation. | Yes |
| Log | The name of the log file to write. | |
| NewDictionary | The dictionary to generate. | Yes |
| NewLanguageModel | The custom language model to generate. | Yes |
| NewLMInfoFile | The Language Model Information file to generate. | |
| VocabSize | The maximum size of the vocabulary to include in the custom language model. | |
http://localhost:15000/action=AddTask&Type=LanguageModelBuild&DataList=ListManager/Langmodel&DataPath=C:\LanguageModelFiles&Lang=ENUK-tel&NewLanguageModel=mymodel&NewDictionary=mymodel&DoSmoothing=False
This action uses the training text files listed in the Langmodel list, read from C:\LanguageModelFiles, and the ENUK-tel language pack to build a new language model and dictionary file, both named mymodel, with smoothing disabled. At the end of the build process, the action also calculates a recommended interpolation weight.
The interpolation weight is only a suggestion; you can choose to set a different weight.
The new language models are placed in the custom language models folder, as specified in the IDOL Speech Server configuration file.
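The example request above can also be assembled programmatically. The following Python sketch builds the same AddTask URL from a dictionary of task parameters and checks that the parameters marked as required in the table are present. The function name `build_add_task_url` is hypothetical, the host and port are taken from the example, and the sketch assumes that the server accepts a percent-encoded `DataPath` value. It only constructs the URL and does not send the request.

```python
from urllib.parse import urlencode

# Parameters marked "Yes" in the Required column of the parameter table.
REQUIRED = ("Type", "DataList", "DataPath", "Lang",
            "NewDictionary", "NewLanguageModel")


def build_add_task_url(host, port, params):
    """Return an AddTask request URL (hypothetical helper, not a product API)."""
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    # Percent-encode the values; "/" is left as-is so that the list name
    # ListManager/Langmodel stays readable. This is an assumption about
    # what the server accepts, not documented behavior.
    query = urlencode(params, safe="/")
    return f"http://{host}:{port}/action=AddTask&{query}"


params = {
    "Type": "LanguageModelBuild",
    "DataList": "ListManager/Langmodel",
    "DataPath": r"C:\LanguageModelFiles",
    "Lang": "ENUK-tel",
    "NewLanguageModel": "mymodel",
    "NewDictionary": "mymodel",
    "DoSmoothing": "False",
}

url = build_add_task_url("localhost", 15000, params)
print(url)
```

Checking the required parameters before building the URL catches an incomplete request early, without relying on the server's error response.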