Japanese, Korean, Mandarin, and Taiwanese Mandarin do not separate words with whitespace. You must segment text in these languages into words before IDOL Speech Server can process it.
To segment text
Send an AddTask action to IDOL Speech Server, and set the following parameters:
Type | The task name. Set to SegmentText.
Lang | The language pack to use.
TxtFileIn | The text file to segment.
TxtFileOut | The text file to write the segmented text to.
Pgf | The pronunciation information file to use.
To exempt a section of text from segmentation, move the section to a new line and add hash symbols (#) at the beginning and end of the section. You must also set the IgnoreHashLines parameter:
IgnoreHashLines | Set to True to exempt sections bounded by hash symbols from segmentation.
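Preparing the input file can be scripted. The following is a minimal sketch, not part of IDOL Speech Server: the helper name mark_exempt is hypothetical, and it assumes a single hash symbol at each end of the exempted section, which the text above does not specify exactly.

```python
def mark_exempt(text: str, section: str) -> str:
    """Move the first occurrence of `section` onto its own line, bounded by '#'.

    Assumption: one hash symbol at each end of the section is enough
    for the section to count as "bounded by hash symbols".
    """
    before, found, after = text.partition(section)
    if not found:
        # Nothing to exempt: return the text unchanged.
        return text
    parts = [before.rstrip(), f"#{section}#", after.lstrip()]
    # Drop empty parts so that no blank lines are introduced.
    return "\n".join(part for part in parts if part)
```

Write the returned string to the file that you pass as TxtFileIn, and set IgnoreHashLines to True in the action.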
For example, the following action segments a Japanese text file:
http://localhost:15000/action=AddTask&Type=SegmentText&Lang=JAJP&TxtFileIn=C:/Data/Japanese.txt&TxtFileOut=JA_seg.txt&PgfFile=T:\LP\ENUK\ver-ENUK-5.0.pgf
This action segments the text in the Japanese.txt file and writes the results to the JA_seg.txt file in the Temp directory.
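If you generate the action from a script, the URL can be assembled from a dictionary of parameters. The sketch below only builds the string and does not send it. The function name build_add_task_url is hypothetical, and keeping ':' and '/' unencoded in file paths is an assumption made for readability. The pronunciation file parameter is left out because the parameter table lists it as Pgf while the example URL uses PgfFile; add it under whichever name your version expects.

```python
from urllib.parse import urlencode


def build_add_task_url(host: str, port: int, params: dict[str, str]) -> str:
    """Return an AddTask action URL in the same shape as the example above."""
    # Keep ':' and '/' readable in file paths; other reserved characters
    # (such as backslashes or spaces) are percent-encoded.
    query = urlencode(params, safe=":/")
    return f"http://{host}:{port}/action=AddTask&{query}"


url = build_add_task_url(
    "localhost",
    15000,
    {
        "Type": "SegmentText",
        "Lang": "JAJP",
        "TxtFileIn": "C:/Data/Japanese.txt",
        "TxtFileOut": "JA_seg.txt",
        # Optional: only needed when the input marks exempt sections with '#'.
        "IgnoreHashLines": "True",
    },
)
print(url)
```

Building the URL from a dictionary keeps each parameter on its own line, which makes it easier to spot a missing or misspelled parameter than in a hand-typed URL.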