Word Recompounding

Some languages form words by compounding, which makes typical words too long for optimal speech recognition. Hungarian, Turkish, and Hebrew are typical examples. In these cases, you must break words down into their constituent word fragments (or morphemes) for best performance.

The following example schema takes a CTM file that contains word fragments, and recompounds them into words.

[Recompound]
0 = w1 <- ctm(READ, input)
1 = w2 <- postproc(R, w1)
2 = output <- wout(_,w2)

[ctm]
file = $params.in
format = ctm

[postproc]
rcmpAllowSuffix = true
rcmpValidList = $params.validList

[wout]
file = $params.out
format = ctm

0 The ctm module reads the word fragment labels (with prefix and suffix hyphens indicating fragments).
1 The postproc module (mode R) adds prefixes to the following word and suffixes to the preceding word.
2 The wout module writes the recompounded words to a new CTM file.
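
The joining rule in step 1 can be illustrated outside the server with a short Python sketch. This is not the postproc module's implementation. It assumes that a trailing hyphen (for example, un-) marks a prefix fragment and a leading hyphen (for example, -able) marks a suffix fragment, and that each entry is reduced to a hypothetical (start, duration, label) tuple. It does not model the rcmpAllowSuffix or rcmpValidList settings.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified entry: (start time, duration, label).
Token = Tuple[float, float, str]


def recompound(tokens: List[Token]) -> List[Token]:
    """Join hyphen-marked fragments into whole words (illustrative only)."""
    words: List[Token] = []
    pending: Optional[Token] = None  # prefix waiting for the following word
    for start, dur, label in tokens:
        if pending is not None:
            # Attach the waiting prefix to the front of this label and
            # extend the time span back to the start of the prefix.
            p_start, _, p_label = pending
            dur = round(start + dur - p_start, 3)
            start, label = p_start, p_label + label.lstrip("-")
            pending = None
        if label.endswith("-") and len(label) > 1:
            # Prefix fragment: hold it until the next entry arrives.
            pending = (start, dur, label[:-1])
        elif label.startswith("-") and len(label) > 1 and words:
            # Suffix fragment: append it to the preceding word.
            w_start, _, w_label = words.pop()
            merged_dur = round(start + dur - w_start, 3)
            words.append((w_start, merged_dur, w_label + label[1:]))
        else:
            words.append((start, dur, label))
    if pending is not None:
        words.append(pending)  # dangling prefix with no following word
    return words


fragments = [
    (0.00, 0.20, "un-"),
    (0.20, 0.35, "break"),
    (0.55, 0.30, "-able"),
    (0.90, 0.25, "words"),
]
for start, dur, label in recompound(fragments):
    print(start, dur, label)
```

Each merged word keeps the start time of its first fragment and a duration that runs to the end of its last fragment, so the output entries still cover the same audio as the fragments they replace.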
