The transcript aligner compares the speech-to-text transcript and the original transcript text to produce an aligned transcript. The aligner either uses words as whole units or breaks them down into phonemes or letters. You can therefore select one of three modes:
words
prons (phonemes)
letters
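To see why the choice of unit matters, the following minimal sketch compares two short transcripts at word and at letter granularity. It is not the Speech Server alignment algorithm: Python's standard difflib module is used purely as a stand-in, and the function name match_ratio and the sample sentences are illustrative assumptions. The prons mode is left out because it would need a pronunciation dictionary.

```python
from difflib import SequenceMatcher


def match_ratio(reference, hypothesis, match_type="words"):
    """Similarity (0.0 to 1.0) between two transcripts at the chosen granularity.

    Illustration only: difflib stands in for the real aligner.
    """
    if match_type == "words":
        ref_units, hyp_units = reference.split(), hypothesis.split()
    elif match_type == "letters":
        ref_units, hyp_units = list(reference), list(hypothesis)
    else:
        raise ValueError(f"unsupported match type in this sketch: {match_type!r}")
    # autojunk=False keeps difflib from discarding frequent units as "junk".
    return SequenceMatcher(None, ref_units, hyp_units, autojunk=False).ratio()


reference = "Teaism was Taoism in disguise"   # original transcript (illustrative)
hypothesis = "Taoism was Taoism in disguise"  # hypothetical speech-to-text output

print("words  :", match_ratio(reference, hypothesis, "words"))
print("letters:", match_ratio(reference, hypothesis, "letters"))
```

In words mode the misrecognized first word counts as a complete mismatch, whereas in letters mode most of its characters still match, so the letter-level score is higher.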
In addition, the alignment algorithm can also work in one of two polarity modes:
To run the transcript alignment task
Send an AddTask action to HPE IDOL Speech Server, and set the following parameters:
Type | The task name. Set to TranscriptAlign.
TxtFile | The normalized transcript file.
CtmFile | The speech-to-text transcript produced for the audio file.
Out | The file to write the aligned transcript to.
MatchType | The alignment mode: words, prons, or letters.
For example:
http://localhost:13000/action=AddTask&Type=TranscriptAlign&TxtFile=C:\data\transcript.txt&CtmFile=C:\misc\speechtext.ctm&Out=AlignedTranscript.ctm&MatchType=words
This action uses port 13000 to instruct HPE IDOL Speech Server, which is located on the local machine, to compare the original transcript transcript.txt with the speech-to-text transcript speechtext.ctm to produce an aligned transcript, AlignedTranscript.ctm. The action instructs HPE IDOL Speech Server to use the words alignment mode.
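When the action is sent from a script, the request URL can be assembled from the parameters described above. The following is a minimal sketch, not an official client: the helper name build_transcript_align_url is hypothetical, the host, port, and file paths are taken from the example, and the sketch assumes the server accepts percent-encoded parameter values. It only builds the URL and does not send it.

```python
from urllib.parse import quote, urlencode

# The three modes listed for the MatchType parameter.
VALID_MATCH_TYPES = ("words", "prons", "letters")


def build_transcript_align_url(txt_file, ctm_file, out_file,
                               match_type="words",
                               host="localhost", port=13000):
    """Build an AddTask URL for the TranscriptAlign task (hypothetical helper)."""
    if match_type not in VALID_MATCH_TYPES:
        raise ValueError(f"MatchType must be one of {VALID_MATCH_TYPES}")
    params = [
        ("Type", "TranscriptAlign"),
        ("TxtFile", txt_file),
        ("CtmFile", ctm_file),
        ("Out", out_file),
        ("MatchType", match_type),
    ]
    # Percent-encode values so characters such as ':' and '\' in Windows
    # paths are safe to place in a URL (assumed to be accepted by the server).
    query = urlencode(params, quote_via=quote)
    return f"http://{host}:{port}/action=AddTask&{query}"


url = build_transcript_align_url(
    r"C:\data\transcript.txt",
    r"C:\misc\speechtext.ctm",
    "AlignedTranscript.ctm",
    match_type="words",
)
print(url)
```

Validating MatchType before building the URL catches a mistyped mode locally instead of relying on the server to reject it.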
The output file is in the following format:
1 A 0.000 0.420 behind 1.000
1 A 0.420 7.790 it 1.000
1 A 8.210 2.870 all 1.000
1 A 11.080 0.000 <s> 1.000
1 A 11.080 0.000 Teaism 1.000
1 A 11.080 0.000 was 1.000
1 A 11.080 0.000 Taoism 1.000
1 A 11.080 0.000 in 1.000
1 A 11.080 0.000 disguise 1.000
1 A 11.080 0.000 <s> 1.000
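The sample output above can be read into structured records with a few lines of code. The following is a sketch under stated assumptions: it expects one record of six whitespace-separated fields per line, and the field names are guesses inferred from the sample (the third field looks like a start time and the fourth a duration, since each start equals the previous start plus the previous duration; the fifth is the word and the sixth a score). The names AlignedWord and parse_aligned_transcript are hypothetical.

```python
from typing import List, NamedTuple


class AlignedWord(NamedTuple):
    # Field names are assumptions inferred from the sample output.
    channel: str
    label: str
    start: float      # assumed: start time in seconds
    duration: float   # assumed: duration in seconds
    word: str
    score: float      # assumed: confidence-like score


def parse_aligned_transcript(text: str) -> List[AlignedWord]:
    """Parse aligned-transcript text, one six-field record per line."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue  # skip blank or malformed lines
        channel, label, start, duration, word, score = fields
        records.append(AlignedWord(channel, label, float(start),
                                   float(duration), word, float(score)))
    return records


sample = """\
1 A 0.000 0.420 behind 1.000
1 A 0.420 7.790 it 1.000
1 A 8.210 2.870 all 1.000
1 A 11.080 0.000 <s> 1.000
"""

for record in parse_aligned_transcript(sample):
    print(record.word, record.start, record.duration)
```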
From left to right, the columns in the output data file contain:
This action returns a token. You can use the token to: