HPE IDOL Speech Server allows you to adapt the acoustic models that are available out of the box to more closely match the acoustic properties of particular sets of audio data. Adapting the model using data that closely represents (in terms of recording quality and accents) the audio that you expect to process should improve speech-to-text results.
Adapting an acoustic model involves a series of steps:
1. Prepare the data set. The data set must include audio and verbatim transcripts of the audio. Preparation of the files involves:
   - Converting the audio files. The WavToPlh task converts the files.
   - Adding timestamps to the transcripts. The TranscriptAlign task can produce these timestamps.
2. Run the AmTrain task to ingest the audio and transcription data.
3. Run the AmTrainFinal task to produce the updated acoustic model.

These steps are covered in detail in the following sections.
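
As a rough illustration of the order in which the tasks run, the following Python sketch builds one request URL per task. Only the four task names come from the steps above. The host, the port, the `AddTask` action, and the `Type` parameter are assumptions that you should check against your server configuration and the reference documentation. `build_task_urls` and `extra_params` are hypothetical names used only for this example, and the task-specific parameters that each task requires are deliberately left out.

```python
from urllib.parse import urlencode

# Task names come from the steps above; everything else here is an assumption.
ADAPTATION_TASKS = ["WavToPlh", "TranscriptAlign", "AmTrain", "AmTrainFinal"]


def build_task_urls(host="localhost", port=13000, extra_params=None):
    """Return one request URL per adaptation task, in the order the steps run.

    host, port, the AddTask action, and the Type parameter are assumptions.
    extra_params is a hypothetical hook: a dict that maps a task name to a
    dict of task-specific parameters, which this sketch does not define.
    """
    extra_params = extra_params or {}
    urls = []
    for task in ADAPTATION_TASKS:
        params = {"Type": task}
        params.update(extra_params.get(task, {}))
        urls.append(f"http://{host}:{port}/action=AddTask&{urlencode(params)}")
    return urls


if __name__ == "__main__":
    # Print the URLs only; this sketch never sends a request to a server.
    for url in build_task_urls():
        print(url)
```

The sketch only prints the URLs, so you can review the sequence before you send anything to a server.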
Note: For this procedure, your HPE IDOL Speech Server license must include the align module (required for transcription alignment).