[audiopreproc] Module Configuration

The audiopreproc module can perform several analysis operations on audio samples, covering audio quality as well as broad audio categorization.

Note: From the 11.2 release, audio preprocessing in HPE IDOL Speech Server uses new DNN technology, which provides improved performance and requires less tailoring of thresholds to specific audio types.

To use the new algorithm, you must set the AppDnnBase parameter and change the task schemas. You can also trade performance against accuracy by setting the AppFrameDupl parameter.

The changes to the schemas are necessary because the new algorithm uses normalized feature vector input rather than audio samples.

For tasks that combine audio preprocessing and speech-to-text, you must include separate frontend and normalizer calls for each of the two operations, because they might need different forms of frontend feature vectors. For example:

For more examples, see the HPE IDOL Speech Server tasks configuration file (speechserver-tasks.cfg). All the out-of-the-box tasks in the configuration file use the new algorithm, but the old algorithm has been retained for backwards compatibility, and can be used in the same way as before.

Input and Output

The audiopreproc module has five modes of operation. You can combine multiple modes into a single operation.

Mode  Input  Output  Description
A     a      w       Performs broad audio classification, breaking down the audio into segments of speech, silence, and non-speech/music.
C     a              Reports overall percentages of clipping.
S     a              Calculates the signal-to-noise ratio (SNR) over the entire file.
S     a      w       Calculates the signal-to-noise ratio (SNR) over the entire file as well as producing SNR estimates across broad categorized speech segments using mode A.
T     a      d       DTMF detection. Identifies tones corresponding to numbers 0-9, letters A-D, asterisk (*) and hash (#) keys.
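
To make the quantities reported by modes C and S more concrete, the following Python sketch computes a clipping percentage and a simple SNR estimate. It is an illustration only, not the algorithm that the audiopreproc module uses: the function names, the 16-bit full-scale value, and the definition of SNR as speech power over non-speech power are all assumptions made for this example.

```python
import math


def clipping_percentage(samples, full_scale=32767):
    """Return the percentage of samples at or beyond full scale.

    Assumption: a sample counts as clipped when its magnitude reaches
    the 16-bit limit. The real module may use a different criterion.
    """
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= full_scale)
    return 100.0 * clipped / len(samples)


def snr_db(samples, is_speech):
    """Return a simple SNR estimate in decibels.

    Assumption: SNR is the mean power of samples flagged as speech
    divided by the mean power of the remaining (noise) samples.
    """
    speech = [s * s for s, flag in zip(samples, is_speech) if flag]
    noise = [s * s for s, flag in zip(samples, is_speech) if not flag]
    if not speech or not noise:
        raise ValueError("need both speech and non-speech samples")
    p_speech = sum(speech) / len(speech)
    p_noise = sum(noise) / len(noise)
    if p_noise == 0:
        return math.inf
    return 10.0 * math.log10(p_speech / p_noise)
```

In this sketch the speech and non-speech flags play the role of the mode A classification, which is why the per-segment SNR estimates in the table depend on mode A.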

Examples:

w ← audiopreproc (A, a)
output ← audiopreproc (S, a)
w ← audiopreproc (ACS, a)
d ← audiopreproc (T, a)
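
As background for mode T, the sketch below shows one common way to recognize a DTMF key: measure the energy at each of the eight standard DTMF frequencies with the Goertzel recurrence and pick the strongest low-group and high-group tone. This is a generic illustration, not the detector used by the audiopreproc module; the function names, the 8 kHz sample rate, and the 50 ms tone length are assumptions, and the sketch does not check for silence or tone duration.

```python
import math

# Standard DTMF frequency groups (Hz) and the keypad they index.
LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Return the signal power at one frequency (Goertzel recurrence)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Return the key whose low and high tones carry the most power."""
    low = max(LOW_FREQS, key=lambda f: goertzel_power(samples, f, sample_rate))
    high = max(HIGH_FREQS, key=lambda f: goertzel_power(samples, f, sample_rate))
    return KEYPAD[LOW_FREQS.index(low)][HIGH_FREQS.index(high)]


def make_tone(key, sample_rate=8000, duration=0.05):
    """Synthesize a test tone for one key (helper for this example only)."""
    row = next(i for i, keys in enumerate(KEYPAD) if key in keys)
    col = KEYPAD[row].index(key)
    count = int(sample_rate * duration)
    return [
        math.sin(2.0 * math.pi * LOW_FREQS[row] * t / sample_rate)
        + math.sin(2.0 * math.pi * HIGH_FREQS[col] * t / sample_rate)
        for t in range(count)
    ]
```

Calling detect_key(make_tone("5")) exercises the round trip from a synthesized tone back to its key. A production detector would also need to reject speech and noise, which this sketch does not attempt.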

Parameters

AppDnnBase
AppFrameDupl
MaxSegSize
MaxSilThresh
MinSegSize
MusicNoiseLab
OutputLog
OutputLogLabel
SampleFrequency
SNRFile
SpeechLab
SpeechThreshOffset
