The audiopreproc module can perform several analysis operations on audio samples, covering audio quality as well as broad audio categorization.
As of the 11.2 release, audio preprocessing in HPE IDOL Speech Server uses a new deep neural network (DNN) algorithm, which provides improved performance and requires less tailoring of thresholds to specific audio types.
To use the new algorithm, you must set the AppDnnBase parameter, and change the task schemas. You can also balance performance and accuracy by setting the AppFrameDupl parameter.
The changes to the schemas are necessary because the new algorithm uses normalized feature vector input rather than audio samples.
Old implementation:
0 = a <- stream3(MONO, input)
1 = w1 <- audiopreproc3(A, a)
New implementation:
0 = a <- stream3(MONO, input)
1 = f1 <- frontend301(_, a)
2 = nf1 <- normalizer301(_, f1)
3 = w1 <- audiopreproc3(A, a, nf1)
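The schema change above is mechanical, so it can be scripted when many task definitions need updating. The sketch below is a hypothetical helper, not part of HPE IDOL Speech Server. It assumes that each schema step sits on its own line in the form `N = outputs <- module(args)`, and it reuses the `frontend301` and `normalizer301` module names from the example above. The function name `upgrade_schema` and the generated stream names (`f1`, `nf1`, and so on) are illustrative only.

```python
import re

# One schema step, e.g. "1 = w1 <- audiopreproc3(A, a)".
STEP = re.compile(r"^\s*(\d+)\s*=\s*(.+?)\s*<-\s*(\w+)\((.*)\)\s*$")


def upgrade_schema(lines, frontend="frontend301", normalizer="normalizer301"):
    """Insert frontend and normalizer steps before each old-style audiopreproc step.

    Assumes an old-style step has exactly two arguments (mode, audio stream),
    as in the documented example. Steps are renumbered from 0.
    """
    steps = []
    count = 0  # number of audiopreproc steps upgraded so far
    for line in lines:
        match = STEP.match(line)
        if match is None:
            raise ValueError(f"unrecognized schema line: {line!r}")
        _, outputs, module, args = match.groups()
        arg_list = [a.strip() for a in args.split(",")]
        if module.startswith("audiopreproc") and len(arg_list) == 2:
            count += 1
            mode, audio = arg_list
            feat, norm = f"f{count}", f"nf{count}"
            steps.append(f"{feat} <- {frontend}(_, {audio})")
            steps.append(f"{norm} <- {normalizer}(_, {feat})")
            steps.append(f"{outputs} <- {module}({mode}, {audio}, {norm})")
        else:
            # Any other step is kept unchanged apart from renumbering.
            steps.append(f"{outputs} <- {module}({args.strip()})")
    return [f"{i} = {step}" for i, step in enumerate(steps)]


old = [
    "0 = a <- stream3(MONO, input)",
    "1 = w1 <- audiopreproc3(A, a)",
]
new = upgrade_schema(old)
print("\n".join(new))
```

Run on the old implementation shown above, the helper reproduces the four steps of the new implementation. It does not check for name collisions with existing streams, so review the result before using it in a real configuration file.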
For tasks that combine audio preprocessing and speech-to-text, you must include separate frontend and normalizer calls for audio preprocessing and speech-to-text, because the form of frontend feature vectors needed for the two tasks might be different. For example:
For more examples, see the HPE IDOL Speech Server tasks configuration file (speechserver-tasks.cfg). All the out-of-the-box tasks in the configuration file use the new algorithm, but the old algorithm has been retained for backwards compatibility, and can be used in the same way as before.
The audiopreproc module has five modes of operation. You can combine multiple modes into a single operation.
Mode | Input | Output | Description |
---|---|---|---|
A | a | w | Performs broad audio classification, breaking down the audio into segments of speech, silence, and non-speech/music. |
C | a | | Reports overall percentages of clipping. |
S | a | | Calculates the signal-to-noise ratio (SNR) over the entire file. |
S | a | w | Calculates the signal-to-noise ratio (SNR) over the entire file as well as producing SNR estimates across broad categorized speech segments using mode A. |
T | a | d | DTMF detection. Identifies tones corresponding to numbers 0-9, letters A-D, asterisk (*) and hash (#) keys. |
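To make the C and S measurements concrete, the sketch below applies the textbook definitions: the percentage of samples whose magnitude reaches full scale, and SNR in decibels as ten times the base-10 logarithm of the ratio of signal power to noise power. It is not the algorithm that Speech Server uses, which is not described here. The function names, the 16-bit full-scale threshold, and the assumption that noise-only samples have already been separated from speech are all illustrative.

```python
import math


def clipping_percentage(samples, full_scale=32767):
    """Percentage of samples whose magnitude reaches full scale (16-bit PCM assumed)."""
    if not samples:
        raise ValueError("no samples")
    clipped = sum(1 for s in samples if abs(s) >= full_scale)
    return 100.0 * clipped / len(samples)


def mean_power(samples):
    """Mean squared amplitude of a non-empty sample list."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    """SNR in dB: 10 * log10(speech power / noise power)."""
    return 10.0 * math.log10(mean_power(speech_samples) / mean_power(noise_samples))


# Three of these eight samples sit at full scale.
samples = [0, 32767, -32768, 1000, -1000, 500, 250, 32767]
clip_pct = clipping_percentage(samples)
print(clip_pct)

speech = [1000, -1000] * 50   # mean power 1,000,000
noise = [10, -10] * 50        # mean power 100
ratio_db = snr_db(speech, noise)
print(ratio_db)
```

In the toy data, 3 of 8 samples are clipped (37.5 percent), and the power ratio of 10,000 corresponds to 40 dB.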
Examples:
w ← audiopreproc(A, a)
output ← audiopreproc(S, a)
w ← audiopreproc(ACS, a)
d ← audiopreproc(T, a)
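Because modes are combined by concatenating their letters, as in ACS above, a schema step can be assembled programmatically. The following is a minimal sketch built around a hypothetical helper, `audiopreproc_step`. It only checks the letters against the modes listed in the table, and it does not know which combinations Speech Server accepts or which outputs each combination produces, so the caller supplies the output name.

```python
KNOWN_MODES = {"A", "C", "S", "T"}  # mode letters taken from the table above


def audiopreproc_step(modes, output, audio="a"):
    """Build a step such as 'w <- audiopreproc(ACS, a)' after checking the mode letters."""
    if not modes:
        raise ValueError("at least one mode is required")
    unknown = sorted(set(modes) - KNOWN_MODES)
    if unknown:
        raise ValueError(f"unknown mode(s): {', '.join(unknown)}")
    return f"{output} <- audiopreproc({modes}, {audio})"


print(audiopreproc_step("ACS", "w"))
print(audiopreproc_step("T", "d"))
```

Rejecting unknown letters early gives a clearer error than a task that fails to load, but the check is deliberately shallow and is no substitute for testing the schema against the server.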
Configuration parameters:
AppDnnBase | OutputLogLabel |
AppFrameDupl | SampleFrequency |
MaxSegSize | SNRFile |
MaxSilThresh | SpeechBias |
MinSegSize | SpeechLab |
MusicNoiseLab | SpeechThreshOffset |
OutputLog | |