The AudioAnalysis task runs, in a single task, all the audio preprocessing tasks that the audiopreproc module supports.
For more information, see [audiopreproc] Module Configuration.
| Parameter | Description | Required |
|---|---|---|
| Type | The task name. Specify AudioAnalysis. | Yes |
| AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
| AppFrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
| AudioUpsampling | Whether to allow audio upsampling if the input audio has a sample rate that is too low for the task. | |
| CtmFile | The speech-to-text transcript produced for the audio file. | Yes |
| DoDialTones | Whether to include dial tones in addition to DTMF tones, if tone detection is enabled. | |
| DoToneClass | Whether to perform DTMF and dial tone identification. | |
| EndTime | The end of an audio section to process. | |
| File | The audio file to process. | Yes, if InputType is File. |
| InputType | The type of audio to process (file, binary data, or stream). | |
| Out | The XML file to write the audio analysis results to. | Yes |
| Sfreq | The sample frequency of the audio file to process. | |
| SpeechBias | Whether to bias towards speech (rather than music, noise, or silence) in the identification of audio segments. | |
| StartTime | The beginning of an audio section to process. | |
| SugdInputChannels | The channel layout of the input media file. This parameter does not apply when InputType is Stream. | |
| SugdInputFrequency | The sampling rate of the input media file. This parameter does not apply when InputType is Stream. | |
For example:

http://localhost:15000/action=AddTask&Type=AudioAnalysis&File=C:\data\Sample.wav&Out=SampleAnalysis.xml

This action performs audio analysis on the Sample.wav file and writes the results to the SampleAnalysis.xml file.
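If you generate requests from a script, the example action URL can be assembled from a dictionary of parameters. The following Python sketch only builds the request string and does not send it. It assumes that the server URL-decodes parameter values, as HTTP servers normally do, so the percent-encoded path `C%3A%5Cdata%5CSample.wav` is read back as `C:\data\Sample.wav`. The helper name `build_action_url` is hypothetical.

```python
from typing import Mapping
from urllib.parse import urlencode


def build_action_url(host: str, port: int, action: str, params: Mapping[str, str]) -> str:
    """Build an action URL of the form http://host:port/action=Name&Param=Value&...

    Hypothetical helper: parameter values are percent-encoded with urlencode,
    so characters such as ':' and '\\' in Windows paths are escaped.
    """
    base = f"http://{host}:{port}/action={action}"
    query = urlencode(params)
    # Only append '&' when there are parameters to add.
    return f"{base}&{query}" if query else base


url = build_action_url(
    "localhost",
    15000,
    "AddTask",
    {
        "Type": "AudioAnalysis",
        "File": r"C:\data\Sample.wav",  # raw string keeps the backslashes literal
        "Out": "SampleAnalysis.xml",
    },
)
print(url)
```

Sending the request (for example with `urllib.request.urlopen`) is left out so that the sketch runs without a server.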
The AudioAnalysis log file provides information on several audio quality assessments. For example:
```xml
<autnresponse>
  <audiopreproc>
    <snr>
      <mean>20</mean>
      <audio_level>66</audio_level>
    </snr>
    <gain>
      <size>35</size>
      <energy>69</energy>
    </gain>
    <max_gain_difference>0</max_gain_difference>
    <clipping>
      <assessment>no</assessment>
      <percent_frames>0</percent_frames>
    </clipping>
    <categories>
      <speech_percent>77.3667</speech_percent>
      <silence_percent>7.45</silence_percent>
      <noise_music_percent>15.9</noise_music_percent>
    </categories>
  </audiopreproc>
  <resultDeleted>False</resultDeleted>
</autnresponse>
```
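To use these assessments in a script, you can read the response with Python's standard XML parser. This sketch assumes the element names and nesting shown in the example response; other versions of the output might add or rename elements. The function name `summarize_audio_analysis` and the result keys are hypothetical.

```python
import xml.etree.ElementTree as ET

# The example response, copied from the documentation.
SAMPLE = """<autnresponse>
  <audiopreproc>
    <snr><mean>20</mean><audio_level>66</audio_level></snr>
    <gain><size>35</size><energy>69</energy></gain>
    <max_gain_difference>0</max_gain_difference>
    <clipping><assessment>no</assessment><percent_frames>0</percent_frames></clipping>
    <categories>
      <speech_percent>77.3667</speech_percent>
      <silence_percent>7.45</silence_percent>
      <noise_music_percent>15.9</noise_music_percent>
    </categories>
  </audiopreproc>
  <resultDeleted>False</resultDeleted>
</autnresponse>"""


def _number(parent: ET.Element, path: str) -> float:
    """Return the numeric text of the element at path, or raise if it is missing."""
    text = parent.findtext(path)
    if text is None:
        raise ValueError(f"missing element: {path}")
    return float(text)


def summarize_audio_analysis(xml_text: str) -> dict:
    """Flatten the <audiopreproc> section into a dictionary (assumed layout)."""
    root = ET.fromstring(xml_text)
    pre = root.find("audiopreproc")
    if pre is None:
        raise ValueError("response has no <audiopreproc> element")
    return {
        "snr_mean": _number(pre, "snr/mean"),
        "audio_level": _number(pre, "snr/audio_level"),
        "gain_size": _number(pre, "gain/size"),
        "gain_energy": _number(pre, "gain/energy"),
        "max_gain_difference": _number(pre, "max_gain_difference"),
        "clipping": pre.findtext("clipping/assessment"),
        "clipping_percent_frames": _number(pre, "clipping/percent_frames"),
        "speech_percent": _number(pre, "categories/speech_percent"),
        "silence_percent": _number(pre, "categories/silence_percent"),
        "noise_music_percent": _number(pre, "categories/noise_music_percent"),
    }


summary = summarize_audio_analysis(SAMPLE)
print(summary)
```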
The log file includes the following information:

- The gain level and the actual energy level. The log file also includes the maximum difference in decibels between speaker levels across the whole file (`<max_gain_difference>`). For a good-quality waveform in which the two speakers speak at a similar gain level, this value can be zero (or at least very low).
- An assessment of the amount of clipping in the file, and the percentage of frames affected (`<percent_frames>`). The `<assessment>` field can hold one of the following values:

| Value | Meaning |
|---|---|
| no | No clipping |
| insignificant | <= 0.1% of frames |
| minor | <= 1% of frames |
| moderate | <= 4% of frames |
| heavy | > 4% of frames |
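If you post-process results yourself, the `<assessment>` thresholds can be written as a small function. This Python sketch assumes that `no` means exactly zero clipped frames and that each `<=` boundary is inclusive. The function name `clipping_assessment` is hypothetical, and the server's own calculation might round differently.

```python
def clipping_assessment(percent_frames: float) -> str:
    """Map the percentage of clipped frames to an <assessment> value.

    Assumes: 0% -> "no"; each upper bound (0.1%, 1%, 4%) is inclusive.
    """
    if percent_frames < 0:
        raise ValueError("percent_frames cannot be negative")
    if percent_frames == 0:
        return "no"
    if percent_frames <= 0.1:
        return "insignificant"
    if percent_frames <= 1:
        return "minor"
    if percent_frames <= 4:
        return "moderate"
    return "heavy"


for pct in (0, 0.05, 0.5, 3.9, 12.0):
    print(pct, clipping_assessment(pct))
```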
You can use the GetResults action to retrieve this information; you do not need to specify a result label.
The AudioAnalysis task also produces an additional audio classification .ctm file. By default, this file has the same name as the task token. To retrieve it, use the GetResults action with the label parameter set to class.
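The following Python sketch builds both GetResults requests: one without a label for the audio quality information, and one with the label set to class for the classification .ctm file. It assumes that GetResults identifies the task through a `Token` parameter holding the token returned by AddTask; that parameter name, the helper `get_results_url`, and the placeholder token `EXAMPLE_TOKEN` are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def get_results_url(host: str, port: int, token: str, label: Optional[str] = None) -> str:
    """Build a GetResults URL (hypothetical helper; Token parameter is assumed)."""
    params = {"Token": token}
    if label is not None:
        # Request a specific result, such as the classification .ctm file.
        params["Label"] = label
    return f"http://{host}:{port}/action=GetResults&{urlencode(params)}"


token = "EXAMPLE_TOKEN"  # hypothetical placeholder for the token returned by AddTask

# Audio quality information: no label needed.
analysis_url = get_results_url("localhost", 15000, token)

# Audio classification .ctm file: label set to class.
class_ctm_url = get_results_url("localhost", 15000, token, label="class")

print(analysis_url)
print(class_ctm_url)
```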