The AudioAnalysis task runs all the audio preprocessing tasks that are supported by the audiopreproc module in a single task.
For more information, see [audiopreproc] Module Configuration.
Parameter | Description | Required |
---|---|---|
Type | The task name. Specify AudioAnalysis. | Yes |
AppDnnBase | The location of the appResources directory, which contains the DNN and .ian files to use. | |
CtmFile | The speech-to-text transcript produced for the audio file. | Yes |
File | The audio file to process. | Yes |
FrameDupl | The balance between performance and speed for audio preprocessing DNN classification. | |
Out | The XML file to write the audio analysis results to. | Yes |
Sfreq | The sample frequency of the audio file to process. | |
SugdInputChannels | The channel layout of the input media file. | |
SugdInputFrequency | The sampling rate of the input media file. | |
http://localhost:13000/action=AddTask&Type=AudioAnalysis&File=C:\data\Sample.wav&Out=SampleAnalysis.xml
This action uses port 13000 to instruct HPE IDOL Speech Server, which is located on the local machine, to perform audio analysis on the Sample.wav file and to write the results to the SampleAnalysis.xml file.
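The example action above can also be assembled programmatically. The following is a minimal sketch, not part of the product: the helper name build_action_url is hypothetical, and percent-encoding the parameter values (for example the backslashes in the Windows path) is an assumption about what the server accepts. Further parameters from the table, such as CtmFile, can be added to the dictionary in the same way.

```python
from urllib.parse import quote


def build_action_url(host, port, action, params):
    """Build a URL of the form http://host:port/action=Name&Key=Value&...

    build_action_url is a hypothetical helper for illustration only.
    Parameter values are percent-encoded (an assumption, see lead-in).
    """
    url = f"http://{host}:{port}/action={action}"
    for key, value in params.items():
        url += f"&{key}={quote(str(value), safe='')}"
    return url


# Same host, port, and parameters as the example action in the text.
url = build_action_url(
    "localhost",
    13000,
    "AddTask",
    {
        "Type": "AudioAnalysis",
        "File": r"C:\data\Sample.wav",
        "Out": "SampleAnalysis.xml",
    },
)
print(url)
```

Keeping the parameters in a dictionary makes it easy to add or omit the optional parameters without editing a long URL string by hand.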
The AudioAnalysis log file provides information on several audio quality assessments. For example:
<autnresponse>
  <audiopreproc>
    <snr>
      <mean>20</mean>
      <audio_level>66</audio_level>
    </snr>
    <gain>
      <size>35</size>
      <energy>69</energy>
    </gain>
    <max_gain_difference>0</max_gain_difference>
    <clipping>
      <assessment>no</assessment>
      <percent_frames>0</percent_frames>
    </clipping>
    <categories>
      <speech_percent>77.3667</speech_percent>
      <silence_percent>7.45</silence_percent>
      <noise_music_percent>15.9</noise_music_percent>
    </categories>
  </audiopreproc>
  <resultDeleted>False</resultDeleted>
</autnresponse>
The log file includes information on the following:
The gain level, and the actual energy level. The log file also includes a summary of the maximum difference in decibels between speaker levels across the whole file (<max_gain_difference>). For a good quality waveform where the two speakers speak at a similar gain level, this number can be zero (or at least very low).
An assessment of the amount of clipping in the file, and the number of frames affected. The <assessment> field can hold one of the following values:
Value | Description |
---|---|
no | no clipping |
insignificant | <= 0.1% of frames |
minor | <= 1% of frames |
moderate | <= 4% of frames |
heavy | > 4% of frames |
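The thresholds above can be read as a simple mapping from the percentage of clipped frames to an assessment label. The following sketch illustrates that reading only; it is not the server's implementation. The function name classify_clipping is hypothetical, and treating exactly 0% as "no" is an assumption, because the table does not give a numeric bound for that value.

```python
def classify_clipping(percent_frames):
    """Map a percentage of clipped frames to an assessment label.

    Hypothetical helper that mirrors the documented thresholds.
    Assumption: "no" means exactly 0% of frames are clipped.
    """
    if percent_frames < 0 or percent_frames > 100:
        raise ValueError("percent_frames must be between 0 and 100")
    if percent_frames == 0:
        return "no"
    if percent_frames <= 0.1:
        return "insignificant"
    if percent_frames <= 1:
        return "minor"
    if percent_frames <= 4:
        return "moderate"
    return "heavy"


for value in (0, 0.05, 0.5, 3, 10):
    print(value, classify_clipping(value))
```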
You can use the GetResults action to retrieve this information; you do not need to specify a result label.
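After you retrieve the results, you might want to read individual values from the XML. The following sketch parses a copy of the example output shown earlier using Python's standard library. The function name summarize_audio_analysis and the keys of the returned dictionary are hypothetical, and the code assumes that the response has no XML namespaces, as in the example; a live response might differ.

```python
import xml.etree.ElementTree as ET

# Copy of the example output from this page, used as test input.
SAMPLE_RESPONSE = """<autnresponse>
  <audiopreproc>
    <snr><mean>20</mean><audio_level>66</audio_level></snr>
    <gain><size>35</size><energy>69</energy></gain>
    <max_gain_difference>0</max_gain_difference>
    <clipping>
      <assessment>no</assessment>
      <percent_frames>0</percent_frames>
    </clipping>
    <categories>
      <speech_percent>77.3667</speech_percent>
      <silence_percent>7.45</silence_percent>
      <noise_music_percent>15.9</noise_music_percent>
    </categories>
  </audiopreproc>
  <resultDeleted>False</resultDeleted>
</autnresponse>"""


def summarize_audio_analysis(xml_text):
    """Extract a few quality measures from an AudioAnalysis result.

    Hypothetical helper; assumes element names match the example
    output and that no XML namespaces are present.
    """
    root = ET.fromstring(xml_text)
    pre = root.find("audiopreproc")
    if pre is None:
        raise ValueError("no <audiopreproc> element in response")
    return {
        "snr_mean": float(pre.findtext("snr/mean")),
        "max_gain_difference": float(pre.findtext("max_gain_difference")),
        "clipping": pre.findtext("clipping/assessment"),
        "clipping_percent_frames": float(pre.findtext("clipping/percent_frames")),
        "speech_percent": float(pre.findtext("categories/speech_percent")),
    }


summary = summarize_audio_analysis(SAMPLE_RESPONSE)
print(summary)
```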
The AudioAnalysis task also produces an additional audio classification .ctm file. By default, this has the same name as the task token. You can use the GetResults action with the label parameter set to class to retrieve this file.