Detect Indexed Clips in Audio

IDOL Speech Server can search an audio file or stream for clips that are present in an AFP database. In both cases, you use the AfpMatch task.

For more details about this standard task, see the IDOL Speech Server Reference.

To search audio for known clips

Send an AddTask action with the Type parameter set to AfpMatch. Set the File parameter to the audio file or stream to search, the Pack parameter to the AFP database that contains the known clips, the PackDir parameter to the directory that contains the database, and the Out parameter to the file in which to write the results.

For example:

http://localhost:13000/action=AddTask&Type=AfpMatch&File=C:\Data\Sample.wav&Pack=Adverts&PackDir=C:\resources&Out=SearchResults.ctm

This action uses port 13000 to instruct IDOL Speech Server, which is located on the local machine, to search the Sample.wav file for sections that match audio clips in the Adverts database, and to write the results to the SearchResults.ctm file.

This action returns a token. You can use the token to check the task status and to retrieve the results.
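If you script this workflow rather than typing actions into a browser, you can send the same request programmatically. The following Python sketch is illustrative only: the host, port, and file paths are taken from the example above, and it assumes that the server also accepts the query-string form of the action URL and returns the token in an XML element whose name ends with "token". Adjust these assumptions to match your installation.

# Illustrative sketch: submit an AfpMatch task and capture the returned token.
# The query-string URL form and the token element name are assumptions, not a
# definitive client implementation.
import requests
import xml.etree.ElementTree as ET

SPEECH_SERVER = "http://localhost:13000/"

params = {
    "action": "AddTask",
    "Type": "AfpMatch",
    "File": r"C:\Data\Sample.wav",
    "Pack": "Adverts",
    "PackDir": r"C:\resources",
    "Out": "SearchResults.ctm",
}

response = requests.get(SPEECH_SERVER, params=params)
response.raise_for_status()

# Look for a token element anywhere in the response, with or without a
# namespace prefix.
root = ET.fromstring(response.text)
token = next(
    (el.text for el in root.iter() if isinstance(el.tag, str) and el.tag.lower().endswith("token")),
    None,
)
print("Task token:", token)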

Because the AFP task produces results while it processes the audio data, you can retrieve results before the task is complete.
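For example, a script can periodically request the results produced so far while the task is still running. The following sketch assumes that the GetResults action accepts Token and Format parameters as described in the IDOL Speech Server Reference; check the parameter names for your version if the request is rejected.

# Illustrative sketch: poll for partial AFP results while the task runs.
# The Token and Format parameter names are assumptions based on the standard
# GetResults action described in the IDOL Speech Server Reference.
import time
import requests

SPEECH_SERVER = "http://localhost:13000/"
token = "..."  # the token returned by the AddTask action

for _ in range(10):
    results = requests.get(
        SPEECH_SERVER,
        params={"action": "GetResults", "Token": token, "Format": "ctm"},
    )
    print(results.text)  # CTM results produced so far
    time.sleep(30)       # wait before asking again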

The results are stored in CTM format, but you can send the GetResults action to view them in either CTM or XML format. An example of an AFP result in CTM format is:

1 1 2994.75 0.84 ADVERT1 5.59 2991.32 2995.08 2990.96 14 21
1 1 2995.59 0.55 ADVERT1 4.91 2991.32 2996.00 2990.96 15 23
1 1 2996.14 0.73 ADVERT1 5.30 2991.32 2996.60 2990.96 18 28
1 1 2996.87 0.52 ADVERT1 5.54 2991.32 2996.92 2990.96 20 31
1 1 2997.39 1.11 ADVERT1 4.82 2991.32 2997.96 2990.96 21 32
1 1 2998.50 1.22 ADVERT1 4.68 2991.32 2998.80 2990.96 23 35
1 1 2999.72 1.14 ADVERT1 4.45 2991.32 3000.08 2990.96 25 39
1 1 3000.86 2.35 ADVERT1 3.88 2991.32 3002.92 2990.96 29 45

From left to right, the columns in the output data file contain:

You should view each result line as an update to an ongoing match, rather than a complete result. While IDOL Speech Server processes the audio, it might return multiple results for the same track, starting at the same point in the audio. In this case, the number of hits increases between successive results, and the current end point of the match increases.

The final result in such a sequence is the complete section match result for the specified hypothesis. In the output above, the match of the reference track ADVERT1 completes with the result:

1 1 3000.86 2.35 ADVERT1 3.88 2991.32 3002.92 2990.96 29 45

This result shows that the processed audio matched the audio for the track ADVERT1 stored in the database between 2991.32s and 3002.92s. The estimated start point of the ADVERT1 data is actually 2990.96s (which suggests that the first 0.36s of the file was not matched). The match scored 29 hits, which is 45% of the total audio fingerprint features in the database clip. The last section of audio analyzed that contributed to this match was from 3000.86s to 3003.21s.
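Because intermediate updates for the same match share the track name and match start time, a post-processing script can keep only the last (most complete) update for each match. The following sketch is a minimal example based on the column positions used in the interpretation above; columns that the example does not explain are left uninterpreted.

# Illustrative sketch: reduce incremental CTM updates to one final record per
# match, keyed on the track label and the match start time.
from collections import OrderedDict

def final_matches(ctm_lines):
    """Keep only the last update for each (track, match start) pair."""
    latest = OrderedDict()
    for line in ctm_lines:
        fields = line.split()
        if len(fields) != 11:
            continue  # skip blank or malformed lines
        label = fields[4]        # matching track, for example ADVERT1
        match_start = fields[6]  # start time of the matched section (seconds)
        # A later update for the same match overwrites the earlier, partial one.
        latest[(label, match_start)] = {
            "label": label,
            "match_start": float(match_start),
            "match_end": float(fields[7]),        # current end time of the match
            "estimated_start": float(fields[8]),  # estimated start of the clip
            "hits": int(fields[9]),
            "score_percent": float(fields[10]),
        }
    return list(latest.values())

with open("SearchResults.ctm") as results_file:
    for match in final_matches(results_file):
        print(match)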

The following example shows the final output from the previous ADVERT1 match in XML format:

<afp_record>
	<rank>1</rank>
	<label>ADVERT1</label>
	<start>2991.32</start>
	<end>3002.92</end>
	<output>3003.210</output>
	<eststart>2990.96</eststart>
	<hits>29</hits>
	<score>45</score>
	<hitrate>2.636</hitrate>
	<scorerate>0.000</scorerate>
	<scoreavg>0.000</scoreavg>
</afp_record>

The <output> tags record the time at which the final result was produced (the end point for processing the target audio, not just the matching section).

The <scorerate> and <scoreavg> tags are reserved for future functionality and always contain the value 0.000.
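If you retrieve the results in XML format, you can read the fields directly from each <afp_record> element. The following sketch assumes only the record structure shown above; because the excerpt does not show the elements that wrap the full GetResults response, the code simply looks for <afp_record> elements wherever they appear.

# Illustrative sketch: extract AFP matches from XML results, assuming the
# <afp_record> structure shown in the example above.
import xml.etree.ElementTree as ET

def parse_afp_records(xml_text):
    """Yield one dictionary per <afp_record> element in the XML results."""
    root = ET.fromstring(xml_text)
    for record in root.iter("afp_record"):
        yield {
            "rank": int(record.findtext("rank")),
            "label": record.findtext("label"),
            "start": float(record.findtext("start")),
            "end": float(record.findtext("end")),
            "output": float(record.findtext("output")),
            "estimated_start": float(record.findtext("eststart")),
            "hits": int(record.findtext("hits")),
            "score": float(record.findtext("score")),
        }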

