Audio fingerprint (AFP) identification is the process of analyzing audio data to identify occurrences of known audio clips, such as specific pieces of music or particular adverts. This process is useful for detecting when specific adverts occur, checking for copyrighted music being played, picking out commercial jingles, and so on.
The first step is to build a database of the audio clips to identify. Each clip is represented in the database by a sequence of distinctive features. Speech Server can analyze incoming audio to detect occurrences of the stored clips. Speech Server compares the target audio to the database clips and identifies sections that closely resemble the database clips.
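As a rough illustration of the build-then-compare idea described above, the following Python sketch indexes short runs of features from each known clip and then looks for the same runs in a target stream. It is a generic toy example, not Speech Server's algorithm or API: the integer "features", the function names (`build_database`, `find_matches`), and the `NGRAM` and `min_fraction` settings are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

NGRAM = 3  # number of consecutive features grouped together (illustrative choice)


def ngrams(features, n=NGRAM):
    """Yield (offset, n-gram tuple) pairs for a feature sequence."""
    for i in range(len(features) - n + 1):
        yield i, tuple(features[i:i + n])


def build_database(clips):
    """Index every n-gram of every clip: n-gram -> [(clip_name, offset), ...]."""
    index = defaultdict(list)
    for name, features in clips.items():
        for offset, gram in ngrams(features):
            index[gram].append((name, offset))
    return index


def find_matches(index, clips, target, min_fraction=0.8):
    """Return (clip_name, start_offset_in_target) for clips found in target."""
    votes = Counter()
    for t_offset, gram in ngrams(target):
        for name, c_offset in index.get(gram, ()):
            # Matching n-grams from one occurrence all agree on the same start.
            votes[(name, t_offset - c_offset)] += 1
    matches = []
    for (name, start), count in votes.items():
        total = len(clips[name]) - NGRAM + 1  # n-grams in the stored clip
        if total > 0 and count / total >= min_fraction:
            matches.append((name, start))
    return sorted(matches, key=lambda m: m[1])


# Hypothetical feature sequences standing in for real audio features.
clips = {
    "jingle": [5, 1, 4, 4, 2, 7],
    "advert": [9, 9, 3, 0, 8, 6, 1],
}
target = [2, 2, 5, 1, 4, 4, 2, 7, 0, 3, 9, 9, 3, 0, 8, 6, 1, 5]

index = build_database(clips)
matches = find_matches(index, clips, target)
print(matches)  # [('jingle', 2), ('advert', 10)]
```

In this sketch a clip is reported only when most of its indexed runs line up at a single start offset in the target, which is one simple way to express "sections that closely resemble the database clips".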
Speech Server supports two approaches to performing audio fingerprinting:
A landmark-based approach (tasks AfpMatchWav, AfpMatchStream, and so on). This approach is highly scalable and very fast.
A template-based approach (tasks AfptMatchWav, AfptMatchStream, and so on).
The following section covers only the first approach in detail. However, for almost all the landmark-based audio fingerprinting tasks (with the exception of the AfpDatabaseOptimize task), equivalent tasks exist that use the template-based approach. The names of these tasks are almost the same, but begin with Afpt instead of Afp. For more information on the template-based tasks (that is, those tasks that start with Afpt), see the Speech Server Reference.