Standard Tasks

This table describes the standard tasks that are already defined in the tasks configuration file. You can run any of these tasks straight out of the box.

Task Description
AfpAddTrack Adds a new audio track to an audio fingerprint database.
AfpDatabaseInfo Returns a list of all tracks that are currently stored in the specified database.
AfpDatabaseOptimize Optimizes the internal indexing of the specified database. This task permanently removes files that have been tagged for deletion using the AfpRemoveTrack task, and optimizes lookup functions for newly added tracks.
AfpMatch Receives audio data in an audio file or stream, and searches it for any indexed audio sections.
AfpRemoveTrack Removes specified tracks from an audio fingerprint database.
AudioAnalysis Runs all the audio preprocessing tasks that are supported by the audiopreproc module in a single task.
AudioSecurity Detects and labels segments of audio that contain alarms, screams, breaking glass, or gunshots.
ClippingDetection Analyzes an audio file to detect audio clipping.
ClusterSpeech Clusters wide-band speech into speaker segments.
ClusterSpeechTel Clusters telephony speech into speaker segments.
ClusterSpeechToTextTel Clusters two speakers in a phone call, and uses the resulting speaker clusters to improve speech-to-text performance slightly by using speaker-sided acoustic normalization. Any telephony artifacts such as dial tones or DTMF tones are included, interspersed with the recognized words.
CombineFMD Combines several phoneme track files, which can then be used for phrase search.
CreateFMD Creates a phoneme time track (.fmd) file from a single audio file.
DialToneIdentification Detects and identifies dial tones in audio.
IvSpkId Performs iVector-based speaker identification on an audio file or stream.
IvSpkIdDevel Processes one or more speaker ID feature files to generate scores for tuning iVector-based score thresholds.
IvSpkIdDevelAudio Processes an audio file or stream to generate scores for tuning iVector thresholds.
IvSpkIdDevelFinal Calculates the iVector score threshold based on one or more development files, and generates a new set of iVectors with the thresholds.
IvSpkIdFeature Uses an audio file that contains sample speech from one person to create speaker ID feature files for use in iVector-based template training and template score threshold development.

IvSpkIdEditThresh Modifies the threshold of a template file or a single template stored in an iVector template set file.
IvSpkIdInfo Produces a log file that lists the contents of the specified iVector template file or template set file.
IvSpkIdSetAdd Adds one or more iVector speaker templates to a single iVector template set file.

IvSpkIdSetDelete Removes a speaker template from an iVector template set file.

IvSpkIdTrain Takes one or more speaker ID feature files containing speech data from the speaker to be trained, and creates a new iVector speaker template.
IvSpkIdTrainAudio Takes an audio file or stream containing speech data from the speaker to be trained, and creates a new iVector speaker template file.

LangId Receives audio data from a file or stream, converts it into language identification features, and identifies the languages.
LangIdFeature Converts audio files in the relevant language into language identification feature (.lif) files, which are required for training classifiers.
LangIdOptimize Optimizes the balance between language classifiers. After training, some classifiers might be stronger than others because of properties of the training material and the languages in question. The optimization process weights the language models so that weaker languages gain accuracy without compromising the accuracy of stronger language models. The result is more consistent performance across languages.
LangIdTrain Reads in a set of language ID feature files created from audio representing a single language (using the LangIdFeature task), and uses this data to train a new language classifier.
LanguageModelBuild Builds a new language model from a set of text files.
LMListVocab Lists the most common words in the specified language model.
LMLookUp Verifies whether a specified word is present in the vocabulary of a particular language model, and, if it is present, how frequently it occurs.
LMPerplexity Calculates the perplexity of a sample text file with respect to a specified language model.
PhraseSearch Searches for a specified phrase or phrases in an audio file.
PunctuateCtm Adds punctuation to any .ctm file.
Scorer Scores a recognition transcript (such as one generated by the SpeechToText task) against a reference transcript file.
SearchFMD Searches a phoneme track file for one or more specified phrases.
SegmentText Inserts whitespace between words in a text file (for languages that do not separate words with whitespace).
SNRCalculation Calculates SNR levels across an audio file.
SpeechSilClassification Segments an audio file into sections of speech, non-speech, and music.
SpeechToText Converts an audio file or stream into a text transcript.
SpeechToTextFilter Converts an audio file or stream into a text transcript and categorizes the audio so that you can remove any sections consisting of music or noise.
SpeechToTextTelephony Transcribes a telephony audio file or stream, including dial tones and DTMF tones.
TextNorm Takes a raw text transcription file and produces a normalized form (by removing punctuation, rewriting numbers as words, altering word cases, and so on).
TranscriptAlign Aligns an existing transcript with its audio recording by assigning a time location to each word in the transcript. You can use this task to align subtitles with audio or video files.
TranscriptCheck Checks how well a text transcript matches the audio data, and identifies large missing or erroneous sections.
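Speech recognition transcripts are commonly scored by word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognized one, divided by the number of words in the reference. The following Python sketch shows that standard calculation as an illustration only. It is not Speech Server's implementation, the function name `word_error_rate` is hypothetical, and it assumes both transcripts are already normalized (for example, with TextNorm) and separated by whitespace. Whether Scorer reports exactly this figure is covered in the IDOL Speech Server Reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Hypothetical helper for illustration; assumes whitespace-separated,
    already normalized transcripts.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[len(ref)][len(hyp)] / len(ref)
```

For example, scoring "the cat sat on mat" against the reference "the cat sat on the mat" finds one deletion in six reference words, a WER of about 0.167.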

For details about each task, including the required action and configuration parameters, see the IDOL Speech Server Reference.
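The LMPerplexity task measures how well a language model predicts a sample text. Perplexity is conventionally defined as the inverse geometric mean of the probabilities the model assigns to each of the N words in the text, PP = exp(-(1/N) * sum over i of ln p(w_i | h_i)), where h_i is the history preceding word w_i. Lower values mean the text is a better fit for the model. The Python sketch below computes that quantity from a list of per-word probabilities. It assumes you already have those probabilities; how Speech Server derives them (for example, how it handles out-of-vocabulary words) is not covered here, and the function name `perplexity` is hypothetical.

```python
import math


def perplexity(word_probs: list) -> float:
    """Perplexity from the model probability assigned to each word of a text.

    Hypothetical helper for illustration: computes exp of the average
    negative natural-log probability per word.
    """
    if not word_probs:
        raise ValueError("need at least one word probability")
    if any(p <= 0.0 or p > 1.0 for p in word_probs):
        raise ValueError("probabilities must be in the range (0, 1]")

    # Average negative log-likelihood per word, in nats.
    avg_nll = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_nll)
```

A model that assigns probability 0.25 to every word has a perplexity of 4, as if it were choosing uniformly among four words at each step.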

Deprecated Tasks

The following table describes tasks that are included in the IDOL Speech Server configuration file for backward compatibility. These tasks are deprecated, and might be removed in a future release.

Task Description
AfpAddTrackStream Adds a new audio track to an audio fingerprint database, receiving the audio data as a stream, and converting to AFP features before indexing. Use AfpAddTrack.
AfpAddTrackWav Adds a new audio track to an audio fingerprint database, reading the data from an audio file, and converting to AFP features before indexing. Use AfpAddTrack.
AfpMatchStream Receives audio data as a binary stream, and searches it for any indexed audio sections. Use AfpMatch.
AfpMatchWav Reads in data from an audio file, and searches it for any indexed audio sections. Use AfpMatch.
AfptAddTrackStream Performs the same task as AfpAddTrackStream, but uses a template database (fptdb), which improves robustness to audio mismatches at the cost of scalability. Use AfpAddTrack.
AfptAddTrackWav Performs the same task as AfpAddTrackWav, but uses a template database (fptdb), which improves robustness to audio mismatches at the cost of scalability. Use AfpAddTrack.
AfptDatabaseInfo Performs the same task as AfpDatabaseInfo, but uses a template database (fptdb), which improves robustness to audio mismatches at the cost of scalability. Use AfpDatabaseInfo.
AfptMatchStream Performs the same task as AfpMatchStream, but uses template-based matching as opposed to landmarks, which improves robustness to audio mismatches at the cost of scalability. Use AfpMatch.
AfptMatchWav Performs the same task as AfpMatchWav, but uses template-based matching as opposed to landmarks, which improves robustness to audio mismatches at the cost of scalability. Use AfpMatch.
AfptRemoveTrack Performs the same task as AfpRemoveTrack, but uses a template database (fptdb), which improves robustness to audio mismatches at the cost of scalability. Use AfpRemoveTrack.
AmTrain Presents training audio and transcription data to the acoustic model training process, and creates accumulator files that are used to produce a final adapted acoustic model.
AmTrainFinal Produces the adapted acoustic model, given a set of accumulator files created by the AmTrain task.
DataObfuscation Prepares training data with any sensitive or classified information concealed.
IvSpkIdDevelStream Takes a single audio stream, along with the name of the speaker the stream is associated with, and generates scores for tuning iVector thresholds. Use IvSpkIdDevelAudio.
IvSpkIdDevelWav Processes a single audio file to generate scores for tuning iVector thresholds. Use IvSpkIdDevelAudio.
IvSpkIdEvalStream Runs iVector-based speaker identification on an audio stream to find any sections where the trained speakers are present. Use IvSpkId.

IvSpkIdEvalWav Performs iVector-based speaker identification on a single audio file. Use IvSpkId.
IvSpkIdSetEditThresh Modifies the threshold of a single template stored in an iVector template set file. Use IvSpkIdEditThresh.
IvSpkIdSetInfo Produces a log file that lists the contents of the specified iVector template set file. Use IvSpkIdInfo.
IvSpkIdTmpEditThresh Modifies the threshold of a single iVector template file. Use IvSpkIdEditThresh.

IvSpkIdTmpInfo Produces a log file that lists the contents of the specified iVector template file. Use IvSpkIdInfo.
IvSpkIdTrainStream Takes a single audio stream containing speech data from the speaker to be trained, and creates a new iVector speaker template file. Use IvSpkIdTrainAudio.
IvSpkIdTrainWav Takes a single audio file containing speech data from the speaker to be trained, and creates a new iVector speaker template file. Use IvSpkIdTrainAudio.

LangIdBndLif Reads in language identification features from file and determines boundaries in the feature sequence where the language changes. Returns the language identification results between boundaries. Use LangId.
LangIdBndStream Receives audio data as a binary stream, converts the audio into language ID features, and determines boundaries where the language changes. Returns the language identification results between boundaries. Use LangId.
LangIdBndWav Reads in data from an audio file, converts it into language ID features, and determines boundaries where the language changes. Returns the language identification results between boundaries. Use LangId.
LangIdCumLif Reads in language ID features from file, and returns the running language identification score at periodic intervals (that is, the score for all the input data from the start to the current point). Use LangId.
LangIdCumStream Receives audio data as a binary stream, and converts it into language ID features. Returns the running language identification score at periodic intervals (that is, the score for all the input data from the start to the current point). Use LangId.
LangIdCumWav Reads in data from an audio file, and converts it into language ID features. Returns the running language identification score at periodic intervals (that is, the score for all the input data from the start to the current point). Use LangId.
LangIdSegLif Reads in language ID features from file, processes the data in fixed-sized chunks, and returns the language identification results for each chunk. Use LangId.
LangIdSegStream Receives audio data as a binary stream, and converts it into language ID features. Processes the data in fixed-sized chunks, and returns the language identification results for each chunk. Use LangId.
LangIdSegWav Reads in data from an audio file and converts it into language ID features. Processes the data in fixed-sized chunks and returns the language identification results for each chunk. Use LangId.
SegmentWav Attempts to segment audio into sections by speaker even if no trained speakers exist in the system. Use ClusterSpeech.
SpkIdDevel Processes speaker ID feature files to generate scores for tuning model thresholds. Use IvSpkIdDevel.
SpkIdDevelFinal Estimates the thresholds for a set of speaker templates. Use IvSpkIdDevelFinal.
SpkIdDevelStream Creates or updates a development (.atd) file for an audio stream. Use IvSpkIdDevelAudio.
SpkIdDevelWav Creates or updates a development (.atd) file for an audio file. Use IvSpkIdDevelAudio.
SpkIdEvalStream Analyzes an audio stream to identify any sections where the trained speakers are present. Use IvSpkId.
SpkIdEvalWav Analyzes an audio file to identify any sections where the trained speakers are present. Use IvSpkId.
SpkIdFeature Creates a speaker ID feature file. Use IvSpkIdFeature.
SpkIdSetAdd Takes one or more audio template files, and adds them to an audio template set file. Use IvSpkIdSetAdd.
SpkIdSetDelete Removes a template from an audio template set file. Use IvSpkIdSetDelete.
SpkIdSetEditThresh Modifies the threshold of a single template in an audio template set file. Use IvSpkIdEditThresh.
SpkIdSetInfo Retrieves information on an audio template set file. Use IvSpkIdInfo.
SpkIdTmpEditThresh Modifies the threshold of a single template. Use IvSpkIdEditThresh.
SpkIdTmpInfo Retrieves information on an audio template file. Use IvSpkIdInfo.
SpkIdTrain Uses one or more feature files to train a speaker template. Use IvSpkIdTrain.
SpkIdTrainStream Takes an audio stream containing speech data from the speaker to be trained, and creates a new speaker template file. Use IvSpkIdTrainAudio.
SpkIdTrainWav Takes a single audio file containing speech data from the speaker to be trained, and creates a new speaker template file. Use IvSpkIdTrainAudio.
StreamToText Converts live audio into a text transcript. Use SpeechToText.
StreamToTextMusicFilter Converts live audio into a text transcript and categorizes the audio so that you can remove any sections consisting of music or noise. Use SpeechToTextFilter.
TelWavToText Transcribes a telephony audio file, including dial tones and DTMF tones. Use SpeechToTextTelephony.
WavPhraseSearch Searches for a specified phrase or phrases in an audio file. Use PhraseSearch.
WavToFMD Creates a phoneme time track (.fmd) file from a single audio file. Use CreateFMD.
WavToPlh Reads data from an audio file and produces an audio feature (.plh) file, such as those used in the acoustic model adaptation process (the AmTrain task).
WavToText Converts an audio file into a text transcript. Use SpeechToText.

NOTE:

To submit audio data to WavToText as a binary data block rather than as a file, send the task data without specifying a .wav file.
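If existing configurations or client code still refer to deprecated task names, a lookup table makes migration mechanical. The following Python sketch transcribes the replacements from the deprecated tasks table above. The names `DEPRECATED_TASKS` and `replacement_for` are hypothetical helpers, not part of IDOL Speech Server, and the sketch does not account for parameter differences between a deprecated task and its replacement (for example, separate Wav and Stream variants map to one task that accepts either a file or a stream).

```python
from typing import Dict, Optional

# Hypothetical lookup: deprecated task -> recommended replacement, transcribed
# from the "Deprecated Tasks" table. None means the table names no replacement.
DEPRECATED_TASKS: Dict[str, Optional[str]] = {
    # Audio fingerprinting
    "AfpAddTrackStream": "AfpAddTrack", "AfpAddTrackWav": "AfpAddTrack",
    "AfptAddTrackStream": "AfpAddTrack", "AfptAddTrackWav": "AfpAddTrack",
    "AfpMatchStream": "AfpMatch", "AfpMatchWav": "AfpMatch",
    "AfptMatchStream": "AfpMatch", "AfptMatchWav": "AfpMatch",
    "AfptDatabaseInfo": "AfpDatabaseInfo", "AfptRemoveTrack": "AfpRemoveTrack",
    # No replacement listed in the table
    "AmTrain": None, "AmTrainFinal": None, "DataObfuscation": None, "WavToPlh": None,
    # iVector speaker ID
    "IvSpkIdDevelStream": "IvSpkIdDevelAudio", "IvSpkIdDevelWav": "IvSpkIdDevelAudio",
    "IvSpkIdEvalStream": "IvSpkId", "IvSpkIdEvalWav": "IvSpkId",
    "IvSpkIdSetEditThresh": "IvSpkIdEditThresh", "IvSpkIdTmpEditThresh": "IvSpkIdEditThresh",
    "IvSpkIdSetInfo": "IvSpkIdInfo", "IvSpkIdTmpInfo": "IvSpkIdInfo",
    "IvSpkIdTrainStream": "IvSpkIdTrainAudio", "IvSpkIdTrainWav": "IvSpkIdTrainAudio",
    # Language ID
    "LangIdBndLif": "LangId", "LangIdBndStream": "LangId", "LangIdBndWav": "LangId",
    "LangIdCumLif": "LangId", "LangIdCumStream": "LangId", "LangIdCumWav": "LangId",
    "LangIdSegLif": "LangId", "LangIdSegStream": "LangId", "LangIdSegWav": "LangId",
    # Older (non-iVector) speaker ID and speaker segmentation
    "SegmentWav": "ClusterSpeech",
    "SpkIdDevel": "IvSpkIdDevel", "SpkIdDevelFinal": "IvSpkIdDevelFinal",
    "SpkIdDevelStream": "IvSpkIdDevelAudio", "SpkIdDevelWav": "IvSpkIdDevelAudio",
    "SpkIdEvalStream": "IvSpkId", "SpkIdEvalWav": "IvSpkId",
    "SpkIdFeature": "IvSpkIdFeature",
    "SpkIdSetAdd": "IvSpkIdSetAdd", "SpkIdSetDelete": "IvSpkIdSetDelete",
    "SpkIdSetEditThresh": "IvSpkIdEditThresh", "SpkIdTmpEditThresh": "IvSpkIdEditThresh",
    "SpkIdSetInfo": "IvSpkIdInfo", "SpkIdTmpInfo": "IvSpkIdInfo",
    "SpkIdTrain": "IvSpkIdTrain",
    "SpkIdTrainStream": "IvSpkIdTrainAudio", "SpkIdTrainWav": "IvSpkIdTrainAudio",
    # Speech-to-text, phrase search, and phoneme tracks
    "StreamToText": "SpeechToText", "WavToText": "SpeechToText",
    "StreamToTextMusicFilter": "SpeechToTextFilter",
    "TelWavToText": "SpeechToTextTelephony",
    "WavPhraseSearch": "PhraseSearch", "WavToFMD": "CreateFMD",
}


def replacement_for(task: str) -> Optional[str]:
    """Return the task to use instead of `task`.

    Non-deprecated task names are returned unchanged; deprecated tasks with
    no listed replacement return None so the caller can flag them for review.
    """
    if task not in DEPRECATED_TASKS:
        return task
    return DEPRECATED_TASKS[task]
```

For example, `replacement_for("WavToText")` returns `"SpeechToText"`, while `replacement_for("AmTrain")` returns `None` because the table lists no replacement for acoustic model training.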

