Standard Tasks

The following tasks are available out of the box. For deprecated tasks, see Deprecated Tasks.

Task Description
AfpAddTrack Adds a new audio track to an Audio Fingerprint database.
AfpDatabaseInfo Returns a list of all tracks that are currently stored within the specified Audio Fingerprint database.
AfpDatabaseOptimize Optimizes the internal indexing of the specified Audio Fingerprint database. This task permanently removes files that have been tagged for deletion using the AfpRemoveTrack task, and optimizes lookup functions for newly added tracks.
AfpMatch Receives audio data in a file or stream and searches it for any sections that match audio indexed in an AFP database.
AfpRemoveTrack Removes specified audio tracks from an AFP database.
AudioAnalysis Runs all the audio preprocessing tasks that are supported by the audiopreproc module in a single task.
AudioSecurity Detects and labels segments of audio containing alarms, screams, breaking glass, or gunshots.
ClippingDetection Detects clipping in audio data.
CombineFMD Combines several phoneme time track files into a single file, which can then be used for phonetic phrase match.
ClusterSpeech Clusters wide-band speech into speaker segments.
ClusterSpeechTel Clusters telephony speech into speaker segments.
ClusterSpeechToTextTel Clusters two speakers in a phone call, and uses the resulting speaker clusters to improve speech-to-text performance slightly by using speaker-sided acoustic normalization. Any telephony artifacts such as dial tones or DTMF tones are included, interspersed with the recognized words.
CreateFMD Creates a phoneme time track file from a single audio file.
DialToneIdentification Detects and identifies DTMF dial tones in audio data.
IvSpkId Runs iVector-based speaker identification on an audio file or stream.
IvSpkIdDevel Processes one or more speaker ID feature files to generate scores for tuning iVector-based score thresholds.
IvSpkIdDevelAudio Processes an audio file or stream to generate scores for tuning iVector thresholds.
IvSpkIdDevelFinal Calculates the iVector score threshold based on one or more development files, and generates a new set of iVectors with the thresholds.
IvSpkIdEditThresh Modifies the threshold of an iVector template file, or of a single template stored in an iVector template set file.
IvSpkIdFeature Uses an audio file that contains sample speech from one person to create speaker ID feature files for use in iVector-based template training and template score threshold development.
IvSpkIdInfo Produces a log file that lists the contents of the specified iVector template file or template set file.
IvSpkIdSetAdd Adds a number of iVector speaker templates to a single speaker set file.
IvSpkIdSetDelete Removes a speaker template from an iVector template set file.
IvSpkIdTrain Takes one or more speaker ID feature files containing speech data from the speaker to be trained, and creates a new iVector speaker template.
IvSpkIdTrainAudio Takes an audio file or stream containing speech data from the speaker to be trained, and creates a new iVector speaker template file.
LangId Receives audio data from a file or binary stream, converts it into language identification features, and identifies the languages in the audio.
LangIdFeature Converts audio files containing the relevant language into language identification feature (.lif) files, which are required for training language classifiers.
LangIdOptimize Optimizes the balance between language classifiers in a classifier set.
LangIdTrain Reads in a set of language identification feature files created from audio representing a single language (using the LangIdFeature task), and uses this data to train a new language classifier.
LanguageModelBuild Builds a new language model from a set of text files.
LmListVocab Lists the most common words in the specified language model.
LmLookUp Verifies whether a specified word is present in the vocabulary of a particular language model and, if so, how frequently the word occurs.
LmPerplexity Analyzes the perplexity of a sample text file, when given a specific language model.
PhraseSearch Searches for a specified phrase or phrases in an audio file or stream.
PunctuateCtm Adds punctuation to a .ctm file.
Scorer Scores a speech recognition transcript (such as that generated by the SpeechToText task), when given a reference transcript file.
SearchFMD Searches for specified phrases in a phoneme time track file.
SegmentText Inserts whitespace between words in a text file (for languages that do not separate words with whitespace).
SNRCalculation Analyzes the signal-to-noise levels across an audio file.
SpeechSilClassification Segments audio by content type: speech, non-speech, or music.
SpeechToText Converts an audio file or stream into a text transcript.
SpeechToTextFilter Converts an audio file or stream into a text transcript and categorizes the audio so that you can remove any sections consisting of music or noise.
SpeechToTextTelephony Transcribes a telephony audio file or stream, including dial tones and DTMF dial tones.
TextNorm Takes a raw text transcription file and produces a normalized form (by removing punctuation, rewriting numbers as words, altering word cases, and so on).
TranscriptAlign Assigns a time location to each word in an existing transcript of an audio recording. This task is suitable for aligning subtitles to audio or video files.
TranscriptCheck Checks how well a text transcript matches the audio data, identifying large missing or erroneous sections.
