Install Speech-to-Text Language Packs
To run speech-to-text or speaker clustering, you must install a language pack. There are more than 60 language packs available for Media Server. Language packs can contain hundreds of megabytes of data, so they are not included in the Media Server installation and must be downloaded separately.
TIP: A language pack supports a single language and a single audio sample rate. For example, there is a language pack for processing US English (16kHz) and another for US English (8kHz). The 8kHz language packs are for processing telephony audio. For a list of available language packs, see Speech Analysis Supported Languages.
To install a language pack
- Download a language pack (such as ENUK-23.2.0.zip) from the support portal. Unless you are using only the legacy speech-to-text models, you must also download the common speech-to-text resources (SpeechToText-Common-23.2.0.zip).
- Extract the contents of the language pack into the folder staticdata/speechtotext/, where staticdata is the folder specified by the StaticDataDirectory parameter in the [Paths] section of the Media Server configuration file. The default value of this parameter is the staticdata folder in the Media Server installation directory.
- Unless you are using only the legacy speech-to-text models, extract the common resources into the folder staticdata/speechtotext/ so that there is a folder named Common containing the common resources.
- To confirm that the language pack was installed successfully, start Media Server and run the action ListSpeechLanguagePacks. The response lists each language pack that is available, along with its supported sample rate.
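
The extraction and verification steps above could be scripted roughly as follows. This is a minimal sketch, not part of Media Server: the helper names (extract_pack, has_common_resources, action_url) are hypothetical, and it assumes that the archives unpack directly into the layout Media Server expects and that actions are sent over HTTP in the form http://host:port/action=ActionName. Check both assumptions against your own installation.

```python
import zipfile
from pathlib import Path


def extract_pack(zip_path, staticdata_dir):
    """Extract one archive into <staticdata_dir>/speechtotext and return that folder.

    staticdata_dir is assumed to be the folder named by StaticDataDirectory
    in the [Paths] section of the Media Server configuration file.
    """
    target = Path(staticdata_dir) / "speechtotext"
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


def has_common_resources(staticdata_dir):
    """Return True if speechtotext/Common exists as a folder."""
    return (Path(staticdata_dir) / "speechtotext" / "Common").is_dir()


def action_url(host, port, action="ListSpeechLanguagePacks"):
    """Build the URL for an action request (assumed URL form; host and port
    depend on your deployment)."""
    return f"http://{host}:{port}/action={action}"
```

You might call extract_pack once for the language pack archive and once for the common resources archive, passing your own staticdata path, and then use has_common_resources to check that the Common folder is in place before you start Media Server. The string returned by action_url can be opened in a browser or sent with any HTTP client once the server is running.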