Install Speech-to-Text Language Packs

To run speech-to-text or speaker clustering, you must install a language pack. There are more than 60 language packs available for Media Server. Language packs can contain hundreds of megabytes of data, so they are not included in the Media Server installation and must be downloaded separately.

TIP: A language pack supports a single language and a single audio sample rate. For example, there is a language pack for processing US English (16 kHz) and another for US English (8 kHz). The 8 kHz language packs are for processing telephony audio. For a list of available language packs, see Speech Analysis Supported Languages.

To install a language pack

  1. Download a language pack (such as ENUK-23.2.0.zip) from the support portal. Unless you are using only the legacy speech-to-text models, you must also download the common speech-to-text resources (SpeechToText-Common-23.2.0.zip).
  2. Extract the contents of the language pack into the folder staticdata/speechtotext/, where staticdata is the folder specified by the StaticDataDirectory parameter in the [Paths] section of the Media Server configuration file. The default value of this parameter is the staticdata folder in the Media Server installation directory.
  3. Unless you are using only the legacy speech-to-text models, extract the common resources into the same folder, staticdata/speechtotext/, so that the resources are in a subfolder named Common (staticdata/speechtotext/Common).
  4. To confirm that the language pack was installed successfully, start Media Server and run the action ListSpeechLanguagePacks. The response lists each language pack that is available, along with its supported sample rate.
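
The extraction steps above can also be scripted. The following is a minimal sketch in Python, not an official installer. The function names install_language_pack and list_language_packs_url are hypothetical helpers introduced for illustration. The sketch assumes that the common resources archive unpacks into a top-level folder named Common (it raises an error if that folder does not appear), and that Media Server accepts actions over HTTP on localhost port 14000; substitute the host and port from your own configuration.

```python
import zipfile
from pathlib import Path


def install_language_pack(static_data_dir, pack_zip, common_zip=None):
    """Extract a language pack, and optionally the common resources,
    into <static_data_dir>/speechtotext. Returns the target folder.

    Hypothetical helper: static_data_dir should be the folder set by
    StaticDataDirectory in the [Paths] section of the configuration file.
    """
    target = Path(static_data_dir) / "speechtotext"
    target.mkdir(parents=True, exist_ok=True)

    archives = [pack_zip] if common_zip is None else [pack_zip, common_zip]
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)

    # Assumption: the common resources archive contains a top-level
    # folder named Common. Fail loudly if the layout is different.
    if common_zip is not None and not (target / "Common").is_dir():
        raise RuntimeError(f"No Common folder found in {target}")

    return target


def list_language_packs_url(host="localhost", port=14000):
    """Build the URL for the ListSpeechLanguagePacks action.

    Assumption: host and port are placeholders for your Media Server
    ACI host and port.
    """
    return f"http://{host}:{port}/action=ListSpeechLanguagePacks"
```

For example, calling install_language_pack("staticdata", "ENUK-23.2.0.zip", "SpeechToText-Common-23.2.0.zip") extracts both archives into staticdata/speechtotext/. After you start Media Server, open the URL returned by list_language_packs_url() in a browser to check that the language pack is listed.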