Supported Languages and Common Encodings

This section lists the languages that the Content component supports, and the most common encodings for each language.

Content stores all data internally as UTF-8. The following tables describe how to handle input in other encodings.
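
To illustrate why the source encoding has to be declared, the following Python sketch decodes bytes from a legacy encoding and re-encodes them as UTF-8. It is not how Content converts data internally. The codec names (`cp1252`, `cp1251`, `koi8_r`, `iso8859_5`) are Python's own, and both the `PYTHON_CODECS` dictionary that pairs them with Encodings values from this section and the `to_utf8` helper are assumptions made for illustration only.

```python
# Hypothetical mapping from Encodings parameter values (as listed in this
# section) to Python codec names. The pairing is an assumption for
# illustration; Content does not use Python codecs.
PYTHON_CODECS = {
    "ASCII": "cp1252",            # Windows-CP1252/ISO-8859-1
    "CYRILLIC": "cp1251",         # Windows-CP1251
    "CYRILLIC_KOI8": "koi8_r",    # KOI8-R
    "CYRILLIC_ISO": "iso8859_5",  # ISO-8859-5
    "UTF8": "utf-8",              # UTF-8
}


def to_utf8(raw: bytes, encodings_value: str) -> bytes:
    """Decode raw bytes with the codec for encodings_value and return UTF-8."""
    codec = PYTHON_CODECS[encodings_value]
    return raw.decode(codec).encode("utf-8")


# The same Russian word stored in three legacy encodings.
word = "привет"
values = ("CYRILLIC", "CYRILLIC_KOI8", "CYRILLIC_ISO")
utf8_results = {to_utf8(word.encode(PYTHON_CODECS[v]), v) for v in values}

# All three inputs convert to the same UTF-8 byte sequence.
print(len(utf8_results))  # 1
```

If the wrong value is chosen (for example, KOI8-R bytes decoded as Windows-CP1251), decoding may still succeed but produce the wrong characters, which is why each table pairs an input encoding with a specific parameter value.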

Acehnese

Script: UTF8
[MyLanguage] section name: ACEHNESE
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Afrikaans

Script: Latin
[MyLanguage] section name: AFRIKAANS
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Albanian

Script: Latin
[MyLanguage] section name: ALBANIAN
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Amharic

Script: UTF8
[MyLanguage] section name: AMHARIC
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Arabic

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Arabic
[MyLanguage] section name: ARABIC
For encoding: Set Encodings parameter to:
Windows-CP1256 ARABIC
ISO-8859-6 ARABIC_ISO
UTF-8 UTF8
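
The stemming note for Arabic above can be sketched as a configuration fragment. The following Python example uses the standard `configparser` module to render one. The section name `ARABIC` and the `Stemming` parameter come from this section; the helper name `render_language_section` is hypothetical, and the assumption that `Stemming` sits directly in the language section should be checked against your own Content configuration file.

```python
import configparser
import io


def render_language_section(section_name: str, stemming: bool) -> str:
    """Render a language section with a Stemming setting as INI text.

    Hypothetical helper: the section and parameter names come from this
    document; the surrounding file layout is assumed.
    """
    config = configparser.ConfigParser()
    config.optionxform = str  # keep parameter names case-sensitive
    config[section_name] = {"Stemming": "True" if stemming else "False"}
    buffer = io.StringIO()
    config.write(buffer, space_around_delimiters=False)
    return buffer.getvalue().strip()


print(render_language_section("ARABIC", stemming=False))
# [ARABIC]
# Stemming=False
```

The same call with a different section name (for example `CATALAN` or `CZECH`) would produce the equivalent fragment for any other language whose entry mentions a default stemming algorithm.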

Armenian

Script: UTF8
[MyLanguage] section name: ARMENIAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Azeri

Script: Cyrillic
[MyLanguage] section name: AZERI
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Basque

Script: Latin
[MyLanguage] section name: BASQUE
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Belorussian

Script: Cyrillic
[MyLanguage] section name: BELORUSSIAN
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Bengali

Script: UTF8
[MyLanguage] section name: BENGALI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Berber

Script: UTF8
[MyLanguage] section name: BERBER
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Bihari

Script: UTF8
[MyLanguage] section name: BIHARI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Bikol

Script: UTF8
[MyLanguage] section name: BIKOL
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Bishnupriya

Script: UTF8
[MyLanguage] section name: BISHNUPRIYA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Bosnian

Script: Latin
[MyLanguage] section name: BOSNIAN
For encoding: Set Encodings parameter to:
Windows-CP1250 EASTERNEUROPEAN
ISO-8859-2 EASTERNEUROPEAN_ISO
UTF-8 UTF8

Breton

Script: Latin
[MyLanguage] section name: BRETON
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Bulgarian

Script: Cyrillic
[MyLanguage] section name: BULGARIAN
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Burmese

Script: UTF8
[MyLanguage] section name: BURMESE
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Catalan

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: CATALAN
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Cebuano

Script: UTF8
[MyLanguage] section name: CEBUANO
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Cherokee

Script: UTF8
[MyLanguage] section name: CHEROKEE
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Chinese Traditional

For this language, stemming is performed as part of sentence breaking.

Script: Big-5
[MyLanguage] section name: CHINESE
For encoding: Set Encodings parameter to:
Big-5 CHINESETRADITIONAL
UTF-8 UTF8

Chinese Simplified

For this language, stemming is performed as part of sentence breaking.

Script: GB2312-80
[MyLanguage] section name: CHINESE
For encoding: Set Encodings parameter to:
gb2312 CHINESESIMPLIFIED
UTF-8 UTF8

Chuvash

Script: UTF8
[MyLanguage] section name: CHUVASH
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Croatian

Script: Latin
[MyLanguage] section name: CROATIAN
For encoding: Set Encodings parameter to:
Windows-CP1250 EASTERNEUROPEAN
ISO-8859-2 EASTERNEUROPEAN_ISO
UTF-8 UTF8

Czech

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: CZECH
For encoding: Set Encodings parameter to:
Windows-CP1250 EASTERNEUROPEAN
ISO-8859-2 EASTERNEUROPEAN_ISO
UTF-8 UTF8

Danish

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: DANISH
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Divehi

Script: UTF8
[MyLanguage] section name: DIVEHI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Dutch

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: DUTCH
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

English

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: ENGLISH
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Erzya

Script: UTF8
[MyLanguage] section name: ERZYA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Esperanto

Script: UTF8
[MyLanguage] section name: ESPERANTO
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Estonian

Script: Latin
[MyLanguage] section name: ESTONIAN
For encoding: Set Encodings parameter to:
Windows-CP1257 NORTHERNEUROPEAN
ISO-8859-4 NORTHERNEUROPEAN_ISO
UTF-8 UTF8

Ethiopic

Script: UTF8
[MyLanguage] section name: ETHIOPIC
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Faroese

Script: Latin
[MyLanguage] section name: FAROESE
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Finnish

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: FINNISH
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

French

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: FRENCH
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Frisian

Script: UTF8
[MyLanguage] section name: FRISIAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Gaelic

Script: Latin
[MyLanguage] section name: GAELIC
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Galician

Script: Latin
[MyLanguage] section name: GALICIAN
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Georgian

Script: UTF8
[MyLanguage] section name: GEORGIAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

German

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: GERMAN
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Gilaki

Script: UTF8
[MyLanguage] section name: GILAKI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Greek

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Greek
[MyLanguage] section name: GREEK
For encoding: Set Encodings parameter to:
Windows-CP1253 GREEK
ISO-8859-7 GREEK_ISO
UTF-8 UTF8

Greenlandic

Script: Latin
[MyLanguage] section name: GREENLANDIC
For encoding: Set Encodings parameter to:
Windows-CP1257 NORTHERNEUROPEAN
ISO-8859-4 NORTHERNEUROPEAN_ISO
UTF-8 UTF8

Guarani

Script: UTF8
[MyLanguage] section name: GUARANI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Gujarati

Script: UTF8
[MyLanguage] section name: GUJARATI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Haitian

Script: UTF8
[MyLanguage] section name: HAITIAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Hausa

Script: UTF8
[MyLanguage] section name: HAUSA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Hawaiian

Script: UTF8
[MyLanguage] section name: HAWAIIAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Hebrew

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Hebrew
[MyLanguage] section name: HEBREW
For encoding: Set Encodings parameter to:
Windows-CP1255 HEBREW
ISO-8859-8 HEBREW_ISO
UTF-8 UTF8

Hindi

Script: UTF8
[MyLanguage] section name: HINDI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Hungarian

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: HUNGARIAN
For encoding: Set Encodings parameter to:
Windows-CP1250 EASTERNEUROPEAN
ISO-8859-2 EASTERNEUROPEAN_ISO
UTF-8 UTF8

Icelandic

Script: Latin
[MyLanguage] section name: ICELANDIC
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Igbo

Script: UTF8
[MyLanguage] section name: IGBO
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Ilokano

Script: UTF8
[MyLanguage] section name: ILOKANO
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Indonesian

Script: Latin
[MyLanguage] section name: INDONESIAN
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Italian

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: ITALIAN
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Japanese

For this language, stemming is performed as part of sentence breaking.

Script: Japanese
[MyLanguage] section name: JAPANESE
For encoding: Set Encodings parameter to:
Shift-JIS SHIFTJIS
EUC EUC
JIS JIS
UTF-8 UTF8

Javanese

Script: UTF8
[MyLanguage] section name: JAVANESE
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Kalmyk

Script: UTF8
[MyLanguage] section name: KALMYK
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Kannada

Script: UTF8
[MyLanguage] section name: KANNADA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Kapampangan

Script: UTF8
[MyLanguage] section name: KAPAMPANGAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Kazakh

Script: Cyrillic
[MyLanguage] section name: KAZAKH
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Khmer

Script: UTF8
[MyLanguage] section name: KHMER
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Kikongo

Script: UTF8
[MyLanguage] section name: KIKONGO
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Kinyarwanda

Script: UTF8
[MyLanguage] section name: KINYARWANDA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Kirundi

Script: UTF8
[MyLanguage] section name: KIRUNDI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Komi

Script: UTF8
[MyLanguage] section name: KOMI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Korean

For this language, stemming is performed as part of sentence breaking.

Script: Hangul
[MyLanguage] section name: KOREAN
For encoding: Set Encodings parameter to:
KS C 5601-1987 KOREAN
KS C 5601-1992 KOREAN
UTF-8 UTF8

Kurdish

Script: Latin
[MyLanguage] section name: KURDISH
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Kyrgyz

Script: Cyrillic
[MyLanguage] section name: KYRGYZ
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Lao

Script: UTF8
[MyLanguage] section name: LAO
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Lappish

Script: Latin
[MyLanguage] section name: LAPPISH
For encoding: Set Encodings parameter to:
Windows-CP1257 NORTHERNEUROPEAN
ISO-8859-4 NORTHERNEUROPEAN_ISO
UTF-8 UTF8

Latin

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: LATIN
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Latvian

Script: Latin
[MyLanguage] section name: LATVIAN
For encoding: Set Encodings parameter to:
Windows-CP1257 NORTHERNEUROPEAN
ISO-8859-4 NORTHERNEUROPEAN_ISO
UTF-8 UTF8

Lingala

Script: UTF8
[MyLanguage] section name: LINGALA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Lithuanian

Script: Latin
[MyLanguage] section name: LITHUANIAN
For encoding: Set Encodings parameter to:
Windows-CP1257 NORTHERNEUROPEAN
ISO-8859-4 NORTHERNEUROPEAN_ISO
UTF-8 UTF8

Luxembourgish

Script: Latin
[MyLanguage] section name: LUXEMBOURGISH
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Macedonian

Script: Cyrillic
[MyLanguage] section name: MACEDONIAN
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Malagasy

Script: UTF8
[MyLanguage] section name: MALAGASY
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Malay

Script: Latin
[MyLanguage] section name: MALAY
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Malayalam

Script: UTF8
[MyLanguage] section name: MALAYALAM
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Maltese

Script: UTF8
[MyLanguage] section name: MALTESE
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Manipuri

Script: UTF8
[MyLanguage] section name: MANIPURI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Maori

Script: Latin
[MyLanguage] section name: MAORI
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Marathi

Script: UTF8
[MyLanguage] section name: MARATHI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Mazandarani

Script: UTF8
[MyLanguage] section name: MAZANDARANI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Mirandese

Script: UTF8
[MyLanguage] section name: MIRANDESE
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Mongolian

Script: Cyrillic
[MyLanguage] section name: MONGOLIAN
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Nahuatl

Script: UTF8
[MyLanguage] section name: NAHUATL
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Navajo

Script: UTF8
[MyLanguage] section name: NAVAJO
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Ndebele

Script: UTF8
[MyLanguage] section name: NDEBELE
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Nepali

Script: UTF8
[MyLanguage] section name: NEPALI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Newari

Script: UTF8
[MyLanguage] section name: NEWARI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Norwegian

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: NORWEGIAN
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Oriya

Script: UTF8
[MyLanguage] section name: ORIYA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Ossetian

Script: UTF8
[MyLanguage] section name: OSSETIAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Panjabi

Script: UTF8
[MyLanguage] section name: PANJABI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Papiamentu

Script: UTF8
[MyLanguage] section name: PAPIAMENTU
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Persian

Script: UTF8
[MyLanguage] section name: PERSIAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Polish

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: POLISH
For encoding: Set Encodings parameter to:
Windows-CP1250 EASTERNEUROPEAN
ISO-8859-2 EASTERNEUROPEAN_ISO
UTF-8 UTF8

Portuguese

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: PORTUGUESE
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Pushto

Script: UTF8
[MyLanguage] section name: PUSHTO
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Quechua

Script: UTF8
[MyLanguage] section name: QUECHUA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Rhaeto-Romance

Script: UTF8
[MyLanguage] section name: RHAETO-ROMANCE
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Romanian

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: ROMANIAN
For encoding: Set Encodings parameter to:
Windows-CP1250 EASTERNEUROPEAN
ISO-8859-2 EASTERNEUROPEAN_ISO
UTF-8 UTF8

Russian

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Cyrillic
[MyLanguage] section name: RUSSIAN
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Sakha

Script: UTF8
[MyLanguage] section name: SAKHA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Sami

Script: UTF8
[MyLanguage] section name: SAMI
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Sanskrit

Script: UTF8
[MyLanguage] section name: SANSKRIT
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Serbian

Script: Cyrillic
[MyLanguage] section name: SERBIAN
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Sesotho

Script: UTF8
[MyLanguage] section name: SESOTHO
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Sesotho sa Leboa

Script: UTF8
[MyLanguage] section name: SESOTHOSALEBOA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Singhalese

Script: UTF8
[MyLanguage] section name: SINGHALESE
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Siswant

Script: UTF8
[MyLanguage] section name: SISWANT
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Slovak

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: SLOVAK
For encoding: Set Encodings parameter to:
Windows-CP1250 EASTERNEUROPEAN
ISO-8859-2 EASTERNEUROPEAN_ISO
UTF-8 UTF8

Slovenian

Script: Latin
[MyLanguage] section name: SLOVENIAN
For encoding: Set Encodings parameter to:
Windows-CP1250 EASTERNEUROPEAN
ISO-8859-2 EASTERNEUROPEAN_ISO
UTF-8 UTF8

Somali

Script: Latin
[MyLanguage] section name: SOMALI
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Sorbian

Script: Latin
[MyLanguage] section name: SORBIAN
For encoding: Set Encodings parameter to:
Windows-CP1250 EASTERNEUROPEAN
ISO-8859-2 EASTERNEUROPEAN_ISO
UTF-8 UTF8

Spanish

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: SPANISH
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Sranan

Script: UTF8
[MyLanguage] section name: SRANAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Sundanese

Script: UTF8
[MyLanguage] section name: SUNDANESE
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Swahili

Script: Latin
[MyLanguage] section name: SWAHILI
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Swedish

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: SWEDISH
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Syriac

Script: UTF8
[MyLanguage] section name: SYRIAC
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Tagalog

Script: Latin
[MyLanguage] section name: TAGALOG
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Tahitian

Script: UTF8
[MyLanguage] section name: TAHITIAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Tajik

Script: Cyrillic
[MyLanguage] section name: TAJIK
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Tamil

Script: UTF8
[MyLanguage] section name: TAMIL
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Tatar

Script: Cyrillic
[MyLanguage] section name: TATAR
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Telugu

Script: UTF8
[MyLanguage] section name: TELUGU
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Tetum

Script: UTF8
[MyLanguage] section name: TETUM
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Thai

Script: Thai
[MyLanguage] section name: THAI
For encoding: Set Encodings parameter to:
Windows-CP874/ISO-8859-11 THAI
UTF-8 UTF8

Tibetan

Script: UTF8
[MyLanguage] section name: TIBETAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Tokpisin

Script: UTF8
[MyLanguage] section name: TOKPISIN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Tongan

Script: UTF8
[MyLanguage] section name: TONGAN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Tsonga

Script: UTF8
[MyLanguage] section name: TSONGA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Tswana

Script: UTF8
[MyLanguage] section name: TSWANA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Turkish

Script: Latin
[MyLanguage] section name: TURKISH
For encoding: Set Encodings parameter to:
Windows-CP1254/ISO-8859-9 TURKISH
UTF-8 UTF8

Turkmen

Script: UTF8
[MyLanguage] section name: TURKMEN
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Ukrainian

Script: Cyrillic
[MyLanguage] section name: UKRAINIAN
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Urdu

Script: UTF8
[MyLanguage] section name: URDU
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Uyghur

Script: UTF8
[MyLanguage] section name: UYGHUR
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Uzbek

Script: Cyrillic
[MyLanguage] section name: UZBEK
For encoding: Set Encodings parameter to:
Windows-CP1251 CYRILLIC
KOI8-R CYRILLIC_KOI8
ISO-8859-5 CYRILLIC_ISO
UTF-8 UTF8

Valencian

Script: Latin
[MyLanguage] section name: VALENCIAN
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Venda

Script: UTF8
[MyLanguage] section name: VENDA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Vietnamese

Script: Vietnamese
[MyLanguage] section name: VIETNAMESE
For encoding: Set Encodings parameter to:
Windows-CP1258 VIETNAMESE
UTF-8 UTF8

Waraywaray

Script: UTF8
[MyLanguage] section name: WARAYWARAY
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Welsh

A stemming algorithm is available for this language and is applied by default. If you do not want to apply stemming to this language, set Stemming to False for this language.

Script: Latin
[MyLanguage] section name: WELSH
For encoding: Set Encodings parameter to:
Windows-CP1252/ISO-8859-1 ASCII
UTF-8 UTF8

Wolof

Script: UTF8
[MyLanguage] section name: WOLOF
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Xhosa

Script: UTF8
[MyLanguage] section name: XHOSA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Yiddish

Script: UTF8
[MyLanguage] section name: YIDDISH
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Yoruba

Script: UTF8
[MyLanguage] section name: YORUBA
For encoding: Set Encodings parameter to:
UTF-8 UTF8

Zulu

Script: UTF8
[MyLanguage] section name: ZULU
For encoding: Set Encodings parameter to:
UTF-8 UTF8