Eduction Grammar Reference

The following tables describe the grammar files that are available in the IDOL PII Package, and the entities that each provides.

  • Some grammars are available in two formats, ECR and EJR. For more information about which to use, see ECR and EJR Grammars.

    IMPORTANT: To use the EJR grammar files from the 23.4 package, you must use Eduction tools with a version of 12.9 or later.

  • In the entity names:

    • the abbreviation CC refers to a two-letter country code. For a list of available country codes, see Country Codes.

    • the abbreviation LLL refers to a three-letter language code. For a list of available languages, see Languages.

    TIP: You can use the Eduction parameter EntityN to specify which entities you want to extract. This parameter accepts wildcards, so you can extract entities of a specific type for all supported countries or languages. For example, to match postal addresses for all countries specify a value of pii/address/??. To match dates of birth in all languages, specify pii/date/dob/context/???.

  • Some grammars have a CJKVT (Chinese, Japanese, Korean, Vietnamese, and Thai) version, for matching entities in these languages. The CJKVT version is separate for performance reasons. CJKVT languages do not have spaces in the text to separate words, so additional processing is required for sentence breaking.

    The CJKVT grammars generally match Kanji and Romanized versions of the text where appropriate, as well as half-width and full-width characters (in the output, full-width forms are normalized to half-width).

  • Many entities return components, in addition to the full match. For more information, and examples, see Components.
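The full-width to half-width normalization mentioned above for the CJKVT grammars can be pictured with a small sketch. This is not the grammar's own implementation; it only illustrates the usual Unicode relationship between full-width ASCII variants and their half-width forms. The function name to_half_width is hypothetical.

```python
def to_half_width(text: str) -> str:
    """Fold full-width ASCII variants to half-width; leave native script untouched."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0xFF01 <= code <= 0xFF5E:
            # Full-width forms sit at a fixed offset (0xFEE0) from ASCII.
            out.append(chr(code - 0xFEE0))
        elif code == 0x3000:
            # The ideographic space maps to an ordinary space.
            out.append(" ")
        else:
            out.append(ch)
    return "".join(out)


print(to_half_width("１２３－４５６７"))  # a full-width postal code
```

Kanji and other native-script characters fall outside the folded range, so only digits, Latin letters, and punctuation change.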

NOTE: The IDOL PII Package is backwards-compatible with the IDOL GDPR package. You can continue to use existing configurations that use entity names such as gdpr/address/CC or gdpr/telephone/CC. These entities are similar to the corresponding pii/* entity, but are limited to countries in the GDPR region. However, OpenText recommends that you use the pii/* entities instead, so that Eduction extracts matches for all supported countries.
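To get a feel for which entity names a wildcard value for EntityN selects, the following sketch applies shell-style matching to a short list of names. It assumes that the ? wildcard stands for exactly one character, as the two-letter and three-letter examples in the tip suggest. The entity list is an illustrative sample (the country and language codes are placeholders), and select_entities is a hypothetical helper, not part of Eduction.

```python
from fnmatch import fnmatchcase

# Illustrative sample only; the real package provides many more entities.
ENTITY_NAMES = [
    "pii/address/gb",
    "pii/address/fr",
    "pii/address/landmark/gb",
    "pii/date/dob/context/eng",
    "pii/date/dob/context/fre",
    "pii/date/nocontext/eng",
]


def select_entities(pattern, names=ENTITY_NAMES):
    """Return the names that match a pattern in which ? is one character."""
    return [name for name in names if fnmatchcase(name, pattern)]


print(select_entities("pii/address/??"))
print(select_entities("pii/date/dob/context/???"))
```

Because each ? consumes exactly one character, pii/address/?? selects the per-country address entities but not longer names such as pii/address/landmark/gb.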

address.ecr

Entity Description
pii/address/CC

A postal address.

In general, a score of one is given to an address that includes a numbered, common format street address (for example "23 North Road"), a known city (for example "London"), and a postal code in a viable format for the country (for example "SW1A 2AA"). Deviations from this form lead to score penalties. The ordering of these elements varies by country.

OpenText recommends that you use pre-filtering to improve the performance for this grammar. See Configure Pre-Filtering.

Example matches: "Schlosshoferstrasse 20, 1210 Vienna", "Avenida Juan Xxiii 20, 41006, Sevilla", "Abidei Hurriyet Cd Taner Palas Han 9 Kat:7 Dayre 9, 34437 Istanbul", "162-168 Regent Street, London, W1B 5TG".

This entity returns the addresses in a normalized format by default. The normalized form standardizes apartment and house numbers, expands shortened forms of region names, removes additional punctuation, and converts the text to uppercase. For example ABIDEI HURRIYET CD TANER PALAS APT 9, KAT:7, D:9, 34437 ISTANBUL. The exact order depends on the country.

You can turn off normalization by setting normalize_addresses=false in the address_stoplist.lua script. This option can improve performance when you do not need normalization.

This entity returns components. See Components.

pii/address/landmark/CC A postal address landmark. For example "Address".
pii/address/streetlocation/context/CC A street location (house number and street name), with context. For example "Address: 123, Mill Road".
pii/address/streetlocation/nocontext/CC A street location (house number and street name), without context. For example "123, Mill Road".
pii/address/streetlocation/landmark/CC A street location landmark. For example "Address".
pii/address/city/context/CC A city or town, with context. For example "City: London".
pii/address/city/nocontext/CC A city or town, without context. For example "London".
pii/address/city/landmark/CC A city or town landmark. For example "City".
pii/address/postcode/context/CC A postal code, with context. For example "Postcode: CB4 0WZ".
pii/address/postcode/nocontext/CC A postal code, without context. For example "CB4 0WZ".
pii/address/postcode/landmark/CC A postal code landmark. For example "Postcode".
pii/address/country/context/CC A country, with context. For example "Country: United Kingdom".
pii/address/country/nocontext/CC A country, without context. For example "United Kingdom".
pii/address/country/landmark/CC A country landmark. For example "Country".

address_cjkvt.ecr

Entity Description
pii/address/CC

A postal address.

In general:

  • For China, a score of one is given to an address with a numbered street address, a known township or town, district, city or county, and province.

  • For Japan, a score of one is given to an address that includes a numbered, common format street address (for example "四丁目1番2-34号"), a city or ward (for example "津島市"), and a postal code in a viable format for the country (for example "123-4567").

  • For Taiwan, a score of one is given to an address that includes a numbered street address, a known township, district, city, or county, and a valid postal code.

  • For South Korea, a score of one is given to an address with a numbered street address, a known township or town, district, city or county, province and a valid postal code.

Deviations from these forms lead to score penalties.

OpenText recommends that you use pre-filtering to improve the performance for this grammar. See Configure Pre-Filtering.

Example matches: "日本、〒123-4567神奈川県津島市城南区月形町八重洲四丁目1番2-34号", "1-2-34, Yaesu 4-Chome, Nanae, Atsuta, Hekinan, Kagoshima, 123-4567, Japan", "10603台北市大安區金山南路2段55號", "No.55, Sec. 2, Jinshan S. Rd., Daan Dist., Taipei City 10603", "北京市房山区长阳镇北广阳城大街8号", "No.99 Xingjian Road, Cixi City, Zhejiang Province, China".

This entity returns the addresses in a normalized format. The normalized form standardizes apartment and house numbers, removes additional punctuation, and for Romanized text, it converts the text to uppercase. CJKVT native script is not normalized to ASCII, and Romanized text is not normalized to CJKVT native script.

You can turn off normalization by setting normalize_addresses=false in the address_stoplist.lua script. This option can improve performance when you do not need normalization.

This entity returns components. See Components.

pii/address/landmark/CC A postal address landmark. For example "住所".
pii/address/streetlocation/context/CC An address first line, with context. For example "住所: 八重洲 四丁目1番2 -34号" or "Address: No.55, Sec. 2, Jinshan S. Rd.".
pii/address/streetlocation/nocontext/cjkvt/CC An address first line in CJKVT native script, without context. For example "八重洲四丁目1番2-34号", or "金山南路2段55號".
pii/address/streetlocation/nocontext/latin/CC An address first line in romanized text, without context. For example "1-2-34, Yaesu 4-Chome", or "No.55, Sec. 2, Jinshan S. Rd.".
pii/address/streetlocation/nocontext/CC An address first line in CJKVT native script or romanized text, without context. For example "八重洲四丁目1番2-34号" or "1-2-34, Yaesu 4-Chome".
pii/address/streetlocation/landmark/CC An address first line landmark. For example "住所", "住址", or "Address".
pii/address/settlement/context/CC A settlement, with context. For example, in Japan a town or city, or in Taiwan a district (區 qū) or township (鎮 zhèn/鄉 xiāng). For example "市区町村: 津島市城南区月形町", "鄉鎮市區:板橋區", or "City/Ward/Town/Village: Nanae, Atsuta, Hekinan".
pii/address/settlement/nocontext/cjkvt/CC A settlement in CJKVT native script, without context. For example "津島市城南区月形町", or "板橋區".
pii/address/settlement/nocontext/latin/CC A settlement in romanized text, without context. For example "Nanae, Atsuta, Hekinan", or "Banqiao District".
pii/address/settlement/nocontext/CC A settlement in CJKVT native script or romanized text, without context. For example "津島市城南区月形町", "板橋區", or "Nanae, Atsuta, Hekinan".
pii/address/settlement/landmark/CC A settlement landmark. For example "市区町村", "鄉鎮市區", or "City/Ward/Town/Village".
pii/address/region/context/CC A region, with context. For Taiwan, this is a county (縣 xiàn) or municipality (市 shì). For example "都道府県: 神奈川県", "縣市:宜蘭縣", or "Prefecture: Kagoshima".
pii/address/region/nocontext/cjkvt/CC A region in CJKVT native script, without context. For example "神奈川県", or "宜蘭縣".
pii/address/region/nocontext/latin/CC A region in romanized text, without context. For example "Kagoshima", or "Yilan County".
pii/address/region/nocontext/CC A region in CJKVT native script or romanized text, without context. For example "神奈川県", "新北市", or "Kagoshima".
pii/address/region/landmark/CC A region landmark. For example "都道府県", "縣市", or "Prefecture".
pii/address/postcode/context/CC A postal code, with context. For example "郵便番号: 123-4567", "Postcode: 1234567", or "郵遞區號106-409".
pii/address/postcode/nocontext/CC A postal code, without context. For example "123-4567", or "106-409".
pii/address/postcode/landmark/CC A postal code landmark. For example "郵便番号", "郵遞區號", or "Postcode".
pii/address/country/context/CC A country, with context. For example "国: 日本", "国:中華民國", or "Country: Japan".
pii/address/country/nocontext/cjkvt/CC A country in CJKVT native script, without context. For example "日本", or "中華民國".
pii/address/country/nocontext/latin/CC A country in romanized text, without context. For example "Japan".
pii/address/country/nocontext/CC A country in CJKVT native script or romanized text, without context. For example "日本" or "Japan".
pii/address/country/landmark/CC A country landmark. For example "国" or "Country".

banking and banking_cjkvt (ECR and EJR available)

Entity Description
pii/banking/account_number/context/CC

An individual bank account number, with context. For example "Bank account number: 1234567".

This entity returns components. See Components.

pii/banking/account_number/landmark/CC An individual bank account number landmark. For example "Bank account number".
pii/banking/account_number/nocontext/CC An individual bank account number, without context. For example "1234567".
pii/banking/context/CC

Bank account numbers sufficient to identify an individual account, with context. For example "Bank account details: 40-38-02 31618080".

This entity returns components. See Components.

pii/banking/iban/context/CC An International Bank Account Number (IBAN), with context. For example "IBAN: DE75512108001245126199".
pii/banking/iban/landmark An International Bank Account Number (IBAN) landmark. For example "IBAN".
pii/banking/iban/nocontext/CC An International Bank Account Number (IBAN), without context. For example "DE75512108001245126199".
pii/banking/landmark/CC A bank account landmark. For example "Bank account details".
pii/banking/nocontext/CC Bank account numbers sufficient to identify an individual account, without context. For example "40-38-02 31618080".
pii/banking/roll_number/context/gb

A UK building society roll number, with context. For example "Roll number: 1234/123456789".

This entity returns components. See Components.

pii/banking/roll_number/landmark/gb A UK building society roll number landmark. For example "Roll number".
pii/banking/roll_number/nocontext/gb A UK building society roll number, without context. For example "1234/123456789".
pii/banking/routing_number/context/CC

A routing number, usually used to identify a bank branch, with context. For example "Sort code: 40-38-02".

This entity returns components. See Components.

pii/banking/routing_number/landmark/CC A routing number landmark. For example "Sort code".
pii/banking/routing_number/nocontext/CC A routing number, usually used to identify a bank branch, without context. For example "40-38-02".
pii/banking/swiftcode/context/CC A SWIFT/BIC code, with context. For example "SWIFT code: EFGHCAVVV".
pii/banking/swiftcode/landmark A SWIFT/BIC code landmark. For example "SWIFT code".
pii/banking/swiftcode/nocontext/CC A SWIFT/BIC code, without context. For example "EFGHCAVVV".
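For readers who post-process pii/banking/iban matches, the standard IBAN check (the ISO 13616 mod-97 test) can be sketched as follows. The source does not state whether the grammar itself performs this check, so treat this as an independent illustration. The function name iban_is_valid is hypothetical.

```python
def iban_is_valid(iban: str) -> bool:
    """Apply the mod-97 check defined for IBANs."""
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not (compact.isascii() and compact.isalnum()):
        return False
    # Move the country code and check digits to the end.
    rearranged = compact[4:] + compact[:4]
    # Letters become two-digit numbers: A=10, B=11, ..., Z=35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


print(iban_is_valid("DE75512108001245126199"))
```

This sketch checks only the checksum. It does not verify the country-specific length or the structure of the account part.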

date and date_cjkvt (ECR and EJR available)

Entity Description
pii/date/dob/context/LLL

A date of birth, written numerically or using words. For example "date of birth 1/1/2018" or "GEBOORTEDATUM: 01/01/2018".

This entity returns dates in the normalized ISO-8601 format YYYY-MM-DD.

You can turn off normalization by setting normalize_dates=false in the pii_postprocessing.lua script. This option can improve performance when you do not need normalization.

pii/date/nocontext/LLL A calendar date, written numerically or using words, without context. For example "01.03.1918", "2018_01_01", "вторник, 30 октомври 2018".
pii/date/dob/landmark/LLL A date of birth landmark, such as "DOB" or "Fecha de nacimiento".
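The normalized ISO-8601 form YYYY-MM-DD returned for dates can be illustrated with a minimal sketch that converts a few of the numeric example strings from this table. It is not the logic in pii_postprocessing.lua. The format list is an assumption, including the day-first reading of ambiguous dates, and to_iso is a hypothetical name.

```python
from datetime import datetime

# Assumed input layouts, tried in order; day-first is an assumption.
FORMATS = ("%d/%m/%Y", "%d.%m.%Y", "%Y_%m_%d")


def to_iso(text):
    """Return the date as YYYY-MM-DD, or None if no assumed layout fits."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


for sample in ("1/1/2018", "01.03.1918", "2018_01_01"):
    print(sample, "->", to_iso(sample))
```

Dates written in words, such as "вторник, 30 октомври 2018", need language-specific handling that this sketch does not attempt.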

device_id (ECR and EJR available)

Entity Description
pii/device_id/ip/nocontext An IP address, without context.
pii/device_id/imei/nocontext An IMEI (International Mobile Equipment Identity), without context.
pii/device_id/imeisv/nocontext An IMEISV (International Mobile Equipment Identity Software Version), without context.
pii/device_id/mac_address/nocontext A MAC address, without context.
pii/device_id/meid/nocontext A MEID (Mobile Equipment Identifier), without context.
pii/device_id/iccid/nocontext An ICCID (Integrated Circuit Card Identifier), without context.
pii/device_id/imsi/nocontext An IMSI (International Mobile Subscriber Identity), without context.
pii/device_id/msisdn/nocontext An MSISDN (Mobile Station International Subscriber Directory Number), without context.

driving and driving_cjkvt (ECR and EJR available)

Entity Description
pii/driving/context/CC

A driving license number with context. For example: "australian automobile association: 103 805 501", or "driver's license: A234567890".

This entity matches both the driving license number, and the personal number or driver number, if present. On the standard European driving license, these are fields 5 and 4d.

NOTE: By default, when there are multiple possible matches, post-processing returns only one country, which is the most efficient option. You can enable ambiguous entity matching by setting ambiguous_entities=true in the pii_postprocessing.lua script. This option returns multiple possible country matches.

This entity returns components. See Components.

pii/driving/nocontext/CC

A driving license number, without context.

NOTE: By default, when there are multiple possible matches, post-processing returns only one country, which is the most efficient option. You can enable ambiguous entity matching by setting ambiguous_entities=true in the pii_postprocessing.lua script. This option returns multiple possible country matches.

pii/driving/landmark/CC A driving license landmark, such as "Driver's license" or "Driving Licence".

health (ECR and EJR available)

Entity Description
pii/health/ehic/context/CC

An EHIC personal identification number with context. For example "EHIC: UK 1234 5678" or "TSE: 123456789012".

This entity returns components. See Components.

pii/health/ehic/nocontext/CC

An EHIC personal identification number without context. For example "123456-789A".

This entity returns components. See Components.

pii/health/ehic/landmark/CC An EHIC landmark, such as "EHIC" or "EHIC PIN".
pii/health/ehic/context/all

An EHIC personal identification number with context, for EU countries and Switzerland.

This entity returns components. See Components.

pii/health/ehic/nocontext/all

An EHIC personal identification number without context, for EU countries and Switzerland.

This entity returns components. See Components.

pii/health/ehic/landmark/all An EHIC landmark, such as "EHIC" or "EHIC PIN", for EU countries and Switzerland.
pii/health/id/context/au An Australian Medicare (card) number or Individual Healthcare Identifier (IHI) with context. For example "Medicare Card Number: 3501 80315 1-6".
pii/health/id/context/br A Brazilian Cartão Nacional de Saúde (CNS, also known as SUS) number with context, for example "CNS: 190129759240018".
pii/health/id/context/ca

A Canadian health insurance (card) number with context. For example "health insurance: 12345-6789", or "assurance-maladie: 12345-6789".

This entity matches health ID numbers for the following healthcare schemes:

  • Alberta Health Care Insurance Plan
  • BC Medical Services Plan
  • Manitoba Health Card
  • New Brunswick Medicare
  • Newfoundland and Labrador Medical Care Plan
  • Northwest Territories Health Care Plan
  • Nova Scotia Health Insurance Program
  • Nunavut Health Care Plan
  • Ontario Health Insurance Plan
  • Prince Edward Island (PEI) Health Card
  • Quebec Health Insurance Card
  • Saskatchewan Health Card
  • Yukon Health Care Insurance Plan
  • Canadian Forces
  • RCMP
  • Veteran Affairs
  • NSOU
pii/health/id/context/ch A Swiss health insurance card number with context. For example "Schweizerische Krankenversicherungskarte: 12345678901234567890".
pii/health/id/context/es A Spanish health insurance card number with context. For example "CatSalut: ABCD 1 123456 12 1".
pii/health/id/context/fr A French Carte Vitale number with context. For example "INSEE: 187090100100141".
pii/health/id/context/gb A British NHS number with context. For example "NHS Number: 943 476 5919".
pii/health/id/context/nz A New Zealand National Health Index (NHI) number with context. For example "NHI Number: CGC2720".
pii/health/id/context/us A US health insurance number with context. For example "Medicare ID: 1EG4-TE5-MK72".
pii/health/id/nocontext/CC A health number, such as a British NHS number or French Carte Vitale number, without context.
pii/health/id/landmark/CC A health number landmark, such as "NHS number" or "Medicare ID".

The following entities match health plan numbers for Canadian provinces. The text matched by these entities would also be matched by pii/health/id/context/ca or pii/health/id/nocontext/ca.

Entity Description
pii/health/id/context/alberta/ca Alberta Health Care Insurance Plan number, with context.
pii/health/id/context/british_columbia/ca BC Medical Services Plan number, with context.
pii/health/id/context/manitoba/ca Manitoba Health Card number, with context.
pii/health/id/context/new_brunswick/ca New Brunswick Medicare number, with context.
pii/health/id/context/newfoundland/ca Newfoundland and Labrador Medical Care Plan number, with context.
pii/health/id/context/northwest_territories/ca Northwest Territories Health Care Plan number, with context.
pii/health/id/context/nova_scotia/ca Nova Scotia Health Insurance Program number, with context.
pii/health/id/context/nunavut/ca Nunavut Health Care Plan number, with context.
pii/health/id/context/ontario/ca Ontario Health Insurance Plan number, with context.
pii/health/id/context/prince_edward_island/ca Prince Edward Island (PEI) Health Card number, with context.
pii/health/id/context/quebec/ca Quebec Health Insurance Card number, with context.
pii/health/id/context/saskatchewan/ca Saskatchewan Health Card number, with context.
pii/health/id/context/yukon/ca Yukon Health Care Insurance Plan number, with context.
pii/health/id/nocontext/alberta/ca Alberta Health Care Insurance Plan number, without context.
pii/health/id/nocontext/british_columbia/ca BC Medical Services Plan number, without context.
pii/health/id/nocontext/manitoba/ca Manitoba Health Card number, without context.
pii/health/id/nocontext/new_brunswick/ca New Brunswick Medicare number, without context.
pii/health/id/nocontext/newfoundland/ca Newfoundland and Labrador Medical Care Plan number, without context.
pii/health/id/nocontext/northwest_territories/ca Northwest Territories Health Care Plan number, without context.
pii/health/id/nocontext/nova_scotia/ca Nova Scotia Health Insurance Program number, without context.
pii/health/id/nocontext/nunavut/ca Nunavut Health Care Plan number, without context.
pii/health/id/nocontext/ontario/ca Ontario Health Insurance Plan number, without context.
pii/health/id/nocontext/prince_edward_island/ca Prince Edward Island (PEI) Health Card number, without context.
pii/health/id/nocontext/quebec/ca Quebec Health Insurance Card number, without context.
pii/health/id/nocontext/saskatchewan/ca Saskatchewan Health Card number, without context.
pii/health/id/nocontext/yukon/ca Yukon Health Care Insurance Plan number, without context.

health_cjkvt (ECR and EJR available)

Entity Description
pii/health/id/context/CC

A health number with context. For example "記号21700023".

This entity gives a lower score to matches with more ambiguous landmarks (such as "番号" and "Number"). However, if two matches occur together (for example "記号 21700023 番号 21"), the entity can match both with a higher score.

This entity returns components. See Components.

pii/health/id/nocontext/CC

A health number, such as a Japanese Health Insurance Card number, without context.

This entity returns components. See Components.

pii/health/id/landmark/CC A health number landmark, such as "保険者番号" or "Insurer number".

internet.ecr

Entity Description
pii/internet/email/context/CC An email address, with context. For example "courrier électronique: jsmith@mailserver.com".
pii/internet/email/nocontext/CC An email address, without context. For example "jsmith@mailserver.com".
pii/internet/email/landmark/CC An email address landmark. For example "courrier électronique".

medical_terms.ecr

Entity Description
pii/medical_terms/LLL

A medical condition or procedure. For example "abdominal hernia". This entity is available for all GDPR languages.

pii/medical_terms/blood_test/eng

A blood test. For example "9 panel urine test".
pii/medical_terms/lab_test/eng A laboratory test. For example "1, 25 dihydroxyvitamin D".
pii/medical_terms/surgical_procedure/eng A surgical procedure. For example "abdominal liposuction".
pii/medical_terms/specialty/eng A medical specialty. For example "allergy and immunology".
pii/medical_terms/drug_brand/eng A trade name for a medical drug. For example "Abelcet".
pii/medical_terms/drug_generic/eng A generic name for a medical drug. For example "Abacavir".
pii/medical_terms/medication/eng A medication description. For example "Altoprev tablets for oral use".
pii/medical_terms/disability/social_security/engus An impairment for the purpose of disability evaluation under social security in the US. For example "adrenal glands carcinoma".
pii/medical_terms/disease_condition/eng A disease or medical condition. For example "1p36 deletion syndrome".
pii/medical_terms/lifestyle/eng A lifestyle that relates to medical conditions. For example "smoking".
pii/medical_terms/icd10cm/eng An ICD10 medical condition code (see https://www.icd10data.com/ICD10CM/Codes).
pii/medical_terms/icd10pcs/eng An ICD10 procedure code (see https://www.icd10data.com/ICD10PCS/Codes).

medical_terms_cjkvt.ecr

Entity Description
pii/medical_terms/LLL A medical condition or procedure.

mrtd (ECR and EJR available)

Entity Description
pii/mrtd/mrp

A machine readable passport. For example "P<GBRUK<SPECIMEN<<ANGELA<ZOE<<<<<<<<<<<<<<<<
5333244280GBR8812049F2509286<<<<<<<<<<<<<<00".

This entity returns components. See Components.

pii/mrtd/mrotd/td1 A machine readable TD1-size travel document. For example "IDD<<T220001293<<<<<<<<<<<<<<<
6408125<2010315D<<<<<<<<<<<<<4
MUSTERMANN<<ERIKA<<<<<<<<<<<<<".
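To show the kind of information carried by the passport example for pii/mrtd/mrp, the following sketch splits its second line into fields and verifies the check digits, using the field positions and the 7-3-1 weighting from ICAO Doc 9303. The dictionary keys are hypothetical and are not necessarily the component names that the entity returns (see Components for those).

```python
def mrz_check_digit(field: str) -> int:
    """ICAO 9303 check digit: weights 7, 3, 1 repeating, sum modulo 10."""
    weights = (7, 3, 1)
    total = 0
    for i, ch in enumerate(field):
        if ch.isdigit():
            value = int(ch)
        elif ch == "<":
            value = 0  # filler character counts as zero
        else:
            value = ord(ch) - ord("A") + 10  # A=10 ... Z=35
        total += value * weights[i % 3]
    return total % 10


def parse_passport_line2(line: str) -> dict:
    """Split the second line of a passport MRZ into fields (hypothetical keys)."""
    return {
        "document_number": line[0:9],
        "document_number_ok": mrz_check_digit(line[0:9]) == int(line[9]),
        "nationality": line[10:13],
        "date_of_birth": line[13:19],  # YYMMDD
        "date_of_birth_ok": mrz_check_digit(line[13:19]) == int(line[19]),
        "sex": line[20],
        "expiry_date": line[21:27],  # YYMMDD
        "expiry_date_ok": mrz_check_digit(line[21:27]) == int(line[27]),
    }


fields = parse_passport_line2("5333244280GBR8812049F2509286<<<<<<<<<<<<<<00")
print(fields)
```

The sketch assumes a well-formed line with digits in the check-digit positions; it does no length or character validation.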

mrtd_cjkvt (ECR and EJR available)

Entity Description
pii/mrtd/mrp_cjkvt

A CJKVT machine readable passport line. For example "P<JPN<<<<<<<KEIKO<INOUE<<<<<<<<<<<<<<<<<<<<<"

This entity returns components. See Components.

pii/mrtd/mrotd/td1_cjkvt A CJKVT machine readable TD1-size travel document line. For example "KEIKO<<INOUE<<<<<<<<<<<<<<<<<<"

name.ecr

Entity Description
pii/name/CC

A full personal name, in title case or upper case.

This entity returns the names in a normalized format, in the form GIVEN NAME SURNAME, for example JOHN SMITH.

You can turn off normalization by setting normalize_names=false in the names_stoplist.lua script. You can also turn off score adjustment, by setting rescore_names=false in the names_stoplist.lua script. These options can improve performance when you do not need the normalization or score refinement.

This entity returns components. See Components.

pii/name/landmark/CC A full name landmark. For example "name".
pii/name/given_name/context/CC A given name, with context. For example "Forename: John".
pii/name/given_name/nocontext/CC A given name, without context. For example "John".
pii/name/given_name/landmark/CC A given name landmark. For example "Forename".
pii/name/surname/context/CC A surname with context. For example "Surname: Smith".
pii/name/surname/nocontext/CC A surname without context. For example "Smith".
pii/name/surname/landmark/CC A surname landmark. For example "Surname".
pii/name/pre_title/CC A title that precedes a name. For example "Ms".
pii/name/post_title/CC A title that follows a name. For example "Esq".
pii/name/title_surname/CC A title and surname. For example "Mr. Smith".

name_cjkvt.ecr

Entity Description
pii/name/CC

A full personal name, in romanized text or CJKVT native script. Romanized names can be in title case or upper case, and can be in the order given name surname or surname given name. CJKVT native script names must be surname given name. For Japanese, either form can include honorifics.

This entity returns the names in a normalized format, in the form GIVEN NAME SURNAME, for example KEIKO NAKAMURA.

You can turn off normalization by setting normalize_names=false in the name_stoplist.lua script. You can also turn off score adjustment, by setting rescore_names=false in the name_stoplist.lua script. These options can improve performance when you do not need the normalization or score refinement.

This entity returns components. See Components.

pii/name/cjkvt/CC

A full personal name in CJKVT native script. For example "山田直樹".

This entity returns components. See Components.

pii/name/latin/CC

A romanized full personal name. For example "Hiroshi Tanaka-san".

This entity returns components. See Components.

pii/name/landmark/CC A full name landmark. For example "名前".
pii/name/given_name/context/cjkvt/CC A given name in CJKVT native script, with context. For example "名前: 恵 ".
pii/name/given_name/nocontext/cjkvt/CC A given name in CJKVT native script, without context. For example "恵 ".
pii/name/given_name/nocontext/cjkvt_spaced/CC A given name in CJKVT native script, separated by spaces, and without context. For example "建 國". This entity is primarily to allow you to create patterns that match alternative name formats.
pii/name/given_name/context/latin/CC A romanized given name, with a context landmark in CJKVT native script. For example "名前: Keiko".
pii/name/given_name/nocontext/latin/CC A romanized given name, without context. For example "Keiko".
pii/name/given_name/context/CC A given name in romanized text or CJKVT native script, with a context landmark in CJKVT native script. For example "名前: 恵 ".
pii/name/given_name/nocontext/CC A given name in romanized text or CJKVT native script, without context. For example "直樹 ".
pii/name/given_name/landmark/CC A given name landmark in CJKVT native script. For example "名前".
pii/name/surname/context/cjkvt/CC A surname in CJKVT native script, with context. For example "名字: 山田".
pii/name/surname/nocontext/cjkvt/CC A surname in CJKVT native script, without context. For example "山田".
pii/name/surname/nocontext/cjkvt_spaced/CC A surname in CJKVT native script, separated by spaces, and without context. For example "欧 阳". This entity is primarily to allow you to create patterns that match alternative name formats.
pii/name/surname/context/latin/CC A romanized surname, with a context landmark in CJKVT native script. For example "名字: Nakamura".
pii/name/surname/nocontext/latin/CC A romanized surname, without context. For example "Nakamura".
pii/name/surname/context/CC A surname in romanized text or CJKVT native script, with a context landmark in CJKVT native script. For example "名字: 山田".
pii/name/surname/nocontext/CC A surname in romanized text or CJKVT native script, without context. For example "山田".
pii/name/surname/landmark/CC A surname landmark in CJKVT native script. For example "名字".
pii/name/pre_title/nocontext/CC
pii/name/pre_title/cn - deprecated
pii/name/pre_title/kr - deprecated
pii/name/pre_title/tw - deprecated
A title that precedes a name in romanized text. For example "Ms".
pii/name/post_title/nocontext/latin/CC A title that follows a name in romanized text. For example "Junior".
pii/name/post_title/nocontext/cjkvt/CC A title that follows a name in CJKVT native script. For example "さん".
pii/name/post_title/nocontext/CC
pii/name/post_title/cn - deprecated
pii/name/post_title/kr - deprecated
pii/name/post_title/tw - deprecated
A title that follows a name in romanized text, or CJKVT native script. For example "Junior" or "さん".
pii/name/title_surname/latin/CC A title and surname in romanized text. For example "Dr Tan".
pii/name/title_surname/cjkvt/CC A title and surname in CJKVT native script. For example "譚医生".
pii/name/title_surname/CC A title and surname in romanized text, or Japanese script for Japan (jp). For example "Dr Tan" or "譚医生".

national_id and national_id_cjkvt (ECR and EJR available)

Entity Description
pii/id/context/CC

A national identity number with context. For information about the supported ID numbers, see Supported National ID Numbers.

NOTE: By default, when there are multiple possible matches, post-processing returns only one country, which is the most efficient option. You can enable ambiguous entity matching by setting ambiguous_entities=true in the pii_postprocessing.lua script. This option returns multiple possible country matches.

This entity returns components. See Components.

pii/id/employment_insurance/context/jp A Japanese unemployment insurance number, with context. For example "雇用保険番号 1234-123456-1".
pii/id/employment_insurance/landmark/jp A Japanese unemployment insurance number landmark. For example "雇用保険番号".
pii/id/employment_insurance/nocontext/jp A Japanese unemployment insurance number, without context. For example "1234-123456-1".
pii/id/nocontext/CC

A national identity number without context. For information about the supported ID numbers, see Supported National ID Numbers.

NOTE: By default, when there are multiple possible matches, post-processing returns only one country, which is the most efficient option. You can enable ambiguous entity matching by setting ambiguous_entities=true in the pii_postprocessing.lua script. This option returns multiple possible country matches.

This entity returns components. See Components.

pii/id/landmark/CC A national identity number landmark, such as "National insurance number" or "Social security number".
pii/id/pension/context/jp A Japanese pension number, with context. For example "基本年金番号 1234567890".
pii/id/pension/landmark/jp A Japanese pension number landmark. For example "基本年金番号".
pii/id/pension/nocontext/jp A Japanese pension number, without context. For example "1234567890".
pii/id/redacted/context/us A redacted or partially redacted US social security number, with context. At least one masking character, x, X, or *, must be present. For example "SSN: xxx-xx-3333".
pii/id/redacted/nocontext/us A redacted or partially redacted US social security number, without context. At least one masking character, x, X, or *, must be present. For example "xxx-xx-3333".
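The rule for the redacted US social security number entities (the usual number shape, with at least one masking character x, X, or *) can be sketched with a regular expression. This is an illustration of the rule as described, not the pattern that the grammar uses. It assumes hyphen separators, as in the example, and is_redacted_ssn is a hypothetical name.

```python
import re

# Each position is a digit or a masking character; hyphens are assumed.
_SSN_SHAPE = re.compile(r"[0-9xX*]{3}-[0-9xX*]{2}-[0-9xX*]{4}")


def is_redacted_ssn(text: str) -> bool:
    """True if text has the SSN shape and at least one masking character."""
    if not _SSN_SHAPE.fullmatch(text):
        return False
    return any(ch in "xX*" for ch in text)


for sample in ("xxx-xx-3333", "***-**-1234", "123-45-6789"):
    print(sample, is_redacted_ssn(sample))
```

A fully numeric value such as "123-45-6789" is rejected here because it contains no masking character.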

nationality and nationality_cjkvt (ECR and EJR available)

Entity Description
pii/nationality/adj/context/CC A nationality adjective with context. For example, "Nationality: British".
pii/nationality/adj/nocontext/CC A nationality adjective without context. For example, "British".
pii/nationality/adj/landmark/CC A nationality adjective landmark. For example, "Nationality".
pii/nationality/noun/context/CC A nationality noun with context. For example, "Country: Britain".
pii/nationality/noun/nocontext/CC A nationality noun without context. For example: "Britain".
pii/nationality/noun/landmark/CC A nationality noun landmark. For example, "Country".
pii/nationality/any/context/CC Any combination of a nationality adjective or noun landmark with an adjective or noun value. For example, "Country: British" or "Nationality: British".
pii/nationality/any/nocontext/CC Any nationality adjective or noun. For example, "Britain" or "British".
pii/nationality/any/landmark/CC Any nationality adjective or noun landmark. For example, "Nationality" or "Country".

passport and passport_cjkvt (ECR and EJR available)

Entity Description
pii/passport/context/CC

A passport number with context. For example "Passport number: 533324428", "Passport Number: P4366918", or "italian passaporti AA5275702".

NOTE: By default, when there are multiple possible matches, post-processing returns only one country, which is the most efficient option. You can enable ambiguous entity matching by setting ambiguous_entities=true in the pii_postprocessing.lua script. This option returns multiple possible country matches.

This entity returns components. See Components.

pii/passport/nocontext/CC

A passport number without context. For example "533324428", "C015918", or "14CV28142".

NOTE: By default, when there are multiple possible matches, post-processing returns only one country, which is the most efficient option. You can enable ambiguous entity matching by setting ambiguous_entities=true in the pii_postprocessing.lua script. This option returns multiple possible country matches.

This entity returns components. See Components.

pii/passport/landmark/CC A passport landmark, such as "Passport" or "Pasaporte". For information about cases where the landmark and passport number do not match or have an ambiguous match, see Ambiguous Entities.

postcode and postcode_cjkvt

Entity Description
pii/postcode/context/CC

A postal code with context. For example "PLZ: 1210", "Poštanski broj: 10000", or "Cod poştal: 235200".

This entity returns components. See Components.

pii/postcode/nocontext/CC

A postal code without context. For example "2700-439 AMADORA", "75018", or "W1B 5TG".

This entity returns components. See Components.

pii/postcode/landmark/CC A postal code landmark, such as "Postcode" or "Postleitzahl".

telephone.ecr and telephone_cjkvt.ecr

Entity Description
pii/telephone/context/CC

A telephone number with context. For example "Tel: +44 1234 224050", "Telephone: (201)-222-4344", or "numéro de téléphone: +1-902-861-7000".

For the telephone_cjkvt grammar, numbers can be ASCII or full-width numbers.

NOTE: To ensure that this entity performs correctly, set your TangibleCharacters configuration to include the following characters: ()+-. For more information, see Configure Tangible Characters.

This entity returns the telephone number in the normalized format +NNNNN, starting with the country code. For example +12012224344.

This entity returns components. See Components.

pii/telephone/nocontext/CC

A telephone number without context. For example: "(204)-243-9955", "+39 055 326 43 11", or "44 20 7499 9000".

For the telephone_cjkvt grammar, numbers can be ASCII or full-width numbers.

NOTE: To ensure that this entity performs correctly, set your TangibleCharacters configuration to include the following characters: ()+-. For more information, see Configure Tangible Characters.

This entity returns the telephone number in the normalized format +NNNNN, starting with the country code. For example +12042439955.

This entity returns components. See Components.

pii/telephone/landmark/CC

A telephone number landmark, such as "Tel:" or "Telefon No".

For the telephone_cjkvt grammar, landmarks are available only in CJKVT native script.
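
The normalized +NNNNN output described for the telephone entities can be approximated with the following Python sketch. It is a simplified illustration, not the logic that Eduction uses: the function name normalize_phone and the default_country_code parameter are hypothetical, and the sketch assumes that any number without a leading + is a national number that needs the country code prepended.

```python
import re


def normalize_phone(raw: str, default_country_code: str) -> str:
    """Return '+' followed by digits only, starting with the country code."""
    # Drop spaces, brackets, hyphens, and any other non-digit characters.
    digits = re.sub(r"[^0-9]", "", raw)
    if raw.strip().startswith("+"):
        # The number already carries its country code.
        return "+" + digits
    # Assumption: a number without '+' is national, so prepend the code.
    return "+" + default_country_code + digits
```

For example, normalize_phone("(201)-222-4344", "1") gives +12012224344, which matches the example output above. The sketch does not handle numbers such as "44 20 7499 9000", where the country code is present without a leading +, or national trunk prefixes.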

tin and tin_cjkvt (ECR and EJR available)

Entity Description
pii/tin/context/CC

A tax identification number with context. For example "ITIN: 911-92-3333", or "TIN-numre: 101111113". For more examples, see Example Tax Identification Numbers.

NOTE: By default, when there are multiple possible matches, post-processing returns only one country, which is the most efficient option. You can enable ambiguous entity matching by setting ambiguous_entities=true in the pii_postprocessing.lua script. This option returns multiple possible country matches.

This entity returns components. See Components.

pii/tin/nocontext/CC

A tax identification number without context. For example "756.3047.5009.62", or "Z1234567R". For more examples, see Example Tax Identification Numbers.

NOTE: By default, when there are multiple possible matches, post-processing returns only one country, which is the most efficient option. You can enable ambiguous entity matching by setting ambiguous_entities=true in the pii_postprocessing.lua script. This option returns multiple possible country matches.

This entity returns components. See Components.

pii/tin/landmark/CC A tax identification number landmark, such as "ITIN" or "TIN-numre".
pii/tin/vatin/context/CC A VAT identification number with context. For example "NUIS: ALK99999999L" or "VAT Reg No GB 980 7806 84".
pii/tin/vatin/nocontext/CC A VAT identification number without context. For example "ALK99999999L" or "GB 980 7806 84".
pii/tin/vatin/landmark/CC A VAT identification number landmark. For example "NUIS" or "VAT Reg No".

travel (ECR and EJR available)

Entity Description
pii/travel/context/us

A US passport card number with context. For example "Passport card number: C12345678".

This entity returns components. See Components.

pii/travel/nocontext/us

A US passport card number without context. For example "C12345678".

This entity returns components. See Components.

pii/travel/landmark/us A US passport card number landmark. For example "Passport card number".

voter_id (ECR and EJR available)

Entity Description
pii/voter_id/context/CC

A voter ID number with context. This entity is available only for GB, IN, and MX. For example "Electoral Photo Identity Card: GDN0225185".

pii/voter_id/nocontext/CC

A voter ID number without context. This entity is available only for GB, IN, and MX. For example "GDN0225185".

pii/voter_id/landmark/CC A voter ID number landmark. This entity is available only for GB, IN, and MX. For example "Electoral Photo Identity Card".