Standard Grammar – Source

Eduction includes standard grammar files in source form (XML) and their compiled equivalents (ECR). The source files import compiled Eduction standard grammar files and illustrate sample usage. Customers can modify these XML source files and recompile them to customize a grammar for the needs of an Eduction application. The following table lists public entities defined in the XML source files. It excludes the public entities that are republished from the imported Eduction ECR grammar files.

File

Entity

Description

measure.xml

measure/all/eng

An editable collection of patterns that match length, area, volume, and mass.

money.xml [1]

[1] When matching symbols in the money entities, set the Eduction option MatchWholeWord to 0 (false). Otherwise, for a string such as $10.70, Eduction does not recognize $ as the start of a token. Instead, it looks only for matches that start at the 1 and at the 7, and does not return $10.70.

money/all

All currency amounts.

NOTE:

This grammar file supports some English alphabetic numbers, for example, seven cents, $12 million, one hundred dollars, £5m.

pci_dss.xml

pci_dss/person_name/engus: Person names.

pci_dss/date/engus: Dates.

pci_dss/credit_card/engus: Credit and debit card numbers.

pci_dss/bank_names/engus: Bank names.

pii.xml

pii/person_name/engus: Personal names.

pii/phone_number/engus: Phone numbers.

pii/email_address/engus: Email addresses.

pii/ip_address/engus: IP addresses.

pii/social_security/engus: Social Security numbers.

pii/car_numberplate/engus: Car license plate numbers.

pii/driver_license/engus: Driver’s license numbers.

pii/credit_card/engus: Credit and debit card numbers.

pii/date/engus: Dates.

pii/country: Countries.

pii/state/engus: U.S. states or possessions.

pii/county/engus: U.S. counties.

pii/city/engus: U.S. cities.

pii/address/engus: Geographical addresses.

pii/zipcode/engus: U.S. zipcodes.

pii/age/engus: Age.

pii/gender/engus: Gender.

pii/race/engus: Race.

pii/job_title/engus: Job title.

pii/disease_and_condition/engus: Disease or medical condition.

pii/account_number/engus: Generic account number with 6-8 digits in a predictable context.

pii/license_number/engus: Generic license number with specific alphanumeric format.

pii/facebook_url/engus: Example URL for a personal Web page (Facebook).

place_europe.xml

place/country/europe: European country in English (and some local languages).

place/country_uppercase/europe: European country in English and local languages (uppercase).

place/city1/europe: European settlement with over 100,000 inhabitants, in local language.

place/city1_uppercase/europe: European settlement with over 100,000 inhabitants, in local language (uppercase).

place/city2/europe: European settlement with between 10,000 and 100,000 inhabitants, in local language.

place/city2_uppercase/europe: European settlement with between 10,000 and 100,000 inhabitants, in local language (uppercase).

place/region/Europe: High-level administrative division, in local language.

place/region_uppercase/Europe: High-level administrative division, in local language (uppercase).

place_south_america.xml

place/country/south_america: South American country in English, Spanish, or Portuguese.

place/country_uppercase/south_america: South American country in English, Spanish, or Portuguese (uppercase).

place/city1/south_america: South American settlement with over 100,000 inhabitants, in local language.

place/city1_uppercase/south_america: South American settlement with over 100,000 inhabitants, in local language (uppercase).

place/city2/south_america: South American settlement with between 10,000 and 100,000 inhabitants, in local language.

place/city2_uppercase/south_america: South American settlement with between 10,000 and 100,000 inhabitants, in local language (uppercase).

place/island/south_america: South American island, in local language.

place/island_uppercase/south_america: South American island, in local language (uppercase).

place/region/south_america: High-level administrative division, in local language.

place/region_uppercase/south_america: High-level administrative division, in local language (uppercase).

retention.xml

retention/admission_date: Admission date.

retention/discharge_date: Discharge date.

retention/birth_date: Birth date.

retention/age/eng: Age.

sample.xml

sample/solar_system

A simple entity for planets of the solar system.

sentiment_user_chi.xml

sentiment/user_client_name

sentiment/user_client_brand

sentiment/user_client_rv1_name

sentiment/user_client_rv1_brand

sentiment/user_third_party_company_name

sentiment/user_third_party_company_brand

sentiment/user_positive_adjective

sentiment/user_negative_adjective

sentiment/user_positive_noun

sentiment/user_negative_noun

sentiment/user_neutral_noun

sentiment/user_positive_verb

sentiment/user_negative_verb

sentiment/user_neutral_verb

sentiment/user_positive_idiom

sentiment/user_negative_idiom

You can use these files to modify the sentiment analysis grammar files for the relevant languages to give access to extra domain-specific vocabulary.

sentiment_user_ara.xml

sentiment_user_cze.xml

sentiment_user_eng.xml

sentiment_user_fre.xml

sentiment_user_ger.xml

sentiment_user_ita.xml

sentiment_user_pol.xml

sentiment_user_por.xml

sentiment_user_rus.xml

sentiment_user_spa.xml

sentiment_user_tur.xml

sentiment/user_positive_adjective

sentiment/user_negative_adjective

sentiment/user_neutral_adjective

sentiment/user_positive_adverb

sentiment/user_negative_adverb

sentiment/user_neutral_adverb

sentiment/user_positive_noun

sentiment/user_negative_noun

sentiment/user_neutral_noun

sentiment/user_positive_verb

sentiment/user_negative_verb

sentiment/user_neutral_verb

sentiment/user_positive_match

sentiment/user_negative_match

sentiment/user_good_noun (English only)
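
The money.xml note in the table above explains why whole-word matching misses amounts that begin with a currency symbol. The following Python sketch illustrates the same effect with ordinary regular expressions. It is only an analogy: it does not call Eduction, and the `\b` word-boundary rule of Python's `re` module is assumed here as a stand-in for how MatchWholeWord chooses candidate start positions. The helper names `word_starts` and `find_amount` are hypothetical.

```python
import re

def word_starts(s):
    """Return the positions where a 'word' (letters, digits, underscore) begins."""
    return [m.start() for m in re.finditer(r"\b(?=\w)", s)]

def find_amount(pattern, s):
    """Return the first match of pattern in s, or None if there is no match."""
    m = pattern.search(s)
    return m.group(0) if m else None

text = "Total due: $10.70 today"

# Analogy for whole-word matching: the match must begin on a word boundary.
# There is no boundary between the space and "$", so "$..." can never start a match.
whole_word = re.compile(r"\b\$\d+\.\d{2}\b")

# Analogy for MatchWholeWord set to 0: the match may begin at any character.
anywhere = re.compile(r"\$\d+\.\d{2}\b")

print(word_starts("$10.70"))          # [1, 4] -> the "1" and the "7"
print(find_amount(whole_word, text))  # None
print(find_amount(anywhere, text))    # $10.70
```

In this analogy, the only word starts inside `$10.70` are the `1` and the `7`, which mirrors the behavior the note describes.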

The entities above incorporate the compiled Eduction entities in combination with Eduction XML grammar to create additional entities. The XML illustrates how to use the compiled Eduction entities. You can modify these XML files and compile them into Eduction ECR files that can then be used for specific applications.
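
After editing one of the XML source files, it can be useful to check which public entities it still defines before recompiling. The Python sketch below lists them from a small inline grammar. The XML layout it parses (a `grammars` root, `grammar` elements with a `name` attribute, and `entity` elements with `name` and `type` attributes) is an assumption made for illustration, as is the rule that the full entity name is the grammar name joined to the entity name with a slash. Check both against the shipped source files before relying on them. The `helper` entity and the function name `public_entities` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical grammar source, loosely modeled on sample.xml.
# The element and attribute names are assumptions, not a confirmed schema.
SAMPLE_SOURCE = """\
<grammars version="4.0">
  <grammar name="sample">
    <entity name="solar_system" type="public">
      <entry headword="Mercury"/>
      <entry headword="Venus"/>
    </entity>
    <entity name="helper" type="private">
      <entry headword="Pluto"/>
    </entity>
  </grammar>
</grammars>
"""

def public_entities(xml_text):
    """Return 'grammar/entity' names for every entity marked type="public"."""
    root = ET.fromstring(xml_text)
    names = []
    for grammar in root.iter("grammar"):
        prefix = grammar.get("name", "")
        for entity in grammar.iter("entity"):
            if entity.get("type") == "public":
                names.append(f"{prefix}/{entity.get('name')}")
    return names

print(public_entities(SAMPLE_SOURCE))  # ['sample/solar_system']
```

To run the same check on a real file, read its contents and pass the text to `public_entities`.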

The Eduction grammar files have three advantages:

