Grammars

In File Analysis Suite, a grammar is a group of entities (a word, phrase, or block of information) used to identify information. These entities are based on patterns that identify specific types of information, such as social security numbers, names, telephone numbers, addresses, and so on.

You use grammars to search for or filter documents within a workspace to include in a workbook or place on hold.

File Analysis Suite includes several built-in grammars that consist of specific built-in entities. You can also create custom grammars and entities to suit your specific needs, including adding a custom entity to a built-in grammar.

When managing grammars and entities, keep the following in mind:

You cannot edit the name or description of built-in grammars.
You cannot edit the details of built-in entities.
You can add a custom entity to a built-in grammar.

Built-in grammars and entities

Grammar	Grammar description	Entity information type
Contact Data	Any information that can be used to contact an individual, such as postal addresses, phone numbers, and email addresses. This grammar includes information for multiple languages and countries.	Addresses A postal address. This entity returns the addresses in a normalized format by default. The normalized form standardizes apartment and house numbers, removes additional punctuation, and converts the text to uppercase. For example, "ABIDEI HURRIYET CD TANER PALAS APT 9" or "KAT:7, D:9, 34437 ISTANBUL". The exact order depends on the country. For CJKVT, this entity returns the addresses in a normalized format. The normalized form standardizes apartment and house numbers, removes additional punctuation, and for Romanized text, it converts the text to uppercase. CJKVT native script is not normalized to ASCII, and Romanized text is not normalized to CJKVT native script. Email Addresses Email address. For example, "jsmith@mailserver.com". Email address with mailto: prefix. For example, "mailto:jsmith@mailserver.com". Phone Numbers A telephone number with context. For example, "Tel: +44 1234 224050", "Telephone: (204)-243-9955", or "numéro de téléphone: +1-902-861-7000". For CJKVT, numbers can be ASCII or full-width numbers.
Devices	Any information that can identify electronic devices, such as IP and MAC addresses.	Device ID An identification number for a computing device (such as a computer, tablet, or smart phone). The following device IDs are included. An IP address, without context. An IMEI (International Mobile Equipment Identity), without context. An IMEISV (International Mobile Equipment Identity Software Version), without context. A MAC address, without context. An MEID (Mobile Equipment Identifier), without context. An ICCID (Integrated Circuit Card Identifier), without context. An IMSI (International Mobile Subscriber Identity), without context. An MSISDN (Mobile Station International Subscriber Directory Number), without context.
Financial Data	Any personal financial data, such as bank accounts, IBANs, salary information, and so on. This grammar includes information for multiple languages and countries.	Bank Account Numbers A bank account number. The following bank account patterns are included. Canadian bank account number. The entity recognizes known account number formats for particular banks, and generic seven- or 12-digit numbers. This entity does not include the CPA transit numbers. French bank account number. German bank account number. United Kingdom bank account number, including the sort code. The sort code and account number must be separated by white space. The account number can be any eight-digit number. Ireland bank account number. United States bank account number, including the American Bankers Association routing number, in fraction or MICR format. The routing information and account information must be separated by a single space. The account number can be four to 17 digits. Bank Details A name of a bank. Major bank names for the following countries are included. Canadian banks. For example, "Canadian Imperial Bank of Commerce". UK banks. For example, "HSBC". U.S. banks. For example, "Morgan Stanley". Credit Card Numbers Any credit card number. The following credit card formats are included. American Express: American Express credit card account numbers are 15 digits in length, and generally start with either 34 or 37. For example, "378124403602370". Bankcard: Bankcard credit card number (discontinued in 2006). China UnionPay: Most China UnionPay card numbers have prefixes from 620 to 625, and range in length from 16 to 19 digits. DanKort. Diners Club Carte Blanche, Diners Club International, Diners Club enRoute: Most Diners Club credit card numbers are 14 or 16 digits long. For example, "30544726571210" (Carte Blanche), "36072371463677" (International), or "5484308289255581" (North America). Discover: Discover credit card numbers start with 6011, 622126 to 622925, 644, 645, 646, 647, 648, 649, or 65, and are 16 digits long. For example, "6011541256841963". InstaPayment: InstaPayment credit card numbers start with 637, 638, or 639, and are 16 digits long. For example, "6393519709142682". JCB: JCB credit card numbers consist of 16 digits. Either the first four digits must be 3088, 3096, 3112, 3158, or 3337, or the first eight digits must be in the range 35280000 to 35899999. For example, "3158745776935953". Laser (discontinued in 2014): Laser credit card numbers start with 6304, 6706, 6771, or 6709, and are between 16 and 19 digits long. For example, "6709682431878947". Maestro: Maestro credit card numbers start with 5018, 5020, 5038, 5893, 6304, 6759, 6761, 6762, or 6763, and are between 16 and 19 digits long (although they can have as few as 12 digits). For example, "5018452935461261". Mastercard: Mastercard credit card numbers start with 51, 52, 53, 54, or 55, and are between 16 and 19 digits long. For example, "5500 0000 0000 0004". Solo (discontinued in 2011): Solo credit card numbers are 16, 18, or 19 digits long. For example, "6331101999990016". Switch (rebranded as Maestro): Switch credit card numbers are 16, 18, or 19 digits long and begin with 4903, 4905, 4911, 4936, 564182, 633110, 6333, or 6759. For example, "6759649826438453". Visa: Most Visa credit card numbers start with 4 and are 16 digits long; however, a few consist of 13 digits. The 16-digit numbers are always spaced in four groups of four digits each. For example, "4929 8198 5006 5312". IBAN (International Bank Account Number) Undelimited or space-delimited International Bank Account Number (IBAN) for each supported country. For more information on IBAN formatting requirements for each country, see https://www.iban.com/structure.html. Sort Codes A bank sort code. The following sort code formats are included. 8-digit German bank sort code. For example, "10019610". United Kingdom bank sort code. For example, "301007", "30-10-07", or "30 10 07". This entity recognizes any valid sort code. United Kingdom bank account number, including the sort code. The sort code and account number must be separated by white space. The account number can be any eight-digit number. Ireland bank sort code. For example, "906005", "90-60-05", or "90 60 05".
Government ID	Government-issued identification information, such as driver's license, passport, and social security numbers. This grammar includes information for multiple languages and countries.	Driving License Numbers A driving license number with context. For example, "australian automobile association: 103 805 501" or "driver's license: A234567890". This entity matches both the driving license number and the personal number or driver number, if present. On the standard European driving license, these are fields 5 and 4d. Machine Readable Passport A machine readable passport number. For example, "P<GBRUK<SPECIMEN<<ANGELA<ZOE<<<<<<<<<<<<<<<< 5333244280GBR8812049F2509286<<<<<<<<<<<<<<00". A CJKVT machine readable passport line. For example, "P<JPN<<<<<<<KEIKO<INOUE<<<<<<<<<<<<<<<<<<<<<". Machine Readable TD-1 Travel Document A machine readable TD1-size travel document number. For example, "IDD<<T220001293<<<<<<<<<<<<<<< 6408125<2010315D<<<<<<<<<<<<<4 MUSTERMANN<<ERIKA<<<<<<<<<<<<<". A CJKVT machine readable TD1-size travel document line. For example, "KEIKO<<INOUE<<<<<<<<<<<<<<<<<<". National ID A national identity number with context. For example, "SSN 111-22-3333", "National Insurance Number AB 12 34 56 C", "Code INSEE 187090100100141", or "ImmiCard AMS123456". NOTE: Possible Turkish national identity numbers are identified without context. Each country has its own format. Passport Numbers A passport number with context. For example, "Passport number: 533324428", "Passport Number: P4366918", or "italian passaporti AA5275702". Social Security Tax ID A tax identification number (TIN or ITIN) with context. For example, "ITIN: 911-92-3333" or "TIN-numre: 101111113". Each country has its own format. VAT number A value added tax identification number (VATIN) with context. For example, "NUIS: ALK99999999L" or "VAT Reg No GB 980 7806 84".
Identification Data	Any personal data closely related to the identity of an individual, such as name, date of birth, gender, salutation, title, and so on. This grammar includes information for multiple languages and countries.	Date of Birth A date of birth, written numerically or using words. For example, "date of birth 1/1/2018" or "GEBOORTEDATUM: 01/01/2018". Genders A gender or family relation in the English, French, or German language, either in a word or in context. For example, "lady", "father", "Dame", "voisines", "Frau", or "mensch". Names A full personal name, in title case or upper case. For example, "John Smith", "KEIKO NAKAMURA", or "山田恵". For CJKVT, a full personal name, in romanized text or CJKVT native script. Romanized names can be in title case or upper case, and can be in the order given name surname or surname given name. CJKVT native script names must be in the order surname given name. For Japanese, either form can include honorifics.
Nationalities	Any nationality. This grammar matches nationalities written in English or French, such as "French" or "Français".	Nationalities Any combination of nationality adjective and noun landmark and value, with context. For example, "Country: British" or "Nationality: British".
Sensitive Data	Any personal information that defines the racial or ethnic origin of an individual. This grammar matches racial or ethnic origin written in English or French, such as "caucasian" or "caucasien".	Racial Ethnic Origin A reference to ethnicity or race identification. For example, "White", "Fijian", "Inuit", or "Irish". United Kingdom identity code. For example, IC1, IC2. Ethnic groups in the French language. For example, "Africain" or "Autres".
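The Credit Card Numbers rules in the Financial Data row combine a required prefix with an allowed length. As a rough illustration only, the following Python sketch applies a handful of those rules (American Express, Visa, Mastercard, Discover, and InstaPayment) to a candidate number. `CARD_RULES` and `classify_card` are hypothetical names, not part of File Analysis Suite, and the source does not say whether the built-in entity also validates check digits, so this sketch checks prefixes and lengths only.

```python
import re
from typing import Optional

# Hypothetical approximation of a few rules from the Financial Data row:
# (brand, regex for the leading digits, allowed total lengths).
CARD_RULES = [
    ("American Express", r"3[47]", {15}),
    ("Visa", r"4", {13, 16}),
    ("Mastercard", r"5[1-5]", set(range(16, 20))),
    # Discover: 6011, 622126 to 622925, 644 to 649, or 65.
    ("Discover",
     r"6011|6221(?:2[6-9]|[3-9]\d)|622[2-8]\d\d|6229(?:[01]\d|2[0-5])|64[4-9]|65",
     {16}),
    ("InstaPayment", r"63[7-9]", {16}),
]


def classify_card(number: str) -> Optional[str]:
    """Return the first brand whose prefix and length rules fit, else None."""
    digits = re.sub(r"[ -]", "", number)  # accept space- or hyphen-separated groups
    if not digits.isdigit():
        return None
    for brand, prefix, lengths in CARD_RULES:
        # re.match anchors the prefix regex at the first digit.
        if len(digits) in lengths and re.match(prefix, digits):
            return brand
    return None


for sample in ["378124403602370", "4929 8198 5006 5312", "6011541256841963", "1234"]:
    print(sample, "->", classify_card(sample))
```

Checking length before prefix keeps each rule a simple pair of tests; brands with overlapping prefixes (for example Maestro and Laser, which both list 6304) would need a more careful ordering than this sketch attempts.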

To create a custom grammar

From the primary navigation panel, click Grammars.

The Grammars page opens.
Click NEW GRAMMAR.

The New Grammar dialog opens.

Complete the details for the new grammar.

Option	Description
Name	Type a meaningful, unique name for the grammar. Limits: Maximum 50 characters.
Description	Type a meaningful description for the new grammar. Limits: Maximum 250 characters.
Case Sensitivity	Select whether to enforce case sensitivity. For example, if an entity within this grammar is for a specific name, John Doe, select On to match only "John Doe", or select Off to match "John Doe", "john doe", "jOHn doE", and any other combination of lowercase and uppercase characters, as long as the spelling of the name is identical. NOTE: By default, entities within this grammar inherit this value. You can choose not to inherit this value at the entity level.

Click SAVE.

The custom grammar is created.
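The Case Sensitivity setting and its inheritance can be sketched in Python, assuming that On corresponds to an exact-case literal match and Off to a case-insensitive one. The `find_name` and `effective_case_sensitivity` helpers are hypothetical and only model the behavior described in the table above; they are not a File Analysis Suite API.

```python
import re
from typing import List, Optional


def effective_case_sensitivity(grammar_setting: bool,
                               entity_override: Optional[bool] = None) -> bool:
    """Hypothetical model: an entity inherits the grammar setting unless it overrides it."""
    return grammar_setting if entity_override is None else entity_override


def find_name(text: str, name: str, case_sensitive: bool) -> List[str]:
    """Find literal occurrences of `name`, optionally ignoring character case."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.findall(re.escape(name), text, flags)


sample = "John Doe, john doe, jOHn doE, Jon Doe"
# Grammar set to On, entity inherits it.
print(find_name(sample, "John Doe", effective_case_sensitivity(True)))
# Grammar set to On, entity overrides it with Off.
print(find_name(sample, "John Doe", effective_case_sensitivity(True, entity_override=False)))
```

With the inherited On setting, only "John Doe" is returned; overriding the entity to Off also returns "john doe" and "jOHn doE". "Jon Doe" never matches, because its spelling differs.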

Entities

An entity represents a word, phrase, or block of information that you want to identify. In addition to the built-in entities that are included in the built-in grammars, you can create custom entities to identify information that might be specific to your environment. You can define an entity either by a regular expression pattern or by a list of terms.

For example, suppose you identify your customers with a customer ID that follows a specific syntax, or pattern, and over time the length of that pattern has changed to accommodate growth. You create a custom grammar, "Customer IDs", to identify managed items that reference customer IDs. You then create custom entities within that grammar to define these patterns: either multiple custom entities, one for each customer ID pattern you want to identify, or a single custom entity that includes multiple patterns. Consider the following:

If you need to discriminate between the different customer ID patterns, create multiple custom entities, one for each of your customer ID patterns. The entities display separately in the dashboards, and in searches you need to include all of the entities and combine them with "OR" to get results for all patterns.
If you do not need to discriminate between the different customer ID patterns (you only care about identifying customer IDs, whatever the pattern), create a single custom entity that includes multiple patterns, one for each of your customer ID patterns. All ID patterns display together as a single entity in the dashboards, and in searches you need to include only that single entity.

TIP: Entity patterns are defined using standard regular expressions. Grammars and entities in File Analysis Suite are derived from Micro Focus IDOL. For more information about the regular expression syntax for entities, see Entity syntax.
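To make the trade-off between multiple entities and a single multi-pattern entity concrete, here is a minimal Python sketch. It assumes two hypothetical customer ID formats (two letters, a hyphen, and then six or eight digits) and uses Python's `re` module as a stand-in for the IDOL regular expression engine, whose syntax may differ in details. `OLD_ID`, `NEW_ID`, `SEPARATE`, `COMBINED`, and `extract` are illustrative names only.

```python
import re
from typing import Dict, List

# Hypothetical customer ID formats: two letters, a hyphen, then
# six digits (older IDs) or eight digits (newer IDs).
OLD_ID = r"[A-Za-z]{2}-\d{6}"
NEW_ID = r"[A-Za-z]{2}-\d{8}"

# Option 1: one custom entity per pattern, so the results can be told apart.
SEPARATE = {"Customer ID (6 digits)": [OLD_ID], "Customer ID (8 digits)": [NEW_ID]}
# Option 2: one custom entity holding both patterns, so all IDs are reported together.
COMBINED = {"Customer ID": [OLD_ID, NEW_ID]}


def extract(entities: Dict[str, List[str]], text: str) -> Dict[str, List[str]]:
    """Return matches per entity; an entity matches if any of its patterns matches."""
    results = {}
    for name, patterns in entities.items():
        alternation = "|".join(f"(?:{p})" for p in patterns)
        # \b stops the 6-digit pattern from matching the start of an 8-digit ID.
        results[name] = re.findall(rf"\b(?:{alternation})\b", text)
    return results


text = "Orders for AB-123456 and CD-12345678 shipped."
print(extract(SEPARATE, text))
print(extract(COMBINED, text))
```

The separate entities report each ID under its own name, so a search that needs both must combine the two entities with "OR"; the combined entity reports both IDs under a single name.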

If you define the entity using terms, you can manually enter the desired terms or load them from an existing term list.
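A terms-based entity behaves roughly like a whole-word match against a list of literal terms. The following Python sketch assumes that behavior, takes terms entered one per line (as in the term text box), and tries longer terms first so that a longer term is not cut short by a shorter term it contains. `build_term_matcher` is a hypothetical helper; it does not describe how File Analysis Suite is implemented.

```python
import re


def build_term_matcher(term_text: str, case_sensitive: bool = False) -> "re.Pattern[str]":
    """Compile one-term-per-line input into a whole-word matcher (hypothetical sketch)."""
    terms = {line.strip() for line in term_text.splitlines() if line.strip()}
    if not terms:
        raise ValueError("at least one term is required")
    # Try longer terms first so "Project Falcon Phase 2" wins over "Project Falcon".
    alternation = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    flags = 0 if case_sensitive else re.IGNORECASE
    # \b assumes every term starts and ends with a letter or digit.
    return re.compile(rf"\b(?:{alternation})\b", flags)


matcher = build_term_matcher("Project Falcon\nProject Falcon Phase 2\nFalconDB")
print(matcher.findall("Budget for project falcon phase 2 and FalconDB."))
```

Escaping each term with `re.escape` keeps punctuation in a term from being read as regular expression syntax, which is the main difference between a term and a pattern.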

To create a custom entity

On the Grammars page, click in the row for the grammar to which you want to add a custom entity.

Do one of the following:
- Click NEW ENTITY.
- Click the add entity icon displayed in the right column for the selected grammar.
The New Entity dialog opens.

Complete the options for the new custom entity.

Option	Description
Name	Type a meaningful, unique name for the new entity. Limits: Maximum 50 characters.
Description	Type a meaningful description for the new entity. Limits: Maximum 250 characters.
Case Sensitivity	Specify whether the data must match the character case of the defined pattern.
Tangible Characters	Type any punctuation characters to treat as part of the word, rather than as word boundaries within the regular expression pattern.
Languages	Type the names of the languages this entity can match against. As you type, languages that match what you have typed display. Click the desired language to add it to the entity. Example: This entity identifies months of the year, in English, German, or French.
Countries	Type the name of the country this entity must match. As you type, countries that match what you have typed display. Click the desired country to add it to the entity. TIP: If you plan to define a geographic region, you do not need to identify countries within the defined region. Example: This entity identifies telephone numbers, specific to the pattern of the United Kingdom.
Regions	Type the name of a geographical region this entity must match. As you type, regions that match what you have typed display. Click the desired region to add it to the entity. Geographic regions and included countries: Americas (AMER) - Argentina, Bolivia, Brazil, British Virgin Islands, Canada, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Guatemala, Mexico, Nicaragua, Paraguay, Peru, Saint Lucia, United States, Uruguay, Venezuela. Asia-Pacific (APAC) - Australia, India, Indonesia, Malaysia, New Zealand, Pakistan, Philippines, Sri Lanka, Timor-Leste. Asia-Pacific [APAC (Including CJK)] - Australia, China, Hong Kong, India, Indonesia, Japan, Macao, Malaysia, New Zealand, Pakistan, Philippines, Singapore, South Korea, Sri Lanka, Taiwan, Thailand, Timor-Leste. European Economic Area (EEA) - Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden. Europe, Middle East, Africa (EMEA) - Albania, Austria, Azerbaijan, Bahrain, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Egypt, Estonia, Faroe Islands, Finland, France, Germany, Greece, Greenland, Hungary, Iceland, Iran, Iraq, Ireland, Israel, Italy, Jordan, Kazakhstan, Kosovo, Kuwait, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Moldova, Monaco, Montenegro, Netherlands, Nigeria, North Macedonia, Norway, Palestine, Poland, Portugal, Qatar, Romania, Russia, San Marino, Sao Tome and Principe, Saudi Arabia, Serbia, Seychelles, Slovakia, Slovenia, South Africa, Spain, Sweden, Switzerland, Turkey, Ukraine, United Arab Emirates, United Kingdom, Uzbekistan, Vatican. Worldwide - all countries except those in the CJK region. Worldwide (including CJK) - all countries. NOTE: CJK includes China, Hong Kong, Japan, Macao, Singapore, South Korea, Taiwan, and Thailand. Example: This entity identifies driver's license numbers, specific to the pattern of the Americas (AMER).

NOTE: When selecting grammars and entities for repositories and repository templates, countries are grouped into regions to make selection easier. The relationship between grammars and entities can be extended with custom entities. When you create a custom entity, all of the countries defined for the entity are associated with all of the regions defined for the entity. For example, you create a custom entity and assign the APAC region and Japan. When you create repositories and repository templates going forward, grammars related to Japan will be automatically included when you select the APAC region.

Select whether this entity is based on a pattern or terms.
Define, test, and add patterns.
1. Select Enter Pattern(s).
2. In the pattern text box, type the pattern for the data you want to identify using regular expressions.
  
  For the customer ID example, to identify a customer ID that includes two letters followed by a hyphen and then six numbers, type [A-Za-z]{2}-\d{6}.
3. In the Sample Text box, type sample text that the pattern should match.
  
  For the customer ID example, type AB-123456.
4. Click TEST.
  - If the sample text is matched by the defined pattern, the sample text is highlighted in yellow.
  - If the sample text does not match the defined pattern, a message above the Sample Text box indicates that there was no match.
5. When you are satisfied with the defined pattern, click ADD.
  
  The pattern moves to the Patterns box.
  - To retest a single pattern added to the Patterns box, click the row for the desired pattern and click EDIT. The pattern moves back to the pattern text box.
  - To remove a pattern added to the Patterns box, click the row for the desired pattern and click DELETE. The pattern is removed completely.
6. Repeat to add more patterns to this entity.
Define, test, and add terms.
1. Select Enter Term(s).
2. Do one of the following:
  - In the term text box, type the terms you want to identify, one term per line (press Enter on your keyboard to enter another line).
  - Click Add from Term List.
    
    In the Add from Term List dialog, select the desired term list to use and then click OK.
    
    All terms in the selected term list are added to the term text box. Delete any terms you do not want to include.
3. In the Sample Text box, type the information you want to identify.
4. Click TEST.
  - If the sample text is matched by the defined terms, the sample text is highlighted in yellow.
  - If the sample text does not match the defined terms, a message above the Sample Text box indicates that there was no match to the terms.
5. When you are satisfied with the defined terms, click ADD.
  
  All terms in the term text box move to the Terms box.
  - To retest a single term added to the Terms box, click the row for the desired term and click EDIT. The term moves back to the term text box.
  - To remove a term added to the Terms box, click the row for the desired term and click DELETE. The term is removed completely.
6. Repeat to add more terms to this entity.
Click SAVE.

The new custom entity is created within the selected grammar.
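The Tangible Characters option is easiest to picture as a boundary check around each match. This Python sketch rests on an assumption about how the option behaves: a tangible character counts as part of a word, so a match directly next to one is rejected, just as a match directly next to a letter or digit would be. `find_with_tangibles` and the sample customer IDs are hypothetical, and only ASCII letters and digits are treated as word characters here.

```python
import re
from typing import List


def find_with_tangibles(pattern: str, text: str, tangible: str = "") -> List[str]:
    """Find matches of `pattern` that are not glued to a word character.

    Assumption: tangible characters are treated like letters and digits,
    so a match touching one on either side is rejected.
    """
    blocked = "A-Za-z0-9" + re.escape(tangible)  # characters that may not border a match
    bounded = rf"(?<![{blocked}])(?:{pattern})(?![{blocked}])"
    return re.findall(bounded, text)


CUSTOMER_ID = r"[A-Za-z]{2}-\d{6}"  # hypothetical pattern from the customer ID example
text = "IDs: AB-123456, #CD-654321, XEF-111111."
print(find_with_tangibles(CUSTOMER_ID, text))
print(find_with_tangibles(CUSTOMER_ID, text, tangible="#"))
```

Without tangible characters, "CD-654321" is found inside "#CD-654321"; with `#` declared tangible, it is not. "XEF-111111" is rejected in both cases, because the letter before "EF" makes it part of a longer word.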

To edit a custom entity

On the Grammars page, click the down arrow to the left of the grammar to which the custom entity belongs.

The grammar expands and the list of associated entities displays.
Click the name of the custom entity you want to edit.

TIP: You can also click or hover over the row of the desired custom entity and then click the edit icon.

The Edit Entity dialog opens.
Make the desired changes.
To edit the patterns in this custom entity:
- To add another pattern, type a regular expression pattern in the pattern text box and then click ADD.
- To edit an existing pattern, click the row for the desired pattern in the Patterns box and then click EDIT.
  
  The selected pattern moves to the pattern text box. Edit the pattern and then click ADD to move it back to the Patterns box.
- To delete an existing pattern, click the row for the desired pattern in the Patterns box and then click DELETE.
  
  In the confirmation dialog, click YES to confirm the action.
To edit the terms in this custom entity:
- To add another term, type the term in the term text box and then click ADD.
- To edit an existing term, click the row for the desired term in the Terms box and then click EDIT.
  
  The selected term moves to the term text box. Edit the term and then click ADD to move it back to the Terms box.
- To delete an existing term, click the row for the desired term in the Terms box and then click DELETE.
  
  In the confirmation dialog, click YES to confirm the action.
Click SAVE when done.

The custom entity is updated.