Grammars
In File Analysis Suite, a grammar is a group of entities (words, phrases, or blocks of information) used to identify information. These entities are based on patterns that identify specific types of information, such as social security numbers, names, telephone numbers, addresses, and so on.
You use grammars to search for or filter documents within a workspace to include in a workbook or place on hold.
File Analysis Suite includes several built-in grammars that are composed of specific built-in entities. You can also create custom grammars and entities to suit your specific needs, including adding a custom entity to a built-in grammar.
When managing grammars and entities, keep the following in mind:
-
You cannot edit the name or description of built-in grammars.
-
You cannot edit the details of built-in entities.
-
You can add a custom entity to a built-in grammar.
| Grammar | Grammar description | Entity information type |
|---|---|---|
| Contact Data |
Any information that can be used to contact an individual, such as postal addresses, phone numbers, and email addresses. This grammar includes information for multiple languages and countries. |
Addresses
A postal address. This entity returns the addresses in a normalized format by default. The normalized form standardizes apartment and house numbers, removes additional punctuation, and converts the text to uppercase. For example, "ABIDEI HURRIYET CD TANER PALAS APT 9" or "KAT:7, D:9, 34437 ISTANBUL". The exact order depends on the country. For CJKVT, this entity returns the addresses in a normalized format. The normalized form standardizes apartment and house numbers, removes additional punctuation, and for Romanized text, it converts the text to uppercase. CJKVT native script is not normalized to ASCII, and Romanized text is not normalized to CJKVT native script.
Email Addresses
Email address. For example, "jsmith@mailserver.com". Email address with mailto: prefix. For example, "mailto:jsmith@mailserver.com".
Phone Numbers
A telephone number with context. For example, "Tel: +44 1234 224050", "Telephone: (204)-243-9955", or "numéro de téléphone: +1-902-861-7000". For CJKVT, numbers can be ASCII or full-width numbers. |
| Devices | Any information that can identify electronic devices, such as IP and MAC addresses. |
Device ID
An identification number for a computing device (such as a computer, tablet, or smart phone). The following device IDs are included.
|
| Financial Data |
Any personal data related to financial data such as bank accounts, IBAN, salary information, and so on. This grammar includes information for multiple languages and countries. |
Bank Account Numbers
A bank account number. The following bank account patterns are included.
Bank Details
A name of a bank. Major bank names for the following countries are included.
Credit Card Numbers
Any credit card number. The following credit card formats are included.
IBAN (International Bank Account Number)
Undelimited or space-delimited International Bank Account Number (IBAN) for each supported country. For more information on IBAN formatting requirements for each country, see https://www.iban.com/structure.html.
Sort Codes
A bank sort code. The following sort code formats are included.
|
| Government ID |
Government-issued identification information such as driver's license, passport, social security, and so on. This grammar includes information for multiple languages and countries. |
Driving License Numbers
A driving license number with context. For example, "australian automobile association: 103 805 501" or "driver's license: A234567890". This entity matches both the driving license number and the personal number or driver number, if present. On the standard European driving license, these are fields 5 and 4d.
Machine Readable Passport
A machine readable passport number. For example, "P<GBRUK<SPECIMEN<<ANGELA<ZOE<<<<<<<<<<<<<<<< 5333244280GBR8812049F2509286<<<<<<<<<<<<<<00". A CJKVT machine readable passport line. For example, "P<JPN<<<<<<<KEIKO<INOUE<<<<<<<<<<<<<<<<<<<<<".
Machine Readable TD-1 Travel Document
A machine readable TD1-size travel document number. For example, "IDD<<T220001293<<<<<<<<<<<<<<< 6408125<2010315D<<<<<<<<<<<<<4 MUSTERMANN<<ERIKA<<<<<<<<<<<<<". A CJKVT machine readable TD1-size travel document line. For example, "KEIKO<<INOUE<<<<<<<<<<<<<<<<<<".
National ID
A national identity number with context. For example, "SSN 111-22-3333", "National Insurance Number AB 12 34 56 C", "Code INSEE 187090100100141", or "ImmiCard AMS123456". NOTE: Possible Turkish national identity numbers are identified without context. Each country has its own format.
Passport Numbers
A passport number with context. For example, "Passport number: 533324428", "Passport Number: P4366918", or "italian passaporti AA5275702".
Social Security Tax ID
A tax identification number (TIN or ITIN) with context. For example, "ITIN: 911-92-3333" or "TIN-numre: 101111113". Each country has its own format.
VAT number
A value added tax identification number (VATIN) with context. For example "NUIS: ALK99999999L" or "VAT Reg No GB 980 7806 84". |
| Identification Data |
Any personal data closely related to the identity of an individual such as name, date of birth, gender, salutation, title, and so on. This grammar includes information for multiple languages and countries. |
Date of Birth
A date of birth, written numerically or using words. For example, "date of birth 1/1/2018" or "GEBOORTEDATUM: 01/01/2018".
Genders
A gender or family relation in the English, French, or German language, either in a word or in context. For example, "lady", "father", "Dame", "voisines", "Frau", or "mensch".
Names
A full personal name, in title case or upper case. For example, "John Smith", "KEIKO NAKAMURA", or "山田恵". For CJKVT, a full personal name, in romanized text or CJKVT native script. Romanized names can be in title case or upper case, and can be in the order given name surname or surname given name. CJKVT native script names must be surname given name. For Japanese, either form can include honorifics. |
| Nationalities |
Any nationality. This grammar matches nationalities written in English or French, such as "French" or "Francais". |
Nationalities
Any combination of nationality adjective and noun landmark and value, with context. For example, "Country: British", or "Nationality: British". |
| Sensitive Data |
Any personal information that defines the racial or ethnic origin of an individual. This grammar matches racial or ethnic origin written in English or French, such as "caucasian" or "caucasien". |
Racial Ethnic Origin
A reference to ethnicity or race identification. For example, "White", "Fijian", "Inuit", or "Irish". United Kingdom identity code. For example, IC1, IC2. Ethnic groups in the French language. For example, "Africain" or "Autres". |
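Several entities in the table above match a value only "with context", meaning that a landmark word such as "SSN" or "Passport number" must appear next to the value. The following Python sketch illustrates that idea with two of the documented examples. The landmark list, the value shape, and the `find_with_context` helper are hypothetical and chosen for illustration only; they are not the patterns used by the built-in entities.

```python
import re

# Hypothetical landmark words; the built-in entities use their own context lists.
CONTEXT = r"(?:SSN|Passport number)"
# Deliberately loose value shape: digits, optionally separated by hyphens.
VALUE = r"\d[\d-]{4,}\d"

# A value is reported only when a landmark word comes directly before it.
WITH_CONTEXT = re.compile(rf"\b{CONTEXT}:?\s+({VALUE})", re.IGNORECASE)

def find_with_context(text):
    """Return the values in text that are preceded by a landmark word."""
    return WITH_CONTEXT.findall(text)

text = "SSN 111-22-3333 was filed. Order 999-88-7777 shipped. Passport number: 533324428"
print(find_with_context(text))
```

In this sketch the order number is ignored even though it has the same shape as the social security number, because no landmark word precedes it.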
To create a custom grammar:
-
From the primary navigation panel, click Grammars.
The Grammars page opens.
-
Click NEW GRAMMAR.
The New Grammar dialog opens.
-
Complete the details for the new grammar.
| Option | Description |
|---|---|
| Name | Type a meaningful, unique name for the grammar. Limits: Maximum 50 characters. |
| Description | Type a meaningful description for the new grammar. Limits: Maximum 250 characters. |
| Case Sensitivity | Select whether to enforce case sensitivity. For example, if an entity within this grammar is for a specific name, John Doe, select On to match only "John Doe", or select Off to match "John Doe", "john doe", "jOHn doE", and any other combination of lowercase and uppercase characters, as long as the spelling of the name is identical. NOTE: By default, entities within this grammar inherit this value. You can choose not to inherit this value at the entity level. |
-
Click SAVE.
The custom grammar is created.
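The effect of the Case Sensitivity option can be sketched with Python's `re` module. This is only an analogy under the assumption that On behaves like an exact-case match and Off like a case-insensitive match; the `matches_name` helper is hypothetical and not part of File Analysis Suite.

```python
import re

def matches_name(text, case_sensitive):
    """Check whether text is the name "John Doe" under the chosen case setting."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.fullmatch(r"John Doe", text, flags) is not None

# Case Sensitivity On: only the exact casing matches.
print(matches_name("John Doe", case_sensitive=True))
print(matches_name("jOHn doE", case_sensitive=True))

# Case Sensitivity Off: any casing matches, but the spelling must be identical.
print(matches_name("jOHn doE", case_sensitive=False))
print(matches_name("Jon Doe", case_sensitive=False))
```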
To edit a custom grammar:
-
On the Grammars page, click the name of the custom grammar you want to edit.
TIP: You can also click or hover over the row for the custom grammar and then click the edit icon.
The Edit grammar dialog opens.
-
Make the necessary changes and then click SAVE.
The edits to the custom grammar are saved.
To delete a custom grammar:
-
On the Grammars page, click or hover over the row for the custom grammar you want to delete.
Additional icons display in the right column.
-
Click the delete icon associated with the desired custom grammar.
-
In the confirmation dialog, click YES to confirm the action.
The custom grammar is deleted.
Entities
An entity represents a word, phrase, or block of information based on a regular expression pattern. In addition to the built-in entities that are included in the built-in grammars, you can create custom entities to identify information that might be specific to your environment. Entities can be defined by a specific pattern or by terms.
As an example using a pattern, suppose you identify your customers with a customer ID that follows a specific syntax, or pattern. Over time, your customer ID pattern has changed in length to accommodate growth. You create a custom grammar, "Customer IDs", to identify managed items that reference the customer IDs. You then create one or more custom entities within the custom grammar that define these patterns. You can create multiple custom entities, one for each customer ID pattern you want to identify, or create a single custom entity that includes multiple patterns. Consider the following:
-
If you need to discriminate between the different customer ID patterns, create multiple custom entities, one for each of your customer ID patterns. The entities display separately in the dashboards, and you must include all of the entities and "OR" them in searches to get results for all patterns.
-
If you do not need to discriminate between the different customer ID patterns (you just care about identifying customer IDs, no matter the pattern), create a single custom entity that includes multiple patterns, one for each of your customer ID patterns. All ID patterns display together as a single entity in the dashboards. In searches, you will need to include the single entity.
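The trade-off between the two approaches above can be sketched in Python. The two customer ID formats (two letters, a hyphen, then six or eight digits), the entity names, and the `count_hits` helper are hypothetical, and Python's `re` module stands in for the product's matching engine.

```python
import re

# Hypothetical customer ID formats: two letters, a hyphen, then 6 (old) or 8 (new) digits.
OLD_ID = r"\b[A-Z]{2}-\d{6}\b"
NEW_ID = r"\b[A-Z]{2}-\d{8}\b"

# Option 1: one entity per pattern, so hits are reported separately.
separate_entities = {
    "Customer ID (old)": [OLD_ID],
    "Customer ID (new)": [NEW_ID],
}
# Option 2: one entity holding both patterns, so hits are reported together.
combined_entity = {"Customer ID": [OLD_ID, NEW_ID]}

def count_hits(entities, text):
    """Count matches per entity name across all of that entity's patterns."""
    return {
        name: sum(len(re.findall(p, text)) for p in patterns)
        for name, patterns in entities.items()
    }

text = "Accounts AB-123456 and CD-12345678 were merged into EF-654321."
print(count_hits(separate_entities, text))
print(count_hits(combined_entity, text))
```

With separate entities the old and new formats are counted independently, while the combined entity reports a single total for all customer IDs.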
TIP: Entity patterns are defined using standard regular expressions. Grammars and entities in File Analysis Suite are derived from Micro Focus IDOL. For more information about the regular expression syntax for entities, see Entity syntax.
If you define the entity using terms, you can manually enter the desired terms or load the terms from an existing term list.
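A term-based entity can be thought of as a list of literal strings rather than a regular expression. The following Python sketch shows one way to turn a one-term-per-line list into a matcher. The sample terms and the `load_terms` and `build_term_pattern` helpers are hypothetical, and the way File Analysis Suite matches terms internally may differ.

```python
import re

def load_terms(raw):
    """Split a one-term-per-line block into a list, skipping blank lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]

def build_term_pattern(terms, case_sensitive=False):
    """Build a regex that matches any of the terms literally, as whole words."""
    # Longest terms first, so "Project Falcon X" wins over "Project Falcon".
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in ordered)
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(rf"\b(?:{body})\b", flags)

terms = load_terms("Project Falcon\nProject Falcon X\n\nBlue Harbor\n")
pattern = build_term_pattern(terms)
print(pattern.findall("Notes on project falcon x and Blue Harbor."))
```

Escaping each term keeps characters such as "." or "+" from being interpreted as regular expression operators.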
To create a custom entity:
-
On the Grammars page, click in the row for the grammar to which you want to add a custom entity.
Do one of the following:
-
Click NEW ENTITY.
-
Click the add entity icon displayed in the right column for the selected grammar.
The New Entity dialog opens.
-
Complete the options for the new custom entity.
| Option | Description |
|---|---|
| Name | Type a meaningful, unique name for the new entity. Limits: Maximum 50 characters. |
| Description | Type a meaningful description for the new entity. Limits: Maximum 250 characters. |
| Case Sensitivity | Specify whether the data must match the character case of the defined pattern. |
| Tangible Characters | Type any punctuation characters to treat as part of the word, rather than as word boundaries within the regular expression pattern. |
| Languages | Type the names of the languages this entity can match against. As you type, languages that match what you have typed display. Click the desired language to add it to the entity. Example: This entity identifies months of the year, in English, German, or French. |
| Countries | Type the name of the country this entity must match. As you type, countries that match what you have typed display. Click the desired country to add it to the entity. TIP: If you plan to define a geographic region, you do not need to identify countries within the defined region. Example: This entity identifies telephone numbers, specific to the pattern of the United Kingdom. |
| Regions | Type the name of a geographical region this entity must match. As you type, regions that match what you have typed display. Click the desired region to add it to the entity. |
Geographic regions and included countries:
-
Americas (AMER) - Argentina, Bolivia, Brazil, British Virgin Islands, Canada, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Guatemala, Mexico, Nicaragua, Paraguay, Peru, Saint Lucia, United States, Uruguay, Venezuela
-
Asia-Pacific (APAC) - Australia, India, Indonesia, Malaysia, New Zealand, Pakistan, Philippines, Sri Lanka, Timor-Leste
-
Asia-Pacific [APAC (Including CJK)] - Australia, China, Hong Kong, India, Indonesia, Japan, Macao, Malaysia, New Zealand, Pakistan, Philippines, Singapore, South Korea, Sri Lanka, Taiwan, Thailand, Timor-Leste
-
European Economic Area (EEA) - Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden
-
Europe, Middle East, Africa (EMEA) - Albania, Austria, Azerbaijan, Bahrain, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Egypt, Estonia, Faroe Islands, Finland, France, Germany, Greece, Greenland, Hungary, Iceland, Iran, Iraq, Ireland, Israel, Italy, Jordan, Kazakhstan, Kosovo, Kuwait, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Moldova, Monaco, Montenegro, Netherlands, Nigeria, North Macedonia, Norway, Palestine, Poland, Portugal, Qatar, Romania, Russia, San Marino, Sao Tome And Principe, Saudi Arabia, Serbia, Seychelles, Slovakia, Slovenia, South Africa, Spain, Sweden, Switzerland, Turkey, Ukraine, United Arab Emirates, United Kingdom, Uzbekistan, Vatican
-
Worldwide - all countries except those in CJK region
-
Worldwide (including CJK) - all countries
NOTE: CJK includes China, Hong Kong, Japan, Macao, Singapore, South Korea, Taiwan, and Thailand.
Example: This entity identifies driver's license numbers, specific to the pattern of the Americas (AMER).
NOTE: When selecting grammars and entities for repositories and repository templates, countries are grouped into regions to make selection easier. The relationship between grammars and entities can be extended with custom entities. When you create a custom entity, all of the countries defined for the entity are associated with all of the regions defined for the entity. For example, you create a custom entity and assign the APAC region and Japan. When you create repositories and repository templates going forward, grammars related to Japan will be automatically included when you select the APAC region.
-
-
Select whether this entity is based on a pattern or terms.
Define, test, and add patterns.
-
Select Enter Pattern(s).
-
In the pattern text box, type the pattern for the data you want to identify using regular expressions.
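For the customer ID example, to identify a customer ID that consists of two letters followed by a hyphen and then six digits, type \D{2}-\d{6}. In standard regular expression syntax, \D matches any non-digit character, not only letters, so a character class such as [A-Za-z]{2} is a stricter way to require two letters.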
-
In the Sample Text box, type sample text that you expect the pattern to match.
For the customer ID example, type AB-123456.
-
Click TEST.
-
If the sample text is matched by the defined pattern, the sample text is highlighted in yellow.
-
If the sample text does not match the defined pattern, you see a message above the Sample Text box that there was not a match to the pattern.
-
-
When you are satisfied with the defined pattern, click ADD.
The pattern moves to the Patterns box.
-
To retest a single pattern added to the Patterns box, click the row for the desired pattern and click EDIT. The pattern moves back to the pattern text box.
-
To remove a pattern added to the Patterns box, click the row for the desired pattern and click DELETE. The pattern is removed completely.
-
-
Repeat to add more patterns to this entity.
Define, test, and add terms.
-
Select Enter Term(s).
-
Do one of the following.
-
In the term text box, type the terms you want to identify, one term per line (press Enter on your keyboard to enter another line).
-
Click Add from Term List.
In the Add from Term List dialog, select the desired term list to use and then click OK.
All terms in the selected term list are added to the term text box. Delete any terms you do not want to include.
-
-
In the Sample Text box, type the information you want to identify.
-
Click TEST.
-
If the sample text is matched by the defined terms, the sample text is highlighted in yellow.
-
If the sample text does not match the defined terms, you see a message above the Sample Text box that there was not a match to the terms.
-
-
When you are satisfied with the defined terms, click ADD.
All terms in the term text box move to the Terms box.
-
To retest a single term added to the Terms box, click the row for the desired term and click EDIT. The term moves back to the term text box.
-
To remove a term added to the Terms box, click the row for the desired term and click DELETE. The term is removed completely.
-
-
Repeat to add more terms to this entity.
-
-
Click SAVE.
The new custom entity is created within the selected grammar.
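The define-and-test loop for patterns in the procedure above can be rehearsed outside the product before you type a pattern into the dialog. The following Python sketch tries the pattern from the customer ID example against sample text. It assumes that the product's regular expression syntax behaves like Python's `re` module for these constructs, which may not hold in every case, and the `try_pattern` helper is hypothetical.

```python
import re

def try_pattern(pattern, sample):
    """Return the matched substrings, or an empty list when nothing matches."""
    return [m.group(0) for m in re.finditer(pattern, sample)]

DOC_PATTERN = r"\D{2}-\d{6}"            # pattern from the customer ID example
STRICT_PATTERN = r"[A-Za-z]{2}-\d{6}"   # stricter alternative: letters only

# The documented sample text matches the documented pattern.
print(try_pattern(DOC_PATTERN, "AB-123456"))

# In Python, \D means "any non-digit", so punctuation also matches.
print(try_pattern(DOC_PATTERN, "??-123456"))

# The stricter character class rejects the punctuation prefix.
print(try_pattern(STRICT_PATTERN, "??-123456"))
```

Testing a few near misses in addition to the expected sample text helps reveal patterns that are broader than intended.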
To edit a custom entity:
-
On the Grammars page, click the down arrow to the left of the grammar to which the custom entity belongs.
The grammar expands and the list of associated entities displays.
-
Click the name of the custom entity you want to edit.
TIP: You can also click or hover over the row of the desired custom entity and then click the edit icon.
The Edit Entity dialog opens.
-
Make the desired changes.
To edit the patterns in this custom entity:
-
To add another pattern, type a regular expression pattern in the pattern text box and then click ADD.
-
To edit an existing pattern, click the row for the desired pattern in the Patterns box and then click EDIT.
The selected pattern moves to the pattern text box. Edit the pattern and then click ADD to move it back to the Patterns box.
-
To delete an existing pattern, click the row for the desired pattern in the Patterns box and then click DELETE.
In the confirmation dialog, click YES to confirm the action.
To edit the terms in this custom entity:
-
To add another term, type the term in the term text box and then click ADD.
-
To edit an existing term, click the row for the desired term in the Terms box and then click EDIT.
The selected term moves to the term text box. Edit the term and then click ADD to move it back to the Terms box.
-
To delete an existing term, click the row for the desired term in the Terms box and then click DELETE.
In the confirmation dialog, click YES to confirm the action.
-
-
Click SAVE when done.
The custom entity is updated.
To delete a custom entity:
-
On the Grammars page, click or hover over the row of the custom entity you want to delete.
Action icons display in the right column.
-
Click the delete icon associated with the custom entity.
-
In the confirmation dialog, click YES to confirm the action.
The selected custom entity is deleted.