Grammar classes, types, and rules
Fusion includes several built-in grammar classes and types, each made up of specific built-in grammar rules. You can also create custom grammar classes and grammar rules to suit your specific needs, including adding a custom grammar rule to a built-in grammar class. Grammar classes, types, and rules are the basis for grammar sets.
When managing grammars, keep the following in mind.
- Changes you make to grammar classes and rules affect the grammar sets they are included in.
- You cannot edit the name or description of built-in grammar classes, types, or rules.
- You can add a custom grammar rule to a built-in grammar class (top-level).
- To view additional information about a grammar class, type, or rule, click in the row for the class, type, or rule and then click the open detail panel icon.
Grammar class | Description | Grammar type and rule information |
---|---|---|
Contact Data |
Any information that can be used to contact an individual, such as postal addresses, phone numbers, and email addresses. This grammar includes information for multiple languages and countries. |
Addresses
A postal address. This grammar rule returns the addresses in a normalized format by default. The normalized form standardizes apartment and house numbers, removes additional punctuation, and converts the text to uppercase. For example, "ABIDEI HURRIYET CD TANER PALAS APT 9" or "KAT:7, D:9, 34437 ISTANBUL". The exact order depends on the country. For CJKVT, this grammar rule returns the addresses in a normalized format. The normalized form standardizes apartment and house numbers, removes additional punctuation, and for Romanized text, it converts the text to uppercase. CJKVT native script is not normalized to ASCII, and Romanized text is not normalized to CJKVT native script.
Email Addresses
Email address. For example, "jsmith@mailserver.com". Email address with mailto: prefix. For example, "mailto:jsmith@mailserver.com".
Phone Numbers
A telephone number with context. For example, "Tel: +44 1234 224050", "Telephone: (204)-243-9955", or "numéro de téléphone: +1-902-861-7000". For CJKVT, numbers can be ASCII or full-width numbers. |
Devices and Vehicles | Any information that can identify electronic devices, such as IP and MAC addresses, or vehicles, such as a vehicle identification number (VIN). |
Device ID
An identification number for a computing device (such as a computer, tablet, or smart phone). The following device IDs are included.
VIN Number
A vehicle identification number without context. For example, "JH4DB1550MS003978". |
Financial Data |
Any personal data related to financial data such as bank accounts, IBAN, salary information, and so on. This grammar includes information for multiple languages and countries. |
Bank Account Numbers
A bank account number. The following bank account patterns are included.
Bank Details
A name of a bank. Major bank names for the following countries are included.
Credit Card Numbers
Any credit card number. The following credit card formats are included.
IBAN (International Bank Account Number)
Undelimited or space-delimited International Bank Account Number (IBAN) for each supported country. For more information on IBAN formatting requirements for each country, see https://www.iban.com/structure.html.
Routing Numbers
A bank routing number. The following sort code formats are included.
|
Government ID |
Government-issued identification information such as driver's license, passport, social security, and so on. This grammar includes information for multiple languages and countries. |
Driving License Numbers
A driving license number with context. For example, "australian automobile association: 103 805 501" or "driver's license: A234567890". This grammar rule matches both the driving license number and the personal number or driver number, if present. On the standard European driving license, these are fields 5 and 4d.
EHIC Number
A European Health Insurance Card number with context. For example, "EHIC: UK 1234 5678" or "TSE: 123456789012".
Healthcare ID Number
A healthcare identification number with context. Each country has its own format, such as the following examples.
Machine Readable Passport
A machine readable passport number or TD 1-size travel document number. For example, "P<GBRUK<SPECIMEN<<ANGELA<ZOE<<<<<<<<<<<<<<<< 5333244280GBR8812049F2509286<<<<<<<<<<<<<<00" or "IDD<<T220001293<<<<<<<<<<<<<<< 6408125<2010315D<<<<<<<<<<<<<4 MUSTERMANN<<ERIKA<<<<<<<<<<<<<". A CJKVT machine readable passport number or TD 1-size travel document number. For example, "P<JPN<<<<<<<KEIKO<INOUE<<<<<<<<<<<<<<<<<<<<<" or "KEIKO<<INOUE<<<<<<<<<<<<<<<<<<".
National ID
A national identity number with context. For example, "SSN 111-22-3333", "National Insurance Number AB 12 34 56 C", "Code INSEE 187090100100141", or "ImmiCard AMS123456". NOTE: Possible Turkish national identity numbers are identified without context. Each country has its own format.
Passport Numbers
A passport number with context. For example, "Passport number: 533324428", "Passport Number: P4366918", or "italian passaporti AA5275702".
Pension Number
A pension identification number with context. For example, "基本年金番号 1234567890". NOTE: Only Japanese pension numbers are included at this time.
Social Security Taxation ID
A tax identification number (TIN or ITIN) with context. For example, "ITIN: 911-92-3333" or "TIN-numre: 101111113". Each country has its own format.
Unemployment Insurance Number
An unemployment insurance number with context. For example, "基本年金番号 1234567890". NOTE: Only Japanese unemployment insurance numbers are included at this time.
VAT number
A value added tax identification number (VATIN) with context. For example, "NUIS: ALK99999999L" or "VAT Reg No GB 980 7806 84".
Voter ID
Voter ID numbers from any of the supported countries around the world including CJK countries. |
Identification Data |
Any personal data closely related to the identity of an individual such as name, date of birth, gender, salutation, title, and so on. This grammar includes information for multiple languages and countries. |
Date of Birth
A date of birth, written numerically or using words. For example, "date of birth 1/1/2018" or "GEBOORTEDATUM: 01/01/2018".
Genders
A gender or family relation in the English, French, or German language, either in a word or in context. For example, "lady", "father", "Dame", "voisines", "Frau", or "mensch".
Names
A full personal name, in title case or upper case. For example, "John Smith", "KEIKO NAKAMURA", or "山田恵". For CJKVT, a full personal name, in Romanized text or CJKVT native script. Romanized names can be in title case or upper case, and can be in the order given name surname or surname given name. CJKVT native script names must be surname given name. For Japanese, either form can include honorifics. |
Medical Data |
Any personal data related to medical information such as medical procedures or conditions. This grammar includes information for multiple languages. |
Medical Terms
Medical terms and information related to laboratory tests, diseases or conditions, generic or brand drug names, or specialties. The following grammar rules are currently included in English only.
Additional medical terms are included in supported languages.
US Social Security Disability
An impairment for the purpose of disability evaluation under social security in the US. For example "adrenal glands carcinoma". NOTE: Only includes English at this time. |
Nationalities |
Any nationality. This grammar matches nationalities written in English or French, such as "French" or "Francais". |
Nationalities
Any combination of nationality adjective and noun landmark and value, with context. For example, "Country: British", or "Nationality: British". |
Other Sensitive Data |
Any personal information that defines the racial or ethnic origin of an individual. This grammar matches racial or ethnic origin written in English or French, such as "caucasian" or "caucasien". |
Racial Ethnic Origin
A reference to ethnicity or race identification. For example, "White", "Fijian", "Inuit", or "Irish". United Kingdom identity code. For example, IC1, IC2. Ethnic groups in the French language. For example, "Africain" or "Autres". |
IMPORTANT: To ensure accuracy, test custom grammar rules against a small sample dataset before applying to full datasets.
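One way to follow this advice before entering a pattern in Fusion is to run it over a few hand-labeled lines. The sketch below is only an illustration: it uses Python's `re` module as a stand-in for the grammar engine (an assumption, since the IDOL-derived syntax may differ), and the pattern, sample lines, and the `evaluate` helper are all hypothetical.

```python
import re

# Hypothetical custom rule pattern: two letters, a hyphen, six digits.
PATTERN = re.compile(r"\b[A-Za-z]{2}-\d{6}\b")

# Small hand-labeled sample: (text, whether the rule should match it).
SAMPLES = [
    ("Customer AB-123456 called today.", True),
    ("Order ref ZX-000042 shipped.", True),
    ("Phone: 555-123456", False),
    ("No identifier in this line.", False),
]

def evaluate(pattern, samples):
    """Return the samples where the pattern's result disagrees with the label."""
    mismatches = []
    for text, expected in samples:
        found = pattern.search(text) is not None
        if found != expected:
            mismatches.append((text, expected, found))
    return mismatches

if __name__ == "__main__":
    for text, expected, found in evaluate(PATTERN, SAMPLES):
        print(f"MISMATCH: {text!r} expected={expected} found={found}")
```

An empty result means the pattern agrees with every label in the sample. Any mismatch points at a pattern that is too broad or too narrow.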
To create a custom grammar class:
- From the primary navigation panel, click Grammars > Manage Grammars.
  The Manage Grammars page opens.
- Click NEW GRAMMAR CLASS.
  The New Grammar Class dialog opens.
- Complete the details for the new grammar class.
Option | Description |
---|---|
Grammar Class Name | Type a meaningful, unique name for the grammar class. Limits: Maximum 50 characters. |
Description | Type a meaningful description for the new grammar class. Limits: Maximum 250 characters. |
Case Sensitivity | Select whether to enforce case sensitivity. For example, if an entity within this grammar class is for a specific name, John Doe, select On to match only "John Doe", or select Off to match "John Doe", "john doe", "jOHn doE", and any other combination of lower and upper case characters, as long as the spelling of the name is identical. NOTE: By default, grammar rules within this grammar class inherit this value. You can choose not to inherit this value at the grammar rule level. |
- Click SAVE.
  The custom grammar class is created.
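The effect of the Case Sensitivity setting, and of inheriting it at the grammar rule level, can be sketched in Python. This is an illustration only: it assumes that On behaves like a case-sensitive match and Off like a case-insensitive one, and the function names `matches` and `effective_case_sensitivity` are hypothetical, not part of Fusion.

```python
import re

def matches(term: str, text: str, case_sensitive: bool) -> bool:
    """Return True if `term` occurs in `text` under the given case setting."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(re.escape(term), text, flags) is not None

def effective_case_sensitivity(rule_setting: str, class_setting: bool) -> bool:
    """Resolve a rule-level setting of 'inherit', 'on', or 'off'."""
    if rule_setting == "inherit":
        return class_setting  # fall back to the parent grammar class
    return rule_setting == "on"

# Case Sensitivity On: only the exact casing matches.
print(matches("John Doe", "Signed by John Doe", case_sensitive=True))
print(matches("John Doe", "Signed by jOHn doE", case_sensitive=True))

# Case Sensitivity Off: any casing matches, but the spelling must be identical.
print(matches("John Doe", "Signed by jOHn doE", case_sensitive=False))
print(matches("John Doe", "Signed by Jon Doe", case_sensitive=False))
```

In this sketch, a rule set to "inherit" simply takes whatever the class uses, which mirrors the default described in the NOTE.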
To edit a custom grammar class:
- On the Manage Grammars page, click or hover over the row for the desired custom grammar class.
  Do one of the following:
  - Click the edit icon that displays in the right column.
  - Click the row for the desired custom grammar class, click the open detail panel icon, and then click EDIT.
  The Edit Grammar Class dialog opens.
- Make the necessary changes and then click SAVE.
  The edits to the custom grammar class are saved.
To delete a custom grammar class:
- On the Manage Grammars page, click or hover over the row for the desired custom grammar class.
  Do one of the following:
  - Click the delete icon that displays in the right column.
  - Click the row for the desired custom grammar class, click the open detail panel icon, and then click DELETE.
- In the confirmation dialog, click YES to confirm the action.
  The custom grammar class is deleted.
Grammar rules
A grammar rule represents a word, phrase, or block of information to be identified and is based on a regular expression pattern. In addition to the built-in grammar rules that are included in the built-in grammar types and classes, you can create custom grammar rules to identify information that might be specific to your environment. Grammar rules can be defined by a specific pattern or by terms.
As an example of using a pattern, suppose you identify your customers with a customer ID that follows a specific syntax, or pattern. Over time, your customer ID pattern has changed in length to accommodate growth. You create a custom grammar class, "Customer IDs", to identify managed items that reference the customer IDs. You then create one or more custom grammar rules within the custom grammar class that define these patterns. You can create multiple custom grammar rules, one for each customer ID pattern you want to identify, or create a single custom grammar rule that includes multiple patterns. Consider the following:
- If you need to discriminate between the different customer ID patterns, create multiple custom grammar rules, one for each of your customer ID patterns. The grammar rules display separately in the dashboards, and you will need to include all grammar rules and "OR" them in searches to get results for both patterns.
- If you do not need to discriminate between the different customer ID patterns (you just care about identifying customer IDs, no matter the pattern), create a single custom grammar rule that includes multiple patterns, one for each of your customer ID patterns. All ID patterns display together as a single grammar rule in the dashboards. In searches, you will need to include the single grammar rule.
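The trade-off between the two approaches can be sketched in Python. The rule names and both customer ID patterns below are hypothetical, and the dictionaries are only a stand-in for how matches might be grouped per grammar rule in dashboards and searches.

```python
import re
from collections import Counter

# Hypothetical customer ID formats: an older 4-digit form and a newer 6-digit form.
OLD_ID = r"\b[A-Z]{2}-\d{4}\b"
NEW_ID = r"\b[A-Z]{2}-\d{6}\b"

# Option 1: one rule per pattern, so matches are reported separately.
separate_rules = {
    "Customer ID (old)": [OLD_ID],
    "Customer ID (new)": [NEW_ID],
}

# Option 2: one rule holding both patterns, so matches are reported together.
combined_rule = {"Customer ID": [OLD_ID, NEW_ID]}

def count_matches(rules, text):
    """Count matches per rule name across all of that rule's patterns."""
    counts = Counter()
    for name, patterns in rules.items():
        for pattern in patterns:
            counts[name] += len(re.findall(pattern, text))
    return dict(counts)

text = "Accounts AB-1234 and CD-567890 were merged into EF-000001."
print(count_matches(separate_rules, text))
print(count_matches(combined_rule, text))
```

With separate rules, the old and new formats are counted under their own names. With the combined rule, all three IDs fall under a single name and can no longer be told apart.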
TIP: Grammar rule patterns are defined using standard regular expressions. Grammars in Fusion are derived from OpenText IDOL. For more information about the regular expression syntax for grammar rules, see Grammar rule syntax.
If you define the grammar rule using terms, you can manually enter the desired terms or load the terms from an existing term list.
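As a rough illustration of term-based matching, the following Python sketch treats each term as a whole-word keyword. The terms and the helper name `find_terms` are hypothetical, and case-insensitive whole-word matching is an assumption made for the sketch rather than documented Fusion behavior.

```python
import re

# Hypothetical terms, for example typed one per line or loaded from a term list.
TERMS = ["confidential", "restricted", "embargoed"]

def find_terms(terms, text):
    """Return the terms that occur in `text` as whole words, ignoring case."""
    found = []
    for term in terms:
        if re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE):
            found.append(term)
    return found

print(find_terms(TERMS, "This CONFIDENTIAL memo is embargoed until Friday."))
print(find_terms(TERMS, "Access is unrestricted."))
```

The second call finds nothing because "restricted" appears only inside the longer word "unrestricted".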
When managing grammar rules, keep the following in mind.
- You cannot edit the name or details of built-in grammar rules.
- You can add a custom grammar rule to a built-in grammar class.
- To view additional information about a grammar rule, click in the row for the grammar rule and then click the open detail panel icon. Details about the grammar rule display on the GENERAL tab.
- To configure data masking for grammar rules, see Data masking.
To create a custom grammar rule:
- From the primary navigation panel, click Grammars > Manage Grammars.
  The Manage Grammars page opens.
- Click in the row for the grammar class to which you want to add a custom grammar rule.
  Do one of the following:
  - Click NEW GRAMMAR RULE.
  - Click the add grammar rule icon displayed in the right column for the selected grammar class.
  - Click the open detail panel icon and then click NEW GRAMMAR RULE.
  The New Grammar Rule dialog opens.
- Complete the options for the new custom grammar rule.
Option | Description |
---|---|
Grammar Rule Name | Type a meaningful, unique name for the new grammar rule. Limits: Maximum 50 characters. |
Description | Type a meaningful description for the new grammar rule. Limits: Maximum 250 characters. |
Case Sensitivity | Select whether to enforce case sensitivity. For example, if a grammar rule within this grammar class is for a specific name, John Doe, select Inherit from grammar class (default) to match the selection made for the parent grammar class, select On to match only "John Doe", or select Off to match "John Doe", "john doe", "jOHn doE", and any other combination of lower and upper case characters, as long as the spelling of the name is identical. |
Tangible Characters | Type any punctuation characters to treat as part of the word, rather than as word boundaries within the regular expression pattern. |
Languages | Type the names of the languages this grammar rule can match against. As you type, languages that match what you have typed display. Click the desired language to add it to the grammar rule. Example: This grammar rule identifies months of the year, in English, German, or French. |
Countries | Select the country that this grammar rule must match. You can also type the name of the country. As you type, countries that match what you have typed display. Click the desired country to add it to the grammar rule. TIP: If you plan to define a geographic region, you do not need to identify countries within the defined region. Example: This grammar rule identifies telephone numbers, specific to the pattern of the United Kingdom. |
Regions | Select the geographical region this grammar rule must match. You can also type the name of the region. As you type, regions that match what you have typed display. Click the desired region to add it to the grammar rule. The geographic regions and their included countries are listed below. Example: This grammar rule identifies driver's license numbers, specific to the pattern of the Americas (AMER). |

Geographic regions and included countries:
- Americas (AMERICAS): Argentina, Bolivia, Brazil, British Virgin Islands, Canada, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Paraguay, Peru, Saint Lucia, United States, Uruguay, Venezuela
- Asia-Pacific (APAC): Australia, India, Cambodia, Indonesia, Malaysia, New Zealand, Pakistan, Philippines, Sri Lanka, Timor-Leste
- Asia-Pacific [APAC (Including CJK)]: Australia, Cambodia, China, Hong Kong, India, Indonesia, Japan, Macao, Malaysia, New Zealand, Pakistan, Philippines, Singapore, South Korea, Sri Lanka, Taiwan, Thailand, Timor-Leste, Vietnam
- European Economic Area (EEA): Austria, Belgium, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Liechtenstein, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden
- Europe, Middle East, Africa (EMEA): Albania, Andorra, Austria, Azerbaijan, Bahrain, Belarus, Belgium, Bosnia and Herzegovina, Bulgaria, Croatia, Cyprus, Czechia, Denmark, Egypt, Estonia, Faroe Islands, Finland, France, Georgia, Germany, Gibraltar, Greece, Greenland, Hungary, Iceland, Iran, Iraq, Ireland, Israel, Italy, Jordan, Kazakhstan, Kosovo, Kuwait, Latvia, Lebanon, Libya, Liechtenstein, Lithuania, Luxembourg, Macedonia, Malta, Mauritania, Mauritius, Moldova, Monaco, Montenegro, Netherlands, Nigeria, North Macedonia, Norway, Palestine, Poland, Portugal, Qatar, Romania, Russia, San Marino, Sao Tome and Principe, Saudi Arabia, Serbia, Seychelles, Slovakia, Slovenia, South Africa, Spain, Sudan, Sweden, Switzerland, Tunisia, Turkey, Ukraine, United Arab Emirates, United Kingdom, Uzbekistan, Vatican City
- Worldwide: all countries except those in the CJK region
- Worldwide (including CJK): all countries
NOTE: CJK includes China, Hong Kong, Japan, Macao, Singapore, South Korea, Taiwan, Thailand, and Vietnam.
- Select whether this grammar rule is based on a pattern or terms.
  Define, test, and add patterns.
  - Select Enter Pattern(s).
  - In the pattern text box, type the pattern for the data you want to identify using regular expressions.
    For the customer ID example, to identify a customer ID that includes two letters followed by a hyphen and then six numbers, type [A-Za-z]{2}-\d{6}. (\D matches any non-digit character, not only letters, so \D{2} would also accept punctuation or spaces.)
  - In the Sample Text box, type sample text that the pattern should match.
    For the customer ID example, type AB-123456.
  - Click TEST.
    - If the sample text is matched by the defined pattern, the sample text is highlighted in yellow.
    - If the sample text does not match the defined pattern, you see a message above the Sample Text box that there was not a match to the pattern.
  - When you are satisfied with the defined pattern, click ADD.
    The pattern moves to the Patterns box.
    - To retest a single pattern added to the Patterns box, click the row for the desired pattern and click EDIT. The pattern moves back to the pattern text box.
    - To remove a pattern added to the Patterns box, click the row for the desired pattern and click DELETE. The pattern is removed completely.
  - Repeat to add more patterns to this grammar rule.
  Define, test, and add terms.
  - Select Enter Term(s).
  - Do one of the following.
    - In the term text box, type the terms you want to identify, one term per line (press Enter on your keyboard to enter another line).
    - Click Add from Term List.
      In the Add from Term List dialog, select the desired term list to use and then click OK.
      All terms in the selected term list are added to the term text box. Delete any terms you do not want to include.
      NOTE: Terms from term lists are treated as keywords when used as criteria for a grammar rule. Define individual words instead of phrases.
  - In the Sample Text box, type the information you want to identify.
  - Click TEST.
    - If the sample text is matched by the defined terms, the sample text is highlighted in yellow.
    - If the sample text does not match the defined terms, you see a message above the Sample Text box that there was not a match to the pattern.
  - When you are satisfied with the defined terms, click ADD.
    All terms in the term text box move to the Terms box.
    - To retest a single term added to the Terms box, click the row for the desired term and click EDIT. The term moves back to the term text box.
    - To remove a term added to the Terms box, click the row for the desired term and click DELETE. The term is removed completely.
  - Repeat to add more terms to this grammar rule.
- Click SAVE.
  The new custom grammar rule is created within the selected grammar class.
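The TEST step in the pattern procedure above can be approximated outside Fusion. The sketch below uses Python's `re` module as a stand-in for the grammar engine (an assumption, since the IDOL-derived syntax may differ in details) and checks sample text against the customer ID format of two letters, a hyphen, and six digits. It also compares a letter class with `\D`, which in standard regular expressions matches any non-digit character. The helper name `check_sample` is hypothetical.

```python
import re

# Two candidate patterns for "two letters, a hyphen, six digits".
LETTERS = re.compile(r"[A-Za-z]{2}-\d{6}")   # letters only
NON_DIGITS = re.compile(r"\D{2}-\d{6}")      # any two non-digit characters

def check_sample(pattern, sample):
    """Return the matched substring, or None if the sample does not match."""
    match = pattern.search(sample)
    return match.group(0) if match else None

print(check_sample(LETTERS, "AB-123456"))      # matches the whole ID
print(check_sample(LETTERS, "#!-123456"))      # no match: "#!" are not letters
print(check_sample(NON_DIGITS, "#!-123456"))   # matches, because \D accepts "#!"
```

Testing a sample that should not match, as well as one that should, is a quick way to catch a pattern that is broader than intended.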
To edit a custom grammar rule:
- On the Manage Grammars page, click the down arrow to the left of the grammar class to which the custom grammar rule belongs.
  The grammar class expands and the list of associated grammar rules displays.
- Click in the row for the custom grammar rule you want to edit.
  Do one of the following:
  - Click the edit icon that displays in the right column.
  - Click the open detail panel icon, and then click EDIT on the General tab.
  The Edit Grammar Rule dialog opens.
- Make the desired changes.
  To edit the patterns in this custom grammar rule:
  - To add another pattern, type a regular expression pattern in the pattern text box and then click ADD.
  - To edit an existing pattern, click the row for the desired pattern in the Patterns box and then click EDIT.
    The selected pattern moves to the pattern text box. Edit the pattern and then click ADD to move it back to the Patterns box.
  - To delete an existing pattern, click the row for the desired pattern in the Patterns box and then click DELETE.
    In the confirmation dialog, click YES to confirm the action.
  To edit the terms in this custom grammar rule:
  - To add another term, type the term in the term text box and then click ADD.
  - To edit an existing term, click the row for the desired term in the Terms box and then click EDIT.
    The selected term moves to the term text box. Edit the term and then click ADD to move it back to the Terms box.
  - To delete an existing term, click the row for the desired term in the Terms box and then click DELETE.
    In the confirmation dialog, click YES to confirm the action.
- Click SAVE when done.
  The custom grammar rule is updated.
To delete a custom grammar rule:
- On the Manage Grammars page, click or hover over the row of the custom grammar rule you want to delete.
  Do one of the following:
  - Click the delete icon that displays in the right column.
  - Click the open detail panel icon, and then click DELETE on the General tab.
- In the confirmation dialog, click YES to confirm the action.
  The selected custom grammar rule is deleted.