Built-in grammars and entities
File Analysis Suite includes grammars and entities derived from Micro Focus IDOL Eduction. The following table is provided as a reference. It describes the types of information that the grammars and entities built into File Analysis Suite support.
Grammar | Grammar description | Entity information type |
---|---|---|
| Contact Data | Any information that can be used to contact an individual, such as postal addresses, phone numbers, and email addresses. This grammar includes information for multiple languages and countries. | **Addresses**: A postal address. By default, this entity returns addresses in a normalized format, which standardizes apartment and house numbers, removes additional punctuation, and converts the text to uppercase. For example, "ABIDEI HURRIYET CD TANER PALAS APT 9" or "KAT:7, D:9, 34437 ISTANBUL". The exact order of the address components depends on the country. For CJKVT, addresses are also returned in a normalized format that standardizes apartment and house numbers and removes additional punctuation, but only Romanized text is converted to uppercase. CJKVT native script is not normalized to ASCII, and Romanized text is not normalized to CJKVT native script. |
| | | **Email Addresses**: An email address, with or without a mailto: prefix. For example, "jsmith@mailserver.com" or "mailto:jsmith@mailserver.com". |
| | | **Phone Numbers**: A telephone number with context. For example, "Tel: +44 1234 224050", "Telephone: (204)-243-9955", or "numéro de téléphone: +1-902-861-7000". For CJKVT, numbers can use ASCII or full-width digits. |
| Devices and Vehicles | Any information that can identify electronic devices, such as IP and MAC addresses, or identify vehicles by their vehicle identification number (VIN). | **Device ID**: An identification number for a computing device, such as a computer, tablet, or smartphone. The following device IDs are included. |
| | | **VIN Number**: A vehicle identification number (VIN) without context. For example, "JH4DB1550MS003978". |
| Financial Data | Any personal data related to finances, such as bank accounts, IBANs, salary information, and so on. This grammar includes information for multiple languages and countries. | **Bank Account Numbers**: A bank account number. The following bank account patterns are included. |
| | | **Bank Details**: The name of a bank. Major bank names for the following countries are included. |
| | | **Credit Card Numbers**: Any credit card number. The following credit card formats are included. |
| | | **IBAN (International Bank Account Number)**: An undelimited or space-delimited International Bank Account Number (IBAN) for each supported country. For more information on IBAN formatting requirements for each country, see https://www.iban.com/structure.html. |
| | | **Sort Codes**: A bank sort code. The following sort code formats are included. |
| Government ID | Government-issued identification information, such as driving licenses, passports, social security numbers, and so on. This grammar includes information for multiple languages and countries. | **Driving License Numbers**: A driving license number with context. For example, "australian automobile association: 103 805 501" or "driver's license: A234567890". This entity matches both the driving license number and, if present, the personal number or driver number. On the standard European driving license, these are fields 5 and 4d respectively. |
| | | **EHIC Number**: A European Health Insurance Card (EHIC) number with context. For example, "EHIC: UK 1234 5678" or "TSE: 123456789012". |
| | | **Healthcare Number**: A healthcare identification number with context. Each country has its own format, such as the following examples. |
| | | **Machine Readable Passport**: A machine-readable passport number or TD1-size travel document number. For example, "P<GBRUK<SPECIMEN<<ANGELA<ZOE<<<<<<<<<<<<<<<< 5333244280GBR8812049F2509286<<<<<<<<<<<<<<00" or "IDD<<T220001293<<<<<<<<<<<<<<< 6408125<2010315D<<<<<<<<<<<<<4 MUSTERMANN<<ERIKA<<<<<<<<<<<<<". CJKVT machine-readable passport and TD1-size travel document numbers are also matched. For example, "P<JPN<<<<<<<KEIKO<INOUE<<<<<<<<<<<<<<<<<<<<<" or "KEIKO<<INOUE<<<<<<<<<<<<<<<<<<". |
| | | **National ID**: A national identity number with context. For example, "SSN 111-22-3333", "National Insurance Number AB 12 34 56 C", "Code INSEE 187090100100141", or "ImmiCard AMS123456". Each country has its own format. NOTE: Possible Turkish national identity numbers are identified without context. |
| | | **Passport Numbers**: A passport number with context. For example, "Passport number: 533324428", "Passport Number: P4366918", or "italian passaporti AA5275702". |
| | | **Pension Number**: A pension identification number with context. For example, "基本年金番号 1234567890". NOTE: Only Japanese pension numbers are included at this time. |
| | | **Social Security Taxation ID**: A tax identification number (TIN or ITIN) with context. For example, "ITIN: 911-92-3333" or "TIN-numre: 101111113". Each country has its own format. |
| | | **Unemployment Insurance Number**: An unemployment insurance number with context. NOTE: Only Japanese unemployment insurance numbers are included at this time. |
| | | **VAT number**: A value added tax identification number (VATIN) with context. For example, "NUIS: ALK99999999L" or "VAT Reg No GB 980 7806 84". |
| Identification Data | Any personal data closely related to the identity of an individual, such as name, date of birth, gender, salutation, title, and so on. This grammar includes information for multiple languages and countries. | **Date of Birth**: A date of birth, written numerically or in words. For example, "date of birth 1/1/2018" or "GEBOORTEDATUM: 01/01/2018". |
| | | **Genders**: A gender or family relation in English, French, or German, either as a single word or in context. For example, "lady", "father", "Dame", "voisines", "Frau", or "mensch". |
| | | **Names**: A full personal name, in title case or uppercase. For example, "John Smith", "KEIKO NAKAMURA", or "山田恵". For CJKVT, a full personal name can be in Romanized text or in CJKVT native script. Romanized names can be in title case or uppercase, and in either given name, surname order or surname, given name order. Names in CJKVT native script must be in surname, given name order. For Japanese, either form can include honorifics. |
| Medical Data | Any personal data related to medical information, such as medical procedures or conditions. This grammar includes information for multiple languages. | **Medical Terms**: Medical terms and information related to laboratory tests, diseases or conditions, generic or brand drug names, or specialties. The following entity types are currently included in English only. Additional medical terms are included for the other supported languages. |
| | | **US Social Security Disability**: An impairment for the purpose of disability evaluation under Social Security in the US. For example, "adrenal glands carcinoma". NOTE: Only English is included at this time. |
| Nationalities | Any nationality. This grammar matches nationalities written in English or French, such as "French" or "Francais". | **Nationalities**: A nationality adjective or noun with context, that is, a nationality value preceded by a landmark such as "Country" or "Nationality". For example, "Country: British" or "Nationality: British". |
| Sensitive Data | Any personal information that indicates the racial or ethnic origin of an individual. This grammar matches racial or ethnic origin written in English or French, such as "caucasian" or "caucasien". | **Racial Ethnic Origin**: A reference to ethnicity or race identification. For example, "White", "Fijian", "Inuit", or "Irish". This entity also matches United Kingdom identity codes, such as IC1 or IC2, and ethnic groups in French, such as "Africain" or "Autres". |
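Many entities in the table are matched "with context", meaning that a landmark such as "SSN" or "Tel:" must appear next to the value, while others, such as VIN Number, are matched without context. File Analysis Suite does not document how Eduction implements this, so the following Python sketch only illustrates the idea with a regular expression. The landmark list, the pattern, and the function name `find_ssns_with_context` are hypothetical, and the pattern covers only the US SSN layout used in the National ID example.

```python
import re

# Hypothetical landmark-plus-value pattern: "SSN" or "Social Security Number",
# an optional colon, then a value in the US NNN-NN-NNNN layout.
SSN_WITH_CONTEXT = re.compile(
    r"\b(?:SSN|Social Security Number)\s*:?\s*(\d{3}-\d{2}-\d{4})\b",
    re.IGNORECASE,
)


def find_ssns_with_context(text: str) -> list[str]:
    """Return SSN-shaped values that are directly preceded by a landmark."""
    return SSN_WITH_CONTEXT.findall(text)


print(find_ssns_with_context("Employee SSN 111-22-3333 on file"))  # ['111-22-3333']
print(find_ssns_with_context("Order 111-22-3333 shipped"))  # [] (no landmark)
```

Requiring the landmark trades recall for precision: a bare nine-digit string is ambiguous, but the same digits after "SSN" are very likely an identifier.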
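The VIN Number entity is matched without context, so any detector has to rely on the value itself. One well-known property that can help is the VIN check digit in position 9, computed from a weighted sum of transliterated characters modulo 11. The sketch below assumes that rule; the table does not say whether File Analysis Suite applies it. The check digit is mandatory for North American vehicles but not required everywhere, so a failed check does not prove that a string is not a VIN. The helper name `vin_check_digit_ok` is hypothetical.

```python
# Transliteration values for VIN characters (I, O and Q never appear in a VIN).
VIN_VALUES = {str(d): d for d in range(10)}
VIN_VALUES.update({
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
})
# Positional weights; position 9 (the check digit itself) has weight 0.
VIN_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)


def vin_check_digit_ok(vin: str) -> bool:
    """Return True if position 9 of a 17-character VIN matches its check digit."""
    vin = vin.upper()
    if len(vin) != 17 or any(ch not in VIN_VALUES for ch in vin):
        return False
    total = sum(VIN_VALUES[ch] * w for ch, w in zip(vin, VIN_WEIGHTS))
    remainder = total % 11
    # A remainder of 10 is written as the letter X.
    expected = "X" if remainder == 10 else str(remainder)
    return vin[8] == expected


print(vin_check_digit_ok("JH4DB1550MS003978"))  # True
print(vin_check_digit_ok("1M8GDM9AXKP042788"))  # True (check digit is X)
print(vin_check_digit_ok("JH4DB1550MS003979"))  # False
```

The table's example VIN, "JH4DB1550MS003978", passes this check.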
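The Credit Card Numbers entity does not say how candidate numbers are validated. Payment card numbers conventionally end in a Luhn (mod 10) check digit, so a check like the following could filter out random digit strings. Whether the built-in grammar uses the Luhn check is an assumption, and `luhn_is_valid` is a hypothetical name.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if a card-number string passes the Luhn (mod 10) check."""
    # Allow the common space or hyphen separators, nothing else.
    compact = number.replace(" ", "").replace("-", "")
    if len(compact) < 2 or not (compact.isascii() and compact.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, ch in enumerate(reversed(compact)):
        digit = int(ch)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("4111 1111 1111 1111"))  # True
print(luhn_is_valid("4111 1111 1111 1112"))  # False
```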
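The IBAN entity accepts undelimited or space-delimited IBANs. Every IBAN carries two check digits defined by ISO 13616, using the ISO 7064 MOD 97-10 scheme, which the following sketch verifies. It checks only the checksum, not the per-country length and layout listed at https://www.iban.com/structure.html, and it is not the product's implementation; `iban_is_valid` is a hypothetical name.

```python
import string

ALLOWED = set(string.ascii_uppercase + string.digits)


def iban_is_valid(iban: str) -> bool:
    """Verify the ISO 7064 MOD 97-10 check digits of an IBAN (checksum only)."""
    # Accept both undelimited and space-delimited input.
    compact = iban.replace(" ", "").upper()
    if len(compact) < 5 or not set(compact) <= ALLOWED:
        return False
    # Two-letter country code followed by two check digits.
    if not (compact[:2].isalpha() and compact[2:4].isdigit()):
        return False
    # Move the first four characters to the end, then map A=10 ... Z=35.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))  # True
print(iban_is_valid("GB82 WEST 1234 5698 7654 33"))  # False
```

Python integers have arbitrary precision, so the whole numeric string can be reduced modulo 97 in one step; implementations in languages with fixed-width integers usually reduce it piecewise instead.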
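The Machine Readable Passport entity matches passport machine-readable zones (MRZs). ICAO Doc 9303 protects several MRZ fields with check digits computed with the repeating weights 7, 3, 1, where digits keep their value, letters A to Z count as 10 to 35, and the filler `<` counts as 0. The following sketch applies that rule to the second line of a 44-character passport (TD3) MRZ, assuming the standard field positions. It is an illustration only, not how File Analysis Suite matches MRZs, and the function names are hypothetical.

```python
WEIGHTS = (7, 3, 1)


def mrz_value(ch: str) -> int:
    """Map an MRZ character to its check-digit value."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def mrz_check_digit(field: str) -> int:
    """Weighted sum with repeating weights 7, 3, 1, modulo 10."""
    return sum(mrz_value(ch) * WEIGHTS[i % 3] for i, ch in enumerate(field)) % 10


def td3_line2_is_valid(line: str) -> bool:
    """Check the five check digits on line 2 of a 44-character passport MRZ."""
    if len(line) != 44:
        return False
    checks = [
        (line[0:9], line[9]),  # document number
        (line[13:19], line[19]),  # date of birth (YYMMDD)
        (line[21:27], line[27]),  # date of expiry (YYMMDD)
        (line[28:42], line[42]),  # personal number
        (line[0:10] + line[13:20] + line[21:43], line[43]),  # composite
    ]
    try:
        # A filler "<" in a check-digit position is read as 0 in this sketch.
        return all(mrz_check_digit(field) == mrz_value(digit) for field, digit in checks)
    except ValueError:
        return False


specimen = "5333244280GBR8812049F2509286" + "<" * 14 + "00"
print(td3_line2_is_valid(specimen))  # True
```

The `specimen` string rebuilds the second line of the GBR passport example in the table; its document number, date of birth, date of expiry, personal number, and composite check digits all pass.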
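The National ID entry notes that possible Turkish national identity numbers are identified without context. Turkish identity numbers (T.C. Kimlik No) have 11 digits, do not start with 0, and end in two check digits, so a checksum like the following sketch could reduce false positives when no landmark is present. The table does not state whether File Analysis Suite validates these digits; treat `tr_identity_number_is_valid` as a hypothetical illustration.

```python
def tr_identity_number_is_valid(number: str) -> bool:
    """Check the structure and two check digits of an 11-digit Turkish ID number."""
    if len(number) != 11 or not (number.isascii() and number.isdigit()):
        return False
    if number[0] == "0":
        return False
    d = [int(ch) for ch in number]
    odd_positions = d[0] + d[2] + d[4] + d[6] + d[8]  # 1st, 3rd, 5th, 7th, 9th digits
    even_positions = d[1] + d[3] + d[5] + d[7]  # 2nd, 4th, 6th, 8th digits
    # 10th digit: (7 * odd-position sum - even-position sum) mod 10.
    if (odd_positions * 7 - even_positions) % 10 != d[9]:
        return False
    # 11th digit: sum of the first ten digits mod 10.
    return sum(d[:10]) % 10 == d[10]


print(tr_identity_number_is_valid("10000000146"))  # True
print(tr_identity_number_is_valid("10000000147"))  # False
```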