Grammars

The set of rules that describe a type of entity is known as a grammar. Some grammars contain a dictionary of terms. Others contain an expression that defines the pattern that the type of entity follows.
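As a rough illustration of the two kinds of grammar, the Python sketch below models a dictionary grammar as a set of terms and a pattern grammar as a regular expression. It is not the product's grammar format or API. The names COLOR_TERMS, ORDER_ID_PATTERN, and extract are hypothetical and exist only for this example.

```python
import re

# Hypothetical dictionary-style grammar: a fixed list of terms.
COLOR_TERMS = {"red", "green", "blue"}

# Hypothetical pattern-style grammar: an expression describing the entity's shape.
ORDER_ID_PATTERN = re.compile(r"\bORD-\d{4}\b")


def extract(text):
    """Return (entity_type, matched_text) pairs found in text."""
    matches = []
    # Dictionary lookup: compare each word against the known terms.
    for word in re.findall(r"[A-Za-z]+", text):
        if word.lower() in COLOR_TERMS:
            matches.append(("color", word))
    # Pattern match: find every substring that follows the expression.
    for m in ORDER_ID_PATTERN.finditer(text):
        matches.append(("order_id", m.group(0)))
    return matches


print(extract("Order ORD-1234 is for a Blue widget."))
```

A dictionary suits entities that can be enumerated, such as place names. A pattern suits entities with a predictable shape, such as phone numbers or dates.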

Knowledge Discovery provides a large number of grammars as standard, which you can use to extract common entities. You can also add custom items to the available grammars, or add a custom grammar of your own.

Standard Grammars

Named Entity Recognition provides a large set of standard grammar files that you can use to extract entities such as:

  • place names

  • person names

  • company names

  • legal terms

  • credit card numbers

  • social security numbers

  • phone numbers

  • addresses

  • dates

  • times

  • Internet addresses

  • weights and measurements

  • protected security information

The standard grammars cover all major languages and geographical locations. You can easily set up Named Entity Recognition to extract the information supported by the standard grammars.

Premium Grammars

In addition to the standard grammars, Named Entity Recognition includes several premium grammar sets. Premium grammars provide curated and maintained grammars for the extraction of entities for privacy, auditing and security applications.

The following premium grammar sets are available: 

  • PII Grammars (Personally Identifiable Information). Identify personal information as mandated by data protection legislation.

  • PHI Grammars (Protected Health Information). Identify personal information used within the healthcare industry.

  • PCI Grammars (Payment Card Industry). Identify information related to financial payment cards, such as credit cards.

  • Government Grammars. Identify governmental document markings.

Custom Grammars

You can create your own grammar files to define the entities that you want to extract. Named Entity Recognition grammar files use standard XML format with a simple document type definition (DTD). You can define entities by using UNIX-like regular expressions, together with operations and extensions that are specific to Named Entity Recognition.

You can build up complex entities by referencing existing entities. The standard grammar files therefore offer a large collection of resources that you can build on to meet your own needs. You can extend and replace the entities in standard grammar files with your own entities.
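The idea of building complex entities from existing ones can be sketched in Python as follows. This is a conceptual analogy only, not the grammar file syntax: BASE_ENTITIES, build, and the entity names are hypothetical, and the patterns stand in for entities that a standard grammar might already define.

```python
import re

# Hypothetical building-block entities, standing in for entities that a
# standard grammar might already define.
BASE_ENTITIES = {
    "day": r"(?:0?[1-9]|[12][0-9]|3[01])",
    "month": r"(?:0?[1-9]|1[0-2])",
    "year": r"(?:19|20)\d{2}",
}


def build(template, entities):
    """Expand {name} references in template using the entity patterns."""
    return template.format(**entities)


# A custom entity defined by referencing the existing ones.
entities = dict(BASE_ENTITIES)
entities["date"] = build(r"{day}/{month}/{year}", entities)

# A more complex entity that in turn references the custom "date" entity.
entities["date_range"] = build(r"{date}\s*-\s*{date}", entities)

date_range = re.compile(entities["date_range"])
print(date_range.findall("Closed 24/12/2023 - 02/01/2024 for holidays."))
```

Because each entity is defined once and referenced by name, a change to a building block such as "year" carries through to every entity that is rebuilt from it.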