A grammar file defines one or more entities that you want to extract.
Standard grammars. Eduction includes a collection of grammar files covering common entities such as names, social security numbers, postal addresses, telephone numbers, and so on. For a complete list of standard grammars, see Standard Grammars.
Standard grammar files are licensed by category and by language, so that you can have a license for any combination of category (for example, sentiment, place, or person) and language.
User grammars. You can extend the capabilities of Eduction by writing your own grammar files, either from scratch or by referencing existing entities.
NOTE: To reference the standard grammars in your own grammar files, you must have an appropriate license.
Grammar files are created in XML format, and can be compiled into the proprietary ECR format. Compiling a grammar file into the ECR format makes it much faster to load at runtime.
Most of the standard grammar files are available only in ECR format. However, the Eduction package also includes several XML source grammars so that you can easily extend the standard grammars (see Standard Grammar – Source). You can compile these source grammars, as well as your own user grammars, by using the edktool command-line tool.
NOTE: Eduction can also use XML grammar files directly (that is, without compiling them to ECR files). However, in most cases Micro Focus recommends that you compile your grammars to improve performance.
There are two main ways to define entities:
You can define entities recursively, and rules can refer to entities in other grammar files. This allows you to create more complicated entities that match data such as URLs or postal addresses.
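The following Python sketch illustrates the idea of recursive, cross-grammar entity references. It is only an analogy: it does not use Eduction's XML grammar format or any Eduction API. The grammar names (`internet`, `common`), the entity names, and the `{grammar/entity}` reference syntax are all hypothetical. Each entity is a regular expression that can embed other entities, including entities from another grammar. The references are expanded recursively, and circular references are rejected before the final pattern is compiled.

```python
import re

# Hypothetical grammars: each maps an entity name to a regex pattern.
# References to other entities use a made-up "{grammar/entity}" syntax;
# this is NOT Eduction's XML grammar format.
GRAMMARS = {
    "internet": {
        "scheme": r"https?",
        "label": r"[a-z0-9-]+",
        # "host" is built recursively from the "label" entity.
        "host": r"{internet/label}(?:\.{internet/label})+",
        # "url" refers to entities in its own grammar and in "common".
        "url": r"{internet/scheme}://{internet/host}(?:/{common/path})?",
    },
    "common": {
        "path": r"[\w./-]*",
    },
}

# Matches a reference such as {internet/label}.
REF = re.compile(r"\{(\w+)/(\w+)\}")


def expand(grammar: str, entity: str, stack: tuple = ()) -> str:
    """Recursively replace entity references with their patterns."""
    key = (grammar, entity)
    if key in stack:
        raise ValueError(f"circular reference: {grammar}/{entity}")
    pattern = GRAMMARS[grammar][entity]

    def substitute(m: re.Match) -> str:
        # Wrap each expanded reference in a non-capturing group so that
        # quantifiers and alternations in the referenced pattern stay local.
        return "(?:" + expand(m.group(1), m.group(2), stack + (key,)) + ")"

    # A function replacement is inserted literally, so backslashes survive.
    return REF.sub(substitute, pattern)


def compile_entity(grammar: str, entity: str) -> re.Pattern:
    """Expand all references, then compile the flat pattern once."""
    return re.compile(expand(grammar, entity))


url = compile_entity("internet", "url")
print(url.findall("See https://example.com/docs/index.html or http://a.b."))
# → ['https://example.com/docs/index.html', 'http://a.b']
```

Expanding every reference before calling `re.compile` loosely parallels the benefit of compiling grammars ahead of time: the reference resolution happens once rather than on every match.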