A grammar file defines one or more entities that you want to extract.
Standard grammars. Eduction includes a collection of grammar files covering common entities such as names, social security numbers, postal addresses, and telephone numbers. For a complete list of standard grammars, see Standard Grammars.
Standard grammar files are licensed by category and by language, so you can be licensed for any combination of category (for example, sentiment, place, or person) and language.
User grammars. You can extend the capabilities of Eduction by writing your own grammar files, either from scratch or by referencing existing entities.
To reference the standard grammars in your own grammar files, you must have an appropriate license.
Grammar files are created in XML format, and can be compiled into ECR format. Compiling a grammar file into the ECR format makes it much faster to load at run-time. Most of the standard grammar files are supplied only in ECR format.
Entities can be defined in several ways. You might define a dictionary of possible matches, for example to extract names of people or places. Alternatively, you might specify what a match looks like without having to list each possibility. The latter approach would be suitable for extracting dates and times, or telephone numbers, because these conform to a known pattern.
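The difference between the two approaches can be illustrated with a minimal Python sketch. This is not Eduction grammar syntax and does not use any Eduction API; the names `PLACE_NAMES`, `PHONE_PATTERN`, `find_places`, and `find_phones` are hypothetical, and the telephone format (NNN-NNN-NNNN) is an assumption chosen only for illustration.

```python
import re

# Dictionary-style entity: every acceptable match is listed explicitly.
# (Hypothetical list, for illustration only.)
PLACE_NAMES = {"London", "Paris", "New York"}

# Pattern-style entity: describes what a match looks like instead of
# listing each possibility. Assumed format: NNN-NNN-NNNN.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def find_places(text):
    """Return the dictionary entries found in text, in order of appearance."""
    hits = []
    for name in PLACE_NAMES:
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            hits.append((match.start(), name))
    return [name for _, name in sorted(hits)]


def find_phones(text):
    """Return every substring of text that has the assumed phone shape."""
    return PHONE_PATTERN.findall(text)


sample = "Call 555-123-4567 from Paris or 555-987-6543 from New York."
print(find_places(sample))
print(find_phones(sample))
```

The dictionary approach only finds values that someone has listed in advance, whereas the pattern approach finds any value with the right shape, including values nobody anticipated.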
Entities can be defined recursively, and rules can refer to entities in other grammar files. This allows you to create more complicated entities that match data such as URLs or postal addresses.
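As a rough illustration of building a complicated entity out of simpler ones, the following Python sketch composes a URL pattern from smaller named patterns. It is not Eduction grammar syntax: the `ENTITIES` table, the `{name}` reference notation, and the `expand` function are all hypothetical, and the URL pattern is deliberately simplified. The sketch resolves nested references but rejects an entity that refers back to itself, which is a limitation of this illustration only.

```python
import re

# Hypothetical building-block entities, each written as a regex fragment.
# A reference to another entity is written as {name}.
ENTITIES = {
    "scheme": r"https?",
    "label": r"[A-Za-z0-9-]+",
    "host": r"{label}(?:\.{label})+",
    "url": r"{scheme}://{host}(?:/\S*)?",
}

# Matches {name} references but not numeric quantifiers such as {3}.
REFERENCE = re.compile(r"\{([A-Za-z_]\w*)\}")


def expand(name, seen=()):
    """Replace every {reference} in an entity with the referenced pattern."""
    if name in seen:
        raise ValueError("circular reference: " + name)

    def substitute(match):
        # Wrap in a non-capturing group so the fragment stays self-contained.
        return "(?:" + expand(match.group(1), seen + (name,)) + ")"

    return REFERENCE.sub(substitute, ENTITIES[name])


url_pattern = re.compile(expand("url"))
text = "See https://www.example.com/docs and http://example.org."
print(url_pattern.findall(text))
```

Because `host` is defined once and reused by `url`, a change to the definition of `label` or `host` automatically applies to every entity that refers to it.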