Extend Grammars

The standard grammars that Eduction provides give good coverage of the common items of information that you are likely to want to extract from your data. They are designed so that you can easily reference them in any custom grammars that you create.

For some data, the coverage provided might not be sufficient. In this case, you can extend the provided entities with new entries to improve the recall of the extraction (the proportion of the items that should match in theory that Eduction actually returns).

You cannot edit the standard grammars in place because they are provided in .ECR format. You can, however, add more entries to an existing entity in an .ECR grammar file by extending it in a custom grammar file in XML format.

When to Extend a Grammar

You should consider extending a grammar if the recall of the existing grammar is low. Work out which items the existing grammar does not match, and add these as new entries to the appropriate entities in your custom grammar. You can compile the custom grammar (using edktool) before you use it, so that Eduction can load it more quickly. You can then replace the original grammar file with the new grammar file.

Add More Entries to an Entity

To add more entries to an entity, create a new XML grammar file. In the new grammar file, include the .ECR file that contains the entity that you want to extend. Ensure that your grammar file defines the same grammar and entity as the included grammar file. The full entity name, including the grammar prefix, must match for the grammar extension to work. Set the extend mode of the entity in your new grammar to Append, and add the extra entries in the entity.
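As a rough sketch of these steps, a custom grammar file might look like the following. The file name example_grammar.ecr, the grammar name example, the entity name item/eng, and the headwords are all hypothetical placeholders; substitute the names used by the standard grammar that you are extending. The sketch also assumes that the extend mode is set through an extend attribute on the entity element, so check the grammar format reference for your Eduction version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammars version="4.0">
   <!-- Include the compiled standard grammar that contains the entity to extend (hypothetical file name). -->
   <include path="example_grammar.ecr"/>
   <!-- The grammar name and the entity name must match those in the included file. -->
   <grammar name="example">
      <!-- Append mode (assumed attribute name): keep the existing entries and add the ones below. -->
      <entity name="item/eng" extend="append">
         <entry headword="first extra item"/>
         <entry headword="second extra item"/>
      </entity>
   </grammar>
</grammars>
```

With these placeholder names, the full entity name is example/item/eng, and this is the name that must match the entity in the included file.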

Replace the Current Entities

Although you usually add new entries when you extend a grammar, you can instead replace the existing entries of an entity entirely. To do this, set the extend mode of the entity in your new grammar file to Replace.
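A minimal sketch of a replacement grammar is shown below, under the same assumptions as any custom grammar file: the included file name, the grammar name, the entity name, and the headwords are hypothetical placeholders, and the extend attribute on the entity element is assumed to be how the extend mode is set.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammars version="4.0">
   <!-- Include the compiled grammar that defines the entity to replace (hypothetical file name). -->
   <include path="example_grammar.ecr"/>
   <grammar name="example">
      <!-- Replace mode (assumed attribute name): the entries below are used instead of the original ones. -->
      <entity name="item/eng" extend="replace">
         <entry headword="only item to match"/>
      </entity>
   </grammar>
</grammars>
```

Because Replace discards the original entries of the entity, the new entity must list everything that you still want to match.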

Extend the Sentiment Grammars

Grammar extension is particularly useful when you use Eduction for sentiment analysis.

There are two main reasons why you might extend the sentiment grammar file.

