The standard grammars supplied with Eduction give good coverage of the common items of information that you would normally want to extract from your data. They are designed so that you can easily reference them in any custom grammars that you create.
For some data, this coverage might not be sufficient. In this case, you can extend the supplied entities with new entries to improve the recall of the extraction (the percentage of all the matches that should be returned in theory that Eduction actually returns).
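The definition of recall above can be written as a formula. The notation here is illustrative and is not taken from the Eduction documentation:

```latex
\[
\text{recall} = \frac{\text{number of correct matches returned}}
                     {\text{number of matches that should be returned}} \times 100\%
\]
```

For instance, if a data set contains 100 items that should match and the grammar returns 80 of them, the recall is 80%. Adding entries for the 20 missed items is what raises this figure.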
You cannot edit the standard grammars in place because they are provided in .ECR format. You can, however, add more entries to an existing entity in an .ECR grammar file by extending it in a custom grammar file in XML format.
Consider extending a grammar if the recall of the existing grammar is low. Work out which items the existing grammar does not match, and add them as new entries to the appropriate entities in your custom grammar. You can compile the custom grammar (using edktool) before you use it, so that Eduction can load it more quickly. You can then use the new grammar file in place of the original grammar file.
To add more entries to an entity, create a new XML grammar file. In the new grammar file, include the .ECR file that contains the entity that you want to extend. Ensure that your grammar file defines the same grammar and entity as the included grammar file. The full entity name, including the grammar prefix, must match for the grammar extension to work. Set the extend mode of the entity in your new grammar to Append, and add the extra entries in the entity.
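As a rough illustration of these steps, a custom grammar file might look something like the following. This is a sketch under assumptions, not a verified grammar: the file name, grammar name, entity name, and entries are hypothetical placeholders, and the element and attribute spellings (in particular extend="append") are assumed and should be checked against the grammar format reference for your Eduction version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: all file, grammar, and entity names are hypothetical. -->
<grammars version="4.0">
   <!-- Include the compiled grammar that contains the entity to extend. -->
   <include path="standard_grammar.ecr"/>
   <!-- Use the same grammar name as the included file so that the full
        entity name, including the grammar prefix, matches. -->
   <grammar name="examplegrammar">
      <!-- Assumed spelling of the Append extend mode. -->
      <entity name="exampleentity" type="public" extend="append">
         <!-- New entries that the standard entity does not match. -->
         <entry headword="first missing term"/>
         <entry headword="second missing term"/>
      </entity>
   </grammar>
</grammars>
```

With the Append mode, the original entries in the entity remain in place and the new entries are added to them.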
Although most of the time you would add new entries when you extend a grammar, you can sometimes choose to replace the existing entries entirely. To do this, set the extend mode of the entity in your new grammar file to Replace.
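A replacement might be sketched as follows. As before, every name is a hypothetical placeholder, and the spelling extend="replace" is an assumption to verify against your Eduction version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: all file, grammar, and entity names are hypothetical. -->
<grammars version="4.0">
   <include path="standard_grammar.ecr"/>
   <grammar name="examplegrammar">
      <!-- Assumed spelling of the Replace extend mode: the entries below
           are intended to take the place of the original entries. -->
      <entity name="exampleentity" type="public" extend="replace">
         <entry headword="only term to match"/>
      </entity>
   </grammar>
</grammars>
```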
Grammar extension is particularly useful when you use Eduction for sentiment analysis.
There are two main reasons why you might extend the sentiment grammar file.
You want Eduction to find some of the matches it misses because some of the positive or negative adjectives and adverbs in your data are not included in the compiled grammar. To do this, you simply extend the appropriate entities with the new entries.
You want to change the sentiment for some objects. This option is currently available only for the English sentiment grammar.
For example, the phrase "Company A is much better than Company B" might be positive or negative depending on whether you are with Company A or Company B. If you are with Company A, you can make Eduction return a match from the sentence with a positive sentiment by adding Company A to an entity that lists entries that you consider good.
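The Company A case could be sketched along these lines. The sentiment grammar file name, the grammar name, and the entity name good_objects are all hypothetical here, as are the attribute spellings. Look up the actual name of the entity that holds "good" objects in the English sentiment grammar before you use anything like this.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: file, grammar, and entity names are hypothetical. -->
<grammars version="4.0">
   <!-- Assumed file name for the compiled English sentiment grammar. -->
   <include path="sentiment_eng.ecr"/>
   <grammar name="sentiment">
      <!-- Hypothetical entity that lists objects you consider good. -->
      <entity name="good_objects" extend="append">
         <entry headword="Company A"/>
      </entity>
   </grammar>
</grammars>
```

Someone who is with Company B would instead add Company B to the same kind of entity, so that the same sentence is scored from their point of view.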