Eduction returns entities based on the extraction rules from the grammars and dictionaries.
The edktool command-line tool includes a test mode that measures the precision and recall of extraction. You can use this mode to check how well your grammar works on your text data.
Precision and recall are measures that compare the results that a human marks with the results that the engine returns. The following terms describe result relevance as used in Eduction.
True Positives (TP). Human-marked results that are also marked by the engine. That is, an entity that the engine returns is confirmed as true by the person marking the document.
False Positives (FP). Engine-marked results that are not marked by a human. That is, an entity that the engine returns is not confirmed by the person marking the document.
True Negatives (TN). Results that are marked neither by the person marking the document nor by the engine.
False Negatives (FN). Human-marked results that are not marked by the engine. That is, an entity that the engine does not return has been marked as true by the person marking the document.
From these relevance terms, you can determine precision and recall as follows:
Recall is the percentage of true relevant entities that are extracted by an extraction rule, that is,
TP / (TP + FN) * 100
Precision is the percentage of extracted entities that are true entities, that is,
TP / (TP + FP) * 100
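As an illustration of the two formulas, the following Python sketch counts TP, FP, and FN from two sets of marked entities and converts the counts to percentages. It is not part of edktool and does not reflect how edktool matches entities internally. The function names, the (text, offset) representation of an entity, and the sample data are all hypothetical, and exact matching is an assumption made for the example.

```python
def count_matches(human, engine):
    """Count TP, FP, and FN from two sets of marked entities.

    Assumption: each entity is a hashable value such as (text, offset),
    and two entities match only when they are exactly equal.
    """
    tp = len(human & engine)   # marked by both the human and the engine
    fp = len(engine - human)   # returned by the engine, not marked by the human
    fn = len(human - engine)   # marked by the human, missed by the engine
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Return (precision, recall) as percentages; 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) * 100 if (tp + fp) else 0.0
    recall = tp / (tp + fn) * 100 if (tp + fn) else 0.0
    return precision, recall


# Hypothetical sample data: entities as (text, offset) pairs.
human = {("John Smith", 0), ("Jane Doe", 25), ("Acme", 50), ("Paris", 70)}
engine = {("John Smith", 0), ("Acme", 50), ("London", 90)}

tp, fp, fn = count_matches(human, engine)
precision, recall = precision_recall(tp, fp, fn)
print(f"TP={tp} FP={fp} FN={fn}")
print(f"precision={precision:.1f}% recall={recall:.1f}%")
```

In this sample, the engine returns three entities, two of which the human also marked, and it misses two human-marked entities. True negatives do not appear in either formula, so the sketch does not count them.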
For more information about edktool, see Compile and Test Grammars.