Results Relevance

Eduction returns entities based on the extraction rules from the grammars and dictionaries.

The edktool command-line tool includes a test mode that measures the precision and recall of your extraction, so that you can assess how relevant the results are. Use this mode to check how well your grammar works on your text data.

Precision and recall are statistical measures that compare the results that a human marks with the results that the engine returns. The following terms describe result relevance as used in Eduction.

  • True Positives (TP). Results that are identified by both a human and the engine. That is, the engine returns an entity that is confirmed as true by the person marking the document.

  • False Positives (FP). Results that are identified by the engine, and are not marked by a human. That is, the engine returns an entity that is not confirmed by the person marking the document.

  • True Negatives (TN). Results that are marked by neither the person marking the document nor the engine.

  • False Negatives (FN). Results that are marked by a human, and are not marked by the engine. That is, the engine does not return an entity that has been marked as true by the person marking the document.
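
The terms above can be illustrated with a minimal Python sketch. It is not part of edktool and does not reproduce how edktool scores matches; it assumes that each entity is represented as a hypothetical (start offset, end offset, text) tuple and that an entity counts as a match only when the human-marked and engine-returned tuples are identical. The names human_marked, engine_returned, and count_outcomes are illustrative.

```python
# Hypothetical data: each entity is a (start offset, end offset, text) tuple.
human_marked = {(0, 10, "John Smith"), (25, 31, "London"), (40, 45, "Paris")}
engine_returned = {(0, 10, "John Smith"), (25, 31, "London"), (50, 54, "Bath")}


def count_outcomes(marked, returned):
    """Return (TP, FP, FN) for two sets of entities.

    Assumption: an entity matches only if the tuples are exactly equal.
    """
    tp = len(marked & returned)   # marked by the human and returned by the engine
    fp = len(returned - marked)   # returned by the engine, not marked by the human
    fn = len(marked - returned)   # marked by the human, not returned by the engine
    return tp, fp, fn


tp, fp, fn = count_outcomes(human_marked, engine_returned)
print(tp, fp, fn)  # prints: 2 1 1
```

True negatives are not counted in this sketch, because text that neither the human nor the engine marks does not form a list of entities that can be compared in the same way.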

From these relevance terms, you can determine precision and recall as follows:

  • Recall is the percentage of true relevant entities that are extracted by a rule:

    TP / (TP + FN) * 100

  • Precision is the percentage of extracted entities that are true entities:

    TP / (TP + FP) * 100
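
As a worked example of the two formulas, the following Python sketch computes recall and precision from counts of true positives, false positives, and false negatives. The function names and the sample counts are hypothetical, and returning None when a denominator is zero is a choice made for this sketch, not documented edktool behavior.

```python
def recall(tp, fn):
    """Recall as a percentage: TP / (TP + FN) * 100."""
    if tp + fn == 0:
        return None  # assumption: undefined when the human marked no entities
    return tp / (tp + fn) * 100


def precision(tp, fp):
    """Precision as a percentage: TP / (TP + FP) * 100."""
    if tp + fp == 0:
        return None  # assumption: undefined when the engine returned no entities
    return tp / (tp + fp) * 100


# Hypothetical counts: 3 true positives, 1 false positive, 3 false negatives.
print(recall(3, 3))     # prints: 50.0
print(precision(3, 1))  # prints: 75.0
```

With these counts, the engine finds half of the entities that the human marked (recall of 50), and three out of every four entities that it returns are correct (precision of 75).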

For more information about edktool, see Compile and Test Grammars.