Eduction is the process of finding entities in documents, and typically extracting them to form additional fields in the document. Entities are words, phrases, or blocks of information. Eduction finds entities by using predefined or custom grammars, which use linguistic, regular expression, or term-based rules to define the pattern to match. Common entities include names, addresses, phone numbers, and credit card numbers.
Eduction allows you to automatically extract entities from your documents and add them as metadata before indexing into IDOL Server. You can extract common search phrases from documents before indexing, and tag the documents with this data. You can then use the tags to search for these common values quickly and easily. Eduction is therefore a useful tool for preprocessing data.
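The idea of matching patterns and adding the matches to a document as extra fields can be illustrated with a minimal Python sketch. This is not the Eduction API or grammar format: the pattern table, the field names (PHONE, EMAIL), the document layout, and the functions extract_entities and enrich are all hypothetical, and plain regular expressions stand in for compiled grammars.

```python
import re

# Hypothetical patterns standing in for grammar rules. Real Eduction grammars
# are defined and compiled differently; these are ordinary Python regexes.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def extract_entities(text):
    """Return a mapping of entity type to the list of matches found in text."""
    found = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[name] = matches
    return found


def enrich(document):
    """Return a copy of document with extracted entities added as extra fields."""
    enriched = dict(document)
    for name, values in extract_entities(document["content"]).items():
        enriched[name] = values
    return enriched


# Hypothetical document: a dict with a "content" field holding the text.
doc = {"title": "Contact", "content": "Call 555-123-4567 or write to jo@example.com."}
enriched_doc = enrich(doc)
print(enriched_doc)
```

In this sketch the original document is left unchanged and the enriched copy carries one new field per entity type, which mirrors the tag-before-indexing step described above.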
Eduction is most commonly used in IDOL Server as a plug-in module for pre-index processing. It is also available as a stand-alone command-line tool, as an API (for C++ and Java), or as an ACI server. The command-line tool (edktool) is used for compiling grammars and for testing, as well as for extraction.