Eduction
The Eduction processor uses Named Entity Recognition (Eduction) to extract entities from text. An entity is a word, phrase, or block of information. For example, you can use Named Entity Recognition to extract names, addresses, telephone numbers, and dates from document content or metadata.
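As a rough illustration of what extracting entities from text means, the sketch below uses plain Python regular expressions. It is not the Eduction engine and does not use Eduction grammars. The `PATTERNS` table and the `extract_entities` function are hypothetical names introduced only for this example, and real grammars are far more thorough than these two patterns.

```python
import re

# Hypothetical patterns for illustration only; these are NOT Eduction grammars.
PATTERNS = {
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
}


def extract_entities(text):
    """Return (entity_type, matched_text, offset) tuples, ordered by offset."""
    matches = []
    for entity_type, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            matches.append((entity_type, m.group(0), m.start()))
    # Sort by position so results follow the order of appearance in the text.
    return sorted(matches, key=lambda item: item[2])
```

Each result pairs an entity type with the matched text and its position, which is conceptually similar to the information that an entity extraction step attaches to a document.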
For more information about Named Entity Recognition and how to configure it, refer to the Named Entity Recognition User and Programming Guide.
OpenText recommends that you use the Advanced configuration Guided Setup to configure your Eduction processor.
TIP: The Eduction processor includes resource files for many standard grammars. You can also upload the Named Entity Recognition grammars package, which includes grammars for the premium GOV, PBI, PCI, PHI, and PII grammar packages. For properties such as Resource Files or [MyPostProcessingTask] Script, you can enter the name of an OpenText grammar file or Lua script, such as address_eng.ecr or normalize_money.lua. The processor uses the corresponding file from its resources directory.
Properties
| Name | Default Value | Description |
|---|---|---|
| IDOL License Service | | An IdolLicenseServiceImpl that provides a way to communicate with a Knowledge Discovery License Server. |
| Named Entity Recognition Configuration Parameters | | The configuration options available in the Eduction processor are Named Entity Recognition configuration parameters, which allow you to choose the entities to extract and configure the extraction process. The available parameters are the same as the standard Named Entity Recognition Configuration Parameters, with simplified names (for example, including spaces between words). You must set Entity, Entity Field, and Resource Files for the processor. The Post Processing Tasks and Pre Filtering Tasks options require additional configuration. To add a post-processing or pre-filtering task, select the option, type a task name, and click OK. In the processor configuration, click Apply, and then reopen the processor configuration. The configuration now includes additional configuration options for the task. For more information about how to configure Named Entity Recognition, refer to the Named Entity Recognition User and Programming Guide and the Named Entity Recognition Grammars User Guide. |
Relationships
| Name | Description |
|---|---|
| success | Successfully processed FlowFiles are routed to this relationship. |
| failure | FlowFiles that have an invalid or unknown format are routed to this relationship. |
Advanced Configuration
The Eduction processor has an advanced configuration interface, which provides guided setup for configuration. This interface also includes tabs that allow you to upload grammar packs, and code editors that allow you to create custom grammars and write Lua post-processing scripts, along with samples to demonstrate how to complete common tasks.
To open the advanced configuration interface, right-click the processor and click Advanced.
Upload Grammar Packages
The Grammar Packs tab allows you to upload grammar packages to your Eduction processor. You can use this tab to choose a downloaded grammar ZIP package file and upload it to NiFi Ingest to use in the processor.
Create Custom Grammars
The Custom Grammars tab has a text editor that allows you to create a custom grammar. It also provides the DTD that describes the grammar format, and several simple sample grammars that you can use as a starting point for your own. For more information about grammar formats, refer to the Named Entity Recognition User and Programming Guide.
Write Custom Processing Scripts
The Custom Processing tab has a text editor that allows you to write a custom Lua post-processing script to use for matches from this processor. It includes samples that provide information about the main script functions that you can use to process matches. For more information about Lua post-processing, refer to the Named Entity Recognition User and Programming Guide.
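The kind of logic that a post-processing step applies to a match (keep it, rewrite it, or discard it) can be sketched as follows. This is a conceptual illustration in Python only: the processor runs Lua scripts, and this sketch does not use the Lua script functions described in the samples. The `process_match` function and its normalization rule are hypothetical, and they are not a description of what normalize_money.lua does.

```python
import re


def process_match(matched_text):
    """Return normalized text for a dollar amount, or None to discard the match.

    Hypothetical helper for illustration; not part of any Eduction API.
    """
    m = re.fullmatch(r"\$\s*(\d[\d,]*(?:\.\d{1,2})?)", matched_text.strip())
    if m is None:
        # Not a shape this sketch understands, so the match is dropped.
        return None
    # Remove thousands separators before converting to a number.
    amount = float(m.group(1).replace(",", ""))
    return f"{amount:.2f} USD"
```

A real post-processing script would make the same kind of decision for each match, using the script functions documented in the Named Entity Recognition User and Programming Guide.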