Eduction

The Eduction processor uses IDOL Eduction to extract entities from text. An entity is a word, phrase, or block of information. For example, you can use Eduction to extract names, addresses, telephone numbers, and dates from document content or metadata.

For more information about Eduction and how to configure it, refer to the Eduction User Guide.

Properties

Name Default Value Description
Entity  

A comma-separated list of entities to extract.

To specify several entities, you can use wildcard expressions. For example: place/city1/*,place/city2/*. The * wildcard matches any number of characters, and the ? wildcard matches a single character.

You must also set the Resource Files property to the location of the resource files that contain your chosen entities.

If you do not set this property, Eduction looks for all entities in the grammar files specified by the Resource Files property.

Entity Field  

A comma-separated list of document fields in which to write the matches from Eduction. This list must contain the same number of values as the Entity property.

If you did not set the Entity property, do not set this property. In this case, Eduction generates a field name for each entity that it finds, by converting the entity name to uppercase and replacing each slash with an underscore. For example, if the entity edk_common_entities/place is found, Eduction generates a field named EDK_COMMON_ENTITIES_PLACE.

Resource Files   A comma-separated list of compiled ECR files containing Eduction grammar entries. At least one resource file is required. You can match multiple resource files with wildcard expressions. You can use the * wildcard to match any number of characters, or the ? wildcard to match a single character.
Search Fields DRECONTENT A comma-separated list of document fields to search for entities, for example DRECONTENT or DRETITLE.
Simple Output False A Boolean value that specifies whether to add only the matched text to the document fields specified by the Entity Field property. To add only the matched text, set this property to true. With the default value, false, each field has subfields that contain the matched text, the offset, and the score.
eduction_configuration_parameter_name   You can add additional properties that match the name of an Eduction configuration parameter, and set an appropriate value. For more information about the configuration parameters that you can use to configure Eduction, refer to the Eduction User Guide.
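
The rules described above for the Entity, Entity Field, and Resource Files properties can be sketched in a few lines of Python. This is only an illustration, not part of the processor: the helper names, the sample entity names, and the resource file name places.ecr are all hypothetical, and Python's fnmatchcase is used as a stand-in for Eduction's own wildcard matching, which may behave differently.

```python
from fnmatch import fnmatchcase


def split_list(value):
    """Split a comma-separated property value into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def default_field_name(entity):
    """Field name generated when Entity is not set: uppercase, '/' becomes '_'."""
    return entity.upper().replace("/", "_")


def matching_entities(patterns, available):
    """Return the available entity names that match any wildcard pattern.

    Assumption: fnmatchcase approximates the documented wildcards. It treats
    * as any run of characters and ? as a single character, but it also
    accepts [..] character sets, which the property description does not mention.
    """
    return [name for name in available
            if any(fnmatchcase(name, pattern) for pattern in patterns)]


def check_properties(entity, entity_field, resource_files):
    """Return a list of problems found in the three property values.

    Hypothetical helper. It only checks the rules stated in the property
    descriptions; the case where Entity is set and Entity Field is empty
    is not reported, because the descriptions do not cover it.
    """
    problems = []
    if not split_list(resource_files):
        problems.append("Resource Files: at least one resource file is required")
    entities = split_list(entity)
    fields = split_list(entity_field)
    if not entities and fields:
        problems.append("Entity Field: do not set this when Entity is not set")
    elif entities and fields and len(entities) != len(fields):
        problems.append("Entity Field: must list as many values as Entity")
    return problems


# Hypothetical entity names, used only to exercise the helpers.
available = ["place/city1/london", "place/city2/paris", "person/name"]
print(matching_entities(["place/city1/*", "place/city2/*"], available))
print(default_field_name("edk_common_entities/place"))
print(check_properties("place/city1/*,place/city2/*", "CITY1", "places.ecr"))
```

The last call reports a problem because two Entity values are paired with only one Entity Field value.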

Relationships

Name Description
success Successfully processed FlowFiles are routed to this relationship.
failure FlowFiles that have an invalid or unknown format are routed to this relationship.
