CategorizeDocument

The CategorizeDocument processor sends requests (action=CategorySuggestFromText and action=BinaryCatQuery) to a Category component to categorize incoming documents based on their text content.
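
The processor builds and sends these requests itself, so you do not need to write them. For orientation only, the following minimal sketch shows what an ACI-style request URL for one of the actions might look like. The host, port, sample text, and the QueryText parameter name are assumptions for illustration, and build_aci_url is a hypothetical helper, not part of the processor.

```python
from urllib.parse import quote, urlencode


def build_aci_url(host, port, action, **params):
    """Build a URL of the form http://host:port/action=...&Param=...

    Hypothetical helper: it only formats a string and sends nothing.
    """
    # quote_via=quote encodes spaces as %20 instead of '+'.
    query = urlencode({"action": action, **params}, quote_via=quote)
    return f"http://{host}:{port}/{query}"


# Placeholder host, port, and text; QueryText is an assumed parameter name.
url = build_aci_url(
    "localhost", 9020, "CategorySuggestFromText",
    QueryText="quarterly sales report",
)
print(url)
```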

For each matching category, the processor writes a new field to the document metadata:

<Category>
    <Name>...</Name>
    <Weight>...</Weight>
</Category>
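
A downstream consumer could read such a field with any XML parser. The sketch below uses Python's standard library and assumes that the field arrives as a standalone XML fragment and that Weight holds a numeric value. The sample name and weight are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample values, shaped like the field shown above.
SAMPLE = """<Category>
    <Name>Finance</Name>
    <Weight>87</Weight>
</Category>"""


def read_category(xml_text):
    """Return (name, weight) from a single <Category> fragment."""
    elem = ET.fromstring(xml_text)
    name = elem.findtext("Name")
    # Assumption: Weight is numeric, so it can be converted to float.
    weight = float(elem.findtext("Weight"))
    return name, weight


name, weight = read_category(SAMPLE)
print(name, weight)
```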

The processor has an advanced configuration interface that you can use to:

  • list, add, and modify binary categories
  • list, add, and modify categories
  • categorize text, for testing purposes

For more information about categorization, refer to the Knowledge Discovery Administration Guide.

Properties

Name Default Value Description
IDOL License Service   An IdolLicenseServiceImpl that provides a way to communicate with a Knowledge Discovery License Server.

Category Host   The host name or IP address of the Category component.
Category Port   The Category component ACI port.
Request Timeout 60 The maximum amount of time to wait, in seconds, for a response from the Category component.
Binary Categories   A list of binary categories to query, or * to test incoming documents against all binary categories. TIP: You can view and train binary categories in the "Binary Categories" tab of the advanced configuration interface.

Minimum Weight   The minimum weight that a matching category must reach for the category to be added to the document metadata.
SSL Config Service   An optional IdolSSLConfigServiceImpl that specifies the settings to use to communicate with the Category component over SSL/TLS. Set this property if your Category component is configured to accept connections over SSL/TLS.
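
The effect of the Minimum Weight property can be pictured as a simple threshold filter over the matches returned for a document. The sketch below is illustrative only and is not the processor's implementation. It assumes that a weight equal to the threshold counts as meeting it, and the category names and weights are invented.

```python
def filter_by_minimum_weight(matches, minimum_weight):
    """Keep (name, weight) pairs whose weight is at least minimum_weight.

    Assumption: the comparison is inclusive (weight >= minimum_weight).
    """
    return [(name, weight) for name, weight in matches
            if weight >= minimum_weight]


# Invented matches for a single document.
matches = [("Finance", 87.0), ("Sport", 42.0), ("Travel", 60.0)]
kept = filter_by_minimum_weight(matches, 60.0)
print(kept)
```

Under these assumptions, only the categories in kept would be written to the document metadata.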

Relationships

Name Description
success FlowFiles that were processed successfully.
failure FlowFiles that were not processed successfully.