CategorizeDocument
The CategorizeDocument processor sends requests (action=CategorySuggestFromText and action=BinaryCatQuery) to a Category component to categorize incoming documents based on their text content.
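As a rough illustration of what such a request might look like, the sketch below builds an ACI-style URL for the CategorySuggestFromText action. The helper name `build_aci_url`, the host, the port, and the `QueryText` parameter name are assumptions made for this example; the processor constructs and sends its requests internally, and its exact parameters are not described here.

```python
from urllib.parse import quote


def build_aci_url(host, port, action, params):
    """Build an ACI-style request URL (hypothetical helper, not part of the processor)."""
    # Percent-encode every parameter value so free text is safe in a URL.
    query = "&".join(
        f"{quote(key, safe='')}={quote(value, safe='')}"
        for key, value in params.items()
    )
    base = f"http://{host}:{port}/action={quote(action, safe='')}"
    return f"{base}&{query}" if query else base


# Host, port, and the QueryText parameter name are illustrative assumptions.
url = build_aci_url(
    "localhost",
    9020,
    "CategorySuggestFromText",
    {"QueryText": "solar panel efficiency"},
)
print(url)
```

In practice you would not build these requests yourself; the Category Host and Category Port properties tell the processor where to send them.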
For each matching category, the processor writes a new field to the document metadata:
<Category>
<Name>...</Name>
<Weight>...</Weight>
</Category>
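A downstream consumer could read these fields with any XML parser. The following is a minimal sketch, assuming the metadata is available as an XML fragment; the `DOCUMENT` wrapper element, the category names, and the weight values are invented for illustration, and the weights are assumed to be numeric.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata fragment; the wrapper element and values are illustrative only.
metadata_xml = """
<DOCUMENT>
  <Category>
    <Name>Science</Name>
    <Weight>87</Weight>
  </Category>
  <Category>
    <Name>Energy</Name>
    <Weight>42</Weight>
  </Category>
</DOCUMENT>
"""


def read_categories(xml_text):
    """Return a list of (name, weight) pairs, one per Category element."""
    root = ET.fromstring(xml_text.strip())
    return [
        (category.findtext("Name"), float(category.findtext("Weight")))
        for category in root.iter("Category")
    ]


categories = read_categories(metadata_xml)
for name, weight in categories:
    print(name, weight)
```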
The processor has an advanced configuration interface that you can use to:
- list, add, and modify binary categories
- list, add, and modify categories
- categorize text, for testing purposes
For more information about categorization, refer to the Knowledge Discovery Administration Guide.
Properties
| Name | Default Value | Description |
|---|---|---|
| IDOL License Service | | An IdolLicenseServiceImpl that provides a way to communicate with a Knowledge Discovery License Server. |
| Category Host | | The host name or IP address of the Category component. |
| Category Port | | The Category component ACI port. |
| Request Timeout | 60 | The maximum amount of time to wait, in seconds, for a response from the Category component. |
| Binary Categories | | A list of binary categories to query, or TIP: You can view and train binary categories in the "Binary Categories" tab of the advanced configuration interface. |
| Minimum Weight | | The minimum threshold that must be met for a document to be categorized (for the category to be added to the document metadata). |
| SSL Config Service | | An optional IdolSSLConfigServiceImpl that specifies the settings to use to communicate with the Category component over SSL/TLS. Set this property if your Category component has been configured to accept connections over SSL. |
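The effect of the Minimum Weight property can be pictured as a simple threshold filter over the suggested categories. This is only a sketch of the idea, not the processor's implementation: the function name `filter_by_min_weight` and the sample data are hypothetical, and treating the threshold as inclusive (a weight equal to the minimum is kept) is an assumption based on the phrase "must be met".

```python
def filter_by_min_weight(categories, min_weight):
    """Keep only (name, weight) pairs whose weight meets the threshold.

    Assumption: the comparison is inclusive, so a weight equal to
    min_weight still counts as meeting the threshold.
    """
    return [(name, weight) for name, weight in categories if weight >= min_weight]


# Illustrative suggestions; names and weights are invented for this example.
suggested = [("Science", 80.0), ("Energy", 50.0), ("Sports", 20.0)]
kept = filter_by_min_weight(suggested, 50.0)
print(kept)
```

With a threshold of 50, only the first two categories would be written to the document metadata under this assumption.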
Relationships
| Name | Description |
|---|---|
| success | FlowFiles that were processed successfully. |
| failure | FlowFiles that were not processed successfully. |