Categorization

In categorization, you assign documents to categories according to what the document and the category have in common.

A category defines a subject of interest. You can train categories with a set of documents, or by constructing Boolean expressions or text that describes the subject.

You can use categories to return documents that match the category, or to suggest the categories that a document or piece of text might match.

Categorization automatically identifies the ideas that documents contain, and classifies these documents according to their content. You can tag the documents according to the category, and you can use the tags as a trigger for a further workflow process, such as approval or examination.

Categories provide an intuitive way to navigate documents. They also let you filter unimportant material out of a large volume of information, so that you can focus only on what matters.

There are three kinds of categorization: conceptual categorization, binary categorization, and simple categorization. The following section describes conceptual categorization.

Before you can use categories, you must train and build them.

Create Categories

You can either create categories manually, or automatically create categories from your documents.

TIP:

Automatic clustering also allows you to analyze your content and generate a visualization of the content clusters.

Train Categories

Category training describes the kinds of documents that the category should find. Training can include example documents, Boolean expressions, or text that describes the subject.

Micro Focus recommends that you train categories with many example documents.

Build Categories

When IDOL Server builds a category, it analyzes the training and produces a list of the most important terms in the training, with a list of weights to indicate how important each term is. It gives a term a higher weight when the term occurs more often across the different training buffers, relative to how often it occurs in the content index.
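The weighting idea can be illustrated with a minimal sketch. This is not IDOL Server's actual algorithm, which this section does not specify. It assumes a simple ratio: the share of training buffers that contain a term, divided by the share of indexed documents that contain it. The function name `term_weights`, its parameters, and the add-one smoothing are all hypothetical choices made for the example.

```python
from collections import Counter


def term_weights(training_buffers, index_doc_freq, index_size):
    """Return a dict of term -> weight (illustrative only, not IDOL's formula).

    training_buffers: list of training texts for one category.
    index_doc_freq:   dict of term -> number of indexed documents containing it.
    index_size:       total number of documents in the content index.
    """
    # Count how many training buffers contain each term (once per buffer).
    buffer_freq = Counter()
    for text in training_buffers:
        buffer_freq.update(set(text.lower().split()))

    weights = {}
    for term, n_buffers in buffer_freq.items():
        training_rate = n_buffers / len(training_buffers)
        # Add-one smoothing (an assumption) avoids dividing by zero
        # for terms that never occur in the index.
        index_rate = (index_doc_freq.get(term, 0) + 1) / (index_size + 1)
        weights[term] = training_rate / index_rate
    return weights


buffers = ["solar panel efficiency", "solar energy storage", "the solar market"]
index_counts = {"solar": 10, "energy": 50, "the": 900}
weights = term_weights(buffers, index_counts, index_size=1000)
```

In this sketch, "solar" appears in every training buffer but is rare in the index, so it outweighs "the", which is common everywhere. A plain ratio also gives high weights to terms that are absent from the index, which is one reason a real system would restrict the term list further.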

You can restrict which terms the category uses: you can set how often a term must appear in the training, specify terms that must be explicitly included or discarded, and apply stop lists. In addition, you can apply an attenuation factor, which determines how quickly a term decreases in importance if it is not relevant.
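The restrictions described above can be sketched as a filter over a weighted term list. Everything here is hypothetical: `restrict_terms` and its parameter names do not correspond to IDOL Server settings, and the attenuation factor is left out because this section does not define how it is calculated.

```python
def restrict_terms(weights, occurrences, min_occurrences=2,
                   stop_list=frozenset(), must_include=None,
                   must_discard=frozenset()):
    """Filter a term -> weight dict (illustrative only).

    occurrences:  dict of term -> number of times it appears in the training.
    must_include: dict of term -> weight for terms that are always kept.
    """
    kept = {}
    for term, weight in weights.items():
        # Drop stop-list terms and explicitly discarded terms.
        if term in stop_list or term in must_discard:
            continue
        # Drop terms that do not appear often enough in the training.
        if occurrences.get(term, 0) < min_occurrences:
            continue
        kept[term] = weight
    # Explicitly included terms bypass the filters above.
    kept.update(must_include or {})
    return kept


result = restrict_terms(
    {"solar": 3.0, "the": 0.4, "panel": 2.0, "market": 1.0},
    occurrences={"solar": 3, "the": 5, "panel": 1, "market": 2},
    min_occurrences=2,
    stop_list={"the"},
    must_include={"photovoltaic": 2.5},
    must_discard={"market"},
)
```

Here "the" is removed by the stop list, "panel" appears too rarely, "market" is explicitly discarded, and "photovoltaic" is added even though it never occurs in the training.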

For details, see the settings given in the Troubleshoot Categorization section.

