Cluster from Snapshots

IDOL can identify clusters by using representations of what is present in the set of documents at a certain point in time. This representation is known as a snapshot. The aim of this type of clustering is to identify trends in the data, rather than to include every document from the total set.

Snapshots contain details of potential clusters, known as seeds. A seed is made up of a document and the best-scoring results from a Suggest action. The number and quality of the suggested documents must exceed specified levels.
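
The idea of a seed can be illustrated with a small conceptual sketch. It is not IDOL's implementation: the Seed class, the make_seed function, and the min_score and min_count thresholds are hypothetical names chosen for illustration, and the way IDOL actually measures the number and quality of suggested documents may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (document reference, relevance score) pair; a hypothetical shape
# standing in for one result of a Suggest action.
SuggestResult = Tuple[str, float]


@dataclass
class Seed:
    """A potential cluster: a document plus its best-scoring suggestions."""
    document: str
    suggestions: List[SuggestResult]


def make_seed(
    document: str,
    suggest_results: List[SuggestResult],
    min_score: float = 70.0,  # hypothetical quality threshold
    min_count: int = 3,       # hypothetical quantity threshold
) -> Optional[Seed]:
    """Return a Seed if enough good suggestions exist, otherwise None."""
    # Keep only the suggestions that reach the quality threshold,
    # ordered from best to worst.
    best = sorted(
        (r for r in suggest_results if r[1] >= min_score),
        key=lambda r: r[1],
        reverse=True,
    )
    # The document becomes a seed only if enough suggestions remain.
    if len(best) < min_count:
        return None
    return Seed(document=document, suggestions=best)
```

In this sketch, a document with only one or two strong suggestions is discarded, which mirrors the aim of finding trends rather than including every document.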

With scheduling, you can take snapshots at regular intervals to keep track of the set of documents as it changes over time. You can also use snapshots to create visual representations of the state of your set at a particular time, or over a particular time frame.

Workflow

Micro Focus recommends that you perform clustering operations with the Cluster actions in the Category component. To create a set of clusters, run the following actions:

  1. ClusterSnapshot. This action takes a snapshot of your data. You can also add a query, if you want to find clusters in a certain set of documents.

  2. ClusterCluster. This action identifies clusters in the specified snapshot and saves them to disk for later use.

  3. ClusterResults. This action returns the clusters as an XML response. It returns details for each cluster, including the title, document details, and an importance score (see Algorithmic Outline).
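
The three actions above could be chained as in the following minimal sketch, which only builds the request URLs and sends nothing. The host, port, and job names are assumptions, as is the use of the TargetJobName and SourceJobName parameters to link the steps; check the Category component reference for the parameters that your version supports.

```python
from urllib.parse import urlencode

# Assumed host and ACI port of the Category component; adjust as needed.
CATEGORY_URL = "http://localhost:9020/"


def build_cluster_workflow(snapshot_job, cluster_job, base_url=CATEGORY_URL):
    """Return the three action URLs, in the order they should be run."""
    steps = [
        # 1. Take a snapshot of the data and store it under snapshot_job.
        {"action": "ClusterSnapshot", "TargetJobName": snapshot_job},
        # 2. Identify clusters in that snapshot and save them as cluster_job.
        {
            "action": "ClusterCluster",
            "SourceJobName": snapshot_job,
            "TargetJobName": cluster_job,
        },
        # 3. Return the saved clusters as an XML response.
        {"action": "ClusterResults", "SourceJobName": cluster_job},
    ]
    return [base_url + urlencode(params) for params in steps]


if __name__ == "__main__":
    # Hypothetical job names, for illustration only.
    for url in build_cluster_workflow("news_snapshot", "news_clusters"):
        print(url)
```

Each URL can then be requested in turn, for example with urllib.request.urlopen, waiting for each action to finish before sending the next one.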

Things to Consider

Consider the following points when you are deciding whether to use clustering from snapshots:

