Query Result Clustering

You can create clusters directly from the results of a Query or Suggest action. Every document in the result set is assigned to a cluster.

Workflow

To cluster the results of a Query or Suggest action, add the Cluster action parameter to the action. In addition to the ordinary query results, IDOL Server then returns clustering information with each result. For example:

<autn:hit>
   <autn:reference>http://en.wikipedia.org/wiki/Fauna of Australia</autn:reference>
   <autn:id>24152</autn:id>
   <autn:section>0</autn:section>
   <autn:weight>90.48</autn:weight>
   <autn:cluster>4</autn:cluster>
   <autn:clustertitle>supercontinent Gondwana, Cretaceous, fauna, MYA</autn:clustertitle>
   <autn:links>FAUNA,FLORA</autn:links>
   <autn:database>Default</autn:database>
   <autn:title>Fauna of Australia</autn:title>
</autn:hit>

Every result is tagged with the ID and title of the cluster it is assigned to. There is no restriction on the number of clusters that can be created, or on how many documents make up a cluster. For example, you might have a cluster with only one document.
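As an illustration of how a client might use these tags, the following minimal Python sketch groups hits by their cluster ID. It is not part of IDOL Server. The wrapper element hits, the namespace URI bound to the autn prefix, and the function name group_hits_by_cluster are assumptions made so that the sample parses on its own; a real response wraps the hits in additional elements.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the autn prefix; check it against a real response.
AUTN_NS = "http://schemas.autonomy.com/aci/"

# The sample hit from above, placed in a hypothetical wrapper element so that
# the autn prefix is declared and the fragment is well-formed XML.
SAMPLE = f"""<hits xmlns:autn="{AUTN_NS}">
<autn:hit>
   <autn:reference>http://en.wikipedia.org/wiki/Fauna of Australia</autn:reference>
   <autn:id>24152</autn:id>
   <autn:section>0</autn:section>
   <autn:weight>90.48</autn:weight>
   <autn:cluster>4</autn:cluster>
   <autn:clustertitle>supercontinent Gondwana, Cretaceous, fauna, MYA</autn:clustertitle>
   <autn:links>FAUNA,FLORA</autn:links>
   <autn:database>Default</autn:database>
   <autn:title>Fauna of Australia</autn:title>
</autn:hit>
</hits>"""


def group_hits_by_cluster(xml_text):
    """Return {cluster_id: {"title": str, "references": [str, ...]}}."""
    ns = {"autn": AUTN_NS}
    root = ET.fromstring(xml_text)
    clusters = {}
    # Visit every autn:hit element, wherever it sits in the response.
    for hit in root.iter(f"{{{AUTN_NS}}}hit"):
        cluster_id = hit.findtext("autn:cluster", namespaces=ns)
        if cluster_id is None:
            continue  # hit carries no clustering information
        entry = clusters.setdefault(
            cluster_id,
            {
                "title": hit.findtext("autn:clustertitle", namespaces=ns),
                "references": [],
            },
        )
        entry["references"].append(hit.findtext("autn:reference", namespaces=ns))
    return clusters


if __name__ == "__main__":
    for cluster_id, info in group_hits_by_cluster(SAMPLE).items():
        print(cluster_id, info["title"], info["references"])
```

Keying on the cluster ID rather than the title keeps the grouping stable even if two clusters happen to receive the same title.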

You can use two configuration parameters to control clustering behavior:

Algorithmic Outline

In query result clustering, IDOL Server selects the most relevant document in the query result set as the basis for the first cluster. It then compares each remaining result to this basis document, and adds the result to the cluster if its relevance to the basis document exceeds the configured ClusterThreshold. IDOL Server then repeats this process for the remaining unclustered documents, and continues until all results are assigned to a cluster.

For example, suppose a query returns 10 results, numbered 0 to 9, and the threshold is 50. Document 0 is the basis of the first cluster. When compared to this basis document, documents 1 and 2 have a relevance score higher than 50, so they are added to the cluster. The first cluster therefore contains documents 0, 1, and 2.

This process continues for the remaining documents. Document 3 is the basis for the second cluster, and documents 4, 5, 6, and 7 have high enough relevance scores to be added to it. Document 8 is the basis of the third cluster, and document 9 is similar enough to join it. No results remain, so the process ends.
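The outline and walkthrough above can be sketched in a few lines of Python. This is only an illustration of the described procedure, not IDOL Server's implementation: the function cluster_results, the relevance callback, and the toy scores (80 within a group, 20 otherwise) are hypothetical, chosen so that the output reproduces the three clusters in the example.

```python
from typing import Callable, List


def cluster_results(
    results: List[int],
    relevance: Callable[[int, int], float],
    cluster_threshold: float,
) -> List[List[int]]:
    """Greedily cluster results that are already ordered by query relevance."""
    remaining = list(results)
    clusters = []
    while remaining:
        # The most relevant unclustered result becomes the basis document.
        basis = remaining[0]
        cluster = [basis]
        unclustered = []
        for doc in remaining[1:]:
            # Add the document if its relevance to the basis exceeds the threshold.
            if relevance(basis, doc) > cluster_threshold:
                cluster.append(doc)
            else:
                unclustered.append(doc)
        clusters.append(cluster)
        remaining = unclustered
    return clusters


# Hypothetical similarity data: documents in the same group score 80
# against each other, and documents in different groups score 20.
GROUPS = [{0, 1, 2}, {3, 4, 5, 6, 7}, {8, 9}]


def toy_relevance(basis: int, doc: int) -> float:
    same_group = any(basis in group and doc in group for group in GROUPS)
    return 80.0 if same_group else 20.0


if __name__ == "__main__":
    print(cluster_results(list(range(10)), toy_relevance, 50))
    # [[0, 1, 2], [3, 4, 5, 6, 7], [8, 9]]
```

Because each pass compares documents only to the current basis document, a result that matches no basis ends up as the basis of its own cluster, which is how a cluster with a single document can arise.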

IDOL Server creates a title for each cluster according to the best terms and phrases contained in the cluster documents.

Things to Consider

Consider the following points when you are deciding whether to use clustering from query results:

