Taxonomy

In IDOL, a taxonomy is a hierarchy of categories, ordered so that the broadest categories are at the top, and more specific categories are nearer the bottom.

For example, a taxonomy might have a Sport root category with child categories such as Soccer, Cricket, Golf, and Rugby. The child categories can have their own child categories. The Soccer category might have subcategories called Teams, Grounds, Competitions, Leagues, and so on.

NOTE:

The terms taxonomy and hierarchy are often used interchangeably.

You can create taxonomies manually or automatically.

Create a Taxonomy Manually

You can create a taxonomy manually by using the CategoryCreate, CategoryCopy, and CategoryMove actions. For each category that you add, you specify the parent category to add it under.

This approach works best if you have already designed a taxonomy and you know the documents in your index well.

For example, you might create a Documents Reviewed category, with the subcategories Accepted, Rejected, and Undecided. You can then assign documents to the subcategories according to their status or their content.

Create a Taxonomy Automatically

The TaxonomyGenerate action takes a defined set of documents, and attempts to generate a meaningful taxonomy, based on the contents of the documents.

The action trains a category using the set of documents that you define, and extracts the generated terms from the category. It uses those terms to form a hierarchy, according to how often each pair of terms occurs together in the documents. After it assembles the hierarchy, it creates a category in place of each term, and names the category after that term.
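The pair counting that this step relies on can be illustrated with a minimal sketch. It covers only the idea of counting how often two terms appear in the same document; it is not IDOL's actual implementation. The function name, sample documents, and term list are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_cooccurrence(documents, terms):
    """Count, for each pair of terms, how many documents contain both.

    Illustrative assumption: a term "occurs" in a document if it matches
    one of the document's whitespace-separated, lowercased words.
    """
    counts = Counter()
    for text in documents:
        words = set(text.lower().split())
        # Sort so that each pair is always stored in the same order.
        present = sorted(t for t in terms if t in words)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical sample data.
docs = [
    "soccer teams play in leagues",
    "soccer leagues and competitions",
    "golf competitions",
]
counts = pair_cooccurrence(docs, ["soccer", "leagues", "competitions", "golf"])
print(counts[("leagues", "soccer")])
```

Pairs that occur together often, such as soccer and leagues in this sample, are the kind of signal that could place one term under another in a hierarchy.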

The category training is free text, consisting of the category’s name, and the names of its immediate children.

In the Sport taxonomy example, the training for the Sport category is “sport soccer football cricket golf rugby”, and the training for the Soccer category is “soccer teams grounds competitions leagues”.
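As a rough illustration of how such training text is put together, the following sketch joins a category name with the names of its immediate children. The helper name and the lowercasing are assumptions made for the example, not a description of IDOL internals.

```python
def training_text(name, children):
    """Build free-text training from a category name and its immediate children.

    Hypothetical helper: lowercases the names and joins them with spaces.
    """
    return " ".join([name] + list(children)).lower()

# The Soccer category and its subcategories from the Sport example.
soccer_training = training_text(
    "Soccer", ["Teams", "Grounds", "Competitions", "Leagues"]
)
print(soccer_training)
```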

The action also automatically builds all the categories in the new taxonomy.

You can also run automatic taxonomy generation regularly, by using the Category scheduling operations.

Example

The following action creates a finance-based taxonomy and adds it under the category with ID 123:

action=TaxonomyGenerate&DREQuery=finance economy stock market money options&Parent=123

You can specify a more complex query as the DREQuery value, but you must percent-encode it.

The following action creates a finance-based taxonomy from documents from the last 20 days:

action=TaxonomyGenerate&DREQuery=action%3DQuery%26Text%3Dfinance%20economy%20stock%20market%20money%20options%26FieldText%3DRANGE%7B-20%2C0%7D%3ADREDATE&Parent=123
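Encoding a nested query by hand is error-prone, so it can help to generate the action string. The following sketch uses Python's standard urllib.parse.quote function to percent-encode the inner query. The helper name taxonomy_generate_action is hypothetical; the action and parameter names are the ones used in the examples above.

```python
from urllib.parse import quote

def taxonomy_generate_action(dre_query, parent):
    """Return a TaxonomyGenerate action string with a percent-encoded DREQuery.

    Hypothetical helper. safe="" makes quote() encode every reserved
    character, including '=', '&', and ':'.
    """
    encoded = quote(dre_query, safe="")
    return f"action=TaxonomyGenerate&DREQuery={encoded}&Parent={parent}"

# The inner query, written in readable (unencoded) form.
inner_query = (
    "action=Query"
    "&Text=finance economy stock market money options"
    "&FieldText=RANGE{-20,0}:DREDATE"
)
action = taxonomy_generate_action(inner_query, 123)
print(action)
```

The printed string matches the encoded action above.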

Write a Taxonomy to Disk

Rather than adding a taxonomy to an existing category hierarchy, you can write it to disk as a directory structure. To do this, set the WriteTaxonomy parameter to True in the TaxonomyGenerate action. IDOL creates the taxonomy directories in the Category/TAXONOMY/HIERDOCS subfolder of your IDOL installation.

You can add the training documents to the taxonomy directories by using the DownloadDocAction parameter. See Taxonomy Configuration.

Improve Taxonomies

A generated taxonomy is only as good as the documents used to train the category at the start of the process. If the results are unacceptable or strange, examine the training documents.

Documents that contain HTML fragments or nonsense content will likely result in a poor taxonomy.

Use the parameters in Taxonomy Configuration to improve your results. You can get better results if you have some idea of what you want your finalized taxonomy to look like (that is, how deep it must be, how many concepts to include, and so on).

