Binary categories are designed to answer a particular question with two possible answers (usually yes or no).
You define the question by training the category with positive and negative training. Positive training defines content that you want to match (the yes answer), and negative training defines content that you do not want to include (the no answer).
Binary categories are most useful when you can identify examples of what you want to match and what you do not want to match. Normal categories, by contrast, match documents that are similar to their training, but they give you no way to define content that you do not want to match.
You train a binary category with documents or text for both the positive case and the negative case.
For example, if you have a binary category that is designed to find spam documents:
The positive training is a set of spam documents.
The negative training is a set of documents that represent the kind of non-spam content you expect.
In binary categorization, you match a document against the binary category.
If the document matches the positive training, the document gives a positive match. If the document matches the negative training, it gives a negative match.
The categorization returns a confidence score that indicates how well the document matches the binary category.
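The following Python sketch illustrates the idea of positive and negative training and a confidence score. It is a toy model for illustration only, not IDOL's categorization algorithm or API: the class ToyBinaryCategory, its categorize method, the bag-of-words cosine similarity, and the confidence formula are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_vector(texts):
    """Sum the term counts of all training texts into one sparse vector."""
    total = Counter()
    for text in texts:
        total.update(tokenize(text))
    return total


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyBinaryCategory:
    """Hypothetical stand-in for a binary category (not an IDOL class)."""

    def __init__(self, positive, negative):
        # Positive training: content to match (the "yes" answer).
        self.positive = term_vector(positive)
        # Negative training: content not to include (the "no" answer).
        self.negative = term_vector(negative)

    def categorize(self, text):
        """Return (label, confidence) for one document."""
        doc = Counter(tokenize(text))
        pos = cosine(doc, self.positive)
        neg = cosine(doc, self.negative)
        if pos == 0.0 and neg == 0.0:
            # The document shares no terms with either training set.
            return ("no match", 0.0)
        label = "positive" if pos >= neg else "negative"
        # Assumed confidence: share of total similarity held by the winner.
        return (label, max(pos, neg) / (pos + neg))


spam = ToyBinaryCategory(
    positive=["win a free prize now", "free money click now"],
    negative=["meeting agenda for monday", "please review the attached report"],
)
spam_result = spam.categorize("click now to win free money")
work_result = spam.categorize("the monday meeting report")
```

In this sketch, a document that shares terms only with the spam training gives a positive match, and one that shares terms only with the non-spam training gives a negative match. A real categorization engine weights terms far more carefully than raw counts do.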
You can use binary categorization to tag your content, in the same way as normal categorization.
If you perform binary categorization before indexing, you can use the results to decide whether to index the content. For example, you might create a spam category and tag the spam documents that it finds. You can then prevent IDOL from indexing these documents, or route them to a human expert for checking.
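As a minimal sketch of that pre-indexing decision, the following Python function maps a categorization result to an action. The function name route_document, the label strings, the action names, and the 0.8 threshold are hypothetical choices for this example and are not part of IDOL; the policy of sending low-confidence spam matches to a reviewer is likewise an assumption.

```python
def route_document(label, confidence, review_threshold=0.8):
    """Decide what to do with a document before indexing.

    label: "positive" if the document matched the spam category.
    confidence: score between 0 and 1 returned by the categorization.
    Returns "skip", "review", or "index" (hypothetical action names).
    """
    if label == "positive":
        if confidence >= review_threshold:
            # Confident spam match: do not index the document.
            return "skip"
        # Uncertain spam match: send to a human expert for checking.
        return "review"
    # Anything that did not match the spam training is indexed as usual.
    return "index"


decisions = [
    route_document("positive", 0.95),
    route_document("positive", 0.60),
    route_document("negative", 0.90),
]
```

Separating the routing rule from the categorization step keeps the threshold easy to tune without retraining the category.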
You can use binary categories to find out whether a document gives a yes or no answer to a question. The following are examples of the kinds of questions that a binary category could answer:
Does this document adhere to company policy?
Is this document suitable for work purposes?
Is this e-mail a phishing attempt?
Is this document helpful?
Is this document critical?
Does this e-mail represent a threat?
Does this CV indicate a suitable candidate for a job opening?
Does this stock represent a good fit for our portfolio?
Is this legal document relevant for this court case?