This section describes some ways that you can improve the results from Automatic Query Guidance.
The QuerySummaryAdvanced parameter turns on the advanced algorithm required for Dynamic Clustering and Automatic Query Guidance. You can alternatively set the QuerySummaryPlus parameter to True. In this mode, IDOL Server uses an improved phrase selection algorithm, which is marginally slower.
HPE recommends that you turn on QuerySummaryPlus, unless the very small performance gain from leaving it off is critical.
The key to good AQG results is that they are generated from good results sets. For example, running QuerySummary on a set of only 10 documents rarely gives strong results. The more documents in the results set (that is, the higher your value for MaxResults), the better the results are. The only limit is your performance requirements.
For this reason, you usually perform AQG with two IDOL queries. The first gets the results that you actually want to display on the screen (for example, the top ten list in a user interface pane). The second query is the QuerySummary query, with a much higher MaxResults value and with Print set to NoResults to avoid returning the unnecessary document matches.
This method improves the response time by not returning results that are not useful.
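The two-query approach described above can be sketched as follows. This is a minimal illustration only: the host, port, URL layout, helper names, and the MaxResults value of 500 are assumptions, and the code only builds the request URLs without sending them.

```python
from urllib.parse import urlencode

# Hypothetical host and ACI port; replace with your own IDOL Server details.
IDOL_URL = "http://localhost:9000/"


def build_query_url(text, **params):
    """Build a Query action URL (the URL layout is an assumption)."""
    query = {"action": "Query", "Text": text}
    query.update(params)
    return IDOL_URL + urlencode(query)


def build_display_query(text):
    # First query: the short list of results shown to the user.
    return build_query_url(text, MaxResults=10)


def build_summary_query(text, max_results=500):
    # Second query: a much larger results set used only to generate AQG
    # elements. Print=NoResults avoids returning the document matches.
    return build_query_url(
        text,
        MaxResults=max_results,
        QuerySummary="True",
        QuerySummaryPlus="True",
        Print="NoResults",
    )


display_url = build_display_query("apollo")
summary_url = build_summary_query("apollo")
```

Keeping the two requests separate lets you tune the size of the AQG results set without changing what the user sees.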
You can control the number of elements that QuerySummary returns by using the QuerySummaryLength configuration parameter. You should set this parameter to a value that returns sufficient elements for your purposes. You rarely need a value of more than 100, because it is unlikely that there are more than 100 strong elements in a particular results set.
The number of useful elements varies with the query. To determine whether to use an element, you should look at information such as the occurrence counts or cluster ID (see Query Summary Response Format).
For example, you might use only elements that appear in more than five documents in the results set, or those that have a positive cluster ID.
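A filter of this kind could look like the sketch below. It assumes that you have already parsed the query summary response into dictionaries. The keys `text`, `doc_count`, and `cluster_id` are hypothetical names chosen for illustration and are not the names used in the actual response format. The sketch keeps an element if it meets either criterion, which is one possible reading of the guidance above.

```python
def select_elements(elements, min_docs=5):
    """Keep elements that occur in more than min_docs documents
    or that have a positive cluster ID."""
    return [
        element
        for element in elements
        if element["doc_count"] > min_docs or element["cluster_id"] > 0
    ]


# Hypothetical parsed elements, for illustration only.
elements = [
    {"text": "moon landing", "doc_count": 12, "cluster_id": 0},
    {"text": "greek mythology", "doc_count": 3, "cluster_id": 2},
    {"text": "legal disclaimer", "doc_count": 2, "cluster_id": -1},
]

selected = select_elements(elements)
selected_texts = [element["text"] for element in selected]
```

The thresholds are starting points. The number of useful elements varies with the query, so you might need to adjust them for your data.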
IDOL Server generates elements by analyzing the most important terms in the results set. By default, it uses the top 50 terms in the set, which is not enough for QuerySummaryAdvanced.
You can increase the value of the QuerySummaryTerms configuration parameter to increase the quality of the elements. In general, use a value of 1000 or lower. Higher values do not improve the quality any further, and might eventually reduce performance. HPE recommends that you test with different values to determine the value that gives you the best balance of performance and quality for your environment.
By default, IDOL uses the SourceType fields to generate the AQG results, as well as the TitleType fields. If you have not configured any SourceType fields, then it uses all Index fields, which might include fields that are not suitable for this type of analysis. HPE recommends that you set SourceType fields to only those containing clean natural language text.
To prevent extremely long documents from causing slow-performing queries, AQG uses only the first 6000 characters from each document. You can modify this value by using the QuerySummaryMaxDocLength configuration parameter. However, HPE recommends that you change this setting only in advanced cases.
By default, AQG generation does not use numeric terms. You can change this behavior by using the QuerySummaryNumbers configuration parameter.
You might get unwanted AQG elements if the documents that you use to generate AQG results contain boilerplate text, such as a disclaimer or e-mail signature. You can prevent these elements by having IDOL automatically detect very common phrases in the index when it starts and add them to a stop phrase list, which you can manually edit if required. To do so, use the QuerySummaryStopPhraseMode configuration parameter.
Similarly, you can provide a list of phrases to favor when choosing AQG elements, by using the QuerySummaryWhiteListMode configuration parameter.
For example, when creating AQG results from financial documents, you might provide a list of financial terms and phrases that you want to give higher weighting.
You can modify the stop phrase list and the white list by using the DREQUERYSUMMARYMANAGEMENT index action. You can also view the phrases on these lists by using the QuerySummaryManagement action. For more information, refer to the IDOL Server Reference.
Other parameters can affect the quality of your terms, because they can affect the results set from which IDOL Server generates the elements.
Setting Combine to Simple might improve quality if many sections of the same few documents dominate your results set.
Furthermore, if you expand a term (for example, Apollo), the top 100 documents of a query might all be about the moon landings, with the Greek mythology results too far down the list to be reflected in the elements. In this case, setting Sort to Random might improve results by giving a more representative sample.
If you set Sort to Random for a multiple term query, you might need to also set the MinLinks, MinScore, or MatchAllTerms parameters to prevent poor matches from dominating the results. For more information, refer to the IDOL Server Reference.