ProperNames

The ProperNames configuration parameter controls whether IDOL Server creates additional terms from pairs of consecutive words in index fields. The aim is to increase the relevance of results: when a query contains a pair of associated words, documents in which those words are also paired can match on the combined term.

For example, when you search for "George Washington", you want documents in which those words appear consecutively to score higher than a document that contains the text "I saw George Bush speak in Washington, D. C."
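The effect can be sketched in a few lines of Python. This is only an illustration of the idea, not IDOL Server's tokenizer or scoring: the function name terms_with_pairs, the punctuation stripping, and the use of a plain overlap count as a stand-in for relevance are all assumptions made for the example.

```python
def terms_with_pairs(text):
    """Return single-word terms plus one combined term per consecutive word pair."""
    # Hypothetical normalization: strip basic punctuation and uppercase.
    words = [w.strip(".,").upper() for w in text.split()]
    words = [w for w in words if w]
    pairs = [first + second for first, second in zip(words, words[1:])]
    return set(words) | set(pairs)


query = terms_with_pairs("George Washington")
doc_consecutive = terms_with_pairs("George Washington was the first president")
doc_separated = terms_with_pairs("I saw George Bush speak in Washington, D. C.")

# Count shared terms as a crude stand-in for a relevance score.
overlap_consecutive = len(query & doc_consecutive)
overlap_separated = len(query & doc_separated)

print(overlap_consecutive, overlap_separated)
```

Both documents share the single terms GEORGE and WASHINGTON with the query, but only the first also shares the combined term GEORGEWASHINGTON, so it overlaps on one more term.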

NOTE:

In most cases, HPE does not recommend using ProperNames. It can increase the number of terms and the size of the index considerably, whilst achieving only marginal gains in most queries.

Advanced Search

When you turn on AdvancedSearch (or AdvancedPlus and AdvancedCaseSearch), IDOL Server implicitly uses WNEAR as the default query operator. This method ensures that a search for George Washington matches documents that contain those words consecutively with a higher score than documents in which they occur further apart.

HPE recommends that you use AdvancedSearch to achieve this functionality, rather than use ProperNames.

Other Uses

You can use ProperNames to match stop words in some situations, such as when they occur as part of a capitalized phrase. For example, you might want a query for "The Queen" to weight a document that contains those exact words higher than one that contains only "Queen" (or indeed "the queen"), even though "the" is configured as a stop word.

With the appropriate setting (for example, ProperNames=7), IDOL Server indexes a term for THEQUEEN to allow this. The same is true for pairs of stop words (for example, The Who or Take That).

ProperNames can also help in situations such as plagiarism or near-duplicate detection, where you want to match documents that contain a significant amount of the same text, rather than documents that are conceptually similar. In fact, setting IDOL to index only proper name terms optimizes this process.
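As a rough illustration of why word-pair terms suit near-duplicate detection, the following Python sketch compares documents by the word pairs they share. It is not how IDOL Server compares documents: the helper names pair_terms and jaccard, the sample sentences, and the choice of Jaccard similarity are assumptions made for the example.

```python
def pair_terms(text):
    """Return only the combined terms built from consecutive word pairs."""
    words = [w.strip(".,;:!?\"'").upper() for w in text.split()]
    words = [w for w in words if w]
    return {first + second for first, second in zip(words, words[1:])}


def jaccard(a, b):
    """Jaccard similarity of two term sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


original = pair_terms("The quick brown fox jumps over the lazy dog")
copied = pair_terms("The quick brown fox jumps over the sleepy dog")
related = pair_terms("A lazy dog was jumped over by a fast brown fox")

copied_score = jaccard(original, copied)
related_score = jaccard(original, related)

print(copied_score, related_score)
```

The lightly edited copy keeps most of the original word pairs and scores high, while the reworded sentence shares many single words but few pairs and scores low.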

Configuration Options

You can set the ProperNames configuration parameter in each language configuration section to one of the following values:

Value  Tokenization of "And The Cats Dogs ran away"
0      CAT DOG RAN AWAY
1      CAT CATSDOG DOG RAN AWAY
2      CAT CATSDOG DOG DOGSRAN RAN RANAWAY AWAY
3      ANDTH CAT CATSDOG DOG RAN AWAY
4      ANDTH THECAT CAT CATSDOG DOG RAN AWAY
5      ANDTHE CAT CATSDOGS DOG RAN AWAY
6      ANDTHE THECATS CAT CATSDOGS DOG RAN AWAY
7      ANDTHE THECATS CAT DOG RAN AWAY

Bitwise Configuration Values

For specialized usage, you can also set the ProperNames parameter by using bitwise values. You can combine any of the following bits by adding their values together.

Bit  Short name  Description
8    stem        Stem any ProperNames term.
16   case        Return only capitalized ProperNames terms.
32   neither     Return ProperNames terms if neither is a stop word.
64   one         Return ProperNames terms if exactly one is a stop word.
128  both        Return ProperNames terms if both are stop words.
256  only        Only return ProperNames terms.
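Because each option is a separate bit, a combined value can be built by adding (or OR-ing) the bits and decoded by testing each bit in turn. The following Python sketch shows the arithmetic only. The class ProperNamesBit and the helpers combine and short_names are hypothetical names for this example and are not part of any IDOL API.

```python
from enum import IntFlag


class ProperNamesBit(IntFlag):
    """Bit values and short names as listed in the table above."""
    STEM = 8
    CASE = 16
    NEITHER = 32
    ONE = 64
    BOTH = 128
    ONLY = 256


def combine(*bits):
    """Combine individual bits into a single ProperNames value."""
    value = 0
    for bit in bits:
        value |= bit
    return int(value)


def short_names(value):
    """List the short names of the bits that are set in a ProperNames value."""
    return [bit.name.lower() for bit in ProperNamesBit if value & bit]


stemmed_capitalized = combine(
    ProperNamesBit.STEM, ProperNamesBit.CASE, ProperNamesBit.NEITHER
)
print(stemmed_capitalized)   # 8 + 16 + 32
print(short_names(208))      # bits set in 16 + 64 + 128
```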

The standard configurable values of ProperNames therefore have the following bitwise equivalents:

Value  Bitwise equivalent  Sum
0      0                   0
1      56                  8+16+32
2      40                  8+32
3      184                 8+16+32+128
4      248                 8+16+32+64+128
5      176                 16+32+128
6      240                 16+32+64+128
7      208                 16+64+128
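To see how the bits account for the tokenization examples given earlier for "And The Cats Dogs ran away", the following Python sketch applies the bit rules to that sentence. It is a toy model, not IDOL Server's implementation. It assumes that AND and THE are the only stop words, that single-word terms are always stemmed, that the case bit requires both words of a pair to be capitalized, and it uses a deliberately crude stand-in stemmer (toy_stem) that is just enough for this sentence. All function names are hypothetical.

```python
STEM, CASE, NEITHER, ONE, BOTH, ONLY = 8, 16, 32, 64, 128, 256

# Standard values and their bitwise equivalents, from the table above.
STANDARD = {0: 0, 1: 56, 2: 40, 3: 184, 4: 248, 5: 176, 6: 240, 7: 208}

STOP_WORDS = {"AND", "THE"}  # assumed stop list for this example


def toy_stem(term):
    """Crude stand-in for a real stemmer: drop one trailing S or E."""
    if len(term) > 3 and term[-1] in "SE":
        return term[:-1]
    return term


def pair_allowed(first, second, mask):
    """Decide whether a consecutive word pair produces a combined term."""
    if mask & CASE and not (first[0].isupper() and second[0].isupper()):
        return False
    stops = sum(word.upper() in STOP_WORDS for word in (first, second))
    required_bit = (NEITHER, ONE, BOTH)[stops]
    return bool(mask & required_bit)


def tokenize(text, mask):
    """Return single-word and combined terms for the given bitmask."""
    words = text.split()
    terms = []
    for i, word in enumerate(words):
        upper = word.upper()
        # Single-word terms are skipped for stop words and when 'only' is set.
        if not (mask & ONLY) and upper not in STOP_WORDS:
            terms.append(toy_stem(upper))
        if i + 1 < len(words) and pair_allowed(word, words[i + 1], mask):
            pair = upper + words[i + 1].upper()
            terms.append(toy_stem(pair) if mask & STEM else pair)
    return terms


sentence = "And The Cats Dogs ran away"
for value, mask in STANDARD.items():
    print(value, mask, " ".join(tokenize(sentence, mask)))
```

Under these assumptions the output matches the tokenization listed for each standard value. Adding the only bit (for example, 256+32) leaves just the combined terms, which corresponds to the near-duplicate detection use described earlier.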
