Boolean and Proximity Search

You can use the Query action to submit standard Boolean queries to IDOL Server. You can also submit proximity queries, which either restrict results to documents in which the specified terms appear close together or give those documents a higher weighting.

IDOL Server uses APCM (Adaptive Probabilistic Concept Modeling) to rank the results.

NOTE: You must specify all search operators in capital letters.

Boolean Operators

You can apply the following operators to words, exact phrases, or other Boolean expressions in your query.

AND

Binary operator. Both terms must match in every document that returns. For example:

action=Query&Text=cat AND dog

This query returns only documents that contain both cat and dog.
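The examples in this section separate the words in the Text value with either spaces or +, which is the URL-encoded form of a space. When you build a request in code, URL-encode the Text value so that quotation marks, brackets, and colons reach the server intact. The following is a minimal Python sketch that uses only the standard library function `urllib.parse.urlencode`. The host name and port in `IDOL_BASE_URL` are hypothetical placeholders, and the sketch only builds the URL. It does not send the request.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: replace with the host and ACI port of your IDOL Server.
IDOL_BASE_URL = "http://idol.example.com:9000/"

def build_query_url(text):
    """Return a Query action URL with the Text parameter URL-encoded."""
    # urlencode() uses quote_plus(), so spaces become '+' and characters
    # such as '"', '(', ')', and ':' are percent-encoded.
    params = {"action": "Query", "Text": text}
    return IDOL_BASE_URL + "?" + urlencode(params)

print(build_query_url("cat AND dog"))
# http://idol.example.com:9000/?action=Query&Text=cat+AND+dog
print(build_query_url('city NOT ("New York")'))
# http://idol.example.com:9000/?action=Query&Text=city+NOT+%28%22New+York%22%29
```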

NOT

Unary operator. Excludes the term that follows NOT from all the returned documents. For example:

action=Query&Text=cat NOT dog

This query returns only documents that contain cat but not dog.

NOTE: To use NOT to exclude multiple terms, you must use brackets. Otherwise, NOT applies only to the term that immediately follows it. To use NOT to exclude a phrase, you must put the phrase in quotation marks. For example:

Document 1: I went to the city for the New Year

Document 2: I went to New York City for the New Year

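The following query does not match either of the above documents. Inside the brackets, the default OR operator combines New and York, so the query excludes every document that contains New or York. Document 1 contains New (in New Year):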

action=Query&Text=city NOT (New York)

The following queries match the first document but not the second:

action=Query&Text=city NOT ("New York")
action=Query&Text=city NOT "New York"
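You can check the difference between the bracketed and quoted forms above with a toy in-memory model. This Python sketch does not reproduce how IDOL Server matches documents, because it ignores stemming, stop words, and relevance. The helper names `tokens`, `has_phrase`, and `evaluate` are hypothetical. The sketch applies only two rules: inside the brackets, the default OR combines New and York, and the quotation marks turn New York into a single phrase.

```python
import re

def tokens(text):
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_phrase(text, phrase):
    """True if the words of phrase occur consecutively in text."""
    words, toks = tokens(phrase), tokens(text)
    n = len(words)
    return any(toks[i:i + n] == words for i in range(len(toks) - n + 1))

def evaluate(text):
    """Return (match for city NOT (New York), match for city NOT "New York")."""
    toks = tokens(text)
    has_city = "city" in toks
    # Brackets: NOT excludes (New OR York), so either word excludes the document.
    bracketed = has_city and not ("new" in toks or "york" in toks)
    # Quotation marks: NOT excludes only the exact phrase "new york".
    quoted = has_city and not has_phrase(text, "New York")
    return bracketed, quoted

documents = {
    "Document 1": "I went to the city for the New Year",
    "Document 2": "I went to New York City for the New Year",
}
for name, text in documents.items():
    print(name, evaluate(text))
# Document 1 (False, True)
# Document 2 (False, False)
```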

OR

Binary operator. One or both terms must appear for the document to return. This option is the default behavior if you do not specify an operator between two terms. For example:

action=Query&Text=cat OR dog

This query returns documents that contain cat, dog, or both terms.

EOR or XOR

Binary operator. Logical exclusive OR. A document must contain only one of the terms for it to return. This operator is rarely used. For example:

action=Query&Text=cat XOR dog

This query returns only documents that contain either the term cat or the term dog. Documents that contain both cat and dog do not return.
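To summarize the four operators above, the following Python sketch describes a document only by whether it contains cat and whether it contains dog, and lists which of the four possible documents each query returns. The `OPERATORS` table restates the rules in this section. It is a hypothetical illustration, not IDOL Server code.

```python
from itertools import product

# Each query as a function of (contains_cat, contains_dog).
OPERATORS = {
    "cat AND dog": lambda cat, dog: cat and dog,
    "cat NOT dog": lambda cat, dog: cat and not dog,
    "cat OR dog": lambda cat, dog: cat or dog,
    "cat XOR dog": lambda cat, dog: cat != dog,
}

def returned(query):
    """List the (contains_cat, contains_dog) combinations that the query returns."""
    return [combo for combo in product([False, True], repeat=2)
            if OPERATORS[query](*combo)]

for query in OPERATORS:
    print(query, returned(query))
```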

()

Bracketed expressions. IDOL Server evaluates brackets from left to right. You can nest bracketed expressions. Brackets dictate the precedence and behavior of combined operator statements. For example:

action=Query&Text=(fish EOR pie) AND (chips EOR mash)

This query returns documents that contain one of the following combinations:

fish and chips

fish and mash

pie and chips

pie and mash
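The following Python sketch enumerates every combination of the four terms and keeps the combinations that `(fish EOR pie) AND (chips EOR mash)` matches. It treats EOR as "exactly one of the two terms". This is a toy model with hypothetical helper names. It also shows that a document that contains fish, pie, and chips does not return, because the first bracket then contains both of its terms.

```python
from itertools import combinations

TERMS = ["fish", "pie", "chips", "mash"]

def matches(doc_terms):
    """Toy evaluation of (fish EOR pie) AND (chips EOR mash) on a set of terms."""
    first = ("fish" in doc_terms) != ("pie" in doc_terms)     # fish EOR pie
    second = ("chips" in doc_terms) != ("mash" in doc_terms)  # chips EOR mash
    return first and second

def matching_combinations():
    """Every subset of TERMS, of any size, that the query matches."""
    return [combo
            for size in range(len(TERMS) + 1)
            for combo in combinations(TERMS, size)
            if matches(set(combo))]

print(matching_combinations())
# [('fish', 'chips'), ('fish', 'mash'), ('pie', 'chips'), ('pie', 'mash')]
```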

Proximity Search Operators

You can apply the following operators to words, exact phrases, or other Boolean expressions to run a proximity search.

NOTE: If the two specified words are adjacent to each other, their proximity is 1. If one word separates them, their proximity is 2, and so on.

By default, proximity calculations ignore stop words. For example, "and" is a stop word, so the terms cat and dog have a proximity of 1 in both the text "cat dog" and the text "cat and dog". However, if you set the StopWordIndex configuration parameter to index stop words, IDOL Content Component includes stop words in the proximity calculation.
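The following is a minimal Python sketch of the proximity rule above, under stated assumptions. `STOP_WORDS` is a hypothetical stop list, and the `index_stop_words` flag is a hypothetical stand-in for the effect of the StopWordIndex setting. Proximity is the smallest difference between the word positions of the two terms, so adjacent terms have a proximity of 1.

```python
import re

# Hypothetical stop list for illustration; the server uses its own configured list.
STOP_WORDS = {"a", "and", "of", "the", "to"}

def word_positions(text, index_stop_words=False):
    """Map each word to its positions, skipping stop words unless they are indexed."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not index_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    positions = {}
    for pos, word in enumerate(words):
        positions.setdefault(word, []).append(pos)
    return positions

def proximity(text, term1, term2, index_stop_words=False):
    """Smallest position difference between the two terms, or None if either is missing."""
    positions = word_positions(text, index_stop_words)
    p1, p2 = positions.get(term1, []), positions.get(term2, [])
    if not p1 or not p2:
        return None
    return min(abs(i - j) for i in p1 for j in p2)

print(proximity("cat dog", "cat", "dog"))                             # 1
print(proximity("cat and dog", "cat", "dog"))                         # 1
print(proximity("cat and dog", "cat", "dog", index_stop_words=True))  # 2
```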

NOTE: Proximity operators work recursively, so in nested Boolean queries you can apply them to bracketed expressions or phrases. For example, in the expression

(term1) NEAR10 ((term2) DNEAR2 (term3))

the NEAR10 operator ensures that term1 occurs within 10 words of an occurrence of term2 that is followed by term3 within two words.

NEARN

Returns only documents in which the second term is within N words of the first term; that is, the terms are N or fewer words apart. If you do not specify N, NEAR defaults to 5. For example:

action=Query&Text=red NEAR1 green

This query returns only documents in which the term red is adjacent to the term green. Documents that contain red green or green red return. Documents that contain red orange green do not return (the terms are not close enough to each other).

DNEARN

Directed NEAR. Returns only documents in which the second term is within N words of the first term, in the specified order. If you do not specify N, DNEAR defaults to 5. For example:

action=Query&Text=red DNEAR2 green

This query returns only documents in which the term green follows the term red, and is no more than two words away from the term red. Documents that contain red orange green return. Documents that contain green orange red or red orange blue green do not return.
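You can express the NEAR and DNEAR rules in terms of word positions. This Python sketch uses naive lowercase tokenization and hypothetical function names. It checks the examples above but is not IDOL Server's implementation. NEARN accepts any pair of occurrences that are at most N positions apart. DNEARN also requires the second term to come after the first.

```python
import re

def positions(text, term):
    """Word positions of term in text (lowercased, alphanumeric tokens)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [i for i, w in enumerate(words) if w == term]

def near(text, term1, term2, n=5):
    """Toy NEARn: the terms are n or fewer words apart, in either order."""
    return any(abs(j - i) <= n
               for i in positions(text, term1) for j in positions(text, term2))

def dnear(text, term1, term2, n=5):
    """Toy DNEARn: term2 follows term1 and is n or fewer words after it."""
    return any(0 < j - i <= n
               for i in positions(text, term1) for j in positions(text, term2))

print(near("green red", "red", "green", 1))              # True
print(dnear("red orange green", "red", "green", 2))      # True
print(dnear("red orange blue green", "red", "green", 2)) # False
```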

WNEARN

Weighted NEAR (with OR operation). Proximity operator that returns documents that contain either of the two terms. It promotes relevance when the terms are N or fewer words apart (closer together implies higher relevance). If you do not specify N, WNEAR defaults to 5. For example:

action=Query&Text=red WNEAR7 green

This query returns documents that contain the term red or the term green. It gives extra relevance to documents in which red and green appear seven or fewer words apart in a piece of text. This weight increases as the terms get closer to each other. Documents in which the terms occur more than seven words apart, or in which only one term occurs, return with the normal relevance.

YNEARN

Weighted NEAR (with AND operation). Proximity operator that returns documents that contain both of the terms. It promotes relevance when the terms are N or fewer words apart (closer together implies higher relevance). If you do not specify N, YNEAR defaults to 5. For example:

action=Query&Text=red YNEAR7 green

This query returns only documents that contain both the term red and the term green. Extra relevance is given to documents in which red and green appear seven or fewer words apart in a piece of text. This weight increases as the terms get closer to each other. Documents in which the terms occur more than seven words apart return with the normal relevance.
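This section does not describe the exact boost that IDOL Server applies for WNEAR and YNEAR. The following Python sketch therefore uses an invented linear boost, purely to illustrate the behavior described above: which documents return, and that extra relevance applies only when the terms are N or fewer words apart and increases as they get closer. `BASE`, the boost formula, and `toy_score` are hypothetical and are unrelated to real APCM scores.

```python
import re

# Hypothetical stand-in for "normal relevance"; real scores come from APCM.
BASE = 1.0

def min_distance(words, term1, term2):
    """Smallest gap between an occurrence of term1 and one of term2, or None."""
    p1 = [i for i, w in enumerate(words) if w == term1]
    p2 = [i for i, w in enumerate(words) if w == term2]
    if not p1 or not p2:
        return None
    return min(abs(i - j) for i in p1 for j in p2)

def toy_score(text, term1, term2, n=5, require_both=False):
    """Return None if the document does not return, otherwise a toy relevance.

    require_both=False mimics WNEARn (either term is enough);
    require_both=True mimics YNEARn (both terms are required).
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    has1, has2 = term1 in words, term2 in words
    if not (has1 or has2) or (require_both and not (has1 and has2)):
        return None
    d = min_distance(words, term1, term2)
    if d is None or d > n:
        return BASE
    # Invented linear boost: largest when adjacent (d == 1), smallest at d == n.
    return BASE + (n - d + 1) / n

print(toy_score("red green", "red", "green", n=7))                         # 2.0
print(toy_score("only red here", "red", "green", n=7))                     # 1.0
print(toy_score("only red here", "red", "green", n=7, require_both=True))  # None
```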

BEFORE

Returns only documents in which the first term precedes the second one. For example:

action=Query&Text=red BEFORE green

This query returns only documents in which the term green appears later than the term red.

NOTE: For a FieldText query with the BEFORE operator to successfully compare two occurrences of the same field, you must set the XMLFullStructure configuration parameter to True in the IDOL Server configuration file [Server] section.

AFTER

Returns only documents in which the first term appears later than the second one. For example:

action=Query&Text=red AFTER green

This query returns only documents in which the term red appears later than the term green.

NOTE: For a FieldText query with the AFTER operator to successfully compare two occurrences of the same field, you must set the XMLFullStructure configuration parameter to True in the IDOL Server configuration file [Server] section.
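A toy Python model of BEFORE and AFTER follows. The text above does not say how repeated terms are handled, so this sketch makes a hypothetical assumption: one occurrence of each term in the right order is enough. It also treats AFTER as BEFORE with the terms swapped. The function names are hypothetical.

```python
import re

def positions(text, term):
    """Word positions of term in text (lowercased, alphanumeric tokens)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [i for i, w in enumerate(words) if w == term]

def before(text, term1, term2):
    """Toy BEFORE: some occurrence of term1 is earlier than some occurrence of term2."""
    p1, p2 = positions(text, term1), positions(text, term2)
    return bool(p1) and bool(p2) and min(p1) < max(p2)

def after(text, term1, term2):
    """Toy AFTER: term1 appears later than term2, that is, term2 BEFORE term1."""
    return before(text, term2, term1)

print(before("red apples and green pears", "red", "green"))  # True
print(after("red apples and green pears", "red", "green"))   # False
```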

XNEARN

Returns only documents in which the second term follows the first term and is exactly N words from it. For example:

action=Query&Text=cats+XNEAR2+dogs

This query returns only documents in which the term dogs follows the term cats and is exactly two words away from the term cats. Documents that contain, for example, cats scare dogs return, while documents that contain dogs eat cats or cats, dogs do not return.

This operator is available only for the Text action parameter.
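The following Python sketch reads XNEARN the way the example above describes it: the second term must follow the first term at exactly N word positions. It is a toy with naive tokenization, so the comma in cats, dogs is simply dropped. The function name is hypothetical.

```python
import re

def xnear(text, term1, term2, n):
    """Toy XNEARn: term2 occurs exactly n word positions after term1."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    p1 = [i for i, w in enumerate(words) if w == term1]
    p2 = {i for i, w in enumerate(words) if w == term2}
    return any(i + n in p2 for i in p1)

print(xnear("cats scare dogs", "cats", "dogs", 2))  # True
print(xnear("dogs eat cats", "cats", "dogs", 2))    # False
print(xnear("cats, dogs", "cats", "dogs", 2))       # False
```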

SENTENCE

Returns only documents in which the second term is in the same sentence as the first term. For example:

action=Query&Text=cats+SENTENCE+dogs

This query returns only documents in which the term dogs occurs in the same sentence as the word cats.

IDOL Content Component uses several criteria to break a document into sentences. The most important criterion is the detection of an end-of-sentence marker: a period (.), question mark (?), or exclamation point (!), or one of their multibyte variants. However, one of these characters alone is not always enough to mark the end of a sentence, because these characters also appear in abbreviations, names, and other items. To locate sentence boundaries more accurately, IDOL Content Component also uses characteristics such as capitalization and syntactic observations.
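This section does not list the exact sentence-boundary rules that IDOL Content Component uses. The following Python sketch only illustrates the idea in the paragraph above. It treats ., ?, or ! followed by whitespace (or the end of the text) as a candidate boundary. It then rejects candidates that follow a word in a hypothetical `ABBREVIATIONS` list or that come before a lowercase letter. The `split_sentences` and `same_sentence` names are hypothetical.

```python
import re

# Hypothetical abbreviation list for illustration only.
ABBREVIATIONS = {"dr", "mr", "mrs", "e.g", "i.e", "etc"}

def split_sentences(text):
    """Split text at ., ?, or ! unless the mark looks like part of an abbreviation."""
    sentences, start = [], 0
    for mark in re.finditer(r"[.?!]+(?=\s|$)", text):
        preceding = text[start:mark.start()].split()
        last_word = preceding[-1].lower() if preceding else ""
        following = text[mark.end():].lstrip()
        # Not a boundary after a known abbreviation, or before a lowercase word.
        if last_word in ABBREVIATIONS or (following and not following[0].isupper()):
            continue
        sentences.append(text[start:mark.end()].strip())
        start = mark.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences

def same_sentence(text, term1, term2):
    """Toy SENTENCE operator: both terms occur in one sentence."""
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        if term1 in words and term2 in words:
            return True
    return False

print(split_sentences("Dr. Smith owns cats. Her neighbour owns dogs."))
# ['Dr. Smith owns cats.', 'Her neighbour owns dogs.']
```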

SENTENCENN

Returns only documents in which the second term is in the same sentence as the first term, and they are within NN words of each other. For example:

action=Query&Text=cats+SENTENCE10+dogs

This query returns only documents in which the term dogs occurs in the same sentence as, and within 10 words of, the word cats.

NOTE: SENTENCE0 has the same behavior as SENTENCE.

DSENTENCE

Returns only documents in which the second term occurs later than the first term, in the same sentence. For example:

action=Query&Text=cats+DSENTENCE+dogs

This query returns only documents in which the term dogs occurs after the term cats in the same sentence.

DSENTENCENN

Returns only documents in which the second term occurs later than the first term, and within NN words in the same sentence. For example:

action=Query&Text=cats+DSENTENCE10+dogs

This query returns only documents in which the term dogs occurs after the term cats in the same sentence, and within 10 words of it.

NOTE: DSENTENCE0 has the same behavior as DSENTENCE.
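The four sentence operators differ only in whether they limit the distance between the terms and whether they require an order. This Python sketch models all four with one hypothetical function, `sentence_match`, on top of a deliberately naive splitter that breaks after ., ?, or ! followed by whitespace. Setting `n=0` means no distance limit, which matches the notes that SENTENCE0 and DSENTENCE0 behave like SENTENCE and DSENTENCE.

```python
import re

def sentences(text):
    """Deliberately naive splitter: break after ., ?, or ! followed by whitespace."""
    return [s for s in re.split(r"(?<=[.?!])\s+", text.strip()) if s]

def sentence_match(text, term1, term2, n=0, directed=False):
    """Toy SENTENCE, SENTENCEnn, DSENTENCE, and DSENTENCEnn check.

    n=0 means no distance limit; directed=True requires term2 after term1.
    """
    for sentence in sentences(text):
        words = re.findall(r"[a-z0-9]+", sentence.lower())
        p1 = [i for i, w in enumerate(words) if w == term1]
        p2 = [j for j, w in enumerate(words) if w == term2]
        for i in p1:
            for j in p2:
                gap = j - i if directed else abs(j - i)
                if gap > 0 and (n == 0 or gap <= n):
                    return True
    return False

text = "Cats chase dogs. Birds ignore them."
print(sentence_match(text, "cats", "dogs"))                 # True  (SENTENCE)
print(sentence_match(text, "dogs", "cats", directed=True))  # False (dogs DSENTENCE cats)
```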

PARAGRAPH

Returns only documents in which the second term is in the same paragraph as the first term. For example:

action=Query&Text=red+PARAGRAPH+green

This query returns only documents in which the term green occurs in the same paragraph as the word red. The words do not have to occur in the same sentence in the paragraph.

PARAGRAPHNN

Returns only documents in which the second term is in the same paragraph as the first term, and they are within NN words of each other. For example:

action=Query&Text=cats+PARAGRAPH20+dogs

This query returns only documents in which the term dogs occurs in the same paragraph as, and within 20 words of, the word cats.

NOTE: PARAGRAPH0 has the same behavior as PARAGRAPH.
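A toy Python model of PARAGRAPH and PARAGRAPHNN follows. This section does not describe how IDOL Content Component detects paragraphs, so the sketch makes a hypothetical assumption that blank lines separate paragraphs. As with PARAGRAPH0, `n=0` means no distance limit. The function names are hypothetical.

```python
import re

def paragraphs(text):
    """Toy paragraph splitter: paragraphs are separated by one or more blank lines."""
    return [p for p in re.split(r"\n\s*\n", text) if p.strip()]

def paragraph_match(text, term1, term2, n=0):
    """Toy PARAGRAPH / PARAGRAPHnn: both terms in one paragraph, optionally within n words."""
    for para in paragraphs(text):
        words = re.findall(r"[a-z0-9]+", para.lower())
        p1 = [i for i, w in enumerate(words) if w == term1]
        p2 = [j for j, w in enumerate(words) if w == term2]
        if any(n == 0 or abs(i - j) <= n for i in p1 for j in p2):
            return True
    return False

text = "The red door is shut.\nThe green door is open.\n\nNothing else."
print(paragraph_match(text, "red", "green"))       # True: different sentences, same paragraph
print(paragraph_match(text, "red", "green", n=4))  # False: the terms are 5 words apart
```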

Other Search Operators

If you set XMLFullStructure to True in the [Server] section of the IDOL Server configuration file, you can use the following operators in Text and FieldText queries to return XML documents in which fields or attributes occur together, or do not occur together.

WHEN

Returns only XML documents in which two fields that have the same parent field contain specified terms or phrases.

WHENN

Returns only XML documents in which two fields that have the same parent field at N levels from the root level contain specified terms or phrases.

NOTWHEN

Returns only XML documents in which the first field and value pair occurs, but the second field and value pair does not occur in the same parent field.

Examples

Consider the following three XML documents:

Document 1:

<DOC>
   <car>
      <make>audi</make>
      <color>red</color>
   </car>
   <car>
      <make>mercedes</make>
      <color>silver</color>
   </car>
</DOC>

Document 2:

<DOC>
   <car>
      <make>audi</make>
      <color>silver</color>
   </car>
   <car>
      <make>mercedes</make>
      <color>red</color>
   </car>
</DOC>

Document 3:

<DOC>
   <car>
      <make>audi</make>
      <body> 
         <color>red</color>
      </body>
   </car>
   <car>  
      <make>mercedes</make>
      <body> 
         <color>silver</color>
      </body>
   </car>
</DOC>

  • The following query returns only XML documents in which the make and color fields are direct children of the same parent field, and contain the values audi (in the make field) and red (in the color field).

    action=Query&Text=audi:make+WHEN+red:color

    This query returns Document 1, but not Document 2 or Document 3.

  • The following query returns only XML documents in which the make field contains the value audi, and the color field contains the value red, within a parent field that is two levels from the root field.

    action=Query&Text=audi:make+WHEN2+red:color

    This query returns Document 1 and Document 3, but not Document 2.

  • The following query returns only XML documents in which the make field contains the value audi, under a parent field that does not also have a color field with the value red.

    action=Query&Text=audi:make+NOTWHEN+red:color

    This query returns Document 2 and Document 3, but not Document 1.
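To make the three operators concrete, the following Python sketch evaluates the example queries against the three documents with the standard `xml.etree.ElementTree` module. It is a toy model, not IDOL Server's implementation. It compares whole field text for equality instead of matching terms. For WHENN, it counts the root element (DOC) as level 1, because that reading reproduces the results stated above. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC1 = """<DOC><car><make>audi</make><color>red</color></car>
<car><make>mercedes</make><color>silver</color></car></DOC>"""
DOC2 = """<DOC><car><make>audi</make><color>silver</color></car>
<car><make>mercedes</make><color>red</color></car></DOC>"""
DOC3 = """<DOC><car><make>audi</make><body><color>red</color></body></car>
<car><make>mercedes</make><body><color>silver</color></body></car></DOC>"""

def has_child(parent, tag, value):
    """True if parent has a direct child field named tag whose text is value."""
    return any(c.tag == tag and (c.text or "").strip() == value for c in parent)

def has_descendant(parent, tag, value):
    """True if a field named tag with text value occurs anywhere below parent."""
    return any((e.text or "").strip() == value for e in parent.iter(tag) if e is not parent)

def when(root, field1, field2):
    """Toy WHEN: both fields are direct children of the same parent."""
    return any(has_child(p, *field1) and has_child(p, *field2) for p in root.iter())

def when_n(root, level, field1, field2):
    """Toy WHENn: both fields occur under one field at the given level (root = 1)."""
    fields = [root]
    for _ in range(level - 1):
        fields = [child for field in fields for child in field]
    return any(has_descendant(p, *field1) and has_descendant(p, *field2) for p in fields)

def notwhen(root, field1, field2):
    """Toy NOTWHEN: field1 occurs under a parent that has no direct child field2."""
    return any(has_child(p, *field1) and not has_child(p, *field2) for p in root.iter())

docs = {"Document 1": ET.fromstring(DOC1),
        "Document 2": ET.fromstring(DOC2),
        "Document 3": ET.fromstring(DOC3)}
audi, red = ("make", "audi"), ("color", "red")
for name, root in docs.items():
    print(name, when(root, audi, red), when_n(root, 2, audi, red), notwhen(root, audi, red))
# Document 1 True True False
# Document 2 False False True
# Document 3 False True True
```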

Nested Expressions

You can use complex nested expressions with the WHEN and NOTWHEN operators in query text. For example:

action=Query&Text=(London:CITY WHEN English:LANG) WHEN ("United Kingdom":COUNTRY)
action=Query&Text=(Lincoln:CITY NOTWHEN Nebraska:STATE) NOTWHEN ("United Kingdom":COUNTRY)

Field Attributes

You can use WHEN and NOTWHEN to return only XML documents in which two attributes that occur in the same field contain (or do not contain) specified terms or phrases.

For example:

  • The following query returns only XML documents in which the LANG and CAPITAL attributes occur in the same field, and contain the values English (in the LANG attribute) and Cape Town (in the CAPITAL attribute).

    action=Query&Text=English:_ATTR_LANG WHEN "Cape Town":_ATTR_CAPITAL

    This query returns a document that contains the following field:

    <COUNTRY CAPITAL="Cape Town" LANG="English" POP="44">South Africa</COUNTRY>

    The following field does not match the query:

    <COUNTRY CAPITAL="Cape Town" LANG="Afrikaans" POP="10">South Africa</COUNTRY>

  • The following query returns only XML documents in which the LANG attribute contains the value English in a field where the CAPITAL attribute does not contain the value London.

    action=Query&Text=English:_ATTR_LANG NOTWHEN London:_ATTR_CAPITAL

    This query returns a document that contains the following field:

    <COUNTRY CAPITAL="Canberra" LANG="English" POP="44">Australia</COUNTRY>

    It does not match the following field:

    <COUNTRY CAPITAL="London" LANG="English" POP="44">United Kingdom</COUNTRY>
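The same idea applies to the attributes of a single field. In this hypothetical Python sketch, `xml.etree.ElementTree` parses each COUNTRY field, and WHEN and NOTWHEN become simple comparisons of attribute values. Exact string equality stands in for IDOL term matching, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

FIELDS = [
    '<COUNTRY CAPITAL="Cape Town" LANG="English" POP="44">South Africa</COUNTRY>',
    '<COUNTRY CAPITAL="Cape Town" LANG="Afrikaans" POP="10">South Africa</COUNTRY>',
    '<COUNTRY CAPITAL="Canberra" LANG="English" POP="44">Australia</COUNTRY>',
    '<COUNTRY CAPITAL="London" LANG="English" POP="44">United Kingdom</COUNTRY>',
]

def attr_when(field, attr1, value1, attr2, value2):
    """Toy value1:_ATTR_attr1 WHEN value2:_ATTR_attr2 within one field."""
    return field.get(attr1) == value1 and field.get(attr2) == value2

def attr_notwhen(field, attr1, value1, attr2, value2):
    """Toy value1:_ATTR_attr1 NOTWHEN value2:_ATTR_attr2 within one field."""
    return field.get(attr1) == value1 and field.get(attr2) != value2

fields = [ET.fromstring(xml) for xml in FIELDS]
print([attr_when(f, "LANG", "English", "CAPITAL", "Cape Town") for f in fields])
# [True, False, False, False]
print([attr_notwhen(f, "LANG", "English", "CAPITAL", "London") for f in fields])
# [True, False, True, False]
```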

You can also use WHEN and NOTWHEN to return only XML documents in which a field contains a specified term or phrase, and has an attribute that has a specific value.

For example:

action=Query&Text=Fr.html:_ATTR_HREF WHEN France:A

This query returns only XML documents in which the A field contains the value France, and has an HREF attribute with the value Fr.html.

The following document, for example, returns in this query:

<XML>
   <DOC>
      <A HREF="Fr.html">France</A>
   </DOC>
</XML>

The following document does not return:

<XML>
   <DOC>
      <A HREF="France.html">France</A> is next to <A HREF="Fr.html">Belgium</A>
   </DOC>
</XML>
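You can sketch the field-and-attribute case in the same way. The query matches only if a single A field has both the text France and the HREF value Fr.html. The function name `field_when_attr` is hypothetical, and exact string equality stands in for IDOL term matching.

```python
import xml.etree.ElementTree as ET

MATCHING = '<XML><DOC><A HREF="Fr.html">France</A></DOC></XML>'
NOT_MATCHING = ('<XML><DOC><A HREF="France.html">France</A> is next to '
                '<A HREF="Fr.html">Belgium</A></DOC></XML>')

def field_when_attr(root, tag, text, attr, value):
    """Toy value:_ATTR_attr WHEN text:tag: one tag field has both the text and the attribute value."""
    return any((e.text or "").strip() == text and e.get(attr) == value
               for e in root.iter(tag))

print(field_when_attr(ET.fromstring(MATCHING), "A", "France", "HREF", "Fr.html"))      # True
print(field_when_attr(ET.fromstring(NOT_MATCHING), "A", "France", "HREF", "Fr.html"))  # False
```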