Boolean and Proximity Search
You can use the Query action to submit standard Boolean queries to IDOL Server, and to submit proximity queries that allow you to give words that appear close together in the search string a higher weighting.
IDOL Server uses APCM (Adaptive Probabilistic Concept Modeling) to rank the results.
NOTE: You must specify all search operators in capital letters.
Boolean Operators
You can apply the operators in the following table to words, exact phrases, or other Boolean expressions in your query.
AND
|
Binary operator. Both terms must match in every document that returns. For example: action=Query&Text=cat AND dog This query returns only documents that contain both cat and dog. |
NOT
|
Unary operator. Excludes the term that follows it. For example: action=Query&Text=cat NOT dog This query returns only documents that contain cat but not dog.
NOTE: To exclude an exact phrase, enclose the phrase in quotation marks. For example, consider the following documents:
Document 1: I went to the city for the New Year
Document 2: I went to New York City for the New Year
The following query does not match either of the above documents: action=Query&Text=city NOT (New York)
The following queries match the first document but not the second: action=Query&Text=city NOT ("New York") action=Query&Text=city NOT "New York" |
OR
|
Binary operator. One or both terms must appear for the document to return. This option is the default behavior if you do not specify an operator between two terms. For example: action=Query&Text=cat OR dog This query returns documents that contain cat, dog, or both terms. |
EOR or XOR |
Binary operator. Logical exclusive OR. For example: action=Query&Text=cat XOR dog This query returns only documents that contain either the term cat or the term dog. Documents that contain both cat and dog do not return. |
()
|
Bracketed expressions. IDOL Server evaluates brackets from left to right. You can nest bracketed expressions. Brackets dictate the precedence and behavior of combined operator statements. For example: action=Query&Text=(fish EOR pie) AND (chips EOR mash) This query returns documents that contain one of the following combinations: fish and chips; fish and mash; pie and chips; pie and mash. |
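The matching rules in the table above can be modeled with a small Python sketch. This is a toy illustration only, not how IDOL Server evaluates or ranks queries: the helper names (term, AND, OR, XOR, NOT, matches) are hypothetical, and documents are reduced to lower-cased words split on whitespace.

```python
def term(word):
    """Match documents whose word set contains the given word."""
    return lambda words: word in words

def AND(a, b):
    # Both sides must match.
    return lambda words: a(words) and b(words)

def OR(a, b):
    # One or both sides must match.
    return lambda words: a(words) or b(words)

def XOR(a, b):
    # EOR/XOR: exactly one side may match.
    return lambda words: a(words) != b(words)

def NOT(a, b):
    # "a NOT b": a must match and b must not.
    return lambda words: a(words) and not b(words)

def matches(query, text):
    """Evaluate a query against the lower-cased, whitespace-split words of text."""
    return query(set(text.lower().split()))

# (fish EOR pie) AND (chips EOR mash)
query = AND(XOR(term("fish"), term("pie")), XOR(term("chips"), term("mash")))

sample_texts = ["fish and chips", "pie and mash", "fish pie and chips", "fish and rice"]
results = {text: matches(query, text) for text in sample_texts}
```

In this model, only the first two sample texts match, because the third contains both fish and pie and the fourth contains neither chips nor mash.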
Proximity Search Operators
You can apply the operators in the following table to words, exact phrases, or other Boolean expressions to run a proximity search.
NOTE: If the two specified words are adjacent to each other, their proximity is 1; if one word separates them, their proximity is 2; and so on.
By default, the proximity calculation ignores stop words. For example, because and is a stop word, the terms cat and dog have a proximity of 1 both in the text cat dog and in the text cat and dog. However, if you set the StopWordIndex configuration parameter to index stop words, IDOL Content Component includes stop words in the proximity calculation.
NOTE: Proximity operators work recursively, so in nested Boolean queries you can apply proximity operators to brackets or phrases. For example, in the expression (term1) NEAR10 ((term2) DNEAR2 (term3)) the NEAR10 operator ensures that term1 is in proximity to an occurrence of term2 that is within two words of term3.
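The proximity values described in the notes above can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the IDOL Content Component implementation: the STOP_WORDS set and the proximity function are hypothetical, and the index_stop_words flag only imitates the effect that the text attributes to the StopWordIndex parameter.

```python
STOP_WORDS = {"and", "the", "a", "of"}  # hypothetical stop word list

def proximity(text, first, second, index_stop_words=False):
    """Return the smallest distance between first and second, or None.

    Adjacent words have a proximity of 1, one intervening word gives 2,
    and so on. Stop words are skipped unless index_stop_words is True.
    """
    words = text.lower().split()
    if not index_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    if not first_positions or not second_positions:
        return None
    return min(abs(i - j) for i in first_positions for j in second_positions)

adjacent = proximity("cat dog", "cat", "dog")
stop_word_ignored = proximity("cat and dog", "cat", "dog")
stop_word_counted = proximity("cat and dog", "cat", "dog", index_stop_words=True)
```

With the stop word ignored, cat and dog have the same proximity in both texts; counting the stop word adds one to the distance in the second text.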
NEARN
|
Returns only documents in which the second term is within N words of the first term. For example: action=Query&Text=red NEAR1 green This query returns only documents in which the term red is adjacent to the term green. Documents that contain |
DNEARN
|
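Directed NEAR. Returns only documents in which the second term follows the first term and is no more than N words away from it. For example: action=Query&Text=red DNEAR2 green This query returns only documents in which the term green follows the term red, and is no more than two words away from the term red. Documents that contain |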
WNEARN
|
Weighted NEAR. For example: action=Query&Text=red WNEAR7 green This query returns documents that contain the term red or the term green. It gives extra relevance to documents in which red and green appear seven or fewer words apart in a piece of text. This weight increases as the terms get closer to each other. Documents in which the terms occur more than seven words apart, or in which only one term occurs, return with the normal relevance. |
YNEARN
|
Weighted proximity operator that requires both terms to match. For example: action=Query&Text=red YNEAR7 green This query returns only documents that contain both the term red and the term green. Extra relevance is given to documents in which red and green appear seven or fewer words apart in a piece of text. This weight increases as the terms get closer to each other. Documents in which the terms occur more than seven words apart return with the normal relevance. |
BEFORE
|
Returns only documents in which the first term precedes the second one. For example: action=Query&Text=red BEFORE green This query returns only documents in which the term green appears later than the term red. NOTE: For a FieldText query with the |
AFTER
|
Returns only documents in which the first term appears later than the second one. For example: action=Query&Text=red AFTER green This query returns only documents in which the term red appears later than the term green. NOTE: For a FieldText query with the |
XNEAR
|
Returns only documents in which the second term is exactly N words after the first term. For example: action=Query&Text=cats+XNEAR2+dogs This query returns only documents in which the term dogs follows the term cats and is exactly two words away from the term cats. Documents that contain, for example, This operator is available only for the Text action parameter. |
SENTENCE
|
Returns only documents in which the second term is in the same sentence as the first term. For example: action=Query&Text=cats+SENTENCE+dogs This query returns only documents in which the term dogs occurs in the same sentence as the word cats. IDOL Content Component breaks the document into sentences by using a number of criteria. The most important criterion is the detection of an end-of-sentence marker, which includes a period (.), question mark (?), or exclamation point (!), as well as their multibyte variants. However, the presence of one of these characters is not always sufficient to mark the end of a sentence, because these characters are often used in abbreviations, names, and other items for purposes other than the end of a sentence. To locate a more accurate sentence boundary, IDOL Content Component also uses characteristics such as capitalization and syntactic observations. |
SENTENCENN
|
Returns only documents in which the second term is in the same sentence as the first term, and they are within the specified number of words of each other. For example: action=Query&Text=cats+SENTENCE10+dogs This query returns only documents in which the term dogs occurs in the same sentence as, and within 10 words of, the word cats. NOTE: |
DSENTENCE
|
Returns only documents in which the second term occurs later than the first term, in the same sentence. For example: action=Query&Text=cats+DSENTENCE+dogs This query returns only documents in which the term dogs occurs later in the same sentence than the word cats. |
DSENTENCENN
|
Returns only documents in which the second term occurs later than the first term in the same sentence, and within the specified number of words of it. For example: action=Query&Text=cats+DSENTENCE10+dogs This query returns only documents in which the term dogs occurs later in the same sentence as, and within 10 words of, the word cats. NOTE: |
PARAGRAPH
|
Returns only documents in which the second term is in the same paragraph as the first term. For example: action=Query&Text=red+PARAGRAPH+green This query returns only documents in which the term green occurs in the same paragraph as the word red. The words do not have to occur in the same sentence in the paragraph. |
PARAGRAPHNN
|
Returns only documents in which the second term is in the same paragraph as the first term, and they are within the specified number of words of each other. For example: action=Query&Text=cats+PARAGRAPH20+dogs This query returns only documents in which the term dogs occurs in the same paragraph as, and within 20 words of, the word cats. NOTE: |
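The difference between the NEARN, DNEARN, and XNEARN operators in the table above can be sketched in Python as three position checks. This is a simplified model, not IDOL's implementation: the function names are hypothetical, text is split on whitespace, stop words are not treated specially, and NEARN is assumed to ignore word order (in contrast to the directed DNEARN).

```python
def positions(words, term):
    """Return every index at which term occurs in the word list."""
    return [i for i, w in enumerate(words) if w == term]

def near(text, first, second, n):
    """NEARN: second is within n words of first, in either order (assumed)."""
    words = text.lower().split()
    return any(0 < abs(j - i) <= n
               for i in positions(words, first)
               for j in positions(words, second))

def dnear(text, first, second, n):
    """DNEARN: second follows first and is no more than n words away."""
    words = text.lower().split()
    return any(0 < j - i <= n
               for i in positions(words, first)
               for j in positions(words, second))

def xnear(text, first, second, n):
    """XNEARN: second follows first and is exactly n words away."""
    words = text.lower().split()
    return any(j - i == n
               for i in positions(words, first)
               for j in positions(words, second))

red_near1_green = near("red green", "red", "green", 1)
red_dnear2_green = dnear("red or green", "red", "green", 2)
cats_xnear2_dogs = xnear("cats chase dogs", "cats", "dogs", 2)
```

All three checks above succeed. Reversing the word order, as in the text green red, still satisfies the NEAR1 check in this model but not the DNEAR2 check.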
Other Search Operators
If you set XMLFullStructure to True in the [Server] section of the IDOL Server configuration file, you can use the following operators in Text and FieldText queries to return XML documents in which fields or attributes occur together, or do not occur together.
WHEN
|
Returns only XML documents in which two fields that have the same parent field contain specified terms or phrases. |
WHENN
|
Returns only XML documents in which two fields that have the same parent field at N levels from the root level contain specified terms or phrases. |
NOTWHEN
|
Returns only XML documents in which the first field and value pair occurs, but the second field and value pair does not occur in the same parent field. |
Examples
Consider the following three XML documents:
Document 1: <DOC> <car> <make>audi</make> <color>red</color> </car> <car> <make>mercedes</make> <color>silver</color> </car> </DOC> |
Document 2: <DOC> <car> <make>audi</make> <color>silver</color> </car> <car> <make>mercedes</make> <color>red</color> </car> </DOC> |
Document 3: <DOC> <car> <make>audi</make> <body> <color>red</color> </body> </car> <car> <make>mercedes</make> <body> <color>silver</color> </body> </car> </DOC> |
- The following query returns only XML documents in which the make and color fields are direct children of the same parent field, and contain the values audi (in the make field) and red (in the color field).
action=Query&Text=audi:make+WHEN+red:color
This query returns Document 1, but not Document 2 or Document 3.
- The following query returns only XML documents in which the make field contains the value audi, and the color field contains the value red, within a parent field that is two levels from the root field.
action=Query&Text=audi:make+WHEN2+red:color
This query returns Document 1 and Document 3, but not Document 2.
- The following query returns only XML documents in which the make field contains the value audi, under a parent field that does not also have a color field with the value red.
action=Query&Text=audi:make+NOTWHEN+red:color
This query returns Document 2 and Document 3, but not Document 1.
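The three results above can be checked with a Python sketch that applies one possible reading of WHEN, WHENN, and NOTWHEN to the three example documents. The functions when, when_n, and not_when are hypothetical helpers written for this illustration, not part of IDOL Server, and the sketch assumes that the root field counts as level 1.

```python
import xml.etree.ElementTree as ET

DOCS = {
    1: "<DOC><car><make>audi</make><color>red</color></car>"
       "<car><make>mercedes</make><color>silver</color></car></DOC>",
    2: "<DOC><car><make>audi</make><color>silver</color></car>"
       "<car><make>mercedes</make><color>red</color></car></DOC>",
    3: "<DOC><car><make>audi</make><body><color>red</color></body></car>"
       "<car><make>mercedes</make><body><color>silver</color></body></car></DOC>",
}

def has(fields, tag, value):
    """True if any of the given fields has this tag and text value."""
    return any(f.tag == tag and (f.text or "").strip() == value for f in fields)

def when(root, a, b):
    """WHEN: some parent has direct children matching both (tag, value) pairs."""
    return any(has(list(p), *a) and has(list(p), *b) for p in root.iter())

def when_n(root, a, b, n):
    """WHENN: some field at level n (root = level 1) contains both pairs below it."""
    level = [root]
    for _ in range(n - 1):
        level = [child for node in level for child in node]
    def below(p):
        return [e for e in p.iter() if e is not p]
    return any(has(below(p), *a) and has(below(p), *b) for p in level)

def not_when(root, a, b):
    """NOTWHEN: some parent has a direct child matching a but none matching b."""
    return any(has(list(p), *a) and not has(list(p), *b) for p in root.iter())

make, color = ("make", "audi"), ("color", "red")
roots = {n: ET.fromstring(xml) for n, xml in DOCS.items()}
when_hits = [n for n, r in roots.items() if when(r, make, color)]
when2_hits = [n for n, r in roots.items() if when_n(r, make, color, 2)]
notwhen_hits = [n for n, r in roots.items() if not_when(r, make, color)]
```

Under these assumptions, when_hits is [1], when2_hits is [1, 3], and notwhen_hits is [2, 3], which agrees with the results stated for the three queries.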
Nested Expressions
You can use complex nested expressions with the WHEN and NOTWHEN operators in query text. For example:
action=Query&Text=(London:CITY WHEN English:LANG) WHEN ("United Kingdom":COUNTRY)
action=Query&Text=(Lincoln:CITY NOTWHEN Nebraska:STATE) NOTWHEN ("United Kingdom":COUNTRY)
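Nested expressions such as these contain spaces, quotation marks, and brackets, which usually need encoding before you send them in a URL. The following Python sketch builds the query string with the standard urllib.parse module. The build_query function is a hypothetical helper, and the assumption that IDOL Server accepts standard percent-encoding for these characters is not confirmed by this section.

```python
from urllib.parse import quote_plus, unquote_plus

def build_query(text):
    """Return an action=Query string with the Text value URL-encoded.

    quote_plus writes spaces as "+", as in the examples in this section,
    and percent-encodes quotation marks, colons, and brackets.
    """
    return "action=Query&Text=" + quote_plus(text)

text = '(London:CITY WHEN English:LANG) WHEN ("United Kingdom":COUNTRY)'
query = build_query(text)

# Decoding the Text value recovers the original expression unchanged.
decoded = unquote_plus(query.split("&Text=", 1)[1])
```

Keeping the operators in capital letters in the source string matters here, because encoding does not change letter case.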
Field Attributes
You can use WHEN and NOTWHEN to return only XML documents in which two attributes that occur in the same field contain (or do not contain) specified terms or phrases.
For example:
- The following query returns only XML documents in which the LANG and CAPITAL attributes occur in the same field, and contain the values English (in the LANG attribute) and Cape Town (in the CAPITAL attribute).
action=Query&Text=English:_ATTR_LANG WHEN "Cape Town":_ATTR_CAPITAL
This query returns a document that contains the following field:
<COUNTRY CAPITAL="Cape Town" LANG="English" POP="44">South Africa</COUNTRY>
The following field does not match the query:
<COUNTRY CAPITAL="Cape Town" LANG="Afrikaans" POP="10">South Africa</COUNTRY>
- The following query returns only XML documents in which the LANG attribute contains the value English in a field where the CAPITAL attribute does not contain the value London.
action=Query&Text=English:_ATTR_LANG NOTWHEN London:_ATTR_CAPITAL
This query returns a document that contains the following field:
<COUNTRY CAPITAL="Canberra" LANG="English" POP="44">Australia</COUNTRY>
It does not match the following field:
<COUNTRY CAPITAL="London" LANG="English" POP="44">United Kingdom</COUNTRY>
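The attribute matching in the two examples above can be modeled with a short Python sketch. The attr_when and attr_notwhen functions are hypothetical helpers for illustration, not IDOL Server behavior, and they compare attribute values as exact strings.

```python
import xml.etree.ElementTree as ET

def attr_when(xml_text, first, second):
    """True if one field carries both (attribute, value) pairs."""
    root = ET.fromstring(xml_text)
    return any(f.get(first[0]) == first[1] and f.get(second[0]) == second[1]
               for f in root.iter())

def attr_notwhen(xml_text, first, second):
    """True if a field carries the first pair but not the second."""
    root = ET.fromstring(xml_text)
    return any(f.get(first[0]) == first[1] and f.get(second[0]) != second[1]
               for f in root.iter())

english = ("LANG", "English")
fields = {
    "za_en": '<COUNTRY CAPITAL="Cape Town" LANG="English" POP="44">South Africa</COUNTRY>',
    "za_af": '<COUNTRY CAPITAL="Cape Town" LANG="Afrikaans" POP="10">South Africa</COUNTRY>',
    "au": '<COUNTRY CAPITAL="Canberra" LANG="English" POP="44">Australia</COUNTRY>',
    "uk": '<COUNTRY CAPITAL="London" LANG="English" POP="44">United Kingdom</COUNTRY>',
}

when_cape_town = {k: attr_when(fields[k], english, ("CAPITAL", "Cape Town"))
                  for k in ("za_en", "za_af")}
notwhen_london = {k: attr_notwhen(fields[k], english, ("CAPITAL", "London"))
                  for k in ("au", "uk")}
```

In this model, the English-language South Africa field satisfies the WHEN check and the Australia field satisfies the NOTWHEN check, while the other two fields do not match.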
You can also use WHEN and NOTWHEN to return only XML documents in which a field contains a specified term or phrase, and has an attribute that has a specific value.
For example:
action=Query&Text=Fr.html:_ATTR_HREF WHEN France:A
This query returns only XML documents in which the A field contains the value France, and has an HREF attribute with the value Fr.html.
The following document, for example, returns in this query:
<XML> <DOC> <A HREF="Fr.html">France</A> </DOC> </XML>
The following document does not return:
<XML> <DOC> <A HREF="France.html">France</A> is next to <A HREF="Fr.html">Belgium</A> </DOC> </XML>
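The same check, a field value combined with an attribute value on that field, can be sketched in Python. The field_with_attr function is a hypothetical helper that models the behavior described above; it is not IDOL Server code and it compares values as exact strings.

```python
import xml.etree.ElementTree as ET

def field_with_attr(xml_text, tag, value, attr, attr_value):
    """True if a field named tag has the text value and the attribute value."""
    root = ET.fromstring(xml_text)
    return any((f.text or "").strip() == value and f.get(attr) == attr_value
               for f in root.iter(tag))

matching = '<XML> <DOC> <A HREF="Fr.html">France</A> </DOC> </XML>'
non_matching = ('<XML> <DOC> <A HREF="France.html">France</A> is next to '
                '<A HREF="Fr.html">Belgium</A> </DOC> </XML>')

first_result = field_with_attr(matching, "A", "France", "HREF", "Fr.html")
second_result = field_with_attr(non_matching, "A", "France", "HREF", "Fr.html")
```

The second document does not match because the value France and the attribute value Fr.html occur in different A fields.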