Term Separators
The IDOL Content component automatically generates separators for each language to determine where one term ends and another begins. These include characters such as spaces, tabs, carriage returns, and line feeds.
To ensure that Content uses a character as a separator, specify it in the AugmentSeparators configuration parameter. Content replaces all separator characters with a space.
For example, the following table describes query matching when AugmentSeparators=,- (comma and hyphen):
Indexed string | Query terms matched |
---|---|
second-hand guitar | |
NOTE: The hyphen is a separator only if it is not listed in HyphenChars, because HyphenChars takes precedence over separators.
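The replacement behavior described for AugmentSeparators can be sketched in Python. This is only an illustration of the stated rule (each listed character is replaced with a space), not Content's actual tokenizer. The function name augment_tokenize is hypothetical, and the sketch ignores HyphenChars, stemming, and any other language processing.

```python
def augment_tokenize(text: str, augment_separators: str = ",-") -> list[str]:
    """Split text into terms, treating each listed character as a separator.

    Hypothetical helper: mimics the documented rule that separator
    characters are replaced with a space before terms are split.
    """
    # Map every augmenting separator character to a single space.
    table = str.maketrans({ch: " " for ch in augment_separators})
    # Whitespace (spaces, tabs, line breaks) then delimits the terms.
    return text.translate(table).split()


print(augment_tokenize("second-hand guitar"))  # ['second', 'hand', 'guitar']
```

Under this sketch, the hyphen no longer joins the two halves of the compound, so each half becomes a separate term.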
To ensure that Content does not use a character as a separator, specify it in the DiminishSeparators configuration parameter. Content removes these nonseparator characters at index time.
For example, the following table describes query matching when DiminishSeparators=_% (underscore and percent sign):
Indexed string | Query terms matched |
---|---|
file_name | filename |
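The removal behavior described for DiminishSeparators can be sketched in Python. This is an illustration of the stated rule only (listed characters are dropped, so the surrounding text joins into one term), not Content's actual indexing code. The function name diminish_tokenize is hypothetical.

```python
def diminish_tokenize(text: str, diminish_separators: str = "_%") -> list[str]:
    """Split text into terms after deleting the listed nonseparator characters.

    Hypothetical helper: mimics the documented rule that nonseparators
    are removed at index time.
    """
    # The third argument of str.maketrans lists characters to delete.
    table = str.maketrans("", "", diminish_separators)
    # Only whitespace delimits terms in this sketch.
    return text.translate(table).split()


print(diminish_tokenize("file_name"))  # ['filename']
```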
To ensure that Content indexes a character as its own token, specify it in the SoftSeparators configuration parameter.
For example, the following table describes query matching when SoftSeparators=1234567890:
Indexed string | Query terms matched |
---|---|
459 | |
In this example, Content tokenizes all numbers as single digits, so that 459 is indexed as 4 5 9.
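The soft separator behavior can be sketched in Python. This is an illustration of the stated rule only (each listed character becomes a token of its own), not Content's actual tokenizer. The function name soft_tokenize is hypothetical.

```python
def soft_tokenize(text: str, soft_separators: str = "1234567890") -> list[str]:
    """Split text into terms, emitting each soft separator as its own token.

    Hypothetical helper: pads every listed character with spaces so that
    it is split off from its neighbours but still kept as a term.
    """
    pieces = []
    for ch in text:
        # Surround soft separators with spaces; copy other characters as-is.
        pieces.append(f" {ch} " if ch in soft_separators else ch)
    return "".join(pieces).split()


print(soft_tokenize("459"))  # ['4', '5', '9']
```

Unlike the augmenting separators, the listed characters are kept here: they split the text and are themselves emitted as terms.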