Word indexing parameters
IMPORTANT: These rules only apply when you are not using SQL Text Indexing.
Content Manager applies the following rules when it encounters certain types of characters in text fields during indexing:
- / - words containing a forward slash, for example, advantage/disadvantage, are indexed both as one single string and as individual words.
- Words separated by a forward slash with spaces around it, for example, advantage / disadvantage, are indexed as individual words unless they are dates or numeric, for example, 2002/2003.
NOTE: How a forward slash (/) is indexed: the indexing process first indexes the entire word containing the slash, treating the slash the same as any other alphabetic character. It then indexes the word again, treating the slash as a space.
If the character string before or after the forward slash contains any numbers, Content Manager will treat the word as a reference and not split it.
For example, the text "The rain in spain/portugal/morocco falls mainly on plain no. 04/2345" would be indexed as the following words:
THE
RAIN
IN
SPAIN/PORTUGAL/MOROCCO
SPAIN
PORTUGAL
MOROCCO
FALLS
MAINLY
ON
PLAIN
NO
04/2345
04
2345
Note also that a double slash (//) indicates the start of a comment.
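
The two-pass slash handling described in the note above can be sketched in a few lines of Python. This is only an illustration that reproduces the word list shown for the example sentence; it is not Content Manager's implementation. The function name `index_slash_terms` is hypothetical, and the sketch does not model the double-slash comment rule or any special treatment of references.

```python
import re

def index_slash_terms(text: str) -> list[str]:
    """Hypothetical helper: mimic the documented forward-slash handling."""
    terms = []
    for token in text.split():
        # Assumption: a trailing full stop ends the word unless it follows a digit.
        token = re.sub(r"(?<!\d)\.$", "", token)
        if not token:
            continue
        word = token.upper()
        # Pass 1: the slash is treated like any other alphabetic character.
        terms.append(word)
        if "/" in word:
            # Pass 2: the slash is treated as a space.
            terms.extend(part for part in word.split("/") if part)
    return terms

sentence = "The rain in spain/portugal/morocco falls mainly on plain no. 04/2345"
for term in index_slash_terms(sentence):
    print(term)
```

Run against the example sentence, the sketch prints the same fifteen terms listed above, in the same order.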
- () or [] or <> - brackets around a word are stripped off, for example, (CANBERRA) is indexed as CANBERRA
- "" or '' - double and single quotes are stripped off, for example, "GREAT" is indexed as GREAT and 'BARRIER' is indexed as BARRIER
- Numbers, even single digit numbers, are kept
- The following values will be stripped when indexing:
  - Leading or trailing < > ( ) " " ' '
- The following values will be indexed as part of a word, but not as single characters:
  - @ # $ % ^ & * -
- ~ - the tilde is indexed only when enclosed in the word or trailing, not leading
- The following single character symbols mark the end of a word and will not be indexed:
  - Tab
  - Carriage Return
  - Line Feed
  - Space
  - Comma (,) if the previous or next character is a blank space
  - Semicolon (;)
  - Colon (:)
  - Exclamation mark (!)
  - Question mark (?)
  - Full stop (.) if the previous character is not a number
- Single character special symbols are ignored, including the following:
  - Equals sign (=)
  - Plus sign (+)
  - Vertical bar (|)
  - Backslash (\)
  - Opening brace ({)
  - Closing brace (})
  - Opening square bracket ([)
  - Closing square bracket (])
  - Backtick (`)
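
As a rough illustration of how the stripping, word-ending, and ignored-symbol rules above might combine, here is a minimal Python sketch. It is a simplified model under stated assumptions, not Content Manager's implementation: the name `split_words` and the constants are hypothetical, a leading tilde is assumed to be dropped, terms are uppercased to match the earlier example, and forward slashes and double-slash comments are not handled.

```python
import re

# Assumption: brackets and quotes are stripped from either end of a word.
STRIP_CHARS = "<>()[]\"'"
# Single-character symbols that are ignored when they stand alone.
IGNORED_SINGLE = set("=+|\\{}[]`")
# Symbols indexed only as part of a word, never as single characters.
PART_OF_WORD_ONLY = set("@#$%^&*-")

# Word terminators: whitespace, ; : ! ?, a comma next to whitespace,
# and a full stop whose previous character is not a digit.
TERMINATORS = re.compile(r"[\s;:!?]+|,(?=\s)|(?<=\s),|(?<!\d)\.")

def split_words(text: str) -> list[str]:
    """Hypothetical helper: split text into index terms per the rules above."""
    words = []
    for piece in TERMINATORS.split(text):
        word = piece.strip(STRIP_CHARS)
        # Assumption: a leading tilde is not indexed; enclosed or trailing ones are kept.
        word = word.lstrip("~")
        if not word:
            continue
        if len(word) == 1 and (word in IGNORED_SINGLE or word in PART_OF_WORD_ONLY):
            continue
        words.append(word.upper())
    return words

print(split_words("Hello, (CANBERRA) costs $5.50 ~approx; see user@example = ok!"))
```

In this sketch the full stop in `$5.50` is kept because it follows a digit, the lone `=` is ignored, and the brackets around CANBERRA are stripped.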