Document content search methods
Document content searches search the text of electronic documents attached to records.
There are three methods available for document content searches:
- Document Content - to search for a term in the content of electronic documents
NOTE: For IDOL Content Indexes, exclude text symbols such as the forward slash (/) from your search criteria when searching for content phrases. These are special characters, and the search results may not be what you expect.
To search for a phrase that contains a text symbol, use the IDOL Query search method below with, for example, the criteria:
*&FieldText=STRING{feb28/09}:DRECONTENT, which returns records with electronic documents that contain feb28/09, but not feb28 09.
TIP: The default operator when searching for multiple terms using the Document Content search method is AND.
- IDOL Query - to use advanced IDOL queries to search for terms in the content of electronic documents, using AND, OR and proximity statements.
You can use the IDOL Query search method to build any search using the IDOL syntax, whether or not it involves document content.
One example is a date range search using the syntax *&FieldText=RANGE{1/1/2013,31/1/2013}:TD_DATEREGISTERED.
To set the record content that IDOL indexes, your Content Manager administrator can use the Record tab option Use content search engine for record search clauses. See Search Options tab for details.
See IDOL queries below.
- Elasticsearch Query - to query the Elasticsearch index directly using a raw JSON search string. This search method submits any valid JSON text to Elasticsearch, which allows more complex searches and queries that may not be possible using the standard search methods in Content Manager. The search must be a standard Elasticsearch query clause that returns a list of document IDs for matched Records. The document ID of a Record in the Elasticsearch index contains the Record’s URI. All such searches must conform to the following pattern:
{ "query": <search clauses> }
As an example, to return all the Records in the Elasticsearch index, you could use the following search string:
{
"query": { "match_all": { } },
"_source": false
}
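The pattern and example above can also be generated programmatically, which helps avoid quoting mistakes in hand-typed JSON. The following is a minimal sketch only: the helper name build_es_search is hypothetical, and it assumes the resulting string is pasted into the Elasticsearch Query criteria box. The query_string clause in the second call is standard Elasticsearch query DSL, but whether it suits a given index depends on how that index is configured.

```python
import json


def build_es_search(query_clause: dict) -> str:
    """Wrap a query clause in the {"query": ...} pattern and serialise it.

    build_es_search is a hypothetical helper, not part of Content Manager.
    "_source": false mirrors the example in the text, so that only
    document IDs are returned.
    """
    body = {"query": query_clause, "_source": False}
    return json.dumps(body)


# Same search as the match_all example in the text.
match_all = build_es_search({"match_all": {}})
print(match_all)
# {"query": {"match_all": {}}, "_source": false}

# A query_string clause; the search terms are illustrative only.
terms = build_es_search({"query_string": {"query": "budget AND 2013"}})
print(terms)
```

Serialising with json.dumps guarantees straight double quotes and a lowercase false, both of which Elasticsearch requires.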
NOTE: Any search that can be completely handled by IDOL or Elasticsearch, whether it is a content type search (including Title/Notes), or a specialised IDOL or Elasticsearch syntax query string, will return results in weighted order, with the highest scoring results returned first.
NOTE: To use the document content search methods, document content searching has to be configured correctly. See Content Manager Enterprise Studio Help about setting up document content indexing and searching.
- On the Search menu, click Find Records
- At the bottom of the dialogue, click the Editor button and make sure that the Boolean search editor is selected
- Click KwikSelect beside the Search By box
- In the Search Methods dialogue box, category Text Search, select Document Content, IDOL Query, or Elasticsearch Query, and click OK.
- With Document Content, type the text to search for in the box Enter the search criteria for a content search.
With IDOL Query, enter the IDOL query string in the box. See IDOL queries below for more information.
With Elasticsearch Query, enter the JSON search string in the box. See the Elasticsearch documentation online for further information about the query DSL syntax.
- Click OK.
Content Manager returns the list of records.
- To refine your search, press F7
Instead of using the relatively simple Search for Records dialogue box, you can combine multiple Boolean options to find the document content you want:
- On the Search menu, click Find Records
- At the bottom of the dialogue, click the Editor button and make sure that the Boolean search editor is selected
- Click KwikSelect beside the Search By box
- In the Search Methods dialogue box, category Text Search, select Document Content, and click OK
- Type the words to search for in the box Enter the search criteria for a content search
- To the right of the box, click the folder button.
The dialogue box with the tab Document Content Search appears.
- Use one of the Boolean options by clicking one of the buttons:
- AND - documents which contain both of the entered words or phrases anywhere in the text of the document
- OR - documents that contain any one of the entered words or phrases
- BUT - documents that contain the first word or phrase, but not the second
- Click the New button to add another search term. It will be added as an AND or OR clause depending on which radio button is currently selected. The Delete button removes the currently selected search term. Use the Insert button to place a search term before the currently selected search term. To negate a search term, use the NOT button, and to group search terms, use the parentheses button (…). The Reset button will clear all the search terms.
- Alternatively, use the following search operators in the Search Text field. You must type them in capital letters:
- AND - for documents which contain both of the entered words or phrases anywhere in the text of the document
- OR - for documents that contain any one of the entered words or phrases
- NOT - for documents that contain the first word or phrase, but not the second. You do not need to type it as AND NOT.
Use the asterisk character (*) to represent one or more unspecified characters.
The question mark (?) wildcard is not available for document content searches.
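If you assemble Document Content criteria in a script before pasting them into the Search Text field, the operator and wildcard rules above can be checked up front. The following is a minimal sketch; the function name combine_terms is hypothetical and the search terms are illustrative only.

```python
OPERATORS = {"AND", "OR", "NOT"}


def combine_terms(terms, operator="AND"):
    """Join search terms with an upper-case operator.

    Hypothetical helper: it only builds the text to type into the
    Search Text field and does not talk to Content Manager.
    """
    op = operator.upper()  # operators must be typed in capital letters
    if op not in OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    for term in terms:
        if "?" in term:
            # The ? wildcard is not available for document content searches.
            raise ValueError(f"'?' wildcard not supported: {term}")
    return f" {op} ".join(terms)


print(combine_terms(["invoice*", "overdue"], "and"))  # invoice* AND overdue
print(combine_terms(["budget", "draft"], "not"))      # budget NOT draft
```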
You can send a search directly to the IDOL Server by using the Content Manager search method IDOL Query. Using this search method, you need to use the IDOL syntax to build the query in the field Enter the search criteria for a content search. Content Manager passes the query directly to the IDOL Server as part of the text= search parameter.
Example:
dogs AND (birds OR cats) - finds all records that contain the word dogs and either the word birds or cats in either document content, title or notes.
You can use other IDOL search parameters by using an ampersand (&) and the parameter name and value in your query.
To distinguish query syntax punctuation from punctuation within strings you are searching for, double-percent-encode commas and curly braces within strings. Query syntax punctuation should be left unencoded. There should be no space before or after a separator comma.
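As an illustration of the encoding rule, the sketch below double-percent-encodes only commas and curly braces inside a search string, then places the result between unencoded syntax braces. The helper name double_percent_encode and the sample title value are hypothetical.

```python
from urllib.parse import quote

# Characters that clash with IDOL query syntax inside a string value.
SPECIAL = ",{}"


def double_percent_encode(value: str) -> str:
    """Percent-encode commas and curly braces twice; leave the rest as-is."""
    encoded = []
    for ch in value:
        if ch in SPECIAL:
            once = quote(ch, safe="")               # "," -> "%2C"
            encoded.append(quote(once, safe=""))    # "%2C" -> "%252C"
        else:
            encoded.append(ch)
    return "".join(encoded)


value = double_percent_encode("Budget, 2013")
print(value)  # Budget%252C 2013

# The braces and colon of the query syntax itself stay unencoded.
query = "*&FieldText=MATCH{" + value + "}:TS_TITLE"
print(query)  # *&FieldText=MATCH{Budget%252C 2013}:TS_TITLE
```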
To search for a phrase:
- Use quotes around the phrase, for example, "fruit market"
- Separate multiple phrases using a space, for example, "fruit market" "organic food"
When you are not quite sure of the spelling of some of the words you are searching for, you can use a fuzzy query, which returns results that contain words that are similar to what you entered: DREFUZZY(my search text), for example: DREFUZZY(jamboree).
You can also specify a tolerance level for IDOL, which means that it will find more items that are less similar to your search term. Use DREFUZZY<1-6>(my search term), for example, DREFUZZY3(jamboree). A value higher than 6 is not recommended, as the results will be too dissimilar.
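The two fuzzy forms can be produced by one small function that also enforces the recommended 1 to 6 tolerance range. This is a sketch only; fuzzy_query is a hypothetical name, and the string it returns is meant to be entered with the IDOL Query search method.

```python
def fuzzy_query(text, tolerance=None):
    """Build a DREFUZZY query string, optionally with a tolerance level."""
    if tolerance is None:
        return f"DREFUZZY({text})"
    if not 1 <= tolerance <= 6:
        # Values above 6 are not recommended: results become too dissimilar.
        raise ValueError("tolerance should be between 1 and 6")
    return f"DREFUZZY{tolerance}({text})"


print(fuzzy_query("jamboree"))     # DREFUZZY(jamboree)
print(fuzzy_query("jamboree", 3))  # DREFUZZY3(jamboree)
```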
The standard IDOL search searches across document content as well as Title and Notes fields. To restrict the search to a specific field, include the field name. A content search uses the field name DRECONTENT, whereas the title field is identified by TS_TITLE. For example, to search for the term cat in electronic document content and the Title field, you would use: cat:DRECONTENT, cat:TS_TITLE
Another example: (dogs or cats):TS_TITLE - finds all records whose Title field contains either cats or dogs
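The field-restricted forms shown above follow one pattern: a term, or a parenthesised group of terms, followed by a colon and the IDOL field name. A minimal sketch, assuming a hypothetical helper named restrict_to_field:

```python
def restrict_to_field(terms, field, operator="OR"):
    """Restrict one or more terms to a single IDOL field."""
    if len(terms) == 1:
        return f"{terms[0]}:{field}"
    # Several terms are grouped in parentheses before the field name.
    joined = f" {operator} ".join(terms)
    return f"({joined}):{field}"


print(restrict_to_field(["cat"], "DRECONTENT"))         # cat:DRECONTENT
print(restrict_to_field(["dogs", "cats"], "TS_TITLE"))  # (dogs OR cats):TS_TITLE
```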
The standard IDOL search searches across document content as well as Title and Notes fields. You can use the FieldText parameter prefixed with an ampersand (&) to search on other IDOL-indexed fields, or for other advanced searching, in addition to other search criteria.
Parameters to use with FieldText
- Match - a value that exactly matches one or more specified strings.
Example: *&FieldText=MATCH{DOC_1234}:TS_NUMBER - finds the record with number DOC_1234
- Range - a date search.
Example: *&FieldText=RANGE{01/01/2013,31/01/2013}:TD_DATEREGISTERED - finds all the records with a Date Registered value in January 2013
- Wild - a wildcard search.
Example: *&FieldText=WILD{*.html}:TS_EXTENSION - finds all the HTML documents
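The three examples above share the shape *&FieldText=OPERATOR{values}:FIELD, so they can be generated from one function. The sketch below is illustrative: field_text is a hypothetical helper, and it covers only the operators that appear in this topic.

```python
# Operators that appear in this topic; IDOL itself may support others.
FIELDTEXT_OPERATORS = {"MATCH", "RANGE", "WILD", "STRING"}


def field_text(operator, values, field):
    """Build a FieldText query string for the IDOL Query search method."""
    op = operator.upper()
    if op not in FIELDTEXT_OPERATORS:
        raise ValueError(f"unsupported FieldText operator: {operator}")
    # No space before or after the separator comma.
    joined = ",".join(values)
    return "*&FieldText=" + op + "{" + joined + "}:" + field


print(field_text("MATCH", ["DOC_1234"], "TS_NUMBER"))
# *&FieldText=MATCH{DOC_1234}:TS_NUMBER
print(field_text("RANGE", ["01/01/2013", "31/01/2013"], "TD_DATEREGISTERED"))
# *&FieldText=RANGE{01/01/2013,31/01/2013}:TD_DATEREGISTERED
print(field_text("WILD", ["*.html"], "TS_EXTENSION"))
# *&FieldText=WILD{*.html}:TS_EXTENSION
```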
Content Manager fields as they appear in IDOL
| Content Manager field | IDOL field |
|---|---|
| Title | TS_TITLE |
| Record Number | TS_NUMBER |
| Document Content | DRECONTENT |
| Owner Location | TS_OWNERLOCATION |
| Extension | TS_EXTENSION |
| Contact Location | TS_CONTACT |
| Notes | TS_NOTES |
| Date Created | TD_DATECREATED |
| Date Registered | TD_DATEREGISTERED |
| Record URI | TN_URI |
The Location fields have a format of Location name and Location URN, for example, Smith, John [trim:45/loc/1234].
Date fields have the format of YYYY-MM-DD HH:MM:SS.
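To make the two formats concrete, the sketch below splits a Location value into its name and URN and formats a date in the YYYY-MM-DD HH:MM:SS layout. The function names and the regular expression are assumptions for illustration, based only on the example value given above.

```python
import re
from datetime import datetime

# Location name, a space, then the URN in square brackets.
LOCATION_RE = re.compile(r"^(?P<name>.+?) \[(?P<urn>[^\]]+)\]$")


def parse_location(value):
    """Split 'Name [urn]' into a (name, urn) tuple."""
    match = LOCATION_RE.match(value)
    if match is None:
        raise ValueError(f"not a Location value: {value!r}")
    return match.group("name"), match.group("urn")


def format_idol_date(moment):
    """Format a datetime as YYYY-MM-DD HH:MM:SS."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


print(parse_location("Smith, John [trim:45/loc/1234]"))
# ('Smith, John', 'trim:45/loc/1234')
print(format_idol_date(datetime(2013, 1, 31, 9, 5, 0)))
# 2013-01-31 09:05:00
```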
For details on the IDOL syntax, refer to the IDOL Server Help files.