SQL Text Index

The SQL (Structured Query Language) Text Index feature allows organizations to index the Content Manager word indexes, such as Title, Notes, text type Additional Fields, Classifications, Location Addresses and so on, using native database indexing.

SQL Text Indexing is supported for MS SQL Server, Oracle and PostgreSQL RDBMs (see CM23.4_Spec.pdf for supported versions), and depends on the respective text indexing component of the RDBMS.

IMPORTANT: A pre-requisite for using SQL Text Indexing on MS SQL Server and Oracle, is that you must have their associated 'full text' features installed and configured before upgrading or enabling SQL Text Indexing. PostgreSQL has built in support for text searching, so no additional components are required.

NOTE: When creating a new Content Manager dataset, you can opt to initialize it to use a SQL Text Indexing, see Creating a Content Manager dataset - Initialization dialog box for details. When creating a new dataset there is no requirement to re-index.

To access the SQL Text Index options:

  1. Expand Datasets
  2. Right-click the dataset you want to work with and point to SQL Text Indexing. The available options are:
    • Recreate
    • Check
    • Reindex
    • Synchronize Stoplist
    • Recreate indexes

NOTE: These options can also be accessed from the Dataset ribbon, in the SQL Text Index group.

Configuring the SQL Text Index

  1. Expand Datasets
  2. Right-click the dataset you want to work with and point to SQL Text Indexing and click Recreate. The Configuring SQL Text Indexing dialog is displayed.

NOTE: The SQL Text Indexing dialog is also displayed as a part of the new dataset creation process; as well as when the SQL Text Index is re-created via the right-click SQL Text Indexing - Create option.

  1. Word breaker language - by default this is set to Neutral. Select the word breaker language from the drop-down. The available languages are Dutch, English, French, and German.
  2. If you wish to ignore the accents on characters so all words are indexed as the same words, select Ignore accents on characters (treats garçon and garcon as identical words).
  3. For Oracle datasets only, the options Parallel Degree and the Word Type - Synchronization Types are available.
    • Parallel Degree - by default this is set to 4 - set the number of parallel execution servers associated with a single operation. This can be set up to 99.
    • Word Type Synchronization Types - from the drop-down list, select the Synchronization Type, select from:
      • On Commit - select this option to synchronize the indexes at the point of creation of the object.
      • On Regular interval - select this option to synchronize the indexes at regular intervals. The interval period is set in the Interval (Mins) option.
      • Use custom text indexing elements for index - select this option if you have custom text indexing. This option will be enabled only when you add or modify the preferences (tswordlist and tslexer) and reindexing with this option will affect the changes to index tokenization to the Content Manager full-text index tables.

        For Content Manager to work with custom text index elements, add/modify the two database full text index objects, tswordlist and tslexer as per your requirement.

        For details on full text indexing settings and preferences, and how each of these will impact the functionality or the way of working of the indexes and its performance, see:

        https://docs.oracle.com/en/database/oracle/oracle-database/19/ccref/oracle-text-indexing-elements.html#GUID-22680996-89CF-4666-89E6-BE8B9431DB02.

        Some of the settings and preferences may increase storage space and have an have impact on the performance.

        The following is an example code snippet to create or modify the full text indexes objects:

        declare
        v_cnt pls_integer;
        begin
        SELECT count(1) into v_cnt FROM CTX_USER_PREFERENCES WHERE PRE_NAME='TSWORDLIST';
        if v_cnt > 0 THEN
        ctx_ddl.drop_preference('tswordlist');
        END IF;
        end;
        /
        begin
        ctx_ddl.create_preference('tswordlist', 'BASIC_WORDLIST');
        ctx_ddl.set_attribute('tswordlist','PREFIX_INDEX','TRUE');
        ctx_ddl.set_attribute('tswordlist','PREFIX_MIN_LENGTH',3);
        ctx_ddl.set_attribute('tswordlist','PREFIX_MAX_LENGTH', 4);
        ctx_ddl.set_attribute('tswordlist','SUBSTRING_INDEX', 'YES');
        end;
        /
        
        declare
        v_cnt pls_integer;
        begin
        SELECT count(1) into v_cnt FROM CTX_USER_PREFERENCES WHERE PRE_NAME='TSLEXER';
        if v_cnt > 0 THEN
        ctx_ddl.drop_preference('tslexer');
        END IF;
        end;
        /
        begin
        ctx_ddl.create_preference('tslexer', 'BASIC_LEXER');
        ctx_ddl.set_attribute('tslexer', 'index_text', 'YES');
        --ctx_ddl.set_attribute('tslexer', 'printjoins', '*,:;_');
        ctx_ddl.set_attribute('tslexer','skipjoins', '-');
        end;
        /
        
        DROP INDEX "TINDXTSTXTIDXLO" ;
        
        CREATE INDEX "TINDXTSTXTIDXLO" ON "TSTXTIDXLO" ("TEXT")
        INDEXTYPE IS "CTXSYS"."CONTEXT"  PARAMETERS ('Wordlist tswordlist lexer tslexer SYNC( ON COMMIT ) STOPLIST tsstoplist STORAGE tsstore ')
        PARALLEL 1 ;
        ALTER INDEX "TINDXTSTXTIDXLO" PARALLEL 4;

        NOTE: Once you enable the Use custom text indexing elements for index option, you must re-index the records.

        NOTE: When you re-index the records, for the Use custom text indexing elements for index option to remain enabled, make sure to select all the Word Types in the Select Word Types tab.

  4. Click OK.

IMPORTANT: When upgrading to SQL Text indexing it is necessary to run a Reindex to ensure all existing supported records/objects can be searched for using text search methods.

Checking the SQL Text Index

  1. Expand Datasets
  2. Right-click the dataset you want to work with and point to SQL Text Indexing and click Check.

Content Manager will run a check to determine that all the required schema elements are available. If issues are found, a warning to repair the index will be displayed.

NOTE: Content Manager will display SQL Text Index is partially created against the Dataset name on the Enterprise Studio home page. This will indicate that a Check and repair is required. Once the repair has completed, a re-index is required.

Reindexing the SQL Text Index

When a new SQL Text Index is created, or there has been an ingestion of objects into the CM dataset, it is necessary to reindex the dataset so the objects are searchable.

  1. Expand Datasets
  2. Right-click the dataset you want to work with and point to SQL Text Indexing and click Reindex. The SQL Text Reindexing Tool dialog is displayed.
  3. In the Select Word Types tab, select the objects to be reindexed.
  4. In the Configuring SQL Text Indexing tab, select the required options. See Configuring SQL Text Indexing.
  5. In the Options tab, select the required options:

    • Limit how many rows can be updated at any one time - select this option to limit the number of rows that are processed at one time and execute the operation in batches. This option may be of use when reindexing large datasets. Set the maximum number of rows to update at any one time. If the reindexing processes are timing out, reduce this number.
    • Use CTAS approach when rows to reindex exceeds - option to reindex using CTAS approach that has better performance than the default one.
    • Use single thread per word type - if checked, the re-indexing process runs in parallel using a single thread per word type.
    • Insert notes in chunks - if checked, the notes and text of additional fields are copied using Insert or Update command in chunks.

  6. Click OK to run the reindex. The Dataset Work In Progress dialog will be displayed.
  7. Select the Pause After option you want to apply:
    • Each Step - pause after each step is completed
    • Major Steps - pause after each major step is completed
    • Complete All Steps - complete all steps without pausing
  8. Click Start to start the upgrade process.
  9. Once all steps are completed, click Done to close the Dataset Work In Progress dialog. If required, click View Log to view the generated log file.

Synchronize Stoplist

Select this option to synchronize the SQL Text custom stoplist with the Content Manager noise words.

IMPORTANT: If using PostgreSQL, the trimstopword PostgreSQL stop word file must be located in the PostgreSQL \share\tsearch_data folder before running the Stoplist synchronization.

Recreate SQL Text indexes

Select this option to recreate SQL Text indexes.

  1. Expand Datasets
  2. Right-click the dataset you want to work with and point to SQL Text Indexing and click Recreate indexes. The Recreating SQL Text indexes dialog is displayed.
  3. Select one or more check boxes or click Select All to select indexes to recreate. The following are the options:

    • Record Titles
    • Records Notes
    • Classifications
    • Schedules
    • Thesaurus Terms
    • Location Addresses
    • Communication Addresses
    • Archiving Events
    • Workflows
    • Activities
    • Text Additional Fields
  4. In the Configuring SQL Text Indexing tab, select the required options. See Configuring SQL Text Indexing.
  5. Click OK.