Content Index Properties dialog box
The Content Index Properties dialog box enables you to configure document content indexing in Content Manager. To open the Content Index Properties dialog box, expand the Datasets node, right-click the dataset name, point to Content Index and click Properties.
See also Setting up document content indexing and searching and CM23.4_IDOL_DCI_Install_Config.pdf or CM23.4_ElasticsearchInstall_Config.pdf.
CAUTION: You must suspend the content indexing processor before changing the properties of the document content index. See Configuring event processing.
- Right-click the dataset to work with, point to Content Index and then click Create Elasticsearch Content Index. The Elasticsearch Content index properties - <database name> dialog appears.
- Provide detailed logging for content indexing operations - select this option to generate detailed log files.
- A CFS connector is also indexing this dataset - select this option if you have also installed and configured the CFS connector. Content Manager will send messages to the CFS connector to ensure that the Elasticsearch index is up to date with changes to the Content Manager database.
- Use system HTTP proxy when connecting to Elasticsearch - select this option if your organization needs to connect to Elasticsearch using a proxy.
- Elasticsearch Server URL - enter the Elasticsearch Server URL. If you use X-Pack authentication, specify the HTTPS URL of the Elasticsearch server here.
- Elasticsearch Index Name - name of the Elasticsearch index to use for this Content Manager dataset. The name can contain only alphanumeric characters and underscores.
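The naming rule can be checked before a name is typed into the dialog. The following is a minimal Python sketch; the function name is hypothetical and not part of Content Manager. Beyond the rule stated above, it also applies two restrictions that Elasticsearch itself enforces: index names must be lowercase and must not start with an underscore. Whether Content Manager normalizes the name for you is not stated here, so the sketch simply rejects such names.

```python
import re

# Rule from the dialog: only letters, digits and underscores.
_ALLOWED = re.compile(r"[A-Za-z0-9_]+")


def is_valid_index_name(name: str) -> bool:
    """Return True if `name` satisfies the dialog rule and the Elasticsearch rules checked here."""
    # fullmatch rejects empty names, spaces, hyphens and other punctuation.
    if not _ALLOWED.fullmatch(name):
        return False
    # Elasticsearch rejects index names that contain uppercase letters...
    if name != name.lower():
        return False
    # ...and names that start with an underscore.
    return not name.startswith("_")


print(is_valid_index_name("cm_dataset_45"))  # True
print(is_valid_index_name("CM Dataset"))     # False
```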
- Enable Stemming - from the drop-down list, select the Elasticsearch stemmer to use. If the required stemmer is not available, type the name of the Elasticsearch stemmer. If you enter a custom stemmer, ensure that it is supported by the Elasticsearch version you are using.
If no stemmer is chosen, the content is indexed without stemming.
Selecting a stemmer that matches the language used in the majority of the records and documents returns more relevant search results.
This option is disabled once the index has been created, and it cannot be changed unless the index is removed and re-created.
- Number of Shards - sets the number of shards in each Elasticsearch index. The default setting is 3. Avoid setting many shards per index, because this can degrade performance. Organizations should avoid shard sizes of more than 50GB.
This option will be disabled and cannot be changed once the index has been created.
For example, if the index size is 60GB and the number of shards is three, the index has three shards of about 20GB each. The final index size varies depending on how you have configured shard replication on your Elasticsearch cluster; in this example, replication is turned off.
- Enable index lifecycle management policy - select this option to roll the index over to another index when the index rollover capacity is reached. This is useful in large environments that ingest a lot of new content. If this option is not selected and no index size is specified, the index can grow toward a theoretical maximum at which point the Elasticsearch index may experience issues, for example, a shard size that exceeds 50GB. A lifecycle policy based on index size keeps the shard size under 50GB while your content index grows over time. Indexes can be rolled over based on the following settings:
- Index Rollover Maximum Size - set the maximum size of the index before it is rolled over to another index.
- Index Rollover Maximum Age - set the maximum age of the index before it is rolled over to another index.
- Index Rollover Maximum Documents - set the maximum number of Elasticsearch JSON documents that can be created before it is rolled over to another index.
- Index Rollover Maximum Primary Shard Size - triggers a rollover when the size of the largest primary shard reaches this value.
- Index Rollover Maximum Primary Documents - triggers a rollover when the number of documents in a primary shard reaches this value.
These options will be disabled and cannot be changed once the index has been created.
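The shard example and the rollover settings above reduce to simple arithmetic. The following Python sketch, which assumes replication is turned off as in the example, computes the approximate shard size, the smallest shard count that keeps shards at or under 50GB, and whether any configured rollover threshold has been reached. The function names and the dictionary keys ("size_gb", "docs" and so on) are hypothetical labels, not Content Manager or Elasticsearch settings; the "any threshold reached" logic mirrors how Elasticsearch treats multiple rollover conditions.

```python
import math

MAX_SHARD_GB = 50  # recommended upper bound on shard size, from the text above


def shard_size_gb(index_size_gb: float, shards: int) -> float:
    """Approximate size of each primary shard, assuming replication is turned off."""
    return index_size_gb / shards


def min_shards(index_size_gb: float, max_shard_gb: float = MAX_SHARD_GB) -> int:
    """Smallest shard count that keeps every shard at or below max_shard_gb."""
    return max(1, math.ceil(index_size_gb / max_shard_gb))


def should_roll_over(current: dict, limits: dict) -> bool:
    """Return True as soon as any configured rollover threshold has been reached.

    A setting that is absent from `limits` is simply not checked.
    """
    return any(
        key in current and current[key] >= threshold
        for key, threshold in limits.items()
    )


print(shard_size_gb(60, 3))  # 20.0
print(min_shards(120))       # 3
print(should_roll_over({"size_gb": 55, "docs": 10_000}, {"size_gb": 50}))  # True
```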
- Index text that was deleted from a document with revision tracking - select this option to index content that was deleted from a document that was edited using revision tracking/ "tracked changes".
- Index hidden text from Microsoft Office files - select this option to index content that is marked 'hidden' in documents, for example, hidden cells in Microsoft Excel.
- Index the tag names within XML files - select this option to index XML Format text files.
- Set Microsoft Rights Management Services (RMS) credentials - enter the credentials for the Microsoft Rights Management Services (RMS) to allow Microsoft Information Protection (MIP) encrypted documents to be indexed.
- Tenant ID - enter the RMS Tenant ID.
- Set Client Secret - if required, click the Set Client Secret option to set the password.
- Client ID - enter the RMS Client ID.
- Elasticsearch Transaction Limit (KB) - sets the size limit for content to be sent to Elasticsearch for indexing. The transaction limit can be set between 1KB and 1,000,000KB.
• Documents that are indexed via the Content Index event processor are queued and processed in batches on a timer event. If the amount of text extracted for the queued documents is greater than the transaction limit, it is split up and sent in as many transactions as required to complete the indexing. If it is less than the transaction limit, it is sent as part of the timer event.
• For documents that are reindexed on the client, via the Reindex option in the Elasticsearch group on the Administration tab in the Content Manager client, the client sends the transactions to Elasticsearch. If all the extracted text is smaller than the transaction limit, the client sends it in one transaction; otherwise, it is split into batches based on the transaction limit until all documents have been indexed.
- Elastic Document Content Field size limit (KB) - set the maximum size of the Content field in an Elasticsearch document that is created as part of the text extraction process. When a large file is indexed, the extraction process creates multiple Elasticsearch documents and links them together by URI so that they all reference the same record. The document content field size can be set between 1KB and 100,000KB.
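The two limits above work at different levels: the field size limit splits one large document's text into several Elasticsearch documents, while the transaction limit groups queued text into requests. The following Python sketch models both under stated assumptions; it is not Content Manager's implementation. It assumes 1KB means 1024 bytes of UTF-8 text, and the function names are hypothetical.

```python
def split_content(text: str, field_limit_kb: int) -> list[str]:
    """Split one document's extracted text into chunks that fit the Content field limit."""
    if field_limit_kb < 1:
        raise ValueError("the field size limit must be at least 1KB")
    limit = field_limit_kb * 1024  # assumption: 1KB = 1024 bytes of UTF-8
    data = text.encode("utf-8")
    chunks, start = [], 0
    while start < len(data):
        end = min(start + limit, len(data))
        # Step back so a multi-byte UTF-8 character is never cut in half.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks


def batch_by_limit(texts: list[str], transaction_limit_kb: int) -> list[list[str]]:
    """Group queued texts into transactions whose total size stays within the limit.

    In this sketch, a single text larger than the limit is sent in a transaction of its own.
    """
    limit = transaction_limit_kb * 1024
    batches, current, current_size = [], [], 0
    for text in texts:
        size = len(text.encode("utf-8"))
        if current and current_size + size > limit:
            batches.append(current)
            current, current_size = [], 0
        current.append(text)
        current_size += size
    if current:
        batches.append(current)
    return batches


print([len(c) for c in split_content("a" * 3000, 1)])  # [1024, 1024, 952]
print([len(b) for b in batch_by_limit(["a" * 600, "b" * 600, "c" * 300], 1)])  # [1, 2]
```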
- Search Result Buffer Size - default value of 2000. For customers running searches that retrieve many results, increasing the buffer size will reduce the number of service calls to the Elasticsearch server to retrieve all the results.
- Maximum number of results in any search - default: 100,000. The maximum number of results Content Manager retrieves from Elasticsearch. If a search finds more results than this number, a warning is displayed and the user receives only this number of results. The maximum value that can be set is 10,000,000.
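The interaction between Search Result Buffer Size and the maximum number of results can be estimated with a little arithmetic. This Python sketch assumes, as a simplification, that each service call retrieves one buffer's worth of results; the function name is hypothetical. Under the same assumption the arithmetic also applies to the IDOL settings of the same names.

```python
import math


def search_summary(total_found: int, buffer_size: int = 2000,
                   max_results: int = 100_000) -> dict:
    """Estimate what a search returns and how many buffer-sized fetches that takes."""
    returned = min(total_found, max_results)
    return {
        "returned": returned,
        "warning": total_found > max_results,  # the user is warned about truncation
        "calls": math.ceil(returned / buffer_size),
    }


print(search_summary(250_000))                    # {'returned': 100000, 'warning': True, 'calls': 50}
print(search_summary(250_000, buffer_size=5000))  # {'returned': 100000, 'warning': True, 'calls': 20}
```

Raising the buffer size from 2000 to 5000 cuts the estimated number of calls for a full result set from 50 to 20, which is the effect the Search Result Buffer Size description refers to.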
- Elasticsearch Request timeout (s) - limits how long Content Manager waits for Elasticsearch to process a single request.
- Elasticsearch scrolling search timeout (s) - default: 600. Limits how long Elasticsearch keeps the search context alive for Content Manager. Accepts a value between 60 and 86400. When the scroll timeout expires during an Elasticsearch query, the following warning message is displayed: Content Manager Workgroup Server on 'local' reported an error. Could not serialize the server-side recordset. The search context has expired for this Elasticsearch query. You will need to reissue the request.
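In Elasticsearch, the lifetime of a scroll search context is set with the scroll parameter of a search request, expressed as a time value such as 600s. The following Python sketch validates the dialog's range and builds such a URL. How Content Manager passes this value internally is not described here, so treat the URL as an illustration only; the server address, index name and function names are hypothetical.

```python
from urllib.parse import urlencode


def scroll_keep_alive(seconds: int) -> str:
    """Validate the dialog value and format it as an Elasticsearch time value."""
    if not 60 <= seconds <= 86400:
        raise ValueError("scrolling search timeout must be between 60 and 86400 seconds")
    return f"{seconds}s"


def scroll_search_url(base_url: str, index: str, seconds: int) -> str:
    """Build a search URL that asks Elasticsearch to keep the scroll context alive."""
    query = urlencode({"scroll": scroll_keep_alive(seconds)})
    return f"{base_url.rstrip('/')}/{index}/_search?{query}"


print(scroll_search_url("https://es.example.com:9200", "cm_dataset_45", 600))
# https://es.example.com:9200/cm_dataset_45/_search?scroll=600s
```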
- Maximum size of uncompressed file that can be content indexed (MB) - default: 2048. Content Manager does not index the content of documents whose file size is greater than this number in megabytes.
- Maximum size of any archive file (zip, gz, tar) that can be content indexed (MB) - default: 2048. Content Manager does not index the content of archive files whose file size is greater than this number in megabytes.
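The two size limits can be expressed as a single check. This Python sketch assumes that MB means 1024 × 1024 bytes and that a file counts as an archive based on its .zip, .gz or .tar extension; both are assumptions, and the function names are hypothetical. Because only files greater than the limit are skipped, a file exactly at the limit is still indexed.

```python
import os

ARCHIVE_EXTENSIONS = (".zip", ".gz", ".tar")
BYTES_PER_MB = 1024 * 1024  # assumption: MB means 1024 x 1024 bytes


def within_size_limit(filename: str, size_bytes: int,
                      max_file_mb: int = 2048, max_archive_mb: int = 2048) -> bool:
    """Return True if a file of this size would be content indexed."""
    is_archive = filename.lower().endswith(ARCHIVE_EXTENSIONS)
    limit_mb = max_archive_mb if is_archive else max_file_mb
    # Files greater than the limit are skipped, so a file exactly at the limit is indexed.
    return size_bytes <= limit_mb * BYTES_PER_MB


def file_within_size_limit(path: str, **limits) -> bool:
    """Convenience wrapper that reads the file size from disk."""
    return within_size_limit(os.path.basename(path), os.path.getsize(path), **limits)
```

For the IDOL options described later on this page, the defaults differ; pass max_file_mb=200 and max_archive_mb=50 instead.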
- Enable Elasticsearch X-Pack authentication - select this option to enable X-Pack authentication. You can either specify a user name and password or a certificate.
- User Name - enter the user name to connect to the Elasticsearch server.
- Set Password - click Set Password to enter the password for the user being used for authentication.
- Client Certificate - if using a certificate for the X-Pack authentication, the certificate must be installed to the Personal store of the Local Computer account. Once installed, the certificate will appear in the Client Certificate drop-down list. Select the required certificate from the drop-down list.
TIP: If you get certificate validation errors, ensure that the local computer trusts the certificate authority of the certificate that the Elasticsearch server is presenting.
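With a user name and password, Elasticsearch security commonly accepts HTTP Basic credentials over HTTPS. The following Python sketch prepares such a request with the standard library without sending it, which can be useful when you want to reproduce a connection problem outside Content Manager. It does not show what Content Manager sends internally, does not cover the client certificate option, and the URL and credentials are placeholders.

```python
import base64
import urllib.request


def build_es_request(es_url: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTPS request carrying HTTP Basic credentials."""
    if not es_url.lower().startswith("https://"):
        raise ValueError("use the https-based Elasticsearch URL with X-Pack authentication")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(es_url, headers={"Authorization": f"Basic {token}"})


req = build_es_request("https://es.example.com:9200/", "cm_indexer", "s3cret")
print(req.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would send it; if that fails with a certificate error, the TIP above applies.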
- Enable Amazon Web Service (AWS) authentication - select this option to use the Amazon Web Services (AWS) version of the Elasticsearch service.
- Access Key - enter the AWS Access Key. This is the username for the AWS IAM user that has access to the Elasticsearch services inside AWS.
- Secret Key - enter the AWS Secret Key. This is the password for the Access Key user.
- AWS Region - enter the AWS Region name.
- Use Amazon Security Token Service (STS) - select this option to use STS to create and provide trusted users with temporary security credentials that control access to your AWS resources.
- URL - enter the URL for the STS endpoint (example: https://sts.ap-southeast-1.amazonaws.com)
- Region - enter the AWS STS Region name (example: ap-southeast-1)
- RoleARN - enter the Amazon Resource Name (ARN) of the role that delegates access to the Amazon AWS resource for the AWS IAM user.
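Before entering the STS values, a quick format check can catch typos. This Python sketch builds a regional STS endpoint URL in the same form as the URL example above and checks that a value looks like an IAM role ARN (arn:<partition>:iam::<12-digit account ID>:role/<name>). The endpoint pattern shown applies to the standard commercial AWS regions only, and the function names and sample role name are hypothetical.

```python
import re

# arn:<partition>:iam::<12-digit account>:role/<optional path/>role-name
_ROLE_ARN = re.compile(r"arn:aws(-[a-z]+)*:iam::\d{12}:role/[\w+=,.@/-]+")


def sts_endpoint(region: str) -> str:
    """Regional STS endpoint URL, in the form shown in the URL example above."""
    return f"https://sts.{region}.amazonaws.com"


def looks_like_role_arn(value: str) -> bool:
    """Cheap format check before pasting a value into the RoleARN field."""
    return _ROLE_ARN.fullmatch(value) is not None


print(sts_endpoint("ap-southeast-1"))  # https://sts.ap-southeast-1.amazonaws.com
print(looks_like_role_arn("arn:aws:iam::123456789012:role/cm-es-indexer"))  # True
```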
For an IDOL content index, the dialog provides the following options:
- Query Server Name - the fully qualified name of the computer that runs the Content Manager IDOL Service (the main IDOL Server).
- Port - default: 9000. The IDOL Server port number used for searching. Indexing uses a separate port; see Index Port.
- Index Port - default: 9001. Used for sending documents to the server.
The port numbers should remain at their defaults unless you specifically changed the IDOL configuration file in the folder C:\Micro Focus Content Manager\IDOL\TRIM IDOL Service or your equivalent.
- Use encrypted communications (OEM IDOL) - select this option if the IDOL server is the OEM IDOL that is shipped with Content Manager.
- Server Instance name - name of the instance on the IDOL server that this dataset communicates with. Defaults to CM_[database ID]. Do not change this name unless you reindex the dataset afterwards, because Content Manager Enterprise Studio creates a new instance on the IDOL server when the instance name changes.
- Create Instance - select this option to create the IDOL instance, CM_[database ID], for this Content Manager dataset to use.
- Delete Instance - select this option to delete the IDOL instance. This will destroy the content index, requiring a full reindex (or restore from backup) to get it back.
- Test - click to test the setup. If it fails, it means that Content Manager cannot communicate with the IDOL Server. Check that the Content Manager Enterprise Studio IDOL configuration data above is correct, for example the server name. Next, check that the IDOL Server is running correctly:
- For the IDOL OEM version, check your setup against the installation and configuration instructions in CM23.4_IDOL_DCI_Install_Config.pdf
- If you are using IDOL Enterprise Server, check your IDOL setup against the IDOL documentation. If you are using the Content Manager CFS connector, see CM23.4_IDOL_DCI_Install_Config.pdf for installation and configuration details.
- Index directly into IDOL via CM Event Processor - if this option is selected, whenever Content Manager detects that a document or record metadata requires reindexing, it calls the IDOL server directly to do this, using the Content Manager IDOL format.
- A CFS connector is indexing this dataset (for an external database, or for searching within Content Manager) - select this option if you are using an Enterprise version of IDOL, and have also installed and configured the CFS connector. Content Manager will send messages to the CFS connector to ensure that the IDOL index is up to date with changes to the Content Manager database.
- Maximum queued transactions - default: 20, which is the number of records in the queue. The smaller this number, the shorter the queue, which makes processing more reliable. There should be no need to change this number other than for troubleshooting. In that case, you could even reduce it to 1 to process one record at a time.
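Why reducing the queue to 1 helps troubleshooting can be seen in a simplified model. In the following Python sketch, records are handed to an indexing callback at most max_queued at a time, and any batch that fails is recorded. With the default of 20, a failure only narrows the problem to a group of records; with 1, it names the failing record. This is a hypothetical model of the queue, not Content Manager's event processor, and the function name is illustrative.

```python
from collections import deque
from typing import Callable, Iterable


def process_in_batches(records: Iterable, index_batch: Callable[[list], None],
                       max_queued: int = 20) -> list[list]:
    """Hand records to `index_batch` at most `max_queued` at a time; return failed batches."""
    if max_queued < 1:
        raise ValueError("max_queued must be at least 1")
    queue = deque(records)
    failed = []
    while queue:
        batch = [queue.popleft() for _ in range(min(max_queued, len(queue)))]
        try:
            index_batch(batch)
        except Exception:
            failed.append(batch)  # with max_queued=1, each failure names a single record
    return failed
```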
- Maximum size of any file that can be content indexed (MB) - default: 200. Content Manager does not index the content of documents whose file size is greater than this number in megabytes.
- Maximum size of any archive file (zip, gz, tar) that can be content indexed (MB) - default: 50. Content Manager does not index the content of archive files whose file size is greater than this number in megabytes.
- Provide extensive logging for index management - select for detailed log files.
See Logging for the Content Manager and IDOL content indexing log file locations.
- Index text that was deleted from a document with revision tracking - select this option to index content that was deleted from a document that was edited using revision tracking/ "tracked changes".
- Index hidden text from Microsoft Office files - select this option to index content that is marked 'hidden' in documents, for example, hidden cells in Microsoft Excel.
- Index the tag names within XML files - select this option to index XML Format text files.
- Set Microsoft Rights Management Services (RMS) credentials - enter the credentials for the Microsoft Rights Management Services (RMS) to allow Microsoft Information Protection (MIP) encrypted documents to be indexed.
- Tenant ID - enter the RMS Tenant ID.
- Set Client Secret - if required, click the Set Client Secret option to set the password.
- Client ID - enter the RMS Client ID.
- IDOL Connection Timeout (milliseconds) - default: 60000 (one minute). The number of milliseconds the Workgroup Server waits on its connection with IDOL before timing out. If searches are timing out, you may want to change this value. It should be higher than the value of QueryTimeoutInMilliseconds in the IDOL configuration file C:\Program Files\Micro Focus\Content Manager\IDOL\TRIM IDOL Service\TRIM IDOL Service.cfg. To diagnose the issue: when IDOL times out before Content Manager, it returns a subset of the results, which appears in Content Manager as usual; when Content Manager times out before IDOL, the Workgroup Server displays an error message.
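The diagnostic rule above can be written down directly. This Python sketch reads QueryTimeoutInMilliseconds from the text of the IDOL configuration file and predicts which side times out first. It assumes the setting appears as a Name=value line; the [Server] section in the sample is a hypothetical excerpt, not a copy of the real file, and the function names are illustrative.

```python
import re
from typing import Optional


def read_query_timeout_ms(cfg_text: str) -> Optional[int]:
    """Find a QueryTimeoutInMilliseconds=<n> line in IDOL configuration text, if present."""
    match = re.search(r"^\s*QueryTimeoutInMilliseconds\s*=\s*(\d+)\s*$", cfg_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def diagnose_timeouts(cm_timeout_ms: int, idol_query_timeout_ms: int) -> str:
    """Predict which side times out first, following the rule described above."""
    if cm_timeout_ms > idol_query_timeout_ms:
        return "IDOL times out first: a partial result set appears in Content Manager as usual"
    return ("Content Manager may time out first: the Workgroup Server displays an error; "
            "raise the IDOL Connection Timeout")


sample_cfg = "[Server]\nQueryTimeoutInMilliseconds=45000\n"  # hypothetical excerpt
print(diagnose_timeouts(60000, read_query_timeout_ms(sample_cfg)))
```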
- Maximum number of results in any search - default value of 100,000 - the maximum number of results Content Manager will get back from IDOL. If the IDOL part of a search finds more than this number, a warning displays and the user only gets this number. The maximum value this can be set to is 10,000,000.
- Search Result Buffer Size - default value of 2000. For customers running searches that retrieve many results, increasing the buffer size will reduce the number of service calls to the IDOL server to retrieve all the results.
- Find records from other datasets - for when a user searches against an IDOL instance that was created from a different dataset (for example, an indexed production database whose IDOL index is copied across to a training database). Content Manager does not search multiple IDOL instances; however, if this option is not selected, IDOL results that do not match the current database are not returned.
NOTE: If the Maximum IDOL Results value is altered, the IDOL configuration files will also need to be updated so that the values match. See Advanced configuration of OEM IDOL.
The options on this tab allow for a connection to an Enterprise IDOL instance via SSL/TLS.
For a basic setup, only the following option is required to be enabled:
- Enable SSL/TLS connection to IDOL - select this option to enable TLS between the client and the IDOL server. The version of TLS will depend on what is configured in IDOL and what is supported by the client operating system. Any errors due to invalid certificates will be ignored by the client.
Communication from the client to an IDOL service uses both an HTTP/REST interface and IDOL's ACI API. The REST component requires a certificate from the local certificate store, and the ACI API component requires a separate certificate/key file in PEM format. For more secure environments where certificate checking is enabled or required, the following fields are required:
- Client Certificate - from the drop-down list select the SSL certificate used to establish a secure connection with the IDOL Proxy component using HTTP/REST. This is a certificate that has been added to the local certificate store and must include the private key.
- View Certificate - click View Certificate to view the associated certificate.
- ACI Client API Certificate Path - click Browse to navigate to and select the SSL Certificate file used to identify the client to the IDOL Proxy component via the ACI API. The required file format is PEM.
- ACI Client API Private Key Path - click Browse to navigate to and select the Private Security Key file corresponding to the selected SSL Certificate (Client Certificate). The required file format is PEM.
- ACI Client API CA Certificate Path - click Browse to navigate to and select the Certificate Authority (CA) certificate file of a trusted authority. The required file format is PEM. The IDOL Proxy component trusts communication only with a peer that provides a certificate signed by the specified CA.
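The dialog manages these connections itself, but the relationship between the basic setup and the three PEM files can be illustrated with Python's standard ssl module. In this sketch, the basic setup uses TLS while ignoring certificate errors, as described for Enable SSL/TLS connection to IDOL. The secure setup loads the client certificate and private key and, on the client side, uses the CA file to verify the peer's certificate. It is an analogy under these assumptions, not Content Manager's code, and the function name is hypothetical.

```python
import ssl
from typing import Optional


def make_idol_tls_context(cert_path: Optional[str] = None,
                          key_path: Optional[str] = None,
                          ca_path: Optional[str] = None) -> ssl.SSLContext:
    """Build a client TLS context for an IDOL connection."""
    if ca_path is None:
        # Basic setup: TLS is used, but certificate errors are ignored.
        context = ssl.create_default_context()
        context.check_hostname = False  # must be disabled before setting CERT_NONE
        context.verify_mode = ssl.CERT_NONE
    else:
        # Secure setup: only trust a peer certificate signed by this CA (PEM file).
        context = ssl.create_default_context(cafile=ca_path)
    if cert_path:
        # Client certificate and matching private key (PEM) identify this client.
        context.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return context
```

For the secure setup, you would call make_idol_tls_context("client.pem", "client.key", "ca.pem") with your own file paths; these names are placeholders.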
- Click OK to save your changes.
- To continue document content indexing, set the content index processor to Enabled.
- In the menu ribbon, click Save and then Deploy to save your configuration changes and deploy them to the Workgroup Servers.