Processing and processing agent
The following FAQs address questions about file processing and the processing agent.
Some tasks require multiple processes, or steps, to complete. A task can show a "waiting" status when it is between steps and waiting to be picked up for the next step. For example, you want to send the items in a workbook to a target. Only the metadata was indexed for the items in this workbook, and the source and destination are managed by different agent clusters. In this scenario, the items must be collected before they can be sent to the defined target. After the items are collected, the task may show a "waiting" status while it waits to be picked up to send the items to the target.
The assigned agent may not be reachable. If the task status remains "waiting", ensure that the agents in the agent cluster assigned to the task are running and accessible. Specifically, verify that the agentAPI service is running on the agent host assigned to perform the task.
OpenText Core Data Discovery & Risk Insights tracks file deletions when a processing job runs against a dataset. A job run occurs when a dataset is updated, either on a schedule or manually from the Manage Datasets page in Connect (click the inline update icon for the dataset or the Update button in the dataset detail panel).
File systems
OpenText Core Data Discovery & Risk Insights tracks file deletions by directly comparing with the original file system location identified by the dataset path. Items are removed from the application index seven days after the deletion from the source location is detected. If an item within a container file (such as ZIP) is deleted in the original file system location, the item is removed from the index as part of updating the container file when the job run occurs. In this case, the item may be removed from the index sooner than seven days after deletion is detected.
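The comparison and seven-day delay described above can be pictured with a minimal Python sketch. It is illustrative only and not the product's implementation: the function names, the in-memory sets of paths, and the GRACE_PERIOD constant are assumptions made for this example. Only the seven-day figure comes from the text.

```python
from datetime import date, timedelta

# Assumption: the seven-day delay stated in the text, modeled as a constant.
GRACE_PERIOD = timedelta(days=7)


def detect_deletions(indexed_paths, source_paths):
    """Return indexed paths that no longer exist at the source location."""
    return set(indexed_paths) - set(source_paths)


def earliest_removal_date(detected_on):
    """Date on which an item detected as deleted is due to leave the index."""
    return detected_on + GRACE_PERIOD


# Hypothetical data: three items in the index, two still present at the source.
indexed = {"/share/a.docx", "/share/b.xlsx", "/share/c.pdf"}
on_disk = {"/share/a.docx", "/share/c.pdf"}

missing = detect_deletions(indexed, on_disk)
removal = earliest_removal_date(date(2024, 3, 1))
print(sorted(missing), removal)
```

In this sketch, a deletion detected on 1 March would make the item due for removal on 8 March. Items inside container files are not modeled here; as noted above, they may leave the index sooner.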
Exchange
OpenText Core Data Discovery & Risk Insights does not detect deletions from Exchange. It retains items it has already processed until a delete action is initiated from the application.
SharePoint
OpenText Core Data Discovery & Risk Insights tracks the deletion of managed SharePoint items made at the original source location using the SharePoint change logs. Each time processing is run on a dataset—on a schedule, or on demand—OpenText Core Data Discovery & Risk Insights checks the SharePoint change logs for deleted items. For each managed item that is deleted in SharePoint, that item is deleted from OpenText Core Data Discovery & Risk Insights. If an item within a container file (such as ZIP) is deleted in SharePoint, the item is removed from the application as part of updating the container file when the job run occurs.
To ensure accurate tracking of items deleted from SharePoint, ensure that the SharePoint datasets in OpenText Core Data Discovery & Risk Insights are updated more often than the maximum number of days SharePoint change logs are kept. OpenText Core Data Discovery & Risk Insights uses information from the SharePoint change logs to identify deleted SharePoint items to be removed from the application. Without this information, items deleted from your SharePoint environment cannot be removed from the application. SharePoint items that have been added or modified are appropriately updated in OpenText Core Data Discovery & Risk Insights.
For example, if your SharePoint change logs are configured to be stored for 60 days, verify that your SharePoint datasets are updated at least every 59 days.
CAUTION: Failure to rescan SharePoint datasets before SharePoint change logs are purged will result in items being tracked incorrectly in OpenText Core Data Discovery & Risk Insights. Using the same example, if your SharePoint change logs are configured to be stored for 60 days and your SharePoint datasets are updated every 90 days, you will lose 30 days of important information about deleted items: items deleted during this 30-day time frame will not be removed from the application.
The loss of information cannot be reconciled in OpenText Core Data Discovery & Risk Insights; you would have to create a new dataset and start over.
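The arithmetic behind this caution can be written out as a short Python sketch. The function names are hypothetical and exist only for this illustration. The same calculation applies to any source whose deletion records are purged after a fixed retention period.

```python
def max_safe_update_interval(retention_days):
    """Longest update interval, in days, that stays inside the retention period."""
    return retention_days - 1


def undetected_deletion_window(retention_days, update_interval_days):
    """Days of deletion history lost between two dataset updates (0 if none)."""
    return max(0, update_interval_days - retention_days)


# Change logs kept for 60 days, datasets updated every 90 days.
print(max_safe_update_interval(60))        # 59
print(undetected_deletion_window(60, 90))  # 30
print(undetected_deletion_window(60, 59))  # 0
```

A result greater than zero means that deletions made during that many days are never seen by the application, which is the situation the caution describes.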
Content Manager
OpenText Core Data Discovery & Risk Insights tracks the deletion of managed Content Manager items using the Content Manager delete events. Each time processing is run on a dataset—on a schedule, or on demand—the application checks the delete events. For each managed item that is deleted in Content Manager, the application deletes that item from the index. If an item within a container file (such as ZIP) is deleted from Content Manager, the item is removed from the index as part of updating the container file when the job run occurs.
To ensure accurate tracking of items deleted from Content Manager, ensure that the Content Manager datasets in OpenText Core Data Discovery & Risk Insights are updated more often than the maximum number of days the Content Manager delete events are kept. For example, if your Content Manager administrator purges delete events every 60 days, verify that your Content Manager datasets are updated at least every 59 days.
Documentum
OpenText Core Data Discovery & Risk Insights tracks deletions of managed Documentum items using the Documentum audit trail. OpenText Core Data Discovery & Risk Insights uses the dm_destroy event from the Documentum audit trail to identify deleted Documentum items to be removed from the application. Without this information, items deleted from your Documentum environment cannot be removed from the application. Documentum items that have been added or modified are appropriately updated in OpenText Core Data Discovery & Risk Insights.
To ensure accurate tracking of items deleted from Documentum, ensure that the Documentum datasets in OpenText Core Data Discovery & Risk Insights are updated more often than the Documentum audit trail is purged. For example, if your Documentum audit trail is configured to be purged after 90 days, verify that your Documentum datasets are updated at least every 89 days.
CAUTION: Failure to rescan Documentum datasets before the Documentum audit trail is purged will result in items being tracked incorrectly in OpenText Core Data Discovery & Risk Insights. For instance, if your Documentum audit trail is configured to be purged after 60 days and your Documentum datasets are updated every 90 days, you will lose 30 days of important information about deleted items: items deleted during this 30-day time frame will not be removed from the application.
The loss of information cannot be reconciled in OpenText Core Data Discovery & Risk Insights; you would have to create a new dataset and start over.
Google Drive
OpenText Core Data Discovery & Risk Insights tracks the deletion of managed Google Drive items using the change log for the Google Drive defined by the source in Connect. Each time processing is run on a dataset (on a schedule or on demand), the application checks the change logs for deleted items. For each managed item that is deleted in Google Drive, the application deletes that item from the application index. If an item within a container file (such as ZIP) is deleted in Google Drive, the item is removed from the index as part of updating the container file when the job run occurs.
To ensure accurate tracking of items deleted from Google Drive, ensure that the Google Drive datasets in Connect are updated more often than the maximum number of days Google Drive change logs are kept. For example, the default retention for change logs is 30 days. Verify that your Google Drive datasets are updated at least every 29 days.
As the processing agent reads each file, it generates a SHA-384 (Secure Hash Algorithm) checksum of the file. SHA-384 is a hash function that takes an input, in this case the file content, and produces an item hash value that is stored in the index. This means that OpenText Core Data Discovery & Risk Insights generates a fingerprint that identifies a file by its content, independent of the name of the file.
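As an illustration of content-based fingerprinting, the following Python sketch computes a SHA-384 checksum with the standard hashlib module. It is not the processing agent's code: the file_sha384 helper, the chunk size, and the sample file names are assumptions made for this example.

```python
import hashlib
import os
import tempfile


def file_sha384(path, chunk_size=1024 * 1024):
    """Return the SHA-384 hex digest of a file's content, read in chunks."""
    digest = hashlib.sha384()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Two hypothetical files with identical content but different names.
with tempfile.TemporaryDirectory() as tmp:
    first = os.path.join(tmp, "report.txt")
    second = os.path.join(tmp, "copy-of-report.txt")
    for path in (first, second):
        with open(path, "wb") as handle:
            handle.write(b"quarterly figures")
    hash_first = file_sha384(first)
    hash_second = file_sha384(second)

print(hash_first == hash_second)  # True
```

Because only the bytes of the file are hashed, renaming or copying a file leaves its hash value unchanged, while changing the content produces a different hash value.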
In a deduplication task workbook, you either define a dataset that represents the official records to compare against, or you define rules that identify the master items to compare against. The identified duplicate items (based on the metadata and content associated with the hash) and all family members of those items (such as attachments or the parent item) are added to the workbook.
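The hash comparison at the core of deduplication can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the workbook's actual logic: the find_duplicates function, the (path, hash) pair layout, and the short placeholder hash values are hypothetical, and metadata comparison and family members are not modeled.

```python
from collections import defaultdict


def find_duplicates(master_items, candidate_items):
    """Map each master hash to the candidate paths that share that hash.

    Both arguments are iterables of (path, item_hash) pairs.
    """
    master_hashes = {item_hash for _, item_hash in master_items}
    duplicates = defaultdict(list)
    for path, item_hash in candidate_items:
        if item_hash in master_hashes:
            duplicates[item_hash].append(path)
    return dict(duplicates)


# Hypothetical data: "h1" and "h2" stand in for real item hash values.
masters = [("/records/policy.docx", "h1")]
candidates = [
    ("/archive/copy1.docx", "h1"),
    ("/archive/other.docx", "h2"),
    ("/mail/attach.docx", "h1"),
]

duplicates = find_duplicates(masters, candidates)
print(duplicates)
```

Here the two candidates that share the master item's hash are reported as duplicates, and the candidate with a different hash is ignored.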