Processing and processing agent
The following FAQs address questions about file processing and the processing agent.
Some tasks require multiple processes, or steps, to complete, and a task can show a "waiting" status while it sits between steps, waiting to be picked up for the next one. For example, suppose you want to send the items in a workbook to a target, only the metadata for those items was indexed, and the source and destination are managed by different agent clusters. In this scenario, the items must be collected before they can be sent to the defined target, so the task may show a "waiting" status after the items are collected while it waits to be picked up for the send step.
A "waiting" status can also mean that the assigned agent is not reachable. If the task status remains "waiting", ensure that the agents in the agent cluster assigned to the task are running and accessible. Specifically, verify that the agentAPI service is running on the agent host assigned to perform the task.
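If you want a quick network-level check before inspecting the service itself, a short script can confirm that the agent host accepts connections at all. The following is a minimal sketch only; the host name and port are placeholders for whatever your environment uses, not values defined by Fusion.

    import socket

    # Placeholders: substitute your agent host and the port your agentAPI service listens on.
    AGENT_HOST = "agent01.example.com"
    AGENT_PORT = 443

    def agent_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
        """Return True if a TCP connection to the agent host succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    if agent_reachable(AGENT_HOST, AGENT_PORT):
        print(f"{AGENT_HOST}:{AGENT_PORT} is reachable")
    else:
        print(f"{AGENT_HOST}:{AGENT_PORT} is not reachable; check the agentAPI service and network path")

If the connection succeeds but the task still waits, continue by confirming that the agentAPI service itself is running on that host.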
Fusion tracks file deletions when a processing job runs against a Fusion dataset. A job run occurs when a dataset is updated, either on a schedule or manually from the Manage Datasets page in Connect (click the inline update icon for the dataset or the Update button in the dataset detail panel).
File systems
Fusion tracks file deletions by directly comparing the index with the original file system location identified by the dataset path. Items are removed from the Fusion index seven days after the deletion is detected. If an item within a container file (such as a ZIP file) is deleted in the original file system location, the item is removed from the index as part of updating the container file when the Fusion job run occurs. In this case, the item may be removed from Fusion sooner than seven days after the deletion is detected.
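The comparison described above can be pictured as a reconciliation between the paths recorded in the index and the files currently present under the dataset path, with a grace period before removal. The sketch below is illustrative only; the data structures and purge logic are assumptions modeled on the behavior described here, not Fusion internals.

    from datetime import datetime, timedelta
    from pathlib import Path

    GRACE_PERIOD = timedelta(days=7)  # purge an item seven days after its deletion is detected

    def reconcile(indexed: dict, dataset_path: str, now: datetime) -> None:
        """Compare indexed paths against the file system and purge items whose
        deletion was first detected more than GRACE_PERIOD ago.

        `indexed` maps each indexed file path to the time its deletion was first
        detected, or None if the file was present at the last check."""
        on_disk = {str(p) for p in Path(dataset_path).rglob("*") if p.is_file()}
        for path, deleted_at in list(indexed.items()):
            if path in on_disk:
                indexed[path] = None            # still present (or restored)
            elif deleted_at is None:
                indexed[path] = now             # deletion detected on this job run
            elif now - deleted_at >= GRACE_PERIOD:
                del indexed[path]               # remove from the index after seven days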
Exchange
Fusion does not detect deletions from Exchange. Fusion retains items it has already processed until a delete action is initiated from Fusion.
SharePoint
Fusion tracks the deletion of managed SharePoint items using the SharePoint change logs. Each time processing runs on a dataset, whether on a schedule or on demand, Fusion checks the SharePoint change logs for deleted items. For each managed item that is deleted in SharePoint, Fusion deletes that item from the Fusion index. If an item within a container file (such as a ZIP file) is deleted in SharePoint, the item is removed from the index as part of updating the container file when the Fusion job run occurs.
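Conceptually, change-log-based tracking boils down to reading the source's delete events and dropping the matching managed items from the index; the same pattern applies to the Content Manager and Google Drive sources described below. The sketch is a rough illustration under that assumption; fetch_delete_events is a hypothetical stand-in for reading the SharePoint change logs, not a real API call.

    def remove_deleted_items(index: set, fetch_delete_events) -> None:
        """Remove every managed item that the source's change log reports as deleted."""
        for item_id in fetch_delete_events():
            # Only items already in the index are affected; unknown IDs are ignored.
            index.discard(item_id)

    # Example with a stubbed change log.
    index = {"sites/hr/policy.docx", "sites/hr/handbook.pdf"}
    remove_deleted_items(index, lambda: ["sites/hr/policy.docx"])
    print(index)  # {'sites/hr/handbook.pdf'}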
To ensure accurate tracking of items deleted from SharePoint, make sure that the SharePoint datasets in Fusion are updated within the retention period of the SharePoint change logs. For example, if your SharePoint change logs are kept for 60 days, verify that your SharePoint datasets are updated at least every 59 days.
Content Manager
Fusion tracks the deletion of managed Content Manager items using Content Manager delete events. Each time processing runs on a dataset, whether on a schedule or on demand, Fusion checks the delete events. For each managed item that is deleted in Content Manager, Fusion deletes that item from the Fusion index. If an item within a container file (such as a ZIP file) is deleted from Content Manager, the item is removed from the index as part of updating the container file when the Fusion job run occurs.
To ensure accurate tracking of items deleted from Content Manager, make sure that the Content Manager datasets in Fusion are updated more often than your Content Manager administrator purges delete events. For example, if your Content Manager administrator purges delete events every 60 days, verify that your Content Manager datasets are updated at least every 59 days.
Google Drive
Fusion tracks the deletion of managed Google Drive items using the change log for the Google Drive defined in the Fusion repository. Each time processing runs on a dataset, whether on a schedule or on demand, Fusion checks the change logs for deleted items. For each managed item that is deleted in Google Drive, Fusion deletes that item from the Fusion index. If an item within a container file (such as a ZIP file) is deleted in Google Drive, the item is removed from the index as part of updating the container file when the Fusion job run occurs.
To ensure accurate tracking of items deleted from Google Drive, make sure that the Google Drive datasets in Fusion are updated within the retention period of the Google Drive change logs. For example, the default retention for change logs is 30 days, so verify that your Google Drive datasets are updated at least every 29 days.
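The same update-cadence rule applies to SharePoint, Content Manager, and Google Drive: each dataset must be updated at least once within the period for which the source keeps its deletion records. A quick sanity check might look like the following sketch; the retention and interval values are examples only, so substitute the figures from your own environment.

    # Example retention periods (days the source keeps deletion records) and
    # dataset update intervals. Replace with the values from your environment.
    SOURCES = {
        "SharePoint": {"retention_days": 60, "update_interval_days": 30},
        "Content Manager": {"retention_days": 60, "update_interval_days": 90},
        "Google Drive": {"retention_days": 30, "update_interval_days": 29},
    }

    for name, cfg in SOURCES.items():
        retention, interval = cfg["retention_days"], cfg["update_interval_days"]
        if interval < retention:
            print(f"{name}: OK (updates every {interval} days; records kept {retention} days)")
        else:
            print(f"{name}: WARNING - deletions may be missed; update at least every {retention - 1} days")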
As the processing agent reads each file, it generates a SHA-1 (Secure Hash Algorithm 1) checksum of the file. A hash function takes an input, in this case the file content, and produces a hash value that is stored in the Fusion index. In effect, Fusion generates a fingerprint that identifies a file by its content, independent of the file name.
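For reference, this kind of content fingerprint can be reproduced with any standard SHA-1 implementation. The snippet below uses Python's hashlib and is a generic example, not Fusion code; the file name is a placeholder.

    import hashlib

    def sha1_checksum(path: str, chunk_size: int = 1 << 20) -> str:
        """Return the SHA-1 hex digest of a file's content, read in chunks."""
        digest = hashlib.sha1()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        return digest.hexdigest()

    # Two files with identical content produce the same checksum,
    # no matter what they are named or where they are stored.
    print(sha1_checksum("contract_copy.pdf"))  # placeholder file name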
In a deduplication task workbook, you either define a dataset that represents official records to compare against, or you define rules that identify master items to compare against. Items identified as duplicates (based on the metadata and content associated with the hash) and all family members of those items (such as attachments or the parent item) are added to the workbook.
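As a rough illustration of how hash matching identifies duplicates against a set of official or master records, consider the sketch below. The records, field names, and hash values are invented for the example and do not reflect the Fusion data model.

    from collections import defaultdict

    # Hypothetical official records, keyed by content hash.
    official = {"a3f5...01": "OfficialRecords/contract.pdf"}

    # Hypothetical candidate items from the dataset being deduplicated.
    candidates = [
        {"path": "Shared/contract_copy.pdf", "hash": "a3f5...01"},
        {"path": "Shared/notes.txt", "hash": "9b2c...77"},
    ]

    duplicates = defaultdict(list)
    for item in candidates:
        if item["hash"] in official:
            # Same content hash as an official record: flag the item as a duplicate.
            duplicates[official[item["hash"]]].append(item["path"])

    for master, copies in duplicates.items():
        print(f"{master} has duplicates: {copies}")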