Data collection

When repositories were created in Connect, the data within those repositories was processed in one of several ways. The manner of processing determines whether an individual item can be previewed in File Analysis Suite.

  • If only metadata was processed, the content of the processed items was not collected and stored in the index. These items cannot be viewed in File Analysis Suite.

  • If metadata and content were processed (an analyzed document), a plain text version of the content was collected and stored in the index. These items can be viewed in File Analysis Suite in a simplified view (no formatting).

If you cannot view the content of individual items, you may not be able to review them fully. From within an individual File Analysis Suite data source, you can collect items that were not collected when they were processed. During collection, items are de-duplicated to ensure that only one copy of each item is stored and tracked.
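To illustrate the idea of de-duplication during collection, here is a minimal Python sketch. It is not File Analysis Suite's implementation: it assumes duplicates are detected by a SHA-256 hash of the item content, and the function name collect and the store layout are hypothetical names chosen for this example only.

```python
import hashlib


def collect(items, store):
    """Store each distinct item once, keyed by a hash of its content.

    items: iterable of (name, content_bytes) pairs.
    store: dict that maps a content hash to the stored copy and the
           names of every item that shares that content.
    Returns the number of copies newly added to the store.

    Assumption: content hashing is used here only to illustrate
    de-duplication; the product may identify duplicates differently.
    """
    added = 0
    for name, content in items:
        digest = hashlib.sha256(content).hexdigest()
        if digest not in store:
            # First time this content is seen: keep a single copy.
            store[digest] = {"content": content, "sources": []}
            added += 1
        # Track every item name that maps to the stored copy.
        store[digest]["sources"].append(name)
    return added
```

In this sketch, collecting two items with identical content stores one copy and records both item names against it, which mirrors the "stored and tracked" behavior described above.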

Once a collection has started, you can view the progress of the collection in the data source detail panel. You can only run one collection against a data source at any given time. If you need to run another collection, wait for the current collection to complete.

For any documents placed on hold that were not originally collected, a collection is automatically triggered at the time the hold is created. You can view the status of the collection on the COLLECTION ACTIVITY card on the workspace Overview page.

TIP: Once you gather documents into workbooks, you have the option to process all documents in a workbook to index the content (if only metadata was processed) or collect the documents. This method can potentially reduce the number of documents you collect since you have already limited your document set to what is in a given workbook.

You can also cancel a collection that is in progress. Items that are already being processed will complete and be collected, but no further items are collected from this data source by the canceled collection. Once canceled, a collection cannot be restarted. You must run another collection to collect the remaining items from the data source.
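The collection rules described in this topic (one collection per data source at a time, and no restart after cancellation) can be summarized in a small Python model. This is only an illustrative sketch: the class names, state names, and methods are hypothetical and do not correspond to any File Analysis Suite API.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    # Hypothetical states: "running" -> "completed" or "canceled".
    state: str = "running"

    def complete(self):
        if self.state == "running":
            self.state = "completed"

    def cancel(self):
        # A canceled collection is final; there is no restart operation.
        if self.state == "running":
            self.state = "canceled"


@dataclass
class DataSource:
    collections: list = field(default_factory=list)

    def start_collection(self):
        # Only one collection may run against a data source at a time.
        if any(c.state == "running" for c in self.collections):
            raise RuntimeError("a collection is already running for this data source")
        collection = Collection()
        self.collections.append(collection)
        return collection
```

In this model, a second call to start_collection fails while the first collection is running. After the first collection is canceled or completed, a new collection can be started, and it is a separate collection rather than a continuation of the earlier one.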