Manage repositories

In File Analysis Suite, a repository defines the subpath on a data source, the rules and schedule for processing, and the grammar rules to identify within the data during processing.

Once you have created at least one source, you can create as many repositories as necessary for that source. For example, suppose you want to process data from only specific directories on a given file system. You would create a source for the file system and then create a repository for each directory on that file system that contains data you want to process. This focuses processing on the desired data and omits data known to be irrelevant.
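
The relationship between one source and several repositories can be sketched in a few lines of Python. This is an illustration only, not the File Analysis Suite API: the `Source` and `Repository` classes, the `covers` method, and the example paths are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class Source:
    """Hypothetical stand-in for a configured data source (a file system here)."""
    name: str
    root: PurePosixPath


@dataclass
class Repository:
    """Hypothetical stand-in for a repository: one subpath on one source."""
    source: Source
    subpath: str

    @property
    def location(self) -> PurePosixPath:
        return self.source.root / self.subpath

    def covers(self, path: str) -> bool:
        # A file is in scope only if it is the subpath itself or lies beneath it.
        p = PurePosixPath(path)
        return p == self.location or self.location in p.parents


# One source for the file system, one repository per directory of interest.
fs = Source("corp-fileshare", PurePosixPath("/mnt/share"))
repos = [Repository(fs, "finance"), Repository(fs, "hr/contracts")]


def in_scope(path: str) -> bool:
    """True if any repository on the source would pick up this path."""
    return any(r.covers(path) for r in repos)
```

In this sketch, a file under /mnt/share/finance is in scope, while a file under /mnt/share/tmp is omitted because no repository covers it.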

On the Manage Repositories page, you can filter the list of repositories by repository TYPE and choose whether to VIEW the list sorted by sources. The analysis information specifies the processing conducted for the repository. The document count for each configured repository reflects the number of parent documents processed (extracted attachments are not included in the count). The document size for each repository represents the size on disk.
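
As a rough illustration of the counting rule above, the following Python sketch counts only parent documents and skips extracted attachments. The `Document` class and its `parent_id` field are assumptions made for this example; they do not describe how the product stores documents.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Document:
    """Hypothetical record for a processed item."""
    doc_id: str
    parent_id: Optional[str] = None  # set only for extracted attachments


def parent_document_count(docs: List[Document]) -> int:
    # Extracted attachments have a parent and are left out of the count.
    return sum(1 for d in docs if d.parent_id is None)


docs = [
    Document("mail-1"),
    Document("att-1", parent_id="mail-1"),  # attachment extracted from mail-1
    Document("file-1"),
]
count = parent_document_count(docs)
```

Here three items were processed, but only the two parent documents would contribute to the displayed count.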

From the Manage Repositories page, you can view additional information about each repository.

  • Hover over or click in the row for a repository to display action icons to edit, scan, activate/deactivate, and delete the repository. For repositories with documents, you can click the inline data volume chart icon to go to the data volume chart, focused on the selected repository. For file system and SharePoint repositories containing data, you can click the inline sensitive data heat map icon to go to the sensitive data heat map in Analyze, focused on the repository.

  • View the processing options (analysis) initially selected for the repository.

  • Click anywhere in the row for the desired repository and then click the open detail panel icon to display repository details. The detail panel includes the options defined for the repository as well as key information about the documents in the repository.

    From the detail panel, you can edit, scan, activate/deactivate, and delete the repository. For repositories with documents, you can click the data volume chart icon in the detail panel to go to the data volume chart, focused on the selected repository. For file system and SharePoint repositories containing data, you can click the sensitive data heat map icon in the detail panel to go to the sensitive data heat map in Analyze, focused on the repository.

    • Click the Change link next to the schedule information to open the Edit repository dialog at the Schedule information.

    • If the grammar sets have been updated since the repository was created, a warning icon and a Re-analyze link display next to the Grammar Sets information. Any grammar set applied to the repository that has since been updated (is "out of sync") displays in red text.

      IMPORTANT: Re-analyzing the repository to update the grammar sets is optional. Changes to grammar sets are not applied to the repository automatically. Re-analyzing a repository for which the content has not been stored requires reprocessing the full repository, which may take additional time and increase server load.

    • On the METRICS tab, view the number of documents in each state: metadata only processed, analyzed, content stored, collected, and on hold.

    • On the GRAMMARS tab, view the grammar types and grammar rules defined for the repository.

    • On the ACTIVITY tab, view the details of the last 10 activities performed. If more than 10 activities have been performed, click the MORE link to see the full list for the repository on the Agent Activity page.

    • On the SECURITY tab, view the security policy and associated users and groups that have been given specific access to the repository.
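
The "out of sync" check described for the Grammar Sets information can be pictured as a timestamp comparison. The Python sketch below is a guess at the idea, not the product's logic: it assumes each grammar set records when it was last updated and when it was applied to the repository, and the set names PII and PCI are invented for the example.

```python
from datetime import datetime
from typing import Dict, List


def out_of_sync(applied_at: Dict[str, datetime],
                updated_at: Dict[str, datetime]) -> List[str]:
    """Return grammar sets updated after they were applied to the repository.

    Both mappings are hypothetical: grammar set name -> timestamp.
    """
    return sorted(
        name
        for name, applied in applied_at.items()
        # A set with no recorded update is treated as still in sync.
        if updated_at.get(name, applied) > applied
    )


applied = {"PII": datetime(2023, 1, 10), "PCI": datetime(2023, 1, 10)}
updated = {"PII": datetime(2023, 3, 1), "PCI": datetime(2022, 12, 1)}
stale = out_of_sync(applied, updated)
```

In this example only PII was updated after it was applied, so it is the one set that would be flagged, and re-analyzing remains an optional follow-up.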

Once a repository is created, you can edit it as needed. When editing a repository, keep the following in mind.

  • If items exist in this repository, some options may be dimmed and cannot be edited.

  • If items exist in this repository, you cannot change the agent cluster type from a cloud-based cluster to an on-premises cluster, or from an on-premises cluster to a cloud-based cluster.

  • If you change any of the credentials, you are required to re-enter the password for the defined user.

  • You can load the schedule, attribute, and grammar options from a template, even if the repository is based on a different template. This action overrides any existing schedule, attribute, or grammar options, which you can then refine further as needed.

  • Changes to existing grammar selections trigger a reprocessing of the repository.

    If you have repositories that include grammars applied before the introduction of grammar sets, you will see the originally applied grammars in the list of selected grammars. If you select grammar sets, the originally applied grammars are overridden.
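
The editing rules above can be summarized as a small validation routine. The Python sketch below is illustrative only: `RepoSettings`, `review_edit`, and the field names are hypothetical, and a user name stands in for "any of the credentials".

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class RepoSettings:
    """Hypothetical subset of the editable repository options."""
    cluster_type: str              # "cloud" or "on-premises"
    username: str
    grammar_sets: FrozenSet[str]


def review_edit(current: RepoSettings, proposed: RepoSettings,
                item_count: int, password_supplied: bool) -> Tuple[List[str], bool]:
    """Return (blocking errors, whether the edit triggers reprocessing)."""
    errors = []
    # The cluster type is locked once items exist in the repository.
    if item_count > 0 and proposed.cluster_type != current.cluster_type:
        errors.append("cluster type cannot change while items exist")
    # Any credential change requires the password to be entered again.
    if proposed.username != current.username and not password_supplied:
        errors.append("password must be re-entered when credentials change")
    # Changing the grammar selections triggers reprocessing.
    reprocess = proposed.grammar_sets != current.grammar_sets
    return errors, reprocess


current = RepoSettings("cloud", "svc_scan", frozenset({"PII"}))
proposed = RepoSettings("on-premises", "svc_scan", frozenset({"PII", "PCI"}))
errors, reprocess = review_edit(current, proposed, item_count=5,
                                password_supplied=False)
```

With five items already in the repository, the cluster type change is rejected, and the changed grammar selection would trigger reprocessing once the edit is otherwise valid.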

CAUTION: Do not change the repository location (File System directory, SharePoint site URL, or Content Manager dataset) unless the location has actually changed. The physical location (or Exchange group name) must already have changed before you update it in this repository.

Do not change an Exchange group name unless the name of the group has changed. The group name in Exchange must be changed prior to updating the group name in this repository. Changes to group names may affect tracking of delete activities.

If you want a repository with the same definitions as the repository you are editing, create a new repository.

If you need to process a repository outside of the scheduled run time, you can manually start a scan of the repository. If you request to scan a repository and the repository is currently processing, the scan request is not acted upon. The scan action cannot be taken while the repository is initializing following repository creation.
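
The outcome of a manual scan request depends on what the repository is doing at the time. A minimal Python sketch of that decision follows; the `RepoState` values and the returned strings are hypothetical labels, not product status codes.

```python
from enum import Enum


class RepoState(Enum):
    """Hypothetical repository states used only for this sketch."""
    INITIALIZING = "initializing"
    IDLE = "idle"
    PROCESSING = "processing"


def request_scan(state: RepoState) -> str:
    if state is RepoState.INITIALIZING:
        # The scan action cannot be taken right after repository creation.
        return "unavailable"
    if state is RepoState.PROCESSING:
        # A request made during processing is not acted upon.
        return "ignored"
    return "started"


results = {state.value: request_scan(state) for state in RepoState}
```

Only an idle repository starts a new scan in this model; the other two states leave the current work untouched.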

Once created, a repository can be deactivated and then activated as needed. A deactivated repository cannot be processed. If the repository was already processing data, no additional data is processed once the repository is deactivated. Deactivated repositories cannot be edited either. Deactivated repositories display a gray icon next to the repository name.

You can remove the connection to a repository ("delete" the repository) if there are no active data sources associated with the repository (through workspaces) and no documents associated with the repository are on hold. If the repository has associated documents, you can deactivate the repository but you cannot delete it.
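
The delete conditions can be read as a checklist. The Python sketch below lists what would block a delete under the rules stated above; the function name, parameters, and messages are assumptions for illustration, and the counts would come from wherever the product tracks them.

```python
from typing import List


def delete_blockers(active_data_sources: int,
                    documents_on_hold: int,
                    document_count: int) -> List[str]:
    """Return the reasons a repository cannot be deleted (empty if it can)."""
    blockers = []
    if active_data_sources > 0:
        blockers.append("active data sources are associated through workspaces")
    if documents_on_hold > 0:
        blockers.append("documents are on hold")
    if document_count > 0:
        # With associated documents, deactivation is the available alternative.
        blockers.append("repository has associated documents; deactivate instead")
    return blockers


empty_repo = delete_blockers(0, 0, 0)
busy_repo = delete_blockers(active_data_sources=1, documents_on_hold=0,
                            document_count=250)
```

An empty result means the repository can be deleted; any entry in the list means it can only be deactivated until the condition is cleared.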