Manage sources

In OpenText Core Data Discovery & Risk Insights, a source defines the initial connection to a specific data platform through a selected agent cluster. A dataset defines the subset of the source, the rules and schedule for processing, and the entities to identify within the data during processing.

You must create at least one agent cluster before you can create sources and associated datasets. Because a dataset defines a subset of a source, you must create a source before you can create an associated dataset.
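The creation order described above (agent cluster, then source, then dataset) can be pictured with a small model. This is only an illustrative sketch: the `Registry` class and its method names are hypothetical and are not part of any OpenText API; they simply encode the two prerequisites stated in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory model of the prerequisites; not the product's data model."""
    clusters: set[str] = field(default_factory=set)
    sources: dict[str, str] = field(default_factory=dict)   # source name -> agent cluster name
    datasets: dict[str, str] = field(default_factory=dict)  # dataset name -> source name

    def add_cluster(self, name: str) -> None:
        self.clusters.add(name)

    def add_source(self, name: str, cluster: str) -> None:
        # A source needs an existing agent cluster.
        if cluster not in self.clusters:
            raise ValueError(f"agent cluster {cluster!r} must exist before source {name!r}")
        self.sources[name] = cluster

    def add_dataset(self, name: str, source: str) -> None:
        # A dataset is a subset of a source, so the source must exist first.
        if source not in self.sources:
            raise ValueError(f"source {source!r} must exist before dataset {name!r}")
        self.datasets[name] = source


registry = Registry()
registry.add_cluster("cluster-a")
registry.add_source("finance-share", "cluster-a")
registry.add_dataset("finance-2024", "finance-share")
```

Calling `add_source` or `add_dataset` out of order raises `ValueError` in this sketch, which mirrors the order enforced in the application.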

On the Manage Sources page, you can filter the list of sources by source TYPE and AGENT CLUSTER, or search for a source by name.

TIP: For unstructured sources, the displayed size reflects either "Root Documents" or "Root + Child Documents". To switch views, click the current view name ("Root Documents" or "Root + Child Documents") above the data cards and select the other view. Root documents are top-level documents. Child documents are documents within a container, such as a ZIP file.

From the Manage Sources page, you can view additional information about each source.

NOTE: Your assigned application permissions affect your access to some functions.

  • Hover over or click in the row for a source to display action icons.

    • You can edit (edit icon) or delete (delete icon) the source.

    • For unstructured data sources with documents, you can go to the data volume chart (data volume chart inline icon), focused on the source. For file system and SharePoint sources containing data, you can go to the sensitive data heat map (sensitive data heat map inline icon) in Manage, focused on the source.

    • For structured data sources, you can inventory (structured data inventory icon) the source to identify tables present in the source and go to the data volume chart (data volume chart inline icon), focused on the source.

  • Click anywhere in the row for the desired source and then click the open detail panel icon (open detail panel icon). In the detail panel, you can take action against the source and review details about it.

    • You can edit and delete the source.

    • For unstructured data sources with documents, you can go to the data volume chart (data volume chart detail panel icon), focused on the source. For file system and SharePoint sources containing data, you can also go to the sensitive data heat map (sensitive data heat map detail panel icon) in Analyze, focused on the source.

    • For structured data sources, you can inventory the source to identify tables included in the source.

Review the following tasks and considerations for each source type:

  • For all source types, at least one agent cluster must exist prior to creating a source. Selection of an agent cluster is required when you create a source.

  • For all source types, keep in mind that processing of data does not occur at the source level, only at the dataset level.

  • For all source types, you have the option to limit access in OpenText Core Data Discovery & Risk Insights by granting only specific users or groups access to the source.

    CAUTION: If you limit access to a source and an underlying dataset, users without access cannot view workspaces with a data source that includes the dataset, and cannot view individual items that originated from the dataset.

  • For Exchange sources,

    • before creating Exchange sources and datasets, see Exchange connection to complete additional tasks required for processing data from Exchange.

    • the Exchange source uses an agent to connect to the mail server to process new items, as well as items that already exist on the mail server. This method processes items based on user mailboxes and therefore includes folder information and is subject to user actions (such as deletion).

  • For file system sources,

    • ensure that your CIFS shares can be accessed by the "System" account.

    • if your file system allows long paths, the machine hosting the processing agent must also be enabled for long paths.

  • For SharePoint sources, only the latest revision of a document is processed. See SharePoint connection to complete the tasks necessary for connection.

  • For OneDrive sources, only the latest revision of a document is processed. See OneDrive connection to complete the tasks necessary for connection.

  • For Content Manager sources, see Content Manager integration to complete the tasks necessary for connection.

  • For Google Drive sources,

    • a dataset connects to a single user's Google Drive. To process items for multiple users, create a Google Drive dataset for each desired user. See Google Drive connection to complete the tasks necessary for connection.

      NOTE: OpenText Core Data Discovery & Risk Insights supports processing of data from Google Workspace's Drive; data from personal Google Drives is not supported.

    • shortcut files that exist in Google Drive are not processed.

  • For Documentum sources, the Documentum host URL must be unique across sources. You can define a single Documentum repository per dataset. For additional information about connecting to Documentum sources, see Documentum connection.

  • For Extended ECM sources, you can create a single dataset per source. For additional information about connecting to Extended ECM, see Extended ECM connection.

  • For Confluence sources, see Confluence connection to complete the tasks necessary for connection.

  • For structured data sources, once a source has been inventoried, you cannot change the agent cluster from a cloud agent to a non-cloud agent, or from a non-cloud agent to a cloud agent. For additional information about connecting to structured data, see Structured data connection.

Create a source

You can create sources as needed to connect to data you want processed. The available options vary based on the source type. Follow the instructions for the desired source type.

Edit a source

Sources can be edited as needed. When editing a source, keep the following in mind.

  • If items exist in associated unstructured datasets with OCR enabled or if the structured source has been inventoried, you cannot change the agent cluster type from a cloud-based cluster to a non-cloud cluster, or from a non-cloud cluster to a cloud-based cluster.

  • If you change a path or any of the credentials, you are required to re-enter the password for the defined user.

  • If you change the Security permissions, the change may take up to five minutes to take effect. Alternatively, the user can update their roles and permissions more quickly by logging out of OpenText Core Data Discovery & Risk Insights and then logging back in.

    NOTE: If an associated user is currently viewing a workspace affected by the change, the user must navigate away from and back to the workspace to view the change.
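The agent cluster restriction in the first consideration above can be expressed as a small check. This is a minimal sketch under stated assumptions: `SourceState` and `cluster_type_change_allowed` are hypothetical names used for illustration, and the logic only restates the documented rule rather than the application's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class SourceState:
    """Hypothetical summary of a source's state; field names are assumptions."""
    structured: bool
    inventoried: bool = False             # relevant to structured sources
    ocr_datasets_have_items: bool = False  # relevant to unstructured sources


def cluster_type_change_allowed(state: SourceState,
                                current_is_cloud: bool,
                                target_is_cloud: bool) -> bool:
    """Return True if switching agent cluster type is permitted under the documented rule."""
    if current_is_cloud == target_is_cloud:
        # Staying within the same cluster type is not restricted by this rule.
        return True
    if state.structured:
        # An inventoried structured source cannot switch cloud <-> non-cloud.
        return not state.inventoried
    # An unstructured source with items in OCR-enabled datasets cannot switch.
    return not state.ocr_datasets_have_items


inventoried_db = SourceState(structured=True, inventoried=True)
print(cluster_type_change_allowed(inventoried_db, current_is_cloud=True, target_is_cloud=False))
```

The example prints `False`, because an inventoried structured source cannot move from a cloud-based cluster to a non-cloud cluster.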

Inventory a structured data source

The initial inventory of a structured data source is generated when the source is created and saved. The action of inventorying returns the available schemas, the number of tables, and row count. You can manually start an inventory of a structured source to retrieve updates from the source location.

Remove connection to a source

You can remove the connection to a source ("delete" the source) as needed. For unstructured data sources, there cannot be any datasets associated with the source; for structured data sources, there cannot be any datasets associated with the source and there cannot be an active inventory process for the source.

IMPORTANT: If you delete a structured data source, the associated inventory information is removed from OpenText Core Data Discovery & Risk Insights.
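The deletion preconditions above can be summarized as a simple pre-check. The function below is a hypothetical illustration, not a product API: `deletion_blockers` and its parameters are assumed names, and the sketch only lists the reasons, as stated in this section, why a delete would be refused.

```python
def deletion_blockers(structured: bool,
                      dataset_count: int,
                      inventory_active: bool = False) -> list[str]:
    """Return the documented reasons a source cannot be deleted (empty list if none)."""
    blockers: list[str] = []
    if dataset_count > 0:
        # Applies to both unstructured and structured sources.
        blockers.append("datasets are still associated with the source")
    if structured and inventory_active:
        # Applies to structured sources only.
        blockers.append("an inventory process is active for the source")
    return blockers


# A structured source with no datasets but a running inventory is blocked.
print(deletion_blockers(structured=True, dataset_count=0, inventory_active=True))
# An unstructured source with no datasets can be deleted.
print(deletion_blockers(structured=False, dataset_count=0))
```

An empty list means none of the documented preconditions block the deletion. For a structured source, remember that deleting it also removes the associated inventory information.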