Manage sources

In Fusion, a source defines the initial connection to a specific data platform through a selected agent cluster. A dataset defines the subset of the source, the rules and schedule for processing, and the entities to identify within the data during processing.

You must create at least one agent cluster before you can create sources and associated datasets. Because a dataset defines a subset of a source, you must create a source before you can create an associated dataset.
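The creation order described above can be sketched as a small in-memory model. This is only an illustration of the dependency chain (agent cluster, then source, then dataset). The `Catalog` class and its method names are hypothetical and do not correspond to any Fusion API.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical in-memory model of the creation order; not a Fusion API."""
    agent_clusters: set = field(default_factory=set)
    sources: dict = field(default_factory=dict)   # source name -> agent cluster name
    datasets: dict = field(default_factory=dict)  # dataset name -> source name

    def create_agent_cluster(self, name):
        self.agent_clusters.add(name)

    def create_source(self, name, agent_cluster):
        # A source can only be created against an existing agent cluster.
        if agent_cluster not in self.agent_clusters:
            raise ValueError(f"agent cluster {agent_cluster!r} does not exist")
        self.sources[name] = agent_cluster

    def create_dataset(self, name, source):
        # A dataset defines a subset of a source, so the source must exist first.
        if source not in self.sources:
            raise ValueError(f"source {source!r} does not exist")
        self.datasets[name] = source


catalog = Catalog()
catalog.create_agent_cluster("cluster-a")           # step 1: agent cluster
catalog.create_source("hr-share", "cluster-a")      # step 2: source
catalog.create_dataset("hr-2024", "hr-share")       # step 3: dataset
```

Attempting the steps out of order (for example, creating a dataset for a source that does not exist) raises `ValueError` in this sketch.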

On the Manage Sources page, you can filter the list of sources by source TYPE and AGENT CLUSTER, or search for a source by name.

From the Manage Sources page, you can view additional information about each source.

NOTE: Your assigned Fusion permissions affect your access to some functions.

  • Hover over or click in the row for a source to display action icons.

    • You can edit (edit icon) or delete (delete icon) the source.

    • For unstructured data sources with documents, you can go to the data volume chart (data volume chart inline icon), focused on the source. For file system and SharePoint sources containing data, you can go to the sensitive data heat map (sensitive data heat map inline icon) in Manage, focused on the source.

    • For structured data sources, you can inventory (structured data inventory icon) the source to identify the tables present in it, and you can go to the data volume chart (data volume chart inline icon), focused on the source.

  • Click anywhere in the row for the desired source and then click the open detail panel icon (open detail panel icon). You can take action against the source and review details about the source.

    • You can edit and delete the source.

    • For unstructured data sources with documents, you can go to the data volume chart (data volume chart detail panel icon), focused on the source. For file system and SharePoint sources containing data, you can also go to the sensitive data heat map (sensitive data heat map detail panel icon) in Analyze, focused on the source.

    • For structured data sources, you can inventory the source to identify tables included in the source.

The following source types are supported.

Source type       Version or platform supported
file system       CIFS/SMB2.0 shares
Exchange          2016, 2019, Office 365
SharePoint        2016, Office 365
Content Manager   10.1.x
                  IMPORTANT: Only Microsoft SQL Server RDB datasets are supported at this time.
Google Drive      not applicable
                  TIP: A source is associated with a Google Workspace for the domain that includes the desired users' drives. Each dataset for the source is associated with a single user account's Google Drive.
Documentum        22.4.0 or later
Extended ECM      23.2.2 or later
structured data

Review the following tasks and considerations for each source type:

  • For all source types, at least one agent cluster must exist prior to creating a source. Selection of an agent cluster is required when you create a source.

  • For all source types, keep in mind that processing of data does not occur at the source level, only at the dataset level.

  • For all source types, you have the option to limit access in Fusion by granting only specific users or groups access to the source.

    CAUTION: If you limit access to a source and an underlying dataset, users without access cannot view workspaces with a data source that includes the dataset, and cannot view individual items that originated from the dataset.

  • For Exchange sources:

    • Before creating Exchange sources and datasets, see Exchange connection to complete additional tasks required for processing data from Exchange.

    • The Exchange source uses an agent to connect to the mail server to process new items, as well as items that already exist on the mail server. This method processes items based on user mailboxes; it therefore includes folder information and is subject to user actions (such as deletion).

  • For file system sources:

    • Ensure that your CIFS shares can be accessed by the "System" account.

    • If your file system allows long paths, the machine hosting the processing agent must also be enabled for long paths.

  • For SharePoint sources, only the latest revision of a document is processed. See SharePoint connection to complete the tasks necessary for connection.

  • For Content Manager sources, see Content Manager integration to complete the tasks necessary for connection.

  • For Google Drive sources:

    • A dataset connects to a single user's Google Drive. To process items for multiple users, create a Google Drive dataset for each desired user. See Google Drive connection to complete the tasks necessary for connection.

      NOTE: Fusion supports processing of data from Google Workspace's Drive; data from personal Google Drives is not supported.

    • Shortcut files that exist in a Google Drive are not processed.

  • For Documentum sources, the Documentum host URL must be unique across sources. You can define a single Documentum repository per dataset. For additional information about connecting to Documentum sources, see Documentum connection.

  • For Extended ECM sources, you can create a single dataset per source. For additional information about connecting to Extended ECM, see Extended ECM connection.

  • For structured data sources, once a source has been inventoried, you cannot change the agent cluster from a cloud agent to a non-cloud agent, or from a non-cloud agent to a cloud agent. For additional information about connecting to structured data, see Structured data connection.
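The agent cluster restriction for structured data sources can be expressed as a simple check. This is a minimal sketch under two assumptions that the text implies but does not state outright: before a source is inventoried, any change is allowed, and after inventory, changes between agent clusters of the same kind (cloud to cloud, or non-cloud to non-cloud) remain allowed. The function and parameter names are hypothetical.

```python
def can_change_agent_cluster(inventoried: bool,
                             current_is_cloud: bool,
                             new_is_cloud: bool) -> bool:
    """Hypothetical check; not a Fusion API."""
    # Assumption: no restriction applies until the source has been inventoried.
    if not inventoried:
        return True
    # After inventory, the cluster kind (cloud vs. non-cloud) must stay the same.
    return current_is_cloud == new_is_cloud
```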

You can remove the connection to a source ("delete" the source) as needed. An unstructured data source can be deleted only if no datasets are associated with it. A structured data source can be deleted only if no datasets are associated with it and no inventory process is active for it.

IMPORTANT: If you delete a structured data source, the associated inventory information is removed from Fusion.
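The deletion preconditions above can be summarized as a check that lists what currently blocks a delete. This is an illustrative sketch only. `SourceState`, its fields, and `deletion_blockers` are hypothetical names, not part of Fusion.

```python
from dataclasses import dataclass


@dataclass
class SourceState:
    """Hypothetical snapshot of a source; field names are illustrative."""
    structured: bool                 # True for structured data sources
    dataset_count: int               # datasets currently associated with the source
    inventory_active: bool = False   # only relevant for structured data sources


def deletion_blockers(source: SourceState) -> list[str]:
    """Return the reasons the source cannot be deleted; empty means deletable."""
    blockers = []
    # Applies to every source type: no associated datasets may remain.
    if source.dataset_count > 0:
        blockers.append("source still has associated datasets")
    # Applies to structured data sources only: no inventory may be running.
    if source.structured and source.inventory_active:
        blockers.append("an inventory process is active for the source")
    return blockers
```

An empty list means both conditions are met. For a structured data source, remember that deleting it also removes its inventory information from Fusion.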