Manage sources
In Fusion, a source defines the initial connection to a specific data platform through a selected agent cluster. A dataset defines the subset of the source, the rules and schedule for processing, and the entities to identify within the data during processing.
You must create at least one agent cluster before you can create sources and associated datasets. Because a dataset defines a subset of a source, you must create a source before you can create an associated dataset.
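The dependency order described above (agent cluster first, then source, then dataset) can be pictured as a small data model. The following Python sketch is illustrative only: the class names, field names, and sample values are hypothetical and do not correspond to any Fusion API.

```python
from dataclasses import dataclass, field


@dataclass
class AgentCluster:
    """Hypothetical stand-in for an agent cluster."""
    name: str


@dataclass
class Source:
    """A connection to one data platform through one agent cluster."""
    name: str
    source_type: str
    agent_cluster: AgentCluster  # required, so a cluster must exist first
    datasets: list = field(default_factory=list, repr=False, compare=False)


@dataclass
class Dataset:
    """A subset of a source; processing happens at this level."""
    name: str
    source: Source  # required, so the source must exist first


def add_dataset(source: Source, name: str) -> Dataset:
    """Create a dataset for an existing source and register it."""
    dataset = Dataset(name=name, source=source)
    source.datasets.append(dataset)
    return dataset


cluster = AgentCluster("cluster-01")
source = Source("Finance share", "file system", cluster)
dataset = add_dataset(source, "FY24 invoices")
```

Because `agent_cluster` and `source` are required fields, the sketch cannot build a source without a cluster or a dataset without a source, which mirrors the creation order.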
On the Manage Sources page, you can filter the list of sources by source TYPE and AGENT CLUSTER, or search for a source by name.
From the Manage Sources page, you can view additional information about each source.
NOTE: Your assigned Fusion permissions affect your access to some functions.
-
Hover over or click in the row for a source to display action icons.
-
You can edit or delete the source.
-
For unstructured data sources with documents, you can go to the data volume chart, focused on the source. For file system and SharePoint sources containing data, you can also go to the sensitive data heat map in Analyze, focused on the source.
-
For structured data sources, you can inventory the source to identify the tables present in the source, and you can go to the data volume chart, focused on the source.
-
-
Click anywhere in the row for the desired source and then click the open detail panel icon. You can take action against the source and review details about the source.
-
You can edit and delete the source.
-
For unstructured data sources with documents, you can go to the data volume chart, focused on the source. For file system and SharePoint sources containing data, you can also go to the sensitive data heat map in Analyze, focused on the source.
-
For structured data sources, you can inventory the source to identify tables included in the source.
-
The following source types are supported.
Source type | Version or platform supported |
---|---|
file system | CIFS/SMB2.0 shares |
Exchange | 2016, 2019, Office 365 |
SharePoint | 2016, Office 365 |
Content Manager | 10.1.x. IMPORTANT: Only Microsoft SQL Server RDB datasets are supported at this time. |
Google Drive | Not applicable. TIP: A source is associated with a Google Workspace for the domain that includes the desired users' drives. Each dataset for the source is associated with the Google Drive of a single user account. |
Documentum | 22.4.0 or later |
Extended ECM | 23.2.2 or later |
structured data | Platforms supported by the built-in structured data processor, and platforms supported by connection to Structured Data Manager. NOTE: For supported versions, see the Structured Data Manager Certification Matrix. |
Review the following tasks and considerations for each source type:
-
For all source types, at least one agent cluster must exist prior to creating a source. Selection of an agent cluster is required when you create a source.
-
For all source types, keep in mind that processing of data does not occur at the source level, only at the dataset level.
-
For all source types, you have the option to limit access in Fusion by granting only specific users or groups access to the source.
CAUTION: If limiting access to a source and an underlying dataset, users without access will not be able to view workspaces with a data source that includes the dataset or view individual items that originated from the dataset.
-
For Exchange sources:
-
Before creating Exchange sources and datasets, see Exchange connection to complete additional tasks required for processing data from Exchange.
-
The Exchange source uses an agent to connect to the mail server to process new items, as well as items that already exist on the mail server. This method processes items based on user mailboxes; it therefore includes folder information and is subject to user actions (such as delete).
-
-
For file system sources:
-
Ensure that your CIFS shares can be accessed by the "System" account.
-
If your file system allows long paths, the machine hosting the processing agent must also be enabled for long paths.
-
-
For SharePoint sources, only the latest revision of a document is processed. See SharePoint connection to complete the tasks necessary for connection.
-
For Content Manager sources, see Content Manager integration to complete the tasks necessary for connection.
-
For Google Drive sources:
-
A dataset connects to a single user's Google Drive. To process items for multiple users, create a Google Drive dataset for each desired user. See Google Drive connection to complete the tasks necessary for connection.
NOTE: Fusion supports processing of data from Google Workspace's Drive; data from personal Google Drives is not supported.
-
Shortcut files that exist on a Google Drive are not processed.
-
-
For Documentum sources, the Documentum host URL must be unique across sources. You can define a single Documentum repository per dataset. For additional information about connecting to Documentum sources, see Documentum connection.
-
For Extended ECM sources, you can create a single dataset per source. For additional information about connecting to Extended ECM, see Extended ECM connection.
-
For structured data sources, once a source has been inventoried, you cannot change the agent cluster from a cloud agent to a non-cloud agent, or from a non-cloud agent to a cloud agent. For additional information about connecting to structured data, see Structured data connection.
-
From the primary navigation panel, click Sources > Manage Sources.
The Manage Sources page opens.
-
Click NEW SOURCE.
The New Source dialog opens to the General page.
-
Complete the General options for the new source and then follow the dialog prompts for the remaining options.
Option | Description |
---|---|
Source name | Type a meaningful, unique name for the source. Limits: Maximum 50 characters. |
Description | (Optional) Type a meaningful description of the source. Limits: Maximum 250 characters. |
Source type | Select whether the source will process Unstructured Data or Structured Data. The available source types update based on your selection. Click the desired source type. The source type cannot be changed after the source is created. IMPORTANT: To process structured data, structured data connection tasks must be completed. |
Agent cluster | Select the agent cluster that will manage this source. Only the clusters that support the selected source are shown. For Exchange, SharePoint, Google Drive, Documentum, and Extended ECM sources, you can select Cloud Cluster to manage a cloud-based source with the built-in cloud-based agent. |
Click NEXT.
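The limits on the General options can be checked before you open the dialog, for example when source names are planned in a spreadsheet or script. The following Python sketch is a hypothetical pre-check, not part of Fusion; the function name is invented, and the uniqueness test assumes an exact string match, which the product may not use.

```python
MAX_NAME_LENGTH = 50          # limit stated for Source name
MAX_DESCRIPTION_LENGTH = 250  # limit stated for Description


def validate_general_options(name, description="", existing_names=()):
    """Return a list of problems; an empty list means the values pass."""
    problems = []
    if not name.strip():
        problems.append("Source name is required.")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"Source name exceeds {MAX_NAME_LENGTH} characters.")
    # Assumption: uniqueness is checked by exact string match.
    if name in existing_names:
        problems.append("Source name is already in use.")
    if len(description) > MAX_DESCRIPTION_LENGTH:
        problems.append(
            f"Description exceeds {MAX_DESCRIPTION_LENGTH} characters."
        )
    return problems
```

For example, `validate_general_options("HR share", existing_names=["HR share"])` reports that the name is already in use.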
-
Complete the Connection options to define the connection to the selected source type.
For File System sources
Option Description
Directory [Enter directory UNC]
Type the UNC path on the source that is your top-level connection point.
Limits:
-
The source path cannot be more than a single directory beyond the host. For example, \\server01.domain.com\folderA. Further path refinement is defined by datasets.
-
The hostname portion of the source path can contain only the following characters:
-
uppercase and lowercase alphanumeric characters
-
. (period)
-
- (dash)
-
_ (underscore)
-
-
The source path cannot contain any of the following special characters:
-
< (less than)
-
> (greater than)
-
: (colon)
-
" (double quote)
-
| (vertical bar or pipe)
-
? (question mark)
-
* (asterisk)
-
/ (forward slash)
-
-
The path cannot contain a segment that consists only of . or .. (that is, . or .. with no other characters before, after, or between backslashes).
-
Not valid:
\\company.domain.com\..
\\company.domain.com\.
-
Valid:
\\company.domain.com\abc..
\\company.domain.com\.abc
-
Username
Type the user name of the user that has access to the source directory you want to process data from.
Limits: Use the format domain\user or machine_name\user.
Password
Type the password for the defined user.
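The Directory limits listed for File System sources can be expressed as a small pre-check. The following Python sketch is one possible reading of those limits, not the validation that Fusion performs; the function name and messages are hypothetical, and the sketch assumes that a path with zero or one directory after the host satisfies the "single directory beyond the host" limit.

```python
import re

# Host names may contain only letters, digits, period, dash, and underscore.
HOST_PATTERN = re.compile(r"[A-Za-z0-9._-]+")
# Characters that may not appear anywhere in the source path.
FORBIDDEN_CHARACTERS = set('<>:"|?*/')


def check_source_path(path):
    """Return a list of problems found in a UNC source path."""
    problems = []
    if not path.startswith("\\\\"):
        return ["Path must start with two backslashes and a host name."]
    found = sorted(FORBIDDEN_CHARACTERS.intersection(path))
    if found:
        problems.append("Forbidden characters: " + " ".join(found))
    # Split "\\host\dir" into the host and the directory segments.
    host, *segments = path[2:].rstrip("\\").split("\\")
    if not HOST_PATTERN.fullmatch(host):
        problems.append("Host name contains unsupported characters.")
    if len(segments) > 1:
        problems.append("Path goes more than one directory beyond the host.")
    if any(segment in (".", "..") for segment in segments):
        problems.append("A path segment consists only of . or ..")
    return problems
```

Applied to the examples above, `\\company.domain.com\..` is reported as a problem, while `\\company.domain.com\abc..` and `\\company.domain.com\.abc` pass.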
For Exchange sources
Option | Description |
---|---|
[Exchange type] | Select Exchange server (default). IMPORTANT: Once the source is created, the Exchange type cannot be changed. |
Username | Type the user name of the Exchange mail user account that has access to the mail servers you want to process data from. Limits: Use the format domain\user or machine_name\user. |
Password | Type the password for the defined user. |
Email Address | Type the email address for the defined user. |
For Exchange Online (O365) sources
The values for the following options were generated when Exchange collection for Fusion was registered in the Exchange admin center during implementation of Fusion. If you do not have this information, contact the administrator of your Fusion environment.
Option | Description |
---|---|
[Exchange type] | Select Exchange Online (O365). IMPORTANT: Once the source is created, the Exchange type cannot be changed. |
Application ID | Type the value for the Application (client) ID as defined in the Exchange admin center. |
Directory ID | Type the value for the Directory (tenant) ID as defined in the Exchange admin center. |
Client Secret | Type the value for the Client secret as defined in the Exchange admin center. |
For SharePoint sources
IMPORTANT: Avoid creating more than one source for any single data location in your environment.
Option Description
[SharePoint type]
Select SharePoint (default).
IMPORTANT: Once the source is created, the SharePoint type cannot be changed.
SharePoint site URL Type the fully qualified URL to the base of the SharePoint site.
If assigned to a cloud agent cluster, you must define the site URL using a secure protocol (HTTPS). A secure connection is always recommended, but not required for non-cloud agent clusters.
Example site collection URL:
https://company.sharepoint.com/sites/team01
Example web application URL:
https://company.sharepoint.com/
IMPORTANT: SharePoint subsites of the defined site are not processed as part of the site. To process a SharePoint site and its subsites, create multiple datasets and define the full sub-URL for each subsite.
Limits: The path cannot contain a segment that consists only of . or .. (that is, . or .. with no other characters before, after, or between slashes).
-
Not valid:
https://company.domain.com/sites/..
https://company.domain.com/./sites
-
Valid:
https://company.domain.com/sites/abc..
https://company.domain.com/sites/.abc/123
Username
Type the user name of the SharePoint user that has access to the SharePoint servers you want to process data from.
Limits: Use the format domain\user or machine_name\user.
Password
Type the password for the defined user.
For SharePoint Online (O365) sources
Option Description
[SharePoint type]
Select SharePoint Online (O365).
IMPORTANT: Once the source is created, the SharePoint type cannot be changed.
Azure Environment
Select the Azure environment where your SharePoint Online instance is hosted.
SharePoint site URL
Type the fully qualified URL to the base of the SharePoint site.
If assigned to a cloud agent cluster, you must define the site URL using a secure protocol (HTTPS). A secure connection is always recommended, but not required for non-cloud agent clusters.
Example site collection URL:
https://company.sharepoint.com/sites/team01
Example web application URL:
https://company.sharepoint.com/
IMPORTANT: SharePoint subsites of the defined site are not processed as part of the site. To process a SharePoint site and its subsites, create multiple datasets and define the full sub-URL for each subsite.
Limits: The path cannot contain a segment that consists only of . or .. (that is, . or .. with no other characters before, after, or between slashes).
-
Not valid:
https://company.domain.com/sites/..
https://company.domain.com/./sites
-
Valid:
https://company.domain.com/sites/abc..
https://company.domain.com/sites/.abc/123
Application ID
Type the value for the Application (client) ID generated as part of the connection tasks.
Client Secret
Type the value for the Client secret generated as part of the connection tasks.
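The SharePoint site URL rules (HTTPS required for cloud agent clusters, and no path segment made up only of . or ..) can be sketched as a pre-check. The following Python example is an illustration under those stated rules only; the function name and messages are hypothetical, and Fusion may apply additional checks.

```python
from urllib.parse import urlsplit


def check_site_url(url, cloud_cluster):
    """Return a list of problems found in a SharePoint site URL.

    cloud_cluster is True when the source is assigned to a cloud
    agent cluster, which requires HTTPS.
    """
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return ["URL must be fully qualified (http:// or https:// and a host)."]
    if cloud_cluster and parts.scheme != "https":
        problems.append("Cloud agent clusters require HTTPS.")
    # urlsplit does not normalize dot segments, so they remain visible here.
    segments = [segment for segment in parts.path.split("/") if segment]
    if any(segment in (".", "..") for segment in segments):
        problems.append("A path segment consists only of . or ..")
    return problems
```

The HTTPS part of the check applies equally to the Host URL option for Documentum and Extended ECM sources, which carries the same cloud agent cluster requirement.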
For Content Manager sources
Option | Description |
---|---|
Dataset URL | Type the fully qualified URL of the Content Manager dataset to be accessed. |
Username | Type the user name of the Content Manager user that was created to access the Content Manager data you want to process. Limits: Use the format domain\user or machine_name\user. |
Password | Type the password for the defined user. |
For Google Drive sources
Option | Description |
---|---|
Application Name | Type the application name for the desired project as defined in your Google Cloud Platform. |
Certificate Upload (JSON) File | Click Choose File. Browse to and select the JSON file with the connection details for the project. NOTE: This file was created as part of the Google Drive connection tasks and contains pertinent connection information. |
For Documentum sources
Option | Description |
---|---|
Host URL | Type the fully qualified URL of the machine hosting the Documentum application. If assigned to a cloud agent cluster, you must define the host URL using a secure protocol (HTTPS). A secure connection is always recommended, but not required for non-cloud agent clusters. |
Username | Type the user name of the Documentum user that will be used to access Documentum data. |
Password | Type the password for the defined user. |
For Extended ECM sources
Option | Description |
---|---|
Host URL | Type the fully qualified URL of the machine hosting the Extended ECM application. If assigned to a cloud agent cluster, you must define the host URL using a secure protocol (HTTPS). A secure connection is always recommended, but not required for non-cloud agent clusters. |
Username | Type the user name of the Extended ECM user that will be used to access Extended ECM data. |
Password | Type the password for the defined user. |
For sources based on custom adapters
There are no connection options. Source Options defined when you created the custom adapter will be defined when you create datasets associated with this source. For more information about custom adapters, see Custom adapters.
For Oracle sources
Option | Description |
---|---|
Host | Type the FQDN or IP address of the host of the database that contains the data you want to process. |
Port | Type the port number used for communication. |
Database Name | Type the name of the desired database. |
Instance Name | Type the name of the desired Oracle instance. |
Username | Type the user name for the database user as appropriate for the environment. |
Password | Type the password for the defined user. |
For Db2 sources
Option | Description |
---|---|
Host | Type the FQDN or IP address of the host of the database that contains the data you want to process. |
Port | Type the port number used for communication. |
Database Name | Type the name of the desired database. |
Username | Type the user name for the database user as appropriate for the environment. |
Password | Type the password for the defined user. |
For SQL Server sources
Option | Description |
---|---|
Authentication Type | Select whether to authenticate to the server using SQL Server Authentication or Windows Authentication. |
Host | Type the FQDN or IP address of the host of the database that contains the data you want to process. |
Port | Type the port number used for communication. |
Instance Name | Type the name of the SQL Server instance that contains the desired database. |
Database Name | Type the name of the desired database. |
Username | Type the user name for the database administrator user. Limits: Displays only when SQL Server Authentication is selected. |
Password | Type the password for the defined user. Limits: Displays only when SQL Server Authentication is selected. |
For PostgreSQL sources
Option | Description |
---|---|
Host | Type the FQDN or IP address of the host of the database that contains the data you want to process. |
Port | Type the port number used for communication. |
Database Name | Type the name of the desired database. |
Username | Type the user name for the database administrator user. |
Password | Type the password for the defined user. |
For MySQL sources
Option | Description |
---|---|
Host | Type the FQDN or IP address of the host of the database that contains the data you want to process. |
Port | Type the port number used for communication. |
Database Name | Type the name of the desired database. |
Username | Type the user name for the database administrator user. |
Password | Type the password for the defined user. |
For JDBC sources
Option | Description |
---|---|
URL | Type the URL of the database to connect to, including the database type, database name/IP, and port. Use the format appropriate for the database type, as in the examples that follow this table. |
Username | Type the user name for the database administrator user. |
Password | Type the password for the defined user. |
Examples:
jdbc:postgresql://myPostgreSQLhost:5432/sdm77
jdbc:mysql://198.51.100.0:3306/mysql
jdbc:db2://myPostgreSQLhost:50000/sample
jdbc:sqlserver://myPostgreSQL.host.com:1433;databaseName=CM101;user=sa;password=Welcome1
jdbc:sybase:Tds:198.51.100.0:5000/master
jdbc:oracle:thin:@16.103.3.88:1521/orclpdb
jdbc:oracle:thin:@localhost:1521:xe
jdbc:oracle:thin:@(DESCRIPTION=(SDU=32768)(enable=broken)(LOAD_BALANCE=yes)(ADDRESS=(PROTOCOL=TCP)(HOST=gvu2707.london.com)(PORT=1525)) (ADDRESS=(PROTOCOL=TCP)(HOST=gvu2923.london.com)(PORT=1525))(CONNECT_DATA=(SERVICE_NAME=SAPDEMI)))
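The JDBC URL formats differ by database type, so it can help to assemble them from the host, port, and database values. The following Python sketch builds URLs in the simple host-and-port shapes shown in the JDBC examples; the function name and template table are hypothetical helpers, the sketch leaves credentials out of the URL, and it does not cover the Oracle SID or descriptor forms.

```python
# Templates follow the shapes of the JDBC examples in this topic.
JDBC_TEMPLATES = {
    "postgresql": "jdbc:postgresql://{host}:{port}/{database}",
    "mysql": "jdbc:mysql://{host}:{port}/{database}",
    "db2": "jdbc:db2://{host}:{port}/{database}",
    "sqlserver": "jdbc:sqlserver://{host}:{port};databaseName={database}",
    "sybase": "jdbc:sybase:Tds:{host}:{port}/{database}",
    # For Oracle, "database" is the service name in this form.
    "oracle": "jdbc:oracle:thin:@{host}:{port}/{database}",
}


def build_jdbc_url(db_type, host, port, database):
    """Build a JDBC URL for one of the database types listed above."""
    try:
        template = JDBC_TEMPLATES[db_type]
    except KeyError:
        raise ValueError(f"No template for database type: {db_type}") from None
    return template.format(host=host, port=port, database=database)
```

For example, `build_jdbc_url("postgresql", "myPostgreSQLhost", 5432, "sdm77")` produces the first example URL in the list.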
For Sybase sources
Option | Description |
---|---|
Host | Type the FQDN or IP address of the host of the database that contains the data you want to process. |
Port | Type the port number used for communication. |
Database Name | Type the name of the desired database. |
Username | Type the user name for the database administrator user. |
Password | Type the password for the defined user. |
Click NEXT.
-
-
Complete the Security options to define whether you want to limit access to the source to specific users and groups.
CAUTION: If limiting access to a source and an underlying dataset, users without access will not be able to view workspaces with a data source that includes the dataset or view individual items that originated from the dataset.
Option | Description |
---|---|
Grant access to all users | Select to not limit access to this source (default). |
Specify the users and groups that will have access | Select to limit access to the source to only the defined users and groups. |
List of Users/Groups | Define the users and groups that will have access to items originating from this source. |
-
In the Enter name or email address box, begin typing a name or email address of a user. As you enter a string in the field, the interface displays names or email addresses matching the string.
-
Click Add to add the selected user or group to the source access list.
To remove a user or group from the source access list, hover over the name in the User/Group column and then click the corresponding remove icon.
-
-
Click FINISH.
The new source is created.
For structured data sources, inventory of the schemas and tables in the defined database begins.
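The effect of the Security options can be pictured as a simple membership check. The following Python sketch is a conceptual model only: the function and parameter names are hypothetical, and it says nothing about how Fusion stores or evaluates permissions.

```python
def can_access_source(user, user_groups, allowed=None):
    """Model the two Security options for a source.

    allowed=None stands for "Grant access to all users".
    Otherwise, allowed is the set of user and group names on the
    source access list.
    """
    if allowed is None:
        return True
    if user in allowed:
        return True
    return any(group in allowed for group in user_groups)


# Hypothetical access list with one user and one group.
access_list = {"jsmith@example.com", "Legal Reviewers"}
```

With this access list, a user who is neither named nor a member of Legal Reviewers is denied, which corresponds to the CAUTION above about workspaces and items that such users can no longer view.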
-
On the Manage Sources page, click the name of the source you want to edit.
TIP: You can also do one of the following:
-
Click or hover over the row for the desired source and then click the edit icon.
-
Click the row for the desired source, click the open detail panel icon, and then click EDIT.
The Edit Source dialog opens.
-
-
Make the necessary changes.
-
If items exist in associated unstructured datasets or if the structured source has been inventoried, you cannot change the agent cluster type from a cloud-based cluster to a non-cloud cluster, or from a non-cloud cluster to a cloud-based cluster.
-
If you change any of the credentials, you are required to re-enter the password for the defined user.
CAUTION: Do not change the source location on the Connection page—File System directory, SharePoint site URL, or Content Manager dataset, database Host—unless the host or path has actually changed. The host or path must have changed (or be changed) prior to updating the location for this source in Fusion.
-
-
Click FINISH.
The source information is edited.
For structured data sources, inventory begins if the connection information was edited.
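The restriction on changing the agent cluster type when you edit a source can be summarized as a small decision function. The following Python sketch restates that rule with hypothetical parameter names; it is a reading aid, not Fusion logic.

```python
def can_change_cluster_type(*, structured, inventoried=False,
                            datasets_have_items=False,
                            from_cloud, to_cloud):
    """Return True if the agent cluster change is allowed by the rule.

    structured: True for a structured data source.
    inventoried: the structured source has been inventoried.
    datasets_have_items: associated unstructured datasets contain items.
    from_cloud / to_cloud: whether the current and target clusters
    are cloud-based.
    """
    if from_cloud == to_cloud:
        return True  # the cluster type itself does not change
    if structured:
        return not inventoried
    return not datasets_have_items
```

For example, an inventoried structured source cannot move from a non-cloud cluster to a cloud-based cluster, but it can move between two non-cloud clusters.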
On the Manage Sources page, click or hover over the row for the desired structured data source and then click the inventory icon.
TIP: You can also click the row for the desired structured data source, click the open detail panel icon, and then click INVENTORY.
The inventory action begins. When completed, create datasets for the structured data source.
You can remove the connection to a source ("delete" the source) as needed. For unstructured data sources, there cannot be any datasets associated with the source; for structured data sources, there cannot be any datasets associated with the source and there cannot be an active inventory process for the source.
IMPORTANT: If you delete a structured data source, the associated inventory information is removed from Fusion.
-
On the Manage Sources page, click or hover over the row for the desired source. Icons display in the right column.
Click the delete icon.
TIP: You can also click in the row for the desired source, open the detail panel, and then click DELETE.
-
In the confirmation dialog, click YES to confirm the action.
The connection to the source is deleted.
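The conditions for deleting a source can be checked in the same way. The following Python sketch encodes the stated preconditions (no associated datasets and, for structured data sources, no active inventory process); the function and parameter names are hypothetical.

```python
def can_delete_source(*, structured, dataset_count, inventory_active=False):
    """Return True if the stated delete preconditions are met."""
    if dataset_count > 0:
        return False  # datasets must be removed first
    if structured and inventory_active:
        return False  # wait for the inventory process to finish
    return True
```

For example, `can_delete_source(structured=True, dataset_count=0, inventory_active=True)` returns False until the inventory process completes.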