Sources and datasets

Datasets represent the data on your internal network that is connected to Fusion for processing. Each dataset is associated with a source, which identifies the type of data platform to be processed and defines the initial connection to that platform through a selected agent cluster. The dataset defines the location of the data on the source, the rules and schedule for processing, and the grammar rules to identify in the data during processing.

Example

  • File system source: \\machine.domain.com\directory01 (the source itself processes no data; its datasets define what is processed)

    • dataset: \directoryA\directoryZ (data at this level and beneath is processed)

    • dataset: \directoryB\directoryY (data at this level and beneath is processed)

    • dataset: \directoryC (data at this level and beneath is processed)

    • dataset: subpath is left blank (data at the source level and beneath is processed)
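
The relationship between a source path and its dataset subpaths in the example above can be sketched in a few lines of Python. This is only an illustration of the scoping rule, not how Fusion resolves paths internally; the function names dataset_root and is_in_scope are hypothetical and are not part of any Fusion API.

```python
from pathlib import PureWindowsPath

# Source path taken from the example above.
SOURCE = r"\\machine.domain.com\directory01"


def dataset_root(source_root: str, subpath: str = "") -> PureWindowsPath:
    """Combine a source root with a dataset subpath (hypothetical helper).

    A blank subpath means the dataset starts at the source root itself.
    """
    root = PureWindowsPath(source_root)
    relative = subpath.strip("\\/")  # tolerate a leading or trailing separator
    return root / relative if relative else root


def is_in_scope(file_path: str, source_root: str, subpath: str = "") -> bool:
    """Return True if file_path sits at or beneath the dataset root."""
    base = dataset_root(source_root, subpath)
    candidate = PureWindowsPath(file_path)
    return candidate == base or base in candidate.parents


if __name__ == "__main__":
    in_z = SOURCE + r"\directoryA\directoryZ\report.docx"
    in_a = SOURCE + r"\directoryA\notes.txt"
    print(is_in_scope(in_z, SOURCE, r"\directoryA\directoryZ"))
    print(is_in_scope(in_a, SOURCE, r"\directoryA\directoryZ"))
    print(is_in_scope(in_a, SOURCE, ""))
```

Under these assumptions, a file in \directoryA but outside \directoryZ falls outside the first dataset, while a dataset with a blank subpath covers everything on the source.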

When you configure the processing options, you choose what to process and analyze (metadata only, or metadata and content) and what to store (identified grammar and tag values only, or these values and the content). These choices affect the information that can be identified and managed, the size of your index in Fusion, and potentially the time needed to complete each processing task.

You must create at least one agent cluster before you can create sources and associated datasets. Because a dataset defines the location of data on a source, you must create a source before you can create an associated dataset.
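
The creation order described above (agent cluster, then source, then dataset) can be modeled as a small dependency check. The sketch below is purely illustrative: the Registry class, its methods, and all field names are hypothetical and do not correspond to a Fusion API or to how Fusion stores these objects.

```python
from dataclasses import dataclass


@dataclass
class AgentCluster:
    name: str


@dataclass
class Source:
    name: str
    platform: str          # type of data platform, e.g. "File system"
    root: str              # initial connection location
    cluster: AgentCluster  # a source connects through an agent cluster


@dataclass
class Dataset:
    name: str
    source: Source         # a dataset always belongs to a source
    subpath: str = ""      # blank means the source level and beneath


class Registry:
    """Hypothetical in-memory model that enforces the creation order."""

    def __init__(self) -> None:
        self.clusters: dict[str, AgentCluster] = {}
        self.sources: dict[str, Source] = {}
        self.datasets: dict[str, Dataset] = {}

    def add_cluster(self, name: str) -> AgentCluster:
        cluster = AgentCluster(name)
        self.clusters[name] = cluster
        return cluster

    def add_source(self, name: str, platform: str, root: str, cluster_name: str) -> Source:
        if cluster_name not in self.clusters:
            raise LookupError(f"create agent cluster {cluster_name!r} first")
        source = Source(name, platform, root, self.clusters[cluster_name])
        self.sources[name] = source
        return source

    def add_dataset(self, name: str, source_name: str, subpath: str = "") -> Dataset:
        if source_name not in self.sources:
            raise LookupError(f"create source {source_name!r} first")
        dataset = Dataset(name, self.sources[source_name], subpath)
        self.datasets[name] = dataset
        return dataset


if __name__ == "__main__":
    registry = Registry()
    registry.add_cluster("cluster-1")
    registry.add_source("files", "File system", r"\\machine.domain.com\directory01", "cluster-1")
    dataset = registry.add_dataset("dir-c", "files", r"\directoryC")
    print(dataset.source.cluster.name)
```

Calling add_source before add_cluster, or add_dataset before add_source, raises LookupError in this model, which mirrors the prerequisite order stated in the text.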

For details on supported source types, see Manage sources.