Scope by Directory

Scoping a query to a particular directory or folder relies on the hierarchical markers in the srs.scan_data table.

These markers assist with determining parent and child folders as well as all subordinate file system entries for a given directory or set of directories.

Field             | Description                                   | Notes
------------------|-----------------------------------------------|------
idx               | Entry index                                   | Unique per scan
parent_idx        | Index of parent directory                     | All sibling file system entries will have the same parent index.
path_depth        | Current path depth relative to the root path  | The root path is always depth zero (0). Other paths such as shares may have the same depth as the root path, but can be distinguished by path_type.
ns_left, ns_right | Nested set indexes for current entry          | Nested set markers provide a quick way to determine all subordinates for a given directory. See examples below for details.

The following example selects all file system entries subordinate to and including the specified target path.

Example - Scope by Directory
WITH root_path AS (
    SELECT
        sd.ns_left,
        sd.ns_right,
        sd.scan_id
    FROM srs.current_fs_scandata AS sd
    WHERE sd.fullpath_hash = srs.path_hash('\\SERVER\VOLUME\path\subpath')
      AND sd.path_type = 2
)
SELECT
    sd.*
FROM srs.current_fs_scandata AS sd
JOIN root_path AS rp ON rp.scan_id = sd.scan_id
 AND rp.ns_left <= sd.ns_left
 AND rp.ns_right >= sd.ns_right;

This example uses two SELECT statements: one in the root_path common table expression to get the nested set values for the desired root path, and one to pull all subordinate entries along with the root path itself. Notice how the JOIN filter in the second SELECT statement uses scan_id to restrict results to the scan(s) of interest, and uses the ns_left and ns_right fields to keep the data set limited to entries within the target folder's hierarchy.
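Where only the direct children of a directory are wanted, the parent_idx marker described in the field table may be enough on its own. The following is a minimal sketch, assuming that srs.current_fs_scandata exposes the idx and parent_idx fields (the example above only confirms ns_left, ns_right, scan_id, fullpath_hash, and path_type). The target name is illustrative.

```sql
-- Sketch: list only the immediate children of one directory.
-- Assumption: srs.current_fs_scandata exposes idx and parent_idx.
WITH target AS (
    SELECT
        sd.idx,
        sd.scan_id
    FROM srs.current_fs_scandata AS sd
    WHERE sd.fullpath_hash = srs.path_hash('\\SERVER\VOLUME\path\subpath')
      AND sd.path_type = 2
)
SELECT
    sd.*
FROM srs.current_fs_scandata AS sd
-- idx is unique per scan, so match on scan_id as well as the parent index
JOIN target AS t ON t.scan_id = sd.scan_id
 AND sd.parent_idx = t.idx;
```

Unlike the nested set join, this form returns a single level of the hierarchy and does not descend into subfolders.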

The following diagram shows the nested set values calculated for an example structure under \\Server\Share. In this example, exactly 1,000 file system entries exist, including files, folders, and the share itself.

For each node in the scanned file structure, a left (ns_left) and a right (ns_right) value are assigned. Both values come from a single counter that increases by one at each step of a walk that starts at the root and descends the left side of the structure first. Each node receives its ns_left value when the walk first reaches it. A leaf node receives the next counter value as its ns_right, and a container receives its ns_right only after all of its subordinates have been numbered.

This process continues until the entire graph of the file structure has been traversed, and the root path is finally assigned the last number for its ns_right value.
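The numbering described above can be checked by hand on a small tree. The following self-contained sketch uses made-up data rather than the srs schema: a share holding one folder (FolderA, with two files) and one file directly under the share. All names and values are hypothetical, and the ns_left and ns_right columns were numbered manually following the walk described above.

```sql
-- Hypothetical five-entry tree, numbered by hand:
--   \\Server\Share      (1, 10)
--     FolderA           (2, 7)
--       file1.txt       (3, 4)
--       file2.txt       (5, 6)
--     file3.txt         (8, 9)
WITH tree (name, ns_left, ns_right) AS (
    VALUES
        ('\\Server\Share', 1, 10),
        ('FolderA',        2,  7),
        ('file1.txt',      3,  4),
        ('file2.txt',      5,  6),
        ('file3.txt',      8,  9)
)
SELECT
    child.name
FROM tree AS parent
JOIN tree AS child
  ON child.ns_left  >= parent.ns_left
 AND child.ns_right <= parent.ns_right
WHERE parent.name = 'FolderA'
ORDER BY child.ns_left;
-- Returns FolderA, file1.txt, file2.txt
```

With five entries, the share ends up with an ns_right of 10, and file3.txt falls outside FolderA's range of 2 to 7, so it is not returned.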

The nested set model has the following characteristics, some of which are vital to hierarchical processing, such as determining subordinate objects:

• The root path will always have an ns_left value of 1 and an ns_right value of 2n, where n is the total number of entries in the scan.

• For any given container object (folder, share, etc.), all subordinate entries can be found by searching for all objects in the scan having an ns_left value greater than the container path’s ns_left value, and an ns_right value less than the container path’s ns_right value.

• Nested set is generally one of the fastest methods available in relational data models for retrieving all subordinate objects when representing hierarchical data, because the lookup is a pair of range comparisons rather than a recursive walk.
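One consequence of these characteristics is that the number of entries beneath a container can be derived from its own two values. The following is a sketch under the assumption that the numbering has no gaps, as the first characteristic implies, so that every entry consumes exactly two numbers. The subordinate_count alias is illustrative.

```sql
-- Sketch: count the entries below a container from its own markers.
-- Assumption: gap-free numbering, so each subordinate uses two numbers
-- between the container's ns_left and ns_right.
SELECT
    sd.ns_left,
    sd.ns_right,
    (sd.ns_right - sd.ns_left - 1) / 2 AS subordinate_count
FROM srs.current_fs_scandata AS sd
WHERE sd.fullpath_hash = srs.path_hash('\\SERVER\VOLUME\path\subpath')
  AND sd.path_type = 2;
```

For the 1,000-entry share described earlier, the share itself has values 1 and 2,000, which gives (2000 - 1 - 1) / 2 = 999 subordinate entries. To list those entries without the container itself, the example query at the top of this page can use strict comparisons (< and >) in place of <= and >=.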

For more information on the nested set model, see https://en.wikipedia.org/wiki/Nested_set_model.