4.1 Content Hashed Duplicate File Reports

A Content Hashed Duplicate File report provides more advanced duplicate file detection than the built-in Duplicate File report, which compares only filenames and metadata.

File Reporter 4.0 introduced a scanning option that lets Agents produce a content-based hash for specific files. These hashes can then be compared to identify duplicate files.
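
To illustrate the idea of comparing content hashes, here is a minimal Python sketch. It is not the Agents' implementation: the choice of SHA-256 and the function names content_hash and find_duplicates are assumptions made only for this example.

```python
import hashlib
from collections import defaultdict


def content_hash(path, chunk_size=1 << 20):
    """Return a hex digest of the file's bytes (SHA-256 is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths):
    """Group paths by content hash and keep only hashes seen more than once."""
    groups = defaultdict(list)
    for path in paths:
        groups[content_hash(path)].append(str(path))
    return {h: sorted(p) for h, p in groups.items() if len(p) > 1}
```

Two files with different names but identical bytes land in the same group, which is the case a filename and metadata comparison can miss.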

NOTE: For information on collecting content hashes, see Creating a Scan Policy in the File Reporter 4.1 Administration Guide.

From https://filequerycookbook.com, you can copy the Content Hashed Duplicate File Report custom query, paste it into the Query Editor, and open the associated report layout in the Report Designer. The custom query and its report identify duplicate files based on hash comparisons and the parameters you set.

4.1.1 Determining Prerequisites

  • Create a file system scan policy for each of the target paths on which you want to report.

  • With the Generate content file hashes option selected in the Scan Policy Editor of each scan policy, conduct a file system scan on each target path.

  • Install the Client Tools.

    The Client Tools include the Query Editor and the Report Designer that will be used in these procedures.

  • Decide how you want the report to be generated and follow the applicable procedures.

    • To generate a delimited text file that you can take into other tools for customized searching and presentation, copy or create an SQL query in the Query Editor, as covered in Creating a Report in the File Reporter 4.1 Client Tools Guide.

    • To generate the report using the Report Designer and produce a formatted report layout, proceed with Using the Report Designer in the File Reporter 4.1 Client Tools Guide.
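
As an example of taking a delimited text file into another tool, the following Python sketch groups exported rows by hash. The column names content_hash and fullpath, the delimiter, and the function name are hypothetical; the actual columns depend on the query you run.

```python
import csv
import io
from collections import defaultdict


def group_export_by_hash(text, delimiter=",",
                         hash_col="content_hash", path_col="fullpath"):
    """Group rows of a delimited export by hash (column names are hypothetical)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    groups = defaultdict(list)
    for row in reader:
        groups[row[hash_col]].append(row[path_col])
    return dict(groups)


def load_export(path, **kwargs):
    """Read a delimited export from disk and group it by hash."""
    with open(path, newline="", encoding="utf-8") as handle:
        return group_export_by_hash(handle.read(), **kwargs)
```

Pass the delimiter and column names that match your own export, for example load_export("duplicates.txt", delimiter="|").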

4.1.2 Designing the Report

This option lets you use both the custom query and the associated report layout for the “Content Hashed Duplicate File Report” from https://filequerycookbook.com.

NOTE: A detailed discussion of the Report Designer, along with procedures for familiarizing yourself with the interface, is available in Using the Report Designer in the File Reporter 4.1 Client Tools Guide.

  1. On the File Query Cookbook site at https://www.filequerycookbook.com, locate and download the “Content Hashed Duplicate File Report.”

    The file is saved as a zipped file.

  2. Unzip the downloaded file and open the .sql file in a text editor.

    You will eventually paste this custom query into the Query Editor.

  3. From the Start menu, launch the File Reporter 4.1 Report Designer.

  4. Enter the login credentials and click Login.

    All of your saved Custom Query reports are listed.

  5. Click New Custom Query, give it a name, then click Create.

    The Report Designer Query Editor is launched.

  6. From the text editor you used in Step 2, copy the custom query and paste it into the Query Editor.

  7. In the line beginning with WHERE, edit the UNC paths so that they are specific to the content file hashed shares on which you want to report.

    The custom query includes only two paths. To report on more, add srs.path_hash('\\server\share\path') for each additional path to the comma-delimited sd.fullpath_hash IN portion of the WHERE clause.

  8. (Conditional) At the bottom of the custom query, set q.item_count to the minimum number of duplicates and q.size to the minimum file size (in bytes) to include in the report.

  9. Click Execute to see a preview of the report data.

  10. Click Save.

  11. Click Design Layout.

  12. Click Open.

  13. Locate the .repx file that you saved and unzipped in Step 2 and click Open.

    The layout template appears in the Report Designer.

  14. Click Download All Data.

  15. In the subsequent dialog box, click Yes.

    This runs the query in the database and loads data into the report template.

  16. Click Print Preview to review the report findings.

    Note that each hash is listed with its total count and the location of each matching file, that is, the number of duplicate files and where they reside.

  17. Save the report by doing one of the following:

    • From the Export To drop-down menu, select the file type you want to save the report layout to.

    • Click Save Report to save the report as a .PRNX file that you can open in the Report Viewer and, if you want, export later to the desired file type.
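
If you report on many shares, the WHERE clause edit described in Step 7 can be generated rather than typed by hand. The Python sketch below builds the comma-delimited list of srs.path_hash(...) entries. The helper name path_hash_in_clause and the exact parenthesized shape of the IN list are assumptions, so compare the output with the downloaded query before pasting it in.

```python
def path_hash_in_clause(unc_paths):
    """Build the sd.fullpath_hash IN (...) fragment for the given UNC paths.

    The surrounding syntax is assumed from the description in Step 7;
    check it against the query downloaded from the cookbook site.
    """
    if not unc_paths:
        raise ValueError("at least one UNC path is required")
    calls = []
    for path in unc_paths:
        # Double any single quotes so the path stays a valid SQL string literal.
        escaped = path.replace("'", "''")
        calls.append(f"srs.path_hash('{escaped}')")
    return "sd.fullpath_hash IN (" + ", ".join(calls) + ")"


if __name__ == "__main__":
    # Hypothetical shares, written as raw strings to keep the backslashes.
    print(path_hash_in_clause([r"\\server\share\hr", r"\\server\share\finance"]))
```

Raw strings (the r prefix) keep the backslashes in each UNC path exactly as written.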