Let's explore a script that leverages DataGravity's file fingerprints to identify the top 10 duplicate files on a given department share or virtual machine.
The Workflow
- Export file names and fingerprints to a file list (CSV format)
- Run the FindDuplicateFiles.ps1 PowerShell script
- List the top 10 duplicate files and the space they are consuming
Files and Fingerprints
DataGravity makes it easy to identify files and their unique SHA-1 fingerprints on a share or virtual machine (VMware or Hyper-V). In this example, we'll gather the file names and fingerprints from the Sales department share.
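Before feeding that export to the script, it can help to preview the first few rows and confirm the file names and fingerprints came across. The exact column names depend on how DataGravity labels its Search export, so this is just a quick sanity check:

# Preview the first few rows of the DataGravity Search export
Import-Csv -Path "c:\temp\sales.csv" | Select-Object -First 5 | Format-Table -AutoSize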
The Script:
.\FindDuplicateFiles.ps1 -csvFilePath "c:\temp\sales.csv" -top 10
Script parameters:
-csvFilePath is the path to the CSV file we downloaded in the first step, which contains the list of files and their fingerprints. This is an export from DataGravity's Search.
-top is an optional parameter that, if specified, limits the output to that number of duplicate files.
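For reference, a parameter block along these lines would handle both arguments. This is a sketch rather than the published script, and the default of 10 for -top is an assumption:

# Hypothetical parameter block for FindDuplicateFiles.ps1 (a sketch, not the published script)
param (
    # Path to the CSV exported from DataGravity Search
    [Parameter(Mandatory = $true)]
    [string]$csvFilePath,

    # Number of duplicate groups to display; default value is an assumption
    [Parameter(Mandatory = $false)]
    [int]$top = 10
)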
Listing and Validating Duplicates
Let's run the script to return the top 10 duplicate files along with their file sizes.
The results are easy to spot-check, since the output surfaces the duplicate files that consume the most space. A minimal sketch of that core logic follows.
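The idea is simple: group the exported rows by fingerprint, keep the groups with more than one file, and rank them by the space the extra copies consume. The column names Fingerprint, Path, and Size below are assumptions about the export layout, not a documented DataGravity schema; adjust them to match your CSV headers.

# Sketch of the duplicate-detection logic (assumes Fingerprint, Path, and Size columns)
$csvFilePath = "c:\temp\sales.csv"
$top = 10

Import-Csv -Path $csvFilePath |
    Group-Object -Property Fingerprint |
    Where-Object { $_.Count -gt 1 } |
    ForEach-Object {
        $size = [long]($_.Group[0].Size)   # every copy in the group shares the same size
        [pscustomobject]@{
            Fingerprint = $_.Name
            Copies      = $_.Count
            FileSizeMB  = [math]::Round($size / 1MB, 2)
            WastedMB    = [math]::Round(($_.Count - 1) * $size / 1MB, 2)
            Paths       = ($_.Group.Path -join '; ')
        }
    } |
    Sort-Object -Property WastedMB -Descending |
    Select-Object -First $top |
    Format-Table -AutoSize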
The full PowerShell script is listed below and is also available in my PowerShell repo on GitHub.