Deleting Dormant Data with PowerShell

One of my favorite forms of managing data is to DELETE it.  One of my favorite ways to delete things is with SPEED and CONFIDENCE.

I have been quoted as saying that "DELETE is the best form of de-duplication": in fact, it is 100% dedupe. Some of the best data to DELETE is the stuff that no one is using: dormant data. So, putting my automation hat on, let's explore a script that DELETEs dormant data quickly, working from DataGravity File Analytics for Dormant Data, while still leaving us the ability to UNDO.

The workflow:

  1. Export the Dormant Data file list to a CSV file
  2. Run the ArchiveDormantData.ps1 PowerShell script
  3. Optionally create an archive .txt stub in place of each deleted file to show that it has been deleted
  4. Validate the space savings and recover individual files if required

Identify Dormant Data:

DataGravity makes it easy to identify and download a list of all the dormant data. In this example we are going to grab everything on the Marketing share that hasn't been updated, read, or otherwise touched in a year or more.
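The script header below warns to verify there are no blank rows in the CSV and that it is comma-delimited rather than semicolon-delimited. Both checks can be automated before anything is deleted. Below is a minimal sketch. Test-DormantCsv is a hypothetical helper, not part of ArchiveDormantData.ps1, and it assumes the export has a filepath column, which is the column the script reads.

```powershell
# Hypothetical pre-flight check; not part of ArchiveDormantData.ps1.
# Assumes the export has a "filepath" column, which is the column the script reads.
function Test-DormantCsv {
    param(
        [Parameter(Mandatory = $true)]
        [string]$CsvFilePath
    )
    $rows = @(Import-Csv -LiteralPath $CsvFilePath)
    if ($rows.Count -eq 0) { return }

    # A ';'-delimited file imports as a single column, so 'filepath' goes missing
    if ($rows[0].PSObject.Properties.Name -notcontains 'filepath') {
        return "No 'filepath' column found; check that the file is comma-delimited"
    }

    # An empty filepath would resolve to the share root itself
    for ($i = 0; $i -lt $rows.Count; $i++) {
        if ([string]::IsNullOrWhiteSpace($rows[$i].filepath)) {
            "Data row $($i + 1) has an empty filepath"
        }
    }
}
```

Running `Test-DormantCsv -CsvFilePath "c:\temp\Marketing.csv"` prints nothing when the file looks clean. Otherwise it prints one line per problem found.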

The script:

.\ArchiveDormantData.ps1 -ShareFilePath "\\CorporateDrive\Marketing" -csvFilePath "c:\temp\Marketing.csv" -logFile "C:\Temp\DormantDataDelete.log" -ArchiveStub
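ArchiveDormantData.ps1 deletes as soon as it runs and does not declare -WhatIf support. It can therefore be worth previewing the exact paths it will target first. Below is a minimal, read-only sketch. Get-DormantDeletePreview is a hypothetical name, and the sketch assumes the CSV filepath values are share-relative paths such as /Folder/file.docx.

```powershell
# Hypothetical dry-run helper; not part of ArchiveDormantData.ps1. It only reads.
# Assumes "filepath" values are share-relative, e.g. /Folder/file.docx
function Get-DormantDeletePreview {
    param(
        [Parameter(Mandatory = $true)][string]$ShareFilePath,
        [Parameter(Mandatory = $true)][string]$CsvFilePath
    )
    foreach ($row in (Import-Csv -LiteralPath $CsvFilePath)) {
        # Skip blank rows rather than pointing at the share root
        if ([string]::IsNullOrWhiteSpace($row.filepath)) { continue }
        # Swap / for \ and join to the share with exactly one separator
        $relative = $row.filepath.Replace('/', '\').TrimStart([char[]]'\/')
        $fullPath = $ShareFilePath.TrimEnd([char[]]'\/') + '\' + $relative
        [pscustomobject]@{
            FullPath = $fullPath
            Exists   = Test-Path -LiteralPath $fullPath -PathType Leaf
        }
    }
}
```

For example, `Get-DormantDeletePreview -ShareFilePath "\\CorporateDrive\Marketing" -CsvFilePath "c:\temp\Marketing.csv" | Where-Object { -not $_.Exists }` lists rows that point at files which are already missing.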

Script parameters:

-ShareFilePath is the path to the share holding the dormant data to be deleted. In our example it is the Marketing share.

-csvFilePath is the path to the CSV file we downloaded in the first step, which lists the files to be deleted. It is an export of DataGravity's Dormant Data.

-logFile is the location of the log file that records what has been removed. The script declares it as mandatory, so it must always be supplied.

-ArchiveStub is an optional switch. If specified, it creates a TXT stub (the original file name plus .archive.txt) in place of each deleted file.
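Every stub is named after the original file plus .archive.txt, so stubs are easy to find again when verifying them (step 7 in the script header). Below is a short sketch. Get-ArchiveStub is a hypothetical helper name, and the sketch relies only on that naming convention.

```powershell
# Hypothetical helper; not part of ArchiveDormantData.ps1.
# Relies only on the stub naming the script uses: <original file name>.archive.txt
function Get-ArchiveStub {
    param(
        [Parameter(Mandatory = $true)]
        [string]$ShareFilePath
    )
    # Walk the whole share and return each stub with its creation time
    Get-ChildItem -LiteralPath $ShareFilePath -Recurse -File -Filter '*.archive.txt' |
        Select-Object FullName, LastWriteTime
}
```

Piping the result to `Measure-Object` gives a quick count that can be compared with the number of files the script reports as deleted.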

Validate and Recover if necessary (The undo button)

If you get anything wrong or delete the wrong thing, it is always handy to have an UNDO button. There are several ways to get one using backup/recovery tools. In this case, since we are already using DataGravity, we can create a manual discovery point to capture the changes and restore any files if required.
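To validate the result, you can walk the same CSV and check which listed files are actually gone and roughly how much space that frees. Below is a minimal sketch. It only reports, and the discovery point remains the undo mechanism. Measure-DormantDelete is a hypothetical name. The sketch assumes the CSV size column is a plain byte count (check your export) and that filepath values are share-relative.

```powershell
# Hypothetical post-run check; not part of ArchiveDormantData.ps1.
# Assumptions: "filepath" holds share-relative paths and "size" is a byte count.
function Measure-DormantDelete {
    param(
        [Parameter(Mandatory = $true)][string]$ShareFilePath,
        [Parameter(Mandatory = $true)][string]$CsvFilePath
    )
    $removed = 0
    $remaining = @()
    [long]$freedBytes = 0
    foreach ($row in (Import-Csv -LiteralPath $CsvFilePath)) {
        if ([string]::IsNullOrWhiteSpace($row.filepath)) { continue }
        $relative = $row.filepath.Replace('/', '\').TrimStart([char[]]'\/')
        $fullPath = $ShareFilePath.TrimEnd([char[]]'\/') + '\' + $relative
        if (Test-Path -LiteralPath $fullPath -PathType Leaf) {
            # Still on the share: the delete failed or the file was restored
            $remaining += $fullPath
        }
        else {
            $removed++
            # Count the exported size only when it parses as a whole number
            [long]$size = 0
            if ([long]::TryParse([string]$row.size, [ref]$size)) { $freedBytes += $size }
        }
    }
    [pscustomobject]@{
        Removed    = $removed
        Remaining  = $remaining
        FreedBytes = $freedBytes
    }
}
```

Dividing `FreedBytes` by `1GB` gives the estimate in gigabytes. Any paths listed in `Remaining` are worth checking in the log file.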

The full PowerShell script is listed below and is available in my PowerShell repo on GitHub. Big thanks to Will Urban for the heavy lifting on this one. Happy DELETING.

#########################################
## DataGravity Inc.
## October 2015 - Updated January 2016
## Delete files from exported CSV file
## Tested with DataGravity Software v2.2
## Free to distribute and modify
## THIS SCRIPT IS PROVIDED WITHOUT WARRANTY, ALWAYS FULLY BACK UP DATA BEFORE INVOKING ANY SCRIPT
## ALWAYS VERIFY NO BLANK ROWS IN BETWEEN DATA IN CSV
##########################################
##########################################
## Instructions:
## 1) Use DataGravity UI to filter by files, dormant data, etc
## 2) Export CSV file
## 3) Use Excel/OpenOffice if more filtering is needed; save with commas as the delimiter, not semicolons
## 4) Set the script parameters (share path, CSV path, log file path)
## 5) Run script
## 6) Take Discovery Point and verify file deletion
## 7) Optional - Verify Archive Stubs if -ArchiveStub parameter specified
##
##
## Ex. .\ArchiveDormantData.ps1 -ShareFilePath "\\10.100.15.40\Sales$" -csvFilePath "c:\temp\sales.csv" -logFile "C:\Temp\DataGravity Delete From CSV.log" -ArchiveStub
##
##########################################
##----------------------------------------
## Input Parameters
##----------------------------------------
param (
[Parameter(Mandatory=$true)]
[string]$ShareFilePath,
[Parameter(Mandatory=$true)]
[string]$csvFilePath,
[Parameter(Mandatory=$true)]
[string]$logFile,
[switch] $ArchiveStub
)
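$deletedFileCount = 0
$failedFileCount = 0
$date = Get-Date
##########################################
## Start Logging
"Processing started (on " + $date + "): " | Out-File $logFile -Append
"--------------------------------------------" | Out-File $logFile -Append
## Import CSV and delete each listed file from the share
Import-Csv $csvFilePath | ForEach-Object {
    # Columns in the DataGravity export (only filepath is used below)
    $shareID = $_.share_id
    $owner = $_.owner
    $lastModTime = $_.lastmodtime
    $mimeType = $_.mimeType
    $tags = $_.tags
    $size = $_.size
    $contentState = $_.contentstate
    $deleteFilePath = $_.filepath
    # Skip blank rows: an empty filepath would otherwise point at the share root
    if ([string]::IsNullOrWhiteSpace($deleteFilePath)) {
        "Skipping a row with an empty filepath" | Out-File $logFile -Append
        return   # acts like 'continue' inside ForEach-Object
    }
    # Swap out / for \ in the CSV file path
    $deleteFilePath = $deleteFilePath.Replace('/', '\')
    # Join-Path avoids a doubled or missing \ between share and file path
    $fullFilePath = Join-Path $ShareFilePath $deleteFilePath
    Write-Host $fullFilePath
    try {
        # -LiteralPath so names containing [ or ] are not treated as wildcards;
        # -ErrorAction Stop so a failed delete lands in the catch block
        Remove-Item -LiteralPath $fullFilePath -Force -Verbose -ErrorAction Stop
        $deletedFileCount++
        "Deleted $deleteFilePath" | Out-File $logFile -Append
    }
    catch {
        $failedFileCount++
        "FAILED to delete $deleteFilePath : $($_.Exception.Message)" | Out-File $logFile -Append
        return   # no archive stub for a file that is still there
    }
    # Create an Archive Stub if the parameter was set
    if ($ArchiveStub) {
        # Create a new "This file has been archived" text file next to the deleted one
        $archivedFilePath = $fullFilePath + ".archive.txt"
        Write-Host $archivedFilePath
        "This file has been deleted and archived by IT (on " + $date + ") " | Out-File -LiteralPath $archivedFilePath -Append
    }
}
Write-Host "Deleted $deletedFileCount files ($failedFileCount failed)."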