DiskBoss Duplicate Files Finder

Flexense Ltd.

DiskBoss
File & Disk Manager

Duplicate Files Finder

Version 1.2
Mar 2011

Flexense Ltd. www.flexense.com

Product Overview
DiskBoss is an automated, rule-based file and disk manager allowing one to search and classify files, perform disk space utilization analysis, detect and remove duplicate files, organize files according to user-defined rules and policies, copy large numbers of files in a fault-tolerant way, synchronize disks and directories, clean up wasted disk space, etc.

All file management operations are integrated in a centralized and easy-to-use GUI application with a built-in file navigator allowing one to execute any required operation in a single mouse click. Frequently used file management operations may be pre-configured as user-defined commands and executed using the GUI application or direct desktop shortcuts.

DiskBoss is a highly extendable and customizable data management solution allowing one to design custom file classification plugins and purpose-built file management operations using an open and easy-to-use XML-based format. Custom disk space analysis and file management operations may be integrated into the product, executed periodically at specific time intervals, performed as conditional actions in other operations or automatically triggered by one or more changes in a disk or directory.

In addition, IT administrators are provided with extensive database integration capabilities allowing one to submit disk space analysis, file classification, duplicate files detection and file search reports into an SQL database. Reports from multiple servers and desktop computers may be submitted to a centralized SQL database allowing one to display charts showing the used disk space, file categories and duplicate files per user or per host and providing an in-depth visibility into how disk space is used, what types of files are stored and how much space is wasted on duplicate files across the entire enterprise.

Finally, IT professionals and enterprises are provided with DiskBoss Server, a server-based product version that runs in the background as a service and is capable of executing all disk space analysis and file management operations in a fully automatic and unattended mode. DiskBoss Server can be managed and configured locally or through the network using a free network client GUI application or the DiskBoss command line utility, which provides the user with the ability to integrate DiskBoss features and capabilities into other products and solutions.

Duplicate Files Finder


DiskBoss' built-in duplicate files finder provides a large number of advanced features and capabilities allowing one to identify and clean up duplicate files on desktops, servers and NAS storage devices. The duplicate files finder shows detected duplicate files and allows one to delete duplicates or replace duplicate files with links to originals.

The user is provided with the ability to categorize and filter detected duplicate files by the file extension, category, file size, user name, last access time, etc. Moreover, DiskBoss allows one to generate various types of charts and export reports to the HTML, text and CSV formats.

Power users and IT professionals are provided with policy-based duplicate files detection and removal capabilities allowing one to define custom duplicate files detection and cleanup commands and execute them in a fully automatic mode using the DiskBoss' GUI application or the command line utility. Finally, corporations and enterprises are provided with the ability to submit reports from multiple servers and desktop computers to a centralized SQL database allowing one to analyze the disk space wasted on duplicate files across the entire enterprise.

Detecting Duplicates in a Disk or Directory


In order to detect duplicate files in one or more disks or directories, select the required directories in the DiskBoss' file navigator and press the Duplicates button located on the main toolbar. DiskBoss will scan the selected files and directories and display a dialog showing the list of detected duplicate file sets.

For each duplicate file set, DiskBoss shows the name of the original file, the number of duplicate files in the set, the size of each file in the set, the amount of wasted disk space and the currently selected duplicates removal action. In order to see all duplicate files related to a set, click on the set item in the set list.
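This guide does not describe how DiskBoss compares files internally. As a rough illustration only, the Python sketch below assumes a signature-based approach (the command line reference in this guide lists MD5, SHA1 and SHA256 signature types): files are grouped by size, then by SHA-256 digest, and the wasted space of a set is assumed to be the file size multiplied by the number of extra copies. All function names are hypothetical and are not part of DiskBoss.

```python
import hashlib
import os
from collections import defaultdict


def file_signature(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicate_sets(root):
    """Return lists of paths under root that share size and signature."""
    by_size = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    by_signature = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        for path in paths:
            by_signature[(size, file_signature(path))].append(path)

    return [sorted(paths) for paths in by_signature.values() if len(paths) > 1]


def wasted_space(duplicate_set):
    """Assumed definition: every copy beyond the first is wasted."""
    return os.path.getsize(duplicate_set[0]) * (len(duplicate_set) - 1)
```

Grouping by size first avoids hashing files that cannot have a duplicate, which is a common optimization rather than a documented DiskBoss behavior.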

The duplicate set dialog shows all duplicate files related to the set and allows one to select the original file, the duplicate files and the duplicates removal action. In order to select a file as the original, select the file item, press the right mouse button and select the Set as Original File menu item. In order to see more information about a file, just click on the file item in the file list. Once finished selecting the duplicate files, use the removal actions combo box located in the bottom-left corner of the dialog to select an appropriate duplicates removal action.

Selecting Duplicate Files Removal Actions


The DiskBoss' duplicate files finder allows one to delete duplicate files, move duplicates to another directory or replace duplicates with links pointing to the original file in each specific set of duplicate files. In order to select a specific duplicates removal action for one or more sets of duplicate files, select the sets in the set list, press the right mouse button and select an appropriate duplicate files removal action.

Warning: There are many duplicate files in the Windows system directory, which are important for proper operation of the operating system. Removal of duplicate files located in the Windows system directory may permanently damage the operating system and render the computer completely non-functional.
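To make the "replace duplicates with links" action more concrete, here is a minimal Python sketch. It assumes hard links on the same volume; this guide does not say which link type DiskBoss creates, so that choice and the function name are assumptions made for illustration.

```python
import os


def replace_with_hard_link(original, duplicate):
    """Replace `duplicate` with a hard link to `original`.

    Assumption: both paths are on the same volume, which hard links require.
    The link is created under a temporary name first so the duplicate is
    only overwritten once the link exists.
    """
    temp_name = duplicate + ".link_tmp"
    os.link(original, temp_name)      # new directory entry for the same data
    os.replace(temp_name, duplicate)  # swap the link in place of the duplicate
```

After the call both names refer to the same data on disk, so the space used by the duplicate copy is released while the path remains valid.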

By default, DiskBoss selects the oldest file in each set as the original file and all other files in the set as duplicates. In order to change that, select one or more sets, press the right mouse button and select the Select Oldest Files as Duplicates menu item. Alternatively, open the set dialog, select any arbitrary file in the set as the original file, select an appropriate duplicates removal action that should be executed for this specific set and select one or more duplicate files in the set that the removal action should be applied to.
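The default selection rule can be sketched in a few lines of Python. The sketch assumes that "oldest" refers to the last modification time, which this guide does not specify; the function and parameter names are hypothetical.

```python
import os


def pick_original(paths, oldest_is_original=True):
    """Split a duplicate set into (original, duplicates).

    Assumption: file age is judged by modification time. With
    oldest_is_original=False the newest file is kept as the original
    and the older files are treated as duplicates.
    """
    ordered = sorted(paths, key=os.path.getmtime, reverse=not oldest_is_original)
    return ordered[0], ordered[1:]
```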

Executing Duplicate Files Removal Actions


Once finished selecting duplicates and removal actions, press the Preview button to see the duplicate files removal actions preview dialog. The duplicates removal actions preview dialog shows the selected duplicate files and removal actions that will be executed and allows one to review and manually confirm each specific action before execution.

The operating system and other system applications may have a large number of duplicate files located in various system directories. These duplicate files may be very important for proper operation of the operating system and other system applications and it is highly dangerous to remove these duplicate files. To be on the safe side, use the duplicates removal actions only for your own documents, music files, videos, etc.

In order to execute the selected duplicates removal actions, press the Execute button located in the bottom-right corner of the Preview dialog. DiskBoss will process the selected duplicate files and execute the specified duplicates removal actions.

Using File Filters and Categories


The DiskBoss' duplicate files finder allows one to categorize and filter duplicate files by the file extension, category, size, user name, etc. The user is provided with the ability to apply multiple file filters, display specific types of duplicate files, and then apply duplicate files removal actions to the filtered files only or export reports showing the filtered files only.

In order to set one or more file filters, select an appropriate type of file categories in the categories combo box, select one or more file filters in the filters view, press the right mouse button and select the Apply Selected Filters menu item.

With active file filters, DiskBoss shows duplicate files matching the selected filters, exports reports showing matching files only and significantly simplifies selection of duplicates removal actions for specific file types or file categories. In order to clear the selected file filters, just press the Clear button located on the right side of the categories selector.
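As a small illustration of what categorizing and filtering by file extension amounts to, the following Python sketch counts duplicate files per extension and keeps only those matching selected extensions. It works on a plain list of paths and is not tied to any DiskBoss data format; the function names are hypothetical.

```python
import os
from collections import defaultdict


def count_by_extension(paths):
    """Count files per lower-cased extension (the 'categories' view)."""
    counts = defaultdict(int)
    for path in paths:
        counts[os.path.splitext(path)[1].lower()] += 1
    return dict(counts)


def filter_by_extension(paths, extensions):
    """Keep only files whose extension is in `extensions` (the 'filter')."""
    wanted = {ext.lower() for ext in extensions}
    return [p for p in paths if os.path.splitext(p)[1].lower() in wanted]
```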

Showing Duplicate Files Pie Charts


The duplicate files finder allows one to display charts showing the amount of wasted disk space and the number of duplicate files per extension, file type, file size, user name, etc. In order to open the charts dialog, press the Charts button located on the dialog's toolbar.

The charts dialog displays information for the displayed duplicate files and the currently selected categories of duplicate files. In order to display a chart for another category of duplicates, select an appropriate category in the categories combo box and then open the charts dialog.

The charts dialog allows one to copy the displayed chart image to the clipboard, making it very easy to integrate DiskBoss charts into user reports and presentations. Finally, the user is provided with the ability to customize the information displayed on the chart's status bar.

Saving Duplicate Files Reports


DiskBoss allows one to save lists of detected duplicate files to HTML, text and Excel CSV reports. In addition, the user is provided with the ability to save DiskBoss' native reports, which preserve all information about each specific duplicate files detection operation and may be imported to an SQL database using DiskBoss Ultimate.

In order to save a report file, press the Save button located on the dialog's toolbar, select an appropriate report format, enter the report file name and press the Save button. Optionally, limit the report to a specific number of duplicate file sets and/or select the Save Compressed Report option to save a compressed report file.

A typical report file includes information about the date and time of the duplicate files detection operation, the name of the host computer the operation was performed on, a list of top 10 file categories according to the currently selected categories mode followed by the list of duplicate file sets detected in the processed disks and directories. For each set of duplicate files, DiskBoss shows the name of the original file, the number of duplicate files in the set and the amount of wasted disk space.
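The per-set fields listed above map naturally onto rows of a CSV file. The Python sketch below writes such rows from an in-memory list; the column names, the dictionary keys and the ordering by wasted space are assumptions for illustration and do not reproduce the actual layout of a DiskBoss CSV report.

```python
import csv
import io


def write_csv_report(duplicate_sets, max_sets=None):
    """Render duplicate sets as CSV text, largest wasted space first.

    Each set is a dict with the hypothetical keys 'original',
    'duplicates' and 'wasted_bytes'. `max_sets` optionally limits the
    report to a specific number of sets.
    """
    ordered = sorted(duplicate_sets, key=lambda s: s["wasted_bytes"], reverse=True)
    if max_sets is not None:
        ordered = ordered[:max_sets]

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["original_file", "duplicates", "wasted_bytes"])
    for item in ordered:
        writer.writerow([item["original"], item["duplicates"], item["wasted_bytes"]])
    return buffer.getvalue()
```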

Exporting Reports to an SQL Database


IT professionals and enterprises are provided with the ability to submit reports listing duplicate files detected on multiple storage systems, servers and desktop computers to a centralized SQL database enabling system and storage administrators to gain an in-depth visibility into amounts of duplicate files and wasted disk space across the entire enterprise.

In order to submit a report to an SQL database, press the Save button located on the dialog's toolbar, select the SQL Database report format and press the Save button. Before exporting a report to an SQL database, the user needs to open the options dialog, enable the ODBC interface and specify the name of the ODBC data source and the database user name and password to use for database export operations.

For each report in the database, DiskBoss shows the report date and time, the name of the host computer the operation was performed on, disks and directories that were processed, the total amount of disk space and the number of files that were processed and the report title. In order to open a report, just click on the report item in the report list.

Analyzing Duplicate Files Per User


DiskBoss Ultimate and DiskBoss Server provide the ability to analyze duplicate files owned by multiple users and detected on one or more servers or desktop computers and display charts showing the amount of wasted disk space and the number of duplicate files per user.

Important: By default, processing and display of user names is disabled. In order to enable this capability, open the options dialog and enable this option.

In order to analyze duplicate files per user, connect DiskBoss Ultimate to an SQL Database and submit reports containing duplicates owned by multiple users to the SQL database using the DiskBoss GUI application or the DiskBoss command line utility. Once reports are in the database, open the Database dialog and press the Users button to open the Users Statistics dialog.

diskboss -duplicates -dir \\server\share -host <Host Name> -save_to_database

The simplest way to submit reports from multiple servers or desktop computers is to use the DiskBoss command line utility to detect duplicate files on all required hosts through the network. In order to simplify submission of reports to the SQL database, the command line utility may be executed on the same host on which the SQL database is installed. In this case, the user needs to specify one or more network shares to be processed and the host name to be set for each report.

diskboss -duplicates -dir <Local Directory> -save_report <File Name>

Another option is to execute the command line utility on each specific host, save duplicate files reports and later submit report files from all hosts to the SQL database using the DiskBoss GUI application. In this case, there is no need to set the host name, which will be set automatically to the name of the host the command line utility is executed on.
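When many hosts are involved, the network variant of the command can be generated from a simple mapping of host names to shares. The Python sketch below only builds the argument lists, using the flags shown in the commands above; the host and share names are made-up examples and the function name is hypothetical.

```python
def build_network_scan_commands(shares_by_host):
    """Build one diskboss argument list per host.

    `shares_by_host` maps a report host name to the network share to
    scan. Only flags shown in this guide's examples are used.
    """
    commands = []
    for host, share in sorted(shares_by_host.items()):
        commands.append(
            ["diskboss", "-duplicates", "-dir", share, "-host", host, "-save_to_database"]
        )
    return commands


# Hypothetical hosts and shares, for illustration only.
example = build_network_scan_commands({
    "fileserver1": r"\\fileserver1\share",
    "fileserver2": r"\\fileserver2\share",
})
```

Each list could then be passed to `subprocess.run` on the machine where the command line utility is installed.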

Analyzing Duplicate Files Per Host


DiskBoss Ultimate and DiskBoss Server provide the ability to submit duplicate files reports from multiple servers and desktop computers into a centralized SQL database, analyze reports and display various types of charts showing the amount of duplicate disk space and the number of duplicates per host allowing one to gain an in-depth visibility into amounts of duplicate files across the entire enterprise.

In order to analyze reports from multiple hosts, the user needs to connect DiskBoss to an SQL Database, perform duplicate files search on multiple hosts using the DiskBoss GUI application or the DiskBoss command line utility and submit reports from all hosts to the SQL database. Once reports from all hosts are in the database, open the Database dialog and press the Hosts button to open the Hosts Statistics dialog.

diskboss -duplicates -dir \\server\share -host <Host Name> -save_to_database

The simplest way to submit reports from multiple servers or desktop computers is to use the DiskBoss command line utility to detect duplicate files on all required hosts through the network. In order to simplify submission of reports to the SQL database, the command line utility may be executed on the same host on which the SQL database is installed. In this case, the user needs to specify one or more network shares to be processed and the host name to be set for each report.

diskboss -duplicates -dir <Local Directory> -save_report <File Name>

Another option is to execute the command line utility on each specific host, save duplicate files reports and later submit report files from all hosts to the SQL database using the DiskBoss GUI application. In this case, there is no need to set the host name, which will be set automatically to the name of the host the command line utility is executed on.

Detecting Duplicates in Specific File Types


One of the most powerful capabilities of DiskBoss is the ability to perform disk analysis and file management operations on files matching user-specified criteria. In order to focus on specific types of duplicate files, the user is provided with the ability to define one or more file matching rules specifying files that should be processed by the DiskBoss' duplicate files finder. Files not matching the specified rules are skipped during the duplicate files detection process.

In order to add one or more file matching rules to a duplicate files detection operation, open the operation dialog, select the rules tab and press the Add button located on the right side of the dialog. Once finished adding file matching rules, select an appropriate rules logic and press the Save button.

Advanced Duplicate Files Detection Options


The DiskBoss' duplicate files finder provides a large number of advanced options allowing one to customize duplicate files detection operations for user-specific hardware and storage configurations. The General tab allows one to control the file signature type, the file scanning mode, the maximum number of duplicate file sets to display in the results dialog and the file filter, which may be used to limit the operation to specific files using a file name pattern.

The Performance tab provides the ability to intentionally slow down the duplicate files detection process in order to minimize the potential performance impact on running production systems. The Exclude tab allows one to define one or more subdirectories to be excluded from the duplicate files detection process.
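The effect of a file name pattern, excluded subdirectories and intentional throttling can be pictured with the Python sketch below. It is only an analogy for the options described above: the parameter names are hypothetical, and the fixed pause per file is an assumed way of slowing a scan, not a description of how DiskBoss implements its performance setting.

```python
import fnmatch
import os
import time


def iter_candidate_files(root, exclude_dirs=(), name_pattern="*", pause_seconds=0.0):
    """Yield files under root, skipping excluded subdirectories.

    `name_pattern` is a shell-style pattern applied to file names, and
    `pause_seconds` inserts a delay after each yielded file to reduce
    the load on a busy system.
    """
    excluded = {os.path.normcase(os.path.abspath(d)) for d in exclude_dirs}
    for folder, subdirs, names in os.walk(root):
        # Prune excluded directories in place so os.walk never enters them.
        subdirs[:] = [
            d for d in subdirs
            if os.path.normcase(os.path.abspath(os.path.join(folder, d))) not in excluded
        ]
        for name in sorted(names):
            if fnmatch.fnmatch(name, name_pattern):
                yield os.path.join(folder, name)
                if pause_seconds:
                    time.sleep(pause_seconds)
```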

Using Automatic Duplicate Files Removal Actions


DiskBoss Ultimate and DiskBoss Server provide the user with the ability to automatically execute one or more duplicate files removal actions for files matching user-specified rules. In order to define one or more automatic duplicates removal actions, open the operation dialog, select the Actions tab and press the Add button.

On the Action dialog select the original file detection mode, an appropriate duplicates removal action and specify one or more file matching rules defining files the action should be applied to. During runtime, DiskBoss will process detected duplicate files, apply the specified file matching rules, detect the original file and execute the duplicates removal actions for files matching the specified rules and policies.

By default, DiskBoss executes automatic duplicates removal actions in the Auto-Select mode, which selects the specified actions and displays the duplicates removal actions preview dialog allowing one to review and manually confirm each specific action. After testing the duplicate file detection operation in the preview mode, change the actions mode to Execute to automatically execute the specified duplicates removal actions without showing the actions preview dialog.

Finally, IT administrators are provided with the DiskBoss command line utility allowing one to execute automatic duplicate files detection and removal operations from batch files and shell scripts, periodically remove duplicates from servers and enterprise storage systems and integrate DiskBoss' duplicate files detection capabilities with other products and solutions.

The DiskBoss command line utility is available in DiskBoss Ultimate and DiskBoss Server and it is capable of executing user-defined duplicate files detection and removal commands defined in the DiskBoss GUI application and/or written in the DiskBoss' XML format.

User-Defined Duplicate Files Detection Commands


One of the most powerful and flexible capabilities of DiskBoss is the ability to pre-configure custom duplicate files detection and removal operations as user-defined commands and execute such commands in a single mouse click using the DiskBoss GUI application or direct desktop shortcuts.

User-defined commands may be managed and executed through the commands dialog or the commands tool pane. In order to add a new command through the commands pane, press the right mouse button over the pane and select the Add New Duplicate Files Search Command menu item. In order to execute a previously saved command, just click on the command item in the commands tool pane or create a direct desktop shortcut on the Windows desktop.

Detecting Duplicates Using the Command Line Utility


In addition to the DiskBoss GUI application, DiskBoss Ultimate provides a command line utility allowing one to execute duplicate files detection and removal operations from batch files and shell scripts. The command line tool is located in the <ProductDir>\bin directory.

Command Line Syntax:

diskboss -duplicates -dir <Input Directory 1> [ ... <Input Directory X> <Options> ]

Parameters:

-dir <Directory 1> [ ... <Directory X> -file <File 1> <File 2> ]

This parameter specifies the list of input directories or files to process. In order to ensure proper parsing of command line arguments, directories and file names containing space characters should be double quoted.

Options:

-signature_type <MD5 | SHA1 | SHA256>

This parameter sets the type of algorithm used to calculate signatures of files. By default, DiskBoss uses the SHA256 algorithm.

-exclude_dir <Exclude Directory 1> [ ... <Exclude Directory X> ]

This parameter specifies the list of directories that should be excluded from processing. In order to ensure proper parsing of command line arguments, directories containing space characters should be double quoted.

-filter <FileFilter>

This parameter sets the directory search filter (default *.*).

-workers <WorkingThreadCount>

This parameter sets the number of working threads to process files. DiskBoss is optimized for Multi-Core and Multi-CPU computers and is capable of distributing the workload to an unlimited number of CPUs. By default, DiskBoss processes files with one working thread.

-max_dup_set <MaxNumberOfDuplicateSets>

This parameter sets the maximum number of duplicate file sets to report about. By default, DiskBoss will report about up to 1000 duplicate file sets sorted by the amount of wasted storage space.

-min_wasted_space <MinWastedStorageSpace>

This parameter sets the minimum amount of wasted storage space to report about. By default, DiskBoss will report about duplicate file sets wasting at least 1 MByte of storage space.

-save_html_report | -save_csv_report | -save_text_report [ ReportFileName ]

This parameter saves a report file. If no file name is specified, DiskBoss will automatically generate a file name according to the following template: diskboss_duplicates_[date]_[time].html

-v

This command shows the product's version, revision and build date.

-help

This command shows the command line usage information.
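Scripts that assemble these command lines need to respect the quoting rule for paths containing spaces. The Python sketch below builds a command string from a few of the options listed above and double quotes any argument that contains a space. The helper names and example paths are hypothetical, and only options documented in this section are emitted.

```python
VALID_SIGNATURES = {"MD5", "SHA1", "SHA256"}


def quote_arg(arg):
    """Double quote an argument if it contains a space character."""
    return f'"{arg}"' if " " in arg else arg


def build_duplicates_command(dirs, signature_type="SHA256", exclude_dirs=(), html_report=None):
    """Return a diskboss command line as a single string."""
    if not dirs:
        raise ValueError("at least one input directory is required")
    if signature_type not in VALID_SIGNATURES:
        raise ValueError(f"unsupported signature type: {signature_type}")

    parts = ["diskboss", "-duplicates", "-dir", *map(quote_arg, dirs)]
    parts += ["-signature_type", signature_type]
    if exclude_dirs:
        parts += ["-exclude_dir", *map(quote_arg, exclude_dirs)]
    if html_report:
        parts += ["-save_html_report", quote_arg(html_report)]
    return " ".join(parts)


# Hypothetical paths, for illustration only.
command = build_duplicates_command(
    [r"C:\My Files", r"D:\Data"],
    exclude_dirs=[r"C:\My Files\Temp"],
)
```

The resulting string can be pasted into a batch file; validating the signature type up front catches typos before the utility is ever started.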
