.\" Copyright (c) 2004 B. Luevelsmeyer
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $Id: dupfind.1,v 1.5 2007/05/03 14:35:02 bernd Exp $
.Dd December 30, 2004
.Dt DUPFIND 1
.Os
.Sh NAME
.Nm dupfind
.Nd "find duplicate files"
.Sh SYNOPSIS
.Nm
.Op Fl cshdv0p
.Op Ar pathname ...
.Sh DESCRIPTION
.Nm
finds copies of files in the directory trees
.Ar pathname .
It distinguishes between hardlinked files, softlinked files,
and actual copies.
.Pp
The options control the output.
By default, nothing is output.
.Bl -tag -width ".Fl c"
.It Fl c
Write a statistical summary at program exit.
.It Fl s
Report symbolic links.
.It Fl h
Report hard links.
.It Fl d
Report non-linked copies.
.It Fl v
Write debugging output.
This is intended for development purposes.
.It Fl 0
Do not consider files of size 0 to be copies of each other.
.It Fl p
Write a progress indicator to stderr (currently searched directory).
.El
.Pp
The output, written to stdout, consists of one line per copy,
each containing the names and the size of two identical files.
Lines starting with ``=='' are hardlinks,
those starting with ``||'' are actual copies,
and those starting with ``->'' are softlinks;
the filenames and the length (in bytes) are enclosed in ``><''
and separated with whitespace.
The output is intended to be easily parseable by other programs such as
.Xr awk 1 .
An example shell script (generating HTML pages with awk)
is part of the source distribution.
.Pp
.Nm
works by searching the directory trees recursively;
it stores the filenames it encounters in a database
along with the file lengths.
If it later encounters a file with the same length,
it calculates a checksum on both files
(which is also stored in the database
so it needs to be calculated at most once per file).
If the checksums match too, then the files are compared directly.
Usually only very few checksums need to be calculated,
and almost no files need to be compared.
.Sh ENVIRONMENT AND FILES
.Nm
uses two temporary
.Xr db 3
databases for its work.
These databases will be created in the path indicated by the
.Ev TMPDIR
environment variable.
If the variable is not set, the databases are created in ``/tmp''.
The databases are not deleted at program end if the flag
.Fl v
is supplied on the command line.
Their size is roughly proportional to the number of files found
in the directory trees.
.Sh EXAMPLES
To find hardlinks and copies in the directory tree starting at ``/home''
and output a summary,
while storing the databases in the current directory, use
.Pp
.Dl "env TMPDIR=. dupfind -hdc /home"
.Sh DIAGNOSTICS
Exit status is 0 on success, and nonzero if the program fails for any reason.
The error messages should be self-explanatory;
database errors will always abort the program immediately.
.Sh SEE ALSO
.Xr awk 1 ,
.Xr db 3
.Sh BUGS
.Nm
will occasionally miss some copies if the files are in active use.