Added a comment: List the duplicate filenames, then let the user decide what to do

This commit is contained in:
http://adamspiers.myopenid.com/ 2011-12-22 12:31:36 +00:00 committed by admin
parent 20482712d0
commit 97bef4af73

View file

@ -0,0 +1,35 @@
[[!comment format=mdwn
username="http://adamspiers.myopenid.com/"
nickname="Adam"
subject="List the duplicate filenames, then let the user decide what to do"
date="2011-12-22T12:31:29Z"
content="""
I have the same use case as Asheesh but I want to be able to see which filenames point to the same objects and then decide which of the duplicates to drop myself. I think
git annex drop --by-contents
would be the wrong approach because how does git-annex know which ones to drop? There's too much potential for error.
Instead it would be great to have something like
git annex finddups
While it's easy enough to knock up a bit of shell or Perl to achieve this, that relies on knowledge of the annex symlink structure, so I think really it belongs inside git-annex.
If this command gave output similar to the excellent `fastdup` utility:
Scanning for files... 672 files in 10.439 seconds
Comparing 2 sets of files...
2 files (70.71 MB/ea)
/home/adam/media/flat/tour/flat-tour.3gp
/home/adam/videos/tour.3gp
Found 1 duplicate of 1 file (70.71 MB wasted)
Scanned 672 files (1.96 GB) in 11.415 seconds
then you could do stuff like
git annex finddups | grep /home/adam/media/flat | xargs rm
"""]]