Added a comment: List the duplicate filenames, then let the user decide what to do

2011-12-22 12:31:36 +00:00
commit 97bef4af73
parent 20482712d0
1 changed file with 35 additions and 0 deletions
--- a/doc/todo/wishlist:_Provide_a_34git_annex34_command_that_will_skip_duplicates/comment_4_f120d1e83c1a447f2ecce302fc69cf74._comment
+++ b/doc/todo/wishlist:_Provide_a_34git_annex34_command_that_will_skip_duplicates/comment_4_f120d1e83c1a447f2ecce302fc69cf74._comment
@@ -0,0 +1,35 @@
 [[!comment format=mdwn
 username="http://adamspiers.myopenid.com/"
 nickname="Adam"
 subject="List the duplicate filenames, then let the user decide what to do"
 date="2011-12-22T12:31:29Z"
 content="""
 I have the same use case as Asheesh, but I want to see which filenames point to the same objects and then decide for myself which of the duplicates to drop.  I think
    git annex drop --by-contents
 would be the wrong approach: how would git-annex know which ones to drop?  There's too much potential for error.
 Instead it would be great to have something like
    git annex finddups
 While it's easy enough to knock up a bit of shell or Perl to achieve this, that relies on knowledge of the annex symlink structure, so I think it really belongs inside git-annex.
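For illustration, here is a rough sketch of the kind of shell workaround I mean. It is not a git-annex command; `list_annex_dups` is a made-up name, and it assumes each annexed file is a symlink whose target ends in the key for its content, which is exactly the sort of internal detail a script shouldn't have to know:

```shell
# Hypothetical helper, not part of git-annex.
# Assumption: every annexed file is a symlink, and the last path component
# of its target is the key identifying the content, so two links with the
# same target basename hold the same content.
# Paths containing tabs or newlines are not handled.
list_annex_dups() {
    find "${1:-.}" -type l ! -path '*/.git/*' |
    while IFS= read -r link; do
        # Emit "<key><TAB><path>" for every symlink found.
        printf '%s\t%s\n' "$(basename "$(readlink "$link")")" "$link"
    done |
    sort |
    awk -F'\t' '
        # Collect the paths seen for each key.
        { count[$1]++; paths[$1] = paths[$1] "\t" $2 "\n" }
        # Report only keys referenced by more than one path.
        END {
            for (key in count)
                if (count[key] > 1)
                    printf "%s\n%s", key, paths[key]
        }'
}
```

Running `list_annex_dups ~/media` would print each duplicated key on its own line, followed by the tab-indented paths that point to it. It only lists the duplicates and deletes nothing.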
 If this command gave output similar to the excellent `fastdup` utility:
    Scanning for files... 672 files in 10.439 seconds
    Comparing 2 sets of files...
    2 files (70.71 MB/ea)
            /home/adam/media/flat/tour/flat-tour.3gp
            /home/adam/videos/tour.3gp
    Found 1 duplicate of 1 file (70.71 MB wasted)
    Scanned 672 files (1.96 GB) in 11.415 seconds
 then you could do stuff like
    git annex finddups | grep /home/adam/media/flat | xargs rm
 """]]