Added a comment: List the duplicate filenames, then let the user decide what to do
parent 20482712d0
commit 97bef4af73
1 changed file with 35 additions and 0 deletions
@@ -0,0 +1,35 @@
[[!comment format=mdwn
username="http://adamspiers.myopenid.com/"
nickname="Adam"
subject="List the duplicate filenames, then let the user decide what to do"
date="2011-12-22T12:31:29Z"
content="""
I have the same use case as Asheesh, but I want to be able to see which filenames point to the same objects and then decide which of the duplicates to drop myself. I think

    git annex drop --by-contents

would be the wrong approach: how would git-annex know which ones to drop? There's too much potential for error.

Instead it would be great to have something like

    git annex finddups

While it's easy enough to knock up a bit of shell or Perl to achieve this, that relies on knowledge of the annex symlink structure, so I think really it belongs inside git-annex.

If this command gave output similar to the excellent `fastdup` utility:

    Scanning for files... 672 files in 10.439 seconds
    Comparing 2 sets of files...

    2 files (70.71 MB/ea)
    /home/adam/media/flat/tour/flat-tour.3gp
    /home/adam/videos/tour.3gp

    Found 1 duplicate of 1 file (70.71 MB wasted)
    Scanned 672 files (1.96 GB) in 11.415 seconds

then you could do stuff like

    git annex finddups | grep /home/adam/media/flat | xargs rm

"""]]
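
The comment mentions that a bit of shell or Perl could already list duplicates by relying on the annex symlink structure. As a rough illustration of that idea only, here is a minimal Python sketch. It assumes that annexed files are symlinks whose targets pass through `annex/objects` and end in the object's key, so two filenames with the same target basename refer to the same content. The function name `find_annex_dups` is hypothetical and is not part of git-annex; the proposed `git annex finddups` command would not need to depend on this layout.

```python
import os
from collections import defaultdict


def find_annex_dups(root):
    # Map each annex key (assumed to be the basename of the symlink
    # target) to the list of filenames that point at it.
    groups = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into .git, which holds the objects themselves.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.readlink(path)
            # Assumption: annexed files link into .../annex/objects/...
            if "annex/objects" not in target.replace(os.sep, "/"):
                continue
            groups[os.path.basename(target)].append(path)
    # Keep only keys that are referenced by more than one filename.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```

Calling `find_annex_dups(".")` from the top of a repository returns a dictionary from key to the filenames sharing it, which leaves the decision about which duplicate to drop with the user, as the comment asks.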