Added a comment

2015-05-21 15:39:07 +00:00 · 2015-05-21 15:39:07 +00:00 · 712d261333
commit 712d261333
parent 4d036d0147
1 changed files with 29 additions and 0 deletions
--- a/doc/tips/finding_duplicate_files/comment_12_630b065f41019716a5f6848e0adcd0f0._comment
+++ b/doc/tips/finding_duplicate_files/comment_12_630b065f41019716a5f6848e0adcd0f0._comment
@@ -0,0 +1,29 @@
+[[!comment format=mdwn
+ username="CandyAngel"
+ subject="comment 12"
+ date="2015-05-21T15:39:07Z"
+ content="""
+My method uses Perl for most of the work, which avoids the need to sort and handles spaces in paths safely. Below is an (**untested**) command-line version (my own version keeps the Perl in ~/bin/annex-dupe.pl):
+
+    git annex find --format='${key} ${file}\n' > ~/tmp/annex_kf.txt
+    perl -pe '($k,$f) = split / /, $_, 2; $cf{$k}++; $_ = ($cf{$k}>1 ? \"\" : \"#\") . $f;' ~/tmp/annex_kf.txt > ~/tmp/annex_dupes.txt
+    grep -v '^#' ~/tmp/annex_dupes.txt | xargs -d'\n' git rm
+
+And the equivalent \"one liner\":
+
+    git annex find --format='${key} ${file}\n' \
+    | perl -pe '($k,$f) = split / /, $_, 2; $cf{$k}++; $_ = ($cf{$k}>1 ? \"\" : \"#\") . $f;' \
+    | grep -v '^#' \
+    | xargs -d'\n' git rm
+
+It works by passing a list of keys and paths to Perl, which prefixes the first path seen for each key with a '#'. grep then removes those marked lines, so only the duplicate paths are passed to xargs and thus to 'git rm'.
+
+This is particularly handy because it lets you delete duplicates from specific subdirectories just by adding another 'grep DIR/PATH' in front of xargs. You do not risk losing every reference when all instances are in DIR/PATH, because the first one has already been removed from the file list by the first grep.
+
+For example, after writing all the duplicates to ~/tmp/annex_dupes.txt, I will then do a:
+
+    grep -v '^#' ~/tmp/annex_dupes.txt | grep 'some/sub/dir/somewhere' | xargs -d'\n' git rm
+    git commit -m \"Cleaned up 'some/sub/dir/somewhere'\"
+
+loop, if I want more control over where things are removed from.
+"""]]