add escape_var hack

Makes it easy to find files with duplicate contents, anyway.. :)
Joey Hess 2011-12-22 21:23:11 -04:00
parent 13a0c292b3
commit 7227dd8f21
4 changed files with 58 additions and 19 deletions


@@ -437,8 +437,10 @@ subdirectories).
Specifies a custom output format. The value is a format string,
in which '${var}' is expanded to the value of a variable. To right-justify
a variable with whitespace, use '${var;width}' ; to left-justify
-a variable, use '${var;-width}'. Also, '\\n' is a newline, '\\000' is a NULL,
-etc.
+a variable, use '${var;-width}'; to escape unusual characters in a variable,
+use '${escaped_var}'
+Also, '\\n' is a newline, '\\000' is a NULL, etc.
* -c name=value


@@ -0,0 +1,21 @@
Maybe you had a lot of files scattered around on different drives, and you
added them all into a single git-annex repository. Some of the files are
surely duplicates of others.

While git-annex stores the file contents efficiently, it would still
help in cleaning up this mess if you could find, and perhaps remove
the duplicate files.

Here's a command line that will show duplicate sets of files grouped together:

    git annex find --include '*' --format='${file} ${escaped_key}\n' | \
        sort -k2 | uniq --all-repeated=separate -f1 | \
        sed 's/ [^ ]*$//'

Here's a command line that will remove one of each duplicate set of files:

    git annex find --include '*' --format='${file} ${escaped_key}\n' | \
        sort -k2 | uniq --repeated -f1 | sed 's/ [^ ]*$//' | \
        xargs -d '\n' git rm

--[[Joey]]
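
To see what the `sort | uniq | sed` stages in the two command lines do without touching a real repository, here is a minimal sketch that feeds them a made-up listing. The function names (`sample_listing`, `show_duplicates`, `pick_one_per_set`), the file names, and the short `KEY1`-style keys are all hypothetical placeholders, not real git-annex output. It assumes GNU coreutils (for the long `uniq` options) and file names without spaces, since `sort -k2` and `uniq -f1` split fields on blanks.

```shell
# Hypothetical stand-in for the output of
#   git annex find --include '*' --format='${file} ${escaped_key}\n'
# One "file key" pair per line; the keys are made-up placeholders.
sample_listing() {
    printf '%s\n' \
        'a.jpg KEY1' \
        'b.jpg KEY2' \
        'c.jpg KEY1' \
        'd.jpg KEY3' \
        'e.jpg KEY2'
}

# Sort by the key (field 2), keep only lines whose key repeats
# (uniq -f1 ignores the file name when comparing), separate each
# set with a blank line, then strip the trailing key with sed.
show_duplicates() {
    sort -k2 | uniq --all-repeated=separate -f1 | sed 's/ [^ ]*$//'
}

# Same idea, but print only one file name per duplicate set.
# In the real pipeline this list is what gets handed to `git rm`.
pick_one_per_set() {
    sort -k2 | uniq --repeated -f1 | sed 's/ [^ ]*$//'
}

sample_listing | show_duplicates
echo '---'
sample_listing | pick_one_per_set
```

With this sample, the first pipeline prints `a.jpg` and `c.jpg`, a blank line, then `b.jpg` and `e.jpg`; the second prints `a.jpg` and `b.jpg`. Note that `d.jpg` never appears because its key is unique, and that a set of three or more identical files would still have duplicates left after one removal pass.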


@@ -25,4 +25,4 @@ I want this because I have copies of various of mine (photos, in particular) sca
(As I write this, I realize it's possible to parse the destination of the symlink in a way that does this..)
>
> [[done]]; see [[tips/finding_duplicate_files]] --[[Joey]]