git-annex (5.20140402) unstable; urgency=medium

  * unannex, uninit: Avoid committing after every file is unannexed, for massive speedup.
  * --notify-finish switch will cause desktop notifications after each file upload/download/drop completes (using the dbus Desktop Notifications Specification)
  * --notify-start switch will show desktop notifications when each file upload/download starts.
  * webapp: Automatically install Nautilus integration scripts to get and drop files.
  * tahoe: Pass -d parameter before subcommand; putting it after the subcommand no longer works with tahoe-lafs version 1.10. (Thanks, Alberto Berti)
  * forget --drop-dead: Avoid removing the dead remote from the trust.log, so that if git remotes for it still exist anywhere, git annex info will still know it's dead and not show it.
  * git-annex-shell: Make configlist automatically initialize a remote git repository, as long as a git-annex branch has been pushed to it, to simplify setup of remote git repositories, including via gitolite.
  * add --include-dotfiles: New option, perhaps useful for backups.
  * Version 5.20140227 broke creation of glacier repositories, not including the datacenter and vault in their configuration. This bug is fixed, but glacier repositories set up with the broken version of git-annex need to have the datacenter and vault set in order to be usable. This can be done using git annex enableremote to add the missing settings. For details, see http://git-annex.branchable.com/bugs/problems_with_glacier/
  * Added required content configuration.
  * assistant: Improve ssh authorized keys line generated in local pairing or for a remote ssh server to set environment variables in an alternative way that works with the non-POSIX fish shell, as well as POSIX shells.

# imported from the archive
2014-04-02 21:42:53 +01:00 · 2014-04-02 21:42:53 +01:00 · b6d46c212e
commit b6d46c212e
7646 changed files with 245066 additions and 0 deletions
--- a/doc/tips/finding_duplicate_files/comment_10_2ed5aa8c632048b13e01d358883fa383._comment
+++ b/doc/tips/finding_duplicate_files/comment_10_2ed5aa8c632048b13e01d358883fa383._comment
@ -0,0 +1,12 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawmTNrhkVQ26GBLaLD5-zNuEiR8syTj4mI8"
+ nickname="Juan"
+ subject="comment 10"
+ date="2013-08-31T18:20:58Z"
+ content="""
+I'm already spreading the word. Handling scientific papers, data, simulations and code has been quite a challenge during my academic career. While code was solved long ago, the first three items remained a huge problem.
+I'm sure many of my colleagues will be happy to use it.
+Is there any hashtag or twitter account? I've seen that you collected some of my tweets, but I don't know how you did it. Did you search for git-annex?
+Best,
+    Juan
+"""]]
--- a/doc/tips/finding_duplicate_files/comment_11_5efc6b6ee1dfec88512183e9679ca616._comment
+++ b/doc/tips/finding_duplicate_files/comment_11_5efc6b6ee1dfec88512183e9679ca616._comment
@ -0,0 +1,24 @@
+[[!comment format=mdwn
+ username="sameerds"
+ ip="106.51.197.116"
+ subject="a shell script that handles spaces in file names"
+ date="2013-12-31T10:24:06Z"
+ content="""
+I used the following shell pipeline to remove duplicate files in one go:
+
+    (1) git annex find --format='${key}:${file}\n' \
+    (2)    | cut -d '-' -f 4- \
+    (3)    | sort \
+    (4)    | uniq --all-repeated=separate -w 40 \
+    (5)    | awk -vRS= -vFS='\n' '{for (i = 2; i <= NF; i++) print $i}' \
+    (6)    | cut -d ':' -f 2- \
+    (7)    | xargs -d '\n' git rm
+
+1. Generate a list of keys and file names separated by a colon (':').
+2. Cut out the initial part of the key so that the hash is at the beginning of the line. The `-f 4-` ensures that dashes in the filename do not result in truncation.
+3. Sort the entire list.
+4. Uniquify and print duplicates in groups separated by blank lines. Use the first 40 characters, which matches the length of a SHA1 hash. Other hashes will require a different length.
+5. Use awk to print all but the first line in each group. The empty `-vRS` sets blank line as the record separator, and the `-vFS` sets newline as the field separator. The for-loop prints each field except the first.
+6. Cut out the key and keep only the file name by relying on the colon introduced in the first step.
+7. Use xargs to separate file names by newline, which takes care of spaces in the file names. Send this list of arguments to `git rm`.
+"""]]
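
The middle of the pipeline in the comment above (steps 2 to 6) can be tried without touching a repository by feeding it `key:file` lines on standard input. The following is only a sketch under the same assumptions as the comment: SHA1 keys with a 40-character hash, and GNU `uniq` for `--all-repeated` and `-w`. The function name `list_duplicates` is made up for illustration and is not part of git-annex.

```shell
# list_duplicates: hypothetical helper wrapping steps 2-6 of the pipeline.
# Reads "key:file" lines on stdin (what step 1 prints) and writes every
# file name of each duplicate group except the first one.
# Assumes SHA1 keys (40-character hash) and GNU uniq.
list_duplicates() {
    cut -d '-' -f 4- \
        | sort \
        | uniq --all-repeated=separate -w 40 \
        | awk -vRS= -vFS='\n' '{for (i = 2; i <= NF; i++) print $i}' \
        | cut -d ':' -f 2-
}
```

It would sit between the two ends of the original pipeline, as in `git annex find --format='${key}:${file}\n' | list_duplicates | xargs -d '\n' git rm`. Leaving off the final `xargs` stage gives a preview of what would be removed.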
--- a/doc/tips/finding_duplicate_files/comment_1_ddb477ca242ffeb21e0df394d8fdf5d2._comment
+++ b/doc/tips/finding_duplicate_files/comment_1_ddb477ca242ffeb21e0df394d8fdf5d2._comment
@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="http://adamspiers.myopenid.com/"
+ nickname="Adam"
+ subject="Cool"
+ date="2011-12-23T19:16:50Z"
+ content="""
+Very nice :)  Just for reference, here's [my Perl implementation](https://github.com/aspiers/git-config/blob/master/bin/git-annex-finddups).  As per [this discussion](http://git-annex.branchable.com/todo/wishlist:_Provide_a___34__git_annex__34___command_that_will_skip_duplicates/#comment-fb15d5829a52cd05bcbd5dc53edaffb2) it would be interesting to benchmark these two approaches and see if one is substantially more efficient than the other w.r.t. CPU and memory usage.
+"""]]
--- a/doc/tips/finding_duplicate_files/comment_2_900eafe0a781018ff44b35ac232e3ad3._comment
+++ b/doc/tips/finding_duplicate_files/comment_2_900eafe0a781018ff44b35ac232e3ad3._comment
@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="bremner"
+ ip="156.34.89.108"
+ subject="problems with spaces in filenames"
+ date="2012-09-05T02:12:18Z"
+ content="""
+Note that the `sort -k2` doesn't work right for filenames with spaces in them. On the other hand, git-rm doesn't seem to like the escaped names from `escaped_file`.
+"""]]
--- a/doc/tips/finding_duplicate_files/comment_3._comment
+++ b/doc/tips/finding_duplicate_files/comment_3._comment
@ -0,0 +1,39 @@
+[[!comment format=mdwn
+ username="mhameed"
+ ip="82.32.202.53"
+ subject="problems with spaces in filenames"
+ date="Wed Sep  5 09:38:56 BST 2012"
+ content="""
+
+Spaces and other special characters can make filename handling ugly.
+If you don't need to keep the exact filenames, then
+it might be easiest just to get rid of the problematic characters.
+
+    #!/bin/bash
+
+    function process() {
+        local dir="$1"
+        echo "processing $dir"
+        pushd "$dir" >/dev/null 2>&1 || return  # quoted: names may contain spaces
+
+        for fileOrDir in *; do
+            nfileOrDir=$(echo "$fileOrDir" | sed -e 's/\[//g' -e 's/\]//g' -e 's/ /_/g' -e "s/'//g")
+            if [ "$fileOrDir" != "$nfileOrDir" ]; then
+                echo "renaming $fileOrDir to $nfileOrDir"
+                git mv "$fileOrDir" "$nfileOrDir"
+            else
+                echo "skipping $fileOrDir, no need to rename."
+            fi
+        done
+
+        find ./ -mindepth 1 -maxdepth 1 -type d ! -name .git | while IFS= read -r d; do  # skip git's own directory
+            process "$d"
+        done
+        popd >/dev/null 2>&1
+    }
+
+    process .
+
+Maybe you can run something like this before checking for duplicates.
+
+"""]]
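
The substitution at the heart of the script above can be tried on its own before any `git mv` runs. A small sketch: the sed expressions are the ones from the script, while the function name `sanitize_name` is hypothetical.

```shell
# sanitize_name: hypothetical helper that prints the name the script above
# would rename its argument to: square brackets and single quotes are
# dropped, spaces become underscores.
sanitize_name() {
    printf '%s\n' "$1" | sed -e 's/\[//g' -e 's/\]//g' -e 's/ /_/g' -e "s/'//g"
}
```

Running `for f in *; do printf '%s -> %s\n' "$f" "$(sanitize_name "$f")"; done` prints a preview of the renames. Two different names can map to the same sanitized name, so it is worth scanning the preview for collisions first.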
--- a/doc/tips/finding_duplicate_files/comment_4_1494143a74cc1e9fbe4720c14b73d42b._comment
+++ b/doc/tips/finding_duplicate_files/comment_4_1494143a74cc1e9fbe4720c14b73d42b._comment
@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="bremner"
+ ip="156.34.89.108"
+ subject="more about spaces..."
+ date="2012-09-09T19:33:01Z"
+ content="""
+Ironically, earlier renaming to remove spaces, plus some syncing, is how I ended up with these duplicates. For what it is worth, aspiers' Perl script worked out for me with a small modification. I only printed out the duplicates with spaces in them (quoted).
+"""]]
--- a/doc/tips/finding_duplicate_files/comment_5_1a35ca360468bcb84a67ad8d62a2ef7d._comment
+++ b/doc/tips/finding_duplicate_files/comment_5_1a35ca360468bcb84a67ad8d62a2ef7d._comment
@ -0,0 +1,8 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawkaBh9VNJ-RZ26wJZ4BEhMN1IlPT-DK6JA"
+ nickname="Alex"
+ subject="printing keys first is the easiest workaround"
+ date="2013-04-01T23:32:23Z"
+ content="""
+Since the keys are sure to have no spaces in them, putting them first makes working with the output with tools like sort, uniq, and awk simpler.
+"""]]
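
As one illustration of the point in the comment above: with the key as the first space-separated field, a single awk pass can report duplicates without sorting at all. This is a sketch that assumes input lines of the form `KEY FILE`, such as `git annex find --format='${key} ${file}\n'` would print; the name `later_copies` is made up.

```shell
# later_copies: hypothetical helper. Reads "KEY FILE" lines on stdin and
# prints the file name of every line whose key has already been seen,
# i.e. all copies except the first one encountered.
later_copies() {
    awk '{
        key = $1
        # Everything after the first space is the file name, spaces included.
        file = substr($0, length(key) + 2)
        if (seen[key]++) print file
    }'
}
```

Because the key never contains a space, the file name can be recovered by position rather than by field number, so spaces in file names need no special handling.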
--- a/doc/tips/finding_duplicate_files/comment_6_a6e88c93b31f67c933523725ff61b287._comment
+++ b/doc/tips/finding_duplicate_files/comment_6_a6e88c93b31f67c933523725ff61b287._comment
@ -0,0 +1,16 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawnkBYpLu_NOj7Uq0-acvLgWhxF8AUEIJbo"
+ nickname="Chris"
+ subject="Find files by key"
+ date="2013-05-03T04:14:55Z"
+ content="""
+Is there any simple way to search for files with a given key?
+
+At the moment, the best I've come up with is this:
+
+````
+git annex find --include '*' --format='${key} ${file}\n' | grep <KEY>
+````
+
+where `<KEY>` is the key. This seems like an awfully long-winded approach, but I don't see anything in the docs indicating a simpler way to do it. Am I missing something?
+"""]]
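
For the search described in the comment above, an exact comparison on the first field avoids treating the key as a regular expression or matching it as a substring of a longer key. A sketch, assuming one `KEY FILE` line per file on standard input and a key without backslashes; `files_with_key` is a hypothetical name.

```shell
# files_with_key: hypothetical helper. Reads "KEY FILE" lines on stdin and
# prints the file names whose key is exactly equal to the first argument.
files_with_key() {
    awk -v want="$1" '$1 == want { print substr($0, length($1) + 2) }'
}
```

It could be used as `git annex find --include '*' --format='${key} ${file}\n' | files_with_key "$KEY"`, where `$KEY` holds the key being looked for.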
--- a/doc/tips/finding_duplicate_files/comment_7_347b0186755a809594bd42feda6363e2._comment
+++ b/doc/tips/finding_duplicate_files/comment_7_347b0186755a809594bd42feda6363e2._comment
@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ nickname="joey"
+ subject="comment 7"
+ date="2013-05-13T18:42:14Z"
+ content="""
+@Chris I guess there's no really easy way because searching for a given key is not something many people need to do.
+
+However, git does provide a way. Try `git log --stat -S $KEY`
+"""]]
--- a/doc/tips/finding_duplicate_files/comment_8_3af51722da0980b724facb184f0f66e9._comment
+++ b/doc/tips/finding_duplicate_files/comment_8_3af51722da0980b724facb184f0f66e9._comment
@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="https://www.google.com/accounts/o8/id?id=AItOawmTNrhkVQ26GBLaLD5-zNuEiR8syTj4mI8"
+ nickname="Juan"
+ subject="This is an awesome feature"
+ date="2013-08-28T13:40:23Z"
+ content="""
+Thanks. I have quite a lot of papers in PDF format. Now I'm saving space, have them under control and synchronized with many devices, and have found more than 200 duplicates.
+Is there a way to donate to the project? You really deserve it.
+Thanks.
+"""]]
--- a/doc/tips/finding_duplicate_files/comment_9_7b4b78a5cd253abfe4f6001049bf64f3._comment
+++ b/doc/tips/finding_duplicate_files/comment_9_7b4b78a5cd253abfe4f6001049bf64f3._comment
@ -0,0 +1,10 @@
+[[!comment format=mdwn
+ username="http://joeyh.name/"
+ ip="4.153.8.7"
+ subject="comment 9"
+ date="2013-08-28T20:25:20Z"
+ content="""
+@Juan the best thing to do is tell people about git-annex, help them use it, and file bug reports. Just generally be part of the git-annex community.
+
+(If you really want to donate to me, <http://campaign.joeyh.name/> is still open.)
+"""]]