From ea40b0002bdcc779df5fc9b8639e5d5fd3392f19 Mon Sep 17 00:00:00 2001 From: Steve Date: Sat, 22 Dec 2012 16:39:14 +0000 Subject: [PATCH 1/8] --- ..._branches_which_makes_it_inconsistent.mdwn | 100 ++++++++++++++++++ 1 file changed, 100 insertions(+) create mode 100644 doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent.mdwn diff --git a/doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent.mdwn b/doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent.mdwn new file mode 100644 index 0000000000..30fe9b00ea --- /dev/null +++ b/doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent.mdwn @@ -0,0 +1,100 @@ +The "git annex unused" command considers remote branches as well as local branches. This means that an +object may be considered unused or not depending on what remotes are present and when they were last synced. + +I ran into this issue when experimenting with using repos on removable storage. I'll post more about +what I was trying to do in the forum. I'm posting this in bugs as I believe the inconsistent behavior +should probably be considered a bug. + +#What steps will reproduce the problem? + +Here is a sample session illustrating the problem. At the end, you can see that the object is +not shown as unused, then the remote is removed and it is shown as unused, then the remote is added +back and the file is once again not shown as unused. + + /tmp/git $ mkdir 1 2 + /tmp/git $ cd 1 + /tmp/git/1 $ git init + Initialized empty Git repository in /tmp/git/1/.git/ + /tmp/git/1 $ git annex init 1 + init 1 ok + (Recording state in git...) + /tmp/git/1 $ git remote add 2 ../2 + /tmp/git/1 $ dd if=/dev/urandom of=file.bin count=100 + 100+0 records in + 100+0 records out + 51200 bytes (51 kB) copied, 0.0113172 s, 4.5 MB/s + /tmp/git/1 $ git annex add file.bin + add file.bin (checksum...) ok + (Recording state in git...) 
+ /tmp/git/1 $ git commit -m 'added file' + [master (root-commit) 3c1ad30] added file + 1 files changed, 1 insertions(+), 0 deletions(-) + create mode 120000 file.bin + /tmp/git/1 $ cd ../2 + /tmp/git/2 $ git init + Initialized empty Git repository in /tmp/git/2/.git/ + /tmp/git/2 $ git annex init 2 + init 2 ok + (Recording state in git...) + /tmp/git/2 $ git remote add 1 ../1 + /tmp/git/2 $ git fetch 1 + warning: no common commits + remote: Counting objects: 13, done. + remote: Compressing objects: 100% (9/9), done. + remote: Total 13 (delta 0), reused 0 (delta 0) + Unpacking objects: 100% (13/13), done. + From ../1 + * [new branch] git-annex -> 1/git-annex + * [new branch] master -> 1/master + /tmp/git/2 $ git checkout -b master 1/master + Branch master set up to track remote branch master from 1. + Already on 'master' + /tmp/git/2 $ cd ../1 + /tmp/git/1 $ git fetch 2 + remote: Counting objects: 5, done. + remote: Compressing objects: 100% (3/3), done. + remote: Total 5 (delta 0), reused 0 (delta 0) + Unpacking objects: 100% (5/5), done. + From ../2 + * [new branch] git-annex -> 2/git-annex + * [new branch] master -> 2/master + /tmp/git/1 $ git rm file.bin + rm 'file.bin' + /tmp/git/1 $ git commit -m 'rmed file' + [master ab242b0] rmed file + 1 files changed, 0 insertions(+), 1 deletions(-) + delete mode 120000 file.bin + /tmp/git/1 $ git annex unused + unused . (checking for unused data...) (checking master...) (checking 2/master...) ok + /tmp/git/1 $ git remote rm 2 + /tmp/git/1 $ git annex unused + unused . (checking for unused data...) (checking master...) 
+ Some annexed data is no longer used by any files: + NUMBER KEY + 1 SHA256E-s51200--e400e5abea095ad4364d8f97c5fe1a3f8a6db670b2dfee951d7c9674afc9a21d.bin + (To see where data was previously used, try: git log --stat -S'KEY') + + To remove unwanted data: git-annex dropunused NUMBER + + ok + /tmp/git/1 $ git remote add 2 ../2 + /tmp/git/1 $ git fetch 2 + From ../2 + * [new branch] git-annex -> 2/git-annex + * [new branch] master -> 2/master + /tmp/git/1 $ git annex unused + unused . (checking for unused data...) (checking master...) (checking 2/master...) ok + /tmp/git/1 $ + + +#What is the expected output? What do you see instead? + +I expected that the object's unused status would not change based on which remotes this particular +repo knows about. In other words, I expected the unused status to be based on the local branches +and possibly information in the git-annex branch. + +#What version of git-annex are you using? On what operating system? + +Gentoo Linux, git annex version 3.20121211 + +#Please provide any additional information below. From 8b202a30d6ce10b5e9996624152bdc7977dc67dd Mon Sep 17 00:00:00 2001 From: Steve Date: Sat, 22 Dec 2012 16:55:37 +0000 Subject: [PATCH 2/8] --- ..._manage_files_on_removable_media__63__.mdwn | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) create mode 100644 doc/forum/Best_way_to_manage_files_on_removable_media__63__.mdwn diff --git a/doc/forum/Best_way_to_manage_files_on_removable_media__63__.mdwn b/doc/forum/Best_way_to_manage_files_on_removable_media__63__.mdwn new file mode 100644 index 0000000000..d6d1206ae0 --- /dev/null +++ b/doc/forum/Best_way_to_manage_files_on_removable_media__63__.mdwn @@ -0,0 +1,18 @@ +I have a bunch of removable storage devices and was planning on storing my data across +all of them. I've run into an annoyance, and would like to see if anybody has any +ideas. + +My goal was to have the full file tree on all the devices, but only a subset of the +annexed data. 
Where I have run into trouble is removing data from the system. It +seems that the "git annex unused" command checks remote branches as well as local ones +when determining whether an object is referred to. + +This means that if I remove a file that is stored locally, "git annex unused" doesn't +report the corresponding object as unused until I either connect and update all +removable storage *or* remove the remote corresponding to the removable storage. I +posted a bug about this inconsistency named +[[bugs/git annex unused considers remote branches which makes it inconsistent]]. + +If I used the removable storage as a special remote, then I wouldn't have this issue, +but I also wouldn't be able to conveniently use the files on it and manage the repo +from it either. From 4b8ae4f9e89b4a8e5476aaf4fde72ff46118fc9b Mon Sep 17 00:00:00 2001 From: Steve Date: Sat, 22 Dec 2012 16:57:35 +0000 Subject: [PATCH 3/8] added link to forum post --- ...d_considers_remote_branches_which_makes_it_inconsistent.mdwn | 2 ++ 1 file changed, 2 insertions(+) diff --git a/doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent.mdwn b/doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent.mdwn index 30fe9b00ea..1c3d297879 100644 --- a/doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent.mdwn +++ b/doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent.mdwn @@ -98,3 +98,5 @@ and possibly information in the git-annex branch. Gentoo Linux, git annex version 3.20121211 #Please provide any additional information below. 
+ +The forum post describing what I was trying to accomplish is [[forum/Best way to manage files on removable media?]] From d98784aab49acbaa9f97f870c247d6495559dae4 Mon Sep 17 00:00:00 2001 From: "http://www.joachim-breitner.de/" Date: Sat, 22 Dec 2012 22:06:45 +0000 Subject: [PATCH 4/8] --- doc/bugs/optinally_transfer_file_unencryptedly.mdwn | 3 +++ 1 file changed, 3 insertions(+) create mode 100644 doc/bugs/optinally_transfer_file_unencryptedly.mdwn diff --git a/doc/bugs/optinally_transfer_file_unencryptedly.mdwn b/doc/bugs/optinally_transfer_file_unencryptedly.mdwn new file mode 100644 index 0000000000..d622fcdab6 --- /dev/null +++ b/doc/bugs/optinally_transfer_file_unencryptedly.mdwn @@ -0,0 +1,3 @@ +I have a git-annex repository on an NSLU 2, and transfers are much slower over ssh compared to unencrypted transfers (no wonder at that CPU speed). For the files that I am transferring, no encryption would be necessary. Unfortunately, ssh in Debian does not support "-c none" to disable encryption. + +It would be nice if git-annex had a convenient way to transfer files by some means other than SSH. I’m not sure what a good way would be – maybe launching a one-shot HTTP server on the sending end? Haskell libraries for that would be available... Of course it is not always the case that the host reachable with "ssh foo" is also reachable via TCP at "foo:1234"... And there are surely more problems. 
+ But still, it would be nice :-) From 57c4ac76c3cb8c9cc265a502b59715b067427b36 Mon Sep 17 00:00:00 2001 From: "https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo" Date: Sun, 23 Dec 2012 17:40:14 +0000 Subject: [PATCH 5/8] Added a comment --- ...t_1_13a7653d96ddf91f4492a9f3555a69aa._comment | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 doc/bugs/optinally_transfer_file_unencryptedly/comment_1_13a7653d96ddf91f4492a9f3555a69aa._comment diff --git a/doc/bugs/optinally_transfer_file_unencryptedly/comment_1_13a7653d96ddf91f4492a9f3555a69aa._comment b/doc/bugs/optinally_transfer_file_unencryptedly/comment_1_13a7653d96ddf91f4492a9f3555a69aa._comment new file mode 100644 index 0000000000..5e72d5f0a6 --- /dev/null +++ b/doc/bugs/optinally_transfer_file_unencryptedly/comment_1_13a7653d96ddf91f4492a9f3555a69aa._comment @@ -0,0 +1,16 @@ +[[!comment format=mdwn + username="https://www.google.com/accounts/o8/id?id=AItOawmBUR4O9mofxVbpb8JV9mEbVfIYv670uJo" + nickname="Justin" + subject="comment 1" + date="2012-12-23T17:40:13Z" + content=""" +Using a plain TCP connection would be simpler than HTTP: the sending side would just need to tell the receiver to listen on a port and write any data received to a file (or the reverse). Basically what you can do with netcat. 
+ +I had a similar problem, but I found that using arcfour was fast enough: + +.ssh/config: + + Host slow + Ciphers arcfour + +"""]] From f0e3ce5edfb5eaa2e3f0a4aaf90b71b88edeeb21 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sun, 23 Dec 2012 19:26:23 +0000 Subject: [PATCH 6/8] Added a comment --- .../comment_2_31f154011ec26a463de7b1e307e49cb6._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/bugs/optinally_transfer_file_unencryptedly/comment_2_31f154011ec26a463de7b1e307e49cb6._comment diff --git a/doc/bugs/optinally_transfer_file_unencryptedly/comment_2_31f154011ec26a463de7b1e307e49cb6._comment b/doc/bugs/optinally_transfer_file_unencryptedly/comment_2_31f154011ec26a463de7b1e307e49cb6._comment new file mode 100644 index 0000000000..2050da5755 --- /dev/null +++ b/doc/bugs/optinally_transfer_file_unencryptedly/comment_2_31f154011ec26a463de7b1e307e49cb6._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="4.152.246.126" + subject="comment 2" + date="2012-12-23T19:26:23Z" + content=""" +You can configure multiple git remotes that access the same repository using different transports, and use an un-encrypted transport when necessary for speed. I sometimes use an NFS mount for this. 
+"""]] From 87a35859c9ad8de6fe95975b34de05942531b76e Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sun, 23 Dec 2012 19:29:27 +0000 Subject: [PATCH 7/8] Added a comment --- .../comment_3_33433bcfb1946b52f1f41b9158ab452d._comment | 8 ++++++++ 1 file changed, 8 insertions(+) create mode 100644 doc/bugs/optinally_transfer_file_unencryptedly/comment_3_33433bcfb1946b52f1f41b9158ab452d._comment diff --git a/doc/bugs/optinally_transfer_file_unencryptedly/comment_3_33433bcfb1946b52f1f41b9158ab452d._comment b/doc/bugs/optinally_transfer_file_unencryptedly/comment_3_33433bcfb1946b52f1f41b9158ab452d._comment new file mode 100644 index 0000000000..6b51701a0e --- /dev/null +++ b/doc/bugs/optinally_transfer_file_unencryptedly/comment_3_33433bcfb1946b52f1f41b9158ab452d._comment @@ -0,0 +1,8 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="4.152.246.126" + subject="comment 3" + date="2012-12-23T19:29:27Z" + content=""" +BTW, I have yet to find any Haskell HTTP library that can upload files without buffering their full contents in memory. (No, not even http-conduit.) If someone fixes that, git-annex's S3 and WebDAV support will get a lot better and I could consider adding something like what's suggested. 
+"""]] From 6c435adcde26e14628dfdf4350867a306a885fc6 Mon Sep 17 00:00:00 2001 From: "http://joeyh.name/" Date: Sun, 23 Dec 2012 19:51:53 +0000 Subject: [PATCH 8/8] Added a comment --- ...comment_1_a636ffe55b11c46a0afcc0b9a3a88cd4._comment | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent/comment_1_a636ffe55b11c46a0afcc0b9a3a88cd4._comment diff --git a/doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent/comment_1_a636ffe55b11c46a0afcc0b9a3a88cd4._comment b/doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent/comment_1_a636ffe55b11c46a0afcc0b9a3a88cd4._comment new file mode 100644 index 0000000000..97af66bc71 --- /dev/null +++ b/doc/bugs/git_annex_unused_considers_remote_branches_which_makes_it_inconsistent/comment_1_a636ffe55b11c46a0afcc0b9a3a88cd4._comment @@ -0,0 +1,10 @@ +[[!comment format=mdwn + username="http://joeyh.name/" + ip="4.152.246.126" + subject="comment 1" + date="2012-12-23T19:51:52Z" + content=""" +The goal is not to consider an object unused that some other remote is known to rely on. We try as hard as we can to avoid losing data, at the expense of possibly not dropping unused content as early as possible. + +Running `git annex sync` or similar to get current with the state of all remotes before dropping objects they might still rely on seems reasonable from this perspective. +"""]]