Merge branch 'master' of ssh://git-annex.branchable.com

This commit is contained in:
Joey Hess 2015-04-24 12:54:53 -04:00
commit 511225a515
7 changed files with 148 additions and 1 deletions

@@ -0,0 +1,27 @@
### Please describe the problem.
Use of git-annex import --clean-duplicates can cause data loss, where git-annex deletes content that it doesn't actually have a copy of (i.e. there is no duplicate).
### What steps will reproduce the problem?
I've written a quick'n'dirty test script which goes through a bunch of combinations and tests --clean-duplicates in each. The setup is an 'origin' repo containing the content of 'a' and 'b', and a clone of it ('import') which doesn't contain the content of either.
Each case runs `git annex import --clean-duplicates ~/tmp/importme` (importme contains a, b and c) in 'import' after one of the following:
* Origin is set to trusted in import, b is dropped from within origin: b is deleted from importme even though no annexes have copies (reasonable, as origin is set to trusted and import thinks it has the content).
* Origin is set to semitrusted in import, b is dropped from within origin: b is deleted from importme even though no annexes have copies (this is the one most likely to bite people).
* Origin is set to untrusted in import, b is dropped from within origin: b is deleted from importme even though no annexes have copies and git-annex has been explicitly told not to trust information about origin :( This is really surprising behaviour!
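The three cases above can be summarised as a small decision table. The sketch below is only one reading of what the report expects, not how git-annex decides anything: `clean_duplicate_decision` is a hypothetical helper, and it assumes the number of verified copies is already known.

```shell
#!/bin/sh
# Hypothetical helper, not git-annex code: models the behaviour the report
# expects from `import --clean-duplicates`.
#   $1 = trust level of the only repository said to hold the content
#   $2 = number of copies actually verified to exist (assumed to be known)
clean_duplicate_decision() {
    trust=$1
    verified=$2
    if [ "$verified" -ge 1 ]; then
        echo delete    # a real duplicate exists somewhere
    elif [ "$trust" = trusted ]; then
        echo delete    # location information is taken at face value
    else
        echo keep      # semitrusted/untrusted: an unverified claim is not enough
    fi
}

# Print the expected outcome for each scenario in the report (no verified copies).
for level in trusted semitrusted untrusted; do
    printf '%s, 0 verified copies: %s\n' "$level" "$(clean_duplicate_decision "$level" 0)"
done
```

Under this reading, only the trusted case would remove b from importme; the report shows all three cases removing it.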
### What version of git-annex are you using? On what operating system?
* 5.20150409
* Arch Linux (git-annex-bin)
### Please provide any additional information below.
I can provide the script if it is wanted (coded in Perl, couple of non-core dependencies).

@@ -0,0 +1,36 @@
[[!comment format=mdwn
username="CandyAngel"
subject="comment 1"
date="2015-04-24T11:57:35Z"
content="""
Commands to exemplify the \"worst case\" (untrusted causing deletion):

    mkdir /tmp/ga-icd
    cd /tmp/ga-icd
    git init origin
    cd origin
    # nothing is staged yet, so the initial commit has to be an empty one
    git commit --allow-empty -m create
    git annex init origin
    echo a > a
    echo b > b
    git annex add .
    git commit -m files
    mkdir /tmp/ga-icd/importme
    # create the files to import inside importme, not inside origin
    cd /tmp/ga-icd/importme
    echo a > a
    echo b > b
    echo c > c
    cd /tmp/ga-icd
    git clone origin import
    # git annex init has to run inside the clone
    cd import
    git annex init import

So we now have origin (with content for 2 files), import (which knows origin has content for both files) and the directory we want to clean up. The following causes 'b' to be lost (hope you have backups!):

    cd /tmp/ga-icd/origin
    git annex drop b --force
    cd /tmp/ga-icd/import
    git annex untrust origin
    git annex import --clean-duplicates /tmp/ga-icd/importme
"""]]

@@ -0,0 +1,61 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawnSenxKyE_2Z6Wb-EBMO8FciyRywjx1ZiQ"
nickname="Walter"
subject="comment 12"
date="2015-04-23T21:03:36Z"
content="""
For completeness, here is the output when I get a file that *is* properly in the bucket (you could use it for any further testing you need to do).
While this may have been caused by some misconfiguration on my part (though I'm not entirely sure how that could happen; strangely, it would be easier to muck up now that enableremote doesn't create a new bucket), I feel the potential harm here (the location information being wrong) is quite serious. (I'm sure this point does not escape you.)
[[!format sh \"\"\"
>git annex get --force --debug file.jpg --from cloud
[2015-04-23 21:52:41 BST] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"show-ref\",\"git-annex\"]
[2015-04-23 21:52:41 BST] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
[2015-04-23 21:52:41 BST] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"log\",\"refs/heads/git-annex..cb0f954d09e3ea28171434e0e7499c84d1722fce\",\"-n1\",\"--pretty=%H\"]
[2015-04-23 21:52:41 BST] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"log\",\"refs/heads/git-annex..573f75e01681e9bf2b513bc85e18fc250298a4d3\",\"-n1\",\"--pretty=%H\"]
[2015-04-23 21:52:41 BST] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"cat-file\",\"--batch\"]
[2015-04-23 21:52:41 BST] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"ls-files\",\"--cached\",\"-z\",\"--\",\"file.jpg\"]
[2015-04-23 21:52:41 BST] chat: gpg [\"--batch\",\"--no-tty\",\"--use-agent\",\"--quiet\",\"--trust-model\",\"always\",\"--decrypt\"]
(checking cloud...) [2015-04-23 21:52:42 BST] String to sign: \"HEAD\n\n\nThu, 23 Apr 2015 20:52:42 GMT\n/BUCKET/GPGHMACSHA1--08b3dee71059819e3558ac9ef8b82ad87e2d8951\"
[2015-04-23 21:52:42 BST] Host: \"BUCKET.s3-ap-southeast-2.amazonaws.com\"
[2015-04-23 21:52:42 BST] Path: \"/GPGHMACSHA1--08b3dee71059819e3558ac9ef8b82ad87e2d8951\"
[2015-04-23 21:52:42 BST] Query string: \"\"
[2015-04-23 21:52:42 BST] Response status: Status {statusCode = 200, statusMessage = \"OK\"}
[2015-04-23 21:52:42 BST] Response header 'x-amz-id-2': 'f8bEclNud1KNHevvGPVHutG3V0TH/ixnMSuu3NBhEKRrWaUYtENbKyA5PyxCdSrz0REgq/Bgu1w='
[2015-04-23 21:52:42 BST] Response header 'x-amz-request-id': '7A344C3C3A27308E'
[2015-04-23 21:52:42 BST] Response header 'Date': 'Thu, 23 Apr 2015 20:52:43 GMT'
[2015-04-23 21:52:42 BST] Response header 'Last-Modified': 'Fri, 31 Oct 2014 07:03:03 GMT'
[2015-04-23 21:52:42 BST] Response header 'ETag': '\"66a85b0007a52d82e5bd29192ebdb510\"'
[2015-04-23 21:52:42 BST] Response header 'Accept-Ranges': 'bytes'
[2015-04-23 21:52:42 BST] Response header 'Content-Type': ''
[2015-04-23 21:52:42 BST] Response header 'Content-Length': '46058'
[2015-04-23 21:52:42 BST] Response header 'Server': 'AmazonS3'
[2015-04-23 21:52:42 BST] Response metadata: S3: request ID=<none>, x-amz-id-2=<none>
get file.jpg (from cloud...)
[2015-04-23 21:52:42 BST] String to sign: \"GET\n\n\nThu, 23 Apr 2015 20:52:42 GMT\n/BUCKET/GPGHMACSHA1--08b3dee71059819e3558ac9ef8b82ad87e2d8951\"
[2015-04-23 21:52:42 BST] Host: \"BUCKET.s3-ap-southeast-2.amazonaws.com\"
[2015-04-23 21:52:42 BST] Path: \"/GPGHMACSHA1--08b3dee71059819e3558ac9ef8b82ad87e2d8951\"
[2015-04-23 21:52:42 BST] Query string: \"\"
[2015-04-23 21:52:43 BST] Response status: Status {statusCode = 200, statusMessage = \"OK\"}
[2015-04-23 21:52:43 BST] Response header 'x-amz-id-2': 'LRDMgQAj+F81m3UqDebJ5CoZdyM/c2tMaFUvhjn8kjqq3x2Evy7O+wgLUiwE7lqascd0yrHR+xA='
[2015-04-23 21:52:43 BST] Response header 'x-amz-request-id': '068D946E995E7473'
[2015-04-23 21:52:43 BST] Response header 'Date': 'Thu, 23 Apr 2015 20:52:44 GMT'
[2015-04-23 21:52:43 BST] Response header 'Last-Modified': 'Fri, 31 Oct 2014 07:03:03 GMT'
[2015-04-23 21:52:43 BST] Response header 'ETag': '\"66a85b0007a52d82e5bd29192ebdb510\"'
[2015-04-23 21:52:43 BST] Response header 'Accept-Ranges': 'bytes'
[2015-04-23 21:52:43 BST] Response header 'Content-Type': ''
[2015-04-23 21:52:43 BST] Response header 'Content-Length': '46058'
[2015-04-23 21:52:43 BST] Response header 'Server': 'AmazonS3'
[2015-04-23 21:52:43 BST] Response metadata: S3: request ID=068D946E995E7473, x-amz-id-2=LRDMgQAj+F81m3UqDebJ5CoZdyM/c2tMaFUvhjn8kjqq3x2Evy7O+wgLUiwE7lqascd0yrHR+xA=
99% 22.5KB/s 0s[2015-04-23 21:52:44 BST] chat: gpg [\"--batch\",\"--no-tty\",\"--use-agent\",\"--quiet\",\"--trust-model\",\"always\",\"--batch\",\"--passphrase-fd\",\"14\",\"--decrypt\"]
ok
[2015-04-23 21:52:44 BST] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"hash-object\",\"-w\",\"--stdin-paths\",\"--no-filters\"]
[2015-04-23 21:52:44 BST] feed: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"update-index\",\"-z\",\"--index-info\"]
[2015-04-23 21:52:44 BST] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"show-ref\",\"--hash\",\"refs/heads/git-annex\"]
(recording state in git...)
[2015-04-23 21:52:44 BST] read: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"write-tree\"]
[2015-04-23 21:52:44 BST] chat: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"commit-tree\",\"444e504a0ab73d01df08ef731e691205cfd485f5\",\"--no-gpg-sign\",\"-p\",\"refs/heads/git-annex\"]
[2015-04-23 21:52:44 BST] call: git [\"--git-dir=.git\",\"--work-tree=.\",\"--literal-pathspecs\",\"update-ref\",\"refs/heads/git-annex\",\"6e57ed008525cd58641c54a5ac6f07a960a7dc5c\"]
\"\"\"]]
"""]]

@@ -1,4 +1,4 @@
-Posted a design for [[balanced_preferred_content]]. This would let
+Posted a design for [[design/balanced_preferred_content]]. This would let
 preferred content expressions assign each file to N repositories out of a
 group, selected using Math. Adding a repository could optionally be
 configured to automatically rebalance the files (not very bandwidth

@@ -0,0 +1,8 @@
[glacier-cli](https://github.com/basak/glacier-cli) calls its own command `glacier` rather than `glacier-cli` or something else. This conflicts with [boto](https://github.com/boto/boto/)'s own `glacier` executable, as noted here:
* <https://github.com/basak/glacier-cli/issues/30>
* <https://github.com/basak/glacier-cli/issues/47>
Whilst the `glacier-cli` project should resolve this conflict, it would be good if git-annex could be made to use a configurable path for this executable, rather than just assuming that it has been installed as `glacier`. After all, its installation procedure is simply telling the user to run `ln -s`, so there's no reason why the user couldn't make the target of this command `~/bin/glacier-cli` rather than `~/bin/glacier` - it's really irrelevant what the source file inside the git repo is called.
Of course, [`checkSaneGlacierCommand`](https://github.com/joeyh/git-annex/blob/master/Remote/Glacier.hs#L307) is still very much worth having, for safety.
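As a rough illustration of what a configurable path could look like from the user's side, the sketch below resolves the executable name from an override and falls back to `glacier`. `GLACIER_COMMAND` and `glacier_command` are hypothetical names made up for this example; git-annex does not read such a variable.

```shell
#!/bin/sh
# Sketch only: GLACIER_COMMAND is a hypothetical override, not something
# git-annex currently reads.
glacier_command() {
    # Use the override when it is set and non-empty, else the assumed name.
    echo "${GLACIER_COMMAND:-glacier}"
}

cmd=$(glacier_command)
if command -v "$cmd" >/dev/null 2>&1; then
    echo "would run: $cmd"
else
    echo "not installed: $cmd" >&2
fi
```

With something along these lines, a user who ran `ln -s` with `~/bin/glacier-cli` as the link name would only need to point the override at that name, and a sanity check on the resolved command would still apply.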

@@ -0,0 +1,7 @@
[[!comment format=mdwn
username="basak"
subject="comment 1"
date="2015-04-24T15:48:48Z"
content="""
Well, it's supposed to be a command line command, and I don't type `cd-cli` and `ls-cli`. So while `glacier-cli` might be fine as a project name and as a name for integration, I don't think it makes sense to call it that in `/usr/bin/`, which is why I didn't. I'd prefer to have seen boto integrate an improved `glacier` command, or for packaging to provide this one as an alternative (like `mawk` vs. `gawk` as `/usr/bin/awk`). But upstream boto considers itself deprecated, so that's not going to happen. One of these days I'll package glacier-cli up for Debian, at which point I'll see if the boto maintainer is interested in doing something, since I don't actually believe anybody uses boto's glacier command (it's mostly useless).
"""]]

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://adamspiers.wordpress.com/"
nickname="adamspiers"
subject="Good point"
date="2015-04-24T15:55:29Z"
content="""
glacier-cli would be a rather silly name to put in `/usr/bin`. How about `glcr`, as suggested [here](https://github.com/basak/glacier-cli/issues/30#issuecomment-95972840)?
"""]]