Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2014-10-22 13:13:42 -04:00
commit 98881ffc6e
14 changed files with 179 additions and 1 deletions

@ -0,0 +1,9 @@
It appears that git-annex issues one GET request to S3 / Google cloud for every file it tries to copy, if you don't pass --fast. (I could be wrong; I'm basing this on the fact that each "checking <remote name>" takes about the same amount of time, and that it's slow enough to be hitting the network.)
Amazon lets you list up to 1000 objects in a single GET request, and afaict a request that returns 1000 objects costs just as much as a request that returns 1 object. The cost of GET'ing every file in my annex is nontrivial: Google charges $0.01 per 1000 GETs, and my repo has 130k objects, so that's $1.30 per full check, compared to a monthly storage cost of under $10. This means that if I want to back up my files more than, say, once a week, I need to write a script that parses the JSON output of `git annex whereis` and uploads with `--fast` only the files that aren't present in the cloud. It also means that I have to trust the output of whereis.
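
The arithmetic above can be sketched as follows. This is only an illustration: the function name `request_cost` is made up for this example, and it assumes (as guessed above, not confirmed) that a request returning 1000 objects is billed the same as a request returning one.

```python
import math


def request_cost(num_objects, price_per_1000_requests, objects_per_request=1):
    # Number of requests needed to check every object once.
    requests = math.ceil(num_objects / objects_per_request)
    # Providers quote a price per 1000 requests.
    return requests * price_per_1000_requests / 1000


# Figures from the post: 130k objects, $0.01 per 1000 GETs.
per_object = request_cost(130_000, 0.01)
# Hypothetical batched check, 1000 objects per request.
batched = request_cost(130_000, 0.01, objects_per_request=1000)

print(f'one request per object: ${per_object:.2f}')
print(f'1000 objects per request: ${batched:.4f}')
```

Under that billing assumption, batching cuts the request count from 130,000 to 130.
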
All those GETs also slow down the non-fast copy, and this also applies to other kinds of remotes.
There are a number of ways one could implement this. One way would be to have a command that updates the whereis data from the remote and then to add a parameter (maybe you already have it) to copy that's like --fast but skips files that are already present (maybe this is what --fast already does, but I did a quick check and it doesn't seem to). Because of the way git annex names files, I think it would be hard to coalesce GETs during a copy command, but it could be done.
Anyway, please don't consider this a high-priority request; I can get by as-is, and I <3 git annex.
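
For reference, the workaround script mentioned above could look roughly like this. It is a sketch under assumptions: it expects one JSON object per line from `git annex whereis --json`, each with a `file` field and a `whereis` list whose entries carry a `uuid`; check that layout against your git-annex version. The function name and the sample UUIDs are invented for the example.

```python
import json


def files_missing_from(whereis_lines, remote_uuid):
    # Collect files that whereis does not list as present on remote_uuid.
    missing = []
    for line in whereis_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed layout: record['whereis'] is a list of {'uuid': ...} entries.
        uuids = {loc.get('uuid') for loc in record.get('whereis', [])}
        if remote_uuid not in uuids:
            missing.append(record['file'])
    return missing


# Made-up sample lines standing in for real whereis output.
sample = [
    '{"file": "a.bin", "whereis": [{"uuid": "local-uuid"}]}',
    '{"file": "b.bin", "whereis": [{"uuid": "local-uuid"}, {"uuid": "cloud-uuid"}]}',
]
print(files_missing_from(sample, 'cloud-uuid'))
```

The resulting list could then be passed to `git annex copy --fast --to <remote>`. As noted above, this only works as well as the whereis data can be trusted.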

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkvSZ1AFJdY_1FeutZr_KWeqtzjZta1PNE"
nickname="Thedward"
subject="comment 10"
date="2014-10-21T21:25:57Z"
content="""
The only files that succeeded were small text files. The other files, ranging from 3 to 200 MiB, all failed.
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 9"
date="2014-10-21T20:22:52Z"
content="""
How big is the file that it fails to copy?
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 5"
date="2014-10-21T19:59:06Z"
content="""
Recent autobuilds will also print out some useful info when you run `git annex info glacier`, including where it's getting the AWS credentials from.
"""]]

@ -0,0 +1,48 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkck-Tokgfh_1Fwh6pkl69xPA_dYUgA4Tg"
nickname="Benjamin"
subject="autobuild test"
date="2014-10-21T22:09:47Z"
content="""
Okay, I managed to package the autobuild for my Arch system and installed it. Here is what I get when retrieving finished glacier retrieval jobs that were started yesterday:
Without AWS credentials as environment variables, the call fails:
[[!format sh \"\"\"
[ben@voyagerS9 annextest]$ git annex get --from=glacier mydir/myfile1
get mydir/myfile (from glacier...) (gpg)
['/usr/local/bin/glacier', '--region=us-east-1', 'archive', 'retrieve', '-o-', 'glacier-myvault', 'GPGHMACSHA1--4286b1a121892c9e64de436725478b0bc5038e67']
glacier: archive 'GPGHMACSHA1--4286b1a121892c9e64de436725478b0bc5038e67' not found
failed
git-annex: get: 1 failed
\"\"\"]]
I patched the glacier-cli Python source so that it prints out the command arguments argv.
The archive _does_ exist. Executing the glacier-cli command manually is successful. So is calling
git-annex with AWS credentials exported into env:
[[!format sh \"\"\"
[ben@voyagerS9 annextest]$ git annex get --from=glacier mydir/myfile2
get mydir/myfile2 (from glacier...) (gpg)
['/usr/local/bin/glacier', '--region=us-east-1', 'archive', 'retrieve', '-o-', 'glacier-myvault', 'GPGHMACSHA1--c3827c03d48b4829c7cc584778652c66e2784b0f']
ok
(Recording state in git...)
\"\"\"]]
So I guess one bug is fixed, although I think there is a wrong error message.
Regarding AWS credentials, I have no success in updating credentials or finding out which if any are embedded:
[[!format sh \"\"\"
[ben@voyagerS9 annextest]$ git annex info glacier
remote: glacier
description: [glacier]
uuid: b4dcf525-40c7-4f04-86cc-3850d1260680
cost: 1050.0
type: glacier
glacier vault: glacier-myvault
encryption: encrypted (to gpg keys: MYKEY)
chunking: none
\"\"\"]]
When I check out the git-annex branch and look into remote.log, I see fields for cipher, cipherkeys, datacenter, embedcreds=yes, name, s3creds, type, vault, timestamp.
The s3creds field does not look like my current AWS credentials, at least not in plaintext.
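
A quick way to rule out the environment as the cause before running `git annex get` is sketched below. It assumes the credentials are expected in the standard AWS variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; those names are not stated in this thread, and the helper name `missing_aws_vars` is invented for the example.

```python
import os

# Assumed variable names (the standard AWS ones); not confirmed in this thread.
REQUIRED_VARS = ('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')


def missing_aws_vars(environ=None):
    # Return the names of required variables that are unset or empty.
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_aws_vars()
if missing:
    print('not set:', ', '.join(missing))
else:
    print('AWS credentials found in the environment')
```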
"""]]

@ -0,0 +1,14 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 2"
date="2014-10-21T20:20:25Z"
content="""
The problem is that there's no way for preferred content expressions to specify that a file is wanted just because some old version of the file is (or was) present.
It's not clear to me how that could be added to the preferred content expressions in an efficient way.
It might be possible to hack `git annex sync --content` and the assistant to look at incoming merges, and queue downloads of newer versions of files before merging.
Also being discussed at <https://github.com/datalad/datalad/issues/6>.
"""]]

@ -0,0 +1,20 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 1"
date="2014-10-21T20:07:40Z"
content="""
A \"small archive\" only wants to contain files that are located inside archive/ directories.
That seems to explain everything you reported except for:
> 6. but the sizes are really small, seems that the actual files are not being transferred
Maybe the remote is configured to use chunking? What happens if you run `git annex fsck --from $remotename` after copying a file to it? Any problem detected?
> The add remote interface stops at \"check remote\" prompt for a long time without completing
Please explain exactly what you did in the webapp. What did you click on, and what did you enter? I need enough detail to be able to reproduce the problem.
(Also, in the future, filing one problem per bug report turns out to be a lot less confusing, and gets better results all around. True here and really anywhere.)
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 2"
date="2014-10-21T20:02:40Z"
content="""
Can you please provide more information, like showing the commits made to the git-annex branch when the configuration was reverted?
Also, might some of the clocks of computers where you're using git-annex be set wrong?
I have tagged this report moreinfo because I don't have enough information to do anything else.
"""]]

@ -0,0 +1,9 @@
[[!comment format=mdwn
username="musella"
ip="84.73.42.152"
subject="comment 2"
date="2014-10-21T23:35:35Z"
content="""
What is the minimal kernel version that I would need?
"""]]

@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 3"
date="2014-10-22T16:22:24Z"
content="""
I know kernel 3.2 would work. I don't know what the minimum kernel supported by glibc 2.13 is.
"""]]

@ -0,0 +1,10 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawlog_5wIICaMcrKTexlFNA6IO6UTp323aE"
nickname="Torkaly"
subject="comment 2"
date="2014-10-22T08:56:43Z"
content="""
Thank you for your response.
So annex looks like it's not really designed to work with an existing git repository, but only standalone?!
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 3"
date="2014-10-22T16:18:16Z"
content="""
I struggle to see how you could draw that conclusion from what I said.
git-annex will work fine in an existing git repository. You can mix regular git commands like `git add`, `git push`, `git pull`, `git merge` with git-annex commands like `git annex add`, `git annex copy --to origin`, `git annex get`, `git annex merge`, in the same repository.
The `git annex sync` command effectively runs `git commit; git pull; git annex merge; git push; git annex copy --to origin; git annex get`. If you don't want to run all those commands at once, you don't want to run `git annex sync`. That will not prevent you from using git-annex in any way.
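
To illustrate running only some of those steps, here is a rough sketch. It simply replays the command list quoted above and is not how `git annex sync` is implemented; the names `SYNC_STEPS` and `run_steps` are invented for the example.

```python
import subprocess

# The command sequence quoted above, as argument lists.
SYNC_STEPS = [
    ['git', 'commit'],
    ['git', 'pull'],
    ['git', 'annex', 'merge'],
    ['git', 'push'],
    ['git', 'annex', 'copy', '--to', 'origin'],
    ['git', 'annex', 'get'],
]


def run_steps(steps, skip=(), runner=subprocess.run):
    # Run each step in order, skipping any whose joined text is in skip.
    executed = []
    for argv in steps:
        text = ' '.join(argv)
        if text in skip:
            continue
        # With the default runner, a failing command raises CalledProcessError.
        runner(argv, check=True)
        executed.append(text)
    return executed


# Dry run: print the commands instead of executing them, leaving out the push.
run_steps(
    SYNC_STEPS,
    skip={'git push'},
    runner=lambda argv, check=True: print('would run:', ' '.join(argv)),
)
```

Dropping the `runner` argument executes the commands for real; note that a bare `git commit` exits non-zero when there is nothing to commit, which stops the sequence.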
"""]]

@ -0,0 +1,12 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 4"
date="2014-10-22T16:18:50Z"
content="""
I struggle to see how you could draw that conclusion from what I said.
git-annex will work fine in an existing git repository. You can mix regular git commands like `git add`, `git push`, `git pull`, `git merge` with git-annex commands like `git annex add`, `git annex copy --to origin`, `git annex get`, `git annex merge`, in the same repository.
The `git annex sync` command effectively runs `git commit; git pull; git annex merge; git push; git annex copy --to origin; git annex get`. If you don't want to run all those commands at once, you don't want to run `git annex sync`. That will not prevent you from using git-annex in any way.
"""]]

@ -31,7 +31,7 @@ XFCE uses the Thunar file manager, which can also be easily configured to allow
for drop, and for get:
git-annex drop --notify-start --notify-finish -- %F
git-annex get --notify-start --notify-finish -- %F
This gives me the resulting config on disk, in `.config/Thunar/uca.xml`: