Merge branch 'master' of ssh://git-annex.branchable.com
commit 98881ffc6e
14 changed files with 179 additions and 1 deletion
doc/bugs/Issue_fewer_S3_GET_requests.mdwn
@ -0,0 +1,9 @@
It appears that git-annex issues one GET request to S3 / Google Cloud for every file it tries to copy, if you don't pass `--fast`. (I could be wrong; I'm basing this on the fact that each "checking <remote name>" step takes about the same amount of time, and that it's slow enough to be hitting the network.)

Amazon lets you GET 1000 objects in one GET request, and afaict a request that returns 1000 objects costs just as much as a request that returns 1 object. The cost of GET'ing every file in my annex is nontrivial: Google charges $0.01 per 1000 GETs, and my repo has 130k objects, so each full check costs $1.30, compared to a monthly storage cost of under $10. This means that if I want to back up my files more than, say, once a week, I need to write a script that parses the JSON output of `git annex whereis` and uploads with `--fast` only the files that aren't present in the cloud. It also means that I have to trust the output of whereis.

All those GETs also slow down the non-fast copy, and this also applies to other kinds of remotes.

There are a number of ways one could implement this. One way would be to have a command that updates the whereis data from the remote, and then to add a parameter to copy (maybe you already have one) that's like `--fast` but skips files that are already present. (Maybe this is what `--fast` already does, but I did a quick check and it doesn't seem to.) Because of the way git-annex names files, I think it would be hard to coalesce GETs during a copy command, but it could be done.

Anyway, please don't consider this a high-priority request; I can get by as-is, and I <3 git annex.
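The workaround described in this report could look something like the minimal Python sketch below. It parses `git annex whereis --json`, keeps only the files whose location log does not list the cloud remote, and then copies just those with `--fast`. This is not part of git-annex. The script, `BATCH_SIZE`, and the JSON field names it reads (`file`, `whereis`, `uuid`) are assumptions. Like the report says, it trusts the location log rather than asking the remote.

```python
#!/usr/bin/env python3
"""Hypothetical helper: copy with --fast only the files that the location
log does not already record as present in a given remote.

Assumption: `git annex whereis --json` prints one JSON object per file,
with a "file" field and a "whereis" list of entries carrying a "uuid"."""
import json
import subprocess
import sys

BATCH_SIZE = 500  # arbitrary choice; keeps each command line reasonably short


def files_missing_from(json_lines, remote_uuid):
    """Return the files whose whereis locations do not include remote_uuid."""
    missing = []
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        uuids = {loc.get("uuid") for loc in record.get("whereis", [])}
        if remote_uuid not in uuids:
            missing.append(record["file"])
    return missing


def batches(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def main(remote_name, remote_uuid):
    # whereis may exit non-zero (e.g. for a file with no known copies),
    # so parse whatever it printed instead of failing outright.
    out = subprocess.run(
        ["git", "annex", "whereis", "--json"],
        capture_output=True, text=True, check=False,
    ).stdout
    missing = files_missing_from(out.splitlines(), remote_uuid)
    for batch in batches(missing, BATCH_SIZE):
        # Per this report, --fast avoids the per-file check against the remote.
        subprocess.run(
            ["git", "annex", "copy", "--fast", "--to", remote_name, "--", *batch],
            check=True,
        )


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: copy_missing.py REMOTE_NAME REMOTE_UUID", file=sys.stderr)
```

For example, run `python3 copy_missing.py mycloud <uuid>` from the top of the repository. The UUID comes from the `uuid:` line of `git annex info mycloud`. Here `copy_missing.py` and `mycloud` are placeholder names. Filtering on the UUID rather than the remote's description avoids mismatches when descriptions change.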
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkvSZ1AFJdY_1FeutZr_KWeqtzjZta1PNE"
nickname="Thedward"
subject="comment 10"
date="2014-10-21T21:25:57Z"
content="""
The only files that succeeded were small text files. The other files (3-200 MiB) all failed.
"""]]
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 9"
date="2014-10-21T20:22:52Z"
content="""
How big is the file that it fails to copy?
"""]]
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 5"
date="2014-10-21T19:59:06Z"
content="""
Recent autobuilds will also print out some useful info when you run `git annex info glacier`, including where it's getting the AWS credentials from.
"""]]
@ -0,0 +1,48 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawkck-Tokgfh_1Fwh6pkl69xPA_dYUgA4Tg"
nickname="Benjamin"
subject="autobuild test"
date="2014-10-21T22:09:47Z"
content="""
Okay, I managed to package the autobuild for my Arch system and installed it. Here is what I get when retrieving finished Glacier retrieval jobs that were started yesterday:

Without AWS credentials in environment variables, the call fails:
[[!format sh \"\"\"
[ben@voyagerS9 annextest]$ git annex get --from=glacier mydir/myfile1
get mydir/myfile (from glacier...) (gpg)
['/usr/local/bin/glacier', '--region=us-east-1', 'archive', 'retrieve', '-o-', 'glacier-myvault', 'GPGHMACSHA1--4286b1a121892c9e64de436725478b0bc5038e67']
glacier: archive 'GPGHMACSHA1--4286b1a121892c9e64de436725478b0bc5038e67' not found
failed
git-annex: get: 1 failed
\"\"\"]]

I patched the glacier-cli Python source so that it prints its command-line arguments (argv).
The archive _does_ exist. Running the glacier-cli command manually succeeds, and so does calling
git-annex with the AWS credentials exported into the environment:

[[!format sh \"\"\"
[ben@voyagerS9 annextest]$ git annex get --from=glacier mydir/myfile2
get mydir/myfile2 (from glacier...) (gpg)
['/usr/local/bin/glacier', '--region=us-east-1', 'archive', 'retrieve', '-o-', 'glacier-myvault', 'GPGHMACSHA1--c3827c03d48b4829c7cc584778652c66e2784b0f']
ok
(Recording state in git...)
\"\"\"]]

So I guess one bug is fixed, although I think the error message is wrong.

Regarding AWS credentials, I have had no success updating the credentials or finding out which, if any, are embedded:
[[!format sh \"\"\"
[ben@voyagerS9 annextest]$ git annex info glacier
remote: glacier
description: [glacier]
uuid: b4dcf525-40c7-4f04-86cc-3850d1260680
cost: 1050.0
type: glacier
glacier vault: glacier-myvault
encryption: encrypted (to gpg keys: MYKEY)
chunking: none
\"\"\"]]

When I check out the git-annex branch and look at remote.log, I see fields for cipher, cipherkeys, datacenter, embedcreds=yes, name, s3creds, type, vault, and timestamp.
The s3creds field does not look like my current AWS credentials, at least not in plaintext.
"""]]
@ -0,0 +1,14 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 2"
date="2014-10-21T20:20:25Z"
content="""
The problem is that there's no way for preferred content expressions to specify that a file is wanted just because some old version of the file is (or was) present.

It's not clear to me how that could be added to the preferred content expressions in an efficient way.

It might be possible to hack `git annex sync --content` and the assistant to look at incoming merges, and queue downloads of newer versions of files before merging.

Also being discussed at <https://github.com/datalad/datalad/issues/6>.
"""]]
@ -0,0 +1,20 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 1"
date="2014-10-21T20:07:40Z"
content="""
A \"small archive\" only wants to contain files that are located inside archive/ directories.

That seems to explain everything you reported except for:

> 6. but the sizes are really small, seems that the actual files are not being transferred

Maybe the remote is configured to use chunking? What happens if you run `git annex fsck --from $remotename` after copying a file to it? Is any problem detected?

> The add remote interface stops at \"check remote\" prompt for a long time without completing

Please explain exactly what you did in the webapp. What did you click on, and what did you enter? I need enough detail to be able to reproduce the problem.

(Also, in the future, one problem per bug report turns out to be a lot less confusing, and gets better results all around. That's true here and really anywhere.)
"""]]
@ -0,0 +1,12 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 2"
date="2014-10-21T20:02:40Z"
content="""
Can you please provide more information, like showing the commits made to the git-annex branch when the configuration was reverted?

Also, might some of the clocks of computers where you're using git-annex be set wrong?

I have tagged this report moreinfo because I don't have enough information to do anything else.
"""]]
@ -0,0 +1,9 @@
[[!comment format=mdwn
username="musella"
ip="84.73.42.152"
subject="comment 2"
date="2014-10-21T23:35:35Z"
content="""
What is the minimum kernel version that I would need?

"""]]
@ -0,0 +1,8 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 3"
date="2014-10-22T16:22:24Z"
content="""
I know kernel 3.2 would work. I don't know what the minimum kernel supported by glibc 2.13 is.
"""]]
@ -0,0 +1,10 @@
[[!comment format=mdwn
username="https://www.google.com/accounts/o8/id?id=AItOawlog_5wIICaMcrKTexlFNA6IO6UTp323aE"
nickname="Torkaly"
subject="comment 2"
date="2014-10-22T08:56:43Z"
content="""
Thank you for your response.

So it looks like git-annex is not really designed to work with an existing git repository, only standalone?!
"""]]
@ -0,0 +1,12 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 3"
date="2014-10-22T16:18:16Z"
content="""
I struggle to see how you could draw that conclusion from what I said.

git-annex will work fine in an existing git repository. You can mix regular git commands like `git add`, `git push`, `git pull`, `git merge` with git-annex commands like `git annex add`, `git annex copy --to origin`, `git annex get`, `git annex merge`, in the same repository.

The `git annex sync` command effectively runs `git commit; git pull; git annex merge; git push; git annex copy --to origin; git annex get`. If you don't want to run all those commands at once, you don't want to run `git annex sync`. That will not prevent you from using git-annex in any way.
"""]]
@ -0,0 +1,12 @@
[[!comment format=mdwn
username="http://joeyh.name/"
ip="209.250.56.96"
subject="comment 4"
date="2014-10-22T16:18:50Z"
content="""
I struggle to see how you could draw that conclusion from what I said.

git-annex will work fine in an existing git repository. You can mix regular git commands like `git add`, `git push`, `git pull`, `git merge` with git-annex commands like `git annex add`, `git annex copy --to origin`, `git annex get`, `git annex merge`, in the same repository.

The `git annex sync` command effectively runs `git commit; git pull; git annex merge; git push; git annex copy --to origin; git annex get`. If you don't want to run all those commands at once, you don't want to run `git annex sync`. That will not prevent you from using git-annex in any way.
"""]]
@ -31,7 +31,7 @@ XFCE uses the Thunar file manager, which can also be easily configured to allow
for drop, and for get:

    git-annex drop --notify-start --notify-finish -- %F
    git-annex get --notify-start --notify-finish -- %F

This gives me the resulting config on disk, in `.config/Thunar/uca.xml`: