Merge branch 'master' of ssh://git-annex.branchable.com
doc/bugs/Improvements_to_S3_glacier_integration.mdwn

### Please describe the problem.

I have a git annex remote on s3 configured to push things to glacier rather than normal storage. Compared to regular s3, things in glacier are not immediately available and must be "restored" before they can be downloaded (the trade-off is that data which is untouched long term is quite a lot cheaper per GB). I'm using the DEEP_ARCHIVE storage class (configured using the `storageclass` key in the remote's config; I didn't fiddle with the s3 bucket lifecycle at all). I think the following applies to any Glacier-stored objects; the class just changes how long a restore will take.

My annexed objects are > 1GB and the s3 remote is chunked at 1GB granularity.

When I attempt to `git annex get` such an object, the error message misleadingly refers to the unchunked path, e.g. `/SHA256E-s6749536256--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso` rather than `/SHA256E-s6749536256-S1000000000-C1--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso` etc., which sent me down a blind alley for a bit. If I `git annex get -d` then I can see in the log that it tries the chunked path first and then falls back to the unchunked one. It would be useful if the non-verbose error message listed both the first attempt and the fallback. It would be even better if it could be aware enough of Glacier to point out that some of the listed objects need to be manually restored in order to be retrieved.

In my quest to manually restore, I could not for the life of me figure out (even going into the plumbing layers of git etc.) how to retrieve a list of the chunks needed. I can get the key from `git annex info` easily enough, and then use `aws s3api list-objects --bucket <...> --prefix` to look for chunks of objects with the `SHA256E-s6749536256` prefix, which works ok so long as all objects in the annex are different sizes -- the AWS CLI seems to only let you filter by prefix, not a glob. I could probably list everything and extract what I wanted with `jq`, but I _think_ there are cost implications to listing everything (although I might be wrong about that, and it wouldn't be a lot of money for my use case in any event).

Fixing those two minor issues (the error reporting and the ability to get the list of chunks) would be a massive improvement to the usability of S3/glacier remotes IMHO, especially if the output of the latter were consumable by scripts.

I will also include my manual steps to restore in the final section; it would be amazing if git annex could learn to do all this itself, though...

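For reference, the chunk object names can be derived from the key alone without listing the bucket; a minimal sketch, assuming the `SHA256E-s<size>-S<chunksize>-C<n>--<rest>` naming that the debug log shows:

```sh
# Sketch: derive the chunked object names from an annex key, assuming the
# SHA256E-s<size>-S<chunksize>-C<n>--<rest> naming seen in the debug log.
key="SHA256E-s6749536256--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso"
size=6749536256
chunksize=1000000000
nchunks=$(( (size + chunksize - 1) / chunksize ))  # ceiling division
prefix=${key%%--*}   # SHA256E-s6749536256
rest=${key#*--}      # hash plus extension
for i in $(seq 1 "$nchunks"); do
    echo "${prefix}-S${chunksize}-C${i}--${rest}"
done
```

This only needs the size embedded in the key and the remote's configured chunk size, so it sidesteps the prefix-collision problem entirely.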
### What steps will reproduce the problem?

Given an S3/glacier remote with chunked objects in it, a simple `git annex get` for one of those objects will do.

### What version of git-annex are you using? On what operating system?

8.20210223-1 on Debian, from the Debian archive.

### Please provide any additional information below.

Issue with `git annex get` error logging:

[[!format sh """

$ git annex get OBJECT.iso
get OBJECT.iso (from s3...)

HttpExceptionRequest Request {
host = "<REDACTED>"
port = 80
secure = False
requestHeaders = [("Date","Sun, 25 Apr 2021 09:42:56 GMT"),("Authorization","<REDACTED>")]
path = "/SHA256E-s6749536256--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso"
queryString = ""
method = "GET"
proxy = Nothing
rawBody = False
redirectCount = 10
responseTimeout = ResponseTimeoutDefault
requestVersion = HTTP/1.1
}
(StatusCodeException (Response {responseStatus = Status {statusCode = 404, statusMessage = "Not Found"}, responseVersion = HTTP/1.1, responseHeaders = [("x-amz-request-id","YVCPYD2RW4QEWMWN"),("x-amz-id-2","6dJY8ceWlLOSNIyTTchniLm5+cvJLovbMZL44YjNmViGwfChQSmWLl6VI6E5sFNDbMpwUeBhpbA="),("Content-Type","application/xml"),("Transfer-Encoding","chunked"),("Date","Sun, 25 Apr 2021 09:42:55 GMT"),("Server","AmazonS3")], responseBody = (), responseCookieJar = CJ {expose = []}, responseClose' = ResponseClose}) "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>SHA256E-s6749536256--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso</Key><RequestId>YVCPYD2RW4QEWMWN</RequestId><HostId>6dJY8ceWlLOSNIyTTchniLm5+cvJLovbMZL44YjNmViGwfChQSmWLl6VI6E5sFNDbMpwUeBhpbA=</HostId></Error>")

Unable to access these remotes: s3

No other repository is known to contain the file.

(Note that these git remotes have annex-ignore set: origin)
failed
git-annex: get: 1 failed

$ git annex get -d OBJECT.iso
[...]
(from s3...)
[2021-04-25 10:43:44.098104491] Path: "/SHA256E-s6749536256-S1000000000-C1--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso"
[...]
[2021-04-25 10:43:44.200461839] Response status: Status {statusCode = 403, statusMessage = "Forbidden"}
[...]
[2021-04-25 10:43:44.238156623] Response status: Status {statusCode = 404, statusMessage = "Not Found"}
[...]
*** error message as above ***

"""]]

It's notable that the status codes differ, since the chunk is present but not currently accessible, while the unchunked object just isn't there.

Manually fetching things:

[[!format sh """

: 1. Figure out the key and its prefix:

$ git annex info OBJECT.iso
file: OBJECT.iso
size: 6.75 gigabytes
key: SHA256E-s6749536256--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso
present: false

: 2. Find the number of chunks using the prefix:

$ aws s3api list-objects --bucket <BUCKET> --prefix SHA256E-s6749536256 | jq '.Contents[].Key'
"SHA256E-s6749536256-S1000000000-C1--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso"
"SHA256E-s6749536256-S1000000000-C2--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso"
"SHA256E-s6749536256-S1000000000-C3--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso"
"SHA256E-s6749536256-S1000000000-C4--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso"
"SHA256E-s6749536256-S1000000000-C5--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso"
"SHA256E-s6749536256-S1000000000-C6--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso"
"SHA256E-s6749536256-S1000000000-C7--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso"

: 3. Request a restore of those chunks:

$ for i in $(seq 1 7) ; do aws s3api restore-object --bucket <BUCKET> --key SHA256E-s6749536256-S1000000000-C${i}--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso --restore-request Days=1 ; done

: 4. Poll for completion; for DEEP_ARCHIVE, restores happen on the order of hours.

$ until
for i in $(seq 1 7) ; do aws s3api head-object --bucket <BUCKET> --key SHA256E-s6749536256-S1000000000-C${i}--f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso ; done | jq -r .Restore
git annex get OBJECT.iso
do echo "$(date): sleeping..." ; sleep 1h; done

ongoing-request="true"
ongoing-request="true"
...eventually becoming
ongoing-request="false", expiry-date="Mon, 26 Apr 2021 00:00:00 GMT"
... for all objects, at which point the "git annex get" succeeds and the loop exits

"""]]
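For scripting, the manual steps could be wrapped into a few helpers; a sketch under the same assumptions as the transcript (the `<BUCKET>` placeholder, 7 chunks, AWS CLI and jq available), with hypothetical function names, not a git-annex feature:

```sh
# Hypothetical helpers wrapping steps 2-4 above. <BUCKET> is a placeholder
# as in the transcripts; replace it with the real bucket name before use.
bucket='<BUCKET>'
base=SHA256E-s6749536256-S1000000000
rest=f76639fa11276b4045844e6110035c15e6803acc38d77847c2e4a2be1b1850ca.iso

chunk_keys() {               # print the key of every chunk
    for i in $(seq 1 7); do echo "${base}-C${i}--${rest}"; done
}

restore_all() {              # step 3: kick off the Glacier restores
    for k in $(chunk_keys); do
        aws s3api restore-object --bucket "$bucket" --key "$k" --restore-request Days=1
    done
}

all_restored() {             # step 4: succeed once every chunk is restored
    for k in $(chunk_keys); do
        aws s3api head-object --bucket "$bucket" --key "$k" --query Restore --output text \
            | grep -q 'ongoing-request="false"' || return 1
    done
}
```

Usage would be `restore_all` once, then poll `all_restored` before retrying `git annex get`.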

### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

Yes, for loads of stuff. It's awesome, thanks!

[[!comment format=mdwn
 username="Lukey"
 avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
 subject="comment 1"
 date="2021-04-25T11:53:29Z"
 content="""
I think that is expected, from [[special_remotes/glacier]]: \"To deal with this, commands like \"git-annex get\" request Glacier start the retrieval process, and will fail due to the data not yet being available. \"
"""]]

[[!comment format=mdwn
 username="felix.hagemann@b76e9ea0928cf33dacffc37ec3dbecf33171a8a5"
 nickname="felix.hagemann"
 avatar="http://cdn.libravatar.org/avatar/1f7e89860de517a494f35ebfb385288e"
 subject="Still happening"
 date="2021-04-25T16:20:04Z"
 content="""
I am still suffering from the same problem; git-annex versions are:

    laptop git-annex version: 8.20210223
    server git-annex version: 8.20200330

Also, as soon as there is a bit more to be synced, the repositories frequently need to be repaired, which on the server takes days for this 130GB repo.

Anything I can do to help debug this?
"""]]

doc/bugs/newly_created_annex_fails_fsck.mdwn

Even in a newly created annex, git annex fsck fails. Furthermore, the suggested solution apparently does not work. Maybe the warning is simply spurious. Does this occur in the latest release?

[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log

r0lf@work:/tmp/tmp$ dpkg -l git-annex
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============-============-============-===============================================================
ii git-annex 8.20200226-1 amd64 manage files with git, without checking their contents into git

r0lf@work:/tmp/tmp$ git init;git annex init
Initialized empty Git repository in /tmp/tmp/.git/
init (scanning for unlocked files...)
ok
(recording state in git...)

r0lf@work:/tmp/tmp$ echo "test" > test.txt

r0lf@work:/tmp/tmp$ git annex add test.txt
add test.txt
ok
(recording state in git...)

r0lf@work:/tmp/tmp$ git commit -m "test commit"
[master (root-commit) d2e64c0] test commit
1 file changed, 1 insertion(+)
create mode 120000 test.txt

r0lf@work:/tmp/tmp$ git annex fsck test.txt
fsck test.txt (checksum...)
test.txt: Can be upgraded to an improved key format. You can do so by running: git annex migrate --backend=SHA256E test.txt
ok
(recording state in git...)

r0lf@work:/tmp/tmp$ git annex migrate --backend=SHA256E test.txt
migrate test.txt (checksum...) (checksum...) ok
(recording state in git...)

r0lf@work:/tmp/tmp$ git annex fsck test.txt
fsck test.txt (checksum...)
test.txt: Can be upgraded to an improved key format. You can do so by running: git annex migrate --backend=SHA256E test.txt
ok
(recording state in git...)

# End of transcript or log.
"""]]

[[!comment format=mdwn
 username="https://launchpad.net/~r0lf"
 nickname="r0lf"
 avatar="http://cdn.libravatar.org/avatar/aa82122557e706df7ba83bd1983eb79ef1ba2e51350217850176d4f9a1bb2bc0"
 subject="comment 1"
 date="2021-04-26T12:22:43Z"
 content="""
I was unable to verify the problem with the latest standalone package. Neither a freshly created annex nor the one created in focal gave any problems when tested with fsck.
"""]]

[[!comment format=mdwn
 username="https://launchpad.net/~r0lf"
 nickname="r0lf"
 avatar="http://cdn.libravatar.org/avatar/aa82122557e706df7ba83bd1983eb79ef1ba2e51350217850176d4f9a1bb2bc0"
 subject="comment 2"
 date="2021-04-26T13:05:15Z"
 content="""
Standalone package from NeuroDebian is not affected.
Hirsute package is not affected.
Groovy package is not affected.
Focal package is affected.

As it looks like this was fixed between focal and groovy, it would be great to pinpoint a particular commit so it can be backported to the Ubuntu LTS.
"""]]

[[!comment format=mdwn
 username="https://launchpad.net/~r0lf"
 nickname="r0lf"
 avatar="http://cdn.libravatar.org/avatar/aa82122557e706df7ba83bd1983eb79ef1ba2e51350217850176d4f9a1bb2bc0"
 subject="try to find a better solution"
 date="2021-04-24T11:03:28Z"
 content="""
Thank you for your comment. It certainly explains your situation back when you coded this up. I'm not sure I'm convinced it is a good justification to keep things the way they are going forward.

1. The branch names aren't short.
2. The branch names aren't clash-free (see my other report).
3. I switch between branches all the time; that's their purpose.

For #3, I usually rely on either mark-copying the characters off the terminal with the mouse or on bash completion. Both are broken.

Isn't it that people aren't using () in branch or file names precisely because of the problems you will almost certainly run into? I'm certain there is a better way.

Personally, I see the act of hiding non-local files as a temporary measure. Maybe creating a branch for that isn't even the right thing. Maybe \"git stash\" or a tag is a better solution. Or maybe just remove the broken links without committing the result at all.

To expand on that idea, https://unix.stackexchange.com/a/49470 discusses how to use find, and I found \"find . -maxdepth 1 -type l -exec test ! -e {} \; -delete\" to work just fine. \"git checkout .\" then brought back those links for non-local files just fine. How about going without any branch at all? Obviously, in git-annex there are still some corner cases to consider, like making sure not to continue on master while the links to non-local files are still removed. I haven't considered all potential pitfalls, but I feel this might be a better way to handle this need.
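That find invocation can be sanity-checked in a scratch directory (the file names here are made up for the demo):

```sh
# Demo of the find invocation above in a scratch directory; file names are
# made up. Dangling symlinks get deleted, resolvable ones survive.
tmp=$(mktemp -d) && cd $tmp
touch present
ln -s present ok.lnk      # resolvable: kept
ln -s missing broken.lnk  # dangling: removed
find . -maxdepth 1 -type l -exec test ! -e {} \; -delete
```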
"""]]

[[!comment format=mdwn
 username="Ilya_Shlyakhter"
 avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
 subject="sparse checkouts to hide non-present files"
 date="2021-04-24T22:00:14Z"
 content="""
Could [sparse checkouts](https://git-scm.com/docs/git-read-tree#_sparse_checkout) be used to hide non-present files, as an alternative to adjusted branches?
"""]]

doc/bugs/sync_merges_local-to-repo_adjusted_branch.mdwn

Given that repo1 and repo2 both have a branch adjusted/master(hidemissing), those branches should be local to the repo, as they contain the files available locally. Yet when syncing the two repositories, the branches are forcefully merged, which is wrong.

    $ git annex sync
    commit (recording state in git...)
    On branch master
    nothing to commit, working tree clean
    ok
    remote: Enumerating objects: 332, done.
    remote: Counting objects: 100% (332/332), done.
    remote: Compressing objects: 100% (185/185), done.
    remote: Total 232 (delta 35), reused 0 (delta 0)
    Receiving objects: 100% (232/232), 20.46 KiB | 1.57 MiB/s, done.
    Resolving deltas: 100% (35/35), completed with 2 local objects.
    From /media/leggewie/WD250G/@1TB.work
     + 20bcd18...170148a adjusted/master(hidemissing) -> WD250G/adjusted/master(hidemissing) (forced update)
       a071f8c..4a1f6d3 git-annex -> WD250G/git-annex
    pull WD250G ok

[[!comment format=mdwn
 username="ntraut9@6c7c5e5bfddb591c28ab8222f1de9cb56840a69e"
 nickname="ntraut9"
 avatar="http://cdn.libravatar.org/avatar/c39ac3cf2806b08e1fdf8384c2817e6c"
 subject="comment 2"
 date="2021-04-26T16:01:02Z"
 content="""
So I can do `git annex config --set annex.tune.objecthashlower true` to use the new hash format. I don't see why it couldn't become the default, since it can be set even on existing repositories. It is just that already-stored files will not be directly recoverable, since the expected path will not be the same; but for newly created repositories I really don't see any problem.
"""]]

[[!comment format=mdwn
 username="ntraut9@6c7c5e5bfddb591c28ab8222f1de9cb56840a69e"
 nickname="ntraut9"
 avatar="http://cdn.libravatar.org/avatar/c39ac3cf2806b08e1fdf8384c2817e6c"
 subject="comment 3"
 date="2021-04-26T16:40:15Z"
 content="""
I tried to do `git annex config --set annex.tune.objecthashlower true` just after doing `git annex init`, but after some other commands I got this error when trying to do `git annex sync` (I'm using git-annex v6):

```
git-annex: Remote repository is tuned in incompatible way; cannot be merged with local repository.
```

So I guess there is no way to have this behavior in normal repos, but again I don't see why not, since it's even possible on crippled filesystems.
"""]]

doc/forum/Git_LFS_lock_feature.mdwn

Hey Joey, thanks for this awesome software! I am starting to use it for some personal git repos with images/PDFs/etc.

I was wondering if you have considered a feature similar to Git LFS's file locking feature (https://github.com/git-lfs/git-lfs/wiki/File-Locking).

Not sure if it's easy or even possible with the way that git-annex does things, but I can imagine it would be useful to know if someone is actively working on a binary file, so that two different people don't change the same binary file at once. I think it's important to know because it's often difficult or impossible to merge binary files, and on multi-person projects this becomes more important.

I think the interface described in the LFS wiki link above is pretty nice:

```
$ git lfs locks
images/bar.jpg jane ID:123
images/foo.jpg alice ID:456
```

Here we can see who has locked each individual file.