some analysis
parent 8706a6faf1
commit a51c5d1cde
3 changed files with 103 additions and 0 deletions
@ -0,0 +1,32 @
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2024-05-31T14:52:17Z"
content="""
Reproduction recipe works, thanks!

Happens back to 10.20240129 at least, this is not recent breakage.

There are some interesting things in the git-annex history.
Including some git-annex:export.tree grafting, and also
a continued transition.

I made a new empty repo, initialized and annexed some files. Running the
same script but cloning that, this problem does not occur. I also tried
exporting a tree in that repo, and still the problem doesn't occur. I even
tried running `git-annex forget` in there and still can't cause the
problem.

So something about this specific repo's git-annex branch history is
triggering the problem and I don't know what. I've archived the current
state of this repo in my big repo as git-annex-test-repos/ds002144.tar.gz
to make sure I can continue to reproduce this.

The first git-annex branch commit that is missing its tree object
is a git-annex:export.tree graft commit. That is 3 commits above
the git-annex branch pulled from GitHub:

Very interesting. Especially since the point of those export.tree graft commits
is to make sure that the exported tree objects are referenced and so don't get
gced out from under us.
"""]]
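The check described in the comment above (finding git-annex branch commits whose tree object is missing) can be sketched roughly as follows. This is an illustrative sketch, not git-annex's own code: the function names are hypothetical, and it assumes `git` is on `PATH`. It relies on `git log --format='%H %T'` to list commit and tree hashes and on `git cat-file -e` to test whether an object exists.

```python
import subprocess
from typing import Callable, List, Tuple


def parse_commit_trees(log_output: str) -> List[Tuple[str, str]]:
    """Parse `git log --format='%H %T'` output into (commit, tree) pairs."""
    pairs = []
    for line in log_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            pairs.append((parts[0], parts[1]))
    return pairs


def object_exists(sha: str, repo: str = ".") -> bool:
    """True if `git cat-file -e` reports that the object is present."""
    result = subprocess.run(
        ["git", "-C", repo, "cat-file", "-e", sha],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def commits_missing_tree(
    pairs: List[Tuple[str, str]],
    exists: Callable[[str], bool] = object_exists,
) -> List[str]:
    """Return the commits whose tree object is not in the object store."""
    return [commit for commit, tree in pairs if not exists(tree)]


def scan_branch(branch: str = "git-annex", repo: str = ".") -> List[str]:
    """List commits on `branch` (newest first) that lack their tree."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%H %T", branch],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return commits_missing_tree(
        parse_commit_trees(out), lambda sha: object_exists(sha, repo)
    )
```

Calling `scan_branch("git-annex", "/path/to/clone")` in an affected clone should list the commits in question. The parsing and filtering steps are kept separate from the git calls so they can be exercised without a repository.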
@ -0,0 +1,54 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2024-05-31T15:26:26Z"
content="""
Resetting the repo's git-annex branch all the way back to the 1st commit in it
is sufficient to reproduce this bug.

    joey@darkstar:~/tmp/ds002144#main>git log git-annex
    commit 2e24112747f3742c5426138def93fd3219574df7 (git-annex)
    Author: Git Worker <git@openneuro.org>
    Date:   Fri Jan 19 21:04:18 2024 +0000

        new branch for transition ["forget git history"]

Hmm. That ref contains an export.log that references some tree shas.

    1596548650.649001679s 2606f878-85c6-459a-8402-5f4b98720bbd:58a4efbe-8fb4-4cb3-8be3-b982a4673947 b78b723042e6d7a967c806b52258e8554caa1696 ae2937297eb1b4f6c9bfdfcf9d7a41b1adcea32e
    1705698180.15617956s 2606f878-85c6-459a-8402-5f4b98720bbd:8af9d961-216f-47ec-b052-31696fc2f12d ae2937297eb1b4f6c9bfdfcf9d7a41b1adcea32e 28b655e8207f916122bbcbd22c0369d86bb4ffc1

Those seem familiar:

    missing tree b78b723042e6d7a967c806b52258e8554caa1696
    missing tree ae2937297eb1b4f6c9bfdfcf9d7a41b1adcea32e

So ok.. We have here a transition that forgot git history. But it kept an
export.log that referenced 2 trees in that now-forgotten git history.

Everything else seems to follow from that. Grafting those trees back into the
git-annex branch in order to not forget them is a bad move since they're
already forgotten. So it could just avoid doing that, if the tree object
is missing, I suppose.

There might be a deeper bug though: If we want to `git-annex export`, in either
the original repo with forgotten history, or in a clone, it won't be able to
refer to those tree objects. So it won't know what has been written to the
special remote. So eg, if we export a tree that deletes a file compared to one
of these trees, it wouldn't delete the file from the special remote.
I think this problem might not happen when exporting in the original repo,
because there the export database also records the same information. More likely
it will happen in a clone.

So, action items:

* When performing a transition, the trees mentioned in export.log need to be
  grafted back in, in order not to lose them. I think it already is supposed to
  do that, but it clearly didn't work in this case. So I need to find a way to
  reproduce the situation in commit 2e24112747f3742c5426138def93fd3219574df7 in
  a new repository to find out why that didn't happen. And fix that.
* When encountering a git-annex branch with this situation in it, avoid
  grafting missing trees back into the branch. And probably `git-annex export`
  needs to refuse to touch the affected special remote, or warn the user
  that it's lost track of what files were sent to the special remote.
"""]]
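The second action item above amounts to splitting the trees referenced by export.log into those that can still be grafted and those that are already gone. A minimal sketch of that split follows. It is not how git-annex implements it: the function names are hypothetical, the field layout (timestamp, UUID pair, then tree shas) is inferred only from the two log lines quoted in the comment, and 40-character SHA-1 hashes are assumed.

```python
import re
from typing import Callable, Dict, List

# Assumption: SHA-1 object ids (40 lowercase hex characters).
SHA1_RE = re.compile(r"^[0-9a-f]{40}$")

# The two export.log lines quoted in the comment above.
SAMPLE_EXPORT_LOG = (
    "1596548650.649001679s "
    "2606f878-85c6-459a-8402-5f4b98720bbd:58a4efbe-8fb4-4cb3-8be3-b982a4673947 "
    "b78b723042e6d7a967c806b52258e8554caa1696 "
    "ae2937297eb1b4f6c9bfdfcf9d7a41b1adcea32e\n"
    "1705698180.15617956s "
    "2606f878-85c6-459a-8402-5f4b98720bbd:8af9d961-216f-47ec-b052-31696fc2f12d "
    "ae2937297eb1b4f6c9bfdfcf9d7a41b1adcea32e "
    "28b655e8207f916122bbcbd22c0369d86bb4ffc1\n"
)


def trees_in_export_log(text: str) -> List[str]:
    """Collect tree shas from export.log lines, in first-seen order.

    The first two whitespace-separated fields (timestamp and UUID pair)
    are skipped; any remaining 40-hex token is treated as a tree sha.
    """
    seen: List[str] = []
    for line in text.splitlines():
        for token in line.split()[2:]:
            if SHA1_RE.match(token) and token not in seen:
                seen.append(token)
    return seen


def partition_trees(
    trees: List[str], exists: Callable[[str], bool]
) -> Dict[str, List[str]]:
    """Split trees into those safe to graft and those already missing."""
    result: Dict[str, List[str]] = {"graft": [], "missing": []}
    for tree in trees:
        result["graft" if exists(tree) else "missing"].append(tree)
    return result
```

In a real repository `exists` could wrap `git cat-file -e`, and the log text could probably be read with `git cat-file -p git-annex:export.log`, assuming the file sits at the top level of the branch. A non-empty `missing` list would be the cue to skip grafting those trees and to warn that the special remote's contents are no longer tracked.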
@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2024-05-31T15:42:17Z"
content="""
Occurs to me that one way to get a repository into this situation would be
to do a `git-annex export`, then `git-annex forget`, and then manually
reset the git-annex branch to `git-annex^^` (or similarly push
`git-annex^^` to origin).

There is a commit after the transition commit that re-grafts the exported
tree back into the git-annex branch, and a manual reset would cause exactly
this situation.

I doubt OpenNeuro is manually resetting the git-annex branch when creating
these repos, but stranger things have happened...
"""]]
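The sequence suggested in the comment above (export, forget, then reset the branch) could be scripted along these lines. This is an untested sketch under several assumptions: `treeish` and `remote` are placeholders for a tree to export and an already-configured export special remote, the `--force` flag on `git annex forget` is assumed to be needed for the transition to run, and `git update-ref` is just one way to do the manual reset. It is not a confirmed reproduction recipe.

```python
import subprocess
from typing import List


def reproduction_commands(treeish: str, remote: str) -> List[List[str]]:
    """Build the three steps from the comment as argv lists.

    Assumptions: `remote` is an existing export special remote, and
    `--force` is what makes `git annex forget` perform the transition.
    """
    return [
        # 1. Export a tree to the special remote.
        ["git", "annex", "export", treeish, "--to", remote],
        # 2. Forget git-annex branch history (the transition).
        ["git", "annex", "forget", "--force"],
        # 3. Manually reset the git-annex branch two commits back.
        ["git", "update-ref", "refs/heads/git-annex", "git-annex^^"],
    ]


def run_all(commands: List[List[str]], repo: str) -> None:
    """Run each command inside `repo`, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=repo, check=True)
```

Something like `run_all(reproduction_commands("main", "myexport"), "/path/to/scratch/repo")` would apply it to a throwaway repository. Whether the result matches the state seen in ds002144 would still need to be confirmed, for example by checking for missing trees afterwards.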