prove this optimisation would not be safe, so close
This commit is contained in:
parent
2ab2b1f9e2
commit
d2e78dfc0d
3 changed files with 99 additions and 0 deletions
|
@ -16,3 +16,5 @@ unexport files added by the other tree. It's sufficient to check that files
|
||||||
are present in the export and upload any that are missing. --[[Joey]]
|
are present in the export and upload any that are missing. --[[Joey]]
|
||||||
|
|
||||||
[[!tag confirmed]]
|
[[!tag confirmed]]
|
||||||
|
|
||||||
|
> I proved this is not a safe optimisation, so [[wontfix|done]] --[[Joey]]
|
||||||
|
|
|
@ -0,0 +1,66 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 2"""
|
||||||
|
date="2020-05-04T17:30:59Z"
|
||||||
|
content="""
|
||||||
|
I took a look at this, the relevant code is here:
|
||||||
|
|
||||||
|
warning "Resolving export conflict.."
|
||||||
|
forM_ ts $ \oldtreesha -> do
|
||||||
|
-- Unexport both the srcsha and the dstsha,
|
||||||
|
-- because the wrong content may have
|
||||||
|
-- been renamed to the dstsha due to the
|
||||||
|
-- export conflict.
|
||||||
|
let unexportboth d =
|
||||||
|
[ Git.DiffTree.srcsha d
|
||||||
|
, Git.DiffTree.dstsha d
|
||||||
|
]
|
||||||
|
-- Don't rename to temp, because the
|
||||||
|
-- content is unknown; delete instead.
|
||||||
|
mapdiff
|
||||||
|
(\diff -> commandAction $ startUnexport r db (Git.DiffTree.file diff) (unexportboth diff))
|
||||||
|
oldtreesha new
|
||||||
|
|
||||||
|
So, it diffs from the tree in the export conflict to the new, resolved tree.
|
||||||
|
Any file that differs -- eg, any file that was involved in the conflict --
|
||||||
|
gets removed from the export.
|
||||||
|
|
||||||
|
> In the todo, I said:
|
||||||
|
>
|
||||||
|
> For example, if A exports a tree containing `[foo]`, and B exports a tree
|
||||||
|
> containing `[foo, bar]`, bar gets unexported when resolving the conflict.
|
||||||
|
|
||||||
|
Let's be more clear about the content of the trees, and say A exports
|
||||||
|
`[(foo, 1)]` and B exports `[(foo, 1), (bar, 1)]`.
|
||||||
|
|
||||||
|
If the export is then resolved to `[(foo, 1), (bar, 1)]`,
|
||||||
|
we can see nothing needs to be done. But what it currently does is
|
||||||
|
diff from `[(foo, 1)]` to the resolution and so unexports bar.
|
||||||
|
|
||||||
|
If B had instead exported `[(foo, 1), (bar, 2)]`, then
|
||||||
|
it would still need to diff from that the the resolution, and so would
|
||||||
|
unexport bar, and so it should.
|
||||||
|
|
||||||
|
But.. but.. ugh. Consider an export conflict that started with `[(bar, 1)]`
|
||||||
|
exported. A exported `[(bar, 2)]` and B exported `[(baz, 1)]` (by renaming bar
|
||||||
|
to baz). So the export remote might contain `[(baz, 2)]` (A uploaded 2 to bar,
|
||||||
|
and then B renamed bar to baz) or it might contain `[(bar, 2), (baz, 1)]`;
|
||||||
|
we do not know ordering between A and B.
|
||||||
|
|
||||||
|
If the export conflict resolution is `[(bar, 2), (baz 1)]` then the tree
|
||||||
|
exported by B is a subset, so it skips that one. And the tree
|
||||||
|
exported by A is a subset, so ummm... it skips that one. And so nothing gets
|
||||||
|
unexported. Then, it proceeds to try to upload any missing files
|
||||||
|
to the export. If the export remote contains `[(bar, 2), (baz, 1)]` nothing is
|
||||||
|
missing, nothing gets uploaded, all is well. But if the export remote
|
||||||
|
contains `[(baz, 2)]`, it will upload `(bar, 2)`, resulting in
|
||||||
|
`[(bar, 2), (baz 2)]`. That is not what it's supposed to contain.
|
||||||
|
|
||||||
|
So, no, this optimisation will not work!
|
||||||
|
|
||||||
|
The only way to make this optimisation work, I think, is to not do renaming
|
||||||
|
when updating export remotes. But file renaming is more common than export
|
||||||
|
conflicts; you can always adjust your workflow to avoid export conflicts,
|
||||||
|
by pulling from the remote you tend to conflict with, before performing an
|
||||||
|
export.
|
||||||
|
"""]]
|
|
@ -0,0 +1,31 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 3"""
|
||||||
|
date="2020-05-04T18:36:30Z"
|
||||||
|
content="""
|
||||||
|
Hmm, but.. What if both A and B's trees are subsets
|
||||||
|
of the resolved tree? Safe then?
|
||||||
|
|
||||||
|
A exports `[(foo, 1)]`, while B exports `[(bar, 2)]`
|
||||||
|
the resolved tree is `[(foo, 1), (bar, 2)]`.
|
||||||
|
|
||||||
|
Well, what was in the export before? Suppose it was `[(foo, 2)]`..
|
||||||
|
Then B would have renamed foo to bar, and A exported 1 to foo.
|
||||||
|
Order is unknown, so the export has either of `[(foo, 1), (bar, 2)]`
|
||||||
|
or `[(bar, 1)]`
|
||||||
|
|
||||||
|
Yeah, still not safe even when both trees are subsets.
|
||||||
|
|
||||||
|
----
|
||||||
|
|
||||||
|
An optimisation like this needs some way to detect if there's been a rename
|
||||||
|
like B keeps doing in these examples. If there has not been any rename,
|
||||||
|
the optimisation is safe.
|
||||||
|
|
||||||
|
export.log contains only the sha of the tree that has been exported
|
||||||
|
by each repo to the export remote. It might contains some trees that
|
||||||
|
were exported before, but when it gets compacted, that information is
|
||||||
|
lost, and anyway there's no way to know if B exported some old tree before
|
||||||
|
or after A exported its most recently exported tree. So, I don't think
|
||||||
|
retrospective rename detection is possible.
|
||||||
|
"""]]
|
Loading…
Reference in a new issue