prove this optimisation would not be safe, so close

2020-05-04 14:35:11 -04:00 · 2020-05-04 14:35:11 -04:00 · d2e78dfc0d
commit d2e78dfc0d
parent 2ab2b1f9e2
3 changed files with 99 additions and 0 deletions
--- a/doc/todo/more_efficient_resolution_of_trivial_export_conflicts.mdwn
+++ b/doc/todo/more_efficient_resolution_of_trivial_export_conflicts.mdwn
@ -16,3 +16,5 @@ unexport files added by the other tree. It's sufficient to check that files
 are present in the export and upload any that are missing. --[[Joey]]
 [[!tag confirmed]]
 > I proved this is not a safe optimisation, so [[wontfix|done]] --[[Joey]]
--- a/doc/todo/more_efficient_resolution_of_trivial_export_conflicts/comment_2_70c3cfa9b95f3aa6c2e2e2f6dbbc50d0._comment
+++ b/doc/todo/more_efficient_resolution_of_trivial_export_conflicts/comment_2_70c3cfa9b95f3aa6c2e2e2f6dbbc50d0._comment
@ -0,0 +1,66 @@
 [[!comment format=mdwn
 username="joey"
 subject="""comment 2"""
 date="2020-05-04T17:30:59Z"
 content="""
 I took a look at this, the relevant code is here:
                        warning "Resolving export conflict.."
                        forM_ ts $ \oldtreesha -> do
                                -- Unexport both the srcsha and the dstsha,
                                -- because the wrong content may have
                                -- been renamed to the dstsha due to the
                                -- export conflict.
                                let unexportboth d =
                                        [ Git.DiffTree.srcsha d
                                        , Git.DiffTree.dstsha d
                                        ]
                                -- Don't rename to temp, because the
                                -- content is unknown; delete instead.
                                mapdiff
                                        (\diff -> commandAction $ startUnexport r db (Git.DiffTree.file diff) (unexportboth diff))
                                        oldtreesha new
 So, it diffs from the tree in the export conflict to the new, resolved tree.
 Any file that differs -- eg, any file that was involved in the conflict -- 
 gets removed from the export.
 > In the todo, I said:
 > 
 > For example, if A exports a tree containing `[foo]`, and B exports a tree
 > containing `[foo, bar]`, bar gets unexported when resolving the conflict.
 Let's be more clear about the content of the trees, and say A exports
 `[(foo, 1)]` and B exports `[(foo, 1), (bar, 1)]`.
 If the export is then resolved to `[(foo, 1), (bar, 1)]`,
 we can see nothing needs to be done. But what it currently does is
 diff from `[(foo, 1)]` to the resolution and so unexports bar.
 If B had instead exported `[(foo, 1), (bar, 2)]`, then
 it would still need to diff from that the the resolution, and so would
 unexport bar, and so it should.
 But.. but.. ugh. Consider an export conflict that started with `[(bar, 1)]`
 exported. A exported `[(bar, 2)]` and B exported `[(baz, 1)]` (by renaming bar
 to baz). So the export remote might contain `[(baz, 2)]` (A uploaded 2 to bar,
 and then B renamed bar to baz) or it might contain `[(bar, 2), (baz, 1)]`;
 we do not know ordering between A and B.
 If the export conflict resolution is `[(bar, 2), (baz 1)]` then the tree
 exported by B is a subset, so it skips that one. And the tree
 exported by A is a subset, so ummm... it skips that one. And so nothing gets
 unexported. Then, it proceeds to try to upload any missing files
 to the export. If the export remote contains `[(bar, 2), (baz, 1)]` nothing is
 missing, nothing gets uploaded, all is well. But if the export remote
 contains `[(baz, 2)]`, it will upload `(bar, 2)`, resulting in 
 `[(bar, 2), (baz 2)]`. That is not what it's supposed to contain.
 So, no, this optimisation will not work!
 The only way to make this optimisation work, I think, is to not do renaming
 when updating export remotes. But file renaming is more common than export
 conflicts; you can always adjust your workflow to avoid export conflicts,
 by pulling from the remote you tend to conflict with, before performing an
 export.
 """]]
--- a/doc/todo/more_efficient_resolution_of_trivial_export_conflicts/comment_3_d0549f8a07032ce61e88b4ccb2c6ef3b._comment
+++ b/doc/todo/more_efficient_resolution_of_trivial_export_conflicts/comment_3_d0549f8a07032ce61e88b4ccb2c6ef3b._comment
@ -0,0 +1,31 @@
 [[!comment format=mdwn
 username="joey"
 subject="""comment 3"""
 date="2020-05-04T18:36:30Z"
 content="""
 Hmm, but.. What if both A and B's trees are subsets
 of the resolved tree? Safe then?
 A exports `[(foo, 1)]`, while B exports `[(bar, 2)]`
 the resolved tree is `[(foo, 1), (bar, 2)]`.
 Well, what was in the export before? Suppose it was `[(foo, 2)]`..
 Then B would have renamed foo to bar, and A exported 1 to foo.
 Order is unknown, so the export has either of `[(foo, 1), (bar, 2)]`
 or `[(bar, 1)]`
 Yeah, still not safe even when both trees are subsets.
 ----
 An optimisation like this needs some way to detect if there's been a rename
 like B keeps doing in these examples. If there has not been any rename,
 the optimisation is safe.
 export.log contains only the sha of the tree that has been exported
 by each repo to the export remote. It might contains some trees that
 were exported before, but when it gets compacted, that information is
 lost, and anyway there's no way to know if B exported some old tree before
 or after A exported its most recently exported tree. So, I don't think
 retrospective rename detection is possible.
 """]]