prove this optimisation would not be safe, so close

2020-05-04 14:35:11 -04:00 · 2020-05-04 14:35:11 -04:00 · d2e78dfc0d
commit d2e78dfc0d
parent 2ab2b1f9e2
3 changed files with 99 additions and 0 deletions
--- a/doc/todo/more_efficient_resolution_of_trivial_export_conflicts.mdwn
+++ b/doc/todo/more_efficient_resolution_of_trivial_export_conflicts.mdwn
@@ -16,3 +16,5 @@ unexport files added by the other tree. It's sufficient to check that files
 are present in the export and upload any that are missing. --[[Joey]]

 [[!tag confirmed]]
+
+> I proved this is not a safe optimisation, so [[wontfix|done]] --[[Joey]]
--- a/doc/todo/more_efficient_resolution_of_trivial_export_conflicts/comment_2_70c3cfa9b95f3aa6c2e2e2f6dbbc50d0._comment
+++ b/doc/todo/more_efficient_resolution_of_trivial_export_conflicts/comment_2_70c3cfa9b95f3aa6c2e2e2f6dbbc50d0._comment
@@ -0,0 +1,66 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 2"""
+ date="2020-05-04T17:30:59Z"
+ content="""
+I took a look at this, the relevant code is here:
+
+                        warning "Resolving export conflict.."
+                        forM_ ts $ \oldtreesha -> do
+                                -- Unexport both the srcsha and the dstsha,
+                                -- because the wrong content may have
+                                -- been renamed to the dstsha due to the
+                                -- export conflict.
+                                let unexportboth d =
+                                        [ Git.DiffTree.srcsha d
+                                        , Git.DiffTree.dstsha d
+                                        ]
+                                -- Don't rename to temp, because the
+                                -- content is unknown; delete instead.
+                                mapdiff
+                                        (\diff -> commandAction $ startUnexport r db (Git.DiffTree.file diff) (unexportboth diff))
+                                        oldtreesha new
+
+So, it diffs from the tree in the export conflict to the new, resolved tree.
+Any file that differs -- ie, any file that was involved in the conflict --
+gets removed from the export.
+
+> In the todo, I said:
+> 
+> For example, if A exports a tree containing `[foo]`, and B exports a tree
+> containing `[foo, bar]`, bar gets unexported when resolving the conflict.
+
+Let's be more clear about the content of the trees, and say A exports
+`[(foo, 1)]` and B exports `[(foo, 1), (bar, 1)]`.
+
+If the export is then resolved to `[(foo, 1), (bar, 1)]`,
+we can see nothing needs to be done. But what it currently does is
+diff from `[(foo, 1)]` to the resolution and so unexports bar.
+
+If B had instead exported `[(foo, 1), (bar, 2)]`, then
+it would still need to diff from that to the resolution, and so would
+unexport bar, and so it should.
+
+But.. but.. ugh. Consider an export conflict that started with `[(bar, 1)]`
+exported. A exported `[(bar, 2)]` and B exported `[(baz, 1)]` (by renaming bar
+to baz). So the export remote might contain `[(baz, 2)]` (A uploaded 2 to bar,
+and then B renamed bar to baz) or it might contain `[(bar, 2), (baz, 1)]`;
+we do not know the order in which A and B acted.
+
+If the export conflict resolution is `[(bar, 2), (baz, 1)]` then the tree
+exported by B is a subset, so it skips that one. And the tree
+exported by A is a subset, so ummm... it skips that one. And so nothing gets
+unexported. Then, it proceeds to try to upload any missing files
+to the export. If the export remote contains `[(bar, 2), (baz, 1)]` nothing is
+missing, nothing gets uploaded, all is well. But if the export remote
+contains `[(baz, 2)]`, it will upload `(bar, 2)`, resulting in
+`[(bar, 2), (baz, 2)]`. That is not what it's supposed to contain.
+
+So, no, this optimisation will not work!
+
+The only way to make this optimisation work, I think, is to not do renaming
+when updating export remotes. But file renaming is more common than export
+conflicts; you can always adjust your workflow to avoid export conflicts,
+by pulling from the remote you tend to conflict with, before performing an
+export.
+"""]]
--- a/doc/todo/more_efficient_resolution_of_trivial_export_conflicts/comment_3_d0549f8a07032ce61e88b4ccb2c6ef3b._comment
+++ b/doc/todo/more_efficient_resolution_of_trivial_export_conflicts/comment_3_d0549f8a07032ce61e88b4ccb2c6ef3b._comment
@@ -0,0 +1,31 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 3"""
+ date="2020-05-04T18:36:30Z"
+ content="""
+Hmm, but.. What if both A and B's trees are subsets
+of the resolved tree? Safe then?
+
+A exports `[(foo, 1)]`, while B exports `[(bar, 2)]`, and
+the resolved tree is `[(foo, 1), (bar, 2)]`.
+
+Well, what was in the export before? Suppose it was `[(foo, 2)]`..
+Then B would have renamed foo to bar, and A exported 1 to foo.
+Order is unknown, so the export has either `[(foo, 1), (bar, 2)]`
+or `[(bar, 1)]`.
+
+Yeah, still not safe even when both trees are subsets.
+
+----
+
+An optimisation like this needs some way to detect if there's been a rename
+like B keeps doing in these examples. If there has not been any rename,
+the optimisation is safe.
+
+export.log contains only the sha of the tree that has been exported
+by each repo to the export remote. It might contain some trees that
+were exported before, but when it gets compacted, that information is
+lost, and anyway there's no way to know if B exported some old tree before
+or after A exported its most recently exported tree. So, I don't think
+retrospective rename detection is possible.
+"""]]
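
The two counterexamples in comments 2 and 3 can be replayed with a small model. This is only a sketch and not git-annex code: it assumes an export remote can be modelled as a map from file name to a content version, and the names `Export`, `upload`, `rename`, `uploadMissing` and `bothOrders` are hypothetical. `uploadMissing` stands in for the proposed optimisation (unexport nothing, then upload only the files whose names are absent from the export).

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical model: file name -> content version on the export remote.
type Export = M.Map String Int

-- Uploading stores a version under a name, replacing what was there.
upload :: String -> Int -> Export -> Export
upload = M.insert

-- Renaming moves whatever content currently sits at src over to dst.
rename :: String -> String -> Export -> Export
rename src dst e = case M.lookup src e of
  Nothing -> e
  Just v  -> M.insert dst v (M.delete src e)

-- Assumed form of the optimisation: keep everything already present,
-- add only the names from the resolved tree that are missing.
uploadMissing :: Export -> Export -> Export
uploadMissing resolved e = M.union e resolved

-- Run A's and B's actions in both possible orders, apply the
-- optimisation, and report whether the export matches the resolved tree.
bothOrders :: Export -> (Export -> Export) -> (Export -> Export) -> Export -> [Bool]
bothOrders start a b resolved =
  [ uploadMissing resolved (b (a start)) == resolved  -- A first, then B
  , uploadMissing resolved (a (b start)) == resolved  -- B first, then A
  ]

-- Comment 2: start [(bar, 1)]; A uploads 2 to bar; B renames bar to baz.
comment2 :: [Bool]
comment2 = bothOrders (M.fromList [("bar", 1)]) (upload "bar" 2) (rename "bar" "baz")
                      (M.fromList [("bar", 2), ("baz", 1)])

-- Comment 3: start [(foo, 2)]; A uploads 1 to foo; B renames foo to bar.
comment3 :: [Bool]
comment3 = bothOrders (M.fromList [("foo", 2)]) (upload "foo" 1) (rename "foo" "bar")
                      (M.fromList [("foo", 1), ("bar", 2)])
```

Loaded in GHCi, both `comment2` and `comment3` evaluate to `[False,True]`. In each scenario the ordering where A acts before B's rename leaves the wrong content under the renamed file, and uploading only missing names never corrects it.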