export file renaming

This is seriously super hairy. It has to handle interrupted exports,
which may be resumed with the same or a different tree. It also has to
recover from export conflicts, which could cause the wrong content
to be renamed to a file.

I think this works, or is close to working. See the update to the design
for how it works.

This is definitely not optimal, in that it does more renames than are
necessary. It would probably be worth finding the keys that are really
renamed and only renaming those. But let's get the "simple" approach to
work first.

This commit was supported by the NSF-funded DataLad project.
Joey Hess 2017-09-06 15:33:40 -04:00
parent 0fa948b402
commit cae3704a44
GPG key ID: DB12DB0FF05F8F38
4 changed files with 189 additions and 41 deletions

@@ -205,7 +205,7 @@ a tree that resolves the conflict as they desire (it could be the same as
one of the exported trees, or some merge of them or an entirely new tree).
The UI to do this can just be another `git annex export $tree --to remote`.
To resolve, diff each exported tree in turn against the resolving tree
-and delete all files that differ.
+and delete all files that differ. Then, upload all missing files.
## when to update export.log for efficient resuming of exports
@@ -256,18 +256,48 @@ tree, so no state needs to be maintained to clean it up. Also, using the
key in the name simplifies calculation of complicated renames (eg, renaming
A to B, B to C, C to A)
-Export can first try to rename the temp name of all keys
-whose files are added in the diff. Followed by deleting the temp name
-of all keys whose files are removed in the diff. That is more renames and
+Export can first try to rename all files that are deleted/modified
+to their key's temp name (falling back to deleting since not all
+special remotes support rename), and then, in a second pass, rename
+from the temp name to the new name. Followed by deleting the temp name
+of all keys whose files are deleted in the diff. That is more renames and
deletes than strictly necessary, but it will statelessly clean up
an interrupted export as long as it's run again with the same new tree.
But, an export of tree B should clean up after
an interrupted export of tree A. Some state is needed to handle this.
Before starting the export of tree A, record it somewhere. Then when
-resuming, diff A..B, and rename/delete the temp names of the keys in the
-diff. As well as diffing from the last fully exported tree to B and doing
-the same rename/delete.
+resuming, diff A..B, and delete the temp names of the keys in the
+diff. (Can't rename here, because we don't know what was the content
+of a file when an export was interrupted.)
So, before an export does anything, need to record the tree that's about
to be exported to export.log, not as an exported tree, but as a goal.
+## renames and export conflicts
+What if there's an export conflict going on at the same time that a file
+in the export gets renamed?
+Suppose that there are two git repos A and B, each exporting to the same
+remote. A and B are not currently communicating. A exports T1 which
+contains F. B exports T2, which has a different content for F.
+Then A exports T3, which renames F to G. If that rename is done
+on the remote, then A will think it's successfully exported T3,
+but G will have F's content from T2, not from T1.
+When A and B reconnect, the export conflict will be detected.
+To resolve the export conflict, it says above to:
+> To resolve, diff each exported tree in turn against the resolving tree
+> and delete all files that differ. Then, upload all missing files.
+Assume that the resolving tree is T3. So B's export of T2 is diffed against
+T3. F differs and is deleted (no change). G differs and is deleted,
+which fixes up the problem that the wrong content was renamed to G.
+G is missing so gets uploaded.
+So, this works, as long as "delete all files that differ" means it
+deletes both old and new files. And as long as conflict resolution does not
+itself stash away files in the temp name for later renaming.
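
For illustration only, the two-pass rename and the conflict scenario described in the diff above can be modeled with a small Python sketch. This is not git-annex's implementation. It assumes a toy model in which the remote is a dict from file name to the key actually stored there, and a tree is a dict from file name to the key it should contain. The names `temp_name`, `changed`, `export`, `resolve_conflict` and the `.tmp-content-` prefix are all hypothetical. Interrupted exports and the goal record in export.log are not modeled.

```python
def temp_name(key):
    # Hypothetical naming; the design only says the key is part of the name.
    return ".tmp-content-" + key


def changed(src, dst):
    """Files of src whose key differs from, or is absent in, dst."""
    return {f: k for f, k in src.items() if dst.get(f) != k}


def export(remote, old_tree, new_tree, supports_rename=True):
    """Move the remote from old_tree to new_tree; returns the operations."""
    ops = []
    # Pass 1: move deleted/modified files out of the way, to the temp name
    # of the key the exporter *believes* they hold. Fall back to deleting
    # when the remote cannot rename.
    for f, k in changed(old_tree, new_tree).items():
        if f not in remote:
            continue
        if supports_rename:
            remote[temp_name(k)] = remote.pop(f)
            ops.append(("rename", f, temp_name(k)))
        else:
            del remote[f]
            ops.append(("delete", f))
    # Pass 2: rename temp names to their new names, uploading otherwise.
    for f, k in changed(new_tree, old_tree).items():
        t = temp_name(k)
        if t in remote:
            remote[f] = remote.pop(t)
            ops.append(("rename", t, f))
        else:
            remote[f] = k
            ops.append(("upload", f))
    # Pass 3: delete any temp names that were not reused.
    for k in changed(old_tree, new_tree).values():
        if remote.pop(temp_name(k), None) is not None:
            ops.append(("delete", temp_name(k)))
    return ops


def resolve_conflict(remote, exported_trees, resolving_tree):
    """Delete every file that differs in either direction, then upload."""
    for tree in exported_trees:
        differing = set(changed(tree, resolving_tree))
        differing |= set(changed(resolving_tree, tree))
        for f in differing:
            remote.pop(f, None)
    for f, k in resolving_tree.items():
        if f not in remote:
            remote[f] = k


# The scenario from the text: A exports T1, B exports T2, A exports T3.
T1, T2, T3 = {"F": "k1"}, {"F": "k2"}, {"G": "k1"}
remote = {}
export(remote, {}, T1)        # repo A exports T1
export(remote, {}, T2)        # repo B, unaware of A, overwrites F
export(remote, T1, T3)        # repo A renames F to G on the remote
after_rename = dict(remote)   # G now holds B's content, not A's
resolve_conflict(remote, [T2, T3], T3)
```

In this model, `after_rename` shows G holding the key from T2, which is the wrong-content rename the text warns about, and `resolve_conflict` repairs it only because it deletes G as well as F before uploading. Because pass 1 parks every changed file under a key-based temp name before pass 2 places anything, a cyclic rename such as A to B, B to C, C to A needs no ordering logic.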