diff --git a/doc/bugs/external_remote_export_sent_to_wrong_process/comment_1_831bdc33451de9ef5c54592a7455683f._comment b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_1_831bdc33451de9ef5c54592a7455683f._comment new file mode 100644 index 0000000000..036484a63b --- /dev/null +++ b/doc/bugs/external_remote_export_sent_to_wrong_process/comment_1_831bdc33451de9ef5c54592a7455683f._comment @@ -0,0 +1,62 @@ +[[!comment format=mdwn + username="joey" + subject="""comment 1""" + date="2023-03-28T18:35:23Z" + content=""" +Looking at the code, handleRequestExport first uses +withExternalState to send EXPORT, and then it calls +handleRequestKey to send the following command. That uses +withExternalState a second time. + +withExternalState operates on a pool of processes, so in a race +(when using -J presumably, as your test case does), the two +calls to it can use different externals. And so the bug. + +Was easy to fix based on that analysis. I have not tried to reproduce it, +but am confident that I fixed it in +[[!commit 02662f52920e84cd9464641ada84f6c3bbe3f86a]] + +---- + +I'm looking forward to export support in rclone! + +I'm wondering about the data loss part of this. You said: + +> As a result, one process receives two `EXPORT`s in a row, the first of which it ignores, +> while some other process receives a `TRANSFEREXPORT` without a prior `EXPORT`, +> reusing whatever filename was set in the previous transaction, +> and either ovewriting the last accessed remote file with the wrong content, +> or retrieving the content of the last accessed remote file instead of the one git-annex wanted. + +So in your implementation, you're keeping EXPORT set after handling TRANSFEREXPORT +and similar commands, and so if you receive a TRANSFEREXPORT not prefixed by an +EXPORT, it can be bad. A natural way to write things, indeed +`doc/special_remotes/external/example.sh` does the same. + +An alternative implementation would be to clear the EXPORT after handling a +TRANSFEREXPORT, CHECKPRESENTEXPORT, REMOVEEXPORT, or RENAMEEXPORT. +And have those error out if no EXPORT was received before them. + +So that's one way we can avoid the data loss problem if your external remote +is used with an old git-annex that has this bug. You might want to do that now. + +---- + +But, that requires defensive coding in every external remote... Maybe it +would be worth changing the protocol in some way, that avoids the problem. +Then if external remotes were updated to the new protocol, they'd not work +with the buggy git-annex versions, and avoid data loss. I'm leaving this bug +open for now to consider such a protocol change.. + +Weighing the costs and benefits of such a change, the extent of the data +loss is fairly limited. exporttree=yes remotes are always untrusted, so +if a file on one is overwritten with the wrong content, git-annex will be preserving +the right content elsewhere. And if the wrong file is retrieved from one, +git-annex will notice it has the wrong content (so long as its key uses a checksum). +Still, whatever that export is used for would unexpectedly have the wrong file +content, which could still be a bad day for somebody. + +(If I decide against the protocol change, I should fix +`doc/special_remotes/external/example.sh` to defend against the bug, +and tell others who have externals that support exporttree too..) +"""]]