This optimisation was not necessary, and didn't work for v6 unlocked files.
Typically only a small number of files will be changed by a commit, so just
catKey them all.
In copyFromRemote, it used to check isDirect, but that was not needed;
the remote is sending the file, so it doesn't matter if the local,
receiving repository is in direct mode or not. And, since the content is not
present yet, it's certainly not unlocked. Note that the remote may indeed
be sending an unlocked file, but sendkey uses sendAnnex, which will detect
if the file is modified before or during transfer, and will exit nonzero,
aborting the upload. So, the receiver doesn't need any checks.
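
The sender-side check described above can be sketched roughly as follows.
This is only an illustration of the idea, not sendAnnex itself:
sendIfUnchanged, snapshot and transfer are hypothetical names, and the
snapshot stands in for whatever file metadata is compared.

```haskell
import Data.IORef

-- | Run a transfer, reporting failure if a snapshot of the content
-- (for example its size and mtime) differs before and after.
-- 'snapshot' and 'transfer' are hypothetical stand-ins, not git-annex API.
sendIfUnchanged :: Eq s => IO s -> IO Bool -> IO Bool
sendIfUnchanged snapshot transfer = do
    before <- snapshot
    ok <- transfer
    after <- snapshot
    -- A change at any point during the transfer invalidates it.
    return (ok && before == after)

-- | Simulate one clean transfer and one where the file is modified
-- while it is being sent.
demo :: IO (Bool, Bool)
demo = do
    version <- newIORef (0 :: Int)
    clean <- sendIfUnchanged (readIORef version) (return True)
    dirty <- sendIfUnchanged (readIORef version)
        (modifyIORef version (+ 1) >> return True)
    return (clean, dirty)
```

Because the sender fails the transfer itself, the receiving side needs no
extra check in this model.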
In copyToRemote, it forces recvkey to verify content whenever it's being
sent from a v6 repository. recvkey is almost always going to verify content
anyway, unless annex.verify is set to false. So, this doesn't make it any more
expensive, except for in that unusual configuration. The alternative would
be to change the recvkey interface, so that the sender checks afterwards if
what it was sending changed, and the receiver then throws out the bad
transfer. That would be less expensive for the receiver, as it would not
need to do a checksum verification. But, it would mean another network
round trip, and since rsync closes the connection, it would need to open
another ssh connection to do this. Even with connection caching, that would
add latency to uploads. It would also complicate the interface, especially
because an older git-annex-shell would not have the new interface
available. For these reasons, I prefer punting on that at this time, at the
cost that someone might set annex.verify=false and be unhappy that it still
verifies.
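
The resulting rule on the receiving side can be modelled as a small
function. This is a simplified sketch: RepoVersion and mustVerify are
hypothetical names, and the real recvkey learns about the sender through
flags rather than a version value.

```haskell
-- | Hypothetical model of the sending repository's version.
data RepoVersion = V5 | V6 deriving (Eq, Show)

-- | Whether the receiver should checksum incoming content.
-- 'annexVerify' models the annex.verify setting (assumed True by default);
-- 'senderVersion' models what the sender reports about itself.
mustVerify :: Bool -> RepoVersion -> Bool
mustVerify annexVerify senderVersion =
    -- Content from a v6 repository is always verified, even when the
    -- user has turned verification off.
    annexVerify || senderVersion == V6
```

Only the combination of annex.verify=false and a v6 sender changes
behaviour compared to before.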
(One other gotcha not dealt with is that a v5 repo could be upgraded to v6
while an upload is in progress, and a file unlocked and modified.)
(Also, I double-checked Remote.GCrypt's calls to rsyncParamsRemote, and
they're fine. When a file is being uploaded to gcrypt, or any other special
repository, it is mediated by sendAnnex, so changes will be detected at
that level and the special remote implementation doesn't need to worry
about them.)
The direct flag is also set when sending unlocked content, to support old
versions of git-annex-shell. At some point, the direct flag will be
removed, and only the unlocked flag will be used.
Writes are optimised by queueing up multiple writes when possible.
The queue is flushed after the Annex monad action finishes. That makes it
happen on program termination, and also whenever a nested Annex monad action
finishes.
Reads are optimised by checking once (per AnnexState) if the database
exists. If the database doesn't exist yet, all reads return mempty.
Reads also cause queued writes to be flushed, so reads will always be
consistent with writes (as long as they're made inside the same Annex monad).
A future optimisation path would be to determine when that's not necessary,
which is probably most of the time, and avoid flushing unnecessarily.
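
A toy model of the queueing behaviour described above, assuming an
in-memory list stands in for the database. Db, queueWrite, flushQueue and
readValues are hypothetical names for illustration, not the actual
interface.

```haskell
import Data.IORef

-- | A toy database handle: committed rows plus a queue of pending writes.
-- This is an illustrative model, not git-annex's real handle type.
data Db = Db
    { committed :: IORef [(String, String)]
    , pending :: IORef [(String, String)]
    }

newDb :: IO Db
newDb = Db <$> newIORef [] <*> newIORef []

-- | Writes are only queued, which keeps them cheap.
queueWrite :: Db -> String -> String -> IO ()
queueWrite db k v = modifyIORef (pending db) ((k, v) :)

-- | Move all queued writes into the committed store.
flushQueue :: Db -> IO ()
flushQueue db = do
    q <- readIORef (pending db)
    writeIORef (pending db) []
    -- The newest writes are at the head of q, so they come first.
    modifyIORef (committed db) (q ++)

-- | Reads flush first, so they always observe earlier writes.
-- With nothing committed, the result is the empty list (mempty).
readValues :: Db -> String -> IO [String]
readValues db k = do
    flushQueue db
    rows <- readIORef (committed db)
    return [v | (k', v) <- rows, k' == k]
```

Flushing on every read is the simple, always-correct choice; skipping the
flush when the queue cannot affect the read would be the optimisation
mentioned above.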
Design notes for this commit:
- separate reads from writes
- reuse a handle which is left open until program
  exit or until the MVar goes out of scope (and autoclosed then)
- writes are queued
- queue is flushed periodically
- immediate queue flush before any read
- auto-flush queue when database handle is garbage collected
- flush queue on exit from Annex monad
  (Note that this may happen repeatedly for a single database connection;
  or a connection may be reused for multiple Annex monad actions,
  possibly even concurrent ones.)
- if database does not exist (or is empty) the handle
  is not opened by reads; reads instead return empty results
- writes open the handle if it was not open previously
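
The handle-opening rules in these notes could look something like the
following sketch. DbState, readWith and writeWith are hypothetical names;
the exists and open actions are supplied by the caller. For simplicity,
this version checks for the database on every read while closed rather
than caching the answer, and it holds the MVar for the duration of each
action.

```haskell
import Control.Concurrent.MVar

-- | Hypothetical handle state; 'h' stands for a real database connection.
data DbState h = Closed | Open h

-- | Reads never create the database: if no handle is open and the
-- database does not exist, they return 'mempty' without opening anything.
readWith :: Monoid r
    => MVar (DbState h) -> IO Bool -> IO h -> (h -> IO r) -> IO r
readWith var exists open query = modifyMVar var $ \st -> case st of
    Open h -> do
        r <- query h
        return (Open h, r)
    Closed -> do
        present <- exists
        if present
            then do
                h <- open
                r <- query h
                return (Open h, r)
            else return (Closed, mempty)

-- | Writes open the handle if needed and leave it open for reuse.
writeWith :: MVar (DbState h) -> IO h -> (h -> IO ()) -> IO ()
writeWith var open act = modifyMVar_ var $ \st -> do
    h <- case st of
        Open h' -> return h'
        Closed -> open
    act h
    return (Open h)
```

Keeping the state in an MVar is what lets one handle be shared by several
actions, including concurrent ones.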
Fsck can use the queue for efficiency since it is write-heavy, and only
reads a value before writing it. But, the queue is not suited to the Keys
database.
In v6 unlocked mode, this fixes a problem that was making, e.g.,
echo > file cause the assistant to copy the file to the annex object,
instead of hard linking it. That was because 2 change events were seen
(one for opening the file and one for closing) and processed together,
so the file was locked down twice. That meant it had multiple hard links,
which prevented linkAnnex from hard linking it.
There might be scenarios where multiple events come in, but staggered such
that a file gets locked down repeatedly, and it would still be copied to
the annex object in that case.
This can happen when ingesting a new file in either locked or unlocked
mode, when some unlocked files in the repo use the same key, and the
content was not locally available before.
This fixes a race where the modified file ended up in annex/objects, and
the InodeCache stored in the database was for the modified version, so
git-annex didn't know it had gotten modified.
The race could occur when the smudge filter was running; now it gets the
InodeCache before generating the Key, which avoids the race.
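
Why the ordering matters can be shown with a small simulation. The names
ingest, getCache and genKey are hypothetical stand-ins, and an Int
replaces the real InodeCache; this is a sketch of the ordering only, not
the smudge filter's code.

```haskell
import Data.IORef

-- | Record the file's inode snapshot *before* generating its key. If the
-- file is modified while the key is being generated, the stored snapshot
-- no longer matches the file, so the modification is noticed later.
ingest :: IO cache -> IO key -> IO (key, cache)
ingest getCache genKey = do
    cache <- getCache
    key <- genKey
    return (key, cache)

-- | Simulate a file that is modified while its key is being generated,
-- and report whether the stored snapshot still matches afterwards.
demoRace :: IO Bool
demoRace = do
    inode <- newIORef (1 :: Int)
    -- Key generation that races with a modification of the file.
    let genKey = modifyIORef inode (+ 1) >> return "KEY"
    (_, stored) <- ingest (readIORef inode) genKey
    current <- readIORef inode
    return (stored == current)
```

In this simulation the stored snapshot does not match the file's current
state, which is what allows the modification to be detected. Taking the
snapshot after key generation would have recorded the modified version
instead.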