untested
This won't be super slow, but it does need to diff two likely large
trees, and since the git-annex branch rarely sits still, it will most
likely be run at the beginning of every import.
A possible speed improvement would be to only run this when the database
did not contain a ContentIdentifier. But that would only speed up
imports when there is no new version of any file on the special remote,
and at most renames of existing files are being imported.
A better speed improvement would be to record something in the git-annex
branch that indicates when an import has been run, and only do the diff
if the git-annex branch has a record of a newer import than we've seen
before. Then it would only run when there is in fact new
ContentIdentifier information available from a remote. Certainly doable,
but I didn't want to complicate things yet.
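
The gating idea above could look roughly like the following. This is only
a sketch in Python, not anything git-annex implements: it assumes the
import record is a monotonically increasing integer (for example a
timestamp), and `needs_diff` and its parameters are hypothetical names.

```python
from typing import Optional


def needs_diff(last_seen_import: Optional[int],
               recorded_import: Optional[int]) -> bool:
    """Decide whether the expensive tree diff has to run.

    last_seen_import: newest import record this repository has already
        processed, or None if it has never processed one (assumption).
    recorded_import: newest import record found in the branch, or None
        if no import has ever been recorded (assumption).
    """
    if recorded_import is None:
        # Nothing was ever imported, so there is nothing new to learn.
        return False
    if last_seen_import is None:
        # An import exists and we have never looked at one.
        return True
    # Only diff when the branch knows about a newer import.
    return recorded_import > last_seen_import


if __name__ == "__main__":
    print(needs_diff(last_seen_import=10, recorded_import=12))
    print(needs_diff(last_seen_import=12, recorded_import=12))
```

With a check like this, the diff is skipped whenever the branch changed
for reasons unrelated to imports.
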
locationLogFileKey had an out of date list of toplevel log files to skip
over, and was only not broken because the other toplevel log files don't
look like keys. Fixed that too.
Note that I tried an evil remote that lists ImportLocations with
../../../ in them and indeed this resulted in git blowing up and the
import failing, and not writing outside the repo.
An empty list of [ContentIdentifier] serialized to the same thing
as a single ContentIdentifier "". Avoid this ambiguity by requiring the
list be non-empty.
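
The ambiguity is easy to reproduce with any encoding that joins the
identifiers with a separator. A minimal sketch in Python, assuming a
hypothetical colon-separated encoding (the real serialization is not
shown here, and `serialize` and `serialize_nonempty` are made-up names):

```python
from typing import List

SEP = ":"  # hypothetical separator, chosen only for illustration


def serialize(cids: List[str]) -> str:
    """Naive encoding: join the identifiers with a separator."""
    return SEP.join(cids)


def serialize_nonempty(cids: List[str]) -> str:
    """Same encoding, but refuse the empty list to avoid the ambiguity."""
    if not cids:
        raise ValueError("list of content identifiers must be non-empty")
    return SEP.join(cids)


if __name__ == "__main__":
    # The empty list and a single empty identifier encode identically.
    print(serialize([]) == serialize([""]))
```

Once the empty list is rejected, an empty serialized value can only mean
a single empty identifier.
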
Alternative doesn't combine the subparsers the way I wanted.
Unfortunately this new parser has suboptimal usage because everything is
all jumbled together.
git-annex: thread blocked indefinitely in an STM transaction
failed
git-annex: sqlite query crashed
CallStack (from HasCallStack):
error, called at ./Database/Handle.hs:98:42 in main:Database.Handle
failed
This needs further investigation.
Use same, simpler method to make only one thread open the export db as
is used for the ContentIdentifier db.
And, always update the export db once before using.
Had to add two more API calls to override export APIs that are not safe
for use in combination with import.
It's unfortunate that removeExportDirectory is documented to be allowed
to remove non-empty directories. I'm not entirely sure why it's that
way; my best guess is it was intended to make it easy to implement with
just rm -rf.
For now, it's only allowed when exporttree=yes is also set.
That simplified the implementation, but could later be changed if
there's a remote that makes sense to be an import but not an export.
However, it may work just as well to make a remote be readonly to
prevent export to it while still allowing import.
Not sure if my reasoning about the races really holds.
It would certainly be possible to better guard against races by using
Linux-specific renameat2 with RENAME_EXCHANGE or RENAME_NOREPLACE.
Or by using link and relying on it not overwriting existing files -- but
that would need a filesystem that supports hard links, and the directory
special remote can be used on filesystems that don't.
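
The link-based alternative can be sketched as follows. This is an
illustration in Python, not what git-annex does; `place_without_overwrite`
is a hypothetical helper, and it only works on a filesystem that supports
hard links.

```python
import os
import tempfile


def place_without_overwrite(tmp_path: str, dest_path: str) -> bool:
    """Give tmp_path the name dest_path without replacing an existing file.

    Returns False, leaving dest_path untouched, when dest_path exists.
    """
    try:
        # link() refuses to replace an existing destination.
        os.link(tmp_path, dest_path)
    except FileExistsError:
        return False
    # The content now has both names; drop the temporary one.
    os.unlink(tmp_path)
    return True


def demo() -> None:
    with tempfile.TemporaryDirectory() as d:
        dest = os.path.join(d, "file")
        for content in ("first", "second"):
            tmp = os.path.join(d, "tmp")
            with open(tmp, "w") as f:
                f.write(content)
            placed = place_without_overwrite(tmp, dest)
            print(content, "placed" if placed else "refused")
            if not placed:
                os.unlink(tmp)


if __name__ == "__main__":
    demo()
```

Unlike a plain rename, the second writer here gets an error instead of
silently replacing whatever appeared at the destination in the meantime.
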
This does not avoid all possible races, but it does avoid all likely
ones, and is demonstrably better than git's own handling of races
where files get modified at the same time as it's updating the working
tree.
The main thing this won't detect is not the unlikely race where part
of a file gets changed while it's being copied and then the file is
restored to its original condition before the modification check.
No, it's more likely that the limitations of checking inode, size,
and mtime won't detect certain modifications, involving eg mmapped
files.
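
A modification check of this kind can be sketched as below. This is an
illustration in Python under stated assumptions, not git-annex's code:
`FileSignature`, `signature` and `copy_if_unmodified` are hypothetical
names, and the check compares only inode, size, and mtime, so it shares
the limitation described above.

```python
import os
import shutil
import tempfile
from typing import NamedTuple


class FileSignature(NamedTuple):
    inode: int
    size: int
    mtime_ns: int


def signature(path: str) -> FileSignature:
    """Collect the three stat fields used for the modification check."""
    st = os.stat(path)
    return FileSignature(st.st_ino, st.st_size, st.st_mtime_ns)


def copy_if_unmodified(src: str, dest: str) -> bool:
    """Copy src to dest, discarding the copy if src changed meanwhile.

    A change that leaves inode, size, and mtime all the same goes
    unnoticed.
    """
    before = signature(src)
    shutil.copyfile(src, dest)
    if signature(src) != before:
        os.unlink(dest)
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src")
        with open(src, "w") as f:
            f.write("hello")
        print(copy_if_unmodified(src, os.path.join(d, "dest")))
```
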
The branch is only updated once the export is 100% complete. This way,
if an export is started but interrupted and so the remote does not yet
contain some of the files, an import will make a commit on the old
branch, and so won't delete the missing files.