Commit 428c91606b made it need to do more
work in situations like switching between very different branches.
Compare with seekFilteredKeys which has a similar optimisation. Might be
possible to factor out the common part from these?
Sponsored-by: Dartmouth College's Datalad project
When this option is not used, there should be effectively no added
overhead, thanks to the optimisation in
b3cd0cc6ba.
When an action fails on a file, the size of the file still counts toward
the size limit. This was necessary to support concurrency, but also
generally seems like the right choice.
Most commands that operate on annexed files support the option.
export and import do not, and I don't know if it would make sense for
export to.. Why would you want an incomplete export? sync doesn't, and
while it would be easy to make it support it for transferring files,
it's not clear if dropping files should also take the size limit into
account. Commands like add that don't operate on annexed files don't
support the option either.
Exiting 101 not yet implemented.
Sponsored-by: Denis Dzyubenko on Patreon
Avoids users thinking this scan is a big deal, when it's not in the
majority of repos.
showSideActionAfter has some ugly caveats, since it has to display in
the background of another action. I could not see a better way to do it
and it works fine in this particular case. It also doesn't really belong
in Annex.Concurrent, but cannot go in Messages due to an import loop.
Sponsored-by: Dartmouth College's Datalad project
There seems to be no reason to check the time here. I think it was
inherited from code in Database.Fsck, which does have a reason to commit
every few minutes. Removing that syscall speeds up a git-annex init
in a repo with 100000 annexed files by about 3 seconds.
Sponsored-by: Dartmouth College's Datalad project
Streaming through git this way speeds it up by around 25%. This is
similar to the optimisations of seeking annexed files.
Sponsored-by: Dartmouth College's Datalad project
The normalisation of filenames turns out to be the tricky part here,
because the associated files coming out of the keys db may look like
"./foo/bar" or "../bar". For the former to match a glob like "foo/*",
it needs to be normalised.
Note that, on windows, normalise "./foo/bar" = "foo\\bar"
which a glob like "foo/*" won't match. So the glob is matched a second
time, on the toInternalGitPath, so allowing the user to provide a glob
with the slashes in either direction. However, this still won't support
some wacky edge cases like the user providing a glob of "foo/bar\\*"
Sponsored-by: Dartmouth College's Datalad project
Before only unlocked files were included.
The initial scan now scans for locked as well as unlocked files. This
does mean it gets a little bit slower, although I optimised it as well
as I think it can be.
reconcileStaged changed to diff from the current index to the tree of
the previous index. This lets it handle deletions as well, removing
associated files for both locked and unlocked files, which did not
always happen before.
On upgrade, there will be no recorded previous tree, so it will diff
from the empty tree to current index, and so will fully populate the
associated files, as well as removing any stale associated files
that were present due to them not being removed before.
reconcileStaged now does a bit more work. Most of the time, this will
just be due to running more often, after some change is made to the
index, and since there will be few changes since the last time, it will
not be a noticable overhead. What may turn out to be a noticable
slowdown is after changing to a branch, it has to go through the diff
from the previous index to the new one, and if there are lots of
changes, that could take a long time. Also, after adding a lot of files,
or deleting a lot of files, or moving a large subdirectory, etc.
Command.Lock used removeAssociatedFile, but now that's wrong because a
newly locked file still needs to have its associated file tracked.
Command.Rekey used removeAssociatedFile when the file was unlocked.
It could remove it also when it's locked, but it is not really
necessary, because it changes the index, and so the next time git-annex
run and accesses the keys db, reconcileStaged will run and update it.
There are probably several other places that use addAssociatedFile and
don't need to any more for similar reasons. But there's no harm in
keeping them, and it probably is a good idea to, if only to support
mixing this with older versions of git-annex.
However, mixing this and older versions does risk reconcileStaged not
running, if the older version already ran it on a given index state. So
it's not a good idea to mix versions. This problem could be dealt with
by changing the name of the gitAnnexKeysDbIndexCache, but that would
leave the old file dangling, or it would need to keep trying to remove
it.