isKnownImportLocation does a database lookup and there's an index
to make that lookup fast, so it's probably faster than talking to git
check-ignore. Checking the matcher is faster still.
While before the gitignore check was added it did not need to always
check isknown, now it does, because it's that or the more expensive
notignored. But at least we can skip notignored when a file is known,
which will often be the common case: Importing from a remote that's been
exported to, and/or imported from before, only new files will not be
known, so only those will need to check notignored.
At first, I had this:
(matches <&&> (isknown <||> notignored)) <||> isknown
Notice that checks isknown every time, whether it matches or not.
So, it's no slower to instead do this:
isknown <||> (matches <&&> notignored)
That has the benefit that, when it's known, it doesn't need to run
matches, which while faster than isknown, is still going to use some CPU.
And it perhaps more clearly expresses the condition: Any known file is
wanted, otherwise it's down to what matches and is not ignored.
This commit was sponsored by Jack Hill on Patren.
It seemed best to do this, for consistency with every other way files can
get into a git-annex repo. Although it's just a bit strange that a local
.gitignore file affects the pseudo-commits made for the remote that's
imported from.
This commit was sponsored by Brett Eisenberg on Patreon.
Which lets progress be displayed when doing concurrent downloads.
Amoung other things, like --json-progress etc.
The youtube-dl output is no longer displayed, except for any errors.
This commit was sponsored by Denis Dzyubenko on Patreon.
Ensure that checkCanAdd is used everywhere a file is added to git,
so git add is run with -f, presumably avoiding the work it would usually
do to check ignores.
sync: When run without --content, import without copying from
importtree=yes directory special remotes. (Other special remotes may
support this later as well.)
This commit was sponsored by Svenne Krap on Patreon.
This avoids import with --no-content and with --content potentially
generating two different trees, leading to a merge conflict when run in
two different clones of a repo. And it's necessary groundwork to make
git-annex sync --no-content import from special remotes that support
importKey.
Only the directory special remote currently supports importKey, and it
generates the same key as git-annex usually does, so there is no
behavior change for it.
Future special remotes will need to take care when adding importKey,
if it generates different keys. Added some warnings about that to
comments.
This commit was sponsored by Noam Kremen on Patreon.
Import small files into git, the same as is done when importing with content.
Which means, for small files, --no-content does download them.
If the largefiles expression needs the file content available
(due to mimetype or mimeencoding being used), the import will fail.
This commit was sponsored by Jake Vosloo on Patreon.
The latter is for git-annex matchexpression and matching against it can
throw an exception. Splitting out the former reduces the potential for
mistakes and avoids needing to worry about matching against that
throwing an exception.
This is more groundwork for matching largefiles while importing,
without downloading content.
This commit was sponsored by Graham Spencer on Patreon.
Anything that needs to examine the file content will fail to match,
or fall back to other available information. But the intent is that the
matcher be checked for matchNeedsFileContent and only be used if it does
not, so the exact behavior doesn't much matter as it should never
happen.
The real point of this is to not need to provide a dummy content file
when matching.
This commit was sponsored by Martin D on Patreon.
Believed to be no longer needed as I've squashed the last ones.
Note that, in Test.Framework, I can see no reason for the code to have
run it twice. It does not cause running processes to exit after all,
so any process that has leaked and is running and causing problems with
cleanup of the directory won't be helped by running it.
This commit was sponsored by Mark Reidenbach on Patreon.
This was the last one marked as a zombie. There might be others I don't
know about, but except for in the hypothetical case of a thread dying
due to an async exception before it can wait on a process it started, I
don't know of any.
It would probably be safe to remove the reapZombies now, but let's wait
and so that in its own commit in case it turns out to cause problems.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Eliminate a zombie that was only cleaned up by the later zombie cleanup
code.
This is still not ideal, it would be cleaner if it used conduit or
something, and if the thread gets killed before waiting, it won't stop
the process.
Only remaining zombies are in CmdLine.Seek
Sped up seeking for files to operate on, when using options like --copies
or --in, by around 20%.
Benchmark showed an increase for --copies from 155 seconds to 121
seconds, and --in remote will be similar to that.
For --in here, the speedup was less, 5-10% or so.
(both warm cache)
This commit was sponsored by Jack Hill on Patreon.
matchMagic: Always False for MatchingKey. Unsure why.. Could be a bug?
limitUnused: Behaves differently when there is a filename.
limitSize: When used with LimitDiskFiles, checks the size on disk of the
filename.