Make sure to relay any remaining stderr from the process after it has
shut down, rather than closing stderr just before shutdown. This avoids
a situation where the process is still running and tries to write to
stderr, getting a SIGPIPE. And, it ensures that no stderr output is
lost.
This may fix a problem encountered by datalad on windows, where it hangs
during the external special remote shutdown.
Before commit a49d300545, it closed stdin
and stdout, but left stderr open, and never killed the stderr waiter
thread, which presumably exited on its own. For async exception
safety, do need to at make sure that thread gets waited on, as that
commit does, but it introduced this problem.
Note that, the process's stdout is closed before waiting on it. It's too
late for anything it writes to stdout to be processed, and since we're
not going to consume any such writes, this avoids the process getting
blocked writing to stdout due to us not reading what it's buffered. This
does mean that if the process writes to stdout too late, it will get a
SIGPIPE. (This was already the case before the above-mentioned commit.)
In practice, I think only the protocol's ERROR is allowed to be
sent at a point where this could happen.
Because it's a special character on Windows ("c:").
Use same technique already used for '/' and '\'.
I didn't record how I generated their encoded forms before, so am sure
there was a better way, but the way I did it now is to look at
ghci> encodeFilePath "∕"
"\226\136\149"
And then the difference from that to "\56546\56456\56469"
is adding 56320 to each, to get up to the escaped code plane.
See comment for why I think handling ':' is ok, but that other illegal
windows filenames won't. Note that, this should be enough to make the
test suite always work. Other windows illegal filenames will fail at
checkout time when it tries to put the illegal filename on the
filesystem.
In cases where numcopies checks prevented the resumed move from dropping
the object from the source repository, it now relies on a log of recent
moves to replicate the behavior of the interrupted command.
Performance: Probably noticable impact, since it has to add to the log,
check the log, and remove from the log. Seems worth it to avoid this
annoying edge case. The log functions are pretty well optimised to avoid
unncessary work.
An performance improvement to make later would be to avoid cleanup doing
anything if it's not written to the log file, and has confirmed that the
log file does not contain the log line.
This commit was sponsored by Jake Vosloo on Patreon.
Those are not installed by git-annex but by the user, and so removal
will never find the default content, and so if the user did install
them, it would display a misleading message.
Seems better, since the user installed them, to let the user remove them
if they want to.
isKnownImportLocation does a database lookup and there's an index
to make that lookup fast, so it's probably faster than talking to git
check-ignore. Checking the matcher is faster still.
While before the gitignore check was added it did not need to always
check isknown, now it does, because it's that or the more expensive
notignored. But at least we can skip notignored when a file is known,
which will often be the common case: Importing from a remote that's been
exported to, and/or imported from before, only new files will not be
known, so only those will need to check notignored.
At first, I had this:
(matches <&&> (isknown <||> notignored)) <||> isknown
Notice that checks isknown every time, whether it matches or not.
So, it's no slower to instead do this:
isknown <||> (matches <&&> notignored)
That has the benefit that, when it's known, it doesn't need to run
matches, which while faster than isknown, is still going to use some CPU.
And it perhaps more clearly expresses the condition: Any known file is
wanted, otherwise it's down to what matches and is not ignored.
This commit was sponsored by Jack Hill on Patren.
It seemed best to do this, for consistency with every other way files can
get into a git-annex repo. Although it's just a bit strange that a local
.gitignore file affects the pseudo-commits made for the remote that's
imported from.
This commit was sponsored by Brett Eisenberg on Patreon.
Which lets progress be displayed when doing concurrent downloads.
Amoung other things, like --json-progress etc.
The youtube-dl output is no longer displayed, except for any errors.
This commit was sponsored by Denis Dzyubenko on Patreon.
sync: When run without --content, import without copying from
importtree=yes directory special remotes. (Other special remotes may
support this later as well.)
This commit was sponsored by Svenne Krap on Patreon.
This avoids import with --no-content and with --content potentially
generating two different trees, leading to a merge conflict when run in
two different clones of a repo. And it's necessary groundwork to make
git-annex sync --no-content import from special remotes that support
importKey.
Only the directory special remote currently supports importKey, and it
generates the same key as git-annex usually does, so there is no
behavior change for it.
Future special remotes will need to take care when adding importKey,
if it generates different keys. Added some warnings about that to
comments.
This commit was sponsored by Noam Kremen on Patreon.
Import small files into git, the same as is done when importing with content.
Which means, for small files, --no-content does download them.
If the largefiles expression needs the file content available
(due to mimetype or mimeencoding being used), the import will fail.
This commit was sponsored by Jake Vosloo on Patreon.
The latter is for git-annex matchexpression and matching against it can
throw an exception. Splitting out the former reduces the potential for
mistakes and avoids needing to worry about matching against that
throwing an exception.
This is more groundwork for matching largefiles while importing,
without downloading content.
This commit was sponsored by Graham Spencer on Patreon.
Anything that needs to examine the file content will fail to match,
or fall back to other available information. But the intent is that the
matcher be checked for matchNeedsFileContent and only be used if it does
not, so the exact behavior doesn't much matter as it should never
happen.
The real point of this is to not need to provide a dummy content file
when matching.
This commit was sponsored by Martin D on Patreon.
Believed to be no longer needed as I've squashed the last ones.
Note that, in Test.Framework, I can see no reason for the code to have
run it twice. It does not cause running processes to exit after all,
so any process that has leaked and is running and causing problems with
cleanup of the directory won't be helped by running it.
This commit was sponsored by Mark Reidenbach on Patreon.
Eliminate a zombie that was only cleaned up by the later zombie cleanup
code.
This is still not ideal, it would be cleaner if it used conduit or
something, and if the thread gets killed before waiting, it won't stop
the process.
Only remaining zombies are in CmdLine.Seek
Sped up seeking for files to operate on, when using options like --copies
or --in, by around 20%.
Benchmark showed an increase for --copies from 155 seconds to 121
seconds, and --in remote will be similar to that.
For --in here, the speedup was less, 5-10% or so.
(both warm cache)
This commit was sponsored by Jack Hill on Patreon.
Sped up seeking to around twice as fast, by avoiding a pass over the
worktree files when preferred content expressions of the local repo and
remotes don't use include=/exclude=.
Thanks to Lukey for identifying the optimisation.
This commit was sponsored by Brock Spratlen on Patreon.
matchNeedsFileContent is not used yet, but shows how to add information
about terminals. That one would be needed for
https://git-annex.branchable.com/todo/sync_fast_import/
Note the tricky bit in Annex.FileMatcher.call where it folds over the
included matcher to propagate the information.
This commit was sponsored by Svenne Krap on Patreon.
add, addurl, importfeed, import: Added --no-check-gitignore option
for finer grained control than using --force.
(--force is used for too many different things, and at least one
of these also uses it for something else. I would like to reduce
--force's footprint until it only forces drops or a few other data
losses. For now, --force still disables checking ignores too.)
addunused: Don't check .gitignores when adding files. This is a behavior
change, but I justify it by analogy with git add of a gitignored file
adding it, asking to add all unused files back should add them all back,
not skip some. The old behavior was surprising.
In Command.Lock and Command.ReKey, CheckGitIgnore False does not change
behavior, it only makes explicit what is done. Since these commands are run
on annexed files, the file is already checked into git, so git add won't
check ignores.
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.
Over the course of 2 grueling days.
withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.
Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
Works better with automatic merge conflict resolution than git's ususual
default of "conflict".
This is not done when automatic merge conflict resolution is disabled.
This commit was sponsored by Mark Reidenbach on Patreon.
Make all calls to git merge go through autoMergeFrom, in preparation
for fine-tuning git merge's config for automatic merge conflict
resolution.
This commit was sponsored by Ryan Newton on Patreon.
This case was handled by cleanConflictCruft, but only when the annexed
file's object was present. When not present, it left the annexed file
with the original name, not checked into git, while adding the variant
file. So, add an explicit deletion of the deleted file in this case.
My specific case where this happened actually involves
merge.directoryRenames=conflict. After a merge involving that,
the situation was the file appears as "added by them", because that
caused the file that they added to be moved into a directory we renamed.
That case is the same as them adding a modified version of the file,
while we deleted it. (Except for the history of the file, since it's a
new file, but this doesn't look at history.)
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
This does not actually change how the merge conflict is resolved when
one side deleted the file, but it was not documented before, and I think
it only worked by accident.
This commit was sponsored by Brett Eisenberg on Patreon.
One reason is, 5 is an arbitrary number so ought to be configurable.
The real reason though, is I wanted to make the man page explain when
forward retry can override annex.retry, and having a config made the
man page easier to write.
This fixes the problem that, if forwardRetry was checked for the first 5
and decided to retry, the 6th would go to configuredRetry which would
see the counter was 6 and so wait retry-delay*2^5 seconds (default 32).
Now, it waits for retry-delay before each retry, even when forwardRetry
initiated the retry.
Also audited for other calls to openTempFile, and all are ok,
except for viaTmp which will need further work.
Remote.Directory fixed to set umask mode when writing to an export,
although it has another one using viaTmp that's not fixed.
Will make exports that are published via a http server running as
another user work, for example.
Remote.BitTorrent fixed to set umask mode when downloading the torrent
file. Normally this does not matter as that file does not hang around
after the download, but if a bittorrent download were started by one user,
got interrupted and then another user ran it, this will let them access
the torrent file created by the first user.
Fixes reversion in 8.20200617 that made annex.pidlock being enabled result
in some commands stalling, particularly those needing to autoinit.
Renamed runsGitAnnexChildProcess to make clearer where it should be
used.
Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
sanitizeFilePath was changed to sanitize leading '.', but ImportFeed was
running it on parts of the template. So eg the leading '.' in the extension
got sanitized.
Note the added case for sanitizeLeadingFilePathCharacter ('/':_)
-- this was added because, if the template is title/episode and the title
is not set, it would expand to "/episode". So this is another potential
security fix.