matchNeedsFileContent is not used yet, but shows how to add information
about terminals. That one would be needed for
https://git-annex.branchable.com/todo/sync_fast_import/
Note the tricky bit in Annex.FileMatcher.call where it folds over the
included matcher to propagate the information.
This commit was sponsored by Svenne Krap on Patreon.
add, addurl, importfeed, import: Added --no-check-gitignore option
for finer grained control than using --force.
(--force is used for too many different things, and at least one
of these also uses it for something else. I would like to reduce
--force's footprint until it only forces drops or a few other data
losses. For now, --force still disables checking ignores too.)
addunused: Don't check .gitignores when adding files. This is a behavior
change, but I justify it by analogy with git add of a gitignored file
adding it, asking to add all unused files back should add them all back,
not skip some. The old behavior was surprising.
In Command.Lock and Command.ReKey, CheckGitIgnore False does not change
behavior, it only makes explicit what is done. Since these commands are run
on annexed files, the file is already checked into git, so git add won't
check ignores.
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.
Over the course of 2 grueling days.
withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.
Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
Works better with automatic merge conflict resolution than git's ususual
default of "conflict".
This is not done when automatic merge conflict resolution is disabled.
This commit was sponsored by Mark Reidenbach on Patreon.
Make all calls to git merge go through autoMergeFrom, in preparation
for fine-tuning git merge's config for automatic merge conflict
resolution.
This commit was sponsored by Ryan Newton on Patreon.
This case was handled by cleanConflictCruft, but only when the annexed
file's object was present. When not present, it left the annexed file
with the original name, not checked into git, while adding the variant
file. So, add an explicit deletion of the deleted file in this case.
My specific case where this happened actually involves
merge.directoryRenames=conflict. After a merge involving that,
the situation was the file appears as "added by them", because that
caused the file that they added to be moved into a directory we renamed.
That case is the same as them adding a modified version of the file,
while we deleted it. (Except for the history of the file, since it's a
new file, but this doesn't look at history.)
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
This does not actually change how the merge conflict is resolved when
one side deleted the file, but it was not documented before, and I think
it only worked by accident.
This commit was sponsored by Brett Eisenberg on Patreon.
One reason is, 5 is an arbitrary number so ought to be configurable.
The real reason though, is I wanted to make the man page explain when
forward retry can override annex.retry, and having a config made the
man page easier to write.
This fixes the problem that, if forwardRetry was checked for the first 5
and decided to retry, the 6th would go to configuredRetry which would
see the counter was 6 and so wait retry-delay*2^5 seconds (default 32).
Now, it waits for retry-delay before each retry, even when forwardRetry
initiated the retry.
Also audited for other calls to openTempFile, and all are ok,
except for viaTmp which will need further work.
Remote.Directory fixed to set umask mode when writing to an export,
although it has another one using viaTmp that's not fixed.
Will make exports that are published via a http server running as
another user work, for example.
Remote.BitTorrent fixed to set umask mode when downloading the torrent
file. Normally this does not matter as that file does not hang around
after the download, but if a bittorrent download were started by one user,
got interrupted and then another user ran it, this will let them access
the torrent file created by the first user.
Fixes reversion in 8.20200617 that made annex.pidlock being enabled result
in some commands stalling, particularly those needing to autoinit.
Renamed runsGitAnnexChildProcess to make clearer where it should be
used.
Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
sanitizeFilePath was changed to sanitize leading '.', but ImportFeed was
running it on parts of the template. So eg the leading '.' in the extension
got sanitized.
Note the added case for sanitizeLeadingFilePathCharacter ('/':_)
-- this was added because, if the template is title/episode and the title
is not set, it would expand to "/episode". So this is another potential
security fix.
This was already prevented in other ways, but as seen in commit
c30fd24d91, those were a bit fragile.
And I'm not sure races were avoided in every case before. At least a
race between two separate git-annex processes, dropping the same
content, seemed possible.
This way, if locking fails, and the content is not present, it will
always do the right thing. Also, it avoids the overhead of an unncessary
inAnnex check for every file.
This commit was sponsored by Denis Dzyubenko on Patreon.
The test suite noticed this case, where two files with the same key are
dropped, and the seek stage sees both have content due to the way files
stream through it. But then locking the content to drop fails on the
second file, because the first file has already been dropped.
So, add back otherwise redundant inAnnex check.
Sped up seeking files to drop by 2x, and also some performance
improvements to checking numcopies.
Interestingly, the seek speedup is not due to precaching, but I think is
due to calling getParsed earlier.
Annex.Drop had to be changed to check inAnnex there, since it was removed
from Command.Drop. All other users of Command.Drop already checked inAnnex
themselves.
This commit was sponsored by Ryan Newton on Patreon.
This is groundwork for external backends, but also makes sense to keep
this information with the rest of a Backend's implementation.
Also, removed isVerifiable. I noticed that the same information is
encoded by whether a Backend implements verifyKeyContent or not.
planned to use for an optimisation
most things using stagedDetails were not expecting to get dup files in a
conflicted merge and deal with them, so converted them to use
inRepoDetails.
And convert parser to attoparsec, probably faster.
Before, a parse failure threw the whole --stage output line in to the
filename, which was certianly a bad idea, so fixed that.
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.
Especially when the --all optimisation in the previous commit
pre-cached the location log.
This also means that the --all optimisation could cache the metadata log
too, if it wanted too, but not currently done.
The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.
But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.
Clearly there could be some further benchmarking and tuning here.
The cache was removed way back in 2012,
commit 3417c55189
Then I forgot I had removed it! I remember clearly multiple times when I
thought, "this reads the same data twice, but the cache will avoid that
being very expensive".
The reason it was removed was it messed up the assistant noticing when
other processes made changes. That same kind of problem has recently
been addressed when adding the optimisation to avoid reading the journal
unnecessarily.
Indeed, enableInteractiveJournalAccess is run in just the
right places, so can just piggyback on it to know when it's not safe
to use the cache.
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.
git-annex sync --no-content does not yet use this to do imports
This adds a dep on hashable, but it's a free dependency, since
unordered-containers already pulled it in.
Using unordered-containers for the set seems to make sense, since it
hashes and bloom filter hashes too. (Though different hashes.)
I dunno, never quite know if I should use unordered-containers or containers.
This is a fairly hard to understand situation for the user. Listing the
remotes should help them understand it a bit better.
This commit was sponsored by Ethan Aubin.
git is making that configurable, and configuring it globally would break
the test suite in a few places.
No other part of git-annex assumes any branch name. Renamed a few
placeholders to make that clearer.
This commit was sponsored by Jake Vosloo on Patreon.