Commit graph

1513 commits

Author SHA1 Message Date
Joey Hess
41271e4eb4
avoid git check-ignore overhead on importing known files
isKnownImportLocation does a database lookup and there's an index
to make that lookup fast, so it's probably faster than talking to git
check-ignore. Checking the matcher is faster still.

While before the gitignore check was added it did not need to always
check isknown, now it does, because it's that or the more expensive
notignored. But at least we can skip notignored when a file is known,
which will often be the common case: Importing from a remote that's been
exported to, and/or imported from before, only new files will not be
known, so only those will need to check notignored.

At first, I had this:
	(matches <&&> (isknown <||> notignored)) <||> isknown
Notice that checks isknown every time, whether it matches or not.

So, it's no slower to instead do this:
	isknown <||> (matches <&&> notignored)
That has the benefit that, when it's known, it doesn't need to run
matches, which while faster than isknown, is still going to use some CPU.

And it perhaps more clearly expresses the condition: Any known file is
wanted, otherwise it's down to what matches and is not ignored.

This commit was sponsored by Jack Hill on Patren.
2020-09-30 11:20:44 -04:00
Joey Hess
c56efbbdb6
import: Check gitignores when importing trees from special remotes
It seemed best to do this, for consistency with every other way files can
get into a git-annex repo. Although it's just a bit strange that a local
.gitignore file affects the pseudo-commits made for the remote that's
imported from.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-09-30 10:41:59 -04:00
Joey Hess
0033e08193
avoid a second traversal of the ImportableContents
Do all filtering in one pass.
2020-09-30 10:10:03 -04:00
Joey Hess
4c32499e82
Parse youtube-dl progress output
Which lets progress be displayed when doing concurrent downloads.
Amoung other things, like --json-progress etc.

The youtube-dl output is no longer displayed, except for any errors.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-09-29 17:53:48 -04:00
Joey Hess
dc274a6804
fix inverted logic in recent commit 2020-09-29 12:11:50 -04:00
Joey Hess
658ea7ca3c
sync --no-content import from directory special remote
sync: When run without --content, import without copying from
importtree=yes directory special remotes. (Other special remotes may
support this later as well.)

This commit was sponsored by Svenne Krap on Patreon.
2020-09-28 15:29:08 -04:00
Joey Hess
3eaaec3113
consistently use importKey when available
This avoids import with --no-content and with --content potentially
generating two different trees, leading to a merge conflict when run in
two different clones of a repo. And it's necessary groundwork to make
git-annex sync --no-content import from special remotes that support
importKey.

Only the directory special remote currently supports importKey, and it
generates the same key as git-annex usually does, so there is no
behavior change for it.

Future special remotes will need to take care when adding importKey,
if it generates different keys. Added some warnings about that to
comments.

This commit was sponsored by Noam Kremen on Patreon.
2020-09-28 15:27:46 -04:00
Joey Hess
15c1ee16d9
import --no-content: Check annex.largefiles
Import small files into git, the same as is done when importing with content.
Which means, for small files, --no-content does download them.

If the largefiles expression needs the file content available
(due to mimetype or mimeencoding being used), the import will fail.

This commit was sponsored by Jake Vosloo on Patreon.
2020-09-28 13:28:57 -04:00
Joey Hess
8b74f01a26
split ProvidedInfo and UserProvidedInfo
The latter is for git-annex matchexpression and matching against it can
throw an exception. Splitting out the former reduces the potential for
mistakes and avoids needing to worry about matching against that
throwing an exception.

This is more groundwork for matching largefiles while importing,
without downloading content.

This commit was sponsored by Graham Spencer on Patreon.
2020-09-28 12:12:38 -04:00
Joey Hess
00dbe35fbc
allow matching on files whose content is not present
Anything that needs to examine the file content will fail to match,
or fall back to other available information. But the intent is that the
matcher be checked for matchNeedsFileContent and only be used if it does
not, so the exact behavior doesn't much matter as it should never
happen.

The real point of this is to not need to provide a dummy content file
when matching.

This commit was sponsored by Martin D on Patreon.
2020-09-28 11:17:46 -04:00
Joey Hess
3e577a6dd3
remove reapZombies
Believed to be no longer needed as I've squashed the last ones.

Note that, in Test.Framework, I can see no reason for the code to have
run it twice. It does not cause running processes to exit after all,
so any process that has leaked and is running and causing problems with
cleanup of the directory won't be helped by running it.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-09-25 11:50:38 -04:00
Joey Hess
ca454c47f2
explicitly wait for a git process
Eliminate a zombie that was only cleaned up by the later zombie cleanup
code.

This is still not ideal, it would be cleaner if it used conduit or
something, and if the thread gets killed before waiting, it won't stop
the process.

Only remaining zombies are in CmdLine.Seek
2020-09-25 11:03:12 -04:00
Joey Hess
d81f549385
fix some compile warnings left in yesterday
at least 2 could have caused a crash in some circumstances

This commit was sponsored by Brett Eisenberg on Patreon.
2020-09-25 10:55:39 -04:00
Joey Hess
ace02f41b0
seek: defer matcher check until more info is known
Sped up seeking for files to operate on, when using options like --copies
or --in, by around 20%.

Benchmark showed an increase for --copies from 155 seconds to 121
seconds, and --in remote will be similar to that.

For --in here, the speedup was less, 5-10% or so.

(both warm cache)

This commit was sponsored by Jack Hill on Patreon.
2020-09-24 17:59:12 -04:00
Joey Hess
d89984b121
sync --all avoid unncessary first pass
Sped up seeking to around twice as fast, by avoiding a pass over the
worktree files when preferred content expressions of the local repo and
remotes don't use include=/exclude=.

Thanks to Lukey for identifying the optimisation.

This commit was sponsored by Brock Spratlen on Patreon.
2020-09-24 15:12:09 -04:00
Joey Hess
c1b4d76e6b
make MatchFiles introspectable
matchNeedsFileContent is not used yet, but shows how to add information
about terminals. That one would be needed for
https://git-annex.branchable.com/todo/sync_fast_import/

Note the tricky bit in Annex.FileMatcher.call where it folds over the
included matcher to propagate the information.

This commit was sponsored by Svenne Krap on Patreon.
2020-09-24 14:01:53 -04:00
Joey Hess
5cfcf1f05f
cache remote.log
Unlikely to speed up any of the existing uses much, but I want to use it
in a message that might be displayed many times.
2020-09-22 13:52:26 -04:00
Joey Hess
d0b06c17c0
Added --no-check-gitignore option for finer grained control than using --force.
add, addurl, importfeed, import: Added --no-check-gitignore option
for finer grained control than using --force.

(--force is used for too many different things, and at least one
of these also uses it for something else. I would like to reduce
--force's footprint until it only forces drops or a few other data
losses. For now, --force still disables checking ignores too.)

addunused: Don't check .gitignores when adding files. This is a behavior
change, but I justify it by analogy with git add of a gitignored file
adding it, asking to add all unused files back should add them all back,
not skip some. The old behavior was surprising.

In Command.Lock and Command.ReKey, CheckGitIgnore False does not change
behavior, it only makes explicit what is done. Since these commands are run
on annexed files, the file is already checked into git, so git add won't
check ignores.
2020-09-18 13:19:13 -04:00
Joey Hess
922621301a
Serialize use of C magic library, which is not thread safe.
This fixes failures uploading to S3 when using -J.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-09-17 17:27:42 -04:00
Joey Hess
77c42782d0
differentiate between concurrency enabled at command line and by git config
The latter should not affect --batch mode.
2020-09-16 11:47:12 -04:00
Joey Hess
3a05d53761
add SeekInput (not yet used)
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.

Over the course of 2 grueling days.

withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.

Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
2020-09-15 15:41:13 -04:00
Joey Hess
62372ee052
resolvemerge: Improve cleanup of cruft left in the working tree by a conflicted merge
This commit was sponsored by Jake Vosloo on Patreon.
2020-09-07 16:50:27 -04:00
Joey Hess
0e21a3221e
clean up old code
withworktree is no longer doing anything useful so remove it
2020-09-07 16:16:15 -04:00
Joey Hess
03dee56546
revert change that broke test suite
Opened a new bug about it.

This commit was sponsored by Ethan Aubin.
2020-09-07 15:42:38 -04:00
Joey Hess
d120c73302
sync, assistant: When merge.directoryRenames is not set, default it it to "false"
Works better with automatic merge conflict resolution than git's ususual
default of "conflict".

This is not done when automatic merge conflict resolution is disabled.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-09-07 13:50:58 -04:00
Joey Hess
f4c4b89aa3
refactor
Make all calls to git merge go through autoMergeFrom, in preparation
for fine-tuning git merge's config for automatic merge conflict
resolution.

This commit was sponsored by Ryan Newton on Patreon.
2020-09-07 13:26:16 -04:00
Joey Hess
69053a93a2
resolvemerge: Improve cleanup of files that were deleted by one side of a conflicted merge, and modified by the other side
This case was handled by cleanConflictCruft, but only when the annexed
file's object was present. When not present, it left the annexed file
with the original name, not checked into git, while adding the variant
file. So, add an explicit deletion of the deleted file in this case.

My specific case where this happened actually involves
merge.directoryRenames=conflict. After a merge involving that,
the situation was the file appears as "added by them", because that
caused the file that they added to be moved into a directory we renamed.

That case is the same as them adding a modified version of the file,
while we deleted it. (Except for the history of the file, since it's a
new file, but this doesn't look at history.)

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-09-07 12:25:57 -04:00
Joey Hess
a360437215
make automerge behavior when one side deleted explict
This does not actually change how the merge conflict is resolved when
one side deleted the file, but it was not documented before, and I think
it only worked by accident.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-09-07 12:01:03 -04:00
Joey Hess
e36bae74da
Exposed annex.forward-retry git config
One reason is, 5 is an arbitrary number so ought to be configurable.

The real reason though, is I wanted to make the man page explain when
forward retry can override annex.retry, and having a config made the
man page easier to write.
2020-09-04 15:16:40 -04:00
Joey Hess
2bb933eb60
import: Retry downloads that fail
Also, using the transfer machinery for this makes eg, git-annex info show
in-progress imports, and makes --notify-start/finish work.
2020-09-04 13:54:05 -04:00
Joey Hess
1a42b2c5a3
combine retry deciders in better way
This fixes the problem that, if forwardRetry was checked for the first 5
and decided to retry, the 6th would go to configuredRetry which would
see the counter was 6 and so wait retry-delay*2^5 seconds (default 32).

Now, it waits for retry-delay before each retry, even when forwardRetry
initiated the retry.
2020-09-04 12:48:30 -04:00
Joey Hess
1d244bafbd
Limit retrying of failed transfers when forward progress is being made to 5
To avoid some unusual edge cases where too much retrying could result in
far more data transfer than makes sense.
2020-09-04 12:46:37 -04:00
Joey Hess
eed20fe3b7
fix some file modes in calls to withTmpFileIn to honor umask
Also audited for other calls to openTempFile, and all are ok,
except for viaTmp which will need further work.

Remote.Directory fixed to set umask mode when writing to an export,
although it has another one using viaTmp that's not fixed.
Will make exports that are published via a http server running as
another user work, for example.

Remote.BitTorrent fixed to set umask mode when downloading the torrent
file. Normally this does not matter as that file does not hang around
after the download, but if a bittorrent download were started by one user,
got interrupted and then another user ran it, this will let them access
the torrent file created by the first user.
2020-09-02 14:36:08 -04:00
Joey Hess
00937c4813
when downloading same content from multiple urls, only display error if all fail 2020-09-02 11:35:07 -04:00
Joey Hess
571ec900ac
Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https.
With automatic layout learning!
2020-09-01 15:16:35 -04:00
Joey Hess
f95664305b
remove unused imports 2020-08-28 11:16:51 -04:00
Joey Hess
b68f214312
Display a message when git-annex has to wait for a pid lock file held by another process 2020-08-26 13:05:34 -04:00
Joey Hess
b24ba92231
refactor out Annex.PidLock 2020-08-26 12:29:13 -04:00
Joey Hess
7bdb0cdc0d
add gitAnnexChildProcess and use instead of incorrect use of runsGitAnnexChildProcess
Fixes reversion in 8.20200617 that made annex.pidlock being enabled result
in some commands stalling, particularly those needing to autoinit.

Renamed runsGitAnnexChildProcess to make clearer where it should be
used.

Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
2020-08-25 14:57:49 -04:00
Joey Hess
2b6fc17f70
fix comment format 2020-08-25 13:40:52 -04:00
Joey Hess
283d2f85d1
importfeed: Fix reversion that caused some '.' in filenames to be replaced with '_'
sanitizeFilePath was changed to sanitize leading '.', but ImportFeed was
running it on parts of the template. So eg the leading '.' in the extension
got sanitized.

Note the added case for sanitizeLeadingFilePathCharacter ('/':_)
-- this was added because, if the template is title/episode and the title
is not set, it would expand to "/episode". So this is another potential
security fix.
2020-08-05 11:35:00 -04:00
Joey Hess
f75be32166
external backends wip
It's able to start them up, the only thing not implemented is generating
and verifying keys. And, the key translation for HasExt.
2020-07-29 15:23:18 -04:00
Joey Hess
555fe669e1
refactoring in preparation for external backends 2020-07-29 12:00:27 -04:00
Joey Hess
f5e65d680b
add back inAnnex check for drop here
Needed again after last commit removed it from startLocal again.
2020-07-25 18:17:33 -04:00
Joey Hess
2a45b5ae9a
avoid failure to lock content of removed file causing drop etc to fail
This was already prevented in other ways, but as seen in commit
c30fd24d91, those were a bit fragile.
And I'm not sure races were avoided in every case before. At least a
race between two separate git-annex processes, dropping the same
content, seemed possible.

This way, if locking fails, and the content is not present, it will
always do the right thing. Also, it avoids the overhead of an unncessary
inAnnex check for every file.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-07-25 11:59:33 -04:00
Joey Hess
c30fd24d91
add back inAnnex check after seeking
The test suite noticed this case, where two files with the same key are
dropped, and the seek stage sees both have content due to the way files
stream through it. But then locking the content to drop fails on the
second file, because the first file has already been dropped.

So, add back otherwise redundant inAnnex check.
2020-07-25 11:18:50 -04:00
Joey Hess
18f1fb5841
drop performance improvements
Sped up seeking files to drop by 2x, and also some performance
improvements to checking numcopies.

Interestingly, the seek speedup is not due to precaching, but I think is
due to calling getParsed earlier.

Annex.Drop had to be changed to check inAnnex there, since it was removed
from Command.Drop. All other users of Command.Drop already checked inAnnex
themselves.

This commit was sponsored by Ryan Newton on Patreon.
2020-07-24 13:27:46 -04:00
Joey Hess
c4cc2cdf4c
rename getKey to genKey
for consistency with external backend protocol
2020-07-20 14:06:05 -04:00
Joey Hess
172743728e
move cryptographicallySecure into Backend type
This is groundwork for external backends, but also makes sense to keep
this information with the rest of a Backend's implementation.

Also, removed isVerifiable. I noticed that the same information is
encoded by whether a Backend implements verifyKeyContent or not.
2020-07-20 12:17:42 -04:00
Joey Hess
2634a5ed99
avoid inflating error counter when forking and merging annex state 2020-07-19 18:31:25 -04:00