Commit graph

1647 commits

Author SHA1 Message Date
Joey Hess
5cfcf1f05f
cache remote.log
Unlikely to speed up any of the existing uses much, but I want to use it
in a message that might be displayed many times.
2020-09-22 13:52:26 -04:00
Joey Hess
d0b06c17c0
Added --no-check-gitignore option for finer grained control than using --force.
add, addurl, importfeed, import: Added --no-check-gitignore option
for finer grained control than using --force.

(--force is used for too many different things, and at least one
of these also uses it for something else. I would like to reduce
--force's footprint until it only forces drops or a few other data
losses. For now, --force still disables checking ignores too.)

addunused: Don't check .gitignores when adding files. This is a behavior
change, but I justify it by analogy with git add of a gitignored file
adding it: asking to add all unused files back should add them all back,
not skip some. The old behavior was surprising.

In Command.Lock and Command.ReKey, CheckGitIgnore False does not change
behavior, it only makes explicit what is done. Since these commands are run
on annexed files, the file is already checked into git, so git add won't
check ignores.
2020-09-18 13:19:13 -04:00
Joey Hess
922621301a
Serialize use of C magic library, which is not thread safe.
This fixes failures uploading to S3 when using -J.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-09-17 17:27:42 -04:00
Joey Hess
77c42782d0
differentiate between concurrency enabled at command line and by git config
The latter should not affect --batch mode.
2020-09-16 11:47:12 -04:00
Joey Hess
3a05d53761
add SeekInput (not yet used)
No behavior changes (hopefully), just adding SeekInput and plumbing it
through to the JSON display code for later use.

Over the course of 2 grueling days.

withFilesNotInGit reimplemented in terms of seekHelper
should be the only possible behavior change. It seems to test as
behaving the same.

Note that seekHelper dummies up the SeekInput in the case where
segmentPaths' gives up on sorting the expanded paths because there are
too many input paths. When SeekInput later gets exposed as a json field,
that will result in it being a little bit wrong in the case where
100 or more paths are passed to a git-annex command. I think this is a
subtle enough problem to not matter. If it does turn out to be a
problem, fixing it would require splitting up the input
parameters into groups of < 100, which would make git ls-files run
perhaps more than is necessary. May want to revisit this, because that
fix seems fairly low-impact.
2020-09-15 15:41:13 -04:00
Joey Hess
62372ee052
resolvemerge: Improve cleanup of cruft left in the working tree by a conflicted merge
This commit was sponsored by Jake Vosloo on Patreon.
2020-09-07 16:50:27 -04:00
Joey Hess
0e21a3221e
clean up old code
withworktree is no longer doing anything useful so remove it
2020-09-07 16:16:15 -04:00
Joey Hess
03dee56546
revert change that broke test suite
Opened a new bug about it.

This commit was sponsored by Ethan Aubin.
2020-09-07 15:42:38 -04:00
Joey Hess
d120c73302
sync, assistant: When merge.directoryRenames is not set, default it to "false"
Works better with automatic merge conflict resolution than git's usual
default of "conflict".

This is not done when automatic merge conflict resolution is disabled.

This commit was sponsored by Mark Reidenbach on Patreon.
2020-09-07 13:50:58 -04:00
Joey Hess
f4c4b89aa3
refactor
Make all calls to git merge go through autoMergeFrom, in preparation
for fine-tuning git merge's config for automatic merge conflict
resolution.

This commit was sponsored by Ryan Newton on Patreon.
2020-09-07 13:26:16 -04:00
Joey Hess
69053a93a2
resolvemerge: Improve cleanup of files that were deleted by one side of a conflicted merge, and modified by the other side
This case was handled by cleanConflictCruft, but only when the annexed
file's object was present. When not present, it left the annexed file
with the original name, not checked into git, while adding the variant
file. So, add an explicit deletion of the deleted file in this case.

My specific case where this happened actually involves
merge.directoryRenames=conflict. After a merge involving that,
the situation was the file appears as "added by them", because that
caused the file that they added to be moved into a directory we renamed.

That case is the same as them adding a modified version of the file,
while we deleted it. (Except for the history of the file, since it's a
new file, but this doesn't look at history.)

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-09-07 12:25:57 -04:00
Joey Hess
a360437215
make automerge behavior when one side deleted explicit
This does not actually change how the merge conflict is resolved when
one side deleted the file, but it was not documented before, and I think
it only worked by accident.

This commit was sponsored by Brett Eisenberg on Patreon.
2020-09-07 12:01:03 -04:00
Joey Hess
e36bae74da
Exposed annex.forward-retry git config
One reason is, 5 is an arbitrary number so ought to be configurable.

The real reason though, is I wanted to make the man page explain when
forward retry can override annex.retry, and having a config made the
man page easier to write.
2020-09-04 15:16:40 -04:00
Joey Hess
2bb933eb60
import: Retry downloads that fail
Also, using the transfer machinery for this makes eg, git-annex info show
in-progress imports, and makes --notify-start/finish work.
2020-09-04 13:54:05 -04:00
Joey Hess
1a42b2c5a3
combine retry deciders in better way
This fixes the problem that, if forwardRetry was checked for the first 5
and decided to retry, the 6th would go to configuredRetry which would
see the counter was 6 and so wait retry-delay*2^5 seconds (default 32).

Now, it waits for retry-delay before each retry, even when forwardRetry
initiated the retry.
2020-09-04 12:48:30 -04:00
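The delay arithmetic in 1a42b2c5a3 can be worked through in a few lines. This is a minimal Python sketch, not git-annex's Haskell code. The function names are hypothetical. It assumes a base retry-delay of 1 second (which is what makes retry-delay*2^5 come out to the quoted 32), that the configured decider waits retry-delay * 2^(counter-1), and that forward retries previously ran with no wait at all.

```python
RETRY_DELAY = 1  # seconds; assumed base delay, consistent with 1 * 2**5 == 32


def backoff_delay(counter, retry_delay=RETRY_DELAY):
    """Exponential backoff keyed on the shared retry counter (assumed formula)."""
    return retry_delay * 2 ** (counter - 1)


def old_schedule(retries, forward_limit=5):
    """Before the fix: forward retries do not wait, then the configured
    decider sees the full counter and jumps straight to a long delay."""
    return [0 if c <= forward_limit else backoff_delay(c)
            for c in range(1, retries + 1)]


def new_forward_schedule(forward_retries, retry_delay=RETRY_DELAY):
    """After the fix: each forward retry waits the plain retry delay."""
    return [retry_delay] * forward_retries


print(old_schedule(6))          # the 6th retry waits 32 seconds
print(new_forward_schedule(5))  # every forward retry waits 1 second
```

The sketch only models what the message states; how the configured decider counts retries after the fix is not covered here.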
Joey Hess
1d244bafbd
Limit retrying of failed transfers when forward progress is being made to 5
To avoid some unusual edge cases where too much retrying could result in
far more data transfer than makes sense.
2020-09-04 12:46:37 -04:00
Joey Hess
eed20fe3b7
fix some file modes in calls to withTmpFileIn to honor umask
Also audited for other calls to openTempFile, and all are ok,
except for viaTmp which will need further work.

Remote.Directory fixed to set umask mode when writing to an export,
although it has another one using viaTmp that's not fixed.
Will make exports that are published via a http server running as
another user work, for example.

Remote.BitTorrent fixed to set umask mode when downloading the torrent
file. Normally this does not matter as that file does not hang around
after the download, but if a bittorrent download were started by one user,
got interrupted and then another user ran it, this will let them access
the torrent file created by the first user.
2020-09-02 14:36:08 -04:00
Joey Hess
00937c4813
when downloading same content from multiple urls, only display error if all fail
2020-09-02 11:35:07 -04:00
Joey Hess
571ec900ac
Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https.
With automatic layout learning!
2020-09-01 15:16:35 -04:00
Joey Hess
f95664305b
remove unused imports
2020-08-28 11:16:51 -04:00
Joey Hess
b68f214312
Display a message when git-annex has to wait for a pid lock file held by another process
2020-08-26 13:05:34 -04:00
Joey Hess
b24ba92231
refactor out Annex.PidLock
2020-08-26 12:29:13 -04:00
Joey Hess
7bdb0cdc0d
add gitAnnexChildProcess and use instead of incorrect use of runsGitAnnexChildProcess
Fixes reversion in 8.20200617 that made annex.pidlock being enabled result
in some commands stalling, particularly those needing to autoinit.

Renamed runsGitAnnexChildProcess to make clearer where it should be
used.

Arguably, it would be better to have a way to make any process git-annex
runs have the env var set. But then it would need to take the pid lock
when running any and all processes, and that would be a problem when
git-annex runs two processes concurrently. So, I'm left doing it ad-hoc
in places where git-annex really does run a child process, directly
or indirectly via a particular git command.
2020-08-25 14:57:49 -04:00
Joey Hess
2b6fc17f70
fix comment format
2020-08-25 13:40:52 -04:00
Joey Hess
283d2f85d1
importfeed: Fix reversion that caused some '.' in filenames to be replaced with '_'
sanitizeFilePath was changed to sanitize leading '.', but ImportFeed was
running it on parts of the template. So eg the leading '.' in the extension
got sanitized.

Note the added case for sanitizeLeadingFilePathCharacter ('/':_)
-- this was added because, if the template is title/episode and the title
is not set, it would expand to "/episode". So this is another potential
security fix.
2020-08-05 11:35:00 -04:00
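The importfeed bug in 283d2f85d1 comes down to where the leading-character check is applied. The following Python sketch is only an illustration under stated assumptions: `sanitize_leading`, `expand_buggy` and `expand_fixed` are hypothetical names, and mapping a leading '/' to '_' is assumed by analogy with the documented handling of a leading '.'.

```python
LEADING_UNSAFE = {".", "/"}  # assumed set; '/' handling is an assumption


def sanitize_leading(name):
    """Replace an unsafe first character with '_', leaving the rest alone."""
    if name and name[0] in LEADING_UNSAFE:
        return "_" + name[1:]
    return name


def expand_buggy(parts):
    """Sanitize each template part separately, so the '.' that starts
    an extension is wrongly treated as a leading character."""
    return "".join(sanitize_leading(p) for p in parts)


def expand_fixed(parts):
    """Sanitize only the leading character of the fully expanded name."""
    return sanitize_leading("".join(parts))


print(expand_buggy(["episode", ".mp3"]))  # episode_mp3
print(expand_fixed(["episode", ".mp3"]))  # episode.mp3
print(expand_fixed(["", "/episode"]))     # _episode (title not set)
```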
Joey Hess
f75be32166
external backends wip
It's able to start them up, the only thing not implemented is generating
and verifying keys. And, the key translation for HasExt.
2020-07-29 15:23:18 -04:00
Joey Hess
555fe669e1
refactoring in preparation for external backends
2020-07-29 12:00:27 -04:00
Joey Hess
f5e65d680b
add back inAnnex check for drop here
Needed again after last commit removed it from startLocal again.
2020-07-25 18:17:33 -04:00
Joey Hess
2a45b5ae9a
avoid failure to lock content of removed file causing drop etc to fail
This was already prevented in other ways, but as seen in commit
c30fd24d91, those were a bit fragile.
And I'm not sure races were avoided in every case before. At least a
race between two separate git-annex processes, dropping the same
content, seemed possible.

This way, if locking fails, and the content is not present, it will
always do the right thing. Also, it avoids the overhead of an unnecessary
inAnnex check for every file.

This commit was sponsored by Denis Dzyubenko on Patreon.
2020-07-25 11:59:33 -04:00
Joey Hess
c30fd24d91
add back inAnnex check after seeking
The test suite noticed this case, where two files with the same key are
dropped, and the seek stage sees both have content due to the way files
stream through it. But then locking the content to drop fails on the
second file, because the first file has already been dropped.

So, add back otherwise redundant inAnnex check.
2020-07-25 11:18:50 -04:00
Joey Hess
18f1fb5841
drop performance improvements
Sped up seeking files to drop by 2x, and also some performance
improvements to checking numcopies.

Interestingly, the seek speedup is not due to precaching, but I think is
due to calling getParsed earlier.

Annex.Drop had to be changed to check inAnnex there, since it was removed
from Command.Drop. All other users of Command.Drop already checked inAnnex
themselves.

This commit was sponsored by Ryan Newton on Patreon.
2020-07-24 13:27:46 -04:00
Joey Hess
c4cc2cdf4c
rename getKey to genKey
for consistency with external backend protocol
2020-07-20 14:06:05 -04:00
Joey Hess
172743728e
move cryptographicallySecure into Backend type
This is groundwork for external backends, but also makes sense to keep
this information with the rest of a Backend's implementation.

Also, removed isVerifiable. I noticed that the same information is
encoded by whether a Backend implements verifyKeyContent or not.
2020-07-20 12:17:42 -04:00
Joey Hess
2634a5ed99
avoid inflating error counter when forking and merging annex state
2020-07-19 18:31:25 -04:00
Joey Hess
7a42a47902
renaming
2020-07-10 14:17:35 -04:00
Joey Hess
9f6bd6cc05
add inRepoDetails
planned to use for an optimisation

most things using stagedDetails were not expecting to get dup files in a
conflicted merge and deal with them, so converted them to use
inRepoDetails.
2020-07-08 15:36:35 -04:00
Joey Hess
7347e50123
add stage number to stagedDetails parser
And convert parser to attoparsec, probably faster.

Before, a parse failure threw the whole --stage output line into the
filename, which was certainly a bad idea, so fixed that.
2020-07-08 15:05:12 -04:00
Joey Hess
9483b10469
cache one more log file for metadata
My worry was that a preferred content expression that matches on metadata
would have removed the location log from cache, causing an expensive
re-read when a Seek action later checked the location log.

Especially when the --all optimisation in the previous commit
pre-cached the location log.

This also means that the --all optimisation could cache the metadata log
too, if it wanted to, but not currently done.

The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice, eg a
get, examine, followed by set reads it twice. And sync --content reads the
location log 3 times in a row commonly.

But, as a list, it should not be made to be too long. I thought about
expanding it to 5 items, but that seemed unlikely to be a win commonly
enough to outweigh the extra time spent checking the cache.

Clearly there could be some further benchmarking and tuning here.
2020-07-07 14:18:55 -04:00
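The list-based cache described in 9483b10469 can be sketched as a short most-recently-used list. This is a hedged Python illustration, not the actual Haskell implementation: the class name, the default size of 2, the file names in the usage lines, and moving an entry to the front on a hit are all assumptions.

```python
class RecentFileCache:
    """Tiny cache kept as a list, most recently accessed entry first."""

    def __init__(self, max_items=2):  # size is an assumption
        self.max_items = max_items
        self.items = []  # list of (path, content) pairs

    def get(self, path):
        # Linear scan: cheap only because the list is kept very short.
        for i, (p, content) in enumerate(self.items):
            if p == path:
                if i:
                    self.items.insert(0, self.items.pop(i))
                return content
        return None

    def put(self, path, content):
        self.items = [(p, c) for p, c in self.items if p != path]
        self.items.insert(0, (path, content))
        del self.items[self.max_items:]  # drop the least recently used


cache = RecentFileCache()
cache.put("key.log", "location log")
cache.put("key.log.met", "metadata log")
print(cache.get("key.log"))  # still cached after the metadata read
```

Each lookup scans the whole list, which is why growing it to many items costs more checking time per read, the trade-off the message weighs against a 5-item cache.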
Joey Hess
e72ec8b9b2
add back git-annex branch read cache
The cache was removed way back in 2012,
commit 3417c55189

Then I forgot I had removed it! I remember clearly multiple times when I
thought, "this reads the same data twice, but the cache will avoid that
being very expensive".

The reason it was removed was it messed up the assistant noticing when
other processes made changes. That same kind of problem has recently
been addressed when adding the optimisation to avoid reading the journal
unnecessarily.

Indeed, enableInteractiveJournalAccess is run in just the
right places, so can just piggyback on it to know when it's not safe
to use the cache.
2020-07-06 12:22:33 -04:00
Joey Hess
57cceac569
simplify interface by removing size
Add size to the returned key after the fact, unless the remote happened
to add it itself.
2020-07-03 14:22:22 -04:00
Joey Hess
85506a7015
import: Added --no-content option, which avoids downloading files from a special remote
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.

git-annex sync --no-content does not yet use this to do imports
2020-07-03 13:41:57 -04:00
Joey Hess
b2f4b84d27
clean up some build warnings on windows
2020-07-02 11:34:18 -04:00
Joey Hess
087b7ee66a
Revert "data type that starts off using a set but converts to a bloom filter when large"
This reverts commit 7e2c4ed216.

I was not able to use this in the end..
See comment in the previous commit.
2020-07-01 20:12:19 -04:00
Joey Hess
a09937580e
more windows build fixes
2020-07-01 15:22:56 -04:00
Joey Hess
7e2c4ed216
data type that starts off using a set but converts to a bloom filter when large
This adds a dep on hashable, but it's a free dependency, since
unordered-containers already pulled it in.

Using unordered-containers for the set seems to make sense, since it
hashes and bloom filter hashes too. (Though different hashes.)
I dunno, never quite know if I should use unordered-containers or containers.
2020-07-01 14:06:12 -04:00
Joey Hess
d3d187c869
fix build on windows
Annex.GitOverlay was using a module that needs posix to build.
2020-07-01 11:22:15 -04:00
Joey Hess
a59e95a82d
improve "unable to lock down 1 copy" message
This is a fairly hard to understand situation for the user. Listing the
remotes should help them understand it a bit better.

This commit was sponsored by Ethan Aubin.
2020-06-26 13:00:40 -04:00
Joey Hess
b651d3ede0
test: Fix some test cases that assumed git's default branch name
git is making that configurable, and configuring it globally would break
the test suite in a few places.

No other part of git-annex assumes any branch name. Renamed a few
placeholders to make that clearer.

This commit was sponsored by Jake Vosloo on Patreon.
2020-06-23 16:40:51 -04:00
Joey Hess
7757c0e900
Honor annex.largefiles when importing a tree from a special remote.
This commit was sponsored by Martin D on Patreon.
2020-06-23 16:07:18 -04:00
Joey Hess
104b3a9c6a
Build with the http-client-restricted library when available
Otherwise use the vendored copy as before.

The library is in Debian testing but not stable. Once it reaches
stable, the vendored copy can be removed.

Did not add it to debian/control because IIRC that's used to build
git-annex on stable too, possibly. However, the Debian maintainer will
probably want to make the package depend on libghc-http-client-restricted-dev

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-06-22 11:31:31 -04:00
Joey Hess
aa1ad0b7ca
remove redundant imports
Clean build under ghc 8.8.3, which seems to do better at finding cases
where two imports both provide the same symbol, and warns about one of
them.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-06-22 11:05:34 -04:00
Joey Hess
d5451afc8f
fix deadlock
Fix a deadlock that could occur after git-annex got an unlocked file,
causing the command to hang indefinitely.

Known to happen on vfat filesystems, possibly others.

Note that a deadlock is still theoretically possible, if anything
smudge --clean does causes it to run the git queue for some other
reason.

Apparently that doesn't happen, but will need to keep an eye on it.
2020-06-18 12:56:29 -04:00
Joey Hess
96f6aa39dd
add runsGitAnnexChildProcess calls
This is all the calls to git-annex that seem capable of possibly locking
the same pidlock as their parent. Except possibly for some in the
assistant.
2020-06-17 15:31:03 -04:00
Joey Hess
82448bdf39
fix an annex.pidlock issue
That made eg git-annex get of an unlocked file hang until the
annex.pidlocktimeout and then fail.

This fix should be fully thread safe no matter what else git-annex is
doing.

Only using runsGitAnnexChildProcess in the one place it's known to be a
problem. Could audit for all places where git-annex runs itself as a child
and add it to all of them, later.
2020-06-17 15:30:59 -04:00
Joey Hess
ad81feb053
fix implicit embedcreds regression
Fix bug that made creds not be stored in git when a special remote was
initialized with gpg encryption, but without an explicit embedcreds=yes.

(Yet another regression introduced in version 7.20200202.7. 5th so far.)
2020-06-16 18:00:19 -04:00
Joey Hess
a76b1ba3d6
local git remote autoinit improvements
* Improve display of problems auto-initializing or upgrading local git
  remotes.
* When a local git remote cannot be initialized because it has no
  git-annex branch or a .noannex file, avoid displaying a message about it.
2020-06-16 13:24:00 -04:00
Joey Hess
8a7c615a8f
import: Avoid using some strange names for temporary keys
The ContentIdentifier can contain almost anything, so could have characters
that are not fit for the filesystem, or might be longer than a key usually
is, or contain a newline, or .... genKeyName deals with those problems.

This should not present a back-compat issue, because this is a temporary
key used while downloading the imported file, before the real key for it
can be generated.
2020-06-11 16:07:36 -04:00
Joey Hess
6b0cb2d732
defer cleaning keys db of old data
Avoid creating the keys database during init when there are no unlocked
files, to prevent init failing when sqlite does not work in the filesystem.
2020-06-11 15:40:13 -04:00
Joey Hess
24ff5e2b29
use uninterruptibleMask
Some recent changes to use mask missed that async exceptions can still
be thrown inside it. The goal is to make sure a block of cleanup code
runs entirely, w/o being interrupted by an async exception, so use
uninterruptibleMask.

Also, converted a few to bracket, which is nicer.
2020-06-09 15:02:56 -04:00
Joey Hess
0210e81d83
async exception safety for openFd
Audited for openFile and openFd, and this fixes all the ones I found
where an async exception could prevent the file getting closed.

Except for the lock pool, which is a whole other can of worms.
2020-06-05 15:48:00 -04:00
Joey Hess
319f2a4afc
audit all uses of SomeException to avoid catching async exceptions
Except for the assistant, which I think may use them between threads?

Most of the uses of SomeException were already catching only non-async exceptions.
But I did find a few places that were accidentally catching them.
2020-06-05 15:16:57 -04:00
Joey Hess
2bff3b7c49
init: When annex.pidlock is set, skip lock probing.
2020-06-05 11:12:16 -04:00
Joey Hess
1d41ae5d2a
init warning on stalled lock probe
init: If lock probing stalls for a long time (eg a broken NFS server),
display a message to let the user know what's taking so long.
2020-06-05 11:06:19 -04:00
Joey Hess
2670890b17
convert to withCreateProcess for async exception safety
This handles all createProcessSuccess callers, and aside from process
pools, the complete conversion of all process running to async exception
safety should be complete now.

Also, was able to remove from Utility.Process the old API that I now
know was not a good idea. And proof it was bad: The code size went *down*,
despite there being a fair bit of boilerplate for some future API to
reduce.
2020-06-04 15:45:52 -04:00
Joey Hess
438dbe3b66
convert to withCreateProcess for async exception safety
This handles all sites where checkSuccessProcess/ignoreFailureProcess
is used, except for one: Git.Command.pipeReadLazy
That one will be significantly more work to convert to bracketing.

(Also skipped Command.Assistant.autoStart, but it does not need to
shut down the processes it started on exception because they are
git-annex assistant daemons..)

forceSuccessProcess is done, except for createProcessSuccess.
All call sites of createProcessSuccess will need to be converted
to bracketing.

(process pools still todo also)
2020-06-04 12:44:09 -04:00
Joey Hess
2dc7b5186a
convert to withCreateProcess for async exception safety
2020-06-04 12:05:25 -04:00
Joey Hess
92f775eba0
convert to withCreateProcess for async exception safety
Not yet 100% done, so far I've grepped for waitForProcess and converted
everything that uses that to start the process with withCreateProcess.

Except for some things like P2P.IO and Assistant.TransferrerPool,
and Utility.CoProcess, that manage a pool of processes. See #2
in https://git-annex.branchable.com/todo/more_extensive_retries_to_mask_transient_failures/#comment-209f8a8c38e63fb3a704e1282cb269c7
for how those will need to be dealt with.

checkSuccessProcess, ignoreFailureProcess, and forceSuccessProcess call waitForProcess, so
callers of them will also need to be dealt with, and have not been yet.
2020-06-03 15:48:09 -04:00
Joey Hess
89b2542d3c
annex.skipunknown with transition plan
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.

Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.

annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
2020-05-28 15:55:17 -04:00
Joey Hess
484a74f073
auto-init autoenable=yes
Try to enable special remotes configured with autoenable=yes when git-annex
auto-initialization happens in a new clone of an existing repo. Previously,
git-annex init had to be explicitly run to enable them. That was a bit of a
wart of a special case for users to need to keep in mind.

Special remotes cannot display anything when autoenabled this way, to avoid
interfering with the output of git-annex query commands.

Any error messages will be hidden, and if it fails, nothing is displayed.
The user will realize the remote isn't enabled when they try to use it,
and can run git-annex init manually then to try the autoenable again and
see what failed.

That seems like a reasonable approach, and it's less complicated than
communicating something across a pipe in order to display it as a side
message. Other reason not to do that is that, if the first command the
user runs is one like git-annex find that has machine readable output,
any message about autoenable failing would need to not be displayed anyway.
So better to not display a failure message ever, for consistency.

(Had to split out Remote.List.Util to avoid an import cycle.)
2020-05-27 12:40:35 -04:00
Joey Hess
0a9a3ed1c3
left an unhandled case in previous commit
2020-05-15 14:31:50 -04:00
Joey Hess
3334d3831b
change retrieveExport and getKey to throw exception
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 13:45:53 -04:00
Joey Hess
c1cd402081
make storeKey throw exceptions
When storing content on remote fails, always display a reason why.

Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
2020-05-13 14:03:00 -04:00
Joey Hess
5f5170b22b
remove SafeFilePath
Move sanitizeFilePath call to where fromSafeFilePath had been.
2020-05-11 14:04:56 -04:00
Joey Hess
cabbc91b18
addurl, importfeed: Allow '-' in filenames, as long as it's not the first character
2020-05-11 13:50:49 -04:00
Joey Hess
6952060665
addurl --preserve-filename and a few related changes
* addurl --preserve-filename: New option, uses server-provided filename
  without any sanitization, but with some security checking.

  Not yet implemented for remotes other than the web.

* addurl, importfeed: Avoid adding filenames with leading '.', instead
  it will be replaced with '_'.

  This might be considered a security fix, but a CVE seems unwarranted.
  It was possible for addurl to create a dotfile, which could change
  behavior of some program. It was also possible for a web server to say
  the file name was ".git" or "foo/.git". That would not overwrite the
  .git directory, but would cause addurl to fail; of course git won't
  add "foo/.git".

sanitizeFilePath is too opinionated to remain in Utility, so moved it.

The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:

	isDrive will never succeed, because "c:" gets munged to "c_"
	".." gets sanitized now
	".git" gets sanitized now
	It will never be null, because sanitizeFilePath keeps the length
	the same, and splitDirectories never returns a null path.

Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
2020-05-08 16:22:55 -04:00
Joey Hess
19b5137227
addurl --fast error message improvement
addurl: When run with --fast on an url that
annex.security.allowed-ip-addresses prevents accessing, display a more
useful message.

(Also importfeed --fast potentially.)
2020-04-27 13:48:14 -04:00
Joey Hess
04352ed9c5
check-ignore resource pool
Much like check-attr before.
2020-04-21 11:25:28 -04:00
Joey Hess
45fb7af21c
check-attr resource pool
Limited to min of -JN or number of CPU cores, because it will often be
CPU bound, once it's read the gitignore file for a directory.

In some situations it's more disk bound, but in any case it's unlikely
to be the main bottleneck that -J is used to avoid. Eg, when dropping,
this is used for numcopies checks, but the main bottleneck will be
accessing the remotes to verify presence. So the user might decide to
-J32 that, but having 32 check-attr processes would just waste however
many filehandles they open, and probably worsen their performance due to
CPU contention.

Note that, I first tried just letting up to the -JN be started. However,
even when it's no bottleneck at all, that still results in all of them
being started. Why? Well, all the worker threads start up nearly
simultaneously, so there's a thundering herd..
2020-04-21 11:05:57 -04:00
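The sizing rule in 45fb7af21c (the minimum of -JN and the number of CPU cores) is simple enough to state as code. A minimal Python sketch, with a hypothetical function name and a floor of 1 added as an assumption:

```python
import os


def check_attr_pool_size(jobs, cpu_cores=None):
    """Upper bound on helper processes: min(-J value, CPU cores), at least 1."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return max(1, min(jobs, cores))


print(check_attr_pool_size(32, cpu_cores=4))  # -J32 on 4 cores allows 4
print(check_attr_pool_size(2, cpu_cores=8))   # -J2 on 8 cores allows 2
```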
Joey Hess
cee6b344b4
cat-file resource pool
Avoid running a large number of git cat-file child processes when run with
a large -J value.

This implementation takes care to avoid adding any overhead to git-annex
when run without -J. When run with -J, there is a small bit of added
overhead, to manipulate the resource pool. That optimisation added a
fair bit of complexity.
2020-04-20 15:19:31 -04:00
Joey Hess
fe9cf1256e
move remoteList into dupState
This does mean that RemoteDaemon.Transport.Tor's call runs it, otherwise
no change, but this is groundwork for doing more such expensive actions
in dupState.
2020-04-17 14:36:45 -04:00
Joey Hess
a7840c0e04
improve programPath
Fixes a failure mode where git-annex sync would try to run git-annex and
complain that it failed to find it in ~/.config/git-annex/program or PATH,
when there was a git-annex in /usr/bin/, but the original one was run
from elsewhere (eg, ~/bin) and happened not to be present any longer.

Now, it will fall back to using git-annex from PATH in such a case.
Which might fail due to some version incompatibility, but still better
than a misleading error message.

Also made readProgramFile only read the file, not look for git-annex in
PATH as a fallback. That fallback may have confused Assistant.Upgrade,
which really wants the value from the file.
2020-04-15 16:46:34 -04:00
Joey Hess
43a9808292
disable journal read optimisation when alwayscommit=false
The journal read optimisation in aeca7c220 later got fixed in eedd73b84
to stage and commit any files that were left in the journal by a
previous git-annex run. That's necessary for the optimisation to work
correctly. But it also meant that alwayscommit=false started committing
the previous git-annex process's journalled changes, which defeated the
purpose of the config setting entirely.

So, disable the optimisation when alwayscommit=false, leaving the
files in the journal and not committing them. See my comments on the bug
report for why this seemed the best approach.

Also fixes a problem when annex.merge-annex-branches=false and there
are changes in the journal. That config indirectly prevents committing
the journal. (Which seems a bit odd given its name, but it always has..)
So, when there were changes in the journal, perhaps left there due to
alwayscommit=false being set before, the optimisation would prevent
git-annex from reading the journal files, and it would operate with out
of date information.
2020-04-15 13:24:33 -04:00
Joey Hess
5a62e8132d
When parsing git configs, support all the documented ways to write true and false, including "yes", "on", "1", etc.
This change does impact git-annex config
eg "git annex config --set annex.addunlocked on"
will store "on" and new git-annex will understand that value, while
old git-annex will error:
git-annex: bad annex.addunlocked configuration in git annex config:
Parse failure: near "on"
That seems acceptable.

Not special remote configs that are only documented as =true or =false
however. Having git-annex support other values for those would break
backwards compatibility when used with old versions of git-annex. And
older versions ignore invalid special remote configs.. That would not
be a good combination.
2020-04-13 14:05:30 -04:00
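The boolean spellings that 5a62e8132d refers to are the literals listed in git's own config documentation: yes, on, true and 1 for true; no, off, false, 0 and the empty string for false, matched case-insensitively. The Python sketch below only illustrates that table. It is not git-annex's Haskell parser, and the function name is hypothetical.

```python
TRUE_LITERALS = {"true", "yes", "on", "1"}
FALSE_LITERALS = {"false", "no", "off", "0", ""}


def parse_git_bool(value):
    """Return True or False for a documented git boolean literal, else None."""
    v = value.lower()
    if v in TRUE_LITERALS:
        return True
    if v in FALSE_LITERALS:
        return False
    return None  # not a boolean literal; the caller decides how to report it


print(parse_git_bool("on"))     # True
print(parse_git_bool("Off"))    # False
print(parse_git_bool("maybe"))  # None
```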
Joey Hess
ca9c6c5f60
Fix a potential failure to parse git config
Git has an obnoxious special case in git config, a line "foo" is the same
as "foo = true". That means there is no way to examine the output of
git config and tell if it was run with --null or not, since a "foo"
in the first line could be such a boolean, or could be followed by its
value on the next line if --null were used.

So, rather than trying to do such a detection, track the style of config
at all the points where it's generated.
2020-04-13 13:05:41 -04:00
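The ambiguity described in ca9c6c5f60 can be shown with a small parser that is told the output style instead of guessing it. This is a Python sketch under stated assumptions, not the real code: `ConfigStyle` and `parse_config` are hypothetical names. The formats assumed are "key=value" per line without --null, and NUL-terminated entries with a newline between key and value with --null.

```python
from enum import Enum


class ConfigStyle(Enum):
    LINES = "lines"  # git config --list
    NULL = "null"    # git config --list --null


def parse_config(output, style):
    """Parse git config output whose style was tracked by the caller."""
    if style is ConfigStyle.NULL:
        entries, separator = output.split("\0"), "\n"
    else:
        entries, separator = output.split("\n"), "="
    result = {}
    for entry in entries:
        if not entry:
            continue
        key, found, value = entry.partition(separator)
        # A key with no value is the same as "key = true".
        result.setdefault(key, []).append(value if found else "true")
    return result


# The same text means different things depending on the style
# (trailing NUL omitted to keep the example short).
sample = "foo\nbar"
print(parse_config(sample, ConfigStyle.LINES))  # two valueless keys
print(parse_config(sample, ConfigStyle.NULL))   # one key "foo" with value "bar"
```

Since nothing in the text itself distinguishes the two readings, the style has to travel with the output from wherever it was generated.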
Joey Hess
eedd73b846
fix reversion caused by earlier optimisation to git-annex branch reads
aeca7c2207 was predicated on the
assumption that updateTo would stage any journal files, but in one case
it did not actually do so. The test suite happened to expose the bug.
2020-04-10 15:25:22 -04:00
Joey Hess
2caf579718
cache annex index filename for 1.5% speedup to queries 2020-04-10 13:37:04 -04:00
Joey Hess
aeca7c2207
Sped up query commands that read the git-annex branch by around 5%
The only price paid is one additional MVar read per write to the journal.
Presumably writing a journal file dominates over an MVar read time by
several orders of magnitude.

--batch does not get the speedup because then it needs to notice when
another process has made a change. Also made the assistant and other daemon
modes bypass the optimisation, which would not help them anyway.
2020-04-09 13:54:43 -04:00
Joey Hess
c0cd07c36b
Ref ByteString conversion done
Test suite passes.
2020-04-07 17:41:09 -04:00
Joey Hess
6c81e0c8f1
ByteString Ref continued
Several nice speed wins I think.

At 340/633 files converted.
2020-04-07 13:27:11 -04:00
Joey Hess
87d5583a91
use programPath consistently, not readProgramFile
Improve git-annex's ability to find the path to its program, especially
when it needs to run itself in another repo to upgrade it.

Some parts of the code used readProgramFile, probably because I forgot that
programPath exists.

I noticed this when a git-annex auto-upgrade failed because it was running
git-annex upgrade --autoonly, but the code to run git-annex used
readProgramFile, which happened to point to an older build of git-annex.
2020-03-30 16:06:27 -04:00
Joey Hess
f6d19b18f6
remove unused imports 2020-03-30 12:11:52 -04:00
Joey Hess
0e4d80d5c1
remove pre-commit hook
This was originally added so that unannex could prevent the hook from
running while files were in a state that the hook would interpret as
old-style unlocked and so would lock.

Now that's gone, so the only thing the hook was preventing was two
pre-commit processes running simultaneously. But such concurrency
is normal in git-annex and should not be a problem.

Does mean that .git/hooks/pre-commit-annex might run more concurrently,
that seems the only risk of it causing any problems.
2020-03-30 11:54:04 -04:00
Joey Hess
2e6e8aa60a
fix windows build some more 2020-03-20 11:47:09 -04:00
Joey Hess
d930a2035c
Avoid converting .git file in a worktree or submodule to a symlink when the repository is not a git-annex repository.
This means it will still be a .git file when git-annex init runs. That's
ok, the repo probably contains no annexed objects yet, and even if it does,
git-annex init does not care if symlinks in the worktree don't point to the
objects.

I made init, at the end, run the conversion code. Not really necessary
because the next git-annex command could do it just as well. But, this
avoids commands that don't normally write to the repo needing to write to
it, which might avoid some problem or other, and seems worth avoiding
generally.
2020-03-09 14:54:14 -04:00
Joey Hess
c0a981cb0e
update comment 2020-03-09 14:31:28 -04:00
Joey Hess
093fde5abd
completed the createDirectoryIfMissing conversion
Remaining calls in the assistant and Annex.Ssh have been audited and are ok.
2020-03-06 12:55:03 -04:00
Joey Hess
2f204b5d37
refactor 2020-03-06 11:43:07 -04:00
Joey Hess
eaa49ab53d
convert replaceFile to createDirectoryUnder
Since it was used on both worktree and .git/annex files, split into
multiple functions.

In passing, this also improves permissions of created directories in
.git/annex, using createAnnexDirectory on those.
2020-03-06 11:31:01 -04:00
Joey Hess
6d58ca94d6
some easy createDirectoryUnder conversions 2020-03-05 15:20:10 -04:00
Joey Hess
ebbc5004fa
convert createAnnexDirectory to use createDirectoryUnder
It will create foo/.git/annex/, but not foo/.git/ and not foo/.

This will avoid it creating an empty path to a repo when a drive is
yanked out and the mount point goes away, for example.
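
The behaviour described here can be sketched in Python as follows. This is an illustration under assumptions, not the Haskell implementation: `create_directory_under` is a hypothetical name, and in the commit's example `top` would be foo/.git.

```python
import os


def create_directory_under(top: str, directory: str) -> None:
    """Create `directory` and any missing parents, but only below `top`.

    `top` itself (and anything above it) is never created; if it is
    missing, for example because a drive was unmounted, this raises.
    """
    top_abs = os.path.abspath(top)
    dir_abs = os.path.abspath(directory)
    if os.path.commonpath([top_abs, dir_abs]) != top_abs:
        raise ValueError(f"{directory} is not under {top}")
    if not os.path.isdir(top_abs):
        raise FileNotFoundError(f"{top} does not exist")
    # Create one component at a time. os.mkdir never creates parents, so
    # even if `top` vanishes mid-way it is not silently re-created.
    current = top_abs
    for part in os.path.relpath(dir_abs, top_abs).split(os.sep):
        if part == os.curdir:
            continue
        current = os.path.join(current, part)
        try:
            os.mkdir(current)
        except FileExistsError:
            pass
```

Using a plain recursive "make parents" call instead would happily re-create the whole path to the repository on an empty mount point, which is the situation the commit wants to avoid.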
2020-03-05 14:33:04 -04:00
Joey Hess
ccd8c43dc8
git-annex config: guard against non-repo-global configs
git-annex config: Only allow setting configs that git-annex
actually supports reading from repo-global config, to avoid confusing users
who try to set other configs with this.
2020-03-02 15:54:18 -04:00
Joey Hess
c78b9b55b6
rename changeGitConfig to overrideGitConfig and avoid unnecessary calls
It's important that it be clear that it overrides a config, such that
reloading the git config won't change it, and in particular, setConfig
won't change it.

Most of the calls to changeGitConfig were actually after setConfig,
which was redundant and unnecessary. So removed those.

The only remaining one, besides --debug, is in the handling of
repository-global config values. That one's ok, because the
way mergeGitConfig is implemented, it does not override any value that
is set in git config. If a value with a repo-global setting was passed
to setConfig, it would set it in the git config, reload the git config,
re-apply mergeGitConfig, and use the newly set value, which is the right
thing.
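
To make the distinction concrete, here is a small Python model of the three layers described above. It is only a sketch: the class `GitConfig`, its `loader` callback, and the method names are hypothetical and do not correspond to git-annex's Haskell API.

```python
from typing import Callable, Dict, Optional


class GitConfig:
    """Config values loaded from git, plus in-process overrides."""

    def __init__(self, loader: Callable[[], Dict[str, str]]) -> None:
        self._loader = loader  # hypothetical: reads the current git config
        self._overrides: Dict[str, str] = {}
        self._loaded: Dict[str, str] = loader()

    def reload(self) -> None:
        # Re-read git config. Overrides are untouched and still win.
        self._loaded = self._loader()

    def override(self, key: str, value: str) -> None:
        # Survives reload(), so later changes to git config cannot undo it.
        self._overrides[key] = value

    def merge_global(self, repo_global: Dict[str, str]) -> None:
        # Repo-global values only fill in keys that git config leaves unset.
        for key, value in repo_global.items():
            self._loaded.setdefault(key, value)

    def get(self, key: str) -> Optional[str]:
        if key in self._overrides:
            return self._overrides[key]
        return self._loaded.get(key)
```

In this model, setting a value in git config and then reloading and re-merging yields the newly set value, matching the sequence described in the last paragraph of the message.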
2020-02-27 01:11:53 -04:00
Joey Hess
81e3faf810
Merge branch 'v7' 2020-02-26 18:15:18 -04:00
Joey Hess
8af6d2c3c5
fix encryption of content to gcrypt and git-lfs
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.

Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)

To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.

And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.

(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
2020-02-26 18:05:36 -04:00
Joey Hess
9659f1c30f
annex.security.allowed-ip-addresses ports syntax
Extended annex.security.allowed-ip-addresses to let specific ports of an IP
address be used, while denying use of other ports.
2020-02-25 15:45:52 -04:00
Joey Hess
1bb32098d6
jump right to v8, don't stop part way
* init --version: When the version given is one that automatically
  upgrades to a newer version, use the newer version instead.
* Auto upgrades from older repo versions, like v5, now jump right to v8.
2020-02-24 13:21:00 -04:00
Joey Hess
c31e1be781
convert KeySource to RawFilePath 2020-02-21 10:04:44 -04:00
Joey Hess
029c883713
Merge branch 'master' into v8 2020-02-19 14:32:11 -04:00
Joey Hess
69f2d1dd43
remoteConfig rework
remoteAnnexConfig will avoid bugs like
a3a674d15b

Use now more generic remoteConfig in a couple places that built
non-annex config settings manually before.
2020-02-19 13:45:11 -04:00
Joey Hess
ae4177d456
fix warning 2020-02-17 15:06:28 -04:00
Joey Hess
da9945c013
silence build warning 2020-02-14 19:38:50 -04:00
Joey Hess
879f52a116
annex.tune.branchhash1=true bugfix
Fix support for repositories tuned with annex.tune.branchhash1=true,
including --all not working and git-annex log not displaying anything for
annexed files.
2020-02-14 15:22:48 -04:00
Joey Hess
a490947068
annex.sshcaching warning improvement and allow overriding build time default
* When git-annex is built with a ssh that does not support ssh connection
  caching, default annex.sshcaching to false, but let the user override it.
* Improve warning messages further when ssh connection caching cannot
  be used, to clearly state why.
2020-02-14 14:21:03 -04:00
Joey Hess
5c3636037b
Display a warning when concurrency is enabled but ssh connection caching is not enabled or won't work due to a crippled filesystem
A warning message is unsatisfying. But erroring out is too hard a failure,
especially since it may well work fine if the user has enabled passwordless
ssh.

I did think about falling back to one ssh connection at a time in this
case, but it would have needed a rework of every ssh call, which
seems far overboard for such a niche problem. There's no single place where
git-annex runs ssh, so no one place that it could block a concurrent call
on a semaphore. And, even if it did fall back to one ssh connection at a
time, it seems to me that doing so without warning the user about the
problem just invites bug reports like "git-annex is ignoring my -J2 and
only doing one download at a time". So a warning is needed, and I suppose
is good enough.
2020-01-23 12:35:46 -04:00
Joey Hess
6f90bb7738
handle git-credential prompt in -J mode
If git-credential has it cached and does not prompt, this will
unfortunately result in a brief flicker, as the displayed console
regions are hidden while running it and then re-displayed. Better than a
corrupted display.

Actually, I tried it and don't see a visible flicker, so probably only
over a slow ssh will it be apparent.
2020-01-22 16:42:15 -04:00
Joey Hess
1883f7ef8f
support git remotes that need http basic auth
using git credential to get the password

One thing this doesn't do is wrap the password prompting inside the prompt
action. So with -J, the output can be a bit garbled.
2020-01-22 16:16:19 -04:00
Joey Hess
2be4122bfc
include passthrough params in --describe-other-params 2020-01-20 16:53:27 -04:00
Joey Hess
aa949bbb7d
initremote --describe-other-params
Does not yet include descriptions from external special remote programs.
2020-01-20 16:05:51 -04:00
Joey Hess
7038acf96c
add descriptions for all remote config fields
not yet used
2020-01-20 15:20:04 -04:00
Joey Hess
923230ea30
convert RemoteConfigFieldParser to data type 2020-01-20 13:49:30 -04:00
Joey Hess
8b9b90c74a
bugfixes
getRemoteConfigPassedThrough was never returning anything, Typeable
prevented the type checker from noticing a dumb mistake.

parseRemoteConfig was not adding Accepted values as PassedThrough
2020-01-17 17:09:56 -04:00
Joey Hess
1d711c4378
use "param" not "field" to match man pages 2020-01-15 14:07:05 -04:00
Joey Hess
2edf0506a5
a few forgotten remote config fields
preferreddir can be used with any special remote, so its parser needs to
be included in the commonFieldParsers.

initremote with uuid= changed to delete that field, so it does not
need to be included in commonFieldParsers. Note that, existing remotes
initialized before this change will have the field in remote.log.
This will not cause problems parsing, because the value will be
Accepted.

Grepping for 'Accepted "' found these, and I'm pretty sure this is all of
them.
2020-01-15 11:22:36 -04:00
Joey Hess
c4ea3ca40a
ported almost all remotes, until my brain melted
external is not started yet, and S3 is part way through and not
compiling yet
2020-01-14 15:41:34 -04:00
Joey Hess
c498269a88
convert configParser to Annex action and add passthrough option
Needed so Remote.External can query the external program for its
configs. When the external program does not support the query,
the passthrough option will make all input fields be available.
2020-01-14 13:52:03 -04:00
Joey Hess
963239da5c
separate RemoteConfig parsing basically working
Many special remotes are not updated yet and are commented out.
2020-01-14 12:35:08 -04:00
Joey Hess
71f78fe45d
wip separate RemoteConfig parsing
Remote now contains a ParsedRemoteConfig. The parsing happens when the
Remote is constructed, rather than when individual configs are used.

This is more efficient, and it lets initremote/enableremote
reject configs that have unknown fields or unparsable values.

It also allows for improved type safety, as shown in
Remote.Helper.Encryptable where things that used to match on string
configs now match on data types.

This is a work in progress, it does not build yet.

The main risk in this conversion is forgetting to add a field to
RemoteConfigParser. That will prevent using that field with
initremote/enableremote, and will prevent remotes that already are set
up from seeing that configuration. So will need to check carefully that
every field that getRemoteConfigValue is called on has been added to
RemoteConfigParser.

(One such case I need to remember is that credPairRemoteField needs to be
included in the RemoteConfigParser.)
2020-01-13 12:39:21 -04:00
Joey Hess
71ecfbfccf
be stricter about rejecting invalid configurations for remotes
This is a first step toward that goal, using the ProposedAccepted type
in RemoteConfig lets initremote/enableremote reject bad parameters that
were passed in a remote's configuration, while avoiding enableremote
rejecting bad parameters that have already been stored in remote.log

This does not eliminate every place where a remote config is parsed and a
default value is used if the parse fails. But, I did fix several
things that expected foo=yes/no and so confusingly accepted foo=true but
treated it like foo=no. There are still some fields that are parsed with
yesNo but not checked when initializing a remote, and there are other
fields that are parsed in other ways and not checked when initializing a
remote.

This also lays groundwork for rejecting unknown/typoed config keys.
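
The split between proposed and accepted values might look roughly like this in Python. It is a sketch, not the Haskell ProposedAccepted type: the class names mirror the commit's wording, while `parse_yes_no` and its `default` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Proposed:
    """A value the user just passed to initremote/enableremote."""
    value: str


@dataclass(frozen=True)
class Accepted:
    """A value that was already stored in remote.log."""
    value: str


def parse_yes_no(field: str, v: Union[Proposed, Accepted],
                 default: bool = False) -> bool:
    """Parse a yes/no field, strictly for Proposed, leniently for Accepted."""
    if v.value == "yes":
        return True
    if v.value == "no":
        return False
    if isinstance(v, Proposed):
        # New input is rejected so the mistake is visible immediately.
        raise ValueError(f"bad value for {field}: expected yes or no")
    # Old stored data keeps working by falling back to the default.
    return default
```

With this shape, a newly typed foo=true is an error, while the same value already present in remote.log still loads and is treated like the default.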
2020-01-10 14:52:48 -04:00
Joey Hess
5e4deb3620
support sha256 git repos
Git will eventually switch to sha2 and there will not be one single
shaSize anymore, but two (40 and 64).

Changed all parsers for git plumbing output to support both sizes of
shas.

One potential problem this does not deal with is, if somewhere in
git-annex it reads two shas from different sources, and compares them
to see if they're the same sha, it would fail if they're sha1 and sha256
of the same value. I don't know if that will really be a concern.
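
For illustration, accepting both sha sizes in a parser could look like the Python sketch below. The helper `looks_like_sha` is a hypothetical name, and the lengths are the hex lengths of SHA-1 (40) and SHA-256 (64) mentioned above.

```python
import string

SHA1_HEX_LEN = 40
SHA256_HEX_LEN = 64
_HEX_DIGITS = set(string.hexdigits)


def looks_like_sha(text: str) -> bool:
    """True if `text` is a hex string of either supported sha length."""
    return (
        len(text) in (SHA1_HEX_LEN, SHA256_HEX_LEN)
        and all(c in _HEX_DIGITS for c in text)
    )
```

Note that this only validates the shape of one sha; it does nothing about the cross-hash comparison concern raised in the last paragraph.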
2020-01-07 12:22:19 -04:00
Joey Hess
2000e9a4b8
avoid build warning on windows 2020-01-01 14:40:35 -04:00
Joey Hess
2cea674d1e
Merge branch 'master' into v8 2020-01-01 14:26:43 -04:00
Joey Hess
ea3cb7d277
fix a case where file tracked by git unexpectedly becomes annex pointer file
smudge: When annex.largefiles=anything, files that were already stored in
git, and have not been modified could sometimes be converted to being
stored in the annex. Changes in 7.20191024 made this more of a problem.
This case is now detected and prevented.
2019-12-27 15:08:03 -04:00
Joey Hess
2b821eb225
Merge branch 'master' into sqlite 2019-12-26 15:15:42 -04:00
Joey Hess
37467a008f
annex.addunlocked expressions
* annex.addunlocked can be set to an expression with the same format used by
  annex.largefiles, in case you want to default to unlocking some files but
  not others.
* annex.addunlocked can be configured by git-annex config.

Added a git-annex-matching-expression man page, broken out from
tips/largefiles.

A tricky consequence of this is that git-annex add --relaxed
honors annex.addunlocked, but an expression might want to know the size
or content of an url, which it's not going to download. I decided it was
better not to fail, and just dummy up some plausible data in that case.

Performance impact should be negligible. The global config is already
loaded for annex.largefiles. The expression only has to be parsed once,
and in the simple true/false case, it should not do any additional work
matching it.
2019-12-20 15:56:25 -04:00
Joey Hess
8e9e809d9b
when annex.largefiles parse fails, say where the config came from 2019-12-20 13:07:10 -04:00
Joey Hess
4acbb40112
git-annex config annex.largefiles
annex.largefiles can be configured by git-annex config, to more easily set
a default that will also be used by clones, without needing to shoehorn the
expression into the gitattributes file. The git config and gitattributes
override that.

Whenever something is added to git-annex config, we have to consider what
happens if a user puts a purposefully bad value in there. Or, if a new
git-annex adds some new value that an old git-annex can't parse.
In this case, a global annex.largefiles that can't be parsed currently
makes an error be thrown. That might not be ideal, but the gitattribute
behaves the same, and is almost equally repo-global.

Performance notes:

git-annex add and addurl construct a matcher once
and use it for every file, so the added time penalty for reading the global
config log is minor. If the gitattributes annex.largefiles were deprecated,
git-annex add would get around 2% faster (excluding hashing), because
looking that up for each file is not fast. So this new way of setting
it is progress toward speeding up add.

git-annex smudge does need to load the log every time. As well as checking
the git attribute. Not ideal. Setting annex.gitaddtoannex=false avoids
both overheads.
2019-12-20 13:01:41 -04:00
Joey Hess
02e00fd7ab
Merge branch 'master' into sqlite 2019-12-19 16:33:42 -04:00
Joey Hess
686791c4ed
more RawFilePath
Remove dup definitions and just use the RawFilePath one. </> etc are
enough faster that it's probably faster than building a String directly,
although I have not benchmarked.
2019-12-18 17:10:28 -04:00
Joey Hess
d5628a16b8
Merge branch 'bs' into sqlite-bs 2019-12-18 14:51:03 -04:00
Joey Hess
322c542b5c
fix ByteString conversion on windows
the encode' and decode' functions on Windows should not apply the
filesystem encoding, which does not work there. Instead, convert to and
from UTF-8.

Also, avoid exporting encodeW8 and decodeW8. Both use the filesystem
encoding, so won't work as expected on windows.
2019-12-18 13:32:56 -04:00
Joey Hess
3d38ec9585
fix fileJournal
My ByteString rewrite oversimplified it, resulting in any _ in a journal
file turning into a / in the git-annex branch, which was often the wrong
filename, or sometimes (//) an invalid filename that git
refused to add.
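
The bug is easier to see with a concrete encoding. The Python sketch below assumes, based on the symptoms described, that a journal file name is the branch path with "/" written as "_" and a literal "_" doubled to "__"; that scheme and the function names are assumptions made for illustration.

```python
def to_journal_name(branch_path: str) -> str:
    """Flatten a branch path into a single journal file name."""
    return "".join(
        "__" if c == "_" else "_" if c == "/" else c for c in branch_path
    )


def from_journal_name(journal_name: str) -> str:
    """Reverse to_journal_name, reading left to right."""
    out = []
    i = 0
    while i < len(journal_name):
        if journal_name.startswith("__", i):
            out.append("_")  # an escaped literal underscore
            i += 2
        elif journal_name[i] == "_":
            out.append("/")  # a path separator
            i += 1
        else:
            out.append(journal_name[i])
            i += 1
    return "".join(out)


def naive_from_journal_name(journal_name: str) -> str:
    """The oversimplified decoding: every underscore becomes a slash."""
    return journal_name.replace("_", "/")
```

Under this assumed scheme, a path such as abc/foo_bar.log round-trips with the careful decoder, while the naive one turns the doubled underscore into "//".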
2019-12-18 11:29:34 -04:00
Joey Hess
cee0d738fc
match also / path separator on windows 2019-12-11 17:08:08 -04:00
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipeline.
2019-12-09 15:07:21 -04:00
Joey Hess
2f9a80d803
merging sqlite and bs branches
Since the sqlite branch uses blobs extensively, there are some
performance benefits, ByteStrings now get stored and retrieved w/o
conversion in some cases like in Database.Export.
2019-12-06 15:30:45 -04:00
Joey Hess
5f391179f1
use RawFilePath getFileStatus for speed
Only done on those calls to getFileStatus that had a RawFilePath, not a
FilePath. The others would probably be just as fast if converted to use
it with toRawFilePath, but I'm not 100% sure.

Note that genInodeCache' uses fromRawFilePath, but that value only gets
used on Windows, so on unix the thunk will never be evaluated.
2019-12-06 14:44:42 -04:00
Joey Hess
c20f4704a7
all commands building except for assistant
also, changed ConfigValue to a newtype, and moved it into Git.Config.
2019-12-05 14:41:18 -04:00
Joey Hess
6535aea49a
optimisation
This was already optimised before, but profiling found that delEntry was
around 1.5% of the total runtime of git-annex whereis. It was being
called once per environment variable per file processed.

Fixed by better caching. Since withIndexFile is almost always run with
the same .git/annex/index file, it can cache the modified environment,
rather than re-modifying it each time called.
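
The caching idea can be sketched in Python as below. This is not the Haskell code: the class `IndexEnvCache` and its `builds` counter are hypothetical, and the only real name used is git's GIT_INDEX_FILE environment variable.

```python
import os
from typing import Dict, Optional, Tuple


class IndexEnvCache:
    """Remember the environment built for the most recent index file."""

    def __init__(self) -> None:
        self._cached: Optional[Tuple[str, Dict[str, str]]] = None
        self.builds = 0  # how many times the environment was rebuilt

    def env_for(self, index_file: str) -> Dict[str, str]:
        if self._cached is not None and self._cached[0] == index_file:
            return self._cached[1]  # cache hit: no per-variable work
        env = dict(os.environ)  # copy, then point git at the other index
        env["GIT_INDEX_FILE"] = index_file
        self.builds += 1
        self._cached = (index_file, env)
        return env
```

Since nearly every call uses the same .git/annex/index path, a single-entry cache like this is hit almost every time.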
2019-12-04 14:27:11 -04:00
Joey Hess
b88f89c1ef
get the most commonly used commands building again
A quick benchmark of whereis shows not much speed improvement, maybe a
few percent. Profiling it found a hotspot, adds to todo.
2019-12-04 13:45:18 -04:00
Joey Hess
f3047d7186
include git-annex-shell back in
Also pushed ConfigKey down into the Git modules, which is the bulk of
the changes.
2019-12-02 11:51:52 -04:00
Joey Hess
c756006374
fix hacked up AutoMerge module to work again 2019-12-02 10:51:43 -04:00
Joey Hess
d7833def66
use ByteString for git config
The parser and looking up config keys in the map should both be faster
due to using ByteString.

I had hoped this would speed up startup time, but any improvement to
that was too small to measure. Seems worth keeping though.

Note that the parser breaks up the ByteString, but a config map ends up
pointing to the config as read, which is retained in memory until every
value from it is no longer used. This can change memory usage
patterns marginally, but won't affect git-annex.
2019-11-27 17:40:09 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agony of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certainly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
6a97ff6b3a
wip RawFilePath
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)

Currently in a very bad unbuildable in-between state.
2019-11-25 16:18:19 -04:00
Joey Hess
ddf6973d22
minor optimisation
avoid repeated scan of the same bytestring
2019-11-22 19:13:05 -04:00
Joey Hess
81d402216d
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
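
A Python sketch of the caching scheme follows, under stated assumptions: the field set is reduced to backend, size and name, the serialized form follows the familiar BACKEND-sSIZE--NAME layout, and the names `serialize_key_data` and `with_size` are hypothetical stand-ins for the smart constructors.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class KeyData:
    backend: str
    name: str
    size: Optional[int] = None


def serialize_key_data(data: KeyData) -> str:
    size = f"-s{data.size}" if data.size is not None else ""
    return f"{data.backend}{size}--{data.name}"


class Key:
    """KeyData plus its serialization, computed once at construction."""

    __slots__ = ("_data", "_serialized")

    def __init__(self, data: KeyData) -> None:
        self._data = data
        self._serialized = serialize_key_data(data)

    @property
    def serialized(self) -> str:
        return self._serialized  # no re-serialization on each use

    def with_size(self, size: int) -> "Key":
        # Changes go through the constructor, so the cache cannot go stale.
        return Key(replace(self._data, size=size))

    def __eq__(self, other: object) -> bool:
        # Equality looks only at the data, never at the cached string.
        if not isinstance(other, Key):
            return NotImplemented
        return self._data == other._data

    def __hash__(self) -> int:
        return hash(self._data)
```

Keeping the cached string out of equality and routing every modification through a constructor are the two points the commit message singles out.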
2019-11-22 17:49:16 -04:00
Joey Hess
d4661959de
Merge branch 'master' into sqlite 2019-11-21 17:26:50 -04:00
Joey Hess
740e0ddbfe
avoid running scanUnlockedFiles in bare repo
It's not necessary. And if the bare repo somehow has a pointer
file in it with the same name as a file in HEAD, that file would be
populated, which would be surprising since the file is not really under
git's control.
2019-11-21 14:31:12 -04:00
Joey Hess
5877de5e80
git-lfs: remember urls, and autoenable remotes using known urls
* git-lfs: The url provided to initremote/enableremote will now be
  stored in the git-annex branch, allowing enableremote to be used without
  an url. initremote --sameas can be used to add additional urls.
* git-lfs: When there's a git remote with an url that's known to be
  used for git-lfs, automatically enable the special remote.
2019-11-18 16:09:09 -04:00
Joey Hess
667d38a8f1
Fix a crash (STM deadlock) when -J is used with multiple files that point to the same key
See the comment for a trace of the deadlock.

Added a new StartStage. New worker threads begin in the StartStage.
Once a thread is ready to do work, it moves away from the StartStage,
and no thread will ever transition back to it.

A thread that blocks waiting on another thread that is processing
the same key will block while in the StartStage. That other thread
will never switch back to the StartStage, and so the deadlock is avoided.
2019-11-14 13:51:09 -04:00
Joey Hess
890330f0fe
make --json-error-messages capture url download errors
Convert Utility.Url to return Either String so the error message can be
displayed in the annex monad and so captured.

(When curl is used, its errors are still not caught.)
2019-11-12 13:52:38 -04:00
Joey Hess
99536e3a0b
remove one more warningIO
Had to generalize Git.Queue so it can run an Annex action, yipes.

Only remaining warningIO are in the legacy chunk code.
2019-11-12 10:45:52 -04:00
Joey Hess
0be23bae2f
refactor
Better to not have a single function module, and better to have a more
specific type than Bool.

This commit was sponsored by Jack Hill on Patreon
2019-11-11 19:10:52 -04:00
Joey Hess
3b34d123ed
Added annex.allowsign option.
This commit was sponsored by Ilya Shlyakhter on Patreon.
2019-11-11 16:28:56 -04:00
Joey Hess
3553867b66
v7 to v8 auto-upgrade
bump version to 8

and update NEWS about it
2019-11-07 13:24:16 -04:00
Joey Hess
2f94b5419a
use new name for new format export dbs
Delete the old export dbs on upgrade.

Testing this and exporting to a directory with both exporttree=yes and
importtree=yes, it refused to let an interrupted export proceed after
upgrade, with "unsafe to overwrite file". An import resolved the
problem.
2019-11-06 17:34:15 -04:00
Joey Hess
3b820f08f7
use new name for new format content identifier db
It will be populated automatically by the next command that needs data
from it, the same way it gets populated in a fresh clone. That may be a
little expensive, but it's a one time cost, and no slower than in a
fresh clone.
2019-11-06 16:43:52 -04:00
Joey Hess
1b5f4b67b5
use new name for new format fsck db
The old db is cleaned up when a new incremental fsck is started.

The incremental fsck won't pick up where the old one left off, but I
consider this a minor enough thing that it can just be documented and
won't be a problem.
2019-11-06 16:27:25 -04:00
Joey Hess
dc9295017f
v8 upgrade of keys db
Renamed the database to .git/annex/keysdb;
the old .git/annex/keys gets deleted during the upgrade.

It is possible that an old git-annex process is running during the
upgrade. If so, it will be able to continue using the old keys db until the
upgrade is complete, and then will presumably fail in some ugly way. Or
perhaps the upgrade will be unable to delete the open files on some
systems, and so fail with an ugly error message.

It's also possible for multiple processes to be running the upgrade
concurrently. That should be fine; they will both write the same
information into the keys db.

Other databases still need to be upgraded.
2019-11-06 16:16:00 -04:00
Joey Hess
6147130e86
Merge branch 'master' into sqlite 2019-11-05 12:59:28 -04:00
Joey Hess
e2d4c133f5
init: fix data loss bug
Fix bug that lost modifications to unlocked files when init is re-run in an
already initialized repo.

In retrospect needing scanUnlockedFiles False in the direct mode upgrade
path was a good hint that it was unsafe when used with True.

However, this bug did not affect upgrade from v5. In such an upgrade, an
unlocked file that is modified is left as-is. The only place
scanUnlockedFiles True did overwrite modified unlocked files is during a
git-annex init of a repo that was already initialized by git-annex.

(I also tried a scenario where the repo had not been initialized by
git-annex yet, but was cloned from a v7 repo with an unlocked file, and the
pointer file replaced with some other content, and the data loss did not
occur in that situation.)

Since the fixed scanUnlockedFiles avoids overwriting non-pointer files,
it should be safe to run in any situation, so there's no need any longer
for the parameter.
2019-11-05 12:41:15 -04:00
Joey Hess
61c9b0945d
bump version, though there is no upgrade path yet
I just don't want this branch to accidentally run in my production repos yet.
2019-10-29 17:06:35 -04:00
Joey Hess
c35a9047d3
improve data types for sqlite
This is a non-backwards compatible change, so not suitable for merging
w/o an annex.version bump and transition code. Not yet tested.

This improves performance of git-annex benchmark --databases
across the board by 10-25%, since eg Key roundtrips as a ByteString.

(serializeKey' produces a lazy ByteString, so there is still a
copy involved in converting it to a strict ByteString. It may be faster
to switch to using bytestring-strict-builder.)

FilePath and Key are both stored as blobs. This avoids mojibake in some
situations. It would be possible to use varchar instead, if persistent
could avoid converting that to Text, but it seems there is no good
way to do so. See doc/todo/sqlite_database_improvements.mdwn

Eliminated some ugly artifacts of using Read/Show serialization;
constructors and quoted strings are no longer stored in sqlite.

Renamed SRef to SSha to reflect that it is only ever a git sha,
not a ref name. Since it is limited to the characters in a sha,
it is not affected by mojibake, so still uses String.
2019-10-29 17:05:36 -04:00
Joey Hess
e98f230c95
remove unused function 2019-10-23 12:01:34 -04:00
Joey Hess
5db79339a1
init: Fix a failure when used in a submodule on a crippled filesystem.
When the submodule's parent repo has an adjusted unlocked branch,
it gets cloned by git, but git checks out master. git annex init then
fails because it wants to enter the adjusted branch, but:

  adjusted branch adjusted/master(unlocked) already exists.

  Aborting because that branch may have changes that have not yet reached master

Note that init actually then exits 0, leaving master checked out.

This could also happen, absent submodules, if the parent repo has
an adjusted unlocked branch, but it is not checked out. In the more common
case where that branch is checked out, the clone uses the same branch,
so no problem.

The choices to fix this:

* Init could delete the existing adjusted branch, and re-adjust.
  But then running init inside an adjusted branch on a crippled filesystem
  would lose any changes that have not been synced back to master.
* Init could sync any changes back to master, but that would be very surprising
  behavior for it.
* Init could simply check out the existing adjusted branch. If the branch
  is diverged from master, well, sync will sort that out later.
  This mirrors the behavior of cloning a repo that has an adjusted branch
  checked out that has not yet been synced back to master.
  Picked this choice.
2019-10-21 11:41:15 -04:00
Joey Hess
ce48eb797c
make DropDead transition minimize remote.log for dead sameas remotes
All that needs to be retained in remote.log is the sameas-uuid.
The rest of the config is eliminated. This doesn't save enough space to
bother with, but it prevents anything sensitive in the config of the
dead sameas remote from lingering around.

Note that minimizesameasdead does not update the VectorClock when
changing the log line. That's normally a no-no, but in this case,
it makes each DropDead result in the exact same file contents,
and vector clocks are not needed because the transition breaks
the history chain.
2019-10-15 11:39:25 -04:00
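As a rough model of the minimization described in the commit above: git-annex itself is written in Haskell, so this Python sketch only illustrates the idea. The function name `minimize_sameas_dead` and the example field values are hypothetical; only the `sameas-uuid` field name comes from the commit messages.

```python
SAMEAS_UUID_FIELD = "sameas-uuid"

def minimize_sameas_dead(config: dict) -> dict:
    """Return the config of a dead sameas remote reduced to its sameas-uuid.

    Configs without a sameas-uuid are returned unchanged (as a copy).
    """
    if SAMEAS_UUID_FIELD not in config:
        return dict(config)
    # Everything else (possibly sensitive) is dropped.
    return {SAMEAS_UUID_FIELD: config[SAMEAS_UUID_FIELD]}
```

Because the result depends only on the `sameas-uuid` value, applying the function twice yields the same output, which mirrors the point that each DropDead produces identical file contents.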
Joey Hess
4306dfbe68
remove empty log files in transition
forget --drop-dead: Remove several classes of git-annex log files when they
become empty, further reducing the size of the git-annex branch.

Noticed while testing sameas uuid removal, but it could happen other times
too.

An empty log file is always treated by git-annex the same as no file
being present, and when the files are per-key, it can be a sizable space
saving to exclude them from the tree.
2019-10-14 16:04:15 -04:00
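The effect of dropping empty log files can be pictured with a small Python model, assuming a tree is represented as a mapping from path to file content. The function `drop_empty_logs` and its `removable` predicate are hypothetical, and which classes of log files qualify is not specified here.

```python
from typing import Callable, Dict

def drop_empty_logs(
    tree: Dict[str, bytes],
    removable: Callable[[str], bool] = lambda path: True,
) -> Dict[str, bytes]:
    """Return a copy of the tree without empty files of removable classes."""
    return {
        path: content
        for path, content in tree.items()
        # Keep the file if it has content, or if its class must be kept.
        if content or not removable(path)
    }
```

Since an empty log is treated the same as a missing one, removing it changes the size of the tree but not how the remaining logs are interpreted.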
Joey Hess
5e9a2cc37f
forget state of sameas remotes during DropDead transitions
It would have been a lot less round-about to just make git annex dead
also add the uuids of sameas remotes to the trust.log as dead.

But, that would fail in the case where there's an unmerged other clone
that has a sameas remote that the current repo does not know about.
Then it would not get marked as dead.

Handling it at transition time avoids that scenario.

Note that the generation of trustmap' in dropDead should only
happen once, due to the partial application.
2019-10-14 15:47:42 -04:00
Joey Hess
9828f45d85
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:

* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
  could in theory generate the same content identifier for two different
  pieces of content

While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.

External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generated in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.

Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 13:51:42 -04:00
Joey Hess
37f725a9f7
Merge branch 'master' into sameas 2019-10-11 15:56:00 -04:00
Joey Hess
debafcba2b
autoenable sameas remotes 2019-10-11 15:52:40 -04:00
Joey Hess
ec778888d2
got enableremote working for sameas
The assistant can also enable sameas remotes; that should work, but is
not tested.
2019-10-11 15:11:08 -04:00
Joey Hess
1b9c1d1737
fix sameas inherited key removal 2019-10-11 13:18:27 -04:00
Joey Hess
8d7dc76dff
fix bad paste of field name 2019-10-11 13:05:25 -04:00
Joey Hess
91eed85fd4
add sameas inherited configs to newConfig
This makes initremote --sameas work with encryption inherited.
2019-10-11 13:05:20 -04:00
Joey Hess
0dd5691951
update notes 2019-10-10 16:12:59 -04:00
Joey Hess
df5b0ffab3
inherit other fields
I think this is all that needs to be inherited.
2019-10-10 16:11:21 -04:00
Joey Hess
c3975ff3b4
sameas RemoteConfig inheritance
I found a way to avoid inheritance complicating anything outside of
Logs.Remote. It seems fine to require all inherited values to be
inherited and not set in the sameas remote's config. Since inherited
values will be used for stuff like encryption and perhaps chunking, which
control the actual content stored on the remote, it seems likely that
there will not be any reason to need them to vary between two remotes
that access the same underlying data store.

The newer version of containers is free; the minimum ghc version is
bundled with a newer version than that.
2019-10-10 15:58:22 -04:00
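A minimal Python sketch of the inheritance rule described in the commit above, not git-annex's Haskell code. The set `INHERITED_FIELDS`, the function `effective_config`, and the dictionary layout of the remote log are assumptions made for illustration; only the `sameas-uuid` field name appears in the commit messages.

```python
SAMEAS_UUID_FIELD = "sameas-uuid"
# Assumed set of inherited fields; the real list may differ.
INHERITED_FIELDS = {"encryption", "chunk"}

def effective_config(uuid: str, remote_log: dict) -> dict:
    """Combine a sameas remote's own config with fields from its parent."""
    config = remote_log[uuid]
    parent_uuid = config.get(SAMEAS_UUID_FIELD)
    if parent_uuid is None:
        return dict(config)  # not a sameas remote, nothing to inherit
    overlap = INHERITED_FIELDS.intersection(config)
    if overlap:
        # Inherited values must come from the parent, never be set locally.
        raise ValueError(f"inherited fields set locally: {sorted(overlap)}")
    parent = remote_log.get(parent_uuid, {})
    inherited = {k: v for k, v in parent.items() if k in INHERITED_FIELDS}
    return {**config, **inherited}
```

Requiring inherited fields to be absent from the sameas remote's own config keeps the lookup in one place, which fits the stated goal of confining the complication to the remote log handling.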
Joey Hess
59908586f4
rename RemoteConfigKey to RemoteConfigField
And some associated renames.
I was going to have some values named fooKeyKey otherwise..
2019-10-10 15:44:05 -04:00
Joey Hess
92ff30df70
set annex-config-uuid when RemoteConfig contains a sameas-uuid
Initremote sets that, so after both initremote and enableremote,
the git config will be set.

Any remote that does not use Annex.SpecialRemote won't set
annex-config-uuid. But that's only Remote.Git, which doesn't use
RemoteConfig anyway.
2019-10-10 12:58:59 -04:00
Joey Hess
97b499a4dc
use sameas-name and sameas-uuid for sameas remotes
initremote --sameas=remotename sets sameas-name and sameas-uuid

Using sameas-name rather than name prevents old git-annex initremote
from enabling a sameas remote by name, since it would not handle it
correctly.
2019-10-10 12:32:05 -04:00
Joey Hess
53da7f1cf8
update uninit to handle all the v7 stuff
* uninit: Remove several git hooks that git-annex init sets up.
* uninit: Remove the smudge and clean filters that git-annex init sets up.
2019-10-08 14:34:00 -04:00
Joey Hess
1113caa53e
preserve unlocked file mtime when dropping
When dropping an unlocked file, preserve its mtime, which avoids git status
unnecessarily running the clean filter on the file.

If the index file has close to the same mtime as a work tree file, git will
not trust the index to be up-to-date, and re-runs the clean filter
unnecessarily. Preserving the mtime when depopulating a pointer file avoids
git status doing a little (or maybe a lot) of unnecessary work.

There are other places that the mtime could be preserved, including other
places where pointer files are written perhaps, but also
populatePointerFile. But, I don't know of cases where those lead to git
status doing unnecessary work, so I just fixed the one I'm aware of for now.
2019-10-08 14:01:12 -04:00
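The general technique of rewriting a file while keeping its modification time can be sketched in Python as follows. This is an illustration, not git-annex's implementation, and the helper name `replace_preserving_mtime` is hypothetical.

```python
import os

def replace_preserving_mtime(path: str, new_content: bytes) -> None:
    """Overwrite a file's content, then restore its original timestamps."""
    before = os.stat(path)
    with open(path, "wb") as f:
        f.write(new_content)
    # Writing bumped the mtime; put the old access and modification times back.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
```

With the mtime unchanged, a tool that compares file timestamps against its own index has no reason to consider the rewritten file newer than before.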
Joey Hess
3066bdb1fb
fix annex.largefiles largerthan/smallerthan bug
Fix bug in handling of annex.largefiles that use largerthan/smallerthan.
When adding a modified file, it incorrectly used the file size of the old
version of the file, not the current size.

That was the only largefiles limit that didn't directly look at the file on
disk already. Added a new type to keep straight the two different ways such
a limit can be matched. I kind of wanted to extend MatchingFile or FileInfo
to indicate that the matcher is supposed to operate on files from disk or
annex, but it turned out to be too complex to implement it that way.

This also changes the LimitAnnexFiles case when lookupFileKey does not find
a key. It used to fall back to statting the file, now it always returns
False. I doubt the old code could really get to that point, but if it
somehow does, it's better for preferred content matching to be consistent.
2019-09-30 17:15:08 -04:00
Joey Hess
9f27d03945
fix a typo that didn't matter so far 2019-09-27 14:08:16 -04:00
Joey Hess
fda1bdd679
Added --mimetype and --mimeencoding file matching options.
Already had these for largefiles matching, but I forgot to add them as
command-line options.
2019-09-19 12:09:59 -04:00
Joey Hess
53fd746705
avoid some build warnings on windows 2019-09-12 14:11:19 -04:00
Joey Hess
fef3cd055d
Removed support for git versions older than 2.1
debian oldoldstable has 2.1, and that's what i386ancient uses. It would be
better to require git 2.2, which is needed to use adjusted branches, but
can't do that w/o losing support for some old linux kernels or a
complicated git backport.
2019-09-11 16:14:43 -04:00
Joey Hess
061231621e
Merge branch 'master' into v7-default 2019-09-10 16:06:43 -04:00
Joey Hess
94c75d2bd9
init: Fix a reversion that broke initialization on systems that need to use pid locking
This brings back .git/annex/misctmp, but only for init. If an init
is interrupted while probing using that temp directory, the files it left
will get deleted 1 week later by a subsequent git-annex run.
2019-09-10 13:37:07 -04:00