Commit graph

2291 commits

Author SHA1 Message Date
Joey Hess
7a42a47902
renaming 2020-07-10 14:17:35 -04:00
Joey Hess
4c9ad1de46
optimisation: stream keys through git cat-file --buffer
This is only implemented for git-annex get so far. It makes git-annex
get nearly twice as fast in a repo with 10k files, all of them present!

But, see the TODO for some caveats.
2020-07-10 13:54:52 -04:00
Joey Hess
e72ec8b9b2
add back git-annex branch read cache
The cache was removed way back in 2012,
commit 3417c55189

Then I forgot I had removed it! I remember clearly multiple times when I
thought, "this reads the same data twice, but the cache will avoid that
being very expensive".

The reason it was removed was it messed up the assistant noticing when
other processes made changes. That same kind of problem has recently
been addressed when adding the optimisation to avoid reading the journal
unnecessarily.

Indeed, enableInteractiveJournalAccess is run in just the
right places, so can just piggyback on it to know when it's not safe
to use the cache.
2020-07-06 12:22:33 -04:00
Joey Hess
85506a7015
import: Added --no-content option, which avoids downloading files from a special remote
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.

git-annex sync --no-content does not yet use this to do imports
2020-07-03 13:41:57 -04:00
Joey Hess
4229713e63
importfeed: Added some additional --template variables for date and time
This commit was sponsored by Ethan Aubin.
2020-06-24 14:24:50 -04:00
Joey Hess
7757c0e900
Honor annex.largefiles when importing a tree from a special remote.
This commit was sponsored by Martin D on Patreon.
2020-06-23 16:07:18 -04:00
Joey Hess
5098236c6b
testremote: Fix over-allocation of resources and bad caching
Including starting up a large number of external special remote processes.
(Regression introduced in version 8.20200501)
2020-06-22 14:25:49 -04:00
Joey Hess
aa1ad0b7ca
remove redundant imports
Clean build under ghc 8.8.3, which seems to do better at finding cases
where two imports both provide the same symbol, and warns about one of
them.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-06-22 11:05:34 -04:00
Joey Hess
d5451afc8f
fix deadlock
Fix a deadlock that could occur after git-annex got an unlocked file,
causing the command to hang indefinitely.

Known to happen on vfat filesystems, possibly others.

Note that a deadlock is still theoretically possible, if anything
smudge --clean does causes it to run the git queue for some other
reason.

Apparently that doesn't happen, but will need to keep an eye on it.
2020-06-18 12:56:29 -04:00
Joey Hess
96f6aa39dd
add runsGitAnnexChildProcess calls
This is all the calls to git-annex that seem capable of possibly locking
the same pidlock as their parent. Except possibly for some in the
assistant.
2020-06-17 15:31:03 -04:00
Joey Hess
c4f2c56f5e
checkpresentkey: fix behavior to match documentation
checkpresentkey: When no remote is specified, try all remotes, not only
ones that the location log says contain the key. This is what the
documentation has always said it did.

Still try the logged remotes first, because they are far more likely to
have the key.
2020-06-16 13:54:26 -04:00
Joey Hess
2670890b17
convert to withCreateProcess for async exception safety
This handles all createProcessSuccess callers, and aside from process
pools, the complete conversion of all process running to async exception
safety should be complete now.

Also, was able to remove from Utility.Process the old API that I now
know was not a good idea. And proof it was bad: The code size went *down*,
despite there being a fair bit of boilerplate for some future API to
reduce.
2020-06-04 15:45:52 -04:00
Joey Hess
92f775eba0
convert to withCreateProcess for async exception safety
Not yet 100% done, so far I've grepped for waitForProcess and converted
everything that uses that to start the process with withCreateProcess.

Except for some things like P2P.IO and Assistant.TransferrerPool,
and Utility.CoProcess, that manage a pool of processes. See #2
in https://git-annex.branchable.com/todo/more_extensive_retries_to_mask_transient_failures/#comment-209f8a8c38e63fb3a704e1282cb269c7
for how those will need to be dealt with.

checkSuccessProcess, ignoreFailureProcess, and forceSuccessProcess calls waitForProcess, so
callers of them will also need to be dealt with, and have not been yet.
2020-06-03 15:48:09 -04:00
Joey Hess
89b2542d3c
annex.skipunknown with transition plan
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.

Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.

annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
2020-05-28 15:55:17 -04:00
Joey Hess
484a74f073
auto-init autoenable=yes
Try to enable special remotes configured with autoenable=yes when git-annex
auto-initialization happens in a new clone of an existing repo. Previously,
git-annex init had to be explicitly run to enable them. That was a bit of a
wart of a special case for users to need to keep in mind.

Special remotes cannot display anything when autoenabled this way, to avoid
interfering with the output of git-annex query commands.

Any error messages will be hidden, and if it fails, nothing is displayed.
The user will realize the remote isn't enable when they try to use it,
and can run git-annex init manually then to try the autoenable again and
see what failed.

That seems like a reasonable approach, and it's less complicated than
communicating something across a pipe in order to display it as a side
message. Other reason not to do that is that, if the first command the
user runs is one like git-annex find that has machine readable output,
any message about autoenable failing would need to not be displayed anyway.
So better to not display a failure message ever, for consistency.

(Had to split out Remote.List.Util to avoid an import cycle.)
2020-05-27 12:40:35 -04:00
Joey Hess
3824645368
change to new waitForAllRunningCommandActions
waitForAllRunningCommandActions is a subset of finishCommandActions
and more appropriate for what is being done here: Just a concurrency
barrier.
2020-05-26 14:00:51 -04:00
Joey Hess
864ba4ecaa
disable buggy concurrency in Command.Export
Fix a crash or potentially not all files being exported when sync -J
--content is used with an export remote.

Crash as described in fixed bug report.

waitForAllRunningCommandActions inserted in several points where all the
commandActions started before need to have finished before moving on to
the next stage of the export. A race across those points could have
maybe resulted in not all files being exported, or a wrong tree being
export.

For example, changeExport starting up an action like
a rename of A to B. Then, with that action still running, fillExport
uploading a new A, *before* the rename occurred. That race seems
unlikely to have happened. There are some other ones that this also
fixes.
2020-05-26 13:54:08 -04:00
Joey Hess
e04a931439
improve transfer stages for some commands
move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup
actions in separate job pool from uploads.

transferStages was confusingly named, it's only useful when doing downloads
as then the verify actions can be run concurrently with other downloads.
For commands that upload, there will be more concurrency from running
cleanup actions in a separate job pool.

As for sync, I left it using downloadStages although that's not optimal
for the part of a sync that uploads. Perhaps it should use the union of
both?
2020-05-26 11:55:50 -04:00
Joey Hess
0d82a88742
drop: use commandStages, not transferStages
I cannot find any rationalle for why this was changed before.
drop certianly does not do any transfers, so commandStages will perform
better.
2020-05-26 11:47:54 -04:00
Joey Hess
0bcecb67f5
export: Let concurrent transfers be done with -J or annex.jobs
Tested working, although I did find this bug in my testing, which also
afflicts sync -J to an export remote.
2020-05-26 11:44:07 -04:00
Joey Hess
f7fe71602c
import: Added --json-progress
Already supported --json, but not that.

Also checked all other commands that only support --json, and the only
other one that does transfers is fsck (--from), which it did not seem worth
adding --json-progress to really.
2020-05-26 11:27:47 -04:00
Joey Hess
5b8524e1e6
addurl: Make --preserve-filename also apply when eg a torrent contains multiple files
Forgot to remove sanitizeFilePath after adding sanitizeOrPreserveFilePath
here.
2020-05-26 10:45:57 -04:00
Joey Hess
fc9833f68d
export: Added options for json output
Just worked, no need to do anything except add the options.
2020-05-26 10:31:10 -04:00
Joey Hess
d7c7245438
whereis: Added --format option.
One way this can be used is to remove all urls for some website that went
away:

git-annex whereis --format '${file} ${url}\0' | \
	grep -z whatever.com | git-annex rmurl --batch -z

Combining ${url} and ${uuid} is a bit of a combinatorial explosion.
It didn't seem worth only outputting a uuid alongside an url belonging
to it, so each uuid is output beside each url.
2020-05-19 16:20:56 -04:00
Joey Hess
6361074174
convert renameExport to throw exception
Finishes the transition to make remote methods throw exceptions, rather
than silently hide them.

A bit on the fence about this one, because when renameExport fails,
it falls back to deleting instead, and so does the user care why it failed?

However, it did let me clean up several places in the code.

This commit was sponsored by Ethan Aubin.
2020-05-15 15:08:09 -04:00
Joey Hess
037440ef36
convert removeExportDirectory to throw exception
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 14:43:18 -04:00
Joey Hess
cdbfaae706
change removeExport to throw exception
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

This commit was sponsored by Graham Spencer on Patreon.
2020-05-15 14:15:14 -04:00
Joey Hess
3334d3831b
change retrieveExport and getKey to throw exception
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 13:45:53 -04:00
Joey Hess
4814b444dd
make storeExport throw exceptions 2020-05-15 12:20:02 -04:00
Joey Hess
dc7dc1e179
refactor 2020-05-14 14:21:58 -04:00
Joey Hess
4be94c67c7
make removeKey throw exceptions 2020-05-14 14:11:05 -04:00
Joey Hess
d9c7f81ba4
make retrieveKeyFile and retrieveKeyFileCheap throw exceptions
Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a
exception when a remote doesn't support it.
2020-05-13 17:07:07 -04:00
Joey Hess
c1cd402081
make storeKey throw exceptions
When storing content on remote fails, always display a reason why.

Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
2020-05-13 14:03:00 -04:00
Joey Hess
39d7e6dd2a
addurl --preserve-filename for other remotes
Finishing work begun in 6952060665

Also, truncate filenames provided by other remotes if they're too long,
when --preserve-filename is not used. That seems to have been omitted
before by accident.
2020-05-11 14:33:27 -04:00
Joey Hess
5f5170b22b
remove SafeFilePath
Move sanitizeFilePath call to where fromSafeFilePath had been.
2020-05-11 14:04:56 -04:00
Thomas Koch
8a0480daf3 Fix haddock parse error
I run haddock with `cabal haddock --executables`. It fails with:

    Types/Remote.hs:271:17: error: parse error on input ‘->’

Apparently haddock does not like to find haddock blocks outside of
declarations? In any case, this patch makes these type of errors go
away.

Afterwards, I see errors like these, that need to be investigated as
a next step:

haddock: internal error: internal: extractDecl
CallStack (from HasCallStack):
  error, called at utils/haddock/haddock-api/src/Haddock/Interface/Create.hs:1116:12 in main:Haddock.Interface.Create
2020-05-11 08:40:13 +02:00
Joey Hess
6952060665
addurl --preserve-filename and a few related changes
* addurl --preserve-filename: New option, uses server-provided filename
  without any sanitization, but with some security checking.

  Not yet implemented for remotes other than the web.

* addurl, importfeed: Avoid adding filenames with leading '.', instead
  it will be replaced with '_'.

  This might be considered a security fix, but a CVE seems unwattanted.
  It was possible for addurl to create a dotfile, which could change
  behavior of some program. It was also possible for a web server to say
  the file name was ".git" or "foo/.git". That would not overrwrite the
  .git directory, but would cause addurl to fail; of course git won't
  add "foo/.git".

sanitizeFilePath is too opinionated to remain in Utility, so moved it.

The changes to mkSafeFilePath are because it used sanitizeFilePath.
In particular:

	isDrive will never succeed, because "c:" gets munged to "c_"
	".." gets sanitized now
	".git" gets sanitized now
	It will never be null, because sanitizeFilePath keeps the length
	the same, and splitDirectories never returns a null path.

Also, on the off chance a web server suggests a filename of "",
ignore that, rather than trying to save to such a filename, which would
fail in some way.
2020-05-08 16:22:55 -04:00
Joey Hess
0040d2c129
sync: Avoid an ugly error message when nothing has been committed to master yet and there is a synced master branch to merge from
Now the warning gets displayed, which is better than an arcane git error.

The warning is still kind of ugly, especially when the pull later in the
sync will clear up what it warns about. But, this is an unusual situation
not likely to happen, and if there is no remote to pull from, the warning
message is needed or the sync will seem to succeed despite not merging the
synced master branch.

Would still be better if it could merge the synced master branch in this
situation, making an empty commit to master to do it seems wrong, and
otherwise it would need a whole separate code path, and would bypass using
git merge in favor of say, setting master to the syned branch. Which would
bypass git configs like arguably merge.ff and certianly
merge.verifySignatures. So don't want to do that.
2020-05-05 14:31:37 -04:00
Joey Hess
9fa940569c
added remote variants
Todo item is done at last.

Might later want to think about testing some other types of remotes that
can be tested locally. The git remote itself is probably already well
enough tested by the test suite that testremote is not needed. Could
test things like bup, or rsync to a local directory. Or even external,
although that would require embedding an external special remote program
into the test suite..
2020-04-30 13:52:03 -04:00
Joey Hess
fc1ae62ef1
added export remote tests 2020-04-30 13:13:08 -04:00
Joey Hess
735d2e90df
testremote in test is working
Not yet testing export, or remote variants, but it already adds several
hundred test cases, so big win.
2020-04-30 12:59:20 -04:00
Joey Hess
d7db481471
wip
This does not compile, and I hit a bad dead end. Wah.
2020-04-29 15:48:39 -04:00
Joey Hess
20f954c3b2
groundwork for adding testremote to git-annex test
Factored out a mkTestTree, which can be used to get a TestTree,
w/o needing to first run any annex actions, which the main test suite
cannot do because it does not operate in an annex repo to start with,
and it needs to start testing before a repo is available.
2020-04-29 13:16:43 -04:00
Joey Hess
fa98025de0
fix testremote to not throw away annex state
aeca7c2207 exposed this problem, but it
was never a good idea to have a series of test cases, some of which depend on
prior ones, and throw away annex state after each.
2020-04-28 17:19:07 -04:00
Joey Hess
19b5137227
addurl --fast error message improvement
addurl: When run with --fast on an url that
annex.security.allowed-ip-addresses prevents accessing, display a more
useful message.

(Also importfeed --fast potentially.)
2020-04-27 13:48:14 -04:00
Joey Hess
c05c4e549e
sync: When some remotes to sync with are specified, and --fast is too, pick the lowest cost of the specified remotes
Do not sync with a faster remote that was not specified.

That old behavior was only documented in the changelog, and was certianly
surprising. It also meant adding --fast made it slower..
2020-04-23 16:08:45 -04:00
Joey Hess
cd1676d604
fix bug involving local git remote and out of date location log
get --from, move --from: When used with a local git remote, these used to
silently skip files that the location log thought were present on the
remote, when the remote actually no longer contained them. Since that
behavior could be surprising, now instead display a warning.

I got very confused when I encountered this behavior, since it was silently
skipping a file I needed that whereis said was on the remote.

get without --from already displayed a "unable to access these remotes"
message, which while a bit misleading in that the remote is likely
accessible, but just doesn't contain the file, at least indicated something
went wrong.

Having get --from display a warning makes it in line with get
w/o --from, so seems certianly ok. It might be there are situations where
move --from is used, on eg a whole directory, and the user only wants to
move whatever is present in the remote, and is perfectly ok with files
that are not present being skipped. So I'm less sure about the new warning
being ok there. OTOH, only local git remotes avoiding displaying a warning
in that case too, so this just brings them into line with other remotes.

(Also note that this makes it a little bit faster when dealing with a lot of
files, since it avoids a redundant stat of the file.)
2020-04-21 12:36:58 -04:00
Joey Hess
529f488ec4
fix a thundering herd problem
Avoid repeatedly opening keys db when accessing a local git remote and -J
is used.

What was happening was that Remote.Git.onLocal created a new annex state
as each thread started up. The way the MVar was used did not prevent that.
And that, in turn, led to repeated opening of the keys db, as well as
probably other extra work or resource use.

Also managed to get rid of Annex.remoteannexstate, and it turned out there
was an unncessary Maybe in the keysdbhandle, since the handle starts out
closed.
2020-04-17 17:09:29 -04:00
Joey Hess
957a87b437
fix absolute filenames fed into --batch and git-annex info 2020-04-15 16:04:05 -04:00
Joey Hess
f85ca7dc80
fix all remaining -Wincomplete-uni-patterns warnings
A couple of these were probably actual bugs in edge cases. Most of the
changes I'm fine with. The fact that aeson's object returns sometihng
that we know will be an Object, but the type checker does not know is
kind of annoying.
2020-04-15 13:55:08 -04:00