Commit graph

2408 commits

Author SHA1 Message Date
Joey Hess
18f1fb5841
drop performance improvements
Sped up seeking files to drop by 2x, and also some performance
improvements to checking numcopies.

Interestingly, the seek speedup is not due to precaching, but I think is
due to calling getParsed earlier.

Annex.Drop had to be changed to check inAnnex there, since it was removed
from Command.Drop. All other users of Command.Drop already checked inAnnex
themselves.

This commit was sponsored by Ryan Newton on Patreon.
2020-07-24 13:27:46 -04:00
Joey Hess
a01aa214be
enable location log precaching for mirror
It will be some perf increase, but the command is not much used so I
have not bothered to benchmark it.
2020-07-24 13:19:24 -04:00
Joey Hess
d732ef1a89
move, copy: Sped up seeking for annexed files to operate on by a factor of nearly 2x. 2020-07-24 12:56:02 -04:00
Joey Hess
00865cdae8
Fix a bug in find --branch in the previous version
inAnnex check was lost for that code path. To avoid more such mistakes,
made withKeyOptions check it when the AnnexedFileSeeker specifies.
2020-07-24 12:05:28 -04:00
Joey Hess
2d771a7d32
add back inAnnex check for keys options
Lost in recent commit.
2020-07-24 11:49:15 -04:00
Joey Hess
4685612f43
small git-annex get speedup
Remove an redundant inAnnex check. The checkContentPresent handles that,
and after the last commit also does in batch mode.
2020-07-22 14:29:30 -04:00
Joey Hess
1be92381ec
unify batch mode with non-batch by using AnnexedFileSeeker 2020-07-22 14:23:28 -04:00
Joey Hess
abd56fb019
Fix a bug in find --batch in the previous version. 2020-07-20 19:50:53 -04:00
Joey Hess
c4cc2cdf4c
rename getKey to genKey
for consistency with external backend protocol
2020-07-20 14:06:05 -04:00
Joey Hess
172743728e
move cryptographicallySecure into Backend type
This is groundwork for external backends, but also makes sense to keep
this information with the rest of a Backend's implementation.

Also, removed isVerifiable. I noticed that the same information is
encoded by whether a Backend implements verifyKeyContent or not.
2020-07-20 12:17:42 -04:00
Joey Hess
a7156b875c
fix fsck reversion
75aab72d23 made fsck skip files whose
content is not present, but it should complain if there are not enough
copies.
2020-07-15 11:21:43 -04:00
Joey Hess
9c23f99d45
add back missing check that content is present
Lost in 75aab72d23 and some related
commits. unannex skips files whose content is not present.
2020-07-15 11:15:28 -04:00
Joey Hess
377866d884
remove unused import 2020-07-14 14:37:40 -04:00
Joey Hess
7b2d236556
importfeed: stream metadata for 5% speedup
On top of the 10% speedup from streaming url logs.
2020-07-14 14:35:26 -04:00
Joey Hess
75aab72d23
mostly done with location log precaching
Some nice wins.
2020-07-13 17:04:02 -04:00
Joey Hess
df58609804
convert sync to use seekFilteredKeys
This only speeds up sync --content from 34.75 to 33.17 seconds;
location log precaching will probably be a bigger win.
2020-07-13 15:02:52 -04:00
Joey Hess
88a7fb5cbb
convert all applicable commands to new 2x faster annexed file seeking
This removes all calls to inAnnex, except for some involving --batch.
It may be that the batch code could get a similar speedup, but I don't
know if people habitually pass a huge number of files through --batch
that git-annex does not need to do anything to process, so I skipped it
for now.

A few calls to ifAnnexed remain, and might be worth doing more to
convert. In particular, Command.Sync has one that would probably speed
it up by a good amount.

(also removed some dead code from Command.Lock)
2020-07-10 15:45:38 -04:00
Joey Hess
7a42a47902
renaming 2020-07-10 14:17:35 -04:00
Joey Hess
4c9ad1de46
optimisation: stream keys through git cat-file --buffer
This is only implemented for git-annex get so far. It makes git-annex
get nearly twice as fast in a repo with 10k files, all of them present!

But, see the TODO for some caveats.
2020-07-10 13:54:52 -04:00
Joey Hess
e72ec8b9b2
add back git-annex branch read cache
The cache was removed way back in 2012,
commit 3417c55189

Then I forgot I had removed it! I remember clearly multiple times when I
thought, "this reads the same data twice, but the cache will avoid that
being very expensive".

The reason it was removed was it messed up the assistant noticing when
other processes made changes. That same kind of problem has recently
been addressed when adding the optimisation to avoid reading the journal
unnecessarily.

Indeed, enableInteractiveJournalAccess is run in just the
right places, so can just piggyback on it to know when it's not safe
to use the cache.
2020-07-06 12:22:33 -04:00
Joey Hess
85506a7015
import: Added --no-content option, which avoids downloading files from a special remote
Only supported by some special remotes: directory
I need to check the rest and they're currently missing methods until I do.

git-annex sync --no-content does not yet use this to do imports
2020-07-03 13:41:57 -04:00
Joey Hess
4229713e63
importfeed: Added some additional --template variables for date and time
This commit was sponsored by Ethan Aubin.
2020-06-24 14:24:50 -04:00
Joey Hess
7757c0e900
Honor annex.largefiles when importing a tree from a special remote.
This commit was sponsored by Martin D on Patreon.
2020-06-23 16:07:18 -04:00
Joey Hess
5098236c6b
testremote: Fix over-allocation of resources and bad caching
Including starting up a large number of external special remote processes.
(Regression introduced in version 8.20200501)
2020-06-22 14:25:49 -04:00
Joey Hess
aa1ad0b7ca
remove redundant imports
Clean build under ghc 8.8.3, which seems to do better at finding cases
where two imports both provide the same symbol, and warns about one of
them.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-06-22 11:05:34 -04:00
Joey Hess
d5451afc8f
fix deadlock
Fix a deadlock that could occur after git-annex got an unlocked file,
causing the command to hang indefinitely.

Known to happen on vfat filesystems, possibly others.

Note that a deadlock is still theoretically possible, if anything
smudge --clean does causes it to run the git queue for some other
reason.

Apparently that doesn't happen, but will need to keep an eye on it.
2020-06-18 12:56:29 -04:00
Joey Hess
96f6aa39dd
add runsGitAnnexChildProcess calls
This is all the calls to git-annex that seem capable of possibly locking
the same pidlock as their parent. Except possibly for some in the
assistant.
2020-06-17 15:31:03 -04:00
Joey Hess
c4f2c56f5e
checkpresentkey: fix behavior to match documentation
checkpresentkey: When no remote is specified, try all remotes, not only
ones that the location log says contain the key. This is what the
documentation has always said it did.

Still try the logged remotes first, because they are far more likely to
have the key.
2020-06-16 13:54:26 -04:00
Joey Hess
2670890b17
convert to withCreateProcess for async exception safety
This handles all createProcessSuccess callers, and aside from process
pools, the complete conversion of all process running to async exception
safety should be complete now.

Also, was able to remove from Utility.Process the old API that I now
know was not a good idea. And proof it was bad: The code size went *down*,
despite there being a fair bit of boilerplate for some future API to
reduce.
2020-06-04 15:45:52 -04:00
Joey Hess
92f775eba0
convert to withCreateProcess for async exception safety
Not yet 100% done, so far I've grepped for waitForProcess and converted
everything that uses that to start the process with withCreateProcess.

Except for some things like P2P.IO and Assistant.TransferrerPool,
and Utility.CoProcess, that manage a pool of processes. See #2
in https://git-annex.branchable.com/todo/more_extensive_retries_to_mask_transient_failures/#comment-209f8a8c38e63fb3a704e1282cb269c7
for how those will need to be dealt with.

checkSuccessProcess, ignoreFailureProcess, and forceSuccessProcess calls waitForProcess, so
callers of them will also need to be dealt with, and have not been yet.
2020-06-03 15:48:09 -04:00
Joey Hess
89b2542d3c
annex.skipunknown with transition plan
Added annex.skipunknown git config, that can be set to false to change the
behavior of commands like `git annex get foo*`, to not skip over files/dirs
that are not checked into git and are explicitly listed in the command
line.

Significant complexity was needed to handle git-annex add, which uses some
git ls-files calls, but needs to not use --error-unmatch because of course
the files are not known to git.

annex.skipunknown is planned to change to default to false in a
git-annex release in early 2022. There's a todo for that.
2020-05-28 15:55:17 -04:00
Joey Hess
484a74f073
auto-init autoenable=yes
Try to enable special remotes configured with autoenable=yes when git-annex
auto-initialization happens in a new clone of an existing repo. Previously,
git-annex init had to be explicitly run to enable them. That was a bit of a
wart of a special case for users to need to keep in mind.

Special remotes cannot display anything when autoenabled this way, to avoid
interfering with the output of git-annex query commands.

Any error messages will be hidden, and if it fails, nothing is displayed.
The user will realize the remote isn't enable when they try to use it,
and can run git-annex init manually then to try the autoenable again and
see what failed.

That seems like a reasonable approach, and it's less complicated than
communicating something across a pipe in order to display it as a side
message. Other reason not to do that is that, if the first command the
user runs is one like git-annex find that has machine readable output,
any message about autoenable failing would need to not be displayed anyway.
So better to not display a failure message ever, for consistency.

(Had to split out Remote.List.Util to avoid an import cycle.)
2020-05-27 12:40:35 -04:00
Joey Hess
3824645368
change to new waitForAllRunningCommandActions
waitForAllRunningCommandActions is a subset of finishCommandActions
and more appropriate for what is being done here: Just a concurrency
barrier.
2020-05-26 14:00:51 -04:00
Joey Hess
864ba4ecaa
disable buggy concurrency in Command.Export
Fix a crash or potentially not all files being exported when sync -J
--content is used with an export remote.

Crash as described in fixed bug report.

waitForAllRunningCommandActions inserted in several points where all the
commandActions started before need to have finished before moving on to
the next stage of the export. A race across those points could have
maybe resulted in not all files being exported, or a wrong tree being
export.

For example, changeExport starting up an action like
a rename of A to B. Then, with that action still running, fillExport
uploading a new A, *before* the rename occurred. That race seems
unlikely to have happened. There are some other ones that this also
fixes.
2020-05-26 13:54:08 -04:00
Joey Hess
e04a931439
improve transfer stages for some commands
move --to, copy --to, mirror --to: When concurrency is enabled, run cleanup
actions in separate job pool from uploads.

transferStages was confusingly named, it's only useful when doing downloads
as then the verify actions can be run concurrently with other downloads.
For commands that upload, there will be more concurrency from running
cleanup actions in a separate job pool.

As for sync, I left it using downloadStages although that's not optimal
for the part of a sync that uploads. Perhaps it should use the union of
both?
2020-05-26 11:55:50 -04:00
Joey Hess
0d82a88742
drop: use commandStages, not transferStages
I cannot find any rationalle for why this was changed before.
drop certianly does not do any transfers, so commandStages will perform
better.
2020-05-26 11:47:54 -04:00
Joey Hess
0bcecb67f5
export: Let concurrent transfers be done with -J or annex.jobs
Tested working, although I did find this bug in my testing, which also
afflicts sync -J to an export remote.
2020-05-26 11:44:07 -04:00
Joey Hess
f7fe71602c
import: Added --json-progress
Already supported --json, but not that.

Also checked all other commands that only support --json, and the only
other one that does transfers is fsck (--from), which it did not seem worth
adding --json-progress to really.
2020-05-26 11:27:47 -04:00
Joey Hess
5b8524e1e6
addurl: Make --preserve-filename also apply when eg a torrent contains multiple files
Forgot to remove sanitizeFilePath after adding sanitizeOrPreserveFilePath
here.
2020-05-26 10:45:57 -04:00
Joey Hess
fc9833f68d
export: Added options for json output
Just worked, no need to do anything except add the options.
2020-05-26 10:31:10 -04:00
Joey Hess
d7c7245438
whereis: Added --format option.
One way this can be used is to remove all urls for some website that went
away:

git-annex whereis --format '${file} ${url}\0' | \
	grep -z whatever.com | git-annex rmurl --batch -z

Combining ${url} and ${uuid} is a bit of a combinatorial explosion.
It didn't seem worth only outputting a uuid alongside an url belonging
to it, so each uuid is output beside each url.
2020-05-19 16:20:56 -04:00
Joey Hess
6361074174
convert renameExport to throw exception
Finishes the transition to make remote methods throw exceptions, rather
than silently hide them.

A bit on the fence about this one, because when renameExport fails,
it falls back to deleting instead, and so does the user care why it failed?

However, it did let me clean up several places in the code.

This commit was sponsored by Ethan Aubin.
2020-05-15 15:08:09 -04:00
Joey Hess
037440ef36
convert removeExportDirectory to throw exception
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 14:43:18 -04:00
Joey Hess
cdbfaae706
change removeExport to throw exception
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

This commit was sponsored by Graham Spencer on Patreon.
2020-05-15 14:15:14 -04:00
Joey Hess
3334d3831b
change retrieveExport and getKey to throw exception
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 13:45:53 -04:00
Joey Hess
4814b444dd
make storeExport throw exceptions 2020-05-15 12:20:02 -04:00
Joey Hess
dc7dc1e179
refactor 2020-05-14 14:21:58 -04:00
Joey Hess
4be94c67c7
make removeKey throw exceptions 2020-05-14 14:11:05 -04:00
Joey Hess
d9c7f81ba4
make retrieveKeyFile and retrieveKeyFileCheap throw exceptions
Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a
exception when a remote doesn't support it.
2020-05-13 17:07:07 -04:00
Joey Hess
c1cd402081
make storeKey throw exceptions
When storing content on remote fails, always display a reason why.

Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
2020-05-13 14:03:00 -04:00