Commit graph

3859 commits

Author SHA1 Message Date
Joey Hess
0605cc1bfb
idea 2022-03-29 18:09:41 -04:00
Joey Hess
bc6d64ec8a
comment 2022-03-29 14:53:07 -04:00
yarikoptic
a03aee0033 Added a comment 2022-03-16 20:32:57 +00:00
Joey Hess
025c18128b
test: Added --jobs option
Default to the number of CPU cores, which seems about optimal
on my laptop. Using one more saves me 2 seconds actually.

Better packing of workers improves speed significantly.

In 2 tests runs, I saw segfaulting workers despite my attempt
to work around that issue. So detect when a worker does, and re-run it.

Removed installSignalHandlers again, because I was seeing an
error "lost signal due to full pipe", which I guess was somehow caused
by using it.

Sponsored-by: Dartmouth College's Datalad project
2022-03-16 14:42:07 -04:00
Joey Hess
8d14ce8f38
parallelize git-annex test for 25% speedup
Note the very weird workaround for what appears to be some kind of tasty
bug, which causes a segfault. This is not new to this modification,
I was seeing a segfault before at least intermittently when limiting
git-annex test -p to only run a single test group.

Also, the path from one test repo to a remote test repo used to be
"../../foo", which somehow broke when moving the test repos from .t to
.t/N. I don't actually quite understand how it used to work, but
"../foo" seems correct and works in the new situation.

Test output from the concurrent processes is not yet serialized.
Should be easy to do using concurrent-output.

More test groups will probably make the speedup larger. It would
probably be best to have a larger number of test groups and divvy them
amoung subprocesses numbered based on the number of CPU cores, perhaps
times 2 or 3.

Sponsored-by: Dartmouth College's Datalad project
2022-03-14 15:24:37 -04:00
ErrGe
7580d787cc Added a comment 2022-03-11 02:26:56 +00:00
ErrGe
4d593d4461 Added a comment 2022-03-11 02:23:20 +00:00
Joey Hess
afeb9b728e
comment 2022-03-10 16:10:46 -04:00
Joey Hess
feaf16141e
comment 2022-03-10 13:22:32 -04:00
Atemu
0e39304905 Added a comment 2022-03-09 12:07:45 +00:00
ErrGe
f706a68c43 2022-03-09 01:08:23 +00:00
Joey Hess
c7f7be0236
comment 2022-03-08 15:50:31 -04:00
Joey Hess
82f1d82286
comment 2022-03-08 15:46:29 -04:00
Joey Hess
4cab0c1b05
comment 2022-03-08 14:51:01 -04:00
Joey Hess
14add55c2b
reopen 2022-03-08 14:08:27 -04:00
Joey Hess
2c5bf952cf
comment 2022-03-08 13:47:06 -04:00
Atemu
c7e3414d8d Added a comment 2022-03-08 13:21:03 +00:00
yarikoptic
875a04e1e2 Added a comment: still slow 2022-03-08 00:28:00 +00:00
Joey Hess
1cbbd23109
comment 2022-03-07 15:25:32 -04:00
tomdhunt
0127e5f4e4 Added a comment 2022-03-07 18:27:38 +00:00
Joey Hess
da698437b6
close 2022-03-07 14:12:11 -04:00
Joey Hess
dab9078ab7
close 2022-03-07 13:23:19 -04:00
mih
fa5a001ef6 Added a comment: Thanks! 2022-03-04 16:23:16 +00:00
Joey Hess
5e385cb637
add 2022-03-02 10:44:49 -04:00
Joey Hess
2fc46e1871
git-annex test from standalone speedup
Avoid git-annex test being very slow when run from within the standalone
linux tarball or OSX app.

It may not really be necessary to add to PATH the directory where the
git-annex binary resides, but it can't hurt. Most places where the test
suite or git-annex run git-annex, they use programPath, so won't need
a modified PATH. But I'm not sure if that's always the case.

Sponsored-by: Dartmouth College's Datalad project
2022-03-01 16:08:55 -04:00
Joey Hess
ecf7c29107
update comment 2022-03-01 15:57:13 -04:00
Joey Hess
316a049e96
comment 2022-03-01 15:50:44 -04:00
yarikoptic
ca5834a18c Added a comment: question about backend 2022-02-28 22:42:34 +00:00
yarikoptic
9e6e53af71 initial report on a very slow git annex test on discovery 2022-02-28 20:50:06 +00:00
Joey Hess
525218ef86
commet 2022-02-28 15:42:33 -04:00
yarikoptic
a33b40876d Added a comment 2022-02-28 18:48:51 +00:00
Joey Hess
7a4a1322f5
update 2022-02-28 13:37:05 -04:00
Joey Hess
20875bd5e8
open related todo 2022-02-28 13:26:43 -04:00
Joey Hess
7de469edd0
comment 2022-02-25 13:32:06 -04:00
yarikoptic
7e9ebea910 Added a comment 2022-02-21 21:53:03 +00:00
Joey Hess
5a8b15f6db
comment 2022-02-21 15:46:12 -04:00
Joey Hess
ce1b3a9699
info: Allow using matching options in more situations
File matching options like --include will be rejected in situations where
there is no filename to match against. (Or where there is a filename but
it's not relative to the cwd, or otherwise seemed too bothersome to match
against.)

The addition of listKeys' was necessary to avoid using more memory in the
common case of "git-annex info". Adding a filterM would have caused the
list to buffer in memory and not stream. This is an ugly hack, but listKeys
had previously run Annex operations inside unafeInterleaveIO (for direct
mode). And matching against a matcher should hopefully not change any Annex
state.

This does allow for eg `git-annex info somefile --include=*.ext`
although why someone would want to do that I don't really know. But it
seems to make sense to allow it.
But, consider: `git-annex info ./somefile --include=somefile`
This does not match, so will not display info about somefile.
If the user really wants to, they can `--include=./somefile`.

Using matching options like --copies or --in=remote seems likely to be
slower than git-annex find with those options, because unlike such
commands, info does not have optimised streaming through the matcher.

Note that `git-annex info remote` is not the same as
`git-annex info --in remote`. The former shows info about all files in
the remote. The latter shows local keys that are also in that remote.
The output should make that clear, but this still seems like a point
where users could get confused.

Sponsored-by: Jochen Bartl on Patreon
2022-02-21 14:46:07 -04:00
Joey Hess
d36de3edf9
comment 2022-02-21 12:49:36 -04:00
Atemu
6ca9f5e18a 2022-02-20 18:03:35 +00:00
yarikoptic
b481ec2738 Added a comment 2022-02-18 21:56:19 +00:00
yarikoptic
9d2e6a60f0 Added a comment 2022-02-18 20:18:04 +00:00
Joey Hess
faf84aa5c2
Avoid git status taking a long time after git-annex unlock of many files.
Implemented by making Git.Queue have a FlushAction, which can accumulate
along with another action on files, and runs only once the other action has
run.

This lets git-annex unlock queue up git update-index actions, without
conflicting with the restagePointerFiles FlushActions.

In a repository with filter-process enabled, git-annex unlock will
often not take any more time than before, though it may when the files are
large. Either way, it should always slow down less than git-annex status
speeds up.

When filter-process is not enabled, git-annex unlock will slow down as much
as git status speeds up.

Sponsored-by: Jochen Bartl on Patreon
2022-02-18 15:06:40 -04:00
Joey Hess
c68f52c6a2
restage pointer file after unlock
This avoids a later git status or similar taking a long time to run
as it runs git-annex smudge once per file. While v9 repositories do
avoid that taking long when the files are small, large files can still
make git status take a very long time.

This does make unlock slower, because now git-annex smudge is being run
once per file unlocked. However, the next commit should speed that up in
many cases.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2022-02-18 14:55:52 -04:00
Joey Hess
07215cfeb5
complete annex.skipunknown transition
annex.skipunknown now defaults to false, so commands like `git annex get foo*`
will not silently skip over files/dirs that are not checked into git.

Sponsored-by: Brock Spratlen on Patreon
2022-02-18 13:18:05 -04:00
Joey Hess
0edf01d7d4
registerurl,unregisterurl: rework output and support --json
* registerurl, unregisterurl: Improved output when reading from stdin
  to be more like other batch commands.
* registerurl, unregisterurl: Added --json and --json-error-messages options.

Note that this did change the --batch output in a way that could possibly
break something that expected the old output to never change. I think it's
acceptable to break that because there has never been a guarantee of
unchanging output format except with --batch for most commands. The old
output was just really weird too!

One possible wart is that "git-annex registerurl" with no options now
seems to just hang, since it's waiting for stdin input. Before, it said
"registerurl (stdin)" which was clearer about what's happenening. But this
is a deprecated mode anyway, --batch makes clear what's happening. If
anything, this problem would be a reason to eventually remove the support
for reading from stdin w/o --batch.

Sponsored-by: Dartmouth College's Datalad project
2022-02-14 13:29:20 -04:00
Joey Hess
291dc0d1a9
comment 2022-02-14 12:42:37 -04:00
yarikoptic
c908046235 initial todo for --json for registerurl 2022-02-09 21:39:46 +00:00
Joey Hess
ad2f0446a0
comment 2022-02-08 13:24:28 -04:00
Atemu
d20550ac69 2022-02-08 10:47:21 +00:00
Joey Hess
47084b8a1d
enable filter.annex.process in v9
This has tradeoffs, but is generally a win, and users who it causes git add to
slow down unacceptably for can just disable it again.

It needed to happen in an upgrade, since there are git-annex versions
that do not support it, and using such an old version with a v8
repository with filter.annex.process set will cause bad behavior.
By enabling it in v9, it's guaranteed that any git-annex version that
can use the repository does support it. Although, this is not a perfect
protection against problems, since an old git-annex version, if it's
used with a v9 repository, will cause git add to try to run
git-annex filter-process, which will fail. But at least, the user is
unlikely to have an old git-annex in path if they are using a v9
repository, since it won't work in that repository.

Sponsored-by: Dartmouth College's Datalad project
2022-01-21 13:11:18 -04:00
Joey Hess
d427afb347
v9-locking branch still wip 2022-01-11 17:04:25 -04:00
Joey Hess
029820c832
v9-locking branch 2022-01-11 14:49:21 -04:00
Joey Hess
8ae88923b8
moreinfo 2022-01-11 12:24:40 -04:00
Joey Hess
c36895e9cb
comment 2022-01-05 13:09:18 -04:00
yarikoptic
ec9a4945e4 Added a comment 2022-01-03 20:03:38 +00:00
yarikoptic
4f31a27e6a initial report on slow drop 2022-01-03 19:59:08 +00:00
Joey Hess
b060d99fe0
comment 2021-12-08 13:18:13 -04:00
yarikoptic
2719170575 initial todo on more flexible credentials management mechanism 2021-12-07 18:34:58 +00:00
Joey Hess
b7976e08f0
comment 2021-12-01 13:03:05 -04:00
adina.wagner@2a4cac6443aada2bd2a329b8a33f4a7b87cc8eff
a5b635af20 Added a comment: A few Windows benchmarks 2021-11-29 22:17:39 +00:00
Joey Hess
05d79b26d8
clarify 2021-11-29 14:00:32 -04:00
Joey Hess
0f9e5ada82
idea 2021-11-21 11:19:47 -04:00
Joey Hess
9121154a75
new todo 2021-11-09 15:52:17 -04:00
Joey Hess
a0758bdd10
dynamically disable filter-process in restagePointerFile when it would be slower
Based on my earlier benchmark, I have a rough cost model for how
expensive it is for git-annex smudge to be run on a file, vs
how expensive it is for a gigabyte of a file's content to be read and
piped through to filter-process.

So, using that cost model, it can decide if using filter-process will
be more or less expensive than running the smudge filter on the files to
be restaged.

It turned out to be *really* annoying to temporarily disable
filter-process. I did find a way, but urk, this is horrible. Notice
that, if it's interrupted with it disabled, it will remain disabled
until the next time restagePointerFile runs. Which could be some time
later. If the user runs `git add` or `git checkout` on a lot of small
files before that, they will see slower than expected performance.

(This commit also deletes where I wrote down the benchmark results
earlier.)

Sponsored-by: Noam Kremen on Patreon
2021-11-08 16:20:34 -04:00
Joey Hess
054c803f8d
benchmarking of filter-process vs smudge/clean
No firm conclusions yet, but it's doing better than I would have
expected.

Sponsored-by: Graham Spencer on Patreon
2021-11-05 13:37:53 -04:00
Joey Hess
099e8fe061
close 2021-11-05 12:46:56 -04:00
Joey Hess
b25a138e22
update for git-annex filter-process 2021-11-04 15:15:26 -04:00
Joey Hess
8dd91be867
mention filter-process as v9 material 2021-11-04 15:05:24 -04:00
Joey Hess
bf1408f7bf
long-running-smudge branch started 2021-11-03 15:44:05 -04:00
Joey Hess
38ba8cca1b
investigation results
Also, close dup bug.
2021-11-02 15:06:20 -04:00
Joey Hess
669037862a
avoid redundant freezeContent call
This opens the potential for the object file to be in place but
git-annex is interrupted before it can freeze it. git-annex fsck already
fixes that situation, which can also occur when lockContentForRemoval
thaws content.

Also improve comment to not be Windows-specific.
2021-10-27 14:18:10 -04:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
10582d1fe3 Updated patch 2021-10-26 19:55:58 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
d607446043 Added a comment 2021-10-26 19:54:53 +00:00
Joey Hess
3aaf6ade30
review 2021-10-26 14:08:56 -04:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
63313e0b40 2021-10-24 19:12:23 +00:00
jkniiv
4471aae22f still think we should highlight this as new info 2021-10-24 15:10:13 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
42eaa6f8c9 Moved WSL1 guide to a tips page 2021-10-22 22:14:40 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
29dfc0a9bf Added a comment 2021-10-22 15:58:11 +00:00
Joey Hess
53f315db5a
comment 2021-10-20 14:12:08 -04:00
Joey Hess
67a67d740b
comment 2021-10-19 12:43:08 -04:00
Joey Hess
81ec8508df
Merge branch 'master' of ssh://git-annex.branchable.com 2021-10-19 12:03:51 -04:00
Joey Hess
3de3f40c11
comment 2021-10-19 10:26:39 -04:00
Atemu
6824a56d09 Added a comment 2021-10-19 13:05:50 +00:00
Atemu
8626f35898 Added a comment 2021-10-19 12:26:33 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
8a20c01775 Added a comment 2021-10-16 15:46:32 +00:00
Lukey
7c40c31210 Added a comment 2021-10-16 15:27:35 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
7ec426734b Note about case sensitivity dirs 2021-10-16 15:25:00 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
8ef65adaca Update WSL1 instructions 2021-10-16 15:16:23 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
4eefcf2c75 2021-10-16 14:45:43 +00:00
Joey Hess
f42679364b
comment 2021-10-15 13:14:19 -04:00
Joey Hess
647fc90b12
comment 2021-10-15 12:43:23 -04:00
Joey Hess
20c375d912
followup 2021-10-14 12:08:54 -04:00
Joey Hess
97dfaabbf0
remove 3 comments that turned out to be about an unrelated problem which got its own bug report 2021-10-14 12:05:07 -04:00
Joey Hess
b117f9338d
open todo 2021-10-12 13:42:08 -04:00
Joey Hess
17a0fa3dbc
negotiate P2P protocol version for tor remotes
This negotiation is not supported by versions of git-annex older
than 6.20180312. Well, maybe really 6.20180227 or so, but using that in
the changelog simplifies things since it was the version for the other
changes as well.

See commit c81768d425 for the back story.

As well as allowing for future protocol improvements, this will result
in negoatiating protocol version 1, which is an improvement over default
version 0.

In fact, it looks like no supported version of git-annex will use
protocol version 0, since version 1 was introduced in 6.20180227.
Still, removing the code for version 0 seems unncessary.
See commit 31e1adc005.

Sponsored-by: Brett Eisenberg on Patreon.
2021-10-11 15:58:51 -04:00
Joey Hess
7bdc7350a5
remove git-annex-shell compat code
* Removed support for accessing git remotes that use versions of
  git-annex older than 6.20180312.
* git-annex-shell: Removed several commands that were only needed to
  support git-annex versions older than 6.20180312.
  (lockcontent, recvkey, sendkey, transferinfo, commit)

The P2P protocol was added in that version, and used ever since, so
this code was only needed for interop with older versions.

"git-annex-shell commit" is used by newer git-annex versions, though
unnecessarily so, because the p2pstdio command makes a single commit at
shutdown. Luckily, it was run with stderr and stdout sent to /dev/null,
and non-zero exit status or other exceptions are caught and ignored. So,
that was able to be removed from git-annex-shell too.

git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by
gcrypt special remotes accessed over ssh, so those had to be kept.
It would probably be possible to convert that to using the P2P protocol,
but it would be another multi-year transition.

Some git-annex-shell fields were able to be removed. I hoped to remove
all of them, and the very concept of them, but unfortunately autoinit
is used by git-annex sync, and gcrypt uses remoteuuid.

The main win here is really in Remote.Git, removing piles of hairy fallback
code.

Sponsored-by: Luke Shumaker
2021-10-11 15:36:51 -04:00
Joey Hess
0c12d01233
update 2021-10-11 10:06:36 -04:00
Joey Hess
1b79f2404d
Merge branch 'master' of ssh://git-annex.branchable.com 2021-10-08 13:27:23 -04:00
Joey Hess
7ae7820ac0
todo 2021-10-08 13:26:40 -04:00
jkniiv
921e736953 Added a comment 2021-10-07 18:27:19 +00:00