Commit graph

4079 commits

Author SHA1 Message Date
Joey Hess
74cf62a51e
comment 2022-05-09 11:00:13 -04:00
yarikoptic
0eeeea318a question/todo about migrating .git/annex/objects 2022-05-06 15:32:56 +00:00
Joey Hess
d1cce869ed
implement dataUnits finally
Added support for "megabit" and related bandwidth units in
annex.stalldetection and everywhere else that git-annex parses data units.

Note that the short form is "Mbit" not "Mb" because that differs from "MB"
only in case, and git-annex parses units case-insensitively. It would be
horrible if two different versions of git-annex parsed the same value
differently, so I don't think "Mb" can be supported.

See comment for bonus sad story from my childhood.

Sponsored-by: Nicholas Golder-Manning
2022-05-05 15:25:11 -04:00
Joey Hess
3d8af64527
close 2022-05-05 12:13:09 -04:00
Joey Hess
1c00731f9e
comment 2022-05-02 14:24:32 -04:00
kdm9
58fcedb598 2022-04-29 07:28:46 +00:00
Ilya_Shlyakhter
97a57a4c2e Added a comment: transitive transfers 2022-04-22 16:59:59 +00:00
Joey Hess
8bb86ee97e
comment 2022-04-19 14:04:14 -04:00
Joey Hess
fa89a52d36
remove moreinfo 2022-04-19 13:17:11 -04:00
Joey Hess
021fafc086
todo 2022-04-13 11:08:33 -04:00
Joey Hess
8f0c7eac09
todo 2022-04-12 23:26:17 -04:00
Ilya_Shlyakhter
6bd23e1725 Added a comment: maxextensionlength 2022-04-08 19:13:38 +00:00
Joey Hess
469801c886
how to implement this? 2022-04-05 14:52:09 -04:00
Joey Hess
77de20c925
todo triage
Tagging todos that seem to have a plan ready as confirmed.

Also closed some old ones for various reasons. Including several that
turn out to be addressed by newer features.

Also opened a new todo about git-annex-config needing a criteria to add
new configs to it.
2022-04-04 15:22:49 -04:00
Joey Hess
f51007d716
comment 2022-04-04 14:27:48 -04:00
Joey Hess
91b2285010
comment 2022-04-04 12:18:05 -04:00
Atemu
53d0f4b0f0 2022-04-01 10:30:48 +00:00
Atemu
9c6dc9db0c Added a comment 2022-03-31 10:39:28 +00:00
Joey Hess
04f13c7d3d
more thoughts
This idea seems fleshed out enough to implement now.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2022-03-30 12:53:42 -04:00
Joey Hess
afced1a8ba
idea 2022-03-30 12:28:25 -04:00
Joey Hess
0605cc1bfb
idea 2022-03-29 18:09:41 -04:00
Joey Hess
bc6d64ec8a
comment 2022-03-29 14:53:07 -04:00
yarikoptic
a03aee0033 Added a comment 2022-03-16 20:32:57 +00:00
Joey Hess
025c18128b
test: Added --jobs option
Default to the number of CPU cores, which seems about optimal
on my laptop. Using one more saves me 2 seconds actually.

Better packing of workers improves speed significantly.

In 2 tests runs, I saw segfaulting workers despite my attempt
to work around that issue. So detect when a worker does, and re-run it.

Removed installSignalHandlers again, because I was seeing an
error "lost signal due to full pipe", which I guess was somehow caused
by using it.

Sponsored-by: Dartmouth College's Datalad project
2022-03-16 14:42:07 -04:00
Joey Hess
8d14ce8f38
parallelize git-annex test for 25% speedup
Note the very weird workaround for what appears to be some kind of tasty
bug, which causes a segfault. This is not new to this modification,
I was seeing a segfault before at least intermittently when limiting
git-annex test -p to only run a single test group.

Also, the path from one test repo to a remote test repo used to be
"../../foo", which somehow broke when moving the test repos from .t to
.t/N. I don't actually quite understand how it used to work, but
"../foo" seems correct and works in the new situation.

Test output from the concurrent processes is not yet serialized.
Should be easy to do using concurrent-output.

More test groups will probably make the speedup larger. It would
probably be best to have a larger number of test groups and divvy them
amoung subprocesses numbered based on the number of CPU cores, perhaps
times 2 or 3.

Sponsored-by: Dartmouth College's Datalad project
2022-03-14 15:24:37 -04:00
ErrGe
7580d787cc Added a comment 2022-03-11 02:26:56 +00:00
ErrGe
4d593d4461 Added a comment 2022-03-11 02:23:20 +00:00
Joey Hess
afeb9b728e
comment 2022-03-10 16:10:46 -04:00
Joey Hess
feaf16141e
comment 2022-03-10 13:22:32 -04:00
Atemu
0e39304905 Added a comment 2022-03-09 12:07:45 +00:00
ErrGe
f706a68c43 2022-03-09 01:08:23 +00:00
Joey Hess
c7f7be0236
comment 2022-03-08 15:50:31 -04:00
Joey Hess
82f1d82286
comment 2022-03-08 15:46:29 -04:00
Joey Hess
4cab0c1b05
comment 2022-03-08 14:51:01 -04:00
Joey Hess
14add55c2b
reopen 2022-03-08 14:08:27 -04:00
Joey Hess
2c5bf952cf
comment 2022-03-08 13:47:06 -04:00
Atemu
c7e3414d8d Added a comment 2022-03-08 13:21:03 +00:00
yarikoptic
875a04e1e2 Added a comment: still slow 2022-03-08 00:28:00 +00:00
Joey Hess
1cbbd23109
comment 2022-03-07 15:25:32 -04:00
tomdhunt
0127e5f4e4 Added a comment 2022-03-07 18:27:38 +00:00
Joey Hess
da698437b6
close 2022-03-07 14:12:11 -04:00
Joey Hess
dab9078ab7
close 2022-03-07 13:23:19 -04:00
mih
fa5a001ef6 Added a comment: Thanks! 2022-03-04 16:23:16 +00:00
Joey Hess
5e385cb637
add 2022-03-02 10:44:49 -04:00
Joey Hess
2fc46e1871
git-annex test from standalone speedup
Avoid git-annex test being very slow when run from within the standalone
linux tarball or OSX app.

It may not really be necessary to add to PATH the directory where the
git-annex binary resides, but it can't hurt. Most places where the test
suite or git-annex run git-annex, they use programPath, so won't need
a modified PATH. But I'm not sure if that's always the case.

Sponsored-by: Dartmouth College's Datalad project
2022-03-01 16:08:55 -04:00
Joey Hess
ecf7c29107
update comment 2022-03-01 15:57:13 -04:00
Joey Hess
316a049e96
comment 2022-03-01 15:50:44 -04:00
yarikoptic
ca5834a18c Added a comment: question about backend 2022-02-28 22:42:34 +00:00
yarikoptic
9e6e53af71 initial report on a very slow git annex test on discovery 2022-02-28 20:50:06 +00:00
Joey Hess
525218ef86
commet 2022-02-28 15:42:33 -04:00
yarikoptic
a33b40876d Added a comment 2022-02-28 18:48:51 +00:00
Joey Hess
7a4a1322f5
update 2022-02-28 13:37:05 -04:00
Joey Hess
20875bd5e8
open related todo 2022-02-28 13:26:43 -04:00
Joey Hess
7de469edd0
comment 2022-02-25 13:32:06 -04:00
yarikoptic
7e9ebea910 Added a comment 2022-02-21 21:53:03 +00:00
Joey Hess
5a8b15f6db
comment 2022-02-21 15:46:12 -04:00
Joey Hess
ce1b3a9699
info: Allow using matching options in more situations
File matching options like --include will be rejected in situations where
there is no filename to match against. (Or where there is a filename but
it's not relative to the cwd, or otherwise seemed too bothersome to match
against.)

The addition of listKeys' was necessary to avoid using more memory in the
common case of "git-annex info". Adding a filterM would have caused the
list to buffer in memory and not stream. This is an ugly hack, but listKeys
had previously run Annex operations inside unafeInterleaveIO (for direct
mode). And matching against a matcher should hopefully not change any Annex
state.

This does allow for eg `git-annex info somefile --include=*.ext`
although why someone would want to do that I don't really know. But it
seems to make sense to allow it.
But, consider: `git-annex info ./somefile --include=somefile`
This does not match, so will not display info about somefile.
If the user really wants to, they can `--include=./somefile`.

Using matching options like --copies or --in=remote seems likely to be
slower than git-annex find with those options, because unlike such
commands, info does not have optimised streaming through the matcher.

Note that `git-annex info remote` is not the same as
`git-annex info --in remote`. The former shows info about all files in
the remote. The latter shows local keys that are also in that remote.
The output should make that clear, but this still seems like a point
where users could get confused.

Sponsored-by: Jochen Bartl on Patreon
2022-02-21 14:46:07 -04:00
Joey Hess
d36de3edf9
comment 2022-02-21 12:49:36 -04:00
Atemu
6ca9f5e18a 2022-02-20 18:03:35 +00:00
yarikoptic
b481ec2738 Added a comment 2022-02-18 21:56:19 +00:00
yarikoptic
9d2e6a60f0 Added a comment 2022-02-18 20:18:04 +00:00
Joey Hess
faf84aa5c2
Avoid git status taking a long time after git-annex unlock of many files.
Implemented by making Git.Queue have a FlushAction, which can accumulate
along with another action on files, and runs only once the other action has
run.

This lets git-annex unlock queue up git update-index actions, without
conflicting with the restagePointerFiles FlushActions.

In a repository with filter-process enabled, git-annex unlock will
often not take any more time than before, though it may when the files are
large. Either way, it should always slow down less than git-annex status
speeds up.

When filter-process is not enabled, git-annex unlock will slow down as much
as git status speeds up.

Sponsored-by: Jochen Bartl on Patreon
2022-02-18 15:06:40 -04:00
Joey Hess
c68f52c6a2
restage pointer file after unlock
This avoids a later git status or similar taking a long time to run
as it runs git-annex smudge once per file. While v9 repositories do
avoid that taking long when the files are small, large files can still
make git status take a very long time.

This does make unlock slower, because now git-annex smudge is being run
once per file unlocked. However, the next commit should speed that up in
many cases.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2022-02-18 14:55:52 -04:00
Joey Hess
07215cfeb5
complete annex.skipunknown transition
annex.skipunknown now defaults to false, so commands like `git annex get foo*`
will not silently skip over files/dirs that are not checked into git.

Sponsored-by: Brock Spratlen on Patreon
2022-02-18 13:18:05 -04:00
Joey Hess
0edf01d7d4
registerurl,unregisterurl: rework output and support --json
* registerurl, unregisterurl: Improved output when reading from stdin
  to be more like other batch commands.
* registerurl, unregisterurl: Added --json and --json-error-messages options.

Note that this did change the --batch output in a way that could possibly
break something that expected the old output to never change. I think it's
acceptable to break that because there has never been a guarantee of
unchanging output format except with --batch for most commands. The old
output was just really weird too!

One possible wart is that "git-annex registerurl" with no options now
seems to just hang, since it's waiting for stdin input. Before, it said
"registerurl (stdin)" which was clearer about what's happenening. But this
is a deprecated mode anyway, --batch makes clear what's happening. If
anything, this problem would be a reason to eventually remove the support
for reading from stdin w/o --batch.

Sponsored-by: Dartmouth College's Datalad project
2022-02-14 13:29:20 -04:00
Joey Hess
291dc0d1a9
comment 2022-02-14 12:42:37 -04:00
yarikoptic
c908046235 initial todo for --json for registerurl 2022-02-09 21:39:46 +00:00
Joey Hess
ad2f0446a0
comment 2022-02-08 13:24:28 -04:00
Atemu
d20550ac69 2022-02-08 10:47:21 +00:00
Joey Hess
47084b8a1d
enable filter.annex.process in v9
This has tradeoffs, but is generally a win, and users who it causes git add to
slow down unacceptably for can just disable it again.

It needed to happen in an upgrade, since there are git-annex versions
that do not support it, and using such an old version with a v8
repository with filter.annex.process set will cause bad behavior.
By enabling it in v9, it's guaranteed that any git-annex version that
can use the repository does support it. Although, this is not a perfect
protection against problems, since an old git-annex version, if it's
used with a v9 repository, will cause git add to try to run
git-annex filter-process, which will fail. But at least, the user is
unlikely to have an old git-annex in path if they are using a v9
repository, since it won't work in that repository.

Sponsored-by: Dartmouth College's Datalad project
2022-01-21 13:11:18 -04:00
Joey Hess
d427afb347
v9-locking branch still wip 2022-01-11 17:04:25 -04:00
Joey Hess
029820c832
v9-locking branch 2022-01-11 14:49:21 -04:00
Joey Hess
8ae88923b8
moreinfo 2022-01-11 12:24:40 -04:00
Joey Hess
c36895e9cb
comment 2022-01-05 13:09:18 -04:00
yarikoptic
ec9a4945e4 Added a comment 2022-01-03 20:03:38 +00:00
yarikoptic
4f31a27e6a initial report on slow drop 2022-01-03 19:59:08 +00:00
Joey Hess
b060d99fe0
comment 2021-12-08 13:18:13 -04:00
yarikoptic
2719170575 initial todo on more flexible credentials management mechanism 2021-12-07 18:34:58 +00:00
Joey Hess
b7976e08f0
comment 2021-12-01 13:03:05 -04:00
adina.wagner@2a4cac6443aada2bd2a329b8a33f4a7b87cc8eff
a5b635af20 Added a comment: A few Windows benchmarks 2021-11-29 22:17:39 +00:00
Joey Hess
05d79b26d8
clarify 2021-11-29 14:00:32 -04:00
Joey Hess
0f9e5ada82
idea 2021-11-21 11:19:47 -04:00
Joey Hess
9121154a75
new todo 2021-11-09 15:52:17 -04:00
Joey Hess
a0758bdd10
dynamically disable filter-process in restagePointerFile when it would be slower
Based on my earlier benchmark, I have a rough cost model for how
expensive it is for git-annex smudge to be run on a file, vs
how expensive it is for a gigabyte of a file's content to be read and
piped through to filter-process.

So, using that cost model, it can decide if using filter-process will
be more or less expensive than running the smudge filter on the files to
be restaged.

It turned out to be *really* annoying to temporarily disable
filter-process. I did find a way, but urk, this is horrible. Notice
that, if it's interrupted with it disabled, it will remain disabled
until the next time restagePointerFile runs. Which could be some time
later. If the user runs `git add` or `git checkout` on a lot of small
files before that, they will see slower than expected performance.

(This commit also deletes where I wrote down the benchmark results
earlier.)

Sponsored-by: Noam Kremen on Patreon
2021-11-08 16:20:34 -04:00
Joey Hess
054c803f8d
benchmarking of filter-process vs smudge/clean
No firm conclusions yet, but it's doing better than I would have
expected.

Sponsored-by: Graham Spencer on Patreon
2021-11-05 13:37:53 -04:00
Joey Hess
099e8fe061
close 2021-11-05 12:46:56 -04:00
Joey Hess
b25a138e22
update for git-annex filter-process 2021-11-04 15:15:26 -04:00
Joey Hess
8dd91be867
mention filter-process as v9 material 2021-11-04 15:05:24 -04:00
Joey Hess
bf1408f7bf
long-running-smudge branch started 2021-11-03 15:44:05 -04:00
Joey Hess
38ba8cca1b
investigation results
Also, close dup bug.
2021-11-02 15:06:20 -04:00
Joey Hess
669037862a
avoid redundant freezeContent call
This opens the potential for the object file to be in place but
git-annex is interrupted before it can freeze it. git-annex fsck already
fixes that situation, which can also occur when lockContentForRemoval
thaws content.

Also improve comment to not be Windows-specific.
2021-10-27 14:18:10 -04:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
10582d1fe3 Updated patch 2021-10-26 19:55:58 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
d607446043 Added a comment 2021-10-26 19:54:53 +00:00
Joey Hess
3aaf6ade30
review 2021-10-26 14:08:56 -04:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
63313e0b40 2021-10-24 19:12:23 +00:00
jkniiv
4471aae22f still think we should highlight this as new info 2021-10-24 15:10:13 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
42eaa6f8c9 Moved WSL1 guide to a tips page 2021-10-22 22:14:40 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
29dfc0a9bf Added a comment 2021-10-22 15:58:11 +00:00
Joey Hess
53f315db5a
comment 2021-10-20 14:12:08 -04:00
Joey Hess
67a67d740b
comment 2021-10-19 12:43:08 -04:00
Joey Hess
81ec8508df
Merge branch 'master' of ssh://git-annex.branchable.com 2021-10-19 12:03:51 -04:00
Joey Hess
3de3f40c11
comment 2021-10-19 10:26:39 -04:00
Atemu
6824a56d09 Added a comment 2021-10-19 13:05:50 +00:00
Atemu
8626f35898 Added a comment 2021-10-19 12:26:33 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
8a20c01775 Added a comment 2021-10-16 15:46:32 +00:00
Lukey
7c40c31210 Added a comment 2021-10-16 15:27:35 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
7ec426734b Note about case sensitivity dirs 2021-10-16 15:25:00 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
8ef65adaca Update WSL1 instructions 2021-10-16 15:16:23 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
4eefcf2c75 2021-10-16 14:45:43 +00:00
Joey Hess
f42679364b
comment 2021-10-15 13:14:19 -04:00
Joey Hess
647fc90b12
comment 2021-10-15 12:43:23 -04:00
Joey Hess
20c375d912
followup 2021-10-14 12:08:54 -04:00
Joey Hess
97dfaabbf0
remove 3 comments that turned out to be about an unrelated problem which got its own bug report 2021-10-14 12:05:07 -04:00
Joey Hess
b117f9338d
open todo 2021-10-12 13:42:08 -04:00
Joey Hess
17a0fa3dbc
negotiate P2P protocol version for tor remotes
This negotiation is not supported by versions of git-annex older
than 6.20180312. Well, maybe really 6.20180227 or so, but using that in
the changelog simplifies things since it was the version for the other
changes as well.

See commit c81768d425 for the back story.

As well as allowing for future protocol improvements, this will result
in negoatiating protocol version 1, which is an improvement over default
version 0.

In fact, it looks like no supported version of git-annex will use
protocol version 0, since version 1 was introduced in 6.20180227.
Still, removing the code for version 0 seems unncessary.
See commit 31e1adc005.

Sponsored-by: Brett Eisenberg on Patreon.
2021-10-11 15:58:51 -04:00
Joey Hess
7bdc7350a5
remove git-annex-shell compat code
* Removed support for accessing git remotes that use versions of
  git-annex older than 6.20180312.
* git-annex-shell: Removed several commands that were only needed to
  support git-annex versions older than 6.20180312.
  (lockcontent, recvkey, sendkey, transferinfo, commit)

The P2P protocol was added in that version, and used ever since, so
this code was only needed for interop with older versions.

"git-annex-shell commit" is used by newer git-annex versions, though
unnecessarily so, because the p2pstdio command makes a single commit at
shutdown. Luckily, it was run with stderr and stdout sent to /dev/null,
and non-zero exit status or other exceptions are caught and ignored. So,
that was able to be removed from git-annex-shell too.

git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by
gcrypt special remotes accessed over ssh, so those had to be kept.
It would probably be possible to convert that to using the P2P protocol,
but it would be another multi-year transition.

Some git-annex-shell fields were able to be removed. I hoped to remove
all of them, and the very concept of them, but unfortunately autoinit
is used by git-annex sync, and gcrypt uses remoteuuid.

The main win here is really in Remote.Git, removing piles of hairy fallback
code.

Sponsored-by: Luke Shumaker
2021-10-11 15:36:51 -04:00
Joey Hess
0c12d01233
update 2021-10-11 10:06:36 -04:00
Joey Hess
1b79f2404d
Merge branch 'master' of ssh://git-annex.branchable.com 2021-10-08 13:27:23 -04:00
Joey Hess
7ae7820ac0
todo 2021-10-08 13:26:40 -04:00
jkniiv
921e736953 Added a comment 2021-10-07 18:27:19 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
5bbbc87e99 Added a comment 2021-10-07 16:27:39 +00:00
jkniiv
04cb121b16 add footnote about the permissions required and Microsoft blog post 2021-10-07 16:25:37 +00:00
jkniiv
5135730467 remove mention of mount option case=dir (turned out to be unnecessary) 2021-10-07 11:40:25 +00:00
jkniiv
d9c2a632f5 Added a comment 2021-10-07 11:36:49 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
ae00d385fe Added a comment 2021-10-07 06:17:30 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
bcf7bd8505 2021-10-07 05:53:23 +00:00
jkniiv
f36272be0c Added a comment: the WSL1 use case 2021-10-07 04:12:48 +00:00
jkniiv
2b06612ed5 rename WSL1 section to highlight date, add wording about being experimental, reword some awkwardness, add further directions 2021-10-07 03:56:37 +00:00
asakurareiko@f3d908c71c009580228b264f63f21c7274df7476
c3afda1699 Add steps for WSL1 2021-10-06 22:35:27 +00:00
spwhitton
3b4e56c760 Added a comment 2021-10-02 17:04:02 +00:00
Joey Hess
9012fa0187
reinject: Fix crash when reinjecting a file from outside the repository
Commit 4bf7940d6b introduced this
problem, but was otherwise doing a good thing. Problem being
that fileRef "/foo" used to return ":./foo", which was actually wrong,
but as long as there was no foo in the local repository, catKey
could operate on it without crashing. After that fix though, fileRef
would return eg "../../foo", resulting in fileRef returning
":./../../foo", which will make git cat-file crash since that's
not a valid path in the repo.

Fix is simply to make fileRef detect paths outside the repo and return
Nothing. Then catKey can be skipped. This needed several bugfixes to
dirContains as well, in previous commits.

In Command.Smudge, this led to needing to check for Nothing. That case
should actually never happen, because the fileoutsiderepo check will
detect it earlier.

Sponsored-by: Brock Spratlen on Patreon
2021-10-01 14:06:34 -04:00
spwhitton
6fbca0bb5b add sign off 2021-09-30 21:19:39 +00:00
spwhitton
c6e1a6a3a1 post bug 2021-09-30 21:15:04 +00:00
Joey Hess
a92427c0d3
comment 2021-09-27 14:14:52 -04:00
Joey Hess
69c1c02339
comment 2021-09-27 13:59:54 -04:00
Joey Hess
64cac1a721
avoid potentially very long bwlimit delay at start
I first saw this getting with -J2 over ssh, but later saw it also
without the -J2. It was resuming, and the calulated unboundDelay was
many minutes. The first update of the meter jumped to some large value,
because of the resuming, and so it thought the BW was super fast.

Avoid by waiting until the second meter update.

Might be a good idea to also guard for the delay being many seconds
and avoid waiting. But how many? If BW is legitimately super fast, and a
remote happens to read more than a 32kb or so chunk at a time, it could
in theory download megabytes or gigabytes of data before the first meter
update. It would actually be appropriate then to delay for a long time,
if the desired BW was low. Could make up some numbers that are sane now,
but tech may improve.

(BTW, pleased to see bwlimit does work with -J. I had worried that
it might not, if the meter update happened in a different thread than
the downloading, but it's done in the same thread.)

Sponsored-by: Brett Eisenberg on Patreon
2021-09-22 19:23:30 -04:00
Joey Hess
1033a81d63
mention tiny wart 2021-09-22 15:29:07 -04:00
Joey Hess
fc9abca231
Merge branch 'bwlimit' 2021-09-22 15:27:28 -04:00
Joey Hess
e8496d62e4
improved bwrate limiting implementation
New method is much better. Avoids unrestrained transfer at the beginning
(except for the first block. Keeps right at or a few kb/s below the
configured limit, with very little varation in the actual reported bandwidth.

Removed the /s part of the config as it's not needed.

Ready to merge.

Sponsored-by: Luke Shumaker on Patreon
2021-09-22 15:27:16 -04:00
Joey Hess
00c452f0db
Merge branch 'master' of ssh://git-annex.branchable.com 2021-09-22 11:15:51 -04:00
Joey Hess
44d3d50785
note 2021-09-22 11:10:55 -04:00
Atemu
f71cffa401 Added a comment 2021-09-22 14:17:59 +00:00
Joey Hess
b76a4453cb
bwlimit branch 2021-09-22 09:54:14 -04:00
Joey Hess
b8130655cc
comment 2021-09-11 17:01:39 -04:00
Joey Hess
4f42292b13
improve url download failure display
* When downloading urls fail, explain which urls failed for which
  reasons.
* web: Avoid displaying a warning when downloading one url failed
  but another url later succeeded.

Some other uses of downloadUrl use urls that are effectively internal use,
and should not all be displayed to the user on failure. Eg, Remote.Git
tries different urls where content could be located depending on how the
remote repo is set up. Exposing those urls to the user would lead to wild
goose chases. So had to parameterize it to control whether it displays urls
or not.

A side effect of this change is that when there are some youtube urls
and some regular urls, it will try regular urls first, even if the
youtube urls are listed first. This seems like an improvement if
anything, but in any case there's no defined order of urls that it's
supposed to use.

Sponsored-by: Dartmouth College's Datalad project
2021-09-01 15:33:38 -04:00
yarikoptic
07aa981379 initial TODO on improving 'get' errors reporting 2021-08-31 14:40:48 +00:00
Joey Hess
0418d54f28
update 2021-08-30 14:34:25 -04:00
Joey Hess
8208daaf17
idea for making more special remotes support importtree
Sponsored-by: Jack Hill on Patreon
2021-08-30 14:27:22 -04:00
Joey Hess
7b1709105a
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-27 12:30:35 -04:00
Joey Hess
3093e8fcc3
thought 2021-08-27 00:59:24 -04:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb
3007e3c177 Added a comment: it turns out I had to file this as a bug 2021-08-27 01:38:29 +00:00
Joey Hess
ab7b5a492c
--batch-keys
New --batch-keys option added to these commands:  get, drop, move, copy, whereis

git-annex-matching-options had to be reworded since some of its options
can be used to match on keys, not only files.

Sponsored-by: Luke Shumaker on Patreon
2021-08-25 14:21:12 -04:00
Joey Hess
8944569988
comment 2021-08-24 11:21:12 -04:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb
20a252a129 Added a comment: git annex sync --no-commit --content takes double the time of git annex get . 2021-08-20 02:05:53 +00:00
Joey Hess
53744e132d
incremental verification for gitlfs and httpalso
And that should be all the special remotes supporting it on linux now,
except for in the odd edge case here and there.

Sponsored-by: Dartmouth College's DANDI project
2021-08-18 15:17:10 -04:00
Joey Hess
f5e09a1dbe
incremental verification for S3
Sponsored-by: Dartmouth College's DANDI project
2021-08-18 15:07:00 -04:00
Joey Hess
d154e7022e
incremental verification for web special remote
Except when configuration makes curl be used. It did not seem worth
trying to tail the file when curl is downloading.

But when an interrupted download is resumed, it does not read the whole
existing file to hash it. Same reason discussed in
commit 7eb3742e4b76d1d7a487c2c53bf25cda4ee5df43; that could take a long
time with no progress being displayed. And also there's an open http
request, which needs to be consumed; taking a long time to hash the file
might cause it to time out.

Also in passing implemented it for git and external special remotes when
downloading from the web. Several others like S3 are within striking
distance now as well.

Sponsored-by: Dartmouth College's DANDI project
2021-08-18 15:02:22 -04:00
Joey Hess
1dca3ba26a
status update 2021-08-18 12:58:27 -04:00
Joey Hess
d67da1f4db
idea 2021-08-18 12:39:03 -04:00
Joey Hess
4bbc6a25fa
comment 2021-08-17 10:28:18 -04:00
Joey Hess
ffa1f6ed30
Merge branch 'master' of ssh://git-annex.branchable.com 2021-08-16 17:30:04 -04:00
Joey Hess
f8463ad52f
status update 2021-08-16 17:29:39 -04:00
Joey Hess
b1622eb932
incremental verify for directory special remote
Added fileRetriever', which will let the remaining special remotes
eventually also support incremental verify.

Sponsored-by: Dartmouth College's DANDI project
2021-08-16 16:51:33 -04:00
Joey Hess
ec82299730
status update
I was wrong about S3 supporting tailVerify.
2021-08-16 15:15:32 -04:00
https://christian.amsuess.com/chrysn
905fef31b3 Added a comment: Another example 2021-08-15 17:42:54 +00:00
Joey Hess
751242b55e
status update 2021-08-13 16:34:18 -04:00
Joey Hess
51d59fb260
comment 2021-08-12 14:49:48 -04:00
Joey Hess
7eb3742e4b
incremental verify for chunked remotes
Simply feed each chunk in turn to the incremental verifier.

When resuming an interrupted retrieve, it does not do incremental
verification. That would need to read the file, up to the resume point,
and feed it to the incremental verifier. That seems easy to get wrong.
Also it would mean extra work done before the transfer can start. Which
would complicate displaying progress, and would perhaps not appear to the
user as if it was resuming from where it left off. Instead, in that
situation, return UnVerified, and let the verification be done in a
separate pass.

Granted, Annex.CopyFile does manage all that, but it's not complicated
by dealing with chunks too.

Sponsored-by: Dartmouth College's DANDI project
2021-08-11 14:42:49 -04:00
Joey Hess
8886ff1cff
done! 2021-08-04 12:40:25 -04:00
Joey Hess
c67b1e31a6
branch 2021-08-03 17:05:50 -04:00
Joey Hess
629e95fd8e
update 2021-08-03 14:03:25 -04:00
Joey Hess
bb56186daa
new todo.. I seem to have cracked a longstanding problem
Sponsored-by: Jochen Bartl on Patreon
2021-08-03 13:51:23 -04:00
Joey Hess
461035c6ec
close
I'm now reasonably sure I've identified both cases where this can
happen. v8 upgrades and certian filesystems eg NFS. Both are handled as
well as can be, though it may involve some extra checksumming work.
2021-07-30 15:22:22 -04:00
Joey Hess
1fc6e6a65f
close this frustrating todo due to lack of followup and/or being fixed 2021-07-26 11:16:58 -04:00
Joey Hess
2ab19be3f6
comment 2021-07-22 13:34:27 -04:00
Joey Hess
39d7479366
comment 2021-07-21 11:06:03 -04:00
neil.okamoto@6ca5b1b6563bc71ce13ec235852d23e6ac86e331
9da4a7d5b2 Added a comment 2021-07-21 07:13:02 +00:00
neil.okamoto@6ca5b1b6563bc71ce13ec235852d23e6ac86e331
24c88e8167 Added a comment 2021-07-20 23:22:29 +00:00
Joey Hess
c9b9a93509
close 2021-07-16 16:28:22 -04:00
Joey Hess
acbf711e21
comment 2021-07-16 15:16:56 -04:00
Joey Hess
aa33c928cb
Merge branch 'master' of ssh://git-annex.branchable.com 2021-07-16 14:02:41 -04:00
Joey Hess
dc4e79c582
I understand this now, marking confirmed 2021-07-16 14:02:12 -04:00
Atemu
a4947c65ec Added a comment 2021-07-16 11:21:43 +00:00
Lukey
3c1f7ee773 Added a comment 2021-07-15 18:40:13 +00:00
Lukey
ca12b01fca Clarify 2021-07-15 18:30:22 +00:00
Joey Hess
b2a7a665b2
comment 2021-07-15 12:19:49 -04:00
Atemu
eb89eeea30 Added a comment 2021-07-14 21:27:42 +00:00
Joey Hess
d6f056eca3
have whereused also check the reflog
Since the stash is part of that, it can also find stashed content.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2021-07-14 16:05:20 -04:00
Joey Hess
980613ec8c
man page for git-annex whereused
And an implementation strategy.

Sponsored-by: Brett Eisenberg on Patreon
2021-07-14 13:43:45 -04:00
Joey Hess
6db4ee6c34
comment 2021-07-14 12:01:34 -04:00
Atemu
fc5c6c6f11 Added a comment 2021-07-13 21:13:12 +00:00
Joey Hess
011341ff1a
comment 2021-07-13 13:20:14 -04:00
Joey Hess
93d8585faf
comment 2021-07-13 13:17:27 -04:00
Joey Hess
ce6a288412
comment 2021-07-13 12:47:56 -04:00
Ilya_Shlyakhter
e371367cb0 Added a comment: (partial) filenames in keys 2021-07-13 15:13:27 +00:00
Ilya_Shlyakhter
f4a281f712 Added a comment 2021-07-13 03:05:45 +00:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb
72b9db6680 Added a comment 2021-07-12 22:48:26 +00:00
Joey Hess
b4c149b05e
comment 2021-07-12 12:05:52 -04:00
Ilya_Shlyakhter
5b45deda15 Added a comment: about maxextensionlength 2021-07-12 15:38:22 +00:00
Ilya_Shlyakhter
ea3fdc43d6 Added a comment 2021-07-12 15:30:25 +00:00
Joey Hess
443f841b78
comment 2021-07-12 11:20:51 -04:00
Joey Hess
bb87ee3375
comment 2021-07-12 11:01:59 -04:00
Joey Hess
0d3da9496f
comment 2021-07-12 10:57:44 -04:00
Ilya_Shlyakhter
e85eed583f Added a comment 2021-07-09 16:04:47 +00:00
https://christian.amsuess.com/chrysn
b4d33f3358 Added a comment: +1 on supportunlocked in git-annex config 2021-07-09 15:26:13 +00:00
https://christian.amsuess.com/chrysn
4a78debbe1 Added a comment: +1 on supportunlocked in git-annex config 2021-07-09 15:25:53 +00:00
jkniiv@b330fc3a602d36a37a67b2a2d99d4bed3bb653cb
a74ea32352 Added a comment 2021-07-06 19:49:07 +00:00
Ilya_Shlyakhter
9dd962aa73 added suggestion: add maxextensionlength to git-annex-config 2021-07-06 15:05:05 +00:00
Ilya_Shlyakhter
92c0080bf4 Added a comment: about keys with extensions 2021-07-06 14:59:24 +00:00
Joey Hess
2a328dca8c
comment 2021-07-06 10:09:03 -04:00
Joey Hess
92ca28c47b
close 2021-07-06 09:46:25 -04:00
Joey Hess
c00a15dde0
close 2021-07-06 09:45:45 -04:00
Ilya_Shlyakhter
97eb79e752 Added a comment: read-only view of repo clone with symlinks as normal files 2021-07-01 17:17:20 +00:00
Joey Hess
5594d00f8c
close 2021-06-28 13:08:57 -04:00
Joey Hess
ef5fb1a853
close 2021-06-28 13:05:26 -04:00
Joey Hess
3a14648142
dropping unused marks as dead
Dropping an object with drop --unused or dropunused will mark it as
dead, preventing fsck --all from complaining about it after it's been
dropped from all repositories.

If another repository still has a copy, it won't be treated as dead
until it's also dropped from there.

The drop has to use --unused, can't be --key or something else, because
this indicates that the user has recently ran git-annex unused. If it
checked the unused log on every drop, bad things would happen when the
unused log was out of date, eg a file used to be unused but then got
re-added. Marking such a file as dead could be confusing. When the user
uses --unused/dropunused, they must consider the unused information to be
up-to-date.

The particular workflow this enables is:

	git annex add foo
	git annex unannex foo
	git annex unused
	git annex drop --unused / dropunused
	git annex fsck --all # no warnings

The docs for git-annex unannex say to use git-annex unused and dropunused,
so the user should be pointed in this direction when they want to undo an
accidental add.

Sponsored-by: Brock Spratlen on Patreon
2021-06-25 15:22:26 -04:00
Joey Hess
0c1df95dd7
comment 2021-06-25 14:15:14 -04:00
Joey Hess
172c7e22e9
comment 2021-06-25 13:52:59 -04:00
Joey Hess
f5595ea063
comment 2021-06-25 12:13:25 -04:00
Joey Hess
e2cc9bf53d
comment 2021-06-25 12:11:05 -04:00
Ilya_Shlyakhter
27df3f8e88 added suggestion re: hardening git-annex against interruptions 2021-06-24 18:16:43 +00:00
Ilya_Shlyakhter
7ad95210d1 Added a comment: resolving merge conflicts 2021-06-24 18:03:40 +00:00
Lukey
fb2e418fa2 Added a comment 2021-06-24 17:43:36 +00:00
Ilya_Shlyakhter
879ac0b713 Added a comment: thanks 2021-06-24 16:40:48 +00:00
Joey Hess
847aa88bcb
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-23 17:31:09 -04:00
Joey Hess
a3a6e64122
comment 2021-06-23 17:30:50 -04:00
Ilya_Shlyakhter
7bff1dc27c Added a comment: recovering from sqlite db corruption 2021-06-23 18:45:47 +00:00
Joey Hess
af64c184f3
comment 2021-06-23 13:19:34 -04:00
Joey Hess
b980d90b05
close 2021-06-23 12:59:00 -04:00
Joey Hess
e701f69ac4
comment and close 2021-06-23 12:53:18 -04:00
Lukey
d2ba1698c9 2021-06-22 18:40:56 +00:00
Joey Hess
4b1b9d7a83
Added annex.freezecontent-command and annex.thawcontent-command configs
Freeze first sets the file perms, and then runs
freezecontent-command. Thaw runs thawcontent-command before
restoring file permissions. This is in case the freeze command
prevents changing file perms, as eg setting a file immutable does.
Also, changing file perms tends to mess up previously set ACLs.

git-annex init's probe for crippled filesystem uses them, so if file perms
don't work, but freezecontent-command manages to prevent write to a file,
it won't treat the filesystem as crippled.

When the the filesystem has been probed as crippled, the hooks are not
used, because there seems to be no point then; git-annex won't be relying
on locking annex objects down. Also, this avoids them being run when the
file perms have not been changed, in case they somehow rely on
git-annex's setting of the file perms in order to work.

Sponsored-by: Dartmouth College's Datalad project
2021-06-21 14:40:52 -04:00
Joey Hess
f23ae9a45b
comment 2021-06-21 13:52:50 -04:00
Joey Hess
ea0835eba6
tag datalad
at yoh's request
2021-06-21 13:23:51 -04:00
Joey Hess
a6a6217322
remove old closed bugs and todo items to speed up wiki updates and reduce size
Remove closed bugs and todos that were last edited or commented before 2020.

Except for ones tagged projects/* since projects like datalad want to keep
around records of old deleted bugs longer.

Command line used:

for f in $(grep -l '|done\]\]' -- ./*.mdwn); do if ! grep -q "projects/" "$f"; then d="$(echo "$f" | sed 's/.mdwn$//')"; if [ -z "$(git log --since=01-01-2020 --pretty=oneline -- "$f")" -a -z "$(git log --since=01-01-2020 --pretty=oneline -- "$d")" ]; then git rm -- "./$f" ; git rm -rf "./$d"; fi; fi; done
for f in $(grep -l '\[\[done\]\]' -- ./*.mdwn); do if ! grep -q "projects/" "$f"; then d="$(echo "$f" | sed 's/.mdwn$//')"; if [ -z "$(git log --since=01-01-2020 --pretty=oneline -- "$f")" -a -z "$(git log --since=01-01-2020 --pretty=oneline -- "$d")" ]; then git rm -- "./$f" ; git rm -rf "./$d"; fi; fi; done
2021-06-21 13:10:13 -04:00
Ilya_Shlyakhter
bfa530e962 added suggestion to allow use of synchronous=OFF with Sqlite 2021-06-16 20:06:39 +00:00
Joey Hess
d2be68907c
drop, move, mirror: when two files have the same content, honor the max numcopies and requiredcopies
Eg, before with a .gitattributes like:

*.2 annex.numcopies=2
*.1 annex.numcopies=1

And foo.1 and foo.2 having the same content and key, git-annex drop foo.1 foo.2
would succeed, leaving just 1 copy, despite foo.2 needing 2 copies.
It dropped foo.1 first and then skipped foo.2 since its content was gone.

Now that the keys database includes locked files, this longstanding wart
can be fixed.

Sponsored-by: Noam Kremen on Patreon
2021-06-15 11:38:44 -04:00
Joey Hess
af9fdf5dba
verify associated files when checking numcopies
Most of this is just refactoring. But, handleDropsFrom
did not verify that associated files from the keys db were still
accurate, and has now been fixed to.

A minor improvement to this would be to avoid calling catKeyFile
twice on the same file, when getting the numcopies and mincopies value,
in the common case where the same file has the highest value for both.
But, it avoids checking every associated file, so it will scale well to
lots of dups already.

Sponsored-by: Kevin Mueller on Patreon
2021-06-15 11:14:52 -04:00
Joey Hess
effc9bf5dd
close 2021-06-15 10:11:14 -04:00
Joey Hess
711252331e
comment 2021-06-14 14:34:22 -04:00
Joey Hess
398f9decd4
comment 2021-06-14 14:32:38 -04:00
Joey Hess
78da00c7a6
Future proof activity log parsing
When the log has an activity that is not known, eg added by a future
version of git-annex, it used to be treated as no activity at all,
which would make git-annex expire think it should expire the repository,
despite it having some kind of recent activity.

Hopefully there will be no reason to add a new activity until enough
time has passed that this commit is in use everywhere.

Sponsored-by: Jake Vosloo on Patreon
2021-06-14 14:18:19 -04:00
Joey Hess
3ac9363c03
comment 2021-06-14 12:42:11 -04:00
Joey Hess
014dc63a55
avoid sometimes expensive operations when annex.supportunlocked = false
This will mostly just avoid a DB lookup, so things get marginally
faster. But in cases where there are many files using the same key, it
can be a more significant speedup.

Added overhead is one MVar lookup per call, which should be small
enough, since this happens after transferring or ingesting a file,
which is always a lot more work than that. It would be nice, though,
to move getGitConfig to AnnexRead, which there is an open todo about.
2021-06-14 12:40:41 -04:00
Joey Hess
6cb9113ff5
comments 2021-06-08 17:38:56 -04:00
Joey Hess
7b6deb1109
display scanning message whenever reconcileStaged has enough files to chew on
Clear visible progress bar first.

Removed showSideActionAfter because it can't be used in reconcileStaged
(import loop). Instead, it counts the number of files it
processes and displays it after it's seen a sufficient to know it's
taking a while.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 12:48:30 -04:00
Joey Hess
13b9a288d3
scanAnnexedFiles in smudge --update
This makes git checkout and git merge hooks do the work to catch up with
changes that they made to the tree. Rather than doing it at some later
point when the user is not thinking about that past operation.

Sponsored-by: Dartmouth College's Datalad project
2021-06-08 11:37:47 -04:00
Joey Hess
2467de4f9b
todo 2021-06-07 16:58:35 -04:00
Joey Hess
0f10f208a7
avoid double work in git-annex init
reconcileStaged was doing a redundant scan to scannAnnexedFiles.

It would probably make sense to move the body of scannAnnexedFiles
into reconcileStaged, the separation does not really serve any purpose.

Sponsored-by: Dartmouth College's Datalad project
2021-06-07 16:50:14 -04:00
Joey Hess
6ceb31a30a
optimise reconcileStaged with git cat-file streaming
Commit 428c91606b made it need to do more
work in situations like switching between very different branches.

Compare with seekFilteredKeys which has a similar optimisation. Might be
possible to factor out the common part from these?

Sponsored-by: Dartmouth College's Datalad project
2021-06-07 15:26:48 -04:00
Joey Hess
570e93abfd
comment 2021-06-07 13:28:36 -04:00
Joey Hess
1c35cacf8e
fix link 2021-06-07 13:06:16 -04:00
Joey Hess
b960ebf1b3
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-07 12:59:21 -04:00
Ilya_Shlyakhter
4d581ad6b4 Added a comment: deferring the keys-to-files scan 2021-06-07 16:11:01 +00:00
Joey Hess
a0bba3afad
comment 2021-06-07 11:49:28 -04:00
Ilya_Shlyakhter
5359f8bc14 added suggestion to match keys by file extension in the key 2021-06-07 15:08:51 +00:00
lucas.gautheron@09f1983993dfb0907d02ba268b3ca672f1dc3eea
b38dc11a37 Added a comment 2021-06-05 10:10:57 +00:00
Ilya_Shlyakhter
d39dfed2a7 Added a comment: "why all these wild ideas are being thrown out there" 2021-06-04 22:15:33 +00:00
Joey Hess
a2c9360905
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-04 16:45:02 -04:00
Joey Hess
8a13bbedd6
--size-limit exit 101
Sponsored-by: Mark Reidenbach on Patreon
2021-06-04 16:43:47 -04:00
Atemu
ee5f30ee6b 2021-06-04 20:26:27 +00:00
Joey Hess
771a122c9e
add --size-limit option
When this option is not used, there should be effectively no added
overhead, thanks to the optimisation in
b3cd0cc6ba.

When an action fails on a file, the size of the file still counts toward
the size limit. This was necessary to support concurrency, but also
generally seems like the right choice.

Most commands that operate on annexed files support the option.
export and import do not, and I don't know if it would make sense for
export to.. Why would you want an incomplete export? sync doesn't, and
while it would be easy to make it support it for transferring files,
it's not clear if dropping files should also take the size limit into
account. Commands like add that don't operate on annexed files don't
support the option either.

Exiting 101 not yet implemented.

Sponsored-by: Denis Dzyubenko on Patreon
2021-06-04 16:16:53 -04:00
Joey Hess
7868dbd5e0
comment 2021-06-04 13:53:24 -04:00
Joey Hess
327033c2e5
comment 2021-06-04 13:36:51 -04:00
Joey Hess
0434674c85
avoid displaying the scanning annexed files message when repo is not large
Avoids users thinking this scan is a big deal, when it's not in the
majority of repos.

showSideActionAfter has some ugly caveats, since it has to display in
the background of another action. I could not see a better way to do it
and it works fine in this particular case. It also doesn't really belong
in Annex.Concurrent, but cannot go in Messages due to an import loop.

Sponsored-by: Dartmouth College's Datalad project
2021-06-04 13:16:48 -04:00
Joey Hess
95cec1bdfe
comment 2021-06-04 13:14:29 -04:00
yarikoptic
b925ea2923 about "scanning for annexed" while in git-annex branch 2021-06-04 15:20:34 +00:00
Atemu
f70251d638 2021-06-03 13:36:20 +00:00
Ilya_Shlyakhter
de12aeb1a4 Added a comment: matching include/exclude based on file extension in the key 2021-06-02 17:02:58 +00:00
Ilya_Shlyakhter
6a2bfad192 Added a comment: keys db optimization 2021-06-02 16:53:03 +00:00
Joey Hess
6f3f972355
Merge branch 'master' of ssh://git-annex.branchable.com 2021-06-01 11:43:36 -04:00
Joey Hess
3155c0d03e
todo 2021-06-01 10:39:48 -04:00
Ilya_Shlyakhter
a7e8a630fb Added a comment: keys-to-paths db 2021-05-31 23:15:21 +00:00
Joey Hess
f00e365f41
comments 2021-05-31 17:54:17 -04:00
Ilya_Shlyakhter
2dac55978c Added a comment: startup scan for files 2021-05-31 20:50:36 +00:00
Joey Hess
8734f17bc5
comment 2021-05-31 15:15:09 -04:00
Joey Hess
988dbce27a
Merge branch 'master' of ssh://git-annex.branchable.com 2021-05-31 15:05:40 -04:00
Joey Hess
eb6f6ff9b8
speed up keys database writes
There seems to be no reason to check the time here. I think it was
inherited from code in Database.Fsck, which does have a reason to commit
every few minutes. Removing that syscall speeds up a git-annex init
in a repo with 100000 annexed files by about 3 seconds.

Sponsored-by: Dartmouth College's Datalad project
2021-05-31 15:01:00 -04:00
Atemu
6da7f26e2a 2021-05-31 18:59:15 +00:00
Atemu
ae129dc317 2021-05-31 18:42:56 +00:00
Joey Hess
0f54e5e0ae
speed up initial scanning for annexed files
Streaming through git this way speeds it up by around 25%. This is
similar to the optimisations of seeking annexed files.

Sponsored-by: Dartmouth College's Datalad project
2021-05-31 14:29:34 -04:00
Joey Hess
759e5a9903
todo 2021-05-31 10:50:22 -04:00
Joey Hess
3b7f28feca
comment 2021-05-31 10:43:59 -04:00
Joey Hess
57a0ef8d90
comment and reject todo 2021-05-27 12:19:35 -04:00
Atemu
0a0889e72e Added a comment 2021-05-26 07:11:20 +00:00
Joey Hess
13a6bfff49
comments 2021-05-25 16:37:32 -04:00
Joey Hess
f5dc06077d
Merge branch 'master' of ssh://git-annex.branchable.com 2021-05-25 13:10:34 -04:00
Joey Hess
b5f5475ed6
New matching options --excludesamecontent and --includesamecontent
The normalisation of filenames turns out to be the tricky part here,
because the associated files coming out of the keys db may look like
"./foo/bar" or "../bar". For the former to match a glob like "foo/*",
it needs to be normalised.

Note that, on windows, normalise "./foo/bar" = "foo\\bar"
which a glob like "foo/*" won't match. So the glob is matched a second
time, on the toInternalGitPath, so allowing the user to provide a glob
with the slashes in either direction. However, this still won't support
some wacky edge cases like the user providing a glob of "foo/bar\\*"

Sponsored-by: Dartmouth College's Datalad project
2021-05-25 13:08:18 -04:00
Lukey
2ccf525b7f Added a comment 2021-05-25 16:48:26 +00:00
Joey Hess
cd73fcc92c
Merge branch 'master' of ssh://git-annex.branchable.com 2021-05-25 11:45:02 -04:00
Joey Hess
483fc4dc6b
Merge branch 'trackassociated' 2021-05-25 11:43:52 -04:00
Joey Hess
e9c95ef890
comments 2021-05-25 11:43:46 -04:00
Atemu
7ed4c4a35c 2021-05-25 14:51:21 +00:00
parhuzamos
54e1ac849a Added a comment 2021-05-24 09:33:50 +00:00
Joey Hess
428c91606b
include locked files in the keys database associated files
Before only unlocked files were included.

The initial scan now scans for locked as well as unlocked files. This
does mean it gets a little bit slower, although I optimised it as well
as I think it can be.

reconcileStaged changed to diff from the current index to the tree of
the previous index. This lets it handle deletions as well, removing
associated files for both locked and unlocked files, which did not
always happen before.

On upgrade, there will be no recorded previous tree, so it will diff
from the empty tree to current index, and so will fully populate the
associated files, as well as removing any stale associated files
that were present due to them not being removed before.

reconcileStaged now does a bit more work. Most of the time, this will
just be due to running more often, after some change is made to the
index, and since there will be few changes since the last time, it will
not be a noticable overhead. What may turn out to be a noticable
slowdown is after changing to a branch, it has to go through the diff
from the previous index to the new one, and if there are lots of
changes, that could take a long time. Also, after adding a lot of files,
or deleting a lot of files, or moving a large subdirectory, etc.

Command.Lock used removeAssociatedFile, but now that's wrong because a
newly locked file still needs to have its associated file tracked.

Command.Rekey used removeAssociatedFile when the file was unlocked.
It could remove it also when it's locked, but it is not really
necessary, because it changes the index, and so the next time git-annex
run and accesses the keys db, reconcileStaged will run and update it.

There are probably several other places that use addAssociatedFile and
don't need to any more for similar reasons. But there's no harm in
keeping them, and it probably is a good idea to, if only to support
mixing this with older versions of git-annex.

However, mixing this and older versions does risk reconcileStaged not
running, if the older version already ran it on a given index state. So
it's not a good idea to mix versions. This problem could be dealt with
by changing the name of the gitAnnexKeysDbIndexCache, but that would
leave the old file dangling, or it would need to keep trying to remove
it.
2021-05-21 16:24:37 -04:00
Joey Hess
1d9bad51d2
plan for these 2021-05-21 13:50:26 -04:00
Joey Hess
b68a40fa88
todo 2021-05-20 11:18:46 -04:00
Joey Hess
64e26287dd
comment 2021-05-19 11:07:02 -04:00
yarikoptic
674f33c139 todo for extra logging when content changed 2021-05-18 14:05:18 +00:00
Joey Hess
c525d18cf7
filter-branch: New command, useful to produce a filtered version of the git-annex branch, eg when splitting a repository 2021-05-17 14:16:46 -04:00