Used to fail with a bad error message, indicating there was no
repository with the specified name, or something like that. Now, suggest
they use the uuid to disambiguate.
* info, enableremotemote, renameremote: Avoid a confusing message when more
than one repository matches the user provided name.
* info: Exit nonzero when the input is not supported.
Sponsored-by: Kevin Mueller on Patreon
A benchmark in my sound repository with `git-annex view feedtitle=*`
took 2:52 wall clock time before and 1:58 after. Though it still only used
130% of CPU.
This is the same kind of optimisation that is in seekFilteredKeys, though
that precaches location logs while this streams the metadata logs direct
to parsing them.
seekFilteredKeys contains more streaming, to find the annexed files, and
this could be further sped up with similar streaming.
Sponsored-by: Nicholas Golder-Manning on Patreon
Based on https://github.com/golang/go/discussions/58409, the Go compiler
already defaults to using a google proxy server, which would allow
Google to collect information about what dependencies users are
installing. (Of course they claim they won't.) Two separate environment
settings are needed to turn that off, and users in that thread were
surprised to learn about one of them.
So this warning is already appropriate to some extent.
Also based on the minimisation of user concerns by the golang developers
on that issue and elsewhere, it seems best to assume that they are not
going to be dissuaded from increasing data collection efforts in the future,
even if the blowback prevents this particular attempt.
So this warning should not be removed unless the Go community somehow
extricates itself from Google's control. Or unless ipfs is rewritten in
another language.
Some distros do have ipfs. Unfortunately, Debian appears to be structurally
incapable of packaging it. (8 years and counting;
https://bugs.debian.org/779893). So lots of users will be stuck
installing it from source or having to trust its official binaries.
It apparently displays a notice on first use, but I was surprised to
learn about this behavior today, and I've used it. Displaying a notice
does not make violating users' privacy acceptable.
sync: Fix a bug that caused files to be removed from an importtree=yes
exporttree=yes special remote when the remote's annex-tracking-branch was
not the currently checked out branch.
Sponsored-by: Max Thoursie on Patreon
* sync: When run in a view branch, refresh the view branch to reflect any
changes that have been made to the parent branch or metadata.
This is basically working, but probably needs some more work to deal with
all the edge cases of things sync does.
Sponsored-by: Lawrence Brogan on Patreon
* view: New field?=glob and ?tag syntax that includes a directory "_"
in the view for files that do not have the specified metadata set.
* Added annex.viewunsetdirectory git config to change the name of the
"_" directory in a view.
When in a view using the new syntax, old git-annex will fail to parse the
view log. It errors with "Not in a view.", which is not ideal. But that
only affects view commands.
annex.viewunsetdirectory is included in the View for a couple of reasons.
One is to avoid needing to warn the user that it should not be changed when
in a view, since that would confuse git-annex. Another reason is that it
helped with plumbing the value through to some pure functions.
annex.viewunsetdirectory is actually mangled the same as any other view
directory. So if it's configured to something like "N/A", there won't be
multiple levels of directories, which would also confuse git-annex.
Sponsored-By: Jack Hill on Patreon
S3: Support a region= configuration useful for some non-Amazon S3
implementations. This feature needs git-annex to be built with aws-0.24.
datacenter= sets both the AWS hostname and region in one setting, which is
easy when using AWS, but not useful for other hosts. So kept datacenter
as-is, but added this additional config.
Sponsored-By: Brett Eisenberg on Patreon
Use separate stages for download and upload. In the common case where
it downloads the file from one remote and then uploads to the other,
those are by far the most expensive operations, and there's a decent
chance the two remotes bottleneck on different resources.
Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads
will be started both downloading from the src remote. They will probably
finish at the same time. Then two threads will be started uploading to
the dst remote. They will probably take the same time as well. Before
this change, it would alternate back and forth, bottlenecking on src and dst.
With this change, as soon as the two threads start uploading to dst, two
more threads are able to start, downloading from src. So bandwidth to
both remotes is saturated more often.
Other commands that use transferStages only send in one direction at a
time. So the worker threads for the other direction will sit idle, and
there will be no change in their behavior.
Sponsored-by: Dartmouth College's DANDI project
Lock the local content for drop after getting it from src, to prevent another
process from using the local content as a copy and dropping it from src,
which would prevent dropping the local content after sending it to dest.
Support resuming an interrupted move that downloaded the content from
src, leaving the local content populated. In this case, the location log
has not been updated to say the content is present locally, so we can
assume that it's resuming and go ahead and drop the local content after
sending it to dest.
Note that if a `git-annex get` is being ran at the same time as a
`git-annex move --from --to`, it may get a file just before the move
processes it. So the location log has not been updated yet, and the move
thinks it's resuming. Resulting in local copy being dropped after it's
sent to the dest. This race is something we'll just have to live with,
it seems.
I also gave up on the idea of checking if the location log had been updated
by a `git-annex get` that is ran at the same time. That wouldn't work, because
the location log is precached in the seek stage, so reading it again after
sending the content to dest would not notice changes made to it, unless the cache
were invalidated, which would slow it down a lot. That idea anyway was subject
to races where it would not detect the concurrent `git-annex get`.
So concurrent `git-annex get` will have results that may be surprising.
To make that less surprising, updated the documentation of this feature to
be explicit that it downloads content to the local repository
temporarily.
Sponsored-by: Dartmouth College's DANDI project
Allowing --from and --to as an alternative to --from or --to
is hard to do with optparse-applicative!
The obvious approach of (pfrom <|> pto <|> pfromandto) does not work
when pfromandto uses the same option names as pfrom and pto do.
It compiles but the generated parser does not work for all desired
combinations.
Instead, have to parse optionally from and optionally to. When neither
is provided, the parser succeeds, but it's a result that can't be
handled. So, have to giveup after option parsing. There does not seem to
be a way to make an optparse-applicative Parser give up internally
either.
Also, need seek' because I first tried making fto be a where binding,
but that resulted in a hang when git-annex move was run without --from
or --to. I think because startConcurrency was not expecting the stages
value to contain an exception and so ended up blocking.
Sponsored-by: Dartmouth College's DANDI project
I've long been asked for `git-annex find --all` or something like that,
but pushed back on it because I feel that the command is analagous to
find(1) and so it would be surprising for it to list keys rather than
files. So instead, add a new findkeys subcommand.
Note that the use of withKeyOptions is rather strange because usually
that is used to fall back to --all rather than listing files, but here
it's made to default to --all like behavior and never list files.
A performance thing that could be improved is that withKeyOptions
always reads and caches location logs. But findkeys with no options does
not need them, so it could be made faster. That caching does speed up
options like --in though. This is really just a subset of a more general
performance thing that --all reads location logs sometimes unncessarily.
Anyway, it needs to read the location log in order to checkDead,
and it seems good that findkeys does skip dead keys.
Also, cleaned up comments on git-annex-find man page asking for --all
option.
Sponsored-by: Dartmouth College's DANDI project
Note that when this is specified and an older git-annex is used to
enableremote such a special remote, it will simply ignore the cost= field
and use whatever the default cost is.
In passing, fixed adb to support the remote.name.cost and
remote.name.cost-command configs.
Sponsored-by: Dartmouth College's DANDI project
When a web special remote does not have urlinclude/urlexclude
configured, make it respect the configuration of other web special
remotes and avoid using urls that match the config of another.
Note that the other web special remote does not have to be enabled.
That seems ok, it would have been extra work to check for only ones that
are enabled.
The implementation does mean that the web special remote re-parses
its own config once at startup, as well as re-parsing the configs of any
other web special remotes. This should be a very small slowdown
unless there are lots of web special remotes.
Sponsored-by: Dartmouth College's DANDI project
Allow initremote of additional special remotes with type=web, in addition
to the default web special remote.
When --sameas=web is used, these provide additional names for the web
special remote, and may also have their own additional configuration
(once there is any for the web special remote) and cost.
Sponsored-by: Dartmouth College's DANDI project
Remove closed bugs and todos that were last edited or commented before 2022.
Except for ones tagged projects/* since projects like datalad want to keep
around records of old deleted bugs longer.
Command line used:
for f in $(grep -l '|done\]\]' -- ./*.mdwn); do if ! grep -q "projects/" "$f"; then d="$(echo "$f" | sed 's/.mdwn$//')"; if [ -z "$(git log --since=01-01-2022 --pretty=oneline -- "$f")" -a -z "$(git log --since=01-01-2022 --pretty=oneline -- "$d")" ]; then git rm -- "./$f" ; git rm -rf "./$d"; fi; fi; done
for f in $(grep -l '\[\[done\]\]' -- ./*.mdwn); do if ! grep -q "projects/" "$f"; then d="$(echo "$f" | sed 's/.mdwn$//')"; if [ -z "$(git log --since=01-01-2022 --pretty=oneline -- "$f")" -a -z "$(git log --since=01-01-2022 --pretty=oneline -- "$d")" ]; then git rm -- "./$f" ; git rm -rf "./$d"; fi; fi; done