As well as escape sequences, control characters seem unlikely to be desired when
doing addurl, and likely to trip someone up. So disallow them as well.
I did consider going the other way and allowing filenames with control characters
and escape sequences, since git-annex is in the process of escaping display
of all filenames. Might still be a better idea?
Also display the illegal filename git quoted when it rejects it.
Sponsored-by: Nicholas Golder-Manning on Patreon
Added StringContainingQuotedPath, which is used for ActionItemOther.
In the process, checked every ActionItemOther for those containing
filenames, and made them use quoting.
Sponsored-by: Graham Spencer on Patreon
registerurl: When an url is claimed by a special remote other than the web,
update location tracking for that special remote.
registerurl's behavior was changed in commit
451171b7c1, apparently accidentially to not
update location tracking except for the web.
This makes registerurl followed by unregisterurl not be a no-op, when the
url happens to be claimed by a remote other than the web. It is a noop when
the url is unclaimed except by the web. I don't like the inconsistency,
and wish that registerurl and unregisterurl never updated location
tracking, which would be more in keeping with them being plumbing.
But there is the fact that it used to behave this way, and also it was
inconsistent that it updated location tracking for the web but not for
other remotes, unlike addurl. And there's an argument that the user might
not know what remote to expect to claim an url, so would be considerably in
the dark when using registerurl. (Although they have to know what content
gets downloaded, since they specify a key..)
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
This serves two purposes. --remote=web bypasses other special remotes that
claim the url, same as addurl --raw. And, specifying some other remote
allows making sure that an url is claimed by the remote you expect,
which makes then using setpresentkey not be fragile.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Support VERSION 2 in the external special remote protocol, which is
identical to VERSION 1, but avoids external remote programs neededing to
work around the above bug. External remote program that support
exporttree=yes are recommended to be updated to send VERSION 2.
Sponsored-by: Kevin Mueller on Patreon
Remote.Directory makes a temp file, then calls this, and since the temp
file exists, it prevented probing if CoW works.
Note that deleting the empty file does mean there's a small window for a
race. If another process is also exporting to the remote, that could let it
make the same temp file. However, the temp filename actually has the
processes's pid in it, which avoids that being a problem.
This may have been a reversion caused by commits around
63d508e885, but I haven't gone back and
tested to be sure. The directory special remote had supposedly supported
CoW for this going back to about half a year before that.
Sponsored-by: Graham Spencer on Patreon
The temporary URL key used for the download, before the real key is
generated, was blocked by annex.securehashesonly.
Fixed by passing the Backend that will be used for the final key into
runTransfer. When a Backend is provided, have preCheckSecureHashes
check that, rather than the key being transferred.
Sponsored-by: unqueued on Patreon
view: Support annex.maxextensionlength when generating filenames for the
view branch.
Note that refining an existing view will reuse the extension length that was
configured when initially constructing the view. This is necessarily the case
because it reuses the filenames.
Also view files used to have all extensions at the end, no matter how
many there were. Since annex.maxextensionlength's documentation includes
that it's limited to 2 extensions, I made it consistent with that.
Sponsored-by: k0ld on Patreon
I don't know of scenarios where that can happen (besides the bug
fixed by the parent commit), but there probably are some.
Sponsored-by: Boyd Stephen Smith Jr. on Patreon
Avoid failure to update adjusted branch --unlock-present after git-annex
drop when annex.adjustedbranchrefresh=1
At higher values, it did flush the queue, which ran restagePointerFiles.
But at 1, adjustedBranchRefreshFull gets added to the queue, and while
restagePointerFiles is also in the queue, it runs after that.
Sponsored-by: Brock Spratlen on Patreon
Such an url is not valid; parseURI will fail on it. But git-annex doesn't
actually need to parse the url, because all it needs to do to support
syncing with it is know that it's not a local path, and use git pull and
push.
(Note that there is no good reason for the user to use such an url. An
absolute url is valid and I patched git-remote-gcrypt to support them
years ago. Still, users gonna do anything that tools allow, and
git-remote-gcrypt still supports them.)
Sponsored-by: Jack Hill on Patreon
copy: When --from and --to are combined and the content is already present
on the destination remote, update location tracking as necessary.
Sponsored-by: Dartmouth College's DANDI project
A repository can have a newline in its description due to being in a
directory containing a newline, or due to git-annex describe being
passed a string with a newline in it for some reason. Putting that
newline in uuid.log breaks its format.
So, escape the newline when it enters uuid.log, to \n
This is a one-way escaping, it is not converted back to a newline
when reading the log. If it were, commands like git-annex info and
whereis would display a multi-line description, which could be confusing
to read.
And, implementing roundtripping would necessarily cause problems if an
old version of git-annex were used to set a description that contained
whatever special character is used to escape the \n. Eg, a \ or if
it used the ! prefix before base64 data that is used in some other logs,
the ! character. Then the description set by the old git-annex would not
roundtrip.
There just doesn't seem to be any benefit of roundtripping newlines through,
so why bother? And, git often displays \n for newline when a filename
contains a newline, so git-annex doing it in this case seems sorta ok
by analogy to git.
(Some other git-annex logs can also have newlines put into them if the
user really wants to break git-annex. For example:
git-annex config annex.largefiles "foo
bar"
The full list is probably config.log, remote.log, group.log,
preferred-content.log, required-content.log,
group-preferred-content.log, schedule.log. Probably there is no
good reason to use a newline in any of these, and the breakage is
probably limited to the bad data the user put in not coming back out.
And users can write any garbage to log files themselves manually in any
case. So, I am not going to address all of those at this time. If a
problem such as this one with the newline in the repository path comes
up, it can be dealt with on a case by case basis.)
Sponsored-by: Dartmouth College's Datalad project
git hash-object --stdin-paths is a newline protocol so it cannot
support them. It would help to not use absPath, when the problem
is that the repository itself is in a path with a newline. But,
there's a reason it used absPath, which is that
git hash-object --stdin-paths actually chdirs to the top of the
repository on startup! That is not documented, and I think is a bug
in git.
I considered making the path relative to the top of the repo, but
then what if this is a git bug and gets fixed? git-annex would break
horribly.
So instead, keep the absPath, but when the path contains a newline,
fall back to running git hash-object once per file, which avoids
the problem with newlines and --stdin-paths. It will be slower,
but this is an edge case. (Similar slow code paths are already used
elsewhere when dealing with filenames with newlines and other parts
of git that use line-based protocols.)
Sponsored-by: Dartmouth College's Datalad project
Added arm64 build for ancient kernels, needed to support Android phones
whose kernels are too old to support kernels used by the current arm64
build.
Updated Android/git-annex-install to use it. (Also made it use i386-ancient
because that seems like a good idea.)
Sponsored-by: Noam Kremen on Patreon
As far as I can see, git-annex status was added to support direct mode, and
like other things added for that, it ought to be deprecated.
Behavior is similar to git status --short, though not identical in a few
cases eg renamed files.
I think datalad does not use this command, although it might have in the
past. Could not find any use of it in the current datalad code.
A deprecation warning at runtime would be the next step, probably will wait
and do that for all the deprecated commands together (except findref).
Well, perhaps it could be documented better, but it's a compositional
feature so users who need it will probably try it and be happy to find
that it works.
I had thought this would not make sense to combine with view branches,
since removing files from a view changes metadata.
However, that's committing removal of files. With --hide-missing, the
files get removed when git-annex updates the branch itself, so there is
no conflict.
It does not seem likely to be very useful, but it does work! And that's
nice because it means all types of adjusted branches can be combined with
view branches.
Sponsored-by: Max Thoursie on Patreon
When generating the view, check if the key is present.
When syncing in a view branch with an adjustment, run adjustedBranchRefreshFull
the same as is done when syncing in other adjusted branches. This is
needed because the docs for git-annex adjust --unlock-present suggest
using git-annex sync to update the branch when annex.adjustedbranchrefresh
is not set.
Note that, with annex.adjustedbranchrefresh set, it just works! The
adjusted branch gets updated in the usual way and it doesn't matter that
there's a view branch underneath.
And of course, re-running git-annex adjut --unlock-present also works,
as suggested in the docs.
Sponsored-by: Erik Bjäreholt on Patreon
This reverts commit 648e59cac2.
Failed to build on windows, because
In the dependencies for haskeline-0.8.2:
Win32-2.11.1.0 from Stack configuration does not match >=2.1 && <2.10 || >=2.12 (latest
matching version is 2.13.4.0)
jkniiv did find a solution that builds:
-- Win32-2.11.1.0
+- Win32-2.9.0.0
+- Cabal-3.6.3.0
+- directory-1.3.7.1
+- process-1.6.17.0
+- time-1.11.1.2
But that is a quite old version of Win32 and risks bugs from it, and bumping
Cabal and directory to newer than lts-19.33 has seems also likely to be risky.
So, I've given up. aws-0.24 won't be able to be in the stack build until
there's a stackage lts (or nightly) that has filepath (>=1.4.100.0),
which will not happen until sometime after the next ghc release.
info: Fix reversion in last release involving handling of unsupported input
by continuing to handle any other inputs, before exiting nonzero at the
end.
Sponsored-by: Dartmouth College's Datalad project
view: Fix a reversion in 10.20230214 that omitted a file from a view when
the file had no metadata set, but the view only used path fields.
Sponsored-by: Jack Hill on Patreon
path to a bare repo when git config is not allowed to list the configs
due to the CVE-2022-24765 fix.
That resulted in a confusing error message, and prevented the nice
message that explains how to mark the repo as safe to use.
Made isBare a tristate so that the case where core.bare is not returned can
be handled.
The handling in updateLocation is to check if the directory
contains config and objects and if so assume it's bare.
Note that if that heuristic is somehow wrong, it would construct a repo
that thinks it's bare but is not. That could cause follow-on problems,
but since git-annex then checks checkRepoConfigInaccessible, and skips
using the repo anyway, a wrong guess should not be a problem.
Sponsored-by: Luke Shumaker on Patreon
Used to fail with a bad error message, indicating there was no
repository with the specified name, or something like that. Now, suggest
they use the uuid to disambiguate.
* info, enableremotemote, renameremote: Avoid a confusing message when more
than one repository matches the user provided name.
* info: Exit nonzero when the input is not supported.
Sponsored-by: Kevin Mueller on Patreon
A benchmark in my sound repository with `git-annex view feedtitle=*`
took 2:52 wall clock time before and 1:58 after. Though it still only used
130% of CPU.
This is the same kind of optimisation that is in seekFilteredKeys, though
that precaches location logs while this streams the metadata logs direct
to parsing them.
seekFilteredKeys contains more streaming, to find the annexed files, and
this could be further sped up with similar streaming.
Sponsored-by: Nicholas Golder-Manning on Patreon
Based on https://github.com/golang/go/discussions/58409, the Go compiler
already defaults to using a google proxy server, which would allow
Google to collect information about what dependencies users are
installing. (Of course they claim they won't.) Two separate environment
settings are needed to turn that off, and users in that thread were
surprised to learn about one of them.
So this warning is already appropriate to some extent.
Also based on the minimisation of user concerns by the golang developers
on that issue and elsewhere, it seems best to assume that they are not
going to be dissuaded from increasing data collection efforts in the future,
even if the blowback prevents this particular attempt.
So this warning should not be removed unless the Go community somehow
extricates itself from Google's control. Or unless ipfs is rewritten in
another language.
Some distros do have ipfs. Unfortunately, Debian appears to be structurally
incapable of packaging it. (8 years and counting;
https://bugs.debian.org/779893). So lots of users will be stuck
installing it from source or having to trust its official binaries.
It apparently displays a notice on first use, but I was surprised to
learn about this behavior today, and I've used it. Displaying a notice
does not make violating users' privacy acceptable.
sync: Fix a bug that caused files to be removed from an importtree=yes
exporttree=yes special remote when the remote's annex-tracking-branch was
not the currently checked out branch.
Sponsored-by: Max Thoursie on Patreon
* sync: When run in a view branch, refresh the view branch to reflect any
changes that have been made to the parent branch or metadata.
This is basically working, but probably needs some more work to deal with
all the edge cases of things sync does.
Sponsored-by: Lawrence Brogan on Patreon
* view: New field?=glob and ?tag syntax that includes a directory "_"
in the view for files that do not have the specified metadata set.
* Added annex.viewunsetdirectory git config to change the name of the
"_" directory in a view.
When in a view using the new syntax, old git-annex will fail to parse the
view log. It errors with "Not in a view.", which is not ideal. But that
only affects view commands.
annex.viewunsetdirectory is included in the View for a couple of reasons.
One is to avoid needing to warn the user that it should not be changed when
in a view, since that would confuse git-annex. Another reason is that it
helped with plumbing the value through to some pure functions.
annex.viewunsetdirectory is actually mangled the same as any other view
directory. So if it's configured to something like "N/A", there won't be
multiple levels of directories, which would also confuse git-annex.
Sponsored-By: Jack Hill on Patreon
S3: Support a region= configuration useful for some non-Amazon S3
implementations. This feature needs git-annex to be built with aws-0.24.
datacenter= sets both the AWS hostname and region in one setting, which is
easy when using AWS, but not useful for other hosts. So kept datacenter
as-is, but added this additional config.
Sponsored-By: Brett Eisenberg on Patreon
Use separate stages for download and upload. In the common case where
it downloads the file from one remote and then uploads to the other,
those are by far the most expensive operations, and there's a decent
chance the two remotes bottleneck on different resources.
Suppose it's being run with -J2 and a bunch of 10 mb files. Two threads
will be started both downloading from the src remote. They will probably
finish at the same time. Then two threads will be started uploading to
the dst remote. They will probably take the same time as well. Before
this change, it would alternate back and forth, bottlenecking on src and dst.
With this change, as soon as the two threads start uploading to dst, two
more threads are able to start, downloading from src. So bandwidth to
both remotes is saturated more often.
Other commands that use transferStages only send in one direction at a
time. So the worker threads for the other direction will sit idle, and
there will be no change in their behavior.
Sponsored-by: Dartmouth College's DANDI project
Lock the local content for drop after getting it from src, to prevent another
process from using the local content as a copy and dropping it from src,
which would prevent dropping the local content after sending it to dest.
Support resuming an interrupted move that downloaded the content from
src, leaving the local content populated. In this case, the location log
has not been updated to say the content is present locally, so we can
assume that it's resuming and go ahead and drop the local content after
sending it to dest.
Note that if a `git-annex get` is being ran at the same time as a
`git-annex move --from --to`, it may get a file just before the move
processes it. So the location log has not been updated yet, and the move
thinks it's resuming. Resulting in local copy being dropped after it's
sent to the dest. This race is something we'll just have to live with,
it seems.
I also gave up on the idea of checking if the location log had been updated
by a `git-annex get` that is ran at the same time. That wouldn't work, because
the location log is precached in the seek stage, so reading it again after
sending the content to dest would not notice changes made to it, unless the cache
were invalidated, which would slow it down a lot. That idea anyway was subject
to races where it would not detect the concurrent `git-annex get`.
So concurrent `git-annex get` will have results that may be surprising.
To make that less surprising, updated the documentation of this feature to
be explicit that it downloads content to the local repository
temporarily.
Sponsored-by: Dartmouth College's DANDI project
Allowing --from and --to as an alternative to --from or --to
is hard to do with optparse-applicative!
The obvious approach of (pfrom <|> pto <|> pfromandto) does not work
when pfromandto uses the same option names as pfrom and pto do.
It compiles but the generated parser does not work for all desired
combinations.
Instead, have to parse optionally from and optionally to. When neither
is provided, the parser succeeds, but it's a result that can't be
handled. So, have to giveup after option parsing. There does not seem to
be a way to make an optparse-applicative Parser give up internally
either.
Also, need seek' because I first tried making fto be a where binding,
but that resulted in a hang when git-annex move was run without --from
or --to. I think because startConcurrency was not expecting the stages
value to contain an exception and so ended up blocking.
Sponsored-by: Dartmouth College's DANDI project
I've long been asked for `git-annex find --all` or something like that,
but pushed back on it because I feel that the command is analagous to
find(1) and so it would be surprising for it to list keys rather than
files. So instead, add a new findkeys subcommand.
Note that the use of withKeyOptions is rather strange because usually
that is used to fall back to --all rather than listing files, but here
it's made to default to --all like behavior and never list files.
A performance thing that could be improved is that withKeyOptions
always reads and caches location logs. But findkeys with no options does
not need them, so it could be made faster. That caching does speed up
options like --in though. This is really just a subset of a more general
performance thing that --all reads location logs sometimes unncessarily.
Anyway, it needs to read the location log in order to checkDead,
and it seems good that findkeys does skip dead keys.
Also, cleaned up comments on git-annex-find man page asking for --all
option.
Sponsored-by: Dartmouth College's DANDI project