It now notices that a RepoLocation may not be Local, in which case
pattern matching on Local wouldn't do.
However, in these cases, I think it always is a Local. In particular,
Git.Config.read is only run on local repos and upgrades LocalUnknown to
Local.
Fix more breakage caused by git's fix for CVE-2022-24765, this time
involving a remote (either local or ssh) that is a repository not owned by
the current user.
Sponsored-by: Dartmouth College's DANDI project
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)
error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.
Changed a lot of existing error to giveup when it is not strictly an
internal error.
Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.
Sponsored-by: Noam Kremen on Patreon
That is a legal url, but parseUrl parses it to "/c:/path"
which is not a valid path on Windows. So as a workaround, use
parseURIPortable everywhere, which removes the leading slash when
run on windows.
Note that if an url is parsed like this and then serialized back
to a string, it will be different from the input. Which could
potentially be a problem, but is probably not in practice.
An alternative way to do it would be to have an uriPathPortable
that fixes up the path after parsing. But it would be harder to
make sure that is used everywhere, since uriPath is also used
when constructing an URI.
It's also worth noting that System.FilePath.normalize "/c:/path"
yields "c:/path". The reason I didn't use it is that it also
may change "/" to "\" in the path and I wanted to keep the url
changes minimal. Also noticed that convertToWindowsNativeNamespace
handles "/c:/path" the same as "c:/path".
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Such an url is not valid; parseURI will fail on it. But git-annex doesn't
actually need to parse the url, because all it needs to do to support
syncing with it is know that it's not a local path, and use git pull and
push.
(Note that there is no good reason for the user to use such an url. An
absolute url is valid and I patched git-remote-gcrypt to support them
years ago. Still, users gonna do anything that tools allow, and
git-remote-gcrypt still supports them.)
Sponsored-by: Jack Hill on Patreon
path to a bare repo when git config is not allowed to list the configs
due to the CVE-2022-24765 fix.
That resulted in a confusing error message, and prevented the nice
message that explains how to mark the repo as safe to use.
Made isBare a tristate so that the case where core.bare is not returned can
be handled.
The handling in updateLocation is to check if the directory
contains config and objects and if so assume it's bare.
Note that if that heuristic is somehow wrong, it would construct a repo
that thinks it's bare but is not. That could cause follow-on problems,
but since git-annex then checks checkRepoConfigInaccessible, and skips
using the repo anyway, a wrong guess should not be a problem.
Sponsored-by: Luke Shumaker on Patreon
Fix a reversion that prevented git-annex from working in a repository when
--git-dir or GIT_DIR is specified to relocate the git directory to
somewhere else. (Introduced in version 10.20220525)
checkRepoConfigInaccessible could still run git config --list, just passing
--git-dir. It seems not necessary, because I know that passing --git-dir
bypasses git's check for repo ownership. I suppose it might be that git
eventually changes to check something about the ownership of the working
tree, so passing --git-dir without --work-tree would still be worth doing.
But for now this is the simple fix.
Sponsored-by: Nicholas Golder-Manning on Patreon
git does not crash when there's a remote configured for a user who does
not exist, and this prevents git-annex from crashing too. Consider that
a user might exist on one system but not another, and the git repo be
moved between systems. So not crashing is desirable.
Note that git fetch seems to mishandle a remote path like ~foo/bar
when the user does not exist. While it does access ./~foo/bar,
and gets as far as running git-upload-pack on the path,
it then complains there is no such repo. So different parts of git seem
to be doing different things in that edge case. Anyway, git-annex does
not need to be bug-for-bug compatible with git.
Sponsored-by: Jack Hill on Patreon
If it's passed a ConfigKey such as annex.version, avoid returning
an empty remote name and return Nothing instead. Also, foo.bar.baz is
not treated as a remote named "bar".
When a git remote is configured with an absolute path, use that path,
rather than making it relative. If it's configured with a relative path,
use that.
Git.Construct.fromPath changed to preserve the path as-is,
rather than making it absolute. And Annex.new changed to not
convert the path to relative. Instead, Git.CurrentRepo.get
generates a relative path.
A few things that used fromAbsPath unncessarily were changed in passing to
use fromPath instead. I'm seeing fromAbsPath as a security check,
while before it was being used in some cases when the path was
known absolute already. It may be that fromAbsPath is not really needed,
but only git-annex-shell uses it now, and I'm not 100% sure that there's
not some input that would cause a relative path to be used, opening a
security hole, without the security check. So left it as-is.
Test suite passes and strace shows the configured remote url is used
unchanged in the path into it. I can't be 100% sure there's not some code
somewhere that takes an absolute path to the repo and converts it to
relative and uses it, but it seems pretty unlikely that the code paths used
for a git remote would call such code. One place I know of is gitAnnexLink,
but I'm pretty sure that git remotes never deal with annex symlinks. If
that did get called, it generates a path relative to cwd, which would have
been wrong before this change as well, when operating on a remote.
After the last commit, it was able to throw errors just due to an
unparseable url. This avoids needing to worry about that, as long
as the call site has already checked that it has a parseable url.
Including the non-standard URI form that git-remote-gcrypt uses for rsync.
Eg, "ook://foo:bar" cannot be parsed because "bar" is not a valid port
number. But git could have a remote with that, it would try to run
git-remote-ook to handle it. So, git-annex has to allow for such things,
rather than crashing.
This commit was sponsored by Luke Shumaker on Patreon.
This fixes the bug.
Note, it's only done when GIT_DIR is set. When it's not set,
Git.Construct already handled it. This is why it was only noticed with this
git submodule command.
This commit was sponsored by Brett Eisenberg on Patreon.
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.
Git.Repo also changed to use RawFilePath for the path to the repo.
This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
The parser and looking up config keys in the map should both be faster
due to using ByteString.
I had hoped this would speed up startup time, but any improvement to
that was too small to measure. Seems worth keeping though.
Note that the parser breaks up the ByteString, but a config map ends up
pointing to the config as read, which is retained in memory until every
value from it is no longer used. This can change memory usage
patterns marginally, but won't affect git-annex.
Goal is to make git-annex faster by using ByteString for all the
worktree traversal. For now, this is focusing on Command.Find,
in order to benchmark how much it helps. (All other commands are
temporarily disabled)
Currently in a very bad unbuildable in-between state.
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
Support working trees set up by git-worktree, by setting up some symlinks
such that git-annex links work right.
Also improved support for repositories created with --separate-git-dir.
At least recent git makes a .git file for those (older may have used a
symlink?), so that also needs to be converted to a symlink.
This commit was sponsored by Nick Piper on Patreon.
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.
Got rid of the remotes member of Git.Repo. This was a bit painful.
Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.
This commit was sponsored by Jake Vosloo on Patreon.
Removed dependency on MissingH, instead depending on the split
library.
After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.
Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.
This commit was sponsored by Shane-o on Patreon.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.
As well as being an optimisation, this helps move toward eliminating use of
missingh.
(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)
I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Reverts 965e106f24
Unfortunately, this caused breakage on Windows, and possibly elsewhere,
because parentDir and takeDirectory do not behave the same when there is a
trailing directory separator.
parentDir is less safe than takeDirectory, especially when working
with relative FilePaths. It's really only useful in loops that
want to terminate at /
This commit was sponsored by Audric SCHILTKNECHT.
Note that on Windows a remote with a path like /home/foo/bar
is interpreted by git as being some screwy relative path (relative to what
exactly seems ill-defined -- it seemed relative to C:\Program Files\Git\ in
my tests!) So no attempt has been made to handle such a path sanely, just not
to crash when encountering it.
Note that "C:\\foo" </> "/home/foo/bar" yields /home/foo/bar even though
that is not absolute! I don't know what to make of all this,
except that I will be very happy when this crock of **** vanishes from
the face of the earth.
The -c option now not only modifies the git configuration seen by
git-annex, but it is passed along to every git command git-annex runs.
This was easy to plumb through because gitCommandLine is already used to
construct every git command line, to add --git-dir and --work-tree