Fix bug in handling of linked worktrees on filesystems not supporting
symlinks, that caused annexed file content to be stored in the wrong
location inside the git directory, and also caused pointer files to not get
populated.
This parameterizes functions in Annex.Locations with a GitLocationMaker.
The uses of standardGitLocationMaker are in cases where the path returned
by a function should not change when in a linked worktree. For example,
gitAnnexLink uses standardGitLocationMaker because symlink targets should
always be to ".git/annex/objects" paths, even when in a linked worktree.
Hopefully I have gotten all uses of standardGitLocationMaker right.
This also assumes that all path construction to the annex directory
is done via the functions in Annex.Locations, and there is no other,
ad-hoc construction elsewhere. Thankfully, Annex.Locations has been around
since the beginning, and has been used consistently. I think.
---
In fixupUnusualRepos, when symlinks are supported, the .git file is replaced
with a symlink to the linked worktree git directory. And in that directory,
an "annex" symlink points to the main annex directory. In that case,
it's not necessary to set mainWorkTreePath. It would be ok to set it,
but not setting it in that case allows an optimisation of avoiding reading
the "commondir" file.
The change to make fixupUnusualRepos set mainWorkTreePath when the
repository is not initialized yet is done in case the initialization itself
writes to the annex directory. If that were the case, without setting
mainWorkTreePath, the annex symlink would not be set up yet, and so
it might have created the annex directory in the wrong place. Currently
that didn't happen, but now that mainWorkTreePath is available, using it
here avoids any such later problem.
---
This commit does not deal with the mess of a worktree that has
experienced this bug before. In particular, if `git-annex get` were
run in such a worktree, it would have stored the object files in the
linked worktree's git directory, rather than in the main git directory.
Such misplaced object files need to be dealt with; the plan is to make
git-annex fsck notice and fix them.
A worktree that has experienced this bug before will contain unpopulated
pointer files. Those may eventually get fixed up in regular usage of
git-annex, but git-annex fsck will also fix them up.
---
Finally, this has me pondering if all of git-annex's state files should
really be stored in one common place across all linked worktrees. Should
perhaps state files that are specific to the worktree be stored per-worktree?
That has not been the case when using git-annex on filesystems supporting
symlinks, but it *has* been the case on filesystems not supporting
symlinks. Perhaps this leads to some other buggy behavior in some cases.
Or perhaps to extra work being done.
For example, the keys database has an associated files table. Which depends
on the worktree. But reconcileStaged updates that table, so when git-annex
is used first in one worktree and then in another one, reconcileStaged will
update the table to reflect the current worktree. Which is extra work each
time a different worktree is used. But also, what if two git-annex
processes are running at the same time, in separate worktrees? Probably
this needs more thought and investigation.
So there is a risk that this commit exposes such buggy behavior in a
situation where it didn't happen before, due to the filesystem not
supporting symlinks. But, given how much this bug crippled using linked
worktrees in such a situation, I doubt that many people have been doing
that.
Slightly unsatisfying to fix this in Git.Construct.localToUrl rather than
in Command.Map.absRepo, which is what map relies on to make repos always
use the same path form. But it was already constructing the url with the
path there, so that was the easiest place to add normalization.
Sponsored-by: Dartmouth College's OpenNeuro project
It was treating remote paths of a remote repo as if they were local paths,
and so trying to expand git directories and so forth on them. That led to
bad results, including a path like "foo.git" getting turned into
"foo.git.git"
Sponsored-by: Dartmouth College's OpenNeuro project
* Support git remotes that use an url with a user name that is URL encoded.
* Fix git-lfs special remote ssh endpoint discovery when the repository
path is URL encoded.
In the previous commit, Git.Url.host was made to do URL decoding. That made
me wonder, what about URL encoded username and path? And so to these two
additional fixes. Note that Git.Url.authority remains URL encoded. That
seems ok given how it's used.
Using GIT keys, like are used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.
Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.
Instead, the behavior with a symlink is to run the computation on the
symlink target. This may turn out to be confusing, and it might be worth
addcomputed checking if the file in git is a symlink and erroring out.
Or it could follow symlinks as long as the destination is a file in the
repisitory.
However, filepath-bytestring is still in Setup-Depends.
That's because Utility.OsPath uses it when not built with OsPath.
It would be maybe possible to make Utility.OsPath fall back to using
filepath, and eliminate that dependency too, but it would mean either
wrapping all of System.FilePath's functions, or using `type OsPath = FilePath`
Annex.Import uses ifdefs to avoid converting back to FilePath when not
on windows. On windows it's a bit slower due to that conversion.
Utility.Path.Windows.convertToWindowsNativeNamespace got a bit
slower too, but not really worth optimising I think.
Note that importing Utility.FileSystemEncoding at the same time as
System.Posix.ByteString will result in conflicting definitions for
RawFilePath. filepath-bytestring avoids that by importing RawFilePath
from System.Posix.ByteString, but that's not possible in
Utility.FileSystemEncoding, since Setup-Depends does not include unix.
This turned out not to affect any code in git-annex though.
Sponsored-by: Leon Schuermann
It seems to make sense to convert both System.Directory and
System.FilePath uses to OsPath in one go. This will generally look like
replacing RawFilePath with OsPath in type signatures, and will be driven
by the now absolutely massive pile of compile errors.
Got a few modules building in this new regime.
Sponsored-by: Jack Hill
This removes that function, using file-io readFile' instead.
Had to deal with newline conversion, which readFileStrict does on
Windows. In a few cases, that was pretty ugly to deal with.
Sponsored-by: Kevin Mueller
And follow-on changes.
Note that relatedTemplate was changed to operate on a RawFilePath, and
so when it counts the length, it is now the number of bytes, not the
number of code points. This will just make it truncate shorter strings
in some cases, the truncation is still unicode aware.
When not building with the OsPath flag, toOsPath . fromRawFilePath and
fromRawFilePath . fromOsPath do extra conversions back and forth between
String and ByteString. That overhead could be avoided, but that's the
non-optimised build mode, so didn't bother.
Sponsored-by: unqueued
By using System.Directory.OsPath, which takes and returns OsString,
which is a ShortByteString. So, things like dirContents currently have the
overhead of copying that to a ByteString, but that should be less than
the overhead of using Strings which often in turn were converted to
RawFilePaths.
Added Utility.OsString and the OsString build flag. That flag is turned
on in the stack.yaml, and will be turned on automatically by cabal when
built with new enough libraries. The stack.yaml change is a bit ugly,
and that could be reverted for now if it causes any problems.
Note that Utility.OsString.toOsString on windows is avoiding only a
check of encoding that is documented as being unlikely to fail. I don't
think it can fail in git-annex; if it could, git-annex didn't contain
such an encoding check before, so at worst that should be a wash.
* Added freezecontent-annex and thawcontent-annex hooks that
correspond to the git configs annex.freezecontent and
annex.thawcontent.
* Added secure-erase-annex hook that corresponds to the git config
annex.secure-erase-command.
* Added commitmessage-annex hook that corresponds to the git config
annex.commitmessage-command.
* Added http-headers-annex hook that corresponds to the git config
annex.http-headers-command.
that correspond to the post-update-annex and pre-commit-annex hooks.
The use case for these is eg, setting up a git repository that is run in a
container, where the easiest way to provide a script is by putting it in
.git/hooks/, rather than copying it into the container in a way that puts
it in PATH.
This is all the ones that make sense to add for annex.*-config git configs.
annex.youtube-dl-command is not a hook, it's telling git-annex what command
to run. So is annex.shared-sop-command. So omitted those.
May later also want to add hooks corresponding to
`remote.<name>.annex-cost-command` etc.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
Previously, when the git config was unable to be read from a ssh remote,
it would try to git fetch from it to determine if the remote was
otherwise accessible. That was unnessary work, since exit status 255
indicates a connection problem.
As well as avoiding the extra work of the fetch, this also improves
things when a ssh remote cannot be connected to due to a problem with
the git-annex ssh control socket. In that situation, ssh will also exit 255.
Before, the git fetch was tried in that situation, and would succeed, since
it does not use the git-annex ssh control socket. git-annex would conclude
that git-annex-shell was not installed on the remote, which could be wrong.
I suppose it also used to be possible for the user to need to enter a
ssh password on each connection to the remote. If they entered the wrong
password for the git-annex-shell call, but then the right password for
the git fetch, it would also incorrectly set annex-ignore, and that
situation is also now fixed.
Added config `url.<base>.annexInsteadOf` corresponding to git's
`url.<base>.pushInsteadOf`, to configure the urls to use for accessing the
git-annex repositories on a server without needing to configure
remote.name.annexUrl in each repository.
While one use case for this would be rewriting urls to use annex+http,
I decided not to add any kind of special case for that. So while
git-annex p2phttp, when serving multiple repositories, needs an url
of eg "annex+http://example.com/git-annex/ for each of them, rewriting an
url like "https://example.com/git/foo/bar" with this config set to
"https://example.com/git/" will result in eg
"annex+http://example.com/git-annex/foo/bar", which p2phttp does not
support.
That seems better dealt with in either git-annex p2phttp or a http
middleware, rather than complicating the config with a special case for
annex+http.
Anyway, there are other use cases for this that don't involve annex+http.
Work around git hash-object --stdin-paths's odd stripping of carriage
return from the end of the line (some windows infection), avoiding crashing
when the repo contains a filename ending in a carriage return.
Since old ones had a buggy git bundle command.
In particular, git 2.30.2 has a git bundle that supports --stdin, but does
not read from it, and so fails to create a bundle.
While not using --stdin would perhaps work, it limits the number of revs
that get included in the bundle to the command line length limit.
But the real kicker is that at the same time --stdin got fixed, a bug also
got fixed that made git bundle skip including refs when they had the same
sha as other refs it included. Which would lead to data loss. So best to
avoid that buggy thing.
Fix a bug where interrupting git-annex while it is updating the git-annex
branch could lead to git fsck complaining about missing tree objects.
Interrupting git-annex while regraftexports is running in a transition
that is forgetting git-annex branch history would leave the
repository with a git-annex branch that did not contain the tree shas
listed in export.log. That lets those trees be garbage collected.
A subsequent run of the same transition then regrafts the trees listed
in export.log into the git-annex branch. But those trees have been lost.
Note that both sides of `if neednewlocalbranch` are atomic now. I had
thought only the True side needed to be, but I do think there may be
cases where the False side needs to be as well.
Sponsored-by: Dartmouth College's OpenNeuro project