Commit graph

838 commits

Author SHA1 Message Date
Joey Hess
117b520825
Removed support for git versions older than 2.22.
Which fixed an OOM with unlocked files.

Debian oldoldstable is the oldest version of git that git-annex needs to
support, since it's used in the amd64ancient build. That has 2.30.

At this point, the only complication git-annex for an bug in old
versions of git is that git bundle was broken before git 2.31.
That prevents git-remote-annex from working with git 2.30.

Sponsored-by: Luke T. Shumaker
2025-08-21 11:22:59 -04:00
Joey Hess
2686d2d7ea
Removed support for git versions older than 2.5.
This entirely removes Git.BuildVersion, which avoids the possibility that
git-annex will behave differently based on the version of git it was built
with, rather than the version it's used with.

Debian oldoldstable is the oldest version of git that git-annex needs to
support, since it's used in the amd64ancient build.

cabal configure will fail if the git version is too old.

Sponsored-by: Nicholas Golder-Manning
2025-08-21 11:04:26 -04:00
Joey Hess
6a9e923c74
fix handling of linked worktrees on filesystems w/o symlinks
Fix bug in handling of linked worktrees on filesystems not supporting
symlinks, that caused annexed file content to be stored in the wrong
location inside the git directory, and also caused pointer files to not get
populated.

This parameterizes functions in Annex.Locations with a GitLocationMaker.
The uses of standardGitLocationMaker are in cases where the path returned
by a function should not change when in a linked worktree. For example,
gitAnnexLink uses standardGitLocationMaker because symlink targets should
always be to ".git/annex/objects" paths, even when in a linked worktree.
Hopefully I have gotten all uses of standardGitLocationMaker right.

This also assumes that all path construction to the annex directory
is done via the functions in Annex.Locations, and there is no other,
ad-hoc construction elsewhere. Thankfully, Annex.Locations has been around
since the beginning, and has been used consistently. I think.

---

In fixupUnusualRepos, when symlinks are supported, the .git file is replaced
with a symlink to the linked worktree git directory. And in that directory,
an "annex" symlink points to the main annex directory. In that case,
it's not necessary to set mainWorkTreePath. It would be ok to set it,
but not setting it in that case allows an optimisation of avoiding reading
the "commondir" file.

The change to make fixupUnusualRepos set mainWorkTreePath when the
repository is not initialized yet is done in case the initialization itself
writes to the annex directory. If that were the case, without setting
mainWorkTreePath, the annex symlink would not be set up yet, and so
it might have created the annex directory in the wrong place. Currently
that didn't happen, but now that mainWorkTreePath is available, using it
here avoids any such later problem.

---

This commit does not deal with the mess of a worktree that has
experienced this bug before. In particular, if `git-annex get` were
run in such a worktree, it would have stored the object files in the
linked worktree's git directory, rather than in the main git directory.
Such misplaced object files need to be dealt with; the plan is to make
git-annex fsck notice and fix them.

A worktree that has experienced this bug before will contain unpopulated
pointer files. Those may eventually get fixed up in regular usage of
git-annex, but git-annex fsck will also fix them up.

---

Finally, this has me pondering if all of git-annex's state files should
really be stored in one common place across all linked worktrees. Should
perhaps state files that are specific to the worktree be stored per-worktree?
That has not been the case when using git-annex on filesystems supporting
symlinks, but it *has* been the case on filesystems not supporting
symlinks. Perhaps this leads to some other buggy behavior in some cases.
Or perhaps to extra work being done.

For example, the keys database has an associated files table. Which depends
on the worktree. But reconcileStaged updates that table, so when git-annex
is used first in one worktree and then in another one, reconcileStaged will
update the table to reflect the current worktree. Which is extra work each
time a different worktree is used. But also, what if two git-annex
processes are running at the same time, in separate worktrees? Probably
this needs more thought and investigation.

So there is a risk that this commit exposes such buggy behavior in a
situation where it didn't happen before, due to the filesystem not
supporting symlinks. But, given how much this bug crippled using linked
worktrees in such a situation, I doubt that many people have been doing
that.
2025-07-14 13:20:39 -04:00
Joey Hess
fd493804c0
map: Avoid looping forever with mutually recursive paths between repositories accessed via ssh
Slightly unsatisfying to fix this in Git.Construct.localToUrl rather than
in Command.Map.absRepo, which is what map relies on to make repos always
use the same path form. But it was already constructing the url with the
path there, so that was the easiest place to add normalization.

Sponsored-by: Dartmouth College's OpenNeuro project
2025-04-22 15:50:30 -04:00
Joey Hess
2ee6c25c72
map: Fix buggy handling of remotes that are bare git repositories accessed via ssh
It was treating remote paths of a remote repo as if they were local paths,
and so trying to expand git directories and so forth on them. That led to
bad results, including a path like "foo.git" getting turned into
"foo.git.git"

Sponsored-by: Dartmouth College's OpenNeuro project
2025-04-22 15:21:01 -04:00
Joey Hess
bb04d1e71f
URL decoding for username and path
* Support git remotes that use an url with a user name that is URL encoded.
* Fix git-lfs special remote ssh endpoint discovery when the repository
  path is URL encoded.

In the previous commit, Git.Url.host was made to do URL decoding. That made
me wonder, what about URL encoded username and path? And so to these two
additional fixes. Note that Git.Url.authority remains URL encoded. That
seems ok given how it's used.
2025-04-02 15:29:46 -04:00
Joey Hess
ff520b06ac
Support git remotes that use a IPV6 link-local address with a zone ID
Fixed 3 problems, and it seems to work now for both forms:

ssh://[fe80::7697:xxx:xxxx:xxxx%wlp3s0]/foo
fe80::7697:xxx:xxxx:xxxx%wlp3s0:foo
2025-04-02 15:11:54 -04:00
Joey Hess
a0d6a6ea2a
support git files as input to computations
Using GIT keys, like are used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.

Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.

Instead, the behavior with a symlink is to run the computation on the
symlink target. This may turn out to be confusing, and it might be worth
addcomputed checking if the file in git is a symlink and erroring out.
Or it could follow symlinks as long as the destination is a file in the
repisitory.
2025-03-03 12:09:25 -04:00
Joey Hess
a149336a59
OsPath transition Windows build fixes
This gets it building on Windows again, with 1 test suite failure
(addurl).

Sponsored-by: Kevin Mueller
2025-02-11 15:40:53 -04:00
Joey Hess
9dc43396b3
fix comment 2025-02-11 14:07:01 -04:00
Joey Hess
27a0bacc49
improved OsPath conversion 2025-02-11 14:05:56 -04:00
Joey Hess
780a379ab1
remove unused functions from Utility.RawFilePath 2025-02-11 13:49:17 -04:00
Joey Hess
2ff716be30
OsPath build flag no longer depends on filepath-bytestring
However, filepath-bytestring is still in Setup-Depends.
That's because Utility.OsPath uses it when not built with OsPath.
It would be maybe possible to make Utility.OsPath fall back to using
filepath, and eliminate that dependency too, but it would mean either
wrapping all of System.FilePath's functions, or using `type OsPath = FilePath`

Annex.Import uses ifdefs to avoid converting back to FilePath when not
on windows. On windows it's a bit slower due to that conversion.
Utility.Path.Windows.convertToWindowsNativeNamespace got a bit
slower too, but not really worth optimising I think.

Note that importing Utility.FileSystemEncoding at the same time as
System.Posix.ByteString will result in conflicting definitions for
RawFilePath. filepath-bytestring avoids that by importing RawFilePath
from System.Posix.ByteString, but that's not possible in
Utility.FileSystemEncoding, since Setup-Depends does not include unix.
This turned out not to affect any code in git-annex though.

Sponsored-by: Leon Schuermann
2025-02-10 16:39:55 -04:00
Joey Hess
c74c75b352
more OsPath conversion (639/749)
Sponsored-by: k0ld
2025-02-07 16:07:05 -04:00
Joey Hess
a5d48edd94
more OsPath conversion (602/749)
Sponsored-by: Brock Spratlen
2025-02-07 14:46:11 -04:00
Joey Hess
0811531b59
more OsPath conversion (542/749)
Sponsored-by: Luke T. Shumaker
2025-02-06 11:38:14 -04:00
Joey Hess
b28433072c
more OsPath conversion (475/749)
Sponsored-by: Nicholas Golder-Manning
2025-02-05 12:14:56 -04:00
Joey Hess
4dc904bbad
more OsPath conversion
Sponsored-by: Leon Schuermann
2025-02-04 16:09:47 -04:00
Joey Hess
5cc8d9d03b
replace removeLink with removeFile
removeFile calls unlink so removes anything not a directory. So these
are replaceable in order to convert to OsPath.
2025-02-02 14:16:58 -04:00
Joey Hess
71195cce13
more OsPath conversion
Sponsored-by: k0ld
2025-02-01 14:06:38 -04:00
Joey Hess
474cf3bc8b
more OsPath conversion
Sponsored-by: Brock Spratlen
2025-02-01 11:54:19 -04:00
Joey Hess
c69e57aede
more OsPath conversion
Sponsored-by: Jack Hill
2025-01-30 15:46:32 -04:00
Joey Hess
97c83152d6
Merge branch 'master' into ospath 2025-01-29 18:48:02 -04:00
Joey Hess
998de2e7ce
remove temp debugging code 2025-01-29 17:19:01 -04:00
Joey Hess
415455883c
debug test suite crash on windows 2025-01-29 16:37:54 -04:00
Joey Hess
27305042f3
more OsPath conversion
Sponsored-by: Nicholas Golder-Manning
2025-01-29 11:53:20 -04:00
Joey Hess
f3539efc16
more OsPath conversion
Sponsored-by: Leon Schuermann
2025-01-24 16:31:14 -04:00
Joey Hess
aa0f3f31da
more OsPath conversion
Sponsored-by: Eve
2025-01-24 15:02:29 -04:00
Joey Hess
c412c59ecd
more OsPath conversion
About 1/10th done with this I think.
2025-01-24 13:40:44 -04:00
Joey Hess
ea775baccd
more OsPath conversion
Git.Types now uses it, as does TopFilePath, making for plenty of new
compile errors needing fixing.

Sponsored-by: Brock Spratlen
2025-01-23 16:15:00 -04:00
Joey Hess
c3c8870752
add System.FilePath to this conversion
It seems to make sense to convert both System.Directory and
System.FilePath uses to OsPath in one go. This will generally look like
replacing RawFilePath with OsPath in type signatures, and will be driven
by the now absolutely massive pile of compile errors.

Got a few modules building in this new regime.

Sponsored-by: Jack Hill
2025-01-23 11:07:29 -04:00
Joey Hess
77e9781ae2
parsePOSIXTime ByteString conversion
Some easy (though tiny) speed wins.

Sponsored-by: Luke T. Shumaker on Patreon
2025-01-22 16:42:09 -04:00
Joey Hess
6e27b0d4d1
convert from readFileStrict
This removes that function, using file-io readFile' instead.

Had to deal with newline conversion, which readFileStrict does on
Windows. In a few cases, that was pretty ugly to deal with.

Sponsored-by: Kevin Mueller
2025-01-22 16:20:36 -04:00
Joey Hess
9b79f0f43d
use file-io for readFile/writeFile/appendFile on ByteStrings
These are all straightforward, and easy small performance wins.

Sponsored-by: Nicholas Golder-Manning
2025-01-22 14:30:25 -04:00
Joey Hess
793ddecd4b
use openTempFile from file-io
And follow-on changes.

Note that relatedTemplate was changed to operate on a RawFilePath, and
so when it counts the length, it is now the number of bytes, not the
number of code points. This will just make it truncate shorter strings
in some cases, the truncation is still unicode aware.

When not building with the OsPath flag, toOsPath . fromRawFilePath and
fromRawFilePath . fromOsPath do extra conversions back and forth between
String and ByteString. That overhead could be avoided, but that's the
non-optimised build mode, so didn't bother.

Sponsored-by: unqueued
2025-01-22 11:41:43 -04:00
Joey Hess
1ceece3108
RawFilePath conversion of System.Directory
By using System.Directory.OsPath, which takes and returns OsString,
which is a ShortByteString. So, things like dirContents currently have the
overhead of copying that to a ByteString, but that should be less than
the overhead of using Strings which often in turn were converted to
RawFilePaths.

Added Utility.OsString and the OsString build flag. That flag is turned
on in the stack.yaml, and will be turned on automatically by cabal when
built with new enough libraries. The stack.yaml change is a bit ugly,
and that could be reverted for now if it causes any problems.

Note that Utility.OsString.toOsString on windows is avoiding only a
check of encoding that is documented as being unlikely to fail. I don't
think it can fail in git-annex; if it could, git-annex didn't contain
such an encoding check before, so at worst that should be a wash.
2025-01-20 19:17:33 -04:00
Joey Hess
104ca5e09e
Support help.autocorrect settings "never" and "immediate" 2025-01-20 11:01:07 -04:00
Joey Hess
b0ef04f0b7
Support help.autocorrect=prompt 2025-01-20 10:56:12 -04:00
Joey Hess
a73fa77417
added hooks corresponding to annex.*-command
* Added freezecontent-annex and thawcontent-annex hooks that
  correspond to the git configs annex.freezecontent and
  annex.thawcontent.
* Added secure-erase-annex hook that corresponds to the git config
  annex.secure-erase-command.
* Added commitmessage-annex hook that corresponds to the git config
  annex.commitmessage-command.
* Added http-headers-annex hook that corresponds to the git config
  annex.http-headers-command.
  that correspond to the post-update-annex and pre-commit-annex hooks.

The use case for these is eg, setting up a git repository that is run in a
container, where the easiest way to provide a script is by putting it in
.git/hooks/, rather than copying it into the container in a way that puts
it in PATH.

This is all the ones that make sense to add for annex.*-config git configs.
annex.youtube-dl-command is not a hook, it's telling git-annex what command
to run. So is annex.shared-sop-command. So omitted those.

May later also want to add hooks corresponding to
`remote.<name>.annex-cost-command` etc.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2025-01-10 14:54:42 -04:00
Joey Hess
a19a3076b5
ssh exit status 255 is a connection problem
Previously, when the git config was unable to be read from a ssh remote,
it would try to git fetch from it to determine if the remote was
otherwise accessible. That was unnessary work, since exit status 255
indicates a connection problem.

As well as avoiding the extra work of the fetch, this also improves
things when a ssh remote cannot be connected to due to a problem with
the git-annex ssh control socket. In that situation, ssh will also exit 255.
Before, the git fetch was tried in that situation, and would succeed, since
it does not use the git-annex ssh control socket. git-annex would conclude
that git-annex-shell was not installed on the remote, which could be wrong.

I suppose it also used to be possible for the user to need to enter a
ssh password on each connection to the remote. If they entered the wrong
password for the git-annex-shell call, but then the right password for
the git fetch, it would also incorrectly set annex-ignore, and that
situation is also now fixed.
2025-01-03 14:46:16 -04:00
Joey Hess
dd052dcba1
annexInsteadOf config
Added config `url.<base>.annexInsteadOf` corresponding to git's
`url.<base>.pushInsteadOf`, to configure the urls to use for accessing the
git-annex repositories on a server without needing to configure
remote.name.annexUrl in each repository.

While one use case for this would be rewriting urls to use annex+http,
I decided not to add any kind of special case for that. So while
git-annex p2phttp, when serving multiple repositories, needs an url
of eg "annex+http://example.com/git-annex/ for each of them, rewriting an
url like "https://example.com/git/foo/bar" with this config set to
"https://example.com/git/" will result in eg
"annex+http://example.com/git-annex/foo/bar", which p2phttp does not
support.

That seems better dealt with in either git-annex p2phttp or a http
middleware, rather than complicating the config with a special case for
annex+http.

Anyway, there are other use cases for this that don't involve annex+http.
2024-12-03 14:39:07 -04:00
Joey Hess
0c08ff3d2c
deal with git's CFLR nonsense once again
Work around git hash-object --stdin-paths's odd stripping of carriage
return from the end of the line (some windows infection), avoiding crashing
when the repo contains a filename ending in a carriage return.
2024-12-02 13:47:51 -04:00
Joey Hess
31a38f8468
git-remote-annex: Require git version 2.31 or newer
Since old ones had a buggy git bundle command.

In particular, git 2.30.2 has a git bundle that supports --stdin, but does
not read from it, and so fails to create a bundle.

While not using --stdin would perhaps work, it limits the number of revs
that get included in the bundle to the command line length limit.

But the real kicker is that at the same time --stdin got fixed, a bug also
got fixed that made git bundle skip including refs when they had the same
sha as other refs it included. Which would lead to data loss. So best to
avoid that buggy thing.
2024-11-20 15:00:17 -04:00
Joey Hess
75b3f0eb75
fix build with old base
i386ancient has a base too old for NE.singleton
2024-09-30 11:02:08 -04:00
Joey Hess
4ca3d1d584
remove read of the heads
and one tail

Removed head from Utility.PartialPrelude in order to avoid the build
warning with recent ghc versions as well.
2024-09-26 18:43:59 -04:00
Joey Hess
43f31121a5
Git: use NonEmpty in fullconfig
This is a nice win. Avoids partial functions, by encoding at the type
level the fact that fullconfig is never an empty list.
2024-09-26 17:54:36 -04:00
Joey Hess
343c87db45
improve haddocks 2024-08-13 15:05:49 -04:00
Joey Hess
48657405c6
cache credentials for p2phttp in memory 2024-07-23 18:45:02 -04:00
Joey Hess
bbf261487d
add git-annex updatecluster command
Seems to work fine, making the right changes to the git-annex branch.
2024-06-14 15:02:01 -04:00
Joey Hess
0ffb0a4d25
dash is legal in git remote names 2024-06-12 13:24:31 -04:00