Commit graph

3069 commits

Author SHA1 Message Date
Joey Hess
16d7432a2f
prevent deadlock when reconcileStaged runs restagePointerFiles
Fix hang that could occur when using git-annex adjust on a branch with a
number of files greater than annex.queuesize. Or potentially other
commands.

When reconcileStaged is running, the database is being opened. But
restagePointerFiles closes the database, and later writes to it. So it will
deadlock if called by reconcileStaged.

The deadlock occurred when the git queue happened to be full, so that
adding a call to restagePointerFiles to it flushed the queue and ran
restagePointerFiles at the wrong time.

Fixed by making reconcileStaged, when it populates or depopulates a pointer
file, arrange for restagePointerFiles to be run as a cleanup action, rather
than from the git queue.

But, what if restagePointerFiles is already in the git queue before
reconcileStaged is run? If it adds anything else to the git queue, causing
the queue to flush, it would still deadlock. To avoid this hypothetical
situation, added an Annex.inreconcilestaged, and made restagePointerFiles
check it and not do anything.

Note that, I did consider the simpler approach of only running
restagePointerFiles as a cleanup action, rather than from the git queue.
But see commit 6a3bd283b8 for why it was made
to use the queue in the first place. I wanted to avoid tying this bug fix
to a behavior change.
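
The interaction described above can be modelled in a few lines of Python. This is only an illustrative sketch, not git-annex's Haskell code: `ActionQueue`, `State`, `restage`, `reconcile`, and `run_cleanup` are hypothetical names standing in for the git queue, the Annex state, restagePointerFiles, reconcileStaged, and the cleanup actions.

```python
class ActionQueue:
    """Bounded queue that runs every queued action once it fills up."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.pending = []

    def add(self, action):
        self.pending.append(action)
        if len(self.pending) >= self.max_size:
            self.flush()

    def flush(self):
        actions, self.pending = self.pending, []
        for action in actions:
            action()


class State:
    """Hypothetical stand-in for the parts of the Annex state used here."""

    def __init__(self, queue_size):
        self.queue = ActionQueue(queue_size)
        self.in_reconcile = False   # models Annex.inreconcilestaged
        self.cleanup_actions = []
        self.log = []


def restage(state):
    # Guard: do nothing if a queue flush runs us in the middle of reconcile.
    if state.in_reconcile:
        state.log.append("restage skipped")
        return
    state.log.append("restage ran")


def reconcile(state, n_files):
    state.in_reconcile = True
    try:
        for i in range(n_files):
            # Adding to the queue may flush it, running older queued actions.
            state.queue.add(lambda i=i: state.log.append(f"update {i}"))
        # Defer restaging to a cleanup action instead of queueing it.
        state.cleanup_actions.append(lambda: restage(state))
    finally:
        state.in_reconcile = False


def run_cleanup(state):
    actions, state.cleanup_actions = state.cleanup_actions, []
    for action in actions:
        action()
```

In this model, a `restage` that was already queued before `reconcile` starts is skipped when the queue flushes, and the deferred one runs afterwards from `run_cleanup`.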

Sponsored-by: mycroft
2025-09-22 14:56:50 -04:00
Joey Hess
dfbf76e2ca
enableremote: Disallow using type= to attempt to change the type of an existing remote
Changing the type out from under an existing special remote exposes the
existing config to something that may interpret it wildly differently. As
seen in the bug report, this can even result in behavior that makes
git-annex say it's buggy. So prevent the user from doing this. --sameas is
the better way.

Sponsored-by: Kevin Mueller
2025-09-22 10:54:16 -04:00
Joey Hess
2b1e9eced2
open feed file with close-on-exec bit set
parseFeedFromFile does not set the bit, so open and read the file
ourselves.
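
As a rough illustration of "open and read the file ourselves", here is a minimal Python sketch for POSIX systems. It is not the Haskell code from this commit, and `read_file_cloexec` is a hypothetical helper name. Python already opens descriptors as non-inheritable by default, so the explicit `O_CLOEXEC` is there only to make the idea visible.

```python
import fcntl
import os


def read_file_cloexec(path):
    """Read a whole file through a descriptor opened with O_CLOEXEC."""
    fd = os.open(path, os.O_RDONLY | os.O_CLOEXEC)
    # Check that the close-on-exec flag really is set on the descriptor.
    assert fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC
    with os.fdopen(fd, "rb") as handle:
        return handle.read()
```

The returned bytes can then be handed to whatever parser is in use, instead of letting the parser open the file itself.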

A versioned dependency on utf8-string should not cause any issues;
that version is available in all versions of Debian that package it.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2025-09-05 16:02:17 -04:00
Joey Hess
6f9a9c81f6
convert all readFile, writeFile, and appendFile to close-on-exec safe versions
Even in the Build system. This allows grepping to make sure that there
are none left un-converted:

git grep "writeFile" |grep -v F\\.| grep -v doc/|grep -v writeFileString | grep -v writeFileProtected |grep -v Utility/FileIO
git grep "readFile" |grep -v F\\.| grep -v doc/|grep -v readFileString |grep -v Utility/FileIO
git grep "appendFile" |grep -v F\\.| grep -v doc/|grep -v appendFileString |grep -v Utility/FileIO

Might be nice to automate that to prevent future mistakes...
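
One possible way to automate the check is sketched below in Python. It mirrors the `grep -v` filters above on lines of `git grep` output; `unconverted_uses` is a hypothetical helper, and the sample lines in the usage note are made up for illustration.

```python
def unconverted_uses(grep_lines, name, extra_allowed=()):
    """Return the `git grep` output lines that still use a bare `name`.

    Lines are dropped when they contain any of the markers that the
    grep pipeline excludes: a qualified `F.` call, the doc/ directory,
    the `<name>String` variant, or Utility/FileIO itself.
    """
    excluded = ("F.", "doc/", name + "String", "Utility/FileIO")
    excluded += tuple(extra_allowed)
    return [
        line
        for line in grep_lines
        if name in line and not any(marker in line for marker in excluded)
    ]
```

For example, `unconverted_uses(lines, "writeFile", ("writeFileProtected",))` keeps a line such as `Bar.hs:  writeFile f s` and drops `Foo.hs:  F.writeFile f s`. A non-empty result could be made to fail a build or test run.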

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2025-09-05 15:44:32 -04:00
Joey Hess
033e4b086f
audit all openFd and dupping for close-on-exec
Made all uses of openFd and dup set the close-on-exec flag, with a few
exceptions when starting a git-annex daemon.

Made openFdWithMode be used everywhere, rather than openFd.
Adding a new parameter to it ensures I checked everything.
And will help to make sure this gets considered in the future when
opening fds.

In lockPidFile, the only thing that keeps the pid file locked, once
daemonize re-runs the command in a new session, is that the fd is
inherited.

In Utility.LogFile.redir, the new fd it dups to does not have the
close-on-exec flag set, because this is used to set up the stdout and
stderr fds, which need to be inherited by child processes.

Same in Assistant.startDaemon where the browser gets started with the
original stdout and stderr.
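
The distinction between the two kinds of dup can be illustrated with a small Python sketch. This is not the git-annex code; `dup_private` and `dup_for_children` are hypothetical names for "dup with close-on-exec" and "dup that child processes should inherit" (as with the stdout and stderr redirection).

```python
import os


def dup_private(fd):
    """Duplicate fd; the copy is closed automatically on exec."""
    return os.dup(fd)  # os.dup returns a non-inheritable descriptor


def dup_for_children(fd):
    """Duplicate fd so that child processes inherit the copy."""
    new_fd = os.dup(fd)
    os.set_inheritable(new_fd, True)  # clears the close-on-exec flag
    return new_fd
```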

This does nothing about uses of openFile and similar!

Sponsored-By: mycroft
2025-09-04 16:01:41 -04:00
Joey Hess
146d224c63
drop: --fast support when dropping from a remote
This is the same as --not --in $remote, but easier to type. And the
documentation of --fast helps also document that drop can do extra work
when used without --fast.

Sponsored-by: Nicholas Golder-Manning
2025-08-29 12:45:33 -04:00
Joey Hess
2a0ec700af
remove youtube-dl support, always use yt-dlp
The annex.youtube-dl-command git config is no longer used, git-annex always
runs the yt-dlp command, rather than the old youtube-dl command.

Sponsored-by: Leon Schuermann
2025-08-27 09:29:43 -04:00
Joey Hess
75be161574
remove git version check for adjusted branch
2686d2d7ea made git older than 2.5 not be
supported, so this check for an older version is no longer needed.

Sponsored-by: Kevin Mueller
2025-08-21 11:12:36 -04:00
Joey Hess
2686d2d7ea
Removed support for git versions older than 2.5.
This entirely removes Git.BuildVersion, which avoids the possibility that
git-annex will behave differently based on the version of git it was built
with, rather than the version it's used with.

The git in Debian oldoldstable is the oldest version of git that git-annex
needs to support, since it's used in the amd64ancient build.

cabal configure will fail if the git version is too old.

Sponsored-by: Nicholas Golder-Manning
2025-08-21 11:04:26 -04:00
Joey Hess
0924a45cc4
info: Added --show option
To pick which parts of the info to calculate and display.

Sponsored-by: Dartmouth College's DANDI project
2025-08-13 16:49:21 -04:00
Joey Hess
d3fbda13e4
p2p --enable
p2p: Added --enable option, which can be used to enable P2P networks
provided by external commands git-annex-p2p-<netname>

Made git-annex p2p --enable tor behave the same as git-annex enable-tor,
to make tor a bit less of a special case. However, it cannot be run as root,
since it cannot take the user id parameter.
2025-07-30 14:08:59 -04:00
Joey Hess
a6f8248465
add connProcess to P2PConnection
When using the new generic P2P transport to open an outgoing connection
to a peer, this will hold the pid of the git-annex-p2p-<netname>
command.

closeConnection simply waits for it, rather than relying on garbage
collection of the closed handles to close it.

In Remote.Helper.Ssh, connProcess is set to Nothing, even though there
is a similar process being used there. That code stores the pid in
OpenConnection instead, and handles waiting for it itself. A bit ugly,
but not worth cleaning up at this point, maybe later.
2025-07-30 12:35:16 -04:00
Joey Hess
f631bc9e56
add P2PAnnex constructor
This is for p2p-annex:: urls that will use the new generic P2P
transport.

In addressCredsFile, threw in an url encoding of any non-alphanumeric
characters that are in the address. This is to avoid any possible path
traversal attacks via a p2p-annex:: url, since the address part of it
could contain any characters. And, went ahead and did the same url
encoding of tor-annex:: urls, even though tor onion addresses are all
alphanumerics, on the off chance that might avoid a similar problem.
(It does not seem likely enough to treat it as a security hole.)
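
A minimal Python sketch of the idea follows. The exact encoding used by addressCredsFile is not shown in this message, so the percent-encoding of every non-alphanumeric byte below is an assumption, and `encode_address` is a hypothetical name.

```python
def encode_address(address):
    """Percent-encode every byte that is not an ASCII letter or digit."""
    out = []
    for byte in address.encode("utf-8"):
        char = chr(byte)
        if char.isascii() and char.isalnum():
            out.append(char)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)
```

With this, an address like `../../etc/passwd` becomes `%2E%2E%2F%2E%2E%2Fetc%2Fpasswd`, which contains no path separators or dots, while a purely alphanumeric address is left unchanged.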
2025-07-30 12:09:17 -04:00
Joey Hess
ba24f78626
fix build with OsPath build flag
2025-07-21 12:26:45 -04:00
Joey Hess
758515dc9a
fsck: Fix location of annexed files when run in linked worktrees
This cleans up after the bug that was fixed in commit
6a9e923c74
Object files that were stored in the wrong location are rescued,
and after that any wrong location logs will be fixed by the usual fsck.
2025-07-15 13:09:45 -04:00
Joey Hess
ef30fa2fa9
support combining --socket with HTTPS
Might be useful when proxying? Dunno.
2025-07-07 16:41:19 -04:00
Joey Hess
492c484a82
p2phttp: Added --socket option
Used protectedOutput to set up a umask that makes the socket only
accessible by the current user.
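
The umask technique can be sketched in Python as follows. This is a generic POSIX illustration, not the protectedOutput implementation; `bind_private_socket` is a hypothetical name.

```python
import os
import socket


def bind_private_socket(path):
    """Bind a unix socket that only the current user can access."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old_umask = os.umask(0o077)  # strip all group and other permissions
    try:
        sock.bind(path)  # the socket file is created under the umask
    except OSError:
        sock.close()
        raise
    finally:
        os.umask(old_umask)  # always restore the previous umask
    return sock
```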

Authentication is still needed when using this option unless it is combined
with --wideopen. It was just simpler to keep authentication separate from
this.
2025-07-07 16:40:02 -04:00
Joey Hess
66b009a0f6
p2phttp: Scan multilevel directories with --directory
This allows for e.g. a dir/user/repo structure, but also other layouts. It
still does not look for repositories that are nested inside other
repositories.

The check for symlinks is mostly to avoid cycles that would prevent
findRepos from returning. Eg, foo/bar/baz being a symlink to foo/bar.

If the directory is writable by someone else they can still race it and
get it to follow a symlink to some other directory. I don't think p2phttp
needs to worry about that kind of situation though, and I doubt it avoids
such problems when operating on files in a git-annex repository either.
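
The scanning rules described above (descend through multiple levels, skip symlinks, do not look inside a repository once one is found) can be sketched in Python. This is not the findRepos implementation; `find_repos` and `is_repo` are hypothetical, and detecting a repository by the presence of a `.git` directory is a simplifying assumption.

```python
import os


def is_repo(path):
    """Assumption for this sketch: a repository has a .git directory."""
    return os.path.isdir(os.path.join(path, ".git"))


def find_repos(top):
    """Return repositories found at any depth below top."""
    found = []
    for entry in sorted(os.scandir(top), key=lambda e: e.name):
        # Skipping symlinks avoids cycles like foo/bar/baz -> foo/bar.
        if entry.is_symlink() or not entry.is_dir(follow_symlinks=False):
            continue
        if is_repo(entry.path):
            found.append(entry.path)  # do not look for nested repositories
        else:
            found.extend(find_repos(entry.path))
    return found
```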
2025-07-07 16:07:13 -04:00
Joey Hess
46ee651c94
non-tor AuthTokens
As groundwork for making git-annex p2p support other P2P networks than
tor hidden services, when an AuthToken is not a TorAnnex value, but
something else (that will be added later), store the P2PAddress that it
will be used with along with the AuthToken. And in loadP2PAuthTokens,
only return AuthTokens for the specified P2PAddress.

See commit 2de27751d6 for some design work
that led to this.

Also, git-annex p2p --gen-addresses is changed to generate a separate
AuthToken for every P2P address, rather than generating a single
AuthToken and using it for every one. When we have more than just tor,
this will be important for security, to avoid a compromise of one P2P
network exposing the AuthToken used for another network.
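
The per-address token scheme can be sketched in Python. This is an illustrative model only: `gen_auth_tokens` and `load_auth_tokens` are hypothetical names, the in-memory dict stands in for however the tokens are actually stored, and the token format is an assumption.

```python
import secrets


def gen_auth_tokens(addresses):
    """Generate an independent random token for each P2P address."""
    return {address: secrets.token_hex(16) for address in addresses}


def load_auth_tokens(store, address):
    """Return only the tokens recorded for the given address."""
    return [token for addr, token in store.items() if addr == address]
```

Because each address gets its own token, a token leaked from one network is not accepted when loading tokens for another address.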
2025-07-07 15:10:15 -04:00
Joey Hess
9f4e956346
sync: push current branch first
sync: Push the current branch first, rather than a synced branch, to better
support git forges (gitlab, gitea, forgejo, etc.) which use push-to-create
with the first pushed branch becoming the default branch.

This required considerable complication to filter out the warning message
about receive.denyCurrentBranch when pushing to a non-bare repository.
Localization may break it in the future, but it seems like the best way to
handle this. See my comments for the gory details.
2025-06-04 12:06:00 -04:00
Joey Hess
f167e7f55b
adjust annex.synccontent transition warning
sync will also be changing to drop unwanted content by default, this
wording change avoids leaving the wrong impression
2025-05-30 14:30:01 -04:00
Joey Hess
f6eac67f0e
rename repoName to repoDesc
That's what the function mostly is; if it shows a remote name, it's only
in an edge case where that is the best description of it available.
2025-05-29 12:55:40 -04:00
Joey Hess
2fad57de44
fix display of remote name in json
Also fixes it in the graphviz map in some cases, where there is no
description for a repository.

And in json, use the remote name, never the description, since the field
is "remote" which is intended to be the git remote name.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2025-05-29 12:53:42 -04:00
Joey Hess
a44638ca73
adjust json field names
Avoid using "name" for what git-annex otherwise refers to as a
description.

(For the remotes in the map, the "remote" field should be the remote
name, but there is a bug preventing it from being that.)

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2025-05-29 12:42:53 -04:00
Joey Hess
52a8b5b117
map: Support --json option
Sponsored-by: Dartmouth College's OpenNeuro project
2025-05-28 14:17:28 -04:00
Joey Hess
286a681b57
remove dangling where
2025-05-20 09:37:33 -04:00
Joey Hess
e64e9d5fae
whereused: Fix bug that could find matches from grafts in remote git-annex branches
git log with --remotes= needs the preceding --exclude=*/git-annex in order
to not look at git-annex branches of remotes.

Sponsored-by: mycroft
2025-05-05 14:32:25 -04:00
Joey Hess
2ee6c25c72
map: Fix buggy handling of remotes that are bare git repositories accessed via ssh
It was treating remote paths of a remote repo as if they were local paths,
and so trying to expand git directories and so forth on them. That led to
bad results, including a path like "foo.git" getting turned into
"foo.git.git"

Sponsored-by: Dartmouth College's OpenNeuro project
2025-04-22 15:21:01 -04:00
Joey Hess
7b3d7a8f78
fix message
also dead code removal
2025-04-22 13:36:54 -04:00
Joey Hess
7fb413189a
migrate: Fix --remove-size to work when a file is not present
5f74a45861 added this bug
2025-04-01 10:47:31 -04:00
Joey Hess
e81fd72018
Added remote.name.annex-web-options config
Which is a per-remote version of the annex.web-options config.

Had to plumb RemoteGitConfig through to getUrlOptions. In cases where a
special remote does not use curl, there was no need to do that and I used
Nothing instead.

In the case of the addurl and importfeed commands, it seemed best to say
that running these commands is not using the web special remote per se,
so the config is not used for those commands.
2025-04-01 10:17:38 -04:00
Joey Hess
cc8f7e9776
fsck: Avoid complaining about required content of dead repositories
requiredContentMap does not exclude dead repos. Usually this is not a
problem because it is used when we are operating on a repository, and in
that case, the repository is not dead (or if it is, the required content
configurations should still be used). But in the case of fsck, this made an
old required content config for a dead repository be warned about in a
situation where it is not a problem.
2025-03-26 10:30:33 -04:00
Joey Hess
d0b5a09b0e
deal with NoUUID in checkCanProxy
updatecluster, updateproxy: When a remote that has no annex-uuid is
configured as annex-cluster-node, warn and avoid writing bad data to the
git-annex branch.

The proxy.log and cluster.log end up unparseable when a NoUUID gets written
to them.
2025-03-21 12:29:44 -04:00
Joey Hess
74457b6b93
findcomputed --inputs
Useful for eg, generating dependency graphs.
2025-03-19 15:39:05 -04:00
Joey Hess
bcfd554a0f
findcomputed: New command, displays information about computed files.
2025-03-18 12:55:48 -04:00
Joey Hess
d74d2d5d91
--json for addcomputed and recompute
Not very useful, but it does work.
2025-03-17 15:51:43 -04:00
Joey Hess
2d60ce4803
record fscked files in fsck db by default
Remember the files that are checked, so a later run with --more will
skip them, without needing to use --incremental.
2025-03-17 15:34:08 -04:00
Joey Hess
23538ea17b
annex.addunlocked support for git-annex compute
And for git-annex recompute, add the file unlocked when the original is
unlocked.
2025-03-17 14:26:09 -04:00
Joey Hess
a673fc7cfd
recompute: stage new version of file in git
When writing doc/tips/computing_annexed_files.mdwn, I noticed
that a recompute --reproducible followed by a drop and a re-get did not
actually test if the file could be reproducibly computed again.

Turns out that get and drop both operate on staged files. If there is an
unstaged modification in the work tree, that's ignored. Somewhat
surprisingly, other commands like info do operate on unstaged files. So
behavior is inconsistent, and fairly surprising really, when there are
unstaged modifications to files.

Probably this is rarely noticed because `git-annex add` is used to add a
new version of a file, and then it's staged. Or `git mv` is used to move
a file, rather than `mv` of a file over top of an existing file. So it's
uncommon to have an unstaged annexed file in a worktree.

It might be worth making things more consistent, but that's out of scope
for what I'm working on currently.

Also, I anticipate that supporting unlocked files with recompute will
require it to stage changes anyway.

So, make recompute stage the new version of the file.

I considered having recompute refuse to overwrite an existing staged
file. After all, whatever version was staged before will get lost when
the new version is staged over top of it. But, that's no different than
`git-annex addcomputed` being run with the name of an existing staged
file. Or `git-annex add` being run with a new file content when there is
an existing staged file. Or, for that matter, `git add` being run with a
new content when there is an existing staged file.
2025-03-12 13:42:00 -04:00
Joey Hess
0712ae020c
fix recompute --reproducible run on a VURL key
This avoids "Cannot generate a key for backend VURL", and makes it use
the usual hashing backend.
2025-03-12 11:48:29 -04:00
Joey Hess
0477a8d098
add INPUT-REQUIRED
Used by git-annex-compute-singularity to make addcomputed --fast work.

Also, simplified git-annex-compute-singularity; there is no need to hard
link the container into place. singularity does not care about the
extension of the container, so can just pass it the annex object file.
2025-03-11 11:46:31 -04:00
Joey Hess
c6c6e2632d
avoid unnecessary git-annex branch changes for recompute and addcomputed
2025-03-06 12:41:30 -04:00
Joey Hess
ccc454a791
computation progress display
2025-03-05 13:46:06 -04:00
Joey Hess
51538fa0a8
improve error message when unable to get an input file
In this case, the compute program is run the same as if addcomputed --fast
were used, so it should succeed, without outputting a computed file.

computeInputsUnavailable is in ComputeState for simplicity, but it is
not serialized with the rest of the ComputeState.
2025-03-04 13:13:18 -04:00
Joey Hess
b395bd4f56
move showOutput into compute remote
2025-03-04 10:02:33 -04:00
Joey Hess
89bfeada87
recompute: display one of the changed files
2025-03-03 15:12:19 -04:00
Joey Hess
b01a0d2323
avoid recomputing every time on git inputs
2025-03-03 14:56:49 -04:00
Joey Hess
a0d6a6ea2a
support git files as input to computations
Using GIT keys, like those used when exporting git files to special
remotes. Except here the GIT key refers to a file checked into the git
repo.

Note that, since the compute remote uses catObject to get the content,
a symlink that is checked into git does not get followed. This is important
for security, because following a symlink and adding the content to the
repo as an annex object would allow exfiltrating content from outside
the repository.

Instead, the behavior with a symlink is to run the computation on the
symlink target. This may turn out to be confusing, and it might be worth
addcomputed checking if the file in git is a symlink and erroring out.
Or it could follow symlinks as long as the destination is a file in the
repository.
2025-03-03 12:09:25 -04:00
Joey Hess
6ebab7fb00
factor out Annex.GitShaKey
2025-03-03 11:09:28 -04:00
Joey Hess
63d73d8d1b
record VURL key hashes in addcomputed and recompute
2025-03-03 10:57:56 -04:00