Commit graph

725 commits

Author SHA1 Message Date
Joey Hess
73060eea51
annex.fastcopy
Added annex.fastcopy and remote.name.annex-fastcopy config setting. When
set, this allows the copy_file_range syscall to be used, which can eg allow
for server-side copies on NFS. (For fastest copying, also disable
annex.verify or remote.name.annex-verify.)

This is a simple implementation that does not handle resuming as well as
it could.

It can be used with both local git remotes (including on NFS), and
directory special remotes. Other types of remotes could in theory also
support it, so I've left the config documented as a general thing.
2025-06-03 15:01:38 -04:00
Joey Hess
9024d8e2d1
fixes for enabling and autoenabling mask special remotes 2025-04-11 13:18:23 -04:00
Joey Hess
1313cc4d60
mask remotes, partial implementation
Everything implemented except for passing through to the masked remote.
Which should be trivial.
2025-04-10 13:10:07 -04:00
Joey Hess
e81fd72018
Added remote.name.annex-web-options config
Which is a per-remote version of the annex.web-options config.

Had to plumb RemoteGitConfig through to getUrlOptions. In cases where a
special remote does not use curl, there was no need to do that and I used
Nothing instead.

In the case of the addurl and importfeed commands, it seemed best to say
that running these commands is not using the web special remote per se,
so the config is not used for those commands.
2025-04-01 10:17:38 -04:00
Joey Hess
83163ae08a
typo 2025-03-26 11:15:58 -04:00
Joey Hess
bcfd554a0f
findcomputed: New command, displays information about computed files. 2025-03-18 12:55:48 -04:00
Joey Hess
52f51d065a
rename config to annex.security.allowed-compute-programs
And require for enable as well as autoenable.

It seemed asking for trouble for `git-annex enable foo` to use whatever
compute program is stored in the git config, without verifying that the
user wants that program to be used.

Note that it would be good to allow `git-annex enable foo program=...`
to be used without the program being in the git config. Not implemented yet
though.
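
The check described above amounts to membership in a configured list. The following is a minimal sketch, assuming the config value is a whitespace-separated list of program names; the function names and the git-annex-compute-foo style program names are hypothetical, not git-annex's own code.

```haskell
-- Hypothetical sketch: may a compute special remote use this program?
-- Assumes annex.security.allowed-compute-programs holds a
-- whitespace-separated list of program names.

-- | Nothing means the config is unset, so no program is allowed.
allowedComputePrograms :: Maybe String -> [String]
allowedComputePrograms = maybe [] words

-- | The same check applies to `git-annex enable` and to autoenable.
mayUseComputeProgram :: Maybe String -> String -> Bool
mayUseComputeProgram cfg program = program `elem` allowedComputePrograms cfg
```

Defaulting to an empty list when the config is unset means a program is only used after the user has explicitly listed it.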
2025-03-03 16:12:03 -04:00
Joey Hess
f32d2aecce
autoenable security for compute special remote
Added annex.security.autoenable-compute-programs and only allow
autoenabling special remotes that use compute programs on that list.

The reason this is needed is that a user might have some compute programs
that are less safe to use than others. They might want to use an unsafe
one only with one repository, where they are the only committer or other
committers are trusted. They might be ok with others being used by any
repository, and if so they can add them to the list.

Another reason would be a user who has installed a compute program by
accident. Eg, it might be included with git-annex at some point, or
pulled in by some dependency. That user doesn't necessarily want that
compute program to be used in an autoenabled special remote.
2025-03-03 15:52:56 -04:00
Joey Hess
c1b53dbbd0
wip 2025-02-20 13:27:47 -04:00
Joey Hess
b5319ec575
documentation for compute remote and associated commands
None of this is implemented yet.
2025-02-19 14:29:18 -04:00
matrss
eab8aec4f0 2025-01-30 14:50:58 +00:00
Joey Hess
42d55bc57c
pre-init config and hook
Added annex.pre-init-command git config and pre-init-annex hook that is run
before git-annex repository initialization.

This can block initialization. Or it can perform pre-initialization
configuration or tweaking.

I left stdio connected while it's running, so it could conceivably also be
used for interactive prompting, although that would probably want to use
/dev/tty anyway, in order to not pollute the stdout of a command when
automatic initialization is done.

Sponsored-by: Dartmouth College's OpenNeuro project
2025-01-13 14:22:49 -04:00
Joey Hess
ce49caec60
document files 2025-01-13 13:14:12 -04:00
Joey Hess
a73fa77417
added hooks corresponding to annex.*-command
* Added freezecontent-annex and thawcontent-annex hooks that
  correspond to the git configs annex.freezecontent-command and
  annex.thawcontent-command.
* Added secure-erase-annex hook that corresponds to the git config
  annex.secure-erase-command.
* Added commitmessage-annex hook that corresponds to the git config
  annex.commitmessage-command.
* Added http-headers-annex hook that corresponds to the git config
  annex.http-headers-command.
* Added git configs annex.post-update-command and annex.pre-commit-command
  that correspond to the post-update-annex and pre-commit-annex hooks.

The use case for these is eg setting up a git repository that is run in a
container, where the easiest way to provide a script is by putting it in
.git/hooks/, rather than copying it into the container in a way that puts
it in PATH.

These are all the ones that make sense to add for annex.*-command git configs.
annex.youtube-dl-command is not a hook, it's telling git-annex what command
to run. So is annex.shared-sop-command. So omitted those.

May later also want to add hooks corresponding to
`remote.<name>.annex-cost-command` etc.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2025-01-10 14:54:42 -04:00
Joey Hess
5df1b2b36e
configs annex.post-update-command and annex.pre-commit-command
Added git configs annex.post-update-command and annex.pre-commit-command
that correspond to the git-annex hook scripts post-update-annex and
pre-commit-annex.

Note that the hook files take precedence over the git config, since the git
config can include global config which should be overridden by local config.
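
The precedence rule above can be sketched as a small pure function. This is a minimal illustration, assuming the caller has already checked whether the hook file exists; the HookAction type and function name are hypothetical.

```haskell
-- Hypothetical sketch: a hook file in .git/hooks/ wins over the
-- corresponding annex.*-command git config.
data HookAction
  = RunHookFile FilePath    -- ^ eg .git/hooks/post-update-annex
  | RunShellCommand String  -- ^ value of eg annex.post-update-command
  deriving (Eq, Show)

-- | The Bool says whether the hook file exists. When it does not, fall
-- back to the git config, if that is set.
chooseHookAction :: Bool -> FilePath -> Maybe String -> Maybe HookAction
chooseHookAction hookExists hookFile configValue
  | hookExists = Just (RunHookFile hookFile)
  | otherwise = RunShellCommand <$> configValue
```

Because the hook file is always local to the repository, checking it first means a global git config can never override it.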

These new git configs are probably not super useful. In particular, the
pre-commit-annex hook exists as a place to install scripts instead of the
pre-commit hook, since git-annex installs that hook itself. So why would
someone want to use a git config for that? Only reason I can think of would
be in a global git config. Or possibly because it's easier to set a git
config than write a hook script, on an OS like Windows.

The real reason I'm adding these is as groundwork for making other
annex.*-command git configs also be available as hook scripts. I want
to avoid having some things available as only git hooks and others as
both git configs and git hooks. (It seems that some annex.*-command configs
don't translate to git hooks though.)

In the man page, moved documentation of the hooks to be next to the
documentation of the git configs. This is to avoid repetition.
2025-01-10 13:27:51 -04:00
Joey Hess
dd052dcba1
annexInsteadOf config
Added config `url.<base>.annexInsteadOf` corresponding to git's
`url.<base>.pushInsteadOf`, to configure the urls to use for accessing the
git-annex repositories on a server without needing to configure
remote.name.annexUrl in each repository.

While one use case for this would be rewriting urls to use annex+http,
I decided not to add any kind of special case for that. So while
git-annex p2phttp, when serving multiple repositories, needs an url
of eg "annex+http://example.com/git-annex/" for each of them, rewriting an
url like "https://example.com/git/foo/bar" with this config set to
"https://example.com/git/" will result in eg
"annex+http://example.com/git-annex/foo/bar", which p2phttp does not
support.

That seems better dealt with in either git-annex p2phttp or an http
middleware, rather than complicating the config with a special case for
annex+http.
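
The rewrite in the example above is a plain prefix substitution. The following is a minimal sketch, assuming annexInsteadOf follows git's documented insteadOf rule that the longest matching prefix wins; it is not git-annex's implementation, and the names are hypothetical.

```haskell
import Data.List (isPrefixOf, sortOn)
import Data.Ord (Down (..))

-- | (base, prefix) pairs, from url.<base>.annexInsteadOf = <prefix>
type InsteadOf = (String, String)

-- | Rewrite a url using the longest matching prefix. A url that matches
-- no rule is returned unchanged.
rewriteUrl :: [InsteadOf] -> String -> String
rewriteUrl rules url =
  case sortOn (Down . length . snd) matching of
    ((base, prefix) : _) -> base ++ drop (length prefix) url
    [] -> url
  where
    matching = filter (\(_, prefix) -> prefix `isPrefixOf` url) rules
```

With the rule from the example, `rewriteUrl [("annex+http://example.com/git-annex/", "https://example.com/git/")] "https://example.com/git/foo/bar"` yields "annex+http://example.com/git-annex/foo/bar": the path suffix is carried over verbatim, which is exactly why it produces a url that p2phttp does not support.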

Anyway, there are other use cases for this that don't involve annex+http.
2024-12-03 14:39:07 -04:00
Joey Hess
b8a717a617
reuse http url password for p2phttp url when on same host
When remote.name.annexUrl is an annex+http(s) url that uses the same
hostname as remote.name.url, which is itself an http(s) url, the two are
assumed to share a username and password.

This avoids unnecessary duplicate password prompts.
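
The same-host condition can be sketched as follows. This is a minimal illustration, not a full url parser (IPv6 literals and hostname case are not handled); all names are hypothetical.

```haskell
import Data.List (stripPrefix)

-- | Strip the first matching "<scheme>://" prefix.
stripScheme :: [String] -> String -> Maybe String
stripScheme schemes url =
  case [rest | s <- schemes, Just rest <- [stripPrefix (s ++ "://") url]] of
    (rest : _) -> Just rest
    [] -> Nothing

-- | Hostname from "[user[:pass]@]host[:port]/path".
hostOf :: String -> String
hostOf rest = takeWhile (/= ':') hostport
  where
    authority = takeWhile (/= '/') rest
    hostport = reverse (takeWhile (/= '@') (reverse authority))

-- | Should the credentials for the http(s) url also be used for the
-- annex+http(s) annexUrl?
sharesCredentials :: String -> String -> Bool
sharesCredentials annexUrl gitUrl =
  case ( stripScheme ["annex+http", "annex+https"] annexUrl
       , stripScheme ["http", "https"] gitUrl ) of
    (Just a, Just g) -> hostOf a == hostOf g
    _ -> False
```

Only the hostname is compared, so a different port or an embedded username in either url does not prevent the credentials from being reused.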
2024-11-19 15:27:26 -04:00
Joey Hess
b94221594b
add: When adding a dotfile as a non-large file, mention that it's a dotfile
This is to reduce user confusion when their annex.largefiles matches it,
or is not set.

Note that, when annex.dotfiles is set, but a dotfile is not matched by
annex.largefiles, the "non-large file" message will be displayed. That
makes sense because whether the file is a dotfile does not matter with that
configuration.

Also, this slightly optimised the annex.dotfiles path in passing,
by avoiding the slight slowdown caused by the check added in commit
876d5b6c6f in that case.
2024-11-13 14:09:24 -04:00
Joey Hess
876d5b6c6f
add: Consistently treat files in a dotdir as dotfiles, even when run inside that dotdir
Assistant and smudge also updated.

This does add a small amount of extra work, getting the TopFilePath.
Not enough to be concerned by.

Also improve documentation to make clear that files inside dotdirs are
treated as dotfiles.
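
The reason the TopFilePath is needed is that the check has to look at the path relative to the top of the repository, not the path relative to the current directory. The following is a minimal sketch, assuming such a top-relative path with '/' separators and no "." or ".." components; the function names are hypothetical.

```haskell
-- | A file counts as a dotfile when any component of its path relative
-- to the top of the repository, including enclosing directories,
-- starts with '.'.
isDotFile :: FilePath -> Bool
isDotFile = any isDotComponent . splitDirs
  where
    isDotComponent ('.' : _) = True
    isDotComponent _ = False

-- | Split a path on '/'.
splitDirs :: FilePath -> [String]
splitDirs p = case break (== '/') p of
  (c, []) -> [c]
  (c, _ : rest) -> c : splitDirs rest
```

When run inside .config, the cwd-relative path "foo" would not look like a dotfile, but its top-relative path ".config/foo" does.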

Sponsored-by: Eve on Patreon
2024-11-13 13:43:01 -04:00
Joey Hess
84c781d924
documentation for git-annex sim
command not implemented yet
2024-09-04 15:03:17 -04:00
Joey Hess
76ece2a699
make --rebalance of balanced use fullysizebalanced when useful
When the specified number of copies is > 1, and some repositories are
too full, it can be better to move content from them to other less full
repositories, in order to make space for new content.

annex.fullybalancedthreshhold is documented, but not implemented yet.

This is not tested very well yet, and is known to sometimes take several
runs to stabilize.
2024-08-21 17:59:08 -04:00
Joey Hess
1265d7e5df
implement maxsize log and command
* maxsize: New command to tell git-annex how large the expected maximum
  size of a repository is.
* vicfg: Include maxsize configuration.
2024-08-11 15:41:26 -04:00
Joey Hess
4750ffbd3b
finalized design for proxying to exporttree=yes annexobjects=yes special remotes 2024-08-06 11:45:45 -04:00
Joey Hess
bc9cc79e85
set remote's annexUrl automatically
When the remote repository's git config file
has annex.url set to an annex+http url.
2024-07-28 20:13:41 -04:00
Joey Hess
a6a03ca586
annex+http urls 2024-07-23 08:42:33 -04:00
Joey Hess
86ce3bf1e4
started servant implementation of HTTP P2P protocol 2024-07-07 12:08:10 -04:00
Joey Hess
542de0c0c4
document proxying to special remotes 2024-07-01 11:33:55 -04:00
Joey Hess
07e899c9d3
git-annex-shell: proxy nodes located beyond remote cluster gateways
Walking a tightrope between security and convenience here, because
git-annex-shell needs to only proxy for things when there has been
an explicit, local action to configure them.

In this case, the user has to have run `git-annex extendcluster`,
which now sets annex-cluster-gateway on the remote.

Note that any repositories that the gateway is recorded to
proxy for will be proxied onward. This is not limited to cluster nodes,
because checking the node log would not add any security; someone could
add any uuid to it. The gateway of course then does its own
checking to determine if it will allow proxying for the remote.
2024-06-26 12:56:16 -04:00
Joey Hess
0b72b85df5
added git-annex extendcluster
This works, but updatecluster does not work yet in multi-gateway
clusters, nor do gateways relay to other gateways.
2024-06-26 10:26:54 -04:00
Joey Hess
b8016eeb65
add annex-proxied
This makes git-annex sync and similar not treat proxied remotes as git
syncable remotes.

Also, display in git-annex info remote when the remote is proxied.
2024-06-24 10:16:59 -04:00
Joey Hess
570ceffe8d
broke out initcluster
One benefit of this is that a typo in annex-cluster-node config won't
init a new cluster.

Also it gets the cluster description set and is consistent with
initremote.
2024-06-14 17:23:11 -04:00
Joey Hess
bbf261487d
add git-annex updatecluster command
Seems to work fine, making the right changes to the git-annex branch.
2024-06-14 15:02:01 -04:00
Joey Hess
2844230dfe
add git configs for clusters 2024-06-14 12:20:17 -04:00
Joey Hess
f97f4b8bdb
Added updateproxy command and remote.name.annex-proxy configuration
So far this only records proxy information on the git-annex branch.
2024-06-04 14:52:03 -04:00
Joey Hess
2ffe077cc2
git-remote-annex: brought back max-git-bundles config
An incremental push that gets converted to a full push due to this
config results in the inManifest having just one bundle in it, and the
outManifest listing every other bundle. So it actually takes up more
space on the special remote. But, it speeds up clone and fetch to not
have to download a long series of bundles for incremental pushes.
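
The effect on the manifests described above can be modelled as follows. This is a minimal sketch with bundle names as opaque strings; the Manifest type, the PushKind type, and both functions are hypothetical illustrations, not git-remote-annex's code.

```haskell
-- | Bundles a clone must download, and bundles no longer needed.
data Manifest = Manifest
  { inManifest :: [String]
  , outManifest :: [String]
  } deriving (Eq, Show)

data PushKind = IncrementalPush | FullPush
  deriving (Eq, Show)

-- | With max-git-bundles set to n, an incremental push that would leave
-- more than n bundles in the inManifest becomes a full push.
choosePush :: Maybe Int -> Manifest -> PushKind
choosePush (Just n) m | length (inManifest m) + 1 > n = FullPush
choosePush _ _ = IncrementalPush

-- | A full push leaves just one bundle in the inManifest and moves every
-- other bundle to the outManifest.
applyPush :: PushKind -> String -> Manifest -> Manifest
applyPush IncrementalPush b m = m { inManifest = inManifest m ++ [b] }
applyPush FullPush b m = Manifest [b] (outManifest m ++ inManifest m)
```

In this model, a clone only ever downloads the bundles in the inManifest, so the full push trades extra space on the special remote for a shorter download.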
2024-05-28 13:28:19 -04:00
Joey Hess
3e7324bbcb
only delete bundles on pushEmpty
This avoids some apparently otherwise unsolvable problems involving
races that resulted in the manifest listing bundles that were deleted.

Removed the annex-max-git-bundles config because it can't actually
result in deleting old bundles. It would still be possible to have a
config that controls how often to do a full push, which would avoid
needing to download too many bundles on clone, as well as needing to
checkpresent too many bundles in verifyManifest. But it would need a
different name and description.
2024-05-21 11:13:27 -04:00
Joey Hess
7dd2a67c41
fix names of new git configs 2024-05-14 15:33:47 -04:00
Joey Hess
23c4125ed4
mention other commands shipped with git-annex in SEE ALSO in man page 2024-05-14 15:23:45 -04:00
Joey Hess
0bf72ef103
max-git-bundles config for git-remote-annex 2024-05-14 14:23:40 -04:00
Joey Hess
6f1039900d
prevent using git-remote-annex with unsuitable special remote configs
I hope to support importtree=yes eventually, but it does not currently
work.

Added remote.<name>.allow-encrypted-gitrepo that needs to be set to
allow using it with encrypted git repos.

Note that even encryption=pubkey uses a cipher stored in the git repo
to encrypt the keys stored in the remote. While it would be possible to
not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using
encryption=pubkey, it doesn't currently work, and that would be a
complication that I doubt is worth it.
2024-05-14 13:52:20 -04:00
Joey Hess
ff5193c6ad
Merge branch 'master' into git-remote-annex 2024-05-10 14:20:36 -04:00
Joey Hess
306ea42447
improve git-remote-annex docs
renamed the git config to something shorter too
2024-05-06 13:06:22 -04:00
Joey Hess
a8cef2bf85
added man page for git-remote-annex
And document remote.<name>.git-remote-annex-max-bundles which will
configure it.

datalad-annex uses a similar url format, but with some enhancements.
See https://github.com/datalad/datalad-next/blob/main/datalad_next/gitremotes/datalad_annex.py

I added the UUID to the URL, because it is needed in order to pick out which
manifest file to use. The design allows for a single key/value store to have
several special remotes all stored in it, and so the manifest includes
the UUID in its name.
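
Splitting such a url into the UUID and the special remote parameters can be sketched as follows. This assumes a url of the form annex::<uuid>?<key>=<value>&...; the AnnexUrl type and parser are hypothetical, and no percent-decoding is done.

```haskell
import Control.Monad (guard)
import Data.List (stripPrefix)

data AnnexUrl = AnnexUrl
  { remoteUuid :: String                -- ^ picks out which manifest to use
  , remoteParams :: [(String, String)]  -- ^ special remote configuration
  } deriving (Eq, Show)

-- | Parse a url of the assumed form; urls without a UUID are rejected.
parseAnnexUrl :: String -> Maybe AnnexUrl
parseAnnexUrl s = do
  rest <- stripPrefix "annex::" s
  let (uuid, query) = break (== '?') rest
  guard (not (null uuid))
  pure (AnnexUrl uuid (map toParam (filter (not . null) (splitOn '&' (drop 1 query)))))
  where
    toParam kv = let (k, v) = break (== '=') kv in (k, drop 1 v)

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn c str = case break (== c) str of
  (x, []) -> [x]
  (x, _ : rest) -> x : splitOn c rest
```

Rejecting a missing UUID reflects the point above: without it there is no way to pick the right manifest from a store holding several special remotes.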

While datalad-annex allows datalad-annex::<url>?, and allows referencing
pieces of the url in the parameters, needing the UUID prevents
git-remote-annex from supporting that syntax. And anyway, it is a
complication and I want to keep things simple for now.

Sponsored-by: unqueued on Patreon
2024-05-06 12:48:04 -04:00
Joey Hess
c410b2bb73
annex.maxextensions configuration
Controls how many filename extensions to preserve.
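
Limiting the number of preserved extensions (eg in the extension part of a SHA256E key) can be sketched as keeping the last n dot-separated parts of the filename. This is a minimal illustration; git-annex's real extension handling applies further checks that are not modelled here, and the function name is hypothetical.

```haskell
-- | Keep at most n trailing extensions of a filename.
keepExtensions :: Int -> FilePath -> String
keepExtensions n file = concatMap ('.' :) (lastN n exts)
  where
    base = reverse (takeWhile (/= '/') (reverse file))
    exts = drop 1 (splitOn '.' base)  -- drop the name before the first dot
    lastN k xs = drop (length xs - k) xs

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn c str = case break (== c) str of
  (x, []) -> [x]
  (x, _ : rest) -> x : splitOn c rest
```

For "foo.tar.gz", a limit of 1 keeps ".gz" and a limit of 2 keeps ".tar.gz".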

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-04-18 14:23:38 -04:00
Joey Hess
d372553540
rclone special remote
Added rclone special remote, which can be used without needing to install
the git-annex-remote-rclone program. This needs a new version of rclone,
which supports "rclone gitannex".

This is implemented as a variant of an external special remote, that
runs "rclone gitannex" instead of the usual git-annex-remote- command.
Parameterized Remote.External to support that.
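
The parameterization can be pictured as choosing which program an external special remote runs. This is a minimal sketch; the ExternalProgram type and externalCommand function are hypothetical and do not reflect Remote.External's actual types.

```haskell
-- | Which program an external special remote talks to.
data ExternalProgram
  = ExternalType String  -- ^ runs git-annex-remote-<type>
  | RcloneGitannex       -- ^ runs "rclone gitannex"
  deriving (Eq, Show)

-- | Command and arguments to start; the rest of the external special
-- remote protocol handling can stay the same for both.
externalCommand :: ExternalProgram -> (String, [String])
externalCommand (ExternalType t) = ("git-annex-remote-" ++ t, [])
externalCommand RcloneGitannex = ("rclone", ["gitannex"])
```

Keeping the difference down to the command that is run lets the rclone remote reuse the existing external protocol code unchanged.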

Sponsored-by: Luke T. Shumaker on Patreon
2024-04-17 15:20:37 -04:00
Joey Hess
016d1bee88
add reregisterurl command
What this can currently be used for is only to change an url from being
used by a special remote to being used by the web remote.

This could have been a --move-from option to registerurl. But, that would
have complicated its option and --batch processing, and also would have
complicated unregisterurl, which is implemented on top of
Command.Registerurl. So, a separate command was actually less complicated
to implement.

The generic description of the command is because I want to make this
command a catch-all for other url updating kind of things, if there are
ever any more. Also because it was hard to come up with a good name for the
specific action. I considered `git-annex moveurl`, but that seems to
indicate data is perhaps actually being moved, and seems to sit at the same
level as addurl and rmurl, and this command is at the plumbing
level of registerurl and unregisterurl.

Sponsored-by: Dartmouth College's DANDI project
2024-03-05 15:06:14 -04:00
Joey Hess
68e99513f0
added annex.commitmessage-command config
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-02-12 14:35:22 -04:00
Joey Hess
8e9ee31621
webapp: Added --port option, and annex.port config
The getSocket comment that mentioned using ":port"
in the hostname seems to have been incorrect or out of date.
After all, the bug report came when the user first tried doing that,
and it didn't work.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-01-25 14:08:36 -04:00
Joey Hess
20567e605a
add directional stalldetection and bwlimit configs
Sponsored-by: Dartmouth College's DANDI project
2024-01-19 15:27:53 -04:00
Joey Hess
df35f70801
tweak stall detection scaling
Refactored to allow offline experimentation, and ended up changing the
allowedvariation (aka fudge factor) to 3. 10 seems too high, and 1.5 too low.

Scale earlier, so even if the first chunk takes less than the configured
time period, allowance is made that later chunks might transfer slower.
Decided to use the same allowedvariation to decide when to start
scaling.

Smoothed the scaling out.

Some examples:

ghci> upscale (BwRate 10 (Duration 60)) 25
BwRate 13 (Duration {durationSeconds = 75})
-- A small scaling upwards after 1/3rd the time. Not noticeable.
ghci> upscale (BwRate 10 (Duration 60)) 60
BwRate 30 (Duration {durationSeconds = 180})
-- At the configured time, 3x scaling.
ghci> upscale (BwRate 10 (Duration 60)) 120
BwRate 60 (Duration {durationSeconds = 360})
-- A typical upscaling, here a 1 minute duration became 6 minutes
-- due to the first chunk taking 2 minutes to transfer.
ghci> upscale (BwRate 10 (Duration 60)) 600
BwRate 300 (Duration {durationSeconds = 1800})
-- Here the first chunk took 10 minutes to transfer, so it will
-- take 30 minutes to detect a stall.
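
One formula consistent with all four examples above is to scale the configured rate by elapsed * allowedvariation / duration, once that product exceeds the configured duration, rounding up. The following sketch is reconstructed from the examples only; git-annex's actual upscale may compute or round differently, and the type definitions here are assumptions.

```haskell
newtype Duration = Duration { durationSeconds :: Integer }
  deriving (Eq, Show)

-- | Amount that must be transferred within the duration.
data BwRate = BwRate Integer Duration
  deriving (Eq, Show)

allowedvariation :: Rational
allowedvariation = 3

-- | Given the configured rate and how many seconds the first chunk took,
-- scale up once the chunk took more than 1/allowedvariation of the
-- configured duration. Otherwise the rate is left unchanged.
upscale :: BwRate -> Integer -> BwRate
upscale rate@(BwRate amount (Duration secs)) elapsed
  | secs <= 0 = rate
  | scaled > fromIntegral secs =
      BwRate (ceiling (fromIntegral amount * factor)) (Duration (ceiling scaled))
  | otherwise = rate
  where
    scaled = fromIntegral elapsed * allowedvariation :: Rational
    factor = scaled / fromIntegral secs
```

At 25 seconds this gives a factor of 75/60, so 10 scales to 12.5, rounded up to 13, matching the first example.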

Sponsored-by: Dartmouth College's DANDI project
2024-01-19 12:58:41 -04:00