Commit graph

1591 commits

Author SHA1 Message Date
Joey Hess
8b5fc94d50
add optional object file location to storeKey
This will be used by the next commit to simplify the proxy.
2024-07-01 10:42:27 -04:00
Joey Hess
f833a28844
Merge branch 'master' into proxy-specialremotes 2024-06-30 11:16:20 -04:00
Joey Hess
3d646703ee
list proxied remotes and cluster gateways in git-annex info
Wanted to also list a cluster's nodes when showing info for the cluster,
but that's hard because it needs getting the name of the proxying
remote, which is some prefix of the cluster's name, but if the names
contain dashes there's no good way to know which prefix it is.
2024-06-30 11:14:13 -04:00
Joey Hess
158d7bc933
fix handling of ERROR in response to REMOVE
This allows an error message from a proxied special remote to be
displayed to the client.

In the case where removal from several nodes of a cluster fails,
there can be several errors. What to do? I decided to only show
the first error to the user. Probably in this case the user is not in a
position to do anything about an error message, so best keep it simple.
If the problem with the first node is fixed, they'll see the error from
the next node.
2024-06-28 14:10:25 -04:00
Joey Hess
a6ea057f6b
fix handling of ERROR in response to CHECKPRESENT
That error is now rethrown on the client, so it will be displayed.

For example:

$ git-annex fsck x --fast --from AMS-dir
fsck x (special remote reports: directory /home/joey/tmp/bench2/dir is not accessible) failed

No protocol version check is needed. Because in order to talk to a
proxied special remote, the client has to be running the upcoming
git-annex release. Which has this fix in it.
2024-06-28 13:46:27 -04:00
Joey Hess
c3a785204e
support a P2PConnection that uses TMVars rather than Handles
This will allow having an internal thread speaking P2P protocol,
which will be needed to support proxying to external special remotes.

No serialization is done on the internal P2P protocol of course.

When a ByteString is being exchanged, it may or may not be exactly
the length indicated by DATA. While that has to be carefully managed
for the serialized P2P protocol, here it would require buffering the
whole lazy bytestring in memory to check its length when sending,
so it's better to do length checks on the receiving side.
2024-06-28 11:22:29 -04:00
Joey Hess
20ef1262df
give proxied cluster nodes a higher cost than the cluster gateway
This makes eg git-annex get default to using the cluster rather than an
arbitrary node, which is better UI.

The actual cost of accessing a proxied node vs using the cluster is
basically the same. But using the cluster allows smarter load-balancing
to be done on the cluster.
2024-06-27 15:21:03 -04:00
Joey Hess
0ef4183b00
Merge branch 'master' into proxy 2024-06-27 12:41:57 -04:00
Joey Hess
19137ae780
avoid unfiltered debugging from git-annex-shell
When --debugfilter or annex.debugfilter is set, avoid propigating debug
output from git-annex-shell, since it cannot be filtered.

It would be possible to pass --debugfilter on to git-annex-shell,
but it only started accepting that option in 2022. So it would break
interop with older versions.
2024-06-27 12:37:25 -04:00
Joey Hess
3dad9446ce
distributed cluster cycle prevention
Added BYPASS to P2P protocol, and use it to avoid cycling between
cluster gateways.

Distributed clusters are working well now!
2024-06-27 12:20:22 -04:00
Joey Hess
07e899c9d3
git-annex-shell: proxy nodes located beyond remote cluster gateways
Walking a tightrope between security and convenience here, because
git-annex-shell needs to only proxy for things when there has been
an explicit, local action to configure them.

In this case, the user has to have run `git-annex extendcluster`,
which now sets annex-cluster-gateway on the remote.

Note that any repositories that the gateway is recorded to
proxy for will be proxied onward. This is not limited to cluster nodes,
because checking the node log would not add any security; someone could
add any uuid to it. The gateway of course then does its own
checking to determine if it will allow proxying for the remote.
2024-06-26 12:56:16 -04:00
Joey Hess
202ea3ff2a
don't sync with cluster nodes by default
Avoid `git-annex sync --content` etc from operating on cluster nodes by default
since syncing with a cluster implicitly syncs with its nodes. This avoids a
lot of unncessary work when a cluster has a lot of nodes just in checking
if each node's preferred content is satisfied. And it avoids content
being sent to nodes individually, so instead syncing with clusters always
fanout uploads to nodes.

The downside is that there are situations where a cluster's preferred content
settings can be met, but those of its nodes are not. Or where a node does not
contain a key, but the cluster does, and there are not enough copies of the key
yet, so it would be desirable the send it there. I think that's an acceptable
tradeoff. These kind of situations are ones where the cluster itself should
probably be responsible for copying content to the node. Which it can do much
less expensively than a client can. Part of the balanced preferred content
design that I will be working on in a couple of months involves rebalancing
clusters, so I expect to revisit this.

The use of annex-sync config does allow running git-annex sync with a specific
node, or nodes, and it will sync with it. And it's also possible to set
annex-sync git configs to make it sync with a node by default. (Although that
will require setting up an explicit git remote for the node rather than relying
on the proxied remote.)

Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster
due to a cycle. And the Annex.Startup load of clusters happens
too late for Remote.Git to use that. This does mean one redundant load
of the cluster log, though only when there is a proxy.
2024-06-25 10:24:38 -04:00
Joey Hess
b8016eeb65
add annex-proxied
This makes git-annex sync and similar not treat proxied remotes as git
syncable remotes.

Also, display in git-annex info remote when the remote is proxied.
2024-06-24 10:16:59 -04:00
Joey Hess
0c111fc96a
fix git-annex sync --content with proxied remotes
Loading the remote list a second time was removing all proxied remotes.
That happened because setting up the proxied remote added some config
fields to the in-memory git config, and on the second load, it saw those
configs and decided not to overwrite them with the proxy.

Now on the second load, that still happens. But now, the proxied
git configs are used to generate a remote same as if those configs were
all set. The reason that didn't happen before was twofold,
the gitremotes cache was not dropped, and the remote's url field was not
set correctly.

The problem with the remote's url field is that while it was marked as
proxy inherited, all other proxy inherited fields are annex- configs.
And the code to inherit didn't work for the url field.

Now it all works, but git-annex sync is left running git push/pull on
the proxied remote, which doesn't work. That still needs to be fixed.
2024-06-24 09:45:51 -04:00
Joey Hess
5b332a87be
dropping from clusters
Dropping from a cluster drops from every node of the cluster.
Including nodes that the cluster does not think have the content.
This is different from GET and CHECKPRESENT, which do trust the
cluster's location log. The difference is that removing from a cluster
should make 100% the content is gone from every node. So doing extra
work is ok. Compare with CHECKPRESENT where checking every node could
make it very expensive, and the worst that can happen in a false
negative is extra work being done.

Extended the P2P protocol with FAILURE-PLUS to handle the case where a
drop from one node succeeds, but a drop from another node fails. In that
case the entire cluster drop has failed.

Note that SUCCESS-PLUS is returned when dropping from a proxied remote
that is not a cluster, when the protocol version supports it. This is
because P2P.Proxy does not know when it's proxying for a single node
cluster vs for a remote that is not a cluster.
2024-06-23 09:43:40 -04:00
Joey Hess
a6a04b7e5e
avoid storing SUCCESS-PLUS uuid when it is the remote uuid
This is slightly belt and suspenders, but nothing guarantees that the
peer avoids including its uuid in the SUCCESS-PLUS list as it's supposed
to. And while it probably doesn't matter if the location log is updated
redundantly, let's not find out.
2024-06-23 08:21:11 -04:00
Joey Hess
6eac3112e5
be quiet when reading cluster and proxy information at startup
I had a transfer of 3 files fail like this:

git-annex: transferrer protocol error: "(recording state in git...)"

The remote had stalldetection enabled, although I didn't see it stall.
So git-annex transferrer would have been started up. I guess that
one of these new git-annex branch reads, that happens early, caused
that message due to perhaps an uncommitted git-annex branch change.

Since the transferrer speaks a protocol over stdout, it needs to be
prevented from outputting other messages to stdout. Interestingly,
startupAnnex is run after prepRunCommand, so if a command requests quiet
output it would already be quiet. But the transferrer does not, instead
it calls Annex.setOutput SerializedOutput in its start action.
2024-06-18 21:31:32 -04:00
Joey Hess
f18740699e
P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS
Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS
is complete, when a PUT stores to additional repositories
than the expected on, the location log is updated with the
additional UUIDs that contain the content.

Started implementing PUT fanout to multiple remotes for clusters.
It is untested, and I fear fencepost errors in the relative
offset calculations. And it is missing proxying for the protocol
after DATA.
2024-06-18 16:21:40 -04:00
Joey Hess
780367200b
remove dead nodes when loading the cluster log
This is to avoid inserting a cluster uuid into the location log when
only dead nodes in the cluster contain the content of a key.

One reason why this is necessary is Remote.keyLocations, which excludes
dead repositories from the list. But there are probably many more.

Implementing this was challenging, because Logs.Location importing
Logs.Cluster which imports Logs.Trust which imports Remote.List resulted
in an import cycle through several other modules.

Resorted to making Logs.Location not import Logs.Cluster, and instead
it assumes that Annex.clusters gets populated when necessary before it's
called.

That's done in Annex.Startup, which is run by the git-annex command
(but not other commands) at early startup in initialized repos. Or,
is run after initialization.

Note that is Remote.Git, it is unable to import Annex.Startup, because
Remote.Git importing Logs.Cluster leads the the same import cycle.
So ensureInitialized is not passed annexStartup in there.

Other commands, like git-annex-shell currently don't run annexStartup
either.

So there are cases where Logs.Location will not see clusters. So it won't add
any cluster UUIDs when loading the log. That's ok, the only reason to do
that is to make display of where objects are located include clusters,
and to make commands like git-annex get --from treat keys as being located
in a cluster. git-annex-shell certainly does not do anything like that,
and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo)
don't either.
2024-06-16 14:39:44 -04:00
Joey Hess
a4c9d4424c
remove Logs.Presence imports
When imported along with Logs.Location, it can be an unused import and
it won't warn, due to reexports. The point if this is really to show
that Logs.Presence is not widely used, outside Logs/
2024-06-14 17:27:34 -04:00
Joey Hess
e224b99f36
whitespace 2024-06-12 13:24:25 -04:00
Joey Hess
f98605bce7
a local git remote cannot proxy
Prevent listProxied from listing anything when the proxy remote's
url is a local directory. Proxying does not work in that situation,
because the proxied remotes have the same url, and so git-annex-shell
is not run when accessing them, instead the proxy remote is accessed
directly.

I don't think there is any good way to support this. Even if the instantiated
git repos for the proxied remotes somehow used an url that caused it to use
git-annex-shell to access them, planned features like `git-annex copy --to
proxy` accepting a key and sending it on to nodes behind the proxy would not
work, since git-annex-shell is not used to access the proxy.

So it would need to use something to access the proxy that causes
git-annex-shell to be run and speaks P2P protocol over it. And we have that.
It's a ssh connection to localhost. Of course, it would be possible to
take ssh out of that mix, and swap in something that does not have
encryption overhead and authentication complications, but otherwise
behaves the same as ssh. And if the user wants to do that, GIT_SSH
does exist.
2024-06-12 10:16:04 -04:00
Joey Hess
c6e0710281
proxying to local git remotes works
This just happened to work correctly. Rather surprisingly. It turns out
that openP2PSshConnection actually also supports local git remotes,
by just running git-annex-shell with the path to the remote.

Renamed "P2PSsh" to "P2PShell" to make this clear.
2024-06-12 10:10:11 -04:00
Joey Hess
09b5e53f49
set annex.uuid in proxy's Repo
getRepoUUID looks at that, and was seeing the annex.uuid of the proxy.
Which caused it to unncessarily set the git config. Probably also would
have led to other problems.
2024-06-11 13:40:50 -04:00
Joey Hess
501d65eeab
started implementing git-annex-shell proxy
So far, it negotiates VERSION with both parties. This is a tricky dance.

Untested.
2024-06-10 18:01:36 -04:00
Joey Hess
649b87bedd
Merge branch 'master' into proxy 2024-06-10 14:26:18 -04:00
Joey Hess
9a8391078a
git-annex-shell: block relay requests
connRepo is only used when relaying git upload-pack and receive-pack.
That's only supposed to be used when git-annex-remotedaemon is serving
git-remote-tor-annex connections over tor. But, it was always set, and
so could be used in other places possibly.

Fixed by making connRepo optional in the P2P protocol interface.

In Command.EnableTor, it's not needed, because it only speaks the
protocol in order to check that it's able to connect back to itself via
the hidden service. So changed that to pass Nothing rather than the git
repo.

In Remote.Helper.Ssh, it's connecting to git-annex-shell p2pstdio,
so is making the requests, so will never need connRepo.

In git-annex-shell p2pstdio, it was accepting git upload-pack and
receive-pack requests over the P2P protocol, even though nothing sent
them. This is arguably a security hole, particularly if the user has
set environment variables like GIT_ANNEX_SHELL_LIMITED to prevent
git push/pull via git-annex-shell.
2024-06-10 14:16:27 -04:00
Joey Hess
4b940c92bb
proxied remotes working on client side
Got the right git config settings inherited now.

Note that the url config is not passed on to git, so it won't be able
to access the proxied remote. That would need some kind of
git-remote-annex but for proxied remotes anyway. Unsure yet if that will
be needed.
2024-06-07 12:11:46 -04:00
Joey Hess
b43c835def
instantiate remotes that are behind a proxy remote
Untested, but this should be close to working. The proxied remotes have
the same url but a different uuid. When talking to current
git-annex-shell, it will fail due to a uuid mismatch. Once it supports
proxies, it will know that the presented uuid is for a remote that it
proxies for.

The check for any git config settings for a remote with the same name as
the proxied remote is there for several reasons. One is security:
Writing a name to the proxy log should not cause changes to
how an existing, configured git remote operates in a different clone of
the repo.

It's possible that the user has been using a proxied remote, and decides
to set a git config for it. We can't tell the difference between that
scenario and an evil remote trying to eg, intercept a file upload
by replacing their remote with a proxied remote.

Also, if the user sets some git config, does it override the config
inherited from the proxy remote? Seems a difficult question. Luckily,
the above means we don't need to think through it.

This does mean though, that in order for a user to change the config of
a proxy remote, they have to manually set its annex-uuid and url, as
well as the config they want to change. They may also have to set any of
the inherited configs that they were relying on.
2024-06-06 17:15:32 -04:00
Joey Hess
f3f40e03b4
clarify comment
Since remotes not being available is a thing, this was confusing.
2024-06-04 14:29:24 -04:00
Joey Hess
e64add7cdf
git-remote-annex: support importrree=yes remotes
When exporttree=yes is also set. Probably it would also be possible to
support ones with only importtree=yes, by enabling exporttree=yes for
the remote only when using git-remote-annex, but let's keep this
simple... I'm not sure what gets recorded in .git/annex/ state
differently in the two cases that might cause a problem when doing that.

Note that the full annex:: urls generated and displayed for such a
remote omit the importree=yes. Which is ok, cloning from such an url
uses an exporttree=remote, but the git-annex branch doesn't get written
by this program, so once the real config is available from the git-annex
branch, it will still function as an importree=yes remote.
2024-05-27 12:35:42 -04:00
Joey Hess
19418e81ee
git-remote-annex: Display full url when using remote with the shorthand url 2024-05-24 17:15:31 -04:00
Joey Hess
58301e40d2
sync with special remotes with an annex:: url
Check explicitly for an annex:: url, not just any url. While no built-in
special remotes set an url, except ones that can be synced with, it
seems possible that some external special remote sets an url for its own
use, but did not expect it to be used by git-annex sync et al.

The assistant also syncs with them.
2024-05-24 14:57:29 -04:00
Joey Hess
6f1039900d
prevent using git-remote-annex with unsuitable special remote configs
I hope to support importtree=yes eventually, but it does not currently
work.

Added remote.<name>.allow-encrypted-gitrepo that needs to be set to
allow using it with encrypted git repos.

Note that even encryption=pubkey uses a cipher stored in the git repo
to encrypt the keys stored in the remote. While it would be possible to
not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using
encryption=pubkey, it doesn't currently work, and that would be a
complication that I doubt is worth it.
2024-05-14 13:52:20 -04:00
Joey Hess
ff5193c6ad
Merge branch 'master' into git-remote-annex 2024-05-10 14:20:36 -04:00
Joey Hess
a89e8f6bad
skip remotes with an annex:: url
These remotes are not regular git remotes, they are special remotes that
git uses git-remote-annex to access.

Sponsored-by: Jack Hill on Patreon
2024-05-07 15:02:20 -04:00
Yaroslav Halchenko
87e2ae2014
run codespell throughout fixing typos automagically
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "codespell -w",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
2024-05-01 15:46:21 -04:00
Joey Hess
d2709c6862
avoid accepting externaltype= and readonly= parameters for rclone
I think readonly= doesn't make sense here? externaltype= certianly
doesn't.
2024-04-17 15:41:55 -04:00
Joey Hess
d372553540
rclone special remote
Added rclone special remote, which can be used without needing to install
the git-annex-remote-rclone program. This needs a new version of rclone,
which supports "rclone gitannex".

This is implemented as a variant of an external special remote, that
runs "rclone gitannex" instead of the usual git-annex-remote- command.
Parameterized Remote.External to support that.

Sponsored-by: Luke T. Shumaker on Patreon
2024-04-17 15:20:37 -04:00
Joey Hess
c64a73c7ea
startExternalAddonProcess add parameters
Not used yet but intended to support eg running "rclone gitannex"
2024-04-17 13:09:10 -04:00
Joey Hess
631a3ca05d
typo fix 2024-04-17 12:59:22 -04:00
Joey Hess
7cef5e8f35
export tree: avoid confusing output about renaming files
When a file in the export is renamed, and the remote's renameExport
returned Nothing, renaming to the temp file would first say it was
renaming, and appear to succeed, but actually what it did was delete the
file. Then renaming from the temp file would not do anything, since the
temp file is not present on the remote. This appeared as if a file got
renamed to a temp file and left there.

Note that exporttree=yes importree=yes remotes have their usual
renameExport replaced with one that returns Nothing. (For reasons
explained in Remote.Helper.ExportImport.) So this happened
even with remotes that support renameExport.

Fix by letting renameExport = Nothing when it's not supported at all.
This avoids displaying the rename.

Sponsored-by: Graham Spencer on Patreon
2024-03-09 13:50:26 -04:00
Joey Hess
e7652b0997
implement URL to VURL migration
This needs the content to be present in order to hash it. But it's not
possible for a module used by Backend.URL to call inAnnex because that
would entail a dependency loop. So instead, rely on the fact that
Command.Migrate calls inAnnex before performing a migration.

But, Command.ExamineKey calls fastMigrate and the key may or may not
exist, and it's not wanting to actually perform a migration in any case.
To handle that, had to add an additional value to fastMigrate to
indicate whether the content is inAnnex.

Factored generateEquivilantKey out of Remote.Web.

Note that migrateFromURLToVURL hardcodes use of the SHA256E backend.
It would have been difficult not to, given all the dependency loop
issues. But --backend and annex.backend are used to tell git-annex
migrate to use VURL in any case, so there's no config knob that
the user could expect to configure that.

Sponsored-by: Brock Spratlen on Patreon
2024-03-01 16:42:02 -04:00
Joey Hess
1b0de3021a
avoid double checksum when downloading VURL from web for 1st time
Sponsored-by: Jack Hill on Patreon
2024-03-01 13:44:40 -04:00
Joey Hess
cc17ac423b
implement isCryptographicallySecureKey for VURL
Considerable difficulty to work around an import cycle. Had to move the
list of backends (except for VURL) to Backend.Variety to VURL could use
it.

Sponsored-by: Kevin Mueller on Patreon
2024-02-29 17:26:35 -04:00
Joey Hess
e7b7ea78af
lift isCryptographicallySecure to Annex
Needed for VURL backend.

Sponsored-by: Nicholas Golder-Manning on Patreon
2024-02-29 16:14:13 -04:00
Joey Hess
55bf01b788
add equivilant key log for VURL keys
When downloading a VURL from the web, make sure that the equivilant key
log is populated.

Unfortunately, this does not hash the content while it's being
downloaded from the web. There is not an interface in Backend currently
for incrementally hash generation, only for incremental verification of an
existing hash. So this might add a noticiable delay, and it has to show
a "(checksum...") message. This could stand to be improved.

But, that separate hashing step only has to happen on the first download
of new content from the web. Once the hash is known, the VURL key can have
its hash verified incrementally while downloading except when the
content in the web has changed. (Doesn't happen yet because
verifyKeyContentIncrementally is not implemented yet for VURL keys.)

Note that the equivilant key log file is formatted as a presence log.
This adds a tiny bit of overhead (eg "1 ") per line over just listing the
urls. The reason I chose to use that format is it seems possible that
there will need to be a way to remove an equivilant key at some point in
the future. I don't know why that would be necessary, but it seemed wise
to allow for the possibility.

Downloads of VURL keys from other special remotes that claim urls,
like bittorrent for example, does not popilate the equivilant key log.
So for now, no checksum verification will be done for those.

Sponsored-by: Nicholas Golder-Manning on Patreon
2024-02-29 16:01:49 -04:00
Joey Hess
0f7143d226
support VURL backend
Not yet implemented is recording hashes on download from web and
verifying hashes.

addurl --verifiable option added with -V short option because I
expect a lot of people will want to use this.

It seems likely that --verifiable will become the default eventually,
and possibly rather soon. While old git-annex versions don't support
VURL, that doesn't prevent using them with keys that use VURL. Of
course, they won't verify the content on transfer, and fsck will warn
that it doesn't know about VURL. So there's not much problem with
starting to use VURL even when interoperating with old versions.

Sponsored-by: Joshua Antonishen on Patreon
2024-02-29 13:48:51 -04:00
Joey Hess
20567e605a
add directional stalldetection and bwlimit configs
Sponsored-by: Dartmouth College's DANDI project
2024-01-19 15:27:53 -04:00
Joey Hess
c02df79248
use watchFileSize in Remote.External.retrieveKeyFile
external: Monitor file size when getting content from external special
remotes and use that to update the progress meter, in case the external
special remote program does not report progress.

This relies on 703a70cafa to prevent ever
running the meter backwards.

Sponsored-by: Dartmouth College's DANDI project
2024-01-19 14:34:30 -04:00