Commit graph

34246 commits

Author SHA1 Message Date
Joey Hess
aa56d433d5
implement cluster.log
Not used yet. (Or tested.)

I did consider making the log start with the uuid of the node, followed
by the cluster uuid (or uuids). That would perhaps mean a smaller write
to the git-annex branch when adding a node, but overall the log file
would be larger, and it will be read and cached near to startup on most
git-annex runs.
2024-06-13 16:00:58 -04:00
Joey Hess
01f5015f30
update 2024-06-13 11:44:39 -04:00
Joey Hess
5e0acd1842
more cluster thoughts 2024-06-13 10:48:31 -04:00
Joey Hess
90e3b8b44f
avoided the strangeness of the cluster's proxy location tracking being wrong 2024-06-13 10:34:19 -04:00
Joey Hess
ffd7c745ff
update 2024-06-13 06:49:36 -04:00
Joey Hess
3cc48279ad
more thoughts on clusters 2024-06-13 06:41:42 -04:00
Joey Hess
555d7e52d3
more thoughts on clusters 2024-06-12 17:30:55 -04:00
Joey Hess
0ebb107974
update 2024-06-12 15:21:23 -04:00
Joey Hess
46a1fcb3ea
avoid git syncing with instantiate proxied remotes
These remotes have no url configured, so git pull and push will fail.
git-annex sync --content etc can still sync with them otherwise.

Also, avoid git syncing twice with the same url. This is for cases where
a proxied remote has been manually configured and so does have a url.
Or perhaps proxied remotes will get configured like that automatically
later.
2024-06-12 15:10:03 -04:00
Joey Hess
a986a20034
designing clusters 2024-06-12 14:57:26 -04:00
Joey Hess
e70e3473b3
on cycles 2024-06-12 13:52:17 -04:00
Joey Hess
44464e4410
update 2024-06-12 12:37:14 -04:00
Joey Hess
67d1e2a459
updates 2024-06-12 12:02:25 -04:00
Joey Hess
dfdda95053
proxy updates location tracking information
This does mean a redundant write to the git-annex branch. But,
it means that two clients can be using the same proxy, and after
one sends a file to a proxied remote, the other only has to pull from
the proxy to learn about that. It does not need to pull from every
remote behind the proxy (which it couldn't do anyway as git repo
access is not currently proxied).

Anyway, the overhead of this in git-annex branch writes is no worse
than eg, sending a file to a repository where git-annex assistant
is running, which then sends the file on to a remote, and updates
the git-annex branch then. Indeed, when the assistant also drops
the local copy, that results in more writes to the git-annex branch.
2024-06-12 11:37:14 -04:00
Joey Hess
96853cd833
finish P2P protocol proxying
CONNECT is not supported by git-annex-shell p2pstdio, but for proxying
to tor-annex remotes, it will be supported, and will make a git pull/push
to a proxied remote work the same with that as it does over ssh,
eg it accesses the proxy's git repo not the proxied remote's git repo.

The p2p protocol docs say that NOTIFYCHANGES is not always supported,
and it looked annoying to implement it for this, and it also seems
pretty useless, so make it be a protocol error. git-annex remotedaemon
will already be getting change notifications from the proxy's git repo,
so there's no need to get additional redundant change notifications for
proxied remotes that would be for changes to the same git repo.
2024-06-12 10:40:51 -04:00
Joey Hess
f98605bce7
a local git remote cannot proxy
Prevent listProxied from listing anything when the proxy remote's
url is a local directory. Proxying does not work in that situation,
because the proxied remotes have the same url, and so git-annex-shell
is not run when accessing them, instead the proxy remote is accessed
directly.

I don't think there is any good way to support this. Even if the instantiated
git repos for the proxied remotes somehow used an url that caused it to use
git-annex-shell to access them, planned features like `git-annex copy --to
proxy` accepting a key and sending it on to nodes behind the proxy would not
work, since git-annex-shell is not used to access the proxy.

So it would need to use something to access the proxy that causes
git-annex-shell to be run and speaks P2P protocol over it. And we have that.
It's a ssh connection to localhost. Of course, it would be possible to
take ssh out of that mix, and swap in something that does not have
encryption overhead and authentication complications, but otherwise
behaves the same as ssh. And if the user wants to do that, GIT_SSH
does exist.
2024-06-12 10:16:04 -04:00
Joey Hess
c6e0710281
proxying to local git remotes works
This just happened to work correctly. Rather surprisingly. It turns out
that openP2PSshConnection actually also supports local git remotes,
by just running git-annex-shell with the path to the remote.

Renamed "P2PSsh" to "P2PShell" to make this clear.
2024-06-12 10:10:11 -04:00
Joey Hess
178da0dc99
Merge branch 'master' into proxy 2024-06-12 09:49:30 -04:00
Joey Hess
345494e3b4
expanding on the exporttree=yes design 2024-06-12 09:43:59 -04:00
Joey Hess
5beaffb412
proxying PUT now working
The almost identical code duplication between relayDATA and relayDATA'
is very annoying. I tried quite a few things to parameterize them, but
the type checker is having fits when I try it.
2024-06-11 16:56:52 -04:00
Joey Hess
ed4fda098b
todo 2024-06-11 15:15:58 -04:00
Joey Hess
a2f4a8eddf
proxying GET now working
Memory use is small and constant; receiveBytes returns a lazy bytestring
and it does stream.

Comparing speed of a get of a 500 mb file over proxy from origin-origin,
vs from the same remote over a direct ssh:

joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from origin-origin
get bigfile (from origin-origin...)
ok
(recording state in git...)
1.89user 0.67system 0:10.79elapsed 23%CPU (0avgtext+0avgdata 68716maxresident)k
0inputs+984320outputs (0major+10779minor)pagefaults 0swaps

joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from direct-ssh
get bigfile (from direct-ssh...)
ok
1.79user 0.63system 0:10.49elapsed 23%CPU (0avgtext+0avgdata 65776maxresident)k
0inputs+1024312outputs (0major+9773minor)pagefaults 0swaps

So the proxy doesn't add much overhead even when run on the same machine as
the client and remote.

Still, piping receiveBytes into sendBytes like this does suggest that the proxy
could be made to use less CPU resouces by using `sendfile()`.
2024-06-11 15:09:43 -04:00
Joey Hess
09b5e53f49
set annex.uuid in proxy's Repo
getRepoUUID looks at that, and was seeing the annex.uuid of the proxy.
Which caused it to unncessarily set the git config. Probably also would
have led to other problems.
2024-06-11 13:40:50 -04:00
Joey Hess
657a91527a
update 2024-06-11 13:22:03 -04:00
Joey Hess
dd429ba8fe
Merge branch 'master' of ssh://git-annex.branchable.com 2024-06-11 13:08:45 -04:00
Joey Hess
5bb7f8cd64
Merge branch 'master' into proxy 2024-06-11 13:08:23 -04:00
Joey Hess
d2e3c5c89f
update 2024-06-11 13:07:53 -04:00
NewUser
124c1313bb 2024-06-11 13:31:01 +00:00
Joey Hess
501d65eeab
started implementing git-annex-shell proxy
So far, it negotiates VERSION with both parties. This is a tricky dance.

Untested.
2024-06-10 18:01:36 -04:00
Joey Hess
7b1548dbfa
correct AUTH-SUCCESS and AUTH-FAILURE
It's AUTH_SUCCESS internally in git-annex, but the line based
serialization uses AUTH-SUCCESS.
2024-06-10 15:06:27 -04:00
Joey Hess
649b87bedd
Merge branch 'master' into proxy 2024-06-10 14:26:18 -04:00
Joey Hess
d2576e5f1a
git-annex-shell: accept uuid of remote that proxying is enabled for
For NotifyChanges and also for the fallthrough case where
git-annex-shell passes a command off to git-shell, proxying is currently
ignored. So every remote that is accessed via a proxy will be treated as
the same git repository.

Every other command listed in cmdsMap will need to check if
Annex.proxyremote is set, and if so handle the proxying appropriately.
Probably only P2PStdio will need to support proxying. For now,
everything else refuses to work when proxying.

The part of that I don't like is that there's the possibility a command
later gets added to the list that doesn't check proxying.

When proxying is not enabled, it's important that git-annex-shell not
leak information that it would not have exposed before. Such as the
names or uuids of remotes.

I decided that, in the case where a repository used to have proxying
enabled, but no longer supports any proxies, it's ok to give the user a
clear error message indicating that proxying is not configured, rather
than a confusing uuid mismatch message.

Similarly, if a repository has proxying enabled, but not for the
requested repository, give a clear error message.

A tricky thing here is how to handle the case where there is more than
one remote, with proxying enabled, with the specified uuid. One way to
handle that would be to plumb the proxyRemoteName all the way through
from the remote git-annex to git-annex-shell, eg as a field, and use
only a remote with the same name. That would be very intrusive though.

Instead, I decided to let the proxy pick which remote it uses to access
a given Remote. And so it picks the least expensive one.
The client after all doesn't necessarily know any details about the
proxy's configuration. This does mean though, that if the least
expensive remote is not accessible, but another remote would have
worked, an access via the proxy will fail.
2024-06-10 12:44:35 -04:00
Joey Hess
783eb8879a
notes on behavior 2024-06-10 11:07:04 -04:00
jlueters@79a910340cdff27611c6a650c108afbe2f61c5f6
daa2c6cce1 2024-06-10 14:24:34 +00:00
Joey Hess
25a6ab6f11
Avoid grafting in export tree objects that are missing
They could be missing due to an interrupted git-annex at just the wrong
time during a prior graft, after which the tree objects got garbage
collected.

Or they could be missing because of manual messing with the git-annex
branch, eg resetting it to back before the graft commit.

Sponsored-by: Dartmouth College's OpenNeuro project
2024-06-07 16:51:50 -04:00
Joey Hess
b32c4c2e98
atomic git-annex branch update when regrafting in transition
Fix a bug where interrupting git-annex while it is updating the git-annex
branch could lead to git fsck complaining about missing tree objects.

Interrupting git-annex while regraftexports is running in a transition
that is forgetting git-annex branch history would leave the
repository with a git-annex branch that did not contain the tree shas
listed in export.log. That lets those trees be garbage collected.

A subsequent run of the same transition then regrafts the trees listed
in export.log into the git-annex branch. But those trees have been lost.

Note that both sides of `if neednewlocalbranch` are atomic now. I had
thought only the True side needed to be, but I do think there may be
cases where the False side needs to be as well.

Sponsored-by: Dartmouth College's OpenNeuro project
2024-06-07 16:34:10 -04:00
Joey Hess
6568ba4904
Merge branch 'master' into proxy 2024-06-07 12:35:47 -04:00
Joey Hess
43ff697f25
update status and design work on proxy encryption and chunking 2024-06-07 12:35:04 -04:00
Joey Hess
a0e59c1d17
comment 2024-06-07 12:35:00 -04:00
Joey Hess
5aaa285083
Merge branch 'master' into proxy 2024-06-07 10:43:13 -04:00
Joey Hess
058726ee86
next step identified 2024-06-06 18:06:45 -04:00
Joey Hess
d59383beaf
update 2024-06-06 17:25:22 -04:00
Joey Hess
9bc4dd635c
update 2024-06-06 17:23:51 -04:00
Joey Hess
a72d0f69d0
filter out illegal remote names when reading proxy log 2024-06-06 12:51:30 -04:00
Joey Hess
d208b03e5d
Merge branch 'master' into proxy 2024-06-06 12:42:18 -04:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
1e6b4f324a removed 2024-06-06 13:40:26 +00:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
6274d16102 Added a comment 2024-06-06 11:23:55 +00:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
d4993248eb Added a comment 2024-06-06 11:23:34 +00:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
a1e1af35af 2024-06-06 10:29:21 +00:00
nobodyinperson
6985c62a47 Added a comment 2024-06-06 09:09:03 +00:00