Commit graph

34303 commits

Author SHA1 Message Date
Joey Hess
d16e19b8ca
comment 2024-06-13 14:30:32 -04:00
Joey Hess
ebebc04273
comment 2024-06-13 13:40:04 -04:00
Joey Hess
6ea78ec867
partial reproducer 2024-06-13 13:03:38 -04:00
Joey Hess
01f5015f30
update 2024-06-13 11:44:39 -04:00
Joey Hess
5e0acd1842
more cluster thoughts 2024-06-13 10:48:31 -04:00
Joey Hess
90e3b8b44f
avoided the strangeness of the cluster's proxy location tracking being wrong 2024-06-13 10:34:19 -04:00
Joey Hess
ffd7c745ff
update 2024-06-13 06:49:36 -04:00
Joey Hess
d8daabe9ec
Merge branch 'master' of ssh://git-annex.branchable.com 2024-06-13 06:44:22 -04:00
Joey Hess
22a329c57e
copied over some changes from proxy branch 2024-06-13 06:43:59 -04:00
Joey Hess
3cc48279ad
more thoughts on clusters 2024-06-13 06:41:42 -04:00
Joey Hess
555d7e52d3
more thoughts on clusters 2024-06-12 17:30:55 -04:00
Joey Hess
0ebb107974
update 2024-06-12 15:21:23 -04:00
Joey Hess
46a1fcb3ea
avoid git syncing with instantiated proxied remotes
These remotes have no url configured, so git pull and push will fail.
git-annex sync --content etc can still sync with them otherwise.

Also, avoid git syncing twice with the same url. This is for cases where
a proxied remote has been manually configured and so does have a url.
Or perhaps proxied remotes will get configured like that automatically
later.
2024-06-12 15:10:03 -04:00
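The remote filtering described in this commit can be pictured as a small Python sketch. git-annex itself is written in Haskell, and the `Remote` tuple, the `git_sync_targets` name and the example urls here are all hypothetical. The sketch assumes an instantiated proxied remote has no url at all, and that a manually configured one reuses its proxy's url.

```python
from typing import Optional

# Hypothetical model: (remote name, git url or None).
# Instantiated proxied remotes are assumed to have no url configured.
Remote = tuple[str, Optional[str]]


def git_sync_targets(remotes: list[Remote]) -> list[str]:
    """Names of remotes to git pull/push with.

    Skips remotes with no url (git pull/push would fail) and any remote
    whose url was already used by an earlier remote (avoids syncing twice).
    """
    seen_urls: set[str] = set()
    targets: list[str] = []
    for name, url in remotes:
        if url is None or url in seen_urls:
            continue
        seen_urls.add(url)
        targets.append(name)
    return targets


remotes = [
    ("origin", "ssh://example.com/repo"),
    ("origin-node1", None),                      # instantiated proxied remote
    ("node1-manual", "ssh://example.com/repo"),  # manually configured, same url
]
print(git_sync_targets(remotes))  # ['origin']
```

This filter only governs git-level pull and push. As the commit notes, `git-annex sync --content` can still sync content with the skipped remotes.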
Joey Hess
a986a20034
designing clusters 2024-06-12 14:57:26 -04:00
Joey Hess
e70e3473b3
on cycles 2024-06-12 13:52:17 -04:00
Joey Hess
44464e4410
update 2024-06-12 12:37:14 -04:00
Joey Hess
67d1e2a459
updates 2024-06-12 12:02:25 -04:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
c855b50f04 2024-06-12 15:42:42 +00:00
Joey Hess
dfdda95053
proxy updates location tracking information
This does mean a redundant write to the git-annex branch. But,
it means that two clients can be using the same proxy, and after
one sends a file to a proxied remote, the other only has to pull from
the proxy to learn about that. It does not need to pull from every
remote behind the proxy (which it couldn't do anyway as git repo
access is not currently proxied).

Anyway, the overhead of this in git-annex branch writes is no worse
than eg, sending a file to a repository where git-annex assistant
is running, which then sends the file on to a remote, and updates
the git-annex branch then. Indeed, when the assistant also drops
the local copy, that results in more writes to the git-annex branch.
2024-06-12 11:37:14 -04:00
Joey Hess
96853cd833
finish P2P protocol proxying
CONNECT is not supported by git-annex-shell p2pstdio, but for proxying
to tor-annex remotes, it will be supported, and will make a git pull/push
to a proxied remote work the same with that as it does over ssh,
eg it accesses the proxy's git repo not the proxied remote's git repo.

The p2p protocol docs say that NOTIFYCHANGES is not always supported,
and it looked annoying to implement it for this, and it also seems
pretty useless, so make it be a protocol error. git-annex remotedaemon
will already be getting change notifications from the proxy's git repo,
so there's no need to get additional redundant change notifications for
proxied remotes that would be for changes to the same git repo.
2024-06-12 10:40:51 -04:00
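The following is a minimal Python sketch of the proxy-side decision about NOTIFYCHANGES. It assumes the proxy sees each client request as one protocol line and answers a refused command with an `ERROR` line. The function name, the error text and the example key are hypothetical; only the command name comes from the commit message.

```python
def route_client_line(line: str) -> tuple[str, str]:
    """Decide what a proxy does with one line from the client.

    Returns ("relay", line) to pass it on to the proxied remote, or
    ("reply", text) to answer the client directly.
    """
    command = line.split(" ", 1)[0]
    if command == "NOTIFYCHANGES":
        # Change notifications for a proxied remote would only duplicate
        # those already coming from the proxy's own git repo.
        return ("reply", "ERROR NOTIFYCHANGES is not supported when proxying")
    return ("relay", line)


print(route_client_line("NOTIFYCHANGES"))
print(route_client_line("CHECKPRESENT SHA256E-s3--example"))
```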
Joey Hess
f98605bce7
a local git remote cannot proxy
Prevent listProxied from listing anything when the proxy remote's
url is a local directory. Proxying does not work in that situation,
because the proxied remotes have the same url, and so git-annex-shell
is not run when accessing them, instead the proxy remote is accessed
directly.

I don't think there is any good way to support this. Even if the instantiated
git repos for the proxied remotes somehow used an url that caused it to use
git-annex-shell to access them, planned features like `git-annex copy --to
proxy` accepting a key and sending it on to nodes behind the proxy would not
work, since git-annex-shell is not used to access the proxy.

So it would need to use something to access the proxy that causes
git-annex-shell to be run and speaks P2P protocol over it. And we have that.
It's a ssh connection to localhost. Of course, it would be possible to
take ssh out of that mix, and swap in something that does not have
encryption overhead and authentication complications, but otherwise
behaves the same as ssh. And if the user wants to do that, GIT_SSH
does exist.
2024-06-12 10:16:04 -04:00
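In Python terms, the guard added to listProxied amounts to something like the sketch below. Both function names are hypothetical stand-ins rather than git-annex's code. The `looks_local` test is also a deliberate simplification: git accepts more url forms than it checks for.

```python
def looks_local(url: str) -> bool:
    # Simplified assumption: filesystem paths and file:// urls are local.
    return url.startswith(("/", "./", "../", "~", "file://"))


def list_proxied(proxy_url: str, proxied_names: list[str]) -> list[str]:
    """Return the proxied remotes to instantiate for a proxy remote.

    A local proxy is accessed directly rather than via git-annex-shell,
    so nothing can be proxied through it.
    """
    if looks_local(proxy_url):
        return []
    return list(proxied_names)


print(list_proxied("/srv/annex", ["node1", "node2"]))
print(list_proxied("ssh://localhost/srv/annex", ["node1", "node2"]))
```

The second call shows the commit's suggested workaround. Going through ssh to localhost gives the proxy a non-local url, so git-annex-shell does get run.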
Joey Hess
c6e0710281
proxying to local git remotes works
This just happened to work correctly. Rather surprisingly. It turns out
that openP2PSshConnection actually also supports local git remotes,
by just running git-annex-shell with the path to the remote.

Renamed "P2PSsh" to "P2PShell" to make this clear.
2024-06-12 10:10:11 -04:00
Joey Hess
178da0dc99
Merge branch 'master' into proxy 2024-06-12 09:49:30 -04:00
Joey Hess
345494e3b4
expanding on the exporttree=yes design 2024-06-12 09:43:59 -04:00
yarikoptic
c6f2a5d372 TODO for log --key 2024-06-12 13:20:29 +00:00
Joey Hess
5beaffb412
proxying PUT now working
The almost identical code duplication between relayDATA and relayDATA'
is very annoying. I tried quite a few things to parameterize them, but
the type checker is having fits when I try it.
2024-06-11 16:56:52 -04:00
Joey Hess
ed4fda098b
todo 2024-06-11 15:15:58 -04:00
Joey Hess
a2f4a8eddf
proxying GET now working
Memory use is small and constant; receiveBytes returns a lazy bytestring
and it does stream.

Comparing speed of a get of a 500 mb file over proxy from origin-origin,
vs from the same remote over a direct ssh:

joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from origin-origin
get bigfile (from origin-origin...)
ok
(recording state in git...)
1.89user 0.67system 0:10.79elapsed 23%CPU (0avgtext+0avgdata 68716maxresident)k
0inputs+984320outputs (0major+10779minor)pagefaults 0swaps

joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from direct-ssh
get bigfile (from direct-ssh...)
ok
1.79user 0.63system 0:10.49elapsed 23%CPU (0avgtext+0avgdata 65776maxresident)k
0inputs+1024312outputs (0major+9773minor)pagefaults 0swaps

So the proxy adds little overhead (10.79 vs 10.49 seconds elapsed, about 3%)
even when run on the same machine as the client and remote.

Still, piping receiveBytes into sendBytes like this does suggest that the proxy
could be made to use less CPU resources by using `sendfile()`.
2024-06-11 15:09:43 -04:00
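Memory use stays constant because the whole file is never held at once. A rough Python analogue of relaying one payload of known length copies it in bounded chunks. It assumes file-like `read`/`write` endpoints on both sides, and `relay_data` is a hypothetical helper, not git-annex's receiveBytes/sendBytes.

```python
import io


def relay_data(src, dst, length: int, chunk_size: int = 64 * 1024) -> None:
    """Copy exactly `length` bytes from src to dst.

    Only one chunk is held in memory at a time, so memory use does not
    grow with the size of the file being relayed.
    """
    remaining = length
    while remaining > 0:
        chunk = src.read(min(chunk_size, remaining))
        if not chunk:
            raise EOFError(f"connection closed with {remaining} bytes unsent")
        dst.write(chunk)
        remaining -= len(chunk)


# Usage: relay a 200 kB payload and leave the following bytes unread.
src = io.BytesIO(b"x" * 200_000 + b"NEXT")
dst = io.BytesIO()
relay_data(src, dst, 200_000)
assert dst.getvalue() == b"x" * 200_000
assert src.read() == b"NEXT"
```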
Joey Hess
09b5e53f49
set annex.uuid in proxy's Repo
getRepoUUID looks at that, and was seeing the annex.uuid of the proxy.
Which caused it to unnecessarily set the git config. Probably also would
have led to other problems.
2024-06-11 13:40:50 -04:00
yarikoptic
b96ff82871 Added a comment 2024-06-11 17:36:51 +00:00
Joey Hess
657a91527a
update 2024-06-11 13:22:03 -04:00
Joey Hess
dd429ba8fe
Merge branch 'master' of ssh://git-annex.branchable.com 2024-06-11 13:08:45 -04:00
Joey Hess
5bb7f8cd64
Merge branch 'master' into proxy 2024-06-11 13:08:23 -04:00
Joey Hess
d2e3c5c89f
update 2024-06-11 13:07:53 -04:00
NewUser
124c1313bb 2024-06-11 13:31:01 +00:00
Joey Hess
501d65eeab
started implementing git-annex-shell proxy
So far, it negotiates VERSION with both parties. This is a tricky dance.

Untested.
2024-06-10 18:01:36 -04:00
Joey Hess
7b1548dbfa
correct AUTH-SUCCESS and AUTH-FAILURE
It's AUTH_SUCCESS internally in git-annex, but the line based
serialization uses AUTH-SUCCESS.
2024-06-10 15:06:27 -04:00
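The difference between internal and wire spellings can be shown with a Python sketch of a serializer that maps one to the other. The mapping table and the `serialize`/`parse` functions are hypothetical. Only the two spellings of each message come from the commit, and the uuid argument in the usage line is a placeholder.

```python
# Internal names use underscores; the line-based wire format uses hyphens.
WIRE_TOKENS = {
    "AUTH_SUCCESS": "AUTH-SUCCESS",
    "AUTH_FAILURE": "AUTH-FAILURE",
}
INTERNAL_NAMES = {wire: internal for internal, wire in WIRE_TOKENS.items()}


def serialize(message: str, *params: str) -> str:
    """Render one protocol message as a single line."""
    return " ".join([WIRE_TOKENS.get(message, message), *params])


def parse(line: str) -> tuple[str, list[str]]:
    """Split a received line back into (internal name, parameters)."""
    token, *params = line.split(" ")
    return INTERNAL_NAMES.get(token, token), params


line = serialize("AUTH_SUCCESS", "example-uuid")
print(line)         # AUTH-SUCCESS example-uuid
print(parse(line))  # ('AUTH_SUCCESS', ['example-uuid'])
```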
Joey Hess
649b87bedd
Merge branch 'master' into proxy 2024-06-10 14:26:18 -04:00
Joey Hess
d2576e5f1a
git-annex-shell: accept uuid of remote that proxying is enabled for
For NotifyChanges and also for the fallthrough case where
git-annex-shell passes a command off to git-shell, proxying is currently
ignored. So every remote that is accessed via a proxy will be treated as
the same git repository.

Every other command listed in cmdsMap will need to check if
Annex.proxyremote is set, and if so handle the proxying appropriately.
Probably only P2PStdio will need to support proxying. For now,
everything else refuses to work when proxying.

The part of that I don't like is that there's the possibility a command
later gets added to the list that doesn't check proxying.

When proxying is not enabled, it's important that git-annex-shell not
leak information that it would not have exposed before. Such as the
names or uuids of remotes.

I decided that, in the case where a repository used to have proxying
enabled, but no longer supports any proxies, it's ok to give the user a
clear error message indicating that proxying is not configured, rather
than a confusing uuid mismatch message.

Similarly, if a repository has proxying enabled, but not for the
requested repository, give a clear error message.

A tricky thing here is how to handle the case where there is more than
one remote, with proxying enabled, with the specified uuid. One way to
handle that would be to plumb the proxyRemoteName all the way through
from the remote git-annex to git-annex-shell, eg as a field, and use
only a remote with the same name. That would be very intrusive though.

Instead, I decided to let the proxy pick which remote it uses to access
a given Remote. And so it picks the least expensive one.
The client after all doesn't necessarily know any details about the
proxy's configuration. This does mean though, that if the least
expensive remote is not accessible, but another remote would have
worked, an access via the proxy will fail.
2024-06-10 12:44:35 -04:00
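The following Python sketch shows that choice. It assumes each proxy-enabled remote carries a uuid and a numeric cost, where lower is cheaper, as with git-annex's `remote.<name>.annex-cost` setting. The `Remote` class, `pick_remote_for_uuid` and the example names and costs are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Remote:
    name: str
    uuid: str
    cost: float  # lower is preferred


def pick_remote_for_uuid(remotes: list[Remote], uuid: str) -> Optional[Remote]:
    """Choose how the proxy reaches the repository with this uuid.

    When several remotes share the uuid, the cheapest one wins; min()
    keeps the first of several equally cheap remotes. No fallback to a
    more expensive remote is attempted.
    """
    candidates = [r for r in remotes if r.uuid == uuid]
    if not candidates:
        return None  # proxying is not enabled for the requested repository
    return min(candidates, key=lambda r: r.cost)


remotes = [
    Remote("node1-lan", "uuid-1", 100.0),
    Remote("node1-wan", "uuid-1", 200.0),
    Remote("node2", "uuid-2", 100.0),
]
print(pick_remote_for_uuid(remotes, "uuid-1").name)  # node1-lan
```

The missing fallback in the sketch is the trade-off the commit accepts. If `node1-lan` is unreachable, the access fails even though `node1-wan` might have worked.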
Joey Hess
783eb8879a
notes on behavior 2024-06-10 11:07:04 -04:00
jlueters@79a910340cdff27611c6a650c108afbe2f61c5f6
daa2c6cce1 2024-06-10 14:24:34 +00:00
Joey Hess
25a6ab6f11
Avoid grafting in export tree objects that are missing
They could be missing due to an interrupted git-annex at just the wrong
time during a prior graft, after which the tree objects got garbage
collected.

Or they could be missing because of manual messing with the git-annex
branch, eg resetting it to back before the graft commit.

Sponsored-by: Dartmouth College's OpenNeuro project
2024-06-07 16:51:50 -04:00
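The check amounts to filtering the recorded export trees by whether git still has their objects. In the Python sketch below, `trees_to_graft` is a hypothetical name, and the default existence test calls `git cat-file -e`, which exits 0 only when the object is present.

```python
import subprocess
from typing import Callable, Iterable


def git_object_exists(sha: str, repo: str = ".") -> bool:
    """True if the object exists in the repository (`git cat-file -e`)."""
    result = subprocess.run(
        ["git", "-C", repo, "cat-file", "-e", sha],
        capture_output=True,
    )
    return result.returncode == 0


def trees_to_graft(
    export_trees: Iterable[str],
    exists: Callable[[str], bool] = git_object_exists,
) -> list[str]:
    """Keep only export tree shas whose objects are still present.

    Trees lost to an interrupted graft or a manual branch reset are
    skipped rather than grafted in.
    """
    return [sha for sha in export_trees if exists(sha)]


# Usage with a stand-in existence check, so no git repository is needed.
print(trees_to_graft(["aaa", "bbb"], exists=lambda sha: sha == "aaa"))
```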
Joey Hess
b32c4c2e98
atomic git-annex branch update when regrafting in transition
Fix a bug where interrupting git-annex while it is updating the git-annex
branch could lead to git fsck complaining about missing tree objects.

Interrupting git-annex while regraftexports is running in a transition
that is forgetting git-annex branch history would leave the
repository with a git-annex branch that did not contain the tree shas
listed in export.log. That lets those trees be garbage collected.

A subsequent run of the same transition then regrafts the trees listed
in export.log into the git-annex branch. But those trees have been lost.

Note that both sides of `if neednewlocalbranch` are atomic now. I had
thought only the True side needed to be, but I do think there may be
cases where the False side needs to be as well.

Sponsored-by: Dartmouth College's OpenNeuro project
2024-06-07 16:34:10 -04:00
Joey Hess
6568ba4904
Merge branch 'master' into proxy 2024-06-07 12:35:47 -04:00
Joey Hess
43ff697f25
update status and design work on proxy encryption and chunking 2024-06-07 12:35:04 -04:00
Joey Hess
a0e59c1d17
comment 2024-06-07 12:35:00 -04:00
Joey Hess
5aaa285083
Merge branch 'master' into proxy 2024-06-07 10:43:13 -04:00
Joey Hess
058726ee86
next step identified 2024-06-06 18:06:45 -04:00
Joey Hess
d59383beaf
update 2024-06-06 17:25:22 -04:00
Joey Hess
9bc4dd635c
update 2024-06-06 17:23:51 -04:00
Joey Hess
a72d0f69d0
filter out illegal remote names when reading proxy log 2024-06-06 12:51:30 -04:00
Joey Hess
d208b03e5d
Merge branch 'master' into proxy 2024-06-06 12:42:18 -04:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
1e6b4f324a removed 2024-06-06 13:40:26 +00:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
6274d16102 Added a comment 2024-06-06 11:23:55 +00:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
d4993248eb Added a comment 2024-06-06 11:23:34 +00:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
a1e1af35af 2024-06-06 10:29:21 +00:00
nobodyinperson
6985c62a47 Added a comment 2024-06-06 09:09:03 +00:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
7dbfb16415 2024-06-05 17:45:49 +00:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
93b11da4db Added a comment 2024-06-05 17:34:32 +00:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
6b4ae7b635 2024-06-05 17:22:04 +00:00
ruslan@302cb7f8d398fcce72f88b26b0c2f3a53aaf0bcd
ca687413ef Added a comment 2024-06-05 16:53:51 +00:00
Joey Hess
1761e971ee
status update after day 1 of new project 2024-06-04 14:55:54 -04:00
Joey Hess
f97f4b8bdb
Added updateproxy command and remote.name.annex-proxy configuration
So far this only records proxy information on the git-annex branch.
2024-06-04 14:52:03 -04:00
Joey Hess
3df70c5c0c
implementation plan 2024-06-04 07:51:33 -04:00
Joey Hess
6375e3be3b
received funding to work on this, which comes with a schedule 2024-06-04 06:53:59 -04:00
Joey Hess
ac3fe92956
comment 2024-06-04 06:41:14 -04:00
Joey Hess
3db94f1b71
Merge branch 'master' of ssh://git-annex.branchable.com 2024-06-04 06:40:08 -04:00
Joey Hess
3be7163771
update 2024-06-04 06:40:04 -04:00
Joey Hess
5992e1729a
fixed by git release 2024-06-04 06:39:08 -04:00
nobodyinperson
c606b6a35d Added a comment: Yes, GitLab fixed! 2024-06-04 07:38:47 +00:00
datamanager
82b891de7a Added a comment: GitLab fixed? 2024-06-04 01:18:25 +00:00
Joey Hess
61ed0b3f03
root cause analysis 2024-06-03 13:56:43 -04:00
yarikoptic
4a48933867 Added a comment 2024-06-03 17:54:43 +00:00
Joey Hess
c382555cf8
comment 2024-06-03 12:31:55 -04:00
jkniiv
313a0285e5 a small clarification 2024-06-01 22:11:32 +00:00
jkniiv
5badd2ae4e report on git-remote-annex on Windows not quite working 2024-06-01 21:59:27 +00:00
Joey Hess
0e96f0acd8
add news item for git-annex 10.20240531 2024-05-31 12:32:42 -04:00
Joey Hess
a51c5d1cde
some analysis 2024-05-31 11:47:59 -04:00
yarikoptic
8706a6faf1 report on git repo getting broken 2024-05-31 14:38:58 +00:00
yarikoptic
d313dc22e3 reporting that annex merge should not merge into main branch 2024-05-31 13:49:17 +00:00
Joey Hess
d8cf23ffdb
tweak 2024-05-30 13:31:49 -04:00
Joey Hess
69c9e8c11c
tweak 2024-05-30 13:30:57 -04:00
Joey Hess
19454917eb
tweak 2024-05-30 13:30:33 -04:00
Joey Hess
3a48eafce4
tweaks 2024-05-30 13:30:10 -04:00
Joey Hess
adf17f5038
Merge branch 'master' of ssh://git-annex.branchable.com 2024-05-30 13:26:44 -04:00
Joey Hess
f877afe930
tip 2024-05-30 13:26:34 -04:00
Joey Hess
0155abfba4
git-remote-annex: Support urls like annex::https://example.com/foo-repo
Using the usual url download machinery even allows these urls to need
http basic auth, which is prompted for with git-credential. Which opens
the possibility for urls that contain a secret to be used, eg the cipher
for encryption=shared. Although the user is currently on their own
constructing such an url, I do think it would work.

Limited to httpalso for now, for security reasons. Since both httpalso
and the retrieval of this very url are limited by the usual
annex.security.allowed-ip-addresses configs, it's not possible for an
attacker to make one of these urls that sets up a httpalso url that
opens the garage door. Which is one class of attacks to keep in mind
with this thing.

It seems that there could be either a git-config that allows other types
of special remotes to be set up this way, or special remotes could
indicate when they are safe. I do worry that the git-config would
encourage users to set it without thinking through the security
implications. One remote config might be safe to access this way, but
another config, for a remote of the same type, might not be. This will need
further thought, and real-world examples to decide what to do.
2024-05-30 12:24:16 -04:00
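The current policy is effectively an allowlist containing one special remote type. The Python sketch below shows that gate. It assumes the configuration the url leads to is available as key/value pairs with a `type` field, and the function name and example config values are hypothetical.

```python
ALLOWED_TYPES = frozenset({"httpalso"})


def may_set_up_from_url(remote_config: dict[str, str]) -> bool:
    """Allow only special remote types considered safe to configure
    from an annex::https://... url; refuse everything else.
    """
    return remote_config.get("type") in ALLOWED_TYPES


print(may_set_up_from_url({"type": "httpalso", "url": "https://example.com/annex"}))
print(may_set_up_from_url({"type": "directory", "directory": "/srv/data"}))
```

Because this gate works per type, it cannot express the commit's concern that one config of a type may be safe while another is not. That question is left open in the message.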
yarikoptic
d23ae92da8 Added a comment 2024-05-30 14:34:32 +00:00
yarikoptic
285a7ff3c3 Added a comment 2024-05-30 14:29:43 +00:00
Joey Hess
3f33616068
security 2024-05-29 22:55:06 -04:00
Joey Hess
efa684ab8a
todo 2024-05-29 18:21:17 -04:00
yarikoptic
f186485fab Added a comment 2024-05-29 18:31:16 +00:00
yarikoptic
09626c8114 Added a comment: odd odd odd 2024-05-29 18:25:23 +00:00
yarikoptic
e05564c297 Added a comment: odd odd odd 2024-05-29 18:25:11 +00:00
yarikoptic
60a7dea828 get is silently stuck. 2024-05-29 18:14:44 +00:00
Joey Hess
98762a2f96
group: Added --list option
Seemed to make sense to exclude groups used only by dead repositories.
2024-05-29 13:37:35 -04:00
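The following Python sketch shows what the listing computes. It assumes group membership is available per repository uuid, along with the set of dead uuids. The data shapes and the `list_groups` name are hypothetical.

```python
def list_groups(groups_by_uuid: dict[str, set[str]], dead: set[str]) -> list[str]:
    """Groups used by at least one repository that is not dead, sorted."""
    live_groups: set[str] = set()
    for uuid, groups in groups_by_uuid.items():
        if uuid not in dead:
            live_groups |= groups
    return sorted(live_groups)


groups = {
    "uuid-1": {"backup"},
    "uuid-2": {"archive", "backup"},
    "uuid-3": {"retired"},
}
print(list_groups(groups, dead={"uuid-3"}))  # ['archive', 'backup']
```

A group shared by a dead and a live repository is still listed. Only groups used solely by dead repositories drop out.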
Joey Hess
09a0552489
split off todo, comment 2024-05-29 13:16:36 -04:00
Joey Hess
14daed9db7
Merge branch 'master' of ssh://git-annex.branchable.com 2024-05-29 13:00:34 -04:00
Joey Hess
e19916f54b
add config-uuid to annex:: url for --sameas remotes
And use it to set annex-config-uuid in git config. This makes
using the origin special remote work after cloning.

Without the added Logs.Remote.configSet, instantiating the remote will
look at the annex-config-uuid's config in the remote log, which will be
empty, and so it will fail to find a special remote.

The added deletion of files in the alternatejournaldir is just to make
100% sure they don't get committed to the git-annex branch. Now that
they contain things that definitely should not be committed.
2024-05-29 12:50:00 -04:00
derphysiker
dfb0c4683c Added a comment 2024-05-29 06:58:16 +00:00