Merge branch 'master' of ssh://git-annex.branchable.com

Joey Hess 2024-12-17 15:03:25 -04:00
commit 6a179e6ef9
GPG key ID: DB12DB0FF05F8F38
10 changed files with 242 additions and 0 deletions

@@ -0,0 +1,76 @@
### Please describe the problem.
Following the installation instructions for Android (Termux), I get an error while sourcing git-annex-install:
```
Running on Android.. Tuning for optimal behavior.
sed: can't read /data/data/com.termux/files/home/git-annex.linux/git-remote-annex: No such file or directory
```
I can confirm that git-remote-annex is indeed missing in that directory.
### What steps will reproduce the problem?
```
pkg install wget
wget https://git-annex.branchable.com/install/Android/git-annex-install
source git-annex-install
```
### What version of git-annex are you using? On what operating system?
None yet x) and on a freshly updated termux.
### Please provide any additional information below.
[[!format sh """
# If you can, paste a complete transcript of the problem occurring here.
# If the problem is with the git-annex assistant, paste in .git/annex/daemon.log
~ $ wget https://git-annex.branchable.com/install/Android/git-annex-install
source git-annex-install
--2024-12-16 23:01:13-- https://git-annex.branchable.com/install/Android/git-annex-install
Resolving git-annex.branchable.com (git-annex.branchable.com)... 2600:3c03::f03c:91ff:fedf:c0e5, 66.228.46.55
Connecting to git-annex.branchable.com (git-annex.branchable.com)|2600:3c03::f03c:91ff:fedf:c0e5|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1470 (1.4K)
Saving to: git-annex-install
git-annex-ins 100% 1.44K --.-KB/s in 0s
2024-12-16 23:01:14 (194 MB/s) - git-annex-install saved [1470/1470]
Installing dependencies with termux pkg manager...
Checking availability of current mirror:
[*] https://ftp.fau.de/termux/termux-main: ok
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
git is already the newest version (2.47.1).
wget is already the newest version (1.25.0).
tar is already the newest version (1.35).
coreutils is already the newest version (9.5-3).
proot is already the newest version (5.1.107-65).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Downloading git-annex...
--2024-12-16 23:01:14-- https://downloads.kitenet.net/git-annex/linux/current/git-annex-standalone-arm64-ancient.tar.gz
Resolving downloads.kitenet.net (downloads.kitenet.net)... 2600:3c03::f03c:91ff:fe73:b0d2, 66.228.36.95
Connecting to downloads.kitenet.net (downloads.kitenet.net)|2600:3c03::f03c:91ff:fe73:b0d2|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 57553624 (55M) [application/x-gzip]
Saving to: STDOUT
- 100% 54.89M 8.16MB/s in 11s
2024-12-16 23:01:25 (5.18 MB/s) - written to stdout [57553624/57553624]
Running on Android.. Tuning for optimal behavior.
sed: can't read /data/data/com.termux/files/home/git-annex.linux/git-remote-annex: No such file or directory
[Process completed (code 2) - press Enter]
# End of transcript or log.
"""]]
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)

@@ -0,0 +1 @@
Before I run a command that gets new content in a repository -- especially with the --auto flag -- is there a way to find out the size of the data to be copied? My case is simple. I'm just using USB sticks/drives. But I never know whether there is enough space for the next `get --auto` command...

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="matrss"
avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0"
subject="comment 1"
date="2024-12-15T23:39:33Z"
content="""
Something like this should get you the answer: `git annex info --fast . --not --in here --and --want-get` (adapted from the example here: <https://git-annex.branchable.com/git-annex-info/>).
"""]]
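To turn that size into a yes/no answer for a USB drive, the figure reported by `git annex info` can be compared against the free space on the target filesystem. The following is a minimal sketch and not part of git-annex: `fits` is a hypothetical helper name, and it assumes the size has already been read from the `info` output as a byte count (the `--bytes` option of `git annex info` prints sizes in bytes instead of human-readable units).

```shell
# fits NEEDED_BYTES DIR
# Succeeds if the filesystem holding DIR has at least NEEDED_BYTES available.
# (Hypothetical helper, not a git-annex command.)
fits() {
    needed_bytes=$1
    target_dir=$2
    # df -Pk prints one POSIX-format line per filesystem in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kib=$(df -Pk "$target_dir" | awk 'NR == 2 { print $4 }')
    # Round the requirement up to whole KiB before comparing.
    needed_kib=$(( (needed_bytes + 1023) / 1024 ))
    [ "$avail_kib" -ge "$needed_kib" ]
}
```

With the byte count in hand, something like `fits 1500000000 . && git annex get --auto` would only start the transfer when the drive has room; the number here is just a placeholder.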

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Doable8234"
avatar="http://cdn.libravatar.org/avatar/b0d5fea745f92c3b8cc8ecc3dafa6278"
subject="comment 2"
date="2024-12-16T08:20:43Z"
content="""
I've thought about this exact use case, though I haven't actually tried it yet. One simple way to do this could be to use git-annex's preferred content settings. On the nodes that push out content, all you need to do is set up a cron job for `git annex sync --content`. Then you can make it push content wherever you want by adjusting the preferred content settings.
"""]]

@@ -0,0 +1,10 @@
[[!comment format=mdwn
username="matrss"
avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0"
subject="comment 3"
date="2024-12-16T22:48:16Z"
content="""
There is a standard group called \"transfer\" which is meant for this kind of thing: <https://git-annex.branchable.com/preferred_content/standard_groups/>. This is especially applicable if there is a static preferred content expression that can be written for each repository (i.e. no ad-hoc gets, just something more structured).
To make it more dynamic you could include a match on a metadata tag in a repository's preferred content expression. Requesting a file would then mean setting the tag on it (well, and a bunch of syncing in all repositories).
"""]]
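The metadata-tag idea could look roughly like the sketch below. The tag name `wanted-on-stick` and the helper names are made up for illustration; `git annex wanted`, `git annex metadata --tag` and the `metadata=` preferred content term are existing git-annex features. Setting `DRY_RUN=1` only prints the commands, which makes it easy to check what would be run before touching a real repository.

```shell
# Hypothetical tag name; any tag would do.
REQUEST_TAG=wanted-on-stick

# Print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# One-time setup in the receiving repository: want only files carrying the tag.
setup_requests() {
    run git annex wanted here "metadata=tag=$REQUEST_TAG"
}

# Request a file by tagging it; other repositories see the tag after a sync.
request_file() {
    run git annex metadata --tag "$REQUEST_TAG" "$1"
}
```

After `request_file somefile`, a `git annex sync` on both sides followed by `git annex get --auto` in the receiving repository should then fetch only the tagged files, assuming the sending side is reachable.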

@@ -0,0 +1,12 @@
[[!comment format=mdwn
username="Doable8234"
avatar="http://cdn.libravatar.org/avatar/b0d5fea745f92c3b8cc8ecc3dafa6278"
subject="comment 4"
date="2024-12-14T08:15:22Z"
content="""
Thanks, Joey. That seems to work based on my testing. Appreciate the quick and precise response!
Fixing my actual repo will have to wait since one of my nodes is now offline, but hopefully that goes off without a hitch.
Also just want to say how awesome git annex is. I've been using it for nearly 10 years now and don't see myself ever wanting to stop.
"""]]

@@ -0,0 +1,16 @@
[[!comment format=mdwn
username="matrss"
avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0"
subject="comment 1"
date="2024-12-16T22:19:34Z"
content="""
It _looks_ like you can just set annex.uuid before the first `git annex init` to achieve this:
```
git init    # or: git clone <url>
git config annex.uuid 00000000-0000-0000-0000-000000000003
git annex init
```
But I would say that doing so is ill-advised. You can set a description for each repository and give the remotes descriptive names instead. If you use shared UUIDs you will run into problems if two of those repositories ever become connected.
"""]]

@@ -0,0 +1,83 @@
[[!comment format=mdwn
username="matrss"
avatar="http://cdn.libravatar.org/avatar/59541f50d845e5f81aff06e88a38b9de"
subject="comment 3"
date="2024-12-13T22:02:14Z"
content="""
Your comment seems to be wrongly formatted. It was shown correctly in the notification mail, but doesn't show up here.
---
Just to document what I have tried out, for completeness: with what is already in place it is possible to connect two repositories over yggstack, it is just very awkward.
On one system you can do:
- `sudo mkdir /etc/tor && sudo touch /etc/tor/torrc` (without actually having tor installed)
- `sudo git annex enable-tor $(id -u)`
- `yggstack -genconf > yggstack.conf`
- `echo tor-annex::<pubkey>.pk.ygg:12345` (take the pubkey out of yggstack.conf)
- `socat TCP-LISTEN:12345,fork,reuseaddr UNIX-CONNECT:/var/lib/tor-annex/<uid>_<repo-uuid>/s`
- `yggstack -useconffile yggstack.conf -remote-tcp 12345:127.0.0.1:12345`
- `git annex p2p --gen-addresses`
On the other system do:
- `yggstack -autoconf -socks 127.0.0.1:9050`
- `git annex p2p --link` and paste in the generated address when asked (it should have the form `tor-annex::<pubkey>.pk.ygg:12345:<auth-token>`)
On the server side this simply exposes the p2p socket generated for tor through a different means, and on the client side this works because yggstack can be used similarly enough to tor (doing name resolution through the socks proxy at port 9050 and then connecting to the supplied port).
---
I really like your proposal of a `p2p-annex::foo+<whatever>` remote; together with a way to tell remotedaemon to start a process exposing the socket it would make for an easily extendable mechanism. Imagine this:
Client side:
- `p2p-annex::foo+<addr>` would start `git-annex-p2p-foo <addr>` and talk to its stdin/stdout.
Server side:
- A configuration option `annex.start-p2psocket=true` would instruct remotedaemon to listen on .git/annex/p2psocket (I think a hardcoded location is fine, as there only really needs to be one such socket even with multiple networks, and somewhere under .git/annex is a good location to associate it with the repository and will always be writable by the user).
- A configuration option `annex.expose-p2p-via=foo` that could be supplied zero, one, or multiple times, and each of these configurations would instruct remotedaemon to start the external program git-annex-p2ptransport-foo after the p2p socket is ready (this configuration could also just point to a command to execute, but I thought it might be nice to stay with the theme of commonly prefixed programs).
With these things in place a third-party package git-annex-p2p-yggstack could provide a simple set of shell scripts to implement transport over yggstack:
For the server side there would be a `git-annex-p2ptransport-yggstack` along these lines (modulo proper process cleanup of course):
```
socat TCP-LISTEN:12345,fork,reuseaddr UNIX-CONNECT:.git/annex/p2psocket &
yggstack -useconffile .git/annex/p2ptransport/yggstack/yggstack.conf -remote-tcp 12345:127.0.0.1:12345
```
and a `git-annex-p2ptransport-enable-yggstack` like this:
```
git config --local annex.start-p2psocket true
git config --local --add annex.expose-p2p-via yggstack
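# Make sure the directory for the config exists before writing to it.
mkdir -p .git/annex/p2ptransport/yggstack
if [ ! -f .git/annex/p2ptransport/yggstack/yggstack.conf ]; then
    yggstack -genconf > .git/annex/p2ptransport/yggstack/yggstack.conf
fi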
echo \"p2p-annex::yggstack+<pubkey-from-yggstack.conf>.pk.ygg:12345\" >> .git/annex/creds/p2paddrs
```
For the client-side it would provide `git-annex-p2p-yggstack` along these lines:
```
# Run the SOCKS proxy in the background so that nc below can use it.
yggstack -autoconf -socks 127.0.0.1:1080 &
nc -X 5 -x 127.0.0.1:1080 <pubkey>.pk.ygg 12345
```
With that package installed one could then do `git annex p2ptransport enable-yggstack` followed by `git annex p2p --gen-addresses`. A `git annex remotedaemon` would now start everything on the server-side, and the client-side could connect using `git annex p2p --link` with the address from `--gen-addresses`.
---
I think this would be sufficiently flexible for most kinds of p2p transport one could come up with. E.g. a transport over fowl or even plain magic-wormhole (though the transit relay wouldn't appreciate it) could use `p2p-annex::fowl+<code>` where the code is a pre-generated token instead of the usual passphrases used by magic-wormhole. The server side would be a script that repeatedly waits for connections to that code, the client side just connects to it.
Even for more traditional p2p setups (tinc, wireguard, yggdrasil, etc.) where the transport is pre-set up at the system level this would just work if there was a helper for `p2p-annex::tcpip+<hostname>:<port>` (effectively just netcat again).
---
Configuration, program, and subcommand names etc. are of course open to bike-shedding. Some of the hardcoded ports above should be dynamically chosen, or completely avoided if the transport can do so (yggstack and fowl can't expose unix sockets directly yet, so the digression through the loopback device is needed for now).
What do you think?
"""]]
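For the `tcpip` case, the helper would mostly consist of splitting the address and handing the connection to netcat. Below is a rough sketch of what a hypothetical `git-annex-p2p-tcpip` could do, following the naming proposed above. None of this exists in git-annex today, and `split_addr` and `connect_tcpip` are illustrative names.

```shell
# Split "<hostname>:<port>" at the last colon, so that addresses which
# themselves contain colons (e.g. IPv6) keep working.
# Sets the variables host and port on success.
split_addr() {
    addr=$1
    host=${addr%:*}
    port=${addr##*:}
    case $port in
        ''|*[!0-9]*)
            echo "invalid port in address: $addr" >&2
            return 1
            ;;
    esac
    if [ -z "$host" ] || [ "$host" = "$addr" ]; then
        echo "missing hostname or port in address: $addr" >&2
        return 1
    fi
}

# Connect stdin/stdout to the remote side, as the proposal describes.
connect_tcpip() {
    split_addr "$1" || return 1
    exec nc "$host" "$port"
}
```

The script body would then just be `connect_tcpip "$1"`, with the address taken from the part of the remote string after `tcpip+`.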

@@ -0,0 +1,20 @@
[[!comment format=mdwn
username="matrss"
avatar="http://cdn.libravatar.org/avatar/cd1c0b3be1af288012e49197918395f0"
subject="comment 4"
date="2024-12-15T18:13:00Z"
content="""
One more thought: the proposed `p2p-annex::foo+<addr>` remote makes one assumption that I don't think holds for all conceivable p2p transports. That assumption is that there is a public address for the server side that can be trusted to be the expected other side.
For tor and yggstack this does hold. The public address (for tor, the onion address of the hidden service; for yggstack, the IPv6 address derived from the public key of the yggstack peer, potentially resolved from a .pk.ygg DNS entry like above) ensures that the server side is who they are expected to be. A third party cannot pretend to be the server side, even if they know the git remote string, because they would need the server's private key to do so.
This is not the case for fowl: with fowl one would essentially do `fowl <psk> ...` on both sides to create a tunnel between server and client. If the PSK were fully contained in the remote string, then a third party getting hold of that string could pretend to be the server (when the server side is currently not waiting for a connection itself) and steal the auth token from the client. So under the assumption that the remote string is not a secret this would be a problem.
But this problem can be overcome: with fowl both sides could simply derive the PSK from the p2p auth token to establish the connection, essentially like so: `fowl <number derived from auth token>-<auth token> ...`. The git remote string would then only need to contain the information to use fowl and some unique identifier for the remote, so that the right auth token can be taken from .git/annex/creds.
Likewise, for other p2p transports that don't have stable and secure public addresses, the necessary information exchange could also happen over magic-wormhole using the auth tokens, or the auth tokens could be used as PSKs between both sides if that's what the transport needs. This would apply, for example, to a hypothetical transport over webrtc data channels, where some kind of \"SDP\" has to be exchanged between both sides to establish a connection.
---
All that to say: I think `p2p-annex::foo+` would indeed be general enough for many conceivable means of transport, provided that re-using the auth tokens in the above fashion is acceptable. And I can't think of anything against it, yet.
"""]]
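The derivation mentioned for fowl could be as simple as the sketch below. It is only an illustration: `derive_code` is a made-up name, using `cksum` to pick the number is an arbitrary choice (any function that both sides compute identically would do), and whether fowl accepts codes of this shape is an assumption carried over from the comment above. The secrecy comes entirely from the auth token, not from the number.

```shell
# Derive "<number>-<auth token>" from a p2p auth token.
# Both sides run this on the same token and therefore get the same code.
# (Hypothetical helper; cksum is not a cryptographic hash and is only
# used here to choose a stable number.)
derive_code() {
    token=$1
    # cksum prints "<crc> <byte count>" for stdin; keep only the CRC.
    crc=$(printf '%s' "$token" | cksum | cut -d ' ' -f 1)
    # Reduce the CRC to a small number and append the token itself.
    printf '%s-%s\n' "$(( crc % 1000 ))" "$token"
}
```

Both sides would call `derive_code` with the auth token stored for that remote and pass the result to the transport, so the git remote string never has to carry the secret.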

@@ -0,0 +1,8 @@
[[!comment format=mdwn
username="Doable8234"
avatar="http://cdn.libravatar.org/avatar/b0d5fea745f92c3b8cc8ecc3dafa6278"
subject="comment 1"
date="2024-12-16T08:24:31Z"
content="""
I've absolutely no idea about the relative difficulty of implementing these, but it sounds to me like your second part `It would also perhaps be good to detect when matching options are used that don't make sense, and error out on commands like git-annex find --not or git-annex find -and -(` might actually be more important than the first!
"""]]