Commit graph

148 commits

Author SHA1 Message Date
Joey Hess
cf59d7f92c
GET and CHECKPRESENT amoung lowest cost cluster nodes
Before it was using a node that might have had a higher cost.

Also threw in a random selection from amoung the low cost nodes. Of
course this is a poor excuse for load balancing, but it's better than
nothing. Most of the time...
2024-06-27 14:36:55 -04:00
Joey Hess
3dad9446ce
distributed cluster cycle prevention
Added BYPASS to P2P protocol, and use it to avoid cycling between
cluster gateways.

Distributed clusters are working well now!
2024-06-27 12:20:22 -04:00
Joey Hess
cec2848e8a
support annex.jobs for clusters 2024-06-25 14:54:20 -04:00
Joey Hess
818030e4d3
improve handling of cluster nodes disconnecting 2024-06-25 14:10:06 -04:00
Joey Hess
1bfe7f8a53
honor preferred content settings of cluster nodes
Except when no nodes want a file, it has to be stored somewhere, so
store it on all. Which is not really desirable, but neither is having to
pick one.

ProtoAssociatedFile deserialization is rather broken, and this could
possibly affect preferred content expressions that match on filenames.

The inability to roundtrip whitespace like tabs and newlines through is
not a problem because preferred content expressions can't be written
that match on whitespace such as a tab. For example:

joey@darkstar:~/tmp/bench/z>git-annex wanted  origin-node2 'exclude=*CTRL-VTab*'
wanted origin-node2
git-annex: Parse error: Parse failure: near "*"

But, the filtering of control characters could perhaps be a problem. I think
that filtering is now obsolete, git-annex has comprehensive filtering of
control characters when displaying filenames, that happens at a higher level.
However, I don't want to risk a security hole so am leaving in that filtering
in ProtoAssociatedFile deserialization for now.
2024-06-25 11:43:09 -04:00
Joey Hess
8a341cd195
fix comparison
With this a PUT to two remotes that have different partial amounts
transferred works reliably. I'm not sure though that it doesn't have
fencepost errors.
2024-06-23 16:01:58 -04:00
Joey Hess
466c972913
don't use SUCCESS-PLUS unncessarily
When dropping from a proxied remote that is not a cluster,
SUCCESS-PLUS is not needed, so don't use it.
2024-06-23 10:07:26 -04:00
Joey Hess
2762f9c4ce
fix location log update for copy to 1-node cluster 2024-06-23 09:53:33 -04:00
Joey Hess
5b332a87be
dropping from clusters
Dropping from a cluster drops from every node of the cluster.
Including nodes that the cluster does not think have the content.
This is different from GET and CHECKPRESENT, which do trust the
cluster's location log. The difference is that removing from a cluster
should make 100% the content is gone from every node. So doing extra
work is ok. Compare with CHECKPRESENT where checking every node could
make it very expensive, and the worst that can happen in a false
negative is extra work being done.

Extended the P2P protocol with FAILURE-PLUS to handle the case where a
drop from one node succeeds, but a drop from another node fails. In that
case the entire cluster drop has failed.

Note that SUCCESS-PLUS is returned when dropping from a proxied remote
that is not a cluster, when the protocol version supports it. This is
because P2P.Proxy does not know when it's proxying for a single node
cluster vs for a remote that is not a cluster.
2024-06-23 09:43:40 -04:00
Joey Hess
ecab2e03b9
working PUT fanout to multiple remotes for clusters
Still need to check for fencepost errors on resume when
different nodes have different amounts of data.
2024-06-20 10:04:26 -04:00
Joey Hess
f18740699e
P2P protocol version 2, adding SUCCESS-PLUS and ALREADY-HAVE-PLUS
Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS
is complete, when a PUT stores to additional repositories
than the expected on, the location log is updated with the
additional UUIDs that contain the content.

Started implementing PUT fanout to multiple remotes for clusters.
It is untested, and I fear fencepost errors in the relative
offset calculations. And it is missing proxying for the protocol
after DATA.
2024-06-18 16:21:40 -04:00
Joey Hess
f049156a03
checkpresent support for clusters
This assumes that the proxy for a cluster has up-to-date location
logs. If it didn't, it might proxy the checkpresent to a node that no
longer has the content, while some other node still does, and so
it would incorrectly appear that the cluster no longer contains the
content.

Since cluster UUIDs are not stored to location logs,
git-annex fsck --fast when claiming to fix a location log when
that occurred would not cause any problems. And presumably the location
tracking would later get sorted out.

At least usually, changes to the content of nodes goes via the proxy,
and it will update its location logs, so they will be accurate. However,
if there were multiple proxies to the same cluster, or nodes were
accessed directly (or via proxy to the node and not the cluster),
the proxy's location log could certainly be wrong.

(The location log access for GET has the same issues.)
2024-06-18 11:16:16 -04:00
Joey Hess
88d9a02f7c
initial, working support for getting from clusters
Currently tends to put all the load on a single node, which will need to
be improved.
2024-06-18 11:01:10 -04:00
Joey Hess
ef26470810
ProxySelector data type 2024-06-17 19:19:15 -04:00
Joey Hess
7a839a983a
preparing for cluster node selection
Support selecting what remote to proxy for each top-level P2P protocol
message.

This only needs to be extended now to support fanout to multiple
nodes for PUT and REMOVE, and with a remote that fails for
LOCKCONTENT and UNLOCKCONTENT.

But a good first step would be to implement CHECKPRESENT and GET for
clusters. Both should select a node that actually does have the content.
That will allow a cluster to work for GET even when location tracking is
out of date.
2024-06-17 15:51:10 -04:00
Joey Hess
291280ced2
started on git-annex-shell cluster support
Works down to P2P protocol.

The question now is, how to handle protocol version negotiation for
clusters? Connecting to each node to find their protocol versions and
using the lowest would be too expensive with a lot of nodes. So it seems
that the cluster needs to pick its own protocol version to use with the
client.

Then it can either negotiate that same version with the nodes when
it comes time to use them, or it can translate between multiple protocol
versions. That seems complicated. Thinking it would be ok to refuse to
use a node if it is not able to negotiate the same protocol version with
it as with the client. That will mean that sometimes need nodes to be
upgraded when upgrading the cluster's proxy. But protocol versions
rarely change.
2024-06-17 15:10:04 -04:00
Joey Hess
c7ad44e4d1
work toward supporting proxying to multiple remotes at once
For eg, upload fanout.

Delay connecting to a remote until it's needed. When there are many
proxied remotes, it would not do for the proxy to connect to each of
them on startup; that could take a long time.
2024-06-17 14:16:44 -04:00
Joey Hess
83a1db8d17
more specific type 2024-06-17 13:04:40 -04:00
Joey Hess
b72ccc6f0c
improve types 2024-06-17 12:44:08 -04:00
Joey Hess
dfdda95053
proxy updates location tracking information
This does mean a redundant write to the git-annex branch. But,
it means that two clients can be using the same proxy, and after
one sends a file to a proxied remote, the other only has to pull from
the proxy to learn about that. It does not need to pull from every
remote behind the proxy (which it couldn't do anyway as git repo
access is not currently proxied).

Anyway, the overhead of this in git-annex branch writes is no worse
than eg, sending a file to a repository where git-annex assistant
is running, which then sends the file on to a remote, and updates
the git-annex branch then. Indeed, when the assistant also drops
the local copy, that results in more writes to the git-annex branch.
2024-06-12 11:37:14 -04:00
Joey Hess
96853cd833
finish P2P protocol proxying
CONNECT is not supported by git-annex-shell p2pstdio, but for proxying
to tor-annex remotes, it will be supported, and will make a git pull/push
to a proxied remote work the same with that as it does over ssh,
eg it accesses the proxy's git repo not the proxied remote's git repo.

The p2p protocol docs say that NOTIFYCHANGES is not always supported,
and it looked annoying to implement it for this, and it also seems
pretty useless, so make it be a protocol error. git-annex remotedaemon
will already be getting change notifications from the proxy's git repo,
so there's no need to get additional redundant change notifications for
proxied remotes that would be for changes to the same git repo.
2024-06-12 10:40:51 -04:00
Joey Hess
6e1df33960
minimized code duplication due to type checker limitations 2024-06-11 17:16:49 -04:00
Joey Hess
5beaffb412
proxying PUT now working
The almost identical code duplication between relayDATA and relayDATA'
is very annoying. I tried quite a few things to parameterize them, but
the type checker is having fits when I try it.
2024-06-11 16:56:52 -04:00
Joey Hess
a2f4a8eddf
proxying GET now working
Memory use is small and constant; receiveBytes returns a lazy bytestring
and it does stream.

Comparing speed of a get of a 500 mb file over proxy from origin-origin,
vs from the same remote over a direct ssh:

joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from origin-origin
get bigfile (from origin-origin...)
ok
(recording state in git...)
1.89user 0.67system 0:10.79elapsed 23%CPU (0avgtext+0avgdata 68716maxresident)k
0inputs+984320outputs (0major+10779minor)pagefaults 0swaps

joey@darkstar:~/tmp/bench/client>/usr/bin/time git-annex get bigfile --from direct-ssh
get bigfile (from direct-ssh...)
ok
1.79user 0.63system 0:10.49elapsed 23%CPU (0avgtext+0avgdata 65776maxresident)k
0inputs+1024312outputs (0major+9773minor)pagefaults 0swaps

So the proxy doesn't add much overhead even when run on the same machine as
the client and remote.

Still, piping receiveBytes into sendBytes like this does suggest that the proxy
could be made to use less CPU resouces by using `sendfile()`.
2024-06-11 15:09:43 -04:00
Joey Hess
58d8ba5a4f
implement simple proxy actions (untested)
Still need to implement GET and PUT, and will implement CONNECT and
NOTIFYCHANGE for completeness.

All ServerMode checking is implemented for the proxy.

There are two possible approaches for how the proxy sends back messages
from the remote to the client. One would be to have a background thread
that reads messages and sends them back as they come in. The other,
which is being implemented so far, is to read messages from the remote
at points where it is expected to send them, and relay back to the
client before reading the next message from the client. At this point,
I'm unsure which approach would be better.

The need for proxynoresponse to be used by UNLOCKCONTENT, for example,
builds protocol knowledge into the proxy which it would not need with
the other method.
2024-06-11 12:56:20 -04:00
Joey Hess
373ae49c87
factor out helper functions
These will be used by the proxy, which needs to check the ServerMode in
the same way.
2024-06-11 12:04:58 -04:00
Joey Hess
92c83a417f
refactoring 2024-06-11 10:22:05 -04:00
Joey Hess
501d65eeab
started implementing git-annex-shell proxy
So far, it negotiates VERSION with both parties. This is a tricky dance.

Untested.
2024-06-10 18:01:36 -04:00
Joey Hess
9a8391078a
git-annex-shell: block relay requests
connRepo is only used when relaying git upload-pack and receive-pack.
That's only supposed to be used when git-annex-remotedaemon is serving
git-remote-tor-annex connections over tor. But, it was always set, and
so could be used in other places possibly.

Fixed by making connRepo optional in the P2P protocol interface.

In Command.EnableTor, it's not needed, because it only speaks the
protocol in order to check that it's able to connect back to itself via
the hidden service. So changed that to pass Nothing rather than the git
repo.

In Remote.Helper.Ssh, it's connecting to git-annex-shell p2pstdio,
so is making the requests, so will never need connRepo.

In git-annex-shell p2pstdio, it was accepting git upload-pack and
receive-pack requests over the P2P protocol, even though nothing sent
them. This is arguably a security hole, particularly if the user has
set environment variables like GIT_ANNEX_SHELL_LIMITED to prevent
git push/pull via git-annex-shell.
2024-06-10 14:16:27 -04:00
Joey Hess
f6cf2dec4c
disk free checking for unsized keys
Improve disk free space checking when transferring unsized keys to
local git remotes. Since the size of the object file is known, can
check that instead.

Getting unsized keys from local git remotes does not check the actual
object size. It would be harder to handle that direction because the size
check is run locally, before anything involving the remote is done. So it
doesn't know the size of the file on the remote.

Also, transferring unsized keys to other remotes, including ssh remotes and
p2p remotes don't do disk size checking for unsized keys. This would need a
change in protocol.

(It does seem like it would be possible to implement the same thing for
directory special remotes though.)

In some sense, it might be better to not ever do disk free checking for
unsized keys, than to do it only sometimes. A user might notice this
direction working and consider it a bug that the other direction does not.
On the other hand, disk reserve checking is not implemented for most
special remotes at all, and yet it is implemented for a few, which is also
inconsistent, but best effort. And so doing this best effort seems to make
some sense. Fundamentally, if the user wants the size to always be checked,
they should not use unsized keys.

Sponsored-by: Brock Spratlen on Patreon
2024-01-16 14:29:10 -04:00
Joey Hess
cd544e548b
filter out control characters in error messages
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)

error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.

Changed a lot of existing error to giveup when it is not strictly an
internal error.

Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.

Sponsored-by: Noam Kremen on Patreon
2023-04-10 13:50:51 -04:00
Yaroslav Halchenko
84b0a3707a
Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Joey Hess
33a05a1a91
small RawFilePath optimisation 2023-03-02 10:53:12 -04:00
Joey Hess
54ad1b4cfb
Windows: Support long filenames in more (possibly all) of the code
Works around this bug in unix-compat:
https://github.com/jacobstanley/unix-compat/issues/56
getFileStatus and other FilePath using functions in unix-compat do not do
UNC conversion on Windows.

Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the
necessary conversion on windows to support long filenames.

Audited all imports of System.PosixCompat.Files to make sure that no
functions that operate on FilePath were imported from it. Instead, use
the equvilants from Utility.RawFilePath. In particular the
re-export of that module in Common had to be removed, which led to lots
of other changes throughout the code.

The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig
make Utility.Directory not be needed to build setup. And so let it use
Utility.RawFilePath, which depends on unix, which cannot be in
setup-depends.

Sponsored-by: Dartmouth College's Datalad project
2023-03-01 15:55:58 -04:00
Joey Hess
e95747a149
fix handling of corrupted data received from git remote
Recover from corrupted content being received from a git remote due eg to a
wire error, by deleting the temporary file when it fails to verify. This
prevents a retry from failing again.

Reversion introduced in version 8.20210903, when incremental verification
was added.

Only the git remote seems to be affected, although it is certianly
possible that other remotes could later have the same issue. This only
affects things passed to getViaTmp that return (False, UnVerified) due to
verification failing. As far as getViaTmp can tell, that could just as well
mean that the transfer failed in a way that would resume, so it cannot
delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally,
while other remotes do not, which is why only it seems affected.

A better fix perhaps would be to improve the types of the callback
passed to getViaTmp, so that some other value could be used to indicate
the state where the transfer succeeded but verification failed.

Sponsored-by: Boyd Stephen Smith Jr.
2022-01-07 13:25:33 -04:00
Joey Hess
8034f2e9bb
factor out IncrementalHasher from IncrementalVerifier 2021-11-09 12:33:22 -04:00
Joey Hess
88b63a43fa
distinguish between incremental verification failing and not being done
Sponsored-by: Dartmouth College's DANDI project
2021-08-18 14:38:02 -04:00
Joey Hess
449851225a
refactor
IncrementalVerifier moved to Utility.Hash, which will let Utility.Url
use it later.

It's perhaps not really specific to hashing, but making a separate
module just for the data type seemed unncessary.

Sponsored-by: Dartmouth College's DANDI project
2021-08-18 13:19:02 -04:00
Joey Hess
fa62c98910
simplify and speed up Utility.FileSystemEncoding
This eliminates the distinction between decodeBS and decodeBS', encodeBS
and encodeBS', etc. The old implementation truncated at NUL, and the
primed versions had to do extra work to avoid that problem. The new
implementation does not truncate at NUL, and is also a lot faster.
(Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the
primed versions.)

Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation,
and upgrading to it will speed up to/fromRawFilePath.

AFAIK, nothing relied on the old behavior of truncating at NUL. Some
code used the faster versions in places where I was sure there would not
be a NUL. So this change is unlikely to break anything.

Also, moved s2w8 and w82s out of the module, as they do not involve
filesystem encoding really.

Sponsored-by: Shae Erisson on Patreon
2021-08-11 12:13:31 -04:00
Joey Hess
de482c7eeb
move verifyKeyContent to Annex.Verify
The goal is that Database.Keys be able to use it; it can't use
Annex.Content.Presence due to an import loop.

Several other things also needed to be moved to Annex.Verify as a
conseqence.
2021-07-27 14:07:23 -04:00
Joey Hess
aaba83795b
switch from hslogger to purpose-built Utility.Debug
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debuging about running processes.

The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.

Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
2021-04-05 13:40:31 -04:00
Joey Hess
6940d4ad40
chance case of "transfer failed" to match "Transfer stalled" 2021-03-06 17:47:05 -04:00
Joey Hess
4b63e932f3
incremental checksum on upload to ssh or p2p 2021-02-10 12:41:05 -04:00
Joey Hess
94f6210b68
deal with possibility of short read by S.hGet
It may read less than requested, and may yield an empty string if the
file was somehow shorter than expected.
2021-02-09 22:20:05 -04:00
Joey Hess
62e152f210
incremental checksum on download from ssh or p2p
Checksum as content is received from a remote git-annex repository, rather
than doing it in a second pass.

Not tested at all yet, but I imagine it will work!

Not implemented for any special remotes, and also not implemented for
copies from local remotes. It may be that, for local remotes, it will
suffice to use rsync, rely on its checksumming, and simply return Verified.
(It would still make a checksumming pass when cp is used for COW, I guess.)
2021-02-09 17:03:27 -04:00
Joey Hess
dd39e9e255
suggest when user may want annex.stalldetection
When annex.stalldetection is not enabled, and a likely stall is detected,
display a suggestion to enable it.

Note that the progress meter display is not taken down when displaying
the message, so it will display like this:

	0%    8 B                 0 B/s
	  Transfer seems to have stalled. To handle stalling transfers, configure annex.stalldetection
	0%    10 B                0 B/s

Although of course if it's really stalled, it will never update
again after the message. Taking down the progress meter and starting
a new one doesn't seem too necessary given how unusual this is,
also this does help show the state it was at when it stalled.

Use of uninterruptibleCancel here is ok, the thread it's canceling
only does STM transactions and sleeps. The annex thread that gets
forked off is separate to avoid it being canceled, so that it
can be joined back at the end.

A module cycle required moving from dupState the precaching of the
remote list. Doing it at startConcurrency should cover all the cases
where the remote list is used in concurrent actions.

This commit was sponsored by Kevin Mueller on Patreon.
2021-02-03 15:57:19 -04:00
Joey Hess
94b323a8e8
use TotalSize more extensively 2020-12-11 12:10:43 -04:00
Joey Hess
a422a056f2
make getViaTmpFrom no longer update location log
All callers adjusted to update it themselves.

In Command.ReKey, and Command.SetKey, the cleanup action already did,
so it was updating the log twice before.

This fixes a bug when annex.stalldetection is set, as now
Command.Transferrer can skip updating the location log, and let it be
updated by the calling process.
2020-12-11 11:50:13 -04:00
Joey Hess
7e25c643cf
avoid P2P.Protocol defining instance Proto.Serializable AssociatedFile
It's got its own specific hacks, so it needs it own specific data type.
2020-12-09 13:46:42 -04:00
Joey Hess
4c47568876
refactoring
This is groundwork for using git-annex transferkeys to run transfers,
in order to allow stalled transfers to be interrupted and retried.

The new upload and download are closer to what git-annex transferkeys
does, so the plan is to make them use it.

Then things that were left using upload' and download' won't recover
from stalls. Notably, that includes import and export. But
at least get/move/copy will be able to. (Also the assistant hopefully,
but not yet.)

This commit was sponsored by Jake Vosloo on Patreon.
2020-12-07 14:49:17 -04:00