Commit graph

1791 commits

Author SHA1 Message Date
Joey Hess
80d82dba99
releasing package git-annex version 10.20241031 2024-10-31 17:20:13 -04:00
Joey Hess
ccbc5189b5
Fix hang when receiving a large file into a proxied special remote
Only indicate that we're done with the bytestring once it all gets written.
Otherwise, the end of it may get garbage collected before we can process
it, leading to a hang.

This seems to have been introduced in commit
cdc4bd7443, which oddly was trying to fix a
very similar problem, but specific to a cluster node. In that commit,
things got out of order, with it signaling it was done with the bytestring
before it had written all of it to the file.

My test case for this bug is a directory special remote
with a file being sent to it via a proxy accessed via ssh or http.
The file was 10 MB, and it hung with the last few KB of it not being
received.

I've also tested this fix in the case of proxying to a cluster node
directory special remote over http, which was the case
cdc4bd7443 was dealing with.
2024-10-30 12:29:37 -04:00
Joey Hess
2ca6ecad58
add tip for DATA-PRESENT feature 2024-10-29 16:15:01 -04:00
Joey Hess
0117cdab11
document DATA-PRESENT in CHANGELOG
I wonder where else this could be documented? It's kind of a niche
feature, since it needs at least a partial custom implementation of the p2p
protocol or the p2phttp protocol. But it can save a lot of bandwidth and
avoid the proxy needing disk space to buffer files uploaded to a special
remote.
2024-10-29 15:07:30 -04:00
Joey Hess
8baccda98f
Merge branch 'master' into streamproxy 2024-10-22 09:49:28 -04:00
Joey Hess
bdf3a4747f
adjust: Allow any order of options when combining --hide-missing with options like --unlock.
optparse-applicative made this hard. The naive implementation this had
before didn't let --hide-missing come after --unlock, and just adding an
additional <|> with --hide-missing coming after --unlock didn't work
either. So it needs to get some options first and then combine them.
2024-10-21 16:03:39 -04:00
Joey Hess
de138c642b
p2phttp: Allow unauthenticated users to lock content by default
* p2phttp: Allow unauthenticated users to lock content by default.
* p2phttp: Added --unauth-nolocking option to prevent unauthenticated
  users from locking content.

The rationale for this is that locking is not really a write operation, so
it makes sense to allow it in a repository that only allows read-only access. Not
supporting locking in that situation will prevent the user from dropping
content from a special remote they control in cases where the other copy of
the content is on the p2phttp server.

Also, when p2phttp is configured to also allow authenticated access,
lockcontent was resulting in a password prompt for users who had no way to
authenticate. And there is no good way to distinguish between the two types
of users client side.

--unauth-nolocking anticipates that this might be abused, and seems better
than disabling unauthenticated access entirely if a server is being
attacked. It may be that rate limiting locking by IP address or similar
would be an effective measure in such a situation. Or just limiting the
number of locks by anonymous users that can be live at any one time. Since
the impact of such a DoS attempt is limited to preventing dropping content
from the server, it seems not a very appealing target anyway.
2024-10-21 10:02:12 -04:00
Joey Hess
82e91b380a
add GITMANIFEST to parseKeyVariety
git-remote-annex: Fix bug that prevented using it with external special
remotes, leading to protocol error messages involving "GITMANIFEST".
2024-10-19 17:12:23 -04:00
Joey Hess
8c7047fc77
Merge branch 'master' into streamproxy 2024-10-18 10:18:59 -04:00
Joey Hess
3a53c60121
Allow enabling the servant build flag with older versions of stm
Allows building with GHC 9.0.2 (Debian stable).
2024-10-17 14:04:31 -04:00
Joey Hess
0629219617
p2phttp combining unauth and auth options
p2phttp: Support serving unauthenticated users while requesting
authentication for operations that need it. Eg, --unauth-readonly can be
combined with --authenv.

Drop locking currently needs authentication so it will prompt for that.
That still needs to be addressed somehow.
2024-10-17 11:10:28 -04:00
Joey Hess
d9b4bf4224
added retrieveKeyFileInOrder and ORDERED to external special remote protocol
I anticipate lots of external special remote programs will neglect
implementing this. Still, it's the right thing to do to assume that some
of them may write files out of order. Probably most external special
remotes will not be used with a proxy. When someone is using one with a
proxy, they can always get it fixed to send ORDERED.
2024-10-15 15:40:14 -04:00
Joey Hess
edaed18e4c
Sped up proxied downloads from special remotes, by streaming
Currently works for special remotes that don't use fileRetriever. Ones that
do will download to another filename and rename it into place, defeating
the streaming.

This actually benchmarks slightly slower when getting a large file from
a fast proxied special remote. However, when the proxied special remote
is slow, it will be a big win.
2024-10-15 12:25:15 -04:00
Joey Hess
fca26db22b
releasing package git-annex version 10.20240927 2024-09-30 19:15:57 -04:00
Joey Hess
dc6c0f0f1f
preparing for release later this week 2024-09-25 14:43:52 -04:00
Joey Hess
5a4bee24b8
fix sizebalanced empty size bug
Fix bug that prevented anything being stored in an empty repository whose
preferred content expression uses sizebalanced.
2024-09-23 14:30:18 -04:00
Joey Hess
52891711d2
git-annex sim command is working
Had to add Read instances to Key and NumCopies and some other similar
types. I only expect to use those in serializing a sim. Of course, this
risks that implementation changes break reading old data. For a sim,
that would not be a big problem.
2024-09-12 16:10:52 -04:00
Joey Hess
811dd95453
maxsize of 0 to disable 2024-09-09 09:32:43 -04:00
Joey Hess
340bdd0dac
treat "not present" in preferred content as invalid
Detect when a preferred content expression contains "not present", which
would lead to repeatedly getting and then dropping files, and make it never
match. This also applies to "not balanced" and "not sizebalanced".

--explain will tell the user when this happens.

Note that getMatcher calls matchMrun' and does not check for unstable
negated limits. While there is no --present anyway, if there was,
it would not make sense for --not --present to complain about
instability and fail to match.
2024-09-03 13:50:06 -04:00
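The instability described in the commit above can be illustrated with a minimal Python sketch. This is not git-annex's matcher: `wants` and `simulate` are hypothetical names, and the model assumes a repository fetches a file whenever the expression matches and drops it whenever it does not.

```python
def wants(expression: str, present: bool) -> bool:
    """Toy evaluator that understands only "present" and "not present"."""
    if expression == "present":
        return present
    if expression == "not present":
        return not present
    raise ValueError(f"unsupported expression: {expression}")


def simulate(expression: str, present: bool, rounds: int) -> list[bool]:
    """Return whether the file is present after each sync round."""
    history = []
    for _ in range(rounds):
        # The file is kept or fetched when wanted, and dropped when not wanted.
        present = wants(expression, present)
        history.append(present)
    return history
```

With "not present" the state flips on every round and never settles, while "present" keeps whatever state it started in. That is why making the negated form never match is a reasonable way to stop the churn.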
Joey Hess
8b2bd42540
Fix --debug display of onlyingroup preferred content expression. 2024-09-03 12:38:59 -04:00
Joey Hess
b3dc656153
releasing package git-annex version 10.20240831 2024-08-31 19:50:26 -04:00
Joey Hess
d0938d730b
Merge branch 'master' into balanced 2024-08-30 11:01:39 -04:00
Joey Hess
242c525659
lookupkey: Allow using --ref in a bare repository. 2024-08-30 10:55:48 -04:00
Joey Hess
70e2fca257
Added the annex.fullybalancedthreshhold git config. 2024-08-22 07:15:55 -04:00
Joey Hess
9e87061de2
Support "sizebalanced=" and "fullysizebalanced=" too
Might want to make --rebalance turn balanced=group:N where N > 1
to fullysizebalanced=group:N. Have not yet determined if that will
improve situations enough to be worth the extra work.
2024-08-21 15:01:54 -04:00
Joey Hess
99514f9d18
maxsize overview display and --json support 2024-08-18 12:08:13 -04:00
Joey Hess
b62b58b50b
git-annex info speed up using getRepoSizes 2024-08-17 14:54:31 -04:00
Joey Hess
1265d7e5df
implement maxsize log and command
* maxsize: New command to tell git-annex how large the expected maximum
  size of a repository is.
* vicfg: Include maxsize configuration.
2024-08-11 15:41:26 -04:00
Joey Hess
3ce2e95a5f
balanced preferred content and --rebalance
This all works fine. But it doesn't check repository sizes yet, and
without repository size checking, once a repository gets full, there
will be no other repository that will want its files.

Use of sha2 seems unnecessary, probably adler32 or md5 or crc would have
been enough. Possibly just summing up the bytes of the key mod the number
of repositories would have sufficed. But sha2 is there, and probably
hardware accelerated. I doubt very much there is any security benefit
to using it though. If someone wants to construct a key that will be
balanced onto a given repository, sha2 is certainly not going to stop
them.
2024-08-09 14:16:09 -04:00
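The two selection schemes compared in the commit above can be sketched in Python as follows. This is only an illustration of "hash the key, then take it mod the number of repositories". The function names, the use of SHA-256 specifically, and sorting the repository names to get a shared order are all assumptions, not details taken from git-annex.

```python
import hashlib


def pick_repository(key: str, repositories: list[str]) -> str:
    """Pick a repository from the SHA-256 of the key, mod the repository count."""
    ordered = sorted(repositories)  # assumption: every client agrees on this order
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest, "big") % len(ordered)
    return ordered[index]


def pick_repository_bytesum(key: str, repositories: list[str]) -> str:
    """The cheaper alternative: sum the bytes of the key, mod the repository count."""
    ordered = sorted(repositories)
    return ordered[sum(key.encode("utf-8")) % len(ordered)]
```

Both functions are deterministic, so anyone who can choose key names can try candidates until one lands on a chosen repository. That matches the commit's point that the stronger hash adds no real security here.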
Joey Hess
bda23daa6c
update 2024-08-08 15:54:22 -04:00
Joey Hess
fd03b31633
update 2024-08-08 15:53:36 -04:00
Joey Hess
7e48e712b2
update 2024-08-08 15:52:52 -04:00
Joey Hess
2616056cde
Merge branch 'exportreeplus' 2024-08-08 15:31:57 -04:00
Joey Hess
c15c32b5f8
releasing package git-annex version 10.20240808 2024-08-08 15:27:04 -04:00
Joey Hess
7294d23d78
export: Added --from option
This is similar to git-annex copy --from --to, in that it downloads a
local copy, locks it for removal, uploads it, and drops it. Removal of
the temporary local copy is done without verifying numcopies for the
same reason as that command.

I do wonder, looking at this, if there's a race where the local copy
gets used as a copy to allow some other drop in the narrow window after
it is downloaded and before it gets locked for removal. That would need
some other repository to have an out of date location log that says the
repository contains a copy of the key, in order for it to try to use it
as a copy. If there is such a race, git-annex copy/move would also be
vulnerable to it. It would be better to lock it for removal before
starting to download it! That is possible in v10 repositories, which do
use a separate content lock file.

Note that, when the exported tree contains several files that use the
same key, it will be downloaded repeatedly, once per time needed to
upload it. It would be possible to avoid that extra work, but it would
complicate this since the local copy would need to be preserved, locked
for removal, until the end. Also, that would mean that interrupting the
export would leave possibly a lot of temporarily downloaded keys in the
local repository, while currently it can only leave one.
2024-08-08 12:08:55 -04:00
Joey Hess
6d96734128
updateproxy, updatecluster check annexobjects=yes
updateproxy, updatecluster: Prevent using an exporttree=yes special remote
that does not have annexobjects=yes, since it will not work.
2024-08-07 12:27:24 -04:00
Joey Hess
8864a9e353
update 2024-08-07 11:49:53 -04:00
Joey Hess
b8f8c38e88
Merge branch 'master' into exportreeplus 2024-08-07 11:28:21 -04:00
Joey Hess
509b23fa00
catch ClientError from withClientM
When getting from a P2P HTTP remote, prompt for credentials when required,
instead of failing.

This feels like it might be a bug in servant-client. withClientM's type
suggests it would not throw a ClientError. But it does in this case.
2024-08-07 11:24:34 -04:00
Joey Hess
c53f61e93f
Merge branch 'master' into exportreeplus 2024-08-06 14:46:33 -04:00
Joey Hess
3cc03b4c96
fix file corruption when proxying an upload to a special remote
The file corruption consists of each chunk of the file being duplicated.
Since chunks are typically a fixed size, it would certainly be possible
to get from a corrupted file back to the original file. But this is still
bad data loss.

The regression was introduced in commit fcc052bed8.
Luckily that did not make the most recent release.
2024-08-06 14:41:19 -04:00
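The recovery that the commit above says is possible can be sketched in Python. It assumes the chunk size is known and that every chunk, including a shorter final one, was written exactly twice in a row. The commit does not state the actual chunk size, and `undo_chunk_duplication` is a hypothetical helper, not part of git-annex.

```python
def undo_chunk_duplication(data: bytes, chunk_size: int) -> bytes:
    """Undo corruption in which every chunk was written twice consecutively."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(data) % 2 != 0:
        raise ValueError("corrupted data should have an even length")
    original_length = len(data) // 2
    out = bytearray()
    pos = 0
    while len(out) < original_length:
        # The final chunk may be shorter than chunk_size.
        size = min(chunk_size, original_length - len(out))
        first = data[pos:pos + size]
        second = data[pos + size:pos + 2 * size]
        if first != second:
            raise ValueError(f"chunk at offset {pos} is not duplicated")
        out += first
        pos += 2 * size
    return bytes(out)
```

Checking that both copies of each chunk agree guards against applying the repair to a file that was not corrupted in this particular way.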
Joey Hess
3289b1ad02
proxying to exporttree=yes annexobjects=yes basically working
It works when using git-annex sync/push/assist, or when manually sending
all content to the proxied remote before pushing to the proxy remote.
But when the push comes before the content is sent, sending content does
not update the exported tree.
2024-08-06 14:21:23 -04:00
Joey Hess
9da2860812
Merge branch 'master' into exportreeplus 2024-08-02 18:45:44 -04:00
Joey Hess
c34d1da22a
Remove debug output (to stderr)
Accidentally included in last version. Only happens when running code that
uses remoteUrl.
2024-08-02 14:13:29 -04:00
Joey Hess
28b29f63dc
initial support for annexobjects=yes
Works but some commands may need changes to support special remotes
configured this way.
2024-08-02 14:07:45 -04:00
Joey Hess
3a1f39fbdf
Avoid loading cluster log at startup
This fixes a problem with datalad's test suite, where loading the cluster
log happened to cause the git-annex branch commits to take a different
shape, with an additional commit.

It's also faster though, since many commands don't need the cluster log.

Just fill Annex.clusters with a thunk.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-07-31 15:54:14 -04:00
Joey Hess
c1bc0bffc8
releasing package git-annex version 10.20240731 2024-07-31 14:05:01 -04:00
Joey Hess
d1b641cb1e
update stack.yaml to nightly-2024-07-29 and remove stack-lts-18.13.yaml
Primarily because Windows needs a dependency bump to get stm-2.5.1
for Servant build flag.

This includes Win32-2.13.4.0 and aws-0.24, which add some features
that Windows had been missing out on as well.

Lots of warnings about head and tail will eventually need to be
addressed. Of course, AFAIK the uses of them in git-annex are all safe.
2024-07-29 20:09:37 -04:00
Joey Hess
fcc052bed8
When proxying an upload to a special remote, verify the hash.
Usually uploading to a special remote does not verify the content, because
the content in a repository is assumed to be valid, and there is no trust
boundary. But with a proxied special remote, there may be users who are
allowed to store objects, but are not really trusted.

Another way to look at this is that it's the equivalent of git-annex-shell
checking the hash of received data, which it does (see StoreContent
implementation).
2024-07-29 13:40:51 -04:00
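Verifying an upload from an untrusted sender, as the commit above describes, can be sketched in Python by hashing the data as it arrives and rejecting it on a mismatch. This is not the git-annex implementation: `receive_verified` is a hypothetical name, SHA-256 stands in for whatever hash the key actually uses, and buffering the chunks in memory is a simplification.

```python
import hashlib
from typing import Iterable


def receive_verified(chunks: Iterable[bytes], expected_sha256: str) -> bytes:
    """Collect uploaded chunks, rejecting the upload if the digest mismatches."""
    hasher = hashlib.sha256()
    received = bytearray()
    for chunk in chunks:
        # Hash incrementally so the content is only read once.
        hasher.update(chunk)
        received += chunk
    if hasher.hexdigest() != expected_sha256:
        raise ValueError("content does not match the expected SHA-256 digest")
    return bytes(received)
```

The content is returned only after the digest check passes, so a caller never stores data that failed verification.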
Joey Hess
074fad819d
changelog 2024-07-29 13:09:19 -04:00