Commit graph

34923 commits

Author SHA1 Message Date
lucas.gautheron@f2b5c93a64b028c1ec8698b9c2412ed51ff22040
925c203c09 2024-09-02 15:08:25 +00:00
Joey Hess
9d29b99ac4
add news item for git-annex 10.20240831 2024-08-31 19:50:36 -04:00
Joey Hess
698d9252a5
mention sizebalanced as well as balanced 2024-08-30 12:06:45 -04:00
Joey Hess
53b7375cc6
update 2024-08-30 11:14:45 -04:00
Joey Hess
54b6151412
document using balanced preferred content in a cluster 2024-08-30 11:08:32 -04:00
Joey Hess
d0938d730b
Merge branch 'master' into balanced 2024-08-30 11:01:39 -04:00
Joey Hess
242c525659
lookupkey: Allow using --ref in a bare repository. 2024-08-30 10:55:48 -04:00
yarikoptic
e2b7895cbc Added a comment 2024-08-29 18:35:47 +00:00
Joey Hess
f89a1b8216
remove stale live changes from reposize database
Reorganized the reposize database directory, and split up a column.

checkStaleSizeChanges needs to run before needLiveUpdate,
otherwise the process won't be holding a lock on its pid file, and
another process could go in and expire the live update it records. It
just so happens that they do get called in the correct order, since
checking balanced preferred content calls getLiveRepoSizes before
needLiveUpdate.

The 1 minute delay between checks is arbitrary, but will avoid excess
work. The downside of it is that, if a process is dropping a file and
gets interrupted, for 1 minute another process can expect a repository
will soon be smaller than it is. And so a process might send data to a
repository when a file is not really going to be dropped from it. But
note that can already happen if a drop takes some time in eg locking and
then fails. So it seems possible that live updates should only be
allowed to increase, rather than decrease the size of a repository.
2024-08-28 13:57:25 -04:00
Joey Hess
278adbb726
combine 2 queries 2024-08-28 11:00:59 -04:00
Joey Hess
e006acef22
avoid reposize database locking overhead when not needed
Only when the preferred content expression being matched uses balanced
preferred content is this overhead needed.

It might be possible to eliminate the locking entirely. Eg, check the
live changes before and after the action and re-run if they are not
stable. For now, this is good enough, it avoids existing preferred
content getting slow. If balanced preferred content turns out to be too
slow to check, that could be tried later.
2024-08-28 10:52:34 -04:00
matrss
833150fd25 Added a comment 2024-08-28 14:11:36 +00:00
mih
16f9042046 Added a comment: Needed to retrieve single file metadata from bare repo 2024-08-28 13:58:30 +00:00
matrss
3f62116d64 Added a comment 2024-08-28 08:47:33 +00:00
Joey Hess
0a119184e6
thoughts 2024-08-27 14:59:13 -04:00
Joey Hess
8555fb88ef
locking in checkLiveUpdate
This makes sure that two threads don't check balanced preferred content at the
same time, so each thread always sees a consistent picture of what is
happening.

This does add a fairly expensive file level lock to every check of
preferred content, in commands that use prepareLiveUpdate. It would
be good to only do that when live updates are actually needed, eg when
the preferred content expression uses balanced preferred content.
2024-08-27 13:12:43 -04:00
Joey Hess
4d2f95853d
closing in on finishing live reposizes
Fixed successfullyFinishedLiveSizeChange to not update the rolling total
when a redundant change is in RecentChanges.

Made setRepoSizes clear RecentChanges that are no longer needed.
It might be possible to clear those earlier, this is only a convenient
point to do it.

The reason it's safe to clear RecentChanges here is that, in order for a
live update to call successfullyFinishedLiveSizeChange, a change must be
made to a location log. If a RecentChange gets cleared, and just after
that a new live update is started, making the same change, the location
log has already been changed (since the RecentChange exists), and
so when the live update succeeds, it won't call
successfullyFinishedLiveSizeChange. The reason it doesn't
clear RecentChanges when there is a reduntant live update is because
I didn't want to think through whether or not all races are avoided in
that case.

The rolling total in SizeChanges is never cleared. Instead,
calcJournalledRepoSizes gets the initial value of it, and then
getLiveRepoSizes subtracts that initial value from the current value.
Since the rolling total can only be updated by updateRepoSize,
which is called with the journal locked, locking the journal in
calcJournalledRepoSizes ensures that the database does not change while
reading the journal.
2024-08-27 12:54:46 -04:00
Spencer
949be665c0 Added contributions section to track my bugs and inquiries 2024-08-26 20:02:03 +00:00
Joey Hess
21608716bd
started work on getLiveRepoSizes
Doesn't quite compile
2024-08-26 14:50:09 -04:00
Joey Hess
db89e39df6
partially fix concurrency issue in updating the rollingtotal
It's possible for two processes or threads to both be doing the same
operation at the same time. Eg, both dropping the same key. If one
finishes and updates the rollingtotal, then the other one needs to be
prevented from later updating the rollingtotal as well. And they could
finish at the same time, or with some time in between.

Addressed this by making updateRepoSize be called with the journal
locked, and only once it's been determined that there is an actual
location change to record in the log. updateRepoSize waits for the
database to be updated.

When there is a redundant operation, updateRepoSize won't be called,
and the redundant LiveUpdate will be removed from the database on
garbage collection.

But: There will be a window where the redundant LiveUpdate is still
visible in the db, and processes can see it, combine it with the
rollingtotal, and arrive at the wrong size. This is a small window, but
it still ought to be addressed. Unsure if it would always be safe to
remove the redundant LiveUpdate? Consider the case where two drops and a
get are all running concurrently somehow, and the order they finish is
[drop, get, drop]. The second drop seems redundant to the first, but
it would not be safe to remove it. While this seems unlikely, it's hard
to rule out that a get and drop at different stages can both be running
at the same time.
2024-08-26 09:43:32 -04:00
Joey Hess
03c7f99957
todo 2024-08-25 10:48:42 -04:00
Joey Hess
2b037d36a1
update 2024-08-24 15:06:00 -04:00
Joey Hess
6660984442
update 2024-08-24 13:15:39 -04:00
Joey Hess
d60a33fd13
improve live update starting
In an expression like "balanced=foo and exclude=bar", avoid it starting
a live update when the overall expression doesn't match.
2024-08-24 13:07:05 -04:00
Joey Hess
16f945459c
todo 2024-08-24 11:58:17 -04:00
Joey Hess
2f20b939b7
LiveUpdate db updates working
I've tested the behavior of the thread that waits for the LiveUpdate to
be finished, and it does get signaled and exit cleanly when the
LiveUpdate is GCed instead.

Made finishedLiveUpdate wait for the thread to finish updating the
database.

There is a case where GC doesn't happen in time and the database is left
with a live update recorded in it. This should not be a problem as such
stale data can also happen when interrupted and will need to be detected
when loading the database.

Balanced preferred content expressions now call startLiveUpdate.
2024-08-24 11:49:58 -04:00
Joey Hess
84d1bb746b
LiveUpdate for clusters 2024-08-24 10:20:12 -04:00
Joey Hess
18cd8bf43a
punt on LiveUpdate plumbing through assistant for now 2024-08-24 09:37:24 -04:00
yarikoptic
efdee386c0 initial report on desire to do handle pathspecs 2024-08-24 01:35:31 +00:00
yarikoptic
c3877f648c initial idea on another ability for get 2024-08-24 01:23:04 +00:00
Joey Hess
c3d40b9ec3
plumb in LiveUpdate (WIP)
Each command that first checks preferred content (and/or required
content) and then does something that can change the sizes of
repositories needs to call prepareLiveUpdate, and plumb it through the
preferred content check and the location log update.

So far, only Command.Drop is done. Many other commands that don't need
to do this have been updated to keep working.

There may be some calls to NoLiveUpdate in places where that should be
done. All will need to be double checked.

Not currently in a compilable state.
2024-08-23 16:35:12 -04:00
Joey Hess
4885073377
add live size changes to RepoSize database
Not yet used.
2024-08-23 12:51:00 -04:00
Joey Hess
dad1fb150f
update 2024-08-23 11:45:36 -04:00
Joey Hess
d0ab1550ec
possible design to address reposizes concurrency issues 2024-08-23 11:19:38 -04:00
gauss@055c9051f507c97fa5612f46c74ce636f5ecde10
d71ca87bc9 Added a comment: No root privileges server - annex-shell replaced by git-annex-shell 2024-08-23 01:51:49 +00:00
Joey Hess
8ade3fc5d6
improve docs 2024-08-22 08:09:10 -04:00
Joey Hess
abdd49d8c1
update 2024-08-22 07:53:56 -04:00
Joey Hess
173500872f
update 2024-08-22 07:17:04 -04:00
Joey Hess
70e2fca257
Added the annex.fullybalancedthreshhold git config. 2024-08-22 07:15:55 -04:00
Joey Hess
3fe67744b1
display new empty repos in maxsize table
A new repo that has no location log info yet, but has an entry in
uuid.log has 0 size, so make RepoSize aware of that.

Note that a new repo that does not yet appear in uuid.log will still not
be displayed.

When a remote is added but not synced with yet, it has no uuid.log
entry. If git-annex maxsize is used to configure that remote, it needs
to appear in the maxsize table, and the change to Command.MaxSize takes
care of that.
2024-08-22 07:03:22 -04:00
Spencer
acaa8e9cd5 Added a comment: Precise Workflow 2024-08-22 00:18:28 +00:00
Joey Hess
76ece2a699
make --rebalance of balanced use fullysizebalanced when useful
When the specified number of copies is > 1, and some repositories are
too full, it can be better to move content from them to other less full
repositories, in order to make space for new content.

annex.fullybalancedthreshhold is documented, but not implemented yet

This is not tested very well yet, and is known to sometimes take several
runs to stabalize.
2024-08-21 17:59:08 -04:00
Joey Hess
9e87061de2
Support "sizebalanced=" and "fullysizebalanced=" too
Might want to make --rebalance turn balanced=group:N where N > 1
to fullysizebalanced=group:N. Have not yet determined if that will
improve situations enough to be worth the extra work.
2024-08-21 15:01:54 -04:00
Joey Hess
4e1dcc0372
bug 2024-08-21 12:18:31 -04:00
Joey Hess
476d223bce
implement fullbalanced=group:N
Rebalancing this when it gets into a suboptimal situation will need
further work.
2024-08-20 13:51:02 -04:00
Matthew
4a9e637d36 Added a comment: Help with .nfsXXXX files 2024-08-19 21:20:59 +00:00
matrss
9cfdae4c3b Added a comment 2024-08-19 10:25:13 +00:00
Joey Hess
68a99a8f48
size based rebalancing design 2024-08-18 16:25:12 -04:00
Joey Hess
99514f9d18
maxsize overview display and --json support 2024-08-18 12:08:13 -04:00
xentac
74b953cded Added a comment 2024-08-18 03:17:12 +00:00
Joey Hess
f985c58d8e
consistently don't show sizes of empty repositories
This used to be the case, and when matching options are used, that code
path still omits them, so also omit them in the getRepoSize code path.
2024-08-17 15:09:16 -04:00
Joey Hess
b62b58b50b
git-annex info speed up using getRepoSizes 2024-08-17 14:54:31 -04:00
Joey Hess
d09a005f2b
update RepoSize database from git-annex branch incrementally
The use of catObjectStream is optimally fast. Although it might be
possible to combine this with git-annex branch merge to avoid some
redundant work.

Benchmarking, a git-annex branch that had 100000 files changed
took less than 1.88 seconds to run through this.
2024-08-17 13:35:00 -04:00
Spencer
40b49e2ddd Added a comment: Remote Helper? 2024-08-17 05:33:01 +00:00
matrss
bcf876e3a0 2024-08-16 15:52:32 +00:00
matrss
f057010086 Added a comment 2024-08-16 15:45:45 +00:00
Joey Hess
61d95627f3
fix Annex.repoSize sharing between threads 2024-08-16 10:56:51 -04:00
Joey Hess
e361b9ea3c
todo 2024-08-15 16:15:48 -04:00
Joey Hess
63ccf6ffa7
todo 2024-08-15 13:50:50 -04:00
Joey Hess
4a0c7e2b2c
update 2024-08-15 13:41:47 -04:00
Joey Hess
a2da9c526b
RepoSize concurrency fix
When loading the journalled repo sizes, make sure that the current
process is prevented from making changes to the journal in another
thread.
2024-08-15 13:37:41 -04:00
Joey Hess
06064f897c
update Annex.reposizes when changing location logs
The live update is only needed when Annex.reposizes has already been
populated.
2024-08-15 13:27:14 -04:00
Joey Hess
c376b1bd7e
show message when doing possibly expensive from scratch reposize calculation 2024-08-15 12:42:36 -04:00
Joey Hess
c200523bac
implement getRepoSizes
At this point the RepoSize database is getting populated, and it
all seems to be working correctly. Incremental updates still need to be
done to make it performant.
2024-08-15 12:31:56 -04:00
Joey Hess
eac4e9391b
finalize RepoSize database
Including locking on creation, handling of permissions errors, and
setting repo sizes.

I'm confident that locking is not needed while using this database.
Since writes happen in a single transaction. When there are two writers
that are recording sizes based on different git-annex branch commits,
one will overwrite what the other one recorded. Which is fine, it's only
necessary that the database stays consistent with the content of a
git-annex branch commit.
2024-08-15 12:29:34 -04:00
Atemu
e8997d8899 Added a comment 2024-08-15 15:40:20 +00:00
Joey Hess
3e6eb2a58d
implement journalledRepoSizes
Plan is to run this when populating Annex.reposizes on demand.
So Annex.reposizes will be up-to-date with the journal, including
crucially journal entries for private repositories. But also
anything that has been written to the journal by another process,
especially if the process was ran with annex.alwayscommit=false.

From there, Annex.reposizes can be kept up to date with changes made
by the running process.
2024-08-14 13:53:24 -04:00
pedro-lopes-de-azevedo
c75ecc5350 Added a comment: parameter --from not accepted 2024-08-14 14:27:54 +00:00
bvaa
11eb2ae6ec Added a comment 2024-08-14 07:18:26 +00:00
Joey Hess
90a79a6c1e
plan 2024-08-13 15:13:30 -04:00
Joey Hess
a979d8da41
update 2024-08-13 14:14:47 -04:00
Joey Hess
10d8b3cc63
fixed --rebalance stability on drop
Was checking the wrong uuid, oops
2024-08-13 13:32:11 -04:00
Joey Hess
745bc5c547
take maxsize into account for balanced preferred content
This is very innefficient, it will need to be optimised not to
calculate the sizes of repos every time.

Also, fixed a bug in balancedPicker that caused it to pick a too high
index when some repos were excluded due to being full.
2024-08-13 11:00:20 -04:00
Spencer
05a62e4e5f Added a comment: Workaround: --force-small 2024-08-13 07:05:57 +00:00
Spencer
3d252da06c Added a comment: Exact Moment Things Go Wrong 2024-08-13 06:22:11 +00:00
Spencer
ab5f920d77 .md linting 2024-08-13 04:46:53 +00:00
Spencer
8a91a8c208 2024-08-13 04:46:10 +00:00
Spencer
c4296fbd45 Added a comment: Still a Problem (on Mac?) 2024-08-13 04:21:33 +00:00
ewen
491cf67ce2 Added a comment: Most servers upgraded to TLS v1.2 EMS / TLS v1.3 2024-08-13 00:01:05 +00:00
Joey Hess
b201792391
update 2024-08-12 18:57:03 -04:00
Joey Hess
1e799e7842
update 2024-08-12 11:56:52 -04:00
Joey Hess
71043fe9f7
update 2024-08-12 10:01:48 -04:00
Joey Hess
bcd2b9a5c4
idea 2024-08-12 09:43:14 -04:00
Joey Hess
1265d7e5df
implement maxsize log and command
* maxsize: New command to tell git-annex how large the expected maximum
  size of a repository is.
* vicfg: Include maxsize configuration.
2024-08-11 15:41:26 -04:00
Joey Hess
3019b21c40
more formal documentation of balancing 2024-08-11 13:29:06 -04:00
Joey Hess
bd5affa362
use hmac in balanced preferred content
This deals with the possible security problem that someone could make an
unusually low UUID and generate keys that are all constructed to hash to
a number that, mod the number of repositories in the group, == 0.
So balanced preferred content would always put those keys in the
repository with the low UUID as long as the group contains the
number of repositories that the attacker anticipated.
Presumably the attacker than holds the data for ransom? Dunno.

Anyway, the partial solution is to use HMAC (sha256) with all the UUIDs
combined together as the "secret", and the key as the "message". Now any
change in the set of UUIDs in a group will invalidate the attacker's
constructed keys from hashing to anything in particular.

Given that there are plenty of other things someone can do if they can
write to the repository -- including modifying preferred content so only
their repository wants files, and numcopies so other repositories drom
them -- this seems like safeguard enough.

Note that, in balancedPicker, combineduuids is memoized.
2024-08-10 16:32:54 -04:00
Joey Hess
bde58e6c71
todo 2024-08-09 16:57:10 -04:00
Joey Hess
412f6057e4
todo 2024-08-09 16:47:28 -04:00
xentac
fb186ab0a8 Added a comment 2024-08-09 19:31:12 +00:00
xentac
55a5cb7904 2024-08-09 19:22:19 +00:00
Joey Hess
f1cb5cb908
wrote git-annex maxsize man page 2024-08-09 14:57:11 -04:00
Joey Hess
5a6afff3d6
left off number option 2024-08-09 14:22:05 -04:00
Joey Hess
3ce2e95a5f
balanced preferred content and --rebalance
This all works fine. But it doesn't check repository sizes yet, and
without repository size checking, once a repository gets full, there
will be no other repository that will want its files.

Use of sha2 seems unncessary, probably alder2 or md5 or crc would have
been enough. Possibly just summing up the bytes of the key mod the number
of repositories would have sufficed. But sha2 is there, and probably
hardware accellerated. I doubt very much there is any security benefit
to using it though. If someone wants to construct a key that will be
balanced onto a given repository, sha2 is certianly not going to stop
them.
2024-08-09 14:16:09 -04:00
Joey Hess
152c87140b
update 2024-08-08 16:06:02 -04:00
Joey Hess
0959bfe5d3
update for exporttree=yes 2024-08-08 15:51:36 -04:00
Joey Hess
727b6a0b6d
update 2024-08-08 15:34:36 -04:00
Joey Hess
2616056cde
Merge branch 'exportreeplus' 2024-08-08 15:31:57 -04:00
Joey Hess
3b758aaad6
add news item for git-annex 10.20240808 2024-08-08 15:27:11 -04:00
Joey Hess
3ea835c7e8
proxied exporttree=yes versionedexport=yes remotes are not untrusted
This removes versionedExport, which was only used by the S3 special
remote. Instead, versionedexport=yes is a common way for remotes to
indicate that they are versioned.
2024-08-08 15:24:19 -04:00
Joey Hess
5c36177e58
proxied exporttree=yes remotes are untrustworthy
This is not perfect because it does not handle versioned special
remotes, which should not be untrustworthy, but now are when proxied.

The implementation turned out to be easy, because the exporttree field
is a default field, so is available in RemoteConfig even for git
remotes.
2024-08-08 14:43:53 -04:00
Joey Hess
b23c7f769e
update 2024-08-08 14:25:18 -04:00
Joey Hess
9663888c77
update 2024-08-08 14:05:05 -04:00
Joey Hess
a2eb3b450a
post-receive: use the exporttree=yes remote as a source
This handles cases where a single key is used by multiple files in the
exported tree. When using `git-annex push`, the key's content gets
stored in the annexobjects location, and then when the branch is pushed,
it gets renamed from the annexobjects location to the first exported
file. For subsequent exported files, a copy of the content needs to be
made. This causes it to download the key from the remote in order to
upload another copy to it.

This is not needed when using `git push` followed by `git-annex copy --to`
the proxied remote, because the received key is stored at all export
locations then.

Also, fixed handling of the synced branch push, it was exporting master
when synced/master was pushed.

Note that currently, the first push to the remote does not see that it
is able to get a key from it in order to upload it back. It displays
"(not available)". The second push is able to. Since git-annex push
pushes first the synced branch and then the branch, this does end up
with a full export being made, but it is not quite right.
2024-08-08 13:49:53 -04:00
Joey Hess
7294d23d78
export: Added --from option
This is similar to git-annex copy --from --to, in that it downloads a
local copy, locks it for removal, uploads it, and drops it. Removal of
the temporary local copy is done without verifying numcopies for the
same reason as that command.

I do wonder, looking at this, if there's a race where the local copy
gets used as a copy to allow some other drop in the narrow window after
it is downloaded and before it gets locked for removal. That would need
some other repository to have an out of date location log that says the
repository contains a copy of the key, in order for it to try to use it
as a copy. If there is such a race, git-annex copy/move would also be
vulnerable to it. It would be better to lock it for removal before
starting to download it! That is possible in v10 repositories, which do
use a separate content lock file.

Note that, when the exported tree contains several files that use the
same key, it will be downloaded repeatedly, once per time needed to
upload it. It would be possible to avoid that extra work, but it would
complicate this since the local copy would need to be preserved, locked
for removal, until the end. Also, that would mean that interrupting the
export would leave possibly a lot of temporarily downloaded keys in the
local repository, while currently it can only leave one.
2024-08-08 12:08:55 -04:00
Joey Hess
01edd186e9
update proxied exporttree=yes remote on receive of sync branch
Since git-annex sync sends the sync branch first, and only displays the
output of the push to the sync branch, this makes git-annex
post-retrieve's output when updating the exported tree be visible when
syncing.

This also makes syncing with a non-bare repository still update the
exported tree, even when the checked out branch is not able to be
updated. The sync branch gets sent regardless.
2024-08-07 13:11:06 -04:00
Joey Hess
55adbb6694
avoid trying to export tree to proxied exporttree=yes remotes
This avoids a lot of ugly messages when syncing with such a remote.
The export tree happens on the proxy side.
2024-08-07 13:00:19 -04:00
Joey Hess
6d96734128
updateproxy, updatecluster check annexobjects=yes
updateproxy, updatecluster: Prevent using an exporttree=yes special remote
that does not have annexobjects=yes, since it will not work.
2024-08-07 12:27:24 -04:00
Joey Hess
8864a9e353
update 2024-08-07 11:49:53 -04:00
Joey Hess
1e0f13ad7f
comment 2024-08-07 11:39:29 -04:00
Joey Hess
b8f8c38e88
Merge branch 'master' into exportreeplus 2024-08-07 11:28:21 -04:00
Joey Hess
509b23fa00
catch ClientError from withClientM
When getting from a P2P HTTP remote, prompt for credentials when required,
instead of failing.

This feels like it might be a bug in servant-client. withClientM's type
suggests it would not throw a ClientError. But it does in this case.
2024-08-07 11:24:34 -04:00
Joey Hess
43e1f590c9
comment 2024-08-07 10:47:47 -04:00
Joey Hess
1038567881
proxy stores received keys to known export locations
This handles the workflow where the branch is first pushed to the proxy,
and then files in the exported tree are later are copied to the proxied remote.

Turns out that the way the export log is structured, nothing needs
to be done to finalize the export once the last key is sent to it. Which
is great because that would have been a lot of complication. On
receiving the push, Command.Export runs and calls recordExportBeginning,
does as much as it can to update the export with the files currently
on it, and then calls recordExportUnderway. At that point, the
export.log records the export as "complete", but it's not really. And
that's fine. The same happens when using `git-annex export` when some
files are not available to send. Other repositories that have
access to the special remote can already retrieve files from it. As
the missing files get copied to the exported remote, all that needs
to be done is record each in the export db.

At this point, proxying to exporttree=yes annexobjects=yes special remotes
is fully working. Except for in the case where multiple files in the
tree use the same key, and the files are sent to the proxied remote
before pushing the tree.

It seems that even special remotes without annexobjects=yes will work if
used with the workflow where the git-annex branch is pushed before
copying files. But not with the `git-annex push` workflow.
2024-08-07 09:47:34 -04:00
matrss
3ccbcc5662 2024-08-07 12:12:29 +00:00
git-annex@82b5fddc759dffdf749b19add6f0be2a0c78b62c
d3cc84db3b 2024-08-07 12:05:53 +00:00
git-annex@82b5fddc759dffdf749b19add6f0be2a0c78b62c
e8f60e7daa 2024-08-07 12:04:42 +00:00
Joey Hess
ba1cb517c0
update 2024-08-06 14:46:56 -04:00
Joey Hess
c53f61e93f
Merge branch 'master' into exportreeplus 2024-08-06 14:46:33 -04:00
Joey Hess
f01d872059
fixed 2024-08-06 14:42:46 -04:00
Joey Hess
3289b1ad02
proxying to exporttree=yes annexobjects=yes basically working
It works when using git-annex sync/push/assist, or when manually sending
all content to the proxied remote before pushing to the proxy remote.
But when the push comes before the content is sent, sending content does
not update the exported tree.
2024-08-06 14:21:23 -04:00
Joey Hess
be5c86c248
refine 2024-08-06 12:15:18 -04:00
Joey Hess
4750ffbd3b
finalized design for proxying to exporttree=yes annexobjects=yes special remotes 2024-08-06 11:45:45 -04:00
Joey Hess
84d27cf34f
update 2024-08-06 11:13:51 -04:00
matrss
6d1592f857 2024-08-06 12:44:18 +00:00
Spencer
66ff2bc833 Added a comment: D: Correct 2024-08-05 22:17:55 +00:00
Joey Hess
a535eaa176
rename from annexobjects location on export
(When possible, of course it may not be there, or it may get renamed from
there for another exported file first. Or the remote may not support
renames.)

This will avoids redundant uploads.

An example case where this is important: Proxying to a exporttree remote,
a file is uploaded to it but is not yet in an exported tree. When the
exported tree is pushed, the remote needs to be updated by exporting to
it. In this case, the proxy doesn't have a copy of the file, so it would
need to download it from annexobjects before uploading it to the final
location. With this optimisation, it can just rename it.

However: If a key is used twice in an exported tree, it seems a proxy
will need to download and reupload anyway. Unless a copy operation is
added to exporttree remotes..
2024-08-04 12:19:10 -04:00
Joey Hess
a3d96474f2
rename to annexobjects location on unexport
This avoids needing to re-upload the file again to get it to the
annexobjects location, which git-annex sync was doing when it was
preferred content.

If the file is not preferred content, sync will drop it from the
annexobjects location.

If the file has been deleted from the tree, it will remain in the
annexobjects location until an unused/dropunused pass is done.
2024-08-04 11:58:07 -04:00
Joey Hess
6b63449133
update
Decided not to use the annexobjects location for exportTempName.

There doesn't seem to be any actual benefit to doing that, because an
export that renames to exportTempName always renames it back from that
to another location.

Also the annexobjects directory won't actually help with the paired
rename issue.
2024-08-04 11:34:00 -04:00
Joey Hess
ee076b68f5
strong verification on retrieval from annexobjects location
The file in the annexobjects location may have been renamed from a
previously exported file that got deleted in a subsequent export.
Or it may be renamed to annexobjects temporarily before being renamed to
another name (to handle eg pairwise renames).

But, an exported file is not guaranteed to contain the content of the
key that the local repository last exported there. Another tree could
have been exported from elsewhere in the meantime.

So, files in annexobjects do not necessarily have the content of their
key. And so have to be strongly verified when retrieving. The same as
is done when retrieving exported files.
2024-08-04 11:24:21 -04:00
Joey Hess
fe01a1e7e1
design work on annexobjects remotes 2024-08-03 19:51:03 -04:00
Joey Hess
a4a06404d4
sync --content with annexobjects=true exporttree remotes 2024-08-03 11:39:23 -04:00
Joey Hess
9497bf7fdb
update 2024-08-02 18:50:57 -04:00
Joey Hess
9da2860812
Merge branch 'master' into exportreeplus 2024-08-02 18:45:44 -04:00
Joey Hess
c4352adf6a
in unexport, check for annexobjects presence before updating location log
The key may still be in the annexobjects location.
2024-08-02 18:43:10 -04:00
Joey Hess
34c10d082d
status 2024-08-02 14:15:05 -04:00
Joey Hess
83fa76733f
status 2024-08-02 14:10:34 -04:00
Joey Hess
28b29f63dc
initial support for annexobjects=yes
Works but some commands may need changes to support special remotes
configured this way.
2024-08-02 14:07:45 -04:00
Spencer
cb192eafed removed 2024-08-02 04:37:11 +00:00
Spencer
bec9df965a Added a comment: Necro 2024-08-02 04:10:32 +00:00
Spencer
38de446248 Added a comment: Necro 2024-08-02 04:10:14 +00:00
dmcardle
94e34ca139 2024-08-01 14:25:18 +00:00
dmcardle
b6810e6fee Added a comment 2024-08-01 14:23:41 +00:00
d@403a635aa8eaa8bfa8613acb6a375d9e06ed7001
9e0044e990 2024-08-01 14:19:25 +00:00
d@403a635aa8eaa8bfa8613acb6a375d9e06ed7001
9728762a2c Added a comment 2024-08-01 13:49:41 +00:00
Spencer
dc1f707875 Added a comment: @joey 2024-07-31 20:10:06 +00:00
Joey Hess
3a1f39fbdf
Avoid loading cluster log at startup
This fixes a problem with datalad's test suite, where loading the cluster
log happened to cause the git-annex branch commits to take a different
shape, with an additional commit.

It's also faster though, since many commands don't need the cluster log.

Just fill Annex.clusters with a thunk.

Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
2024-07-31 15:54:14 -04:00
Joey Hess
ffba57c9fc
cleanup comments on removed news post 2024-07-31 14:05:48 -04:00
Joey Hess
0403603483
add news item for git-annex 10.20240731 2024-07-31 14:05:11 -04:00
Joey Hess
f914ee61e3
analysis 2024-07-31 12:19:12 -04:00
Joey Hess
1e8208457f
pinged 2024-07-31 10:06:43 -04:00
Joey Hess
9e901d326d
comment 2024-07-31 10:04:08 -04:00
Spencer
6af48665e0 [Bug] Trust but Verify: RClone 2024-07-31 00:34:01 +00:00
Joey Hess
d52fd3cf83
update 2024-07-30 12:17:05 -04:00
Joey Hess
1500a9525d
todo 2024-07-30 11:58:44 -04:00
Joey Hess
1632beaf70
fix negative DATA when 1 node of a cluster has a partial transfer 2024-07-30 11:42:17 -04:00
Joey Hess
1560e0eee9
comment 2024-07-30 10:50:13 -04:00
Joey Hess
73703d1bef
close 2024-07-29 15:15:40 -04:00
Joey Hess
fcc052bed8
When proxying an upload to a special remote, verify the hash.
While usually uploading to a special remote does not verify the content,
the content in a repository is assumed to be valid, and there is no trust
boundary. But with a proxied special remote, there may be users who are
allowed to store objects, but are not really trusted.

Another way to look at this is it's the equivilant of git-annex-shell
checking the hash of received data, which it does (see StoreContent
implementation).
2024-07-29 13:40:51 -04:00
Joey Hess
380af6ac5f
update github badges
Seems the urls changed and the old ones will be falsely green forever.

Found new ones in readme at https://github.com/datalad/git-annex
2024-07-29 13:00:00 -04:00
Joey Hess
b4eb6e3ced
comment 2024-07-29 11:59:33 -04:00
Joey Hess
321e2adf66
don't think I ever implementned the 422 idea, it will 404 2024-07-29 11:49:40 -04:00
Joey Hess
d3f584fcdb
wording 2024-07-29 11:44:44 -04:00
Joey Hess
5f5c29fbe7
link 2024-07-29 11:43:30 -04:00
Joey Hess
f3b207a4b9
wording 2024-07-29 11:37:13 -04:00
Joey Hess
6068379e80
typo 2024-07-29 11:34:46 -04:00
Joey Hess
db66612b8f
Merge branch 'httpproto' 2024-07-29 11:33:39 -04:00
Joey Hess
74f81ebd04
Merge remote-tracking branch 'origin/httpproto' 2024-07-29 11:25:27 -04:00
Joey Hess
6f20085a60
update 2024-07-29 11:25:07 -04:00
Joey Hess
60b1c53df5
preparing to merge 2024-07-29 11:22:27 -04:00
Joey Hess
4f3ae96666
cleanly close proxy connection on interrupted PUT
An interrupted PUT to cluster that has a node that is a special remote
over http left open the connection to the cluster, so the next request
opens another one. So did an interrupted PUT directly to the proxied
special remote over http.

proxySpecialRemote was stuck waiting for all the DATA. Its connection
remained open so it kept waiting.

In servePut, checktooshort handles closing the P2P connection
when too short a data is received from PUT. But, checktooshort was only
called after the protoaction, which is what runs the proxy, which is
what was getting stuck. Modified it to run as a background thread,
which waits for the tooshortv to be written to, which gather always does
once it gets to the end of the data received from the http client.

That makes proxyConnection's releaseconn run once all data is received
from the http client. Made it close the connection handles before
waiting on the asyncworker thread. This lets proxySpecialRemote finish
processing any data from the handle, and then it will give up,
more or less cleanly, if it didn't receive enough data.

I say "more or less cleanly" because with both sides of the P2P
connection taken down, some protocol unhappyness results. Which can lead
to some ugly debug messages. But also can cause the asyncworker thread
to throw an exception. So made withP2PConnections not crash when it
receives an exception from releaseconn.

This did have a small change to the behavior of an interrupted PUT when
proxying to a regular remote. proxyConnection has a protoerrorhandler
that closes the proxy connection on a protocol error. But the proxy
connection is also closed by checktooshort when it closes the P2P
connection. Closing the same proxy connection twice is not a problem,
it just results in duplicated debug messages about it.
2024-07-29 10:37:19 -04:00
Joey Hess
c8e7231f48
add debugging of opening and closing connections to proxies 2024-07-29 09:52:26 -04:00
Joey Hess
7ac8d36f38
idea 2024-07-29 09:11:27 -04:00
stv0g
6352cebb92 Added a comment: importtree=yes Support 2024-07-29 06:50:01 +00:00
Joey Hess
cd89f91aa5
remove uuid from annex+http urls
Not needed it turns out.
2024-07-28 20:29:42 -04:00
Joey Hess
bc9cc79e85
set remote's annexUrl automatically
When the remote repository's git config file
has annex.url set to an annex+http url.
2024-07-28 20:13:41 -04:00
Joey Hess
c87cfe1e00
todo 2024-07-28 17:29:32 -04:00
Joey Hess
ccbdaf0448
documentation for p2phttp 2024-07-28 17:19:27 -04:00
Joey Hess
dfe65b92c8
avoid repeatedly parsing the proxy log 2024-07-28 16:04:20 -04:00
Joey Hess
2fdec6b4e1
update 2024-07-28 15:55:24 -04:00
Joey Hess
ddabc138ec
todo 2024-07-28 15:41:31 -04:00
Joey Hess
cdc4bd7443
fix hang in PUT of large file to a special remote node of a cluster over http 2024-07-28 15:34:59 -04:00
Joey Hess
66679c9bb4
remove temp file after upload to special remote 2024-07-28 14:36:45 -04:00
Joey Hess
9461793ffc
Merge remote-tracking branch 'origin/master' into httpproto 2024-07-28 14:24:15 -04:00
Joey Hess
ccd102cd19
update 2024-07-28 14:22:44 -04:00
Joey Hess
5e205f215d
clean shut down of cluster connection when PUT is interrupted
An interrupted `git-annex copy --to` a cluster via the http server,
when repeated, failed. The http server output "transfer already in
progress, or unable to take transfer lock". Apparently a second
connection was opened to the cluster, because the first connection
never got shut down.

Turned out the problem was that when proxying to a cluster, it would read a
short ByteString from the client, and send that to the nodes. But that left the
nodes warning more. Meanwhile, the proxy was expecting a SUCCESS/FAILURE
message from the nodes. So it didn't return, and so the cluster connection
stayed open.
2024-07-28 14:20:11 -04:00
Joey Hess
bdde6d829c
fix http proxying for a local git remote with a relative path
git-annex-shell expects an absolute path
2024-07-28 13:35:51 -04:00
Joey Hess
41667ad36b
found some bugs with clusters 2024-07-28 13:00:05 -04:00
Joey Hess
770aac97a7
share single BranchState amoung all threads
This fixes a problem when git-annex testremote is run against a cluster
accessed via the http server. Annex.Cluster uses the location log
to find nodes that contain a key when checking if the key is present or getting
it. Just after a key was stored to a cluster node, reading the location log
was not getting the UUID of that node.

Apparently the Annex action that wrote to the location log, and the one
that read from it were run with two different Annex states. The http server
does use several different Annex threads.

BranchState was part of the AnnexState, and so two threads could have
different BranchStates.

Moved BranchState to the AnnexRead, so all threads will see the common state.

This might possibly impact performance. If one thread is writing changes to the
branch, and another thread is reading from the branch, the writing thread will
now invalidate the BranchState's cache, which will cause the reading thread to
need to do extra work. But correctness is surely more important. If did is
found to have impacted performance, it could probably be dealt with by doing
smarter BranchState cache invalidation.

Another way this might impact performance is that the BranchState has a small
cache. If several threads were reading from the branch and relying on the value
they just read still being in the case, now a cache miss will be more likely.
Increasing the BranchState cache to the number of jobs might be a good
idea to amelorate that. But the cache is currently an innefficient list,
so making it large would need changes to the data types.

(Commit 4304f1b6ae dealt with a follow-on
effect of the bug fixed here.)
2024-07-28 12:30:27 -04:00
Joey Hess
fbbedae497
add --clusterjobs option and default to 1
The default of 1 is not ideal at all, but it avoids an accidental M*N
causing so much concurrency it becomes unusable.
2024-07-28 10:36:22 -04:00
Joey Hess
1259ad89b6
cluster support in http API server
Wired it up and it seems to basically work, although the test suite is
not fully passing.

Note that --jobs currently gets multiplied by the number of nodes in the
cluster, which is probably not good.
2024-07-28 10:17:29 -04:00
Joey Hess
0cdd418407
tested shutdown of connection to http proxied special remote
I had worried it might not work properly, but it does, the endv works.
2024-07-28 09:17:47 -04:00
Joey Hess
ef8f24f28c
fix PUT to http proxied special remote
It was hanging because it never sent FAILURE in the INVALID case.
And putoffset always triggers the INVALID case.
2024-07-28 09:14:42 -04:00
Joey Hess
0ea645944e
thoughts on exporttree 2024-07-27 19:59:54 -04:00
Joey Hess
1c0448e33c
update 2024-07-26 20:44:01 -04:00
Joey Hess
0fb86d2916
UNLOCKCONTENT is not a top-level request
proxyRequest was treating UNLOCKCONTENT as a separate request.
That made it possible for there to be two different connections to the
proxied remote, with LOCKCONTENT being sent to one, and UNLOCKCONTENT
to the other one. A protocol error.

git-annex testremote now passes against a http proxied remote.
2024-07-26 20:39:06 -04:00
Joey Hess
a3dab58be2
fix hang at end of PUT to proxied p2p http remote
sendExactly will now be sure to evaluate the whole lazy ByteString.

In this case, the lazy ByteString was exactly the right lenth.
But, it seems that L.take caused it to not actually be fully evaluated.

In servePut, this manifested as gather never being fully evaluated,
which caused the hang.

Very, very subtle, and horrible bug. Clearly the use of lazy ByteString
(or really just laziness) is at fault, and it would be very worth moving
to conduit or whatever to avoid this.
2024-07-26 19:50:15 -04:00
Joey Hess
b431201e1f
update 2024-07-26 17:15:09 -04:00
Joey Hess
d1faa13d6a
implement proxy connection pool
removeOldestProxyConnectionPool will be innefficient the larger the pool
is. A better data structure could be more efficient. Eg, make each value
in the pool include the timestamp of its oldest element, then the oldest
value can be found and modified, rather than rebuilding the whole Map.

But, for pools of a few hundred items, this should be fine. It's O(n*n log n)
or so.

Also, when more than 1 connection with the same pool key exists,
it's efficient even for larger pools, since removeOldestProxyConnectionPool
is not needed.

The default of 1 idle connection could perhaps be larger.. like the
number of jobs? Otoh, it seems good to ramp up and down the number of
connections, which does happen. With 1, there is at most one stale
connection, which might cause a request to fail.
2024-07-26 17:03:31 -04:00
yarikoptic
de90a2c5de initial report on keeping association with the remote 2024-07-26 20:01:23 +00:00
Joey Hess
ad025b8e5e
clean up protocol version for proxying
The proxy always checks the protocol version of a remote before talking
to it in a version-specific way, so the protocol version in the ProxyParams
is the client's protocol version. The remote will always be at the same or
an older protocol version than the client.

Note that in relayDATAFinish, when the client is at protocol version 0,
the remote must thus be as well, and that's why its version is not
checked in the case for that.

With that clarified, it's evident that, in P2P.Http.State, there's no
need to look at the proxied remote's protocol version at all.
2024-07-26 13:49:05 -04:00
Joey Hess
576ec6ed71
fix hang in GET from http p2p proxy
serverP2PConnection = proxyfromclientconn causes serveGet to
signalFullyConsumedByteString to it, which is what it's waiting for
2024-07-26 12:51:00 -04:00
Joey Hess
f052091558
update 2024-07-26 11:01:45 -04:00
Joey Hess
cc1da2d516
http p2p proxy is now largely working 2024-07-26 10:44:10 -04:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
17f7912dd5 Added a comment 2024-07-26 11:42:49 +00:00
Joey Hess
96ad0ccc5b
wip 2024-07-25 15:39:57 -04:00
Joey Hess
b13c2407af
p2phttp drop supports checking proof timestamps
At this point the p2phttp implementation is fully complete!
2024-07-25 10:11:09 -04:00
Joey Hess
6a3f755bfa
add common parameters to generic get API
Honestly this was just done to make the documentation correct. There's
no point in using these parameters. And they're optional.
2024-07-24 20:55:58 -04:00
Joey Hess
f5624a69e3
expire lock after 10 minutes initially
Once keeplocked is called, the lock will expire at the end
of that call. But if keeplocked never gets called, this avoids
the lock persisting forever.
2024-07-24 14:25:40 -04:00
Joey Hess
97836aafba
Remote.Git lockContent works with annex+http urls 2024-07-24 13:42:57 -04:00
Joey Hess
9fa9678585
Remote.Git removeKey works with annex+http urls
Does not yet handle drop proof lock timestamp checking.
2024-07-24 12:33:26 -04:00
Joey Hess
fd3bdb2300
update 2024-07-24 12:19:53 -04:00
Joey Hess
0d81d1ee2f
update 2024-07-24 12:18:51 -04:00
Joey Hess
cfdb80cd05
progress meter for p2phttp storeKey 2024-07-24 12:14:56 -04:00
Joey Hess
0280e2dd5e
update 2024-07-24 11:13:37 -04:00
Joey Hess
10f2c23fd7
fix slowloris timeout in hashing resume of download of large file
Hash the data that is already present in the file before connecting to
the http server.
2024-07-24 11:03:59 -04:00
Joey Hess
7bd616e169
Remote.Git retrieveKeyFile works with annex+http urls
This includes a bugfix to serveGet, it hung at the end.
2024-07-24 10:28:44 -04:00
Joey Hess
b4d749cc91
Merge branch 'master' into httpproto 2024-07-23 21:17:06 -04:00
Joey Hess
f7404a64c0
Propagate --force to git-annex transferrer
And other child processes.
2024-07-23 21:16:56 -04:00
Joey Hess
7d4045277a
bug 2024-07-23 21:02:31 -04:00
Joey Hess
48657405c6
cache credentials for p2phttp in memory 2024-07-23 18:45:02 -04:00
Joey Hess
b89c784a9b
use git credential when p2phttp needs auth 2024-07-23 18:11:15 -04:00
Joey Hess
73ffb58456
p2phttp support https 2024-07-23 15:37:36 -04:00
Joey Hess
b7149e897b
add --bind option and listen to both ipv4 and ipv6 by default 2024-07-23 15:19:56 -04:00
Joey Hess
b7454f1eeb
protocol version fallback on 404
and prettified errors
2024-07-23 14:58:49 -04:00
Joey Hess
2aa9154b1f
require a valid uuid at the end of an annex+http url 2024-07-23 12:30:27 -04:00
Joey Hess
75b1d50b99
add remoteAnnexP2PHttpUrl to RemoveGitConfig
This is always parsed, when building without servant, a Baseurl is not
generated, and users of it will need to fail.
2024-07-23 09:57:01 -04:00
Joey Hess
a6a03ca586
annex+http urls 2024-07-23 08:42:33 -04:00
Joey Hess
758cff0fde
update 2024-07-22 20:59:45 -04:00
Joey Hess
06de2ad972
change default port to 9417
Port 80 would need root, not a good idea, so pick something that might
work by default.

9418 is git protocol's port. 9419 is used by something, but nothing
known uses 9417, so it's as good a default as any.
2024-07-22 20:52:17 -04:00
Joey Hess
9984252ab5
P2P protocol is finalized 2024-07-22 19:50:08 -04:00
Joey Hess
e979e85bff
make serveKeepLocked check auth just to be safe 2024-07-22 19:15:52 -04:00
Joey Hess
f5dd7a8bc0
implemented serveLockContent (untested) 2024-07-22 17:38:42 -04:00
Joey Hess
b697c6b9da
fix TMVar left full crash affecting servePutOffset
Problem is that whatever is reading from the TMVar may not have read
from it yet before the client writes the next thing to it.
2024-07-22 15:48:46 -04:00
Joey Hess
3069e28dd8
implemented servePutOffset and clientPutOffset
But, it's buggy: the server hangs without processing the VALIDITY,
and I can't seem to work out why. As far as I can see, storefile
is getting as far as running the validitycheck, which is supposed to
read that, but never does.

This is especially strange because what seems like the same protocol
doesn't hang when servePut runs it. This made me think that it needed
to use inAnnexWorker to be more like servePut, but that didn't help.

Another small problem with this is that it does create an empty
.git/annex/tmp/ file for the key. Since this will usually be used in
combination with servePut, that doesn't seem worth worrying about much.
2024-07-22 15:04:10 -04:00
Joey Hess
b240a11b79
clientPut seeking to offset 2024-07-22 12:50:21 -04:00
Joey Hess
a01426b713
avoid padding in servePut
This means that when the client sends a truncated data to indicate
invalidity, DATA is not passed the full expected data. That leaves the
P2P connection in a state where it cannot be reused. While so far, they
are not reused, they will be later when proxies are supported. So, have
to close the P2P connection in this situation.
2024-07-22 12:30:30 -04:00
Joey Hess
efa0efdc44
avoid padding in clientPut
Instead truncate when necessary to indicate invalid content was sent.
Very similar to how serveGet handles it.
2024-07-22 11:47:24 -04:00
Joey Hess
72d0769ca5
avoid padding content in serveGet
Always truncate instead. The padding risked something not noticing the
content was bad and getting a file that was corrupted in a novel way
with the padding "X" at the end. A truncated file is better.
2024-07-22 11:19:52 -04:00
Joey Hess
4826a3745d
servePut and clientPut implementation
Made the data-length header required even for v0. This simplifies the
implementation, and doesn't preclude extra verification being done for
v0.

The connectionWaitVar is an ugly hack. In servePut, nothing waits
on the waitvar, and I could not find a good way to make anything wait on
it.
2024-07-22 10:27:44 -04:00
adehnert
8eadd02b52 Added a comment: git-annex for managing music 2024-07-21 19:08:45 +00:00
adehnert
12bc3ca2a7 2024-07-21 18:17:25 +00:00
adehnert
2b96f62ada 2024-07-21 18:17:11 +00:00
adehnert
264366f45d 2024-07-21 18:15:11 +00:00
adehnert
024b331a4b 2024-07-21 18:14:28 +00:00
adehnert
40c930a381 2024-07-21 18:14:03 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
1ee30b29ee Added a comment 2024-07-21 12:38:12 +00:00
adehnert
b143cb686d Added a comment: git annex sync --ff-only 2024-07-21 01:04:44 +00:00
nobodyinperson
b920655acd Added a comment: Also Serveo.net 2024-07-19 15:21:19 +00:00
kdm9
8a7fc275cb Added a comment 2024-07-19 13:11:05 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
2878343354 2024-07-19 12:12:56 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
a2ab2f70ea Added a comment 2024-07-19 08:26:31 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
b1db5115e0 Added a comment 2024-07-17 14:07:32 +00:00
yarikoptic
ba4d545776 reporting FTBFS on windows 2024-07-16 15:58:50 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
1287e4590f 2024-07-16 15:42:54 +00:00
mih
5bc00a55dd 2024-07-16 15:02:46 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
3590a17f9e Added a comment 2024-07-16 09:21:54 +00:00
nobodyinperson
a79176341d Added a comment 2024-07-15 18:32:36 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
cce86415b1 Added a comment 2024-07-15 15:18:28 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92
8ec34ea2a1 2024-07-15 14:57:26 +00:00
xentac
8102bf5fe2 Added a comment 2024-07-12 23:49:33 +00:00
xentac
df97c82c44 2024-07-12 23:37:36 +00:00
ashton@37fa3fec6d2eef022a3491c85362a34141fbf0db
af4d90eea8 2024-07-12 08:11:56 +00:00
ashton@37fa3fec6d2eef022a3491c85362a34141fbf0db
60eae008d8 2024-07-12 08:11:30 +00:00
ashton@37fa3fec6d2eef022a3491c85362a34141fbf0db
ed11ce6fcb 2024-07-12 08:08:22 +00:00
Joey Hess
eb4fb388bd
only base64 non-utf8 2024-07-11 15:47:16 -04:00
Joey Hess
97a2d0e4fb
use worker pool in withLocalP2PConnections
This allows multiple clients to be handled at the same time.
2024-07-11 14:37:52 -04:00
Joey Hess
68227154fb
switch HTTP P2P protocol to base64url
Base64 can include '/', and with UUIDs and keys both used in routes,
the encoding needs to avoid that. Use base64url everywhere in the HTTP
protocol for consistency.
2024-07-11 12:31:41 -04:00
Joey Hess
14e0f778b7
simplify 2024-07-11 11:50:44 -04:00
Joey Hess
2228d56db3
serveGet invalidation 2024-07-11 11:42:32 -04:00
Joey Hess
a7383b5c59
move serveruuid into routes
In particular the generic get route needs it, so that when a single http
server is serving multiple repositories, it knows what repository to
use.
2024-07-11 11:19:20 -04:00
Joey Hess
3b37b9e53f
fix serveGet hang
This came down to SendBytes waiting on the waitv. Nothing ever filled
it.

Only Annex.Proxy needs the waitv, and it handles filling it. So make it
optional.
2024-07-11 07:46:52 -04:00
benjamin.poldrack@d09ccff6d42dd20277610b59867cf7462927b8e3
a82a573f75 2024-07-11 07:47:27 +00:00
benjamin.poldrack@d09ccff6d42dd20277610b59867cf7462927b8e3
9ce207532e 2024-07-11 07:23:30 +00:00
Joey Hess
8cb1332407
update 2024-07-10 16:10:08 -04:00
Joey Hess
f9b7ce7224
add Annex worker pool to P2PHttp
This will be needed for get and store, since those need to run Annex
actions.

withLocalP2PConnections will also probably use it.
2024-07-10 12:19:47 -04:00
Joey Hess
7c588a5791
implement remove-before
The reason to use removeBeforeRemoteEndTime is twofold.

First, removeBefore sends two protocol commands. Currently, the HTTP
protocol runner only supports sending a single command per invocation.

Secondly, the http server gets a monotonic timestamp from the client. So
translating back to a POSIXTime would be annoying.

The timestamp flow with a proxy will be:

- client gets timestamp, which gets the monotonic timestamp from the
  proxied remote via the proxy. The timestamp is currently not
  proxied when there is a single proxy.
- client calls remove-before
- http server calls removeBeforeRemoteEndTime which sends REMOVE-BEFORE
  to the proxied remote.
2024-07-10 10:03:26 -04:00
Joey Hess
48f76cb3e8
implement serveRemove and send WWW-Authenticate header on auth failure 2024-07-10 09:13:01 -04:00
Joey Hess
97d0fc9b65
git-annex p2phttp options 2024-07-10 00:01:55 -04:00
Joey Hess
6a8a4d1775
authentication is implemented
just need to make Command.P2PHttp generate a GetServerMode from options
2024-07-09 20:54:47 -04:00
Joey Hess
08371c3745
started on auth 2024-07-09 17:30:55 -04:00
Joey Hess
b5b3d8cde2
update 2024-07-09 14:30:50 -04:00
Joey Hess
a3dd8b4bcb
capture API version in routes
Needed so the client can send it.
2024-07-09 12:04:29 -04:00
Joey Hess
751b8e0baf
implemented serveCheckPresent
Still need a way to run Proto though
2024-07-09 09:08:42 -04:00
yarikoptic
fade907c6a initial report from boox installation 2024-07-09 02:44:51 +00:00
Joey Hess
3f402a20a8
implement Locker 2024-07-08 21:00:10 -04:00
Joey Hess
b758b01692
add lockids to http p2p protocol 2024-07-08 20:18:55 -04:00
Joey Hess
69c4f07ab0
finish get API 2024-07-08 13:27:50 -04:00
Joey Hess
82d66ede5e
convert lockcontent api to http long polling
Websockets would work, but the problem with using them for this is that
each lockcontent call is a separate websocket connection. And that's an
actual TCP connection. One TCP connection per file dropped would be too
expensive. With http long polling, regular http pipelining can be used,
so it will reuse a TCP connection.

Unfortunately, at least with servant, bi-directional streams with long
polling don't result in true bidirectional full duplex communication.
Servant processes the whole client body stream before generating the server
body stream. I think it's entirely possible to do full bi-directional
communication over http, but it would need changes to servant.

And, there's no way for the client to tell if the server successfully
locked the content, since the server will keep processing the client
stream no matter what.:

So, added a new api endpoint, keeplocked. lockcontent will lock the key
for 10 minutes with retention lock, and then a call to keeplocked will
keep it locked for as long as needed. This does mean that there will
need to be a Map of locks by key, and I will probably want to add
some kind of lock identifier that lockcontent returns.
2024-07-08 12:57:46 -04:00
Joey Hess
838169ee86
status 2024-07-07 16:16:11 -04:00
Joey Hess
1dbb5ec70d
servant API type is complete 2024-07-07 12:59:12 -04:00
Joey Hess
4133063ab1
Merge branch 'master' into httpproto 2024-07-07 12:08:24 -04:00
Joey Hess
86ce3bf1e4
started servant implementation of HTTP P2P protocol 2024-07-07 12:08:10 -04:00
Joey Hess
9595f77584
Merge branch 'master' of ssh://git-annex.branchable.com 2024-07-05 15:37:43 -04:00
Joey Hess
40306d3fcf
finalizing HTTP P2p protocol some more
Added v2-v0 endpoints. These are tedious, but will be needed in order to
use the HTTP protocol to proxy to repositories with older git-annex,
where git-annex-shell will be speaking an older version of the protocol.

Changed GET to use 422 when the content is not present. 404 is needed to
detect when a protocol version is not supported.
2024-07-05 15:34:58 -04:00
Joey Hess
2fb3ef4d41
finalizing HTTP P2P protocol
Managed to avoid netstrings. Actually, using netstrings while streaming
lazy ByteString turns out to be very difficult. So instead, have a
header that specifies the expected amount of data, and then it can just
arrange to send a different amount of data if it needs to indicate
INVALID.

Also improved the interface for GET of a key.
2024-07-05 15:03:51 -04:00
Joey Hess
5e564947d7
use netstrings for framing binary data with json at the end
This will be easy to implement with servant. It's also very efficient,
and fairly future-proof. Eg, could add another frame with other data.

This does make it a bit harder to use this protocol, but netstrings
probably take about 5 minutes to implement? Let's see...

import Text.Read
import Data.List

toNetString :: String -> String
toNetString s = show (length s) ++ ":" ++ s ++ ","

nextNetString :: String -> Maybe (String, String)
nextNetString s = case break (== ':') s of
        ([], _) -> Nothing
        (sn, rest) -> do
                n <- readMaybe sn
                let (v, rest') = splitAt n (drop 1 rest)
                return (v, drop 1 rest')

Ok, well, that took about 10 minutes ;-)
2024-07-05 11:53:03 -04:00
Joey Hess
95ba4d4480
thoughts on CGI, and use json 2024-07-05 10:08:43 -04:00
git-annex@4a0625db6ced1ac00744697d5bac41393bcde646
81c9808cfa Added a comment 2024-07-05 10:22:46 +00:00
Joey Hess
3f9569e27f
update 2024-07-04 15:26:05 -04:00
Joey Hess
2ca51fe947
Merge branch 'master' of ssh://git-annex.branchable.com 2024-07-04 15:18:17 -04:00