git-annex

Author	SHA1	Message	Date
lucas.gautheron@f2b5c93a64b028c1ec8698b9c2412ed51ff22040	925c203c09		2024-09-02 15:08:25 +00:00
Joey Hess	9d29b99ac4	add news item for git-annex 10.20240831	2024-08-31 19:50:36 -04:00
Joey Hess	698d9252a5	mention sizebalanced as well as balanced	2024-08-30 12:06:45 -04:00
Joey Hess	53b7375cc6	update	2024-08-30 11:14:45 -04:00
Joey Hess	54b6151412	document using balanced preferred content in a cluster	2024-08-30 11:08:32 -04:00
Joey Hess	d0938d730b	Merge branch 'master' into balanced	2024-08-30 11:01:39 -04:00
Joey Hess	242c525659	lookupkey: Allow using --ref in a bare repository.	2024-08-30 10:55:48 -04:00
yarikoptic	e2b7895cbc	Added a comment	2024-08-29 18:35:47 +00:00
Joey Hess	f89a1b8216	remove stale live changes from reposize database Reorganized the reposize database directory, and split up a column. checkStaleSizeChanges needs to run before needLiveUpdate, otherwise the process won't be holding a lock on its pid file, and another process could go in and expire the live update it records. It just so happens that they do get called in the correct order, since checking balanced preferred content calls getLiveRepoSizes before needLiveUpdate. The 1 minute delay between checks is arbitrary, but will avoid excess work. The downside of it is that, if a process is dropping a file and gets interrupted, for 1 minute another process can expect a repository will soon be smaller than it is. And so a process might send data to a repository when a file is not really going to be dropped from it. But note that can already happen if a drop takes some time in eg locking and then fails. So it seems possible that live updates should only be allowed to increase, rather than decrease the size of a repository.	2024-08-28 13:57:25 -04:00
Joey Hess	278adbb726	combine 2 queries	2024-08-28 11:00:59 -04:00
Joey Hess	e006acef22	avoid reposize database locking overhead when not needed Only when the preferred content expression being matched uses balanced preferred content is this overhead needed. It might be possible to eliminate the locking entirely. Eg, check the live changes before and after the action and re-run if they are not stable. For now, this is good enough, it avoids existing preferred content getting slow. If balanced preferred content turns out to be too slow to check, that could be tried later.	2024-08-28 10:52:34 -04:00
matrss	833150fd25	Added a comment	2024-08-28 14:11:36 +00:00
mih	16f9042046	Added a comment: Needed to retrieve single file metadata from bare repo	2024-08-28 13:58:30 +00:00
matrss	3f62116d64	Added a comment	2024-08-28 08:47:33 +00:00
Joey Hess	0a119184e6	thoughts	2024-08-27 14:59:13 -04:00
Joey Hess	8555fb88ef	locking in checkLiveUpdate This makes sure that two threads don't check balanced preferred content at the same time, so each thread always sees a consistent picture of what is happening. This does add a fairly expensive file level lock to every check of preferred content, in commands that use prepareLiveUpdate. It would be good to only do that when live updates are actually needed, eg when the preferred content expression uses balanced preferred content.	2024-08-27 13:12:43 -04:00
Joey Hess	4d2f95853d	closing in on finishing live reposizes Fixed successfullyFinishedLiveSizeChange to not update the rolling total when a redundant change is in RecentChanges. Made setRepoSizes clear RecentChanges that are no longer needed. It might be possible to clear those earlier, this is only a convenient point to do it. The reason it's safe to clear RecentChanges here is that, in order for a live update to call successfullyFinishedLiveSizeChange, a change must be made to a location log. If a RecentChange gets cleared, and just after that a new live update is started, making the same change, the location log has already been changed (since the RecentChange exists), and so when the live update succeeds, it won't call successfullyFinishedLiveSizeChange. The reason it doesn't clear RecentChanges when there is a redundant live update is because I didn't want to think through whether or not all races are avoided in that case. The rolling total in SizeChanges is never cleared. Instead, calcJournalledRepoSizes gets the initial value of it, and then getLiveRepoSizes subtracts that initial value from the current value. Since the rolling total can only be updated by updateRepoSize, which is called with the journal locked, locking the journal in calcJournalledRepoSizes ensures that the database does not change while reading the journal.	2024-08-27 12:54:46 -04:00
Spencer	949be665c0	Added contributions section to track my bugs and inquiries	2024-08-26 20:02:03 +00:00
Joey Hess	21608716bd	started work on getLiveRepoSizes Doesn't quite compile	2024-08-26 14:50:09 -04:00
Joey Hess	db89e39df6	partially fix concurrency issue in updating the rollingtotal It's possible for two processes or threads to both be doing the same operation at the same time. Eg, both dropping the same key. If one finishes and updates the rollingtotal, then the other one needs to be prevented from later updating the rollingtotal as well. And they could finish at the same time, or with some time in between. Addressed this by making updateRepoSize be called with the journal locked, and only once it's been determined that there is an actual location change to record in the log. updateRepoSize waits for the database to be updated. When there is a redundant operation, updateRepoSize won't be called, and the redundant LiveUpdate will be removed from the database on garbage collection. But: There will be a window where the redundant LiveUpdate is still visible in the db, and processes can see it, combine it with the rollingtotal, and arrive at the wrong size. This is a small window, but it still ought to be addressed. Unsure if it would always be safe to remove the redundant LiveUpdate? Consider the case where two drops and a get are all running concurrently somehow, and the order they finish is [drop, get, drop]. The second drop seems redundant to the first, but it would not be safe to remove it. While this seems unlikely, it's hard to rule out that a get and drop at different stages can both be running at the same time.	2024-08-26 09:43:32 -04:00
Joey Hess	03c7f99957	todo	2024-08-25 10:48:42 -04:00
Joey Hess	2b037d36a1	update	2024-08-24 15:06:00 -04:00
Joey Hess	6660984442	update	2024-08-24 13:15:39 -04:00
Joey Hess	d60a33fd13	improve live update starting In an expression like "balanced=foo and exclude=bar", avoid it starting a live update when the overall expression doesn't match.	2024-08-24 13:07:05 -04:00
Joey Hess	16f945459c	todo	2024-08-24 11:58:17 -04:00
Joey Hess	2f20b939b7	LiveUpdate db updates working I've tested the behavior of the thread that waits for the LiveUpdate to be finished, and it does get signaled and exit cleanly when the LiveUpdate is GCed instead. Made finishedLiveUpdate wait for the thread to finish updating the database. There is a case where GC doesn't happen in time and the database is left with a live update recorded in it. This should not be a problem as such stale data can also happen when interrupted and will need to be detected when loading the database. Balanced preferred content expressions now call startLiveUpdate.	2024-08-24 11:49:58 -04:00
Joey Hess	84d1bb746b	LiveUpdate for clusters	2024-08-24 10:20:12 -04:00
Joey Hess	18cd8bf43a	punt on LiveUpdate plumbing through assistant for now	2024-08-24 09:37:24 -04:00
yarikoptic	efdee386c0	initial report on desire to handle pathspecs	2024-08-24 01:35:31 +00:00
yarikoptic	c3877f648c	initial idea on another ability for get	2024-08-24 01:23:04 +00:00
Joey Hess	c3d40b9ec3	plumb in LiveUpdate (WIP) Each command that first checks preferred content (and/or required content) and then does something that can change the sizes of repositories needs to call prepareLiveUpdate, and plumb it through the preferred content check and the location log update. So far, only Command.Drop is done. Many other commands that don't need to do this have been updated to keep working. There may be some calls to NoLiveUpdate in places where that should be done. All will need to be double checked. Not currently in a compilable state.	2024-08-23 16:35:12 -04:00
Joey Hess	4885073377	add live size changes to RepoSize database Not yet used.	2024-08-23 12:51:00 -04:00
Joey Hess	dad1fb150f	update	2024-08-23 11:45:36 -04:00
Joey Hess	d0ab1550ec	possible design to address reposizes concurrency issues	2024-08-23 11:19:38 -04:00
gauss@055c9051f507c97fa5612f46c74ce636f5ecde10	d71ca87bc9	Added a comment: No root privileges server - annex-shell replaced by git-annex-shell	2024-08-23 01:51:49 +00:00
Joey Hess	8ade3fc5d6	improve docs	2024-08-22 08:09:10 -04:00
Joey Hess	abdd49d8c1	update	2024-08-22 07:53:56 -04:00
Joey Hess	173500872f	update	2024-08-22 07:17:04 -04:00
Joey Hess	70e2fca257	Added the annex.fullybalancedthreshhold git config.	2024-08-22 07:15:55 -04:00
Joey Hess	3fe67744b1	display new empty repos in maxsize table A new repo that has no location log info yet, but has an entry in uuid.log has 0 size, so make RepoSize aware of that. Note that a new repo that does not yet appear in uuid.log will still not be displayed. When a remote is added but not synced with yet, it has no uuid.log entry. If git-annex maxsize is used to configure that remote, it needs to appear in the maxsize table, and the change to Command.MaxSize takes care of that.	2024-08-22 07:03:22 -04:00
Spencer	acaa8e9cd5	Added a comment: Precise Workflow	2024-08-22 00:18:28 +00:00
Joey Hess	76ece2a699	make --rebalance of balanced use fullysizebalanced when useful When the specified number of copies is > 1, and some repositories are too full, it can be better to move content from them to other less full repositories, in order to make space for new content. annex.fullybalancedthreshhold is documented, but not implemented yet. This is not tested very well yet, and is known to sometimes take several runs to stabilize.	2024-08-21 17:59:08 -04:00
Joey Hess	9e87061de2	Support "sizebalanced=" and "fullysizebalanced=" too Might want to make --rebalance turn balanced=group:N where N > 1 to fullysizebalanced=group:N. Have not yet determined if that will improve situations enough to be worth the extra work.	2024-08-21 15:01:54 -04:00
Joey Hess	4e1dcc0372	bug	2024-08-21 12:18:31 -04:00
Joey Hess	476d223bce	implement fullybalanced=group:N Rebalancing this when it gets into a suboptimal situation will need further work.	2024-08-20 13:51:02 -04:00
Matthew	4a9e637d36	Added a comment: Help with .nfsXXXX files	2024-08-19 21:20:59 +00:00
matrss	9cfdae4c3b	Added a comment	2024-08-19 10:25:13 +00:00
Joey Hess	68a99a8f48	size based rebalancing design	2024-08-18 16:25:12 -04:00
Joey Hess	99514f9d18	maxsize overview display and --json support	2024-08-18 12:08:13 -04:00
xentac	74b953cded	Added a comment	2024-08-18 03:17:12 +00:00
Joey Hess	f985c58d8e	consistently don't show sizes of empty repositories This used to be the case, and when matching options are used, that code path still omits them, so also omit them in the getRepoSize code path.	2024-08-17 15:09:16 -04:00
Joey Hess	b62b58b50b	git-annex info speed up using getRepoSizes	2024-08-17 14:54:31 -04:00
Joey Hess	d09a005f2b	update RepoSize database from git-annex branch incrementally The use of catObjectStream is optimally fast. Although it might be possible to combine this with git-annex branch merge to avoid some redundant work. Benchmarking, a git-annex branch that had 100000 files changed took less than 1.88 seconds to run through this.	2024-08-17 13:35:00 -04:00
Spencer	40b49e2ddd	Added a comment: Remote Helper?	2024-08-17 05:33:01 +00:00
matrss	bcf876e3a0		2024-08-16 15:52:32 +00:00
matrss	f057010086	Added a comment	2024-08-16 15:45:45 +00:00
Joey Hess	61d95627f3	fix Annex.repoSize sharing between threads	2024-08-16 10:56:51 -04:00
Joey Hess	e361b9ea3c	todo	2024-08-15 16:15:48 -04:00
Joey Hess	63ccf6ffa7	todo	2024-08-15 13:50:50 -04:00
Joey Hess	4a0c7e2b2c	update	2024-08-15 13:41:47 -04:00
Joey Hess	a2da9c526b	RepoSize concurrency fix When loading the journalled repo sizes, make sure that the current process is prevented from making changes to the journal in another thread.	2024-08-15 13:37:41 -04:00
Joey Hess	06064f897c	update Annex.reposizes when changing location logs The live update is only needed when Annex.reposizes has already been populated.	2024-08-15 13:27:14 -04:00
Joey Hess	c376b1bd7e	show message when doing possibly expensive from scratch reposize calculation	2024-08-15 12:42:36 -04:00
Joey Hess	c200523bac	implement getRepoSizes At this point the RepoSize database is getting populated, and it all seems to be working correctly. Incremental updates still need to be done to make it performant.	2024-08-15 12:31:56 -04:00
Joey Hess	eac4e9391b	finalize RepoSize database Including locking on creation, handling of permissions errors, and setting repo sizes. I'm confident that locking is not needed while using this database. Since writes happen in a single transaction. When there are two writers that are recording sizes based on different git-annex branch commits, one will overwrite what the other one recorded. Which is fine, it's only necessary that the database stays consistent with the content of a git-annex branch commit.	2024-08-15 12:29:34 -04:00
Atemu	e8997d8899	Added a comment	2024-08-15 15:40:20 +00:00
Joey Hess	3e6eb2a58d	implement journalledRepoSizes Plan is to run this when populating Annex.reposizes on demand. So Annex.reposizes will be up-to-date with the journal, including crucially journal entries for private repositories. But also anything that has been written to the journal by another process, especially if the process was run with annex.alwayscommit=false. From there, Annex.reposizes can be kept up to date with changes made by the running process.	2024-08-14 13:53:24 -04:00
pedro-lopes-de-azevedo	c75ecc5350	Added a comment: parameter --from not accepted	2024-08-14 14:27:54 +00:00
bvaa	11eb2ae6ec	Added a comment	2024-08-14 07:18:26 +00:00
Joey Hess	90a79a6c1e	plan	2024-08-13 15:13:30 -04:00
Joey Hess	a979d8da41	update	2024-08-13 14:14:47 -04:00
Joey Hess	10d8b3cc63	fixed --rebalance stability on drop Was checking the wrong uuid, oops	2024-08-13 13:32:11 -04:00
Joey Hess	745bc5c547	take maxsize into account for balanced preferred content This is very inefficient, it will need to be optimised not to calculate the sizes of repos every time. Also, fixed a bug in balancedPicker that caused it to pick a too high index when some repos were excluded due to being full.	2024-08-13 11:00:20 -04:00
Spencer	05a62e4e5f	Added a comment: Workaround: --force-small	2024-08-13 07:05:57 +00:00
Spencer	3d252da06c	Added a comment: Exact Moment Things Go Wrong	2024-08-13 06:22:11 +00:00
Spencer	ab5f920d77	.md linting	2024-08-13 04:46:53 +00:00
Spencer	8a91a8c208		2024-08-13 04:46:10 +00:00
Spencer	c4296fbd45	Added a comment: Still a Problem (on Mac?)	2024-08-13 04:21:33 +00:00
ewen	491cf67ce2	Added a comment: Most servers upgraded to TLS v1.2 EMS / TLS v1.3	2024-08-13 00:01:05 +00:00
Joey Hess	b201792391	update	2024-08-12 18:57:03 -04:00
Joey Hess	1e799e7842	update	2024-08-12 11:56:52 -04:00
Joey Hess	71043fe9f7	update	2024-08-12 10:01:48 -04:00
Joey Hess	bcd2b9a5c4	idea	2024-08-12 09:43:14 -04:00
Joey Hess	1265d7e5df	implement maxsize log and command * maxsize: New command to tell git-annex how large the expected maximum size of a repository is. * vicfg: Include maxsize configuration.	2024-08-11 15:41:26 -04:00
Joey Hess	3019b21c40	more formal documentation of balancing	2024-08-11 13:29:06 -04:00
Joey Hess	bd5affa362	use hmac in balanced preferred content This deals with the possible security problem that someone could make an unusually low UUID and generate keys that are all constructed to hash to a number that, mod the number of repositories in the group, == 0. So balanced preferred content would always put those keys in the repository with the low UUID as long as the group contains the number of repositories that the attacker anticipated. Presumably the attacker then holds the data for ransom? Dunno. Anyway, the partial solution is to use HMAC (sha256) with all the UUIDs combined together as the "secret", and the key as the "message". Now any change in the set of UUIDs in a group will invalidate the attacker's constructed keys from hashing to anything in particular. Given that there are plenty of other things someone can do if they can write to the repository -- including modifying preferred content so only their repository wants files, and numcopies so other repositories drop them -- this seems like safeguard enough. Note that, in balancedPicker, combineduuids is memoized.	2024-08-10 16:32:54 -04:00
Joey Hess	bde58e6c71	todo	2024-08-09 16:57:10 -04:00
Joey Hess	412f6057e4	todo	2024-08-09 16:47:28 -04:00
xentac	fb186ab0a8	Added a comment	2024-08-09 19:31:12 +00:00
xentac	55a5cb7904		2024-08-09 19:22:19 +00:00
Joey Hess	f1cb5cb908	wrote git-annex maxsize man page	2024-08-09 14:57:11 -04:00
Joey Hess	5a6afff3d6	left off number option	2024-08-09 14:22:05 -04:00
Joey Hess	3ce2e95a5f	balanced preferred content and --rebalance This all works fine. But it doesn't check repository sizes yet, and without repository size checking, once a repository gets full, there will be no other repository that will want its files. Use of sha2 seems unnecessary, probably adler32 or md5 or crc would have been enough. Possibly just summing up the bytes of the key mod the number of repositories would have sufficed. But sha2 is there, and probably hardware accelerated. I doubt very much there is any security benefit to using it though. If someone wants to construct a key that will be balanced onto a given repository, sha2 is certainly not going to stop them.	2024-08-09 14:16:09 -04:00
Joey Hess	152c87140b	update	2024-08-08 16:06:02 -04:00
Joey Hess	0959bfe5d3	update for exporttree=yes	2024-08-08 15:51:36 -04:00
Joey Hess	727b6a0b6d	update	2024-08-08 15:34:36 -04:00
Joey Hess	2616056cde	Merge branch 'exportreeplus'	2024-08-08 15:31:57 -04:00
Joey Hess	3b758aaad6	add news item for git-annex 10.20240808	2024-08-08 15:27:11 -04:00
Joey Hess	3ea835c7e8	proxied exporttree=yes versionedexport=yes remotes are not untrusted This removes versionedExport, which was only used by the S3 special remote. Instead, versionedexport=yes is a common way for remotes to indicate that they are versioned.	2024-08-08 15:24:19 -04:00
Joey Hess	5c36177e58	proxied exporttree=yes remotes are untrustworthy This is not perfect because it does not handle versioned special remotes, which should not be untrustworthy, but now are when proxied. The implementation turned out to be easy, because the exporttree field is a default field, so is available in RemoteConfig even for git remotes.	2024-08-08 14:43:53 -04:00
Joey Hess	b23c7f769e	update	2024-08-08 14:25:18 -04:00
Joey Hess	9663888c77	update	2024-08-08 14:05:05 -04:00
Joey Hess	a2eb3b450a	post-receive: use the exporttree=yes remote as a source This handles cases where a single key is used by multiple files in the exported tree. When using `git-annex push`, the key's content gets stored in the annexobjects location, and then when the branch is pushed, it gets renamed from the annexobjects location to the first exported file. For subsequent exported files, a copy of the content needs to be made. This causes it to download the key from the remote in order to upload another copy to it. This is not needed when using `git push` followed by `git-annex copy --to` the proxied remote, because the received key is stored at all export locations then. Also, fixed handling of the synced branch push, it was exporting master when synced/master was pushed. Note that currently, the first push to the remote does not see that it is able to get a key from it in order to upload it back. It displays "(not available)". The second push is able to. Since git-annex push pushes first the synced branch and then the branch, this does end up with a full export being made, but it is not quite right.	2024-08-08 13:49:53 -04:00
Joey Hess	7294d23d78	export: Added --from option This is similar to git-annex copy --from --to, in that it downloads a local copy, locks it for removal, uploads it, and drops it. Removal of the temporary local copy is done without verifying numcopies for the same reason as that command. I do wonder, looking at this, if there's a race where the local copy gets used as a copy to allow some other drop in the narrow window after it is downloaded and before it gets locked for removal. That would need some other repository to have an out of date location log that says the repository contains a copy of the key, in order for it to try to use it as a copy. If there is such a race, git-annex copy/move would also be vulnerable to it. It would be better to lock it for removal before starting to download it! That is possible in v10 repositories, which do use a separate content lock file. Note that, when the exported tree contains several files that use the same key, it will be downloaded repeatedly, once per time needed to upload it. It would be possible to avoid that extra work, but it would complicate this since the local copy would need to be preserved, locked for removal, until the end. Also, that would mean that interrupting the export would leave possibly a lot of temporarily downloaded keys in the local repository, while currently it can only leave one.	2024-08-08 12:08:55 -04:00
Joey Hess	01edd186e9	update proxied exporttree=yes remote on receive of sync branch Since git-annex sync sends the sync branch first, and only displays the output of the push to the sync branch, this makes git-annex post-receive's output when updating the exported tree be visible when syncing. This also makes syncing with a non-bare repository still update the exported tree, even when the checked out branch is not able to be updated. The sync branch gets sent regardless.	2024-08-07 13:11:06 -04:00
Joey Hess	55adbb6694	avoid trying to export tree to proxied exporttree=yes remotes This avoids a lot of ugly messages when syncing with such a remote. The export tree happens on the proxy side.	2024-08-07 13:00:19 -04:00
Joey Hess	6d96734128	updateproxy, updatecluster check annexobjects=yes updateproxy, updatecluster: Prevent using an exporttree=yes special remote that does not have annexobjects=yes, since it will not work.	2024-08-07 12:27:24 -04:00
Joey Hess	8864a9e353	update	2024-08-07 11:49:53 -04:00
Joey Hess	1e0f13ad7f	comment	2024-08-07 11:39:29 -04:00
Joey Hess	b8f8c38e88	Merge branch 'master' into exportreeplus	2024-08-07 11:28:21 -04:00
Joey Hess	509b23fa00	catch ClientError from withClientM When getting from a P2P HTTP remote, prompt for credentials when required, instead of failing. This feels like it might be a bug in servant-client. withClientM's type suggests it would not throw a ClientError. But it does in this case.	2024-08-07 11:24:34 -04:00
Joey Hess	43e1f590c9	comment	2024-08-07 10:47:47 -04:00
Joey Hess	1038567881	proxy stores received keys to known export locations This handles the workflow where the branch is first pushed to the proxy, and then files in the exported tree are later copied to the proxied remote. Turns out that the way the export log is structured, nothing needs to be done to finalize the export once the last key is sent to it. Which is great because that would have been a lot of complication. On receiving the push, Command.Export runs and calls recordExportBeginning, does as much as it can to update the export with the files currently on it, and then calls recordExportUnderway. At that point, the export.log records the export as "complete", but it's not really. And that's fine. The same happens when using `git-annex export` when some files are not available to send. Other repositories that have access to the special remote can already retrieve files from it. As the missing files get copied to the exported remote, all that needs to be done is record each in the export db. At this point, proxying to exporttree=yes annexobjects=yes special remotes is fully working. Except for in the case where multiple files in the tree use the same key, and the files are sent to the proxied remote before pushing the tree. It seems that even special remotes without annexobjects=yes will work if used with the workflow where the git-annex branch is pushed before copying files. But not with the `git-annex push` workflow.	2024-08-07 09:47:34 -04:00
matrss	3ccbcc5662		2024-08-07 12:12:29 +00:00
git-annex@82b5fddc759dffdf749b19add6f0be2a0c78b62c	d3cc84db3b		2024-08-07 12:05:53 +00:00
git-annex@82b5fddc759dffdf749b19add6f0be2a0c78b62c	e8f60e7daa		2024-08-07 12:04:42 +00:00
Joey Hess	ba1cb517c0	update	2024-08-06 14:46:56 -04:00
Joey Hess	c53f61e93f	Merge branch 'master' into exportreeplus	2024-08-06 14:46:33 -04:00
Joey Hess	f01d872059	fixed	2024-08-06 14:42:46 -04:00
Joey Hess	3289b1ad02	proxying to exporttree=yes annexobjects=yes basically working It works when using git-annex sync/push/assist, or when manually sending all content to the proxied remote before pushing to the proxy remote. But when the push comes before the content is sent, sending content does not update the exported tree.	2024-08-06 14:21:23 -04:00
Joey Hess	be5c86c248	refine	2024-08-06 12:15:18 -04:00
Joey Hess	4750ffbd3b	finalized design for proxying to exporttree=yes annexobjects=yes special remotes	2024-08-06 11:45:45 -04:00
Joey Hess	84d27cf34f	update	2024-08-06 11:13:51 -04:00
matrss	6d1592f857		2024-08-06 12:44:18 +00:00
Spencer	66ff2bc833	Added a comment: D: Correct	2024-08-05 22:17:55 +00:00
Joey Hess	a535eaa176	rename from annexobjects location on export (When possible, of course it may not be there, or it may get renamed from there for another exported file first. Or the remote may not support renames.) This avoids redundant uploads. An example case where this is important: Proxying to an exporttree remote, a file is uploaded to it but is not yet in an exported tree. When the exported tree is pushed, the remote needs to be updated by exporting to it. In this case, the proxy doesn't have a copy of the file, so it would need to download it from annexobjects before uploading it to the final location. With this optimisation, it can just rename it. However: If a key is used twice in an exported tree, it seems a proxy will need to download and reupload anyway. Unless a copy operation is added to exporttree remotes..	2024-08-04 12:19:10 -04:00
Joey Hess	a3d96474f2	rename to annexobjects location on unexport This avoids needing to re-upload the file again to get it to the annexobjects location, which git-annex sync was doing when it was preferred content. If the file is not preferred content, sync will drop it from the annexobjects location. If the file has been deleted from the tree, it will remain in the annexobjects location until an unused/dropunused pass is done.	2024-08-04 11:58:07 -04:00
Joey Hess	6b63449133	update Decided not to use the annexobjects location for exportTempName. There doesn't seem to be any actual benefit to doing that, because an export that renames to exportTempName always renames it back from that to another location. Also the annexobjects directory won't actually help with the paired rename issue.	2024-08-04 11:34:00 -04:00
Joey Hess	ee076b68f5	strong verification on retrieval from annexobjects location The file in the annexobjects location may have been renamed from a previously exported file that got deleted in a subsequent export. Or it may be renamed to annexobjects temporarily before being renamed to another name (to handle eg pairwise renames). But, an exported file is not guaranteed to contain the content of the key that the local repository last exported there. Another tree could have been exported from elsewhere in the meantime. So, files in annexobjects do not necessarily have the content of their key. And so have to be strongly verified when retrieving. The same as is done when retrieving exported files.	2024-08-04 11:24:21 -04:00
Joey Hess	fe01a1e7e1	design work on annexobjects remotes	2024-08-03 19:51:03 -04:00
Joey Hess	a4a06404d4	sync --content with annexobjects=true exporttree remotes	2024-08-03 11:39:23 -04:00
Joey Hess	9497bf7fdb	update	2024-08-02 18:50:57 -04:00
Joey Hess	9da2860812	Merge branch 'master' into exportreeplus	2024-08-02 18:45:44 -04:00
Joey Hess	c4352adf6a	in unexport, check for annexobjects presence before updating location log The key may still be in the annexobjects location.	2024-08-02 18:43:10 -04:00
Joey Hess	34c10d082d	status	2024-08-02 14:15:05 -04:00
Joey Hess	83fa76733f	status	2024-08-02 14:10:34 -04:00
Joey Hess	28b29f63dc	initial support for annexobjects=yes Works but some commands may need changes to support special remotes configured this way.	2024-08-02 14:07:45 -04:00
Spencer	cb192eafed	removed	2024-08-02 04:37:11 +00:00
Spencer	bec9df965a	Added a comment: Necro	2024-08-02 04:10:32 +00:00
Spencer	38de446248	Added a comment: Necro	2024-08-02 04:10:14 +00:00
dmcardle	94e34ca139		2024-08-01 14:25:18 +00:00
dmcardle	b6810e6fee	Added a comment	2024-08-01 14:23:41 +00:00
d@403a635aa8eaa8bfa8613acb6a375d9e06ed7001	9e0044e990		2024-08-01 14:19:25 +00:00
d@403a635aa8eaa8bfa8613acb6a375d9e06ed7001	9728762a2c	Added a comment	2024-08-01 13:49:41 +00:00
Spencer	dc1f707875	Added a comment: @joey	2024-07-31 20:10:06 +00:00
Joey Hess	3a1f39fbdf	Avoid loading cluster log at startup This fixes a problem with datalad's test suite, where loading the cluster log happened to cause the git-annex branch commits to take a different shape, with an additional commit. It's also faster though, since many commands don't need the cluster log. Just fill Annex.clusters with a thunk. Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project	2024-07-31 15:54:14 -04:00
Joey Hess	ffba57c9fc	cleanup comments on removed news post	2024-07-31 14:05:48 -04:00
Joey Hess	0403603483	add news item for git-annex 10.20240731	2024-07-31 14:05:11 -04:00
Joey Hess	f914ee61e3	analysis	2024-07-31 12:19:12 -04:00
Joey Hess	1e8208457f	pinged	2024-07-31 10:06:43 -04:00
Joey Hess	9e901d326d	comment	2024-07-31 10:04:08 -04:00
Spencer	6af48665e0	[Bug] Trust but Verify: RClone	2024-07-31 00:34:01 +00:00
Joey Hess	d52fd3cf83	update	2024-07-30 12:17:05 -04:00
Joey Hess	1500a9525d	todo	2024-07-30 11:58:44 -04:00
Joey Hess	1632beaf70	fix negative DATA when 1 node of a cluster has a partial transfer	2024-07-30 11:42:17 -04:00
Joey Hess	1560e0eee9	comment	2024-07-30 10:50:13 -04:00
Joey Hess	73703d1bef	close	2024-07-29 15:15:40 -04:00
Joey Hess	fcc052bed8	When proxying an upload to a special remote, verify the hash. While usually uploading to a special remote does not verify the content, the content in a repository is assumed to be valid, and there is no trust boundary. But with a proxied special remote, there may be users who are allowed to store objects, but are not really trusted. Another way to look at this is it's the equivalent of git-annex-shell checking the hash of received data, which it does (see StoreContent implementation).	2024-07-29 13:40:51 -04:00
Joey Hess	380af6ac5f	update github badges Seems the urls changed and the old ones will be falsely green forever. Found new ones in readme at https://github.com/datalad/git-annex	2024-07-29 13:00:00 -04:00
Joey Hess	b4eb6e3ced	comment	2024-07-29 11:59:33 -04:00
Joey Hess	321e2adf66	don't think I ever implemented the 422 idea, it will 404	2024-07-29 11:49:40 -04:00
Joey Hess	d3f584fcdb	wording	2024-07-29 11:44:44 -04:00
Joey Hess	5f5c29fbe7	link	2024-07-29 11:43:30 -04:00
Joey Hess	f3b207a4b9	wording	2024-07-29 11:37:13 -04:00
Joey Hess	6068379e80	typo	2024-07-29 11:34:46 -04:00
Joey Hess	db66612b8f	Merge branch 'httpproto'	2024-07-29 11:33:39 -04:00
Joey Hess	74f81ebd04	Merge remote-tracking branch 'origin/httpproto'	2024-07-29 11:25:27 -04:00
Joey Hess	6f20085a60	update	2024-07-29 11:25:07 -04:00
Joey Hess	60b1c53df5	preparing to merge	2024-07-29 11:22:27 -04:00
Joey Hess	4f3ae96666	cleanly close proxy connection on interrupted PUT An interrupted PUT to cluster that has a node that is a special remote over http left open the connection to the cluster, so the next request opens another one. So did an interrupted PUT directly to the proxied special remote over http. proxySpecialRemote was stuck waiting for all the DATA. Its connection remained open so it kept waiting. In servePut, checktooshort handles closing the P2P connection when too short a data is received from PUT. But, checktooshort was only called after the protoaction, which is what runs the proxy, which is what was getting stuck. Modified it to run as a background thread, which waits for the tooshortv to be written to, which gather always does once it gets to the end of the data received from the http client. That makes proxyConnection's releaseconn run once all data is received from the http client. Made it close the connection handles before waiting on the asyncworker thread. This lets proxySpecialRemote finish processing any data from the handle, and then it will give up, more or less cleanly, if it didn't receive enough data. I say "more or less cleanly" because with both sides of the P2P connection taken down, some protocol unhappiness results. Which can lead to some ugly debug messages. But also can cause the asyncworker thread to throw an exception. So made withP2PConnections not crash when it receives an exception from releaseconn. This did have a small change to the behavior of an interrupted PUT when proxying to a regular remote. proxyConnection has a protoerrorhandler that closes the proxy connection on a protocol error. But the proxy connection is also closed by checktooshort when it closes the P2P connection. Closing the same proxy connection twice is not a problem, it just results in duplicated debug messages about it.	2024-07-29 10:37:19 -04:00
Joey Hess	c8e7231f48	add debugging of opening and closing connections to proxies	2024-07-29 09:52:26 -04:00
Joey Hess	7ac8d36f38	idea	2024-07-29 09:11:27 -04:00
stv0g	6352cebb92	Added a comment: importtree=yes Support	2024-07-29 06:50:01 +00:00
Joey Hess	cd89f91aa5	remove uuid from annex+http urls Not needed it turns out.	2024-07-28 20:29:42 -04:00
Joey Hess	bc9cc79e85	set remote's annexUrl automatically When the remote repository's git config file has annex.url set to an annex+http url.	2024-07-28 20:13:41 -04:00
Joey Hess	c87cfe1e00	todo	2024-07-28 17:29:32 -04:00
Joey Hess	ccbdaf0448	documentation for p2phttp	2024-07-28 17:19:27 -04:00
Joey Hess	dfe65b92c8	avoid repeatedly parsing the proxy log	2024-07-28 16:04:20 -04:00
Joey Hess	2fdec6b4e1	update	2024-07-28 15:55:24 -04:00
Joey Hess	ddabc138ec	todo	2024-07-28 15:41:31 -04:00
Joey Hess	cdc4bd7443	fix hang in PUT of large file to a special remote node of a cluster over http	2024-07-28 15:34:59 -04:00
Joey Hess	66679c9bb4	remove temp file after upload to special remote	2024-07-28 14:36:45 -04:00
Joey Hess	9461793ffc	Merge remote-tracking branch 'origin/master' into httpproto	2024-07-28 14:24:15 -04:00
Joey Hess	ccd102cd19	update	2024-07-28 14:22:44 -04:00
Joey Hess	5e205f215d	clean shut down of cluster connection when PUT is interrupted An interrupted `git-annex copy --to` a cluster via the http server, when repeated, failed. The http server output "transfer already in progress, or unable to take transfer lock". Apparently a second connection was opened to the cluster, because the first connection never got shut down. Turned out the problem was that when proxying to a cluster, it would read a short ByteString from the client, and send that to the nodes. But that left the nodes wanting more. Meanwhile, the proxy was expecting a SUCCESS/FAILURE message from the nodes. So it didn't return, and so the cluster connection stayed open.	2024-07-28 14:20:11 -04:00
Joey Hess	bdde6d829c	fix http proxying for a local git remote with a relative path git-annex-shell expects an absolute path	2024-07-28 13:35:51 -04:00
Joey Hess	41667ad36b	found some bugs with clusters	2024-07-28 13:00:05 -04:00
Joey Hess	770aac97a7	share single BranchState among all threads This fixes a problem when git-annex testremote is run against a cluster accessed via the http server. Annex.Cluster uses the location log to find nodes that contain a key when checking if the key is present or getting it. Just after a key was stored to a cluster node, reading the location log was not getting the UUID of that node. Apparently the Annex action that wrote to the location log, and the one that read from it were run with two different Annex states. The http server does use several different Annex threads. BranchState was part of the AnnexState, and so two threads could have different BranchStates. Moved BranchState to the AnnexRead, so all threads will see the common state. This might possibly impact performance. If one thread is writing changes to the branch, and another thread is reading from the branch, the writing thread will now invalidate the BranchState's cache, which will cause the reading thread to need to do extra work. But correctness is surely more important. If this is found to have impacted performance, it could probably be dealt with by doing smarter BranchState cache invalidation. Another way this might impact performance is that the BranchState has a small cache. If several threads were reading from the branch and relying on the value they just read still being in the cache, now a cache miss will be more likely. Increasing the BranchState cache to the number of jobs might be a good idea to ameliorate that. But the cache is currently an inefficient list, so making it large would need changes to the data types. (Commit `4304f1b6ae` dealt with a follow-on effect of the bug fixed here.)	2024-07-28 12:30:27 -04:00
Joey Hess	fbbedae497	add --clusterjobs option and default to 1 The default of 1 is not ideal at all, but it avoids an accidental M*N causing so much concurrency it becomes unusable.	2024-07-28 10:36:22 -04:00
Joey Hess	1259ad89b6	cluster support in http API server Wired it up and it seems to basically work, although the test suite is not fully passing. Note that --jobs currently gets multiplied by the number of nodes in the cluster, which is probably not good.	2024-07-28 10:17:29 -04:00
Joey Hess	0cdd418407	tested shutdown of connection to http proxied special remote I had worried it might not work properly, but it does, the endv works.	2024-07-28 09:17:47 -04:00
Joey Hess	ef8f24f28c	fix PUT to http proxied special remote It was hanging because it never sent FAILURE in the INVALID case. And putoffset always triggers the INVALID case.	2024-07-28 09:14:42 -04:00
Joey Hess	0ea645944e	thoughts on exporttree	2024-07-27 19:59:54 -04:00
Joey Hess	1c0448e33c	update	2024-07-26 20:44:01 -04:00
Joey Hess	0fb86d2916	UNLOCKCONTENT is not a top-level request proxyRequest was treating UNLOCKCONTENT as a separate request. That made it possible for there to be two different connections to the proxied remote, with LOCKCONTENT being sent to one, and UNLOCKCONTENT to the other one. A protocol error. git-annex testremote now passes against a http proxied remote.	2024-07-26 20:39:06 -04:00
Joey Hess	a3dab58be2	fix hang at end of PUT to proxied p2p http remote sendExactly will now be sure to evaluate the whole lazy ByteString. In this case, the lazy ByteString was exactly the right length. But, it seems that L.take caused it to not actually be fully evaluated. In servePut, this manifested as gather never being fully evaluated, which caused the hang. Very, very subtle, and horrible bug. Clearly the use of lazy ByteString (or really just laziness) is at fault, and it would be very worth moving to conduit or whatever to avoid this.	2024-07-26 19:50:15 -04:00
Joey Hess	b431201e1f	update	2024-07-26 17:15:09 -04:00
Joey Hess	d1faa13d6a	implement proxy connection pool removeOldestProxyConnectionPool will be inefficient the larger the pool is. A better data structure could be more efficient. Eg, make each value in the pool include the timestamp of its oldest element, then the oldest value can be found and modified, rather than rebuilding the whole Map. But, for pools of a few hundred items, this should be fine. It's O(n*n log n) or so. Also, when more than 1 connection with the same pool key exists, it's efficient even for larger pools, since removeOldestProxyConnectionPool is not needed. The default of 1 idle connection could perhaps be larger.. like the number of jobs? Otoh, it seems good to ramp up and down the number of connections, which does happen. With 1, there is at most one stale connection, which might cause a request to fail.	2024-07-26 17:03:31 -04:00
yarikoptic	de90a2c5de	initial report on keeping association with the remote	2024-07-26 20:01:23 +00:00
Joey Hess	ad025b8e5e	clean up protocol version for proxying The proxy always checks the protocol version of a remote before talking to it in a version-specific way, so the protocol version in the ProxyParams is the client's protocol version. The remote will always be at the same or an older protocol version than the client. Note that in relayDATAFinish, when the client is at protocol version 0, the remote must thus be as well, and that's why its version is not checked in the case for that. With that clarified, it's evident that, in P2P.Http.State, there's no need to look at the proxied remote's protocol version at all.	2024-07-26 13:49:05 -04:00
Joey Hess	576ec6ed71	fix hang in GET from http p2p proxy serverP2PConnection = proxyfromclientconn causes serveGet to signalFullyConsumedByteString to it, which is what it's waiting for	2024-07-26 12:51:00 -04:00
Joey Hess	f052091558	update	2024-07-26 11:01:45 -04:00
Joey Hess	cc1da2d516	http p2p proxy is now largely working	2024-07-26 10:44:10 -04:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	17f7912dd5	Added a comment	2024-07-26 11:42:49 +00:00
Joey Hess	96ad0ccc5b	wip	2024-07-25 15:39:57 -04:00
Joey Hess	b13c2407af	p2phttp drop supports checking proof timestamps At this point the p2phttp implementation is fully complete!	2024-07-25 10:11:09 -04:00
Joey Hess	6a3f755bfa	add common parameters to generic get API Honestly this was just done to make the documentation correct. There's no point in using these parameters. And they're optional.	2024-07-24 20:55:58 -04:00
Joey Hess	f5624a69e3	expire lock after 10 minutes initially Once keeplocked is called, the lock will expire at the end of that call. But if keeplocked never gets called, this avoids the lock persisting forever.	2024-07-24 14:25:40 -04:00
Joey Hess	97836aafba	Remote.Git lockContent works with annex+http urls	2024-07-24 13:42:57 -04:00
Joey Hess	9fa9678585	Remote.Git removeKey works with annex+http urls Does not yet handle drop proof lock timestamp checking.	2024-07-24 12:33:26 -04:00
Joey Hess	fd3bdb2300	update	2024-07-24 12:19:53 -04:00
Joey Hess	0d81d1ee2f	update	2024-07-24 12:18:51 -04:00
Joey Hess	cfdb80cd05	progress meter for p2phttp storeKey	2024-07-24 12:14:56 -04:00
Joey Hess	0280e2dd5e	update	2024-07-24 11:13:37 -04:00
Joey Hess	10f2c23fd7	fix slowloris timeout in hashing resume of download of large file Hash the data that is already present in the file before connecting to the http server.	2024-07-24 11:03:59 -04:00
Joey Hess	7bd616e169	Remote.Git retrieveKeyFile works with annex+http urls This includes a bugfix to serveGet, it hung at the end.	2024-07-24 10:28:44 -04:00
Joey Hess	b4d749cc91	Merge branch 'master' into httpproto	2024-07-23 21:17:06 -04:00
Joey Hess	f7404a64c0	Propagate --force to git-annex transferrer And other child processes.	2024-07-23 21:16:56 -04:00
Joey Hess	7d4045277a	bug	2024-07-23 21:02:31 -04:00
Joey Hess	48657405c6	cache credentials for p2phttp in memory	2024-07-23 18:45:02 -04:00
Joey Hess	b89c784a9b	use git credential when p2phttp needs auth	2024-07-23 18:11:15 -04:00
Joey Hess	73ffb58456	p2phttp support https	2024-07-23 15:37:36 -04:00
Joey Hess	b7149e897b	add --bind option and listen to both ipv4 and ipv6 by default	2024-07-23 15:19:56 -04:00
Joey Hess	b7454f1eeb	protocol version fallback on 404 and prettified errors	2024-07-23 14:58:49 -04:00
Joey Hess	2aa9154b1f	require a valid uuid at the end of an annex+http url	2024-07-23 12:30:27 -04:00
Joey Hess	75b1d50b99	add remoteAnnexP2PHttpUrl to RemoteGitConfig This is always parsed, when building without servant, a Baseurl is not generated, and users of it will need to fail.	2024-07-23 09:57:01 -04:00
Joey Hess	a6a03ca586	annex+http urls	2024-07-23 08:42:33 -04:00
Joey Hess	758cff0fde	update	2024-07-22 20:59:45 -04:00
Joey Hess	06de2ad972	change default port to 9417 Port 80 would need root, not a good idea, so pick something that might work by default. 9418 is git protocol's port. 9419 is used by something, but nothing known uses 9417, so it's as good a default as any.	2024-07-22 20:52:17 -04:00
Joey Hess	9984252ab5	P2P protocol is finalized	2024-07-22 19:50:08 -04:00
Joey Hess	e979e85bff	make serveKeepLocked check auth just to be safe	2024-07-22 19:15:52 -04:00
Joey Hess	f5dd7a8bc0	implemented serveLockContent (untested)	2024-07-22 17:38:42 -04:00
Joey Hess	b697c6b9da	fix TMVar left full crash affecting servePutOffset Problem is that whatever is reading from the TMVar may not have read from it yet before the client writes the next thing to it.	2024-07-22 15:48:46 -04:00
Joey Hess	3069e28dd8	implemented servePutOffset and clientPutOffset But, it's buggy: the server hangs without processing the VALIDITY, and I can't seem to work out why. As far as I can see, storefile is getting as far as running the validitycheck, which is supposed to read that, but never does. This is especially strange because what seems like the same protocol doesn't hang when servePut runs it. This made me think that it needed to use inAnnexWorker to be more like servePut, but that didn't help. Another small problem with this is that it does create an empty .git/annex/tmp/ file for the key. Since this will usually be used in combination with servePut, that doesn't seem worth worrying about much.	2024-07-22 15:04:10 -04:00
Joey Hess	b240a11b79	clientPut seeking to offset	2024-07-22 12:50:21 -04:00
Joey Hess	a01426b713	avoid padding in servePut This means that when the client sends a truncated data to indicate invalidity, DATA is not passed the full expected data. That leaves the P2P connection in a state where it cannot be reused. While so far, they are not reused, they will be later when proxies are supported. So, have to close the P2P connection in this situation.	2024-07-22 12:30:30 -04:00
Joey Hess	efa0efdc44	avoid padding in clientPut Instead truncate when necessary to indicate invalid content was sent. Very similar to how serveGet handles it.	2024-07-22 11:47:24 -04:00
Joey Hess	72d0769ca5	avoid padding content in serveGet Always truncate instead. The padding risked something not noticing the content was bad and getting a file that was corrupted in a novel way with the padding "X" at the end. A truncated file is better.	2024-07-22 11:19:52 -04:00
Joey Hess	4826a3745d	servePut and clientPut implementation Made the data-length header required even for v0. This simplifies the implementation, and doesn't preclude extra verification being done for v0. The connectionWaitVar is an ugly hack. In servePut, nothing waits on the waitvar, and I could not find a good way to make anything wait on it.	2024-07-22 10:27:44 -04:00
adehnert	8eadd02b52	Added a comment: git-annex for managing music	2024-07-21 19:08:45 +00:00
adehnert	12bc3ca2a7		2024-07-21 18:17:25 +00:00
adehnert	2b96f62ada		2024-07-21 18:17:11 +00:00
adehnert	264366f45d		2024-07-21 18:15:11 +00:00
adehnert	024b331a4b		2024-07-21 18:14:28 +00:00
adehnert	40c930a381		2024-07-21 18:14:03 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	1ee30b29ee	Added a comment	2024-07-21 12:38:12 +00:00
adehnert	b143cb686d	Added a comment: `git annex sync --ff-only`	2024-07-21 01:04:44 +00:00
nobodyinperson	b920655acd	Added a comment: Also Serveo.net	2024-07-19 15:21:19 +00:00
kdm9	8a7fc275cb	Added a comment	2024-07-19 13:11:05 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	2878343354		2024-07-19 12:12:56 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	a2ab2f70ea	Added a comment	2024-07-19 08:26:31 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	b1db5115e0	Added a comment	2024-07-17 14:07:32 +00:00
yarikoptic	ba4d545776	reporting FTBFS on windows	2024-07-16 15:58:50 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	1287e4590f		2024-07-16 15:42:54 +00:00
mih	5bc00a55dd		2024-07-16 15:02:46 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	3590a17f9e	Added a comment	2024-07-16 09:21:54 +00:00
nobodyinperson	a79176341d	Added a comment	2024-07-15 18:32:36 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	cce86415b1	Added a comment	2024-07-15 15:18:28 +00:00
m.risse@77eac2c22d673d5f10305c0bade738ad74055f92	8ec34ea2a1		2024-07-15 14:57:26 +00:00
xentac	8102bf5fe2	Added a comment	2024-07-12 23:49:33 +00:00
xentac	df97c82c44		2024-07-12 23:37:36 +00:00
ashton@37fa3fec6d2eef022a3491c85362a34141fbf0db	af4d90eea8		2024-07-12 08:11:56 +00:00
ashton@37fa3fec6d2eef022a3491c85362a34141fbf0db	60eae008d8		2024-07-12 08:11:30 +00:00
ashton@37fa3fec6d2eef022a3491c85362a34141fbf0db	ed11ce6fcb		2024-07-12 08:08:22 +00:00
Joey Hess	eb4fb388bd	only base64 non-utf8	2024-07-11 15:47:16 -04:00
Joey Hess	97a2d0e4fb	use worker pool in withLocalP2PConnections This allows multiple clients to be handled at the same time.	2024-07-11 14:37:52 -04:00
Joey Hess	68227154fb	switch HTTP P2P protocol to base64url Base64 can include '/', and with UUIDs and keys both used in routes, the encoding needs to avoid that. Use base64url everywhere in the HTTP protocol for consistency.	2024-07-11 12:31:41 -04:00
Joey Hess	14e0f778b7	simplify	2024-07-11 11:50:44 -04:00
Joey Hess	2228d56db3	serveGet invalidation	2024-07-11 11:42:32 -04:00
Joey Hess	a7383b5c59	move serveruuid into routes In particular the generic get route needs it, so that when a single http server is serving multiple repositories, it knows what repository to use.	2024-07-11 11:19:20 -04:00
Joey Hess	3b37b9e53f	fix serveGet hang This came down to SendBytes waiting on the waitv. Nothing ever filled it. Only Annex.Proxy needs the waitv, and it handles filling it. So make it optional.	2024-07-11 07:46:52 -04:00
benjamin.poldrack@d09ccff6d42dd20277610b59867cf7462927b8e3	a82a573f75		2024-07-11 07:47:27 +00:00
benjamin.poldrack@d09ccff6d42dd20277610b59867cf7462927b8e3	9ce207532e		2024-07-11 07:23:30 +00:00
Joey Hess	8cb1332407	update	2024-07-10 16:10:08 -04:00
Joey Hess	f9b7ce7224	add Annex worker pool to P2PHttp This will be needed for get and store, since those need to run Annex actions. withLocalP2PConnections will also probably use it.	2024-07-10 12:19:47 -04:00
Joey Hess	7c588a5791	implement remove-before The reason to use removeBeforeRemoteEndTime is twofold. First, removeBefore sends two protocol commands. Currently, the HTTP protocol runner only supports sending a single command per invocation. Secondly, the http server gets a monotonic timestamp from the client. So translating back to a POSIXTime would be annoying. The timestamp flow with a proxy will be: - client gets timestamp, which gets the monotonic timestamp from the proxied remote via the proxy. The timestamp is currently not proxied when there is a single proxy. - client calls remove-before - http server calls removeBeforeRemoteEndTime which sends REMOVE-BEFORE to the proxied remote.	2024-07-10 10:03:26 -04:00
Joey Hess	48f76cb3e8	implement serveRemove and send WWW-Authenticate header on auth failure	2024-07-10 09:13:01 -04:00
Joey Hess	97d0fc9b65	git-annex p2phttp options	2024-07-10 00:01:55 -04:00
Joey Hess	6a8a4d1775	authentication is implemented just need to make Command.P2PHttp generate a GetServerMode from options	2024-07-09 20:54:47 -04:00
Joey Hess	08371c3745	started on auth	2024-07-09 17:30:55 -04:00
Joey Hess	b5b3d8cde2	update	2024-07-09 14:30:50 -04:00
Joey Hess	a3dd8b4bcb	capture API version in routes Needed so the client can send it.	2024-07-09 12:04:29 -04:00
Joey Hess	751b8e0baf	implemented serveCheckPresent Still need a way to run Proto though	2024-07-09 09:08:42 -04:00
yarikoptic	fade907c6a	initial report from boox installation	2024-07-09 02:44:51 +00:00
Joey Hess	3f402a20a8	implement Locker	2024-07-08 21:00:10 -04:00
Joey Hess	b758b01692	add lockids to http p2p protocol	2024-07-08 20:18:55 -04:00
Joey Hess	69c4f07ab0	finish get API	2024-07-08 13:27:50 -04:00
Joey Hess	82d66ede5e	convert lockcontent api to http long polling Websockets would work, but the problem with using them for this is that each lockcontent call is a separate websocket connection. And that's an actual TCP connection. One TCP connection per file dropped would be too expensive. With http long polling, regular http pipelining can be used, so it will reuse a TCP connection. Unfortunately, at least with servant, bi-directional streams with long polling don't result in true bidirectional full duplex communication. Servant processes the whole client body stream before generating the server body stream. I think it's entirely possible to do full bi-directional communication over http, but it would need changes to servant. And, there's no way for the client to tell if the server successfully locked the content, since the server will keep processing the client stream no matter what. So, added a new api endpoint, keeplocked. lockcontent will lock the key for 10 minutes with retention lock, and then a call to keeplocked will keep it locked for as long as needed. This does mean that there will need to be a Map of locks by key, and I will probably want to add some kind of lock identifier that lockcontent returns.	2024-07-08 12:57:46 -04:00
Joey Hess	838169ee86	status	2024-07-07 16:16:11 -04:00
Joey Hess	1dbb5ec70d	servant API type is complete	2024-07-07 12:59:12 -04:00
Joey Hess	4133063ab1	Merge branch 'master' into httpproto	2024-07-07 12:08:24 -04:00
Joey Hess	86ce3bf1e4	started servant implementation of HTTP P2P protocol	2024-07-07 12:08:10 -04:00
Joey Hess	9595f77584	Merge branch 'master' of ssh://git-annex.branchable.com	2024-07-05 15:37:43 -04:00
Joey Hess	40306d3fcf	finalizing HTTP P2P protocol some more Added v2-v0 endpoints. These are tedious, but will be needed in order to use the HTTP protocol to proxy to repositories with older git-annex, where git-annex-shell will be speaking an older version of the protocol. Changed GET to use 422 when the content is not present. 404 is needed to detect when a protocol version is not supported.	2024-07-05 15:34:58 -04:00
Joey Hess	2fb3ef4d41	finalizing HTTP P2P protocol Managed to avoid netstrings. Actually, using netstrings while streaming lazy ByteString turns out to be very difficult. So instead, have a header that specifies the expected amount of data, and then it can just arrange to send a different amount of data if it needs to indicate INVALID. Also improved the interface for GET of a key.	2024-07-05 15:03:51 -04:00
Joey Hess	5e564947d7	use netstrings for framing binary data with json at the end This will be easy to implement with servant. It's also very efficient, and fairly future-proof. Eg, could add another frame with other data. This does make it a bit harder to use this protocol, but netstrings probably take about 5 minutes to implement? Let's see... import Text.Read import Data.List toNetString :: String -> String toNetString s = show (length s) ++ ":" ++ s ++ "," nextNetString :: String -> Maybe (String, String) nextNetString s = case break (== ':') s of ([], _) -> Nothing (sn, rest) -> do n <- readMaybe sn let (v, rest') = splitAt n (drop 1 rest) return (v, drop 1 rest') Ok, well, that took about 10 minutes ;-)	2024-07-05 11:53:03 -04:00
Joey Hess	95ba4d4480	thoughts on CGI, and use json	2024-07-05 10:08:43 -04:00
git-annex@4a0625db6ced1ac00744697d5bac41393bcde646	81c9808cfa	Added a comment	2024-07-05 10:22:46 +00:00
Joey Hess	3f9569e27f	update	2024-07-04 15:26:05 -04:00
Joey Hess	2ca51fe947	Merge branch 'master' of ssh://git-annex.branchable.com	2024-07-04 15:18:17 -04:00

34923 commits