git-annex

Author	SHA1	Message	Date
Joey Hess	b83fdf66df	Allow enabling the servant build flag with older versions of stm Allowing building with ghc 9.0.2 (debian stable). Updated patch covering all uses of writeTMVar.	2024-10-17 20:55:31 -04:00
Joey Hess	3a53c60121	Allow enabling the servant build flag with older versions of stm Allowing building with ghc 9.0.2 (debian stable).	2024-10-17 14:04:31 -04:00
Joey Hess	76a1989a0e	implement openFileBeingWritten This bypasses the usual haskell file locking used to prevent opening a file for read that is being written to. This is unfortunately a bit of a hack. But it seems fairly unlikely to get broken by changes to ghc. I hope. Using fdToHandle' will also work. This does not work on windows because it uses openFd from posix. It would probably be possible to implement it for windows too, just opening the FD using the Win32 library instead. However, whether windows will allow reading from a file that is also being written to I don't know, and since in the git-annex case the writer could be another process (eg external special remote), that might be doing its own locking in windows, that seems a can of worms I'd prefer not to open.	2024-10-15 11:56:42 -04:00
Joey Hess	e8e4347fcc	update version for release	2024-09-27 10:01:44 -04:00
Joey Hess	84bbbeae9d	started on sim file parser	2024-09-11 11:53:25 -04:00
Joey Hess	b932acf4ad	started Annex.Sim Have most of the sim command handler, but to keep it pure while implementing the rest will need some refactoring. It seems likely that running the simulation itself will not be able to be entirely pure. Preferred content evaluation runs in Annex after all. Note that the somewhat awkward randomWords is because the i386ancient build depends on a version of random too old to support generating a random ByteString on its own.	2024-09-04 15:15:36 -04:00
Joey Hess	b3dc656153	releasing package git-annex version 10.20240831	2024-08-31 19:50:26 -04:00
Joey Hess	c3d40b9ec3	plumb in LiveUpdate (WIP) Each command that first checks preferred content (and/or required content) and then does something that can change the sizes of repositories needs to call prepareLiveUpdate, and plumb it through the preferred content check and the location log update. So far, only Command.Drop is done. Many other commands that don't need to do this have been updated to keep working. There may be some calls to NoLiveUpdate in places where that should be done. All will need to be double checked. Not currently in a compilable state.	2024-08-23 16:35:12 -04:00
Joey Hess	06064f897c	update Annex.reposizes when changing location logs The live update is only needed when Annex.reposizes has already been populated.	2024-08-15 13:27:14 -04:00
Joey Hess	745bc5c547	take maxsize into account for balanced preferred content This is very inefficient, it will need to be optimised not to calculate the sizes of repos every time. Also, fixed a bug in balancedPicker that caused it to pick a too high index when some repos were excluded due to being full.	2024-08-13 11:00:20 -04:00
Joey Hess	99a126bebb	added reposize database The idea is that upon a merge of the git-annex branch, or a commit to the git-annex branch, the reposize database will be updated. So it should always accurately reflect the location log sizes, but it will often be behind the actual current sizes. Annex.reposizes will start with the value from the database, and get updated with each transfer, so it will reflect a process's best understanding of the current sizes. When there are multiple processes all transferring to the same repo, Annex.reposize will not reflect transfers made by the other processes since the current process started. So when using balanced preferred content, it may make suboptimal choices, including trying to transfer content to the repo when another process has already filled it up. But this is the same as if there are multiple processes running on different machines, so is acceptable. The reposize will eventually get an accurate value reflecting changes made by other processes or in other repos.	2024-08-12 11:19:58 -04:00
Joey Hess	1265d7e5df	implement maxsize log and command * maxsize: New command to tell git-annex how large the expected maximum size of a repository is. * vicfg: Include maxsize configuration.	2024-08-11 15:41:26 -04:00
Joey Hess	bd5affa362	use hmac in balanced preferred content This deals with the possible security problem that someone could make an unusually low UUID and generate keys that are all constructed to hash to a number that, mod the number of repositories in the group, == 0. So balanced preferred content would always put those keys in the repository with the low UUID as long as the group contains the number of repositories that the attacker anticipated. Presumably the attacker then holds the data for ransom? Dunno. Anyway, the partial solution is to use HMAC (sha256) with all the UUIDs combined together as the "secret", and the key as the "message". Now any change in the set of UUIDs in a group will invalidate the attacker's constructed keys from hashing to anything in particular. Given that there are plenty of other things someone can do if they can write to the repository -- including modifying preferred content so only their repository wants files, and numcopies so other repositories drop them -- this seems like safeguard enough. Note that, in balancedPicker, combineduuids is memoized.	2024-08-10 16:32:54 -04:00
Joey Hess	c15c32b5f8	releasing package git-annex version 10.20240808	2024-08-08 15:27:04 -04:00
Joey Hess	c1bc0bffc8	releasing package git-annex version 10.20240731	2024-07-31 14:05:01 -04:00
Joey Hess	d1b641cb1e	update stack.yaml to nightly-2024-07-29 and remove stack-lts-18.13.yaml Primarily because Windows needs a dependency bump to get stm-2.5.1 for Servant build flag. This includes Win32-2.13.4.0 and aws-0.24 which adds some features that windows had been missing out on as well. Lots of warnings about head and tail will need to eventually be addressed. Of course AFAIK the uses of it in git-annex are all safe.	2024-07-29 20:09:37 -04:00
Joey Hess	54f2ea2b85	fix syntax	2024-07-29 19:13:31 -04:00
Joey Hess	ebe81ccdfa	servant build flag needs stm-2.5.1 For writeTMVar. Would be possible to rewrite to use something else, but I don't want to. Might be possible to write a writeTMVar that works with the old version of stm.	2024-07-29 19:10:00 -04:00
Joey Hess	ab22938c0b	fix build without servant	2024-07-24 15:13:02 -04:00
Joey Hess	b7454f1eeb	protocol version fallback on 404 and prettified errors	2024-07-23 14:58:49 -04:00
Joey Hess	b0eed55d4f	factor out http server and client into own modules To avoid a cycle when Remote.Git uses the client.	2024-07-23 14:12:38 -04:00
Joey Hess	6bbc4565e6	started wiring p2phttp into Remote.Git but we have a cycle, ugh	2024-07-23 13:53:10 -04:00
Joey Hess	5c39652235	starting support for remote.name.annexUrl set to annex+http In this case, Remote.Git should not use that url for all access to the repository. It will only be used for annex operations, which isn't done yet.	2024-07-23 09:12:21 -04:00
Joey Hess	fdb888a56a	update servant build flag make it work when building w/o assistant	2024-07-23 08:53:56 -04:00
Joey Hess	b290a72025	update deps	2024-07-11 14:51:45 -04:00
Joey Hess	9a592f946f	split module	2024-07-08 21:12:23 -04:00
Joey Hess	82d66ede5e	convert lockcontent api to http long polling Websockets would work, but the problem with using them for this is that each lockcontent call is a separate websocket connection. And that's an actual TCP connection. One TCP connection per file dropped would be too expensive. With http long polling, regular http pipelining can be used, so it will reuse a TCP connection. Unfortunately, at least with servant, bi-directional streams with long polling don't result in true bidirectional full duplex communication. Servant processes the whole client body stream before generating the server body stream. I think it's entirely possible to do full bi-directional communication over http, but it would need changes to servant. And, there's no way for the client to tell if the server successfully locked the content, since the server will keep processing the client stream no matter what. So, added a new api endpoint, keeplocked. lockcontent will lock the key for 10 minutes with retention lock, and then a call to keeplocked will keep it locked for as long as needed. This does mean that there will need to be a Map of locks by key, and I will probably want to add some kind of lock identifier that lockcontent returns.	2024-07-08 12:57:46 -04:00
Joey Hess	9ee005e49a	dummy HasClient ClientM WebSocket Enough to let lockcontent routes be included and servant-client be used. But not enough to use servant-client with those routes. May need to implement a separate runner for that part of the protocol? Also some misc other stuff needed to use servant-client. And fix exposing of UUID in the JSON types. UUID does actually have aeson instances, but they're used elsewhere (metadata --batch, although only included to get it to compile, not actually used in there) and not suitable for use here since this must work with every possible UUID.	2024-07-07 21:21:45 -04:00
Joey Hess	9a726cedf6	servant server now compiling Just need to fill in some undefined	2024-07-07 14:48:20 -04:00
Joey Hess	1dbb5ec70d	servant API type is complete	2024-07-07 12:59:12 -04:00
Joey Hess	86ce3bf1e4	started servant implementation of HTTP P2P protocol	2024-07-07 12:08:10 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
Joey Hess	5b6150e5d5	factor out Utility.MonotonicClock	2024-07-03 17:54:01 -04:00
Joey Hess	543c610a31	REMOVE-BEFORE and GETTIMESTAMP Only implemented server side, not used client side yet. And not yet implemented for proxies/clusters, for which there's a build warning about unhandled cases. This is P2P protocol version 3. Probably will be the only change in that version. Added a dependency on clock to access a monotonic clock. On i386-ancient, that is at version 0.2.0.0.	2024-07-03 17:01:58 -04:00
Joey Hess	20ebb54b6f	prep release	2024-07-01 15:13:10 -04:00
Joey Hess	0b72b85df5	added git-annex extendcluster This works, but updatecluster does not work yet in multi-gateway clusters, nor do gateways relay to other gateways.	2024-06-26 10:26:54 -04:00
Joey Hess	202ea3ff2a	don't sync with cluster nodes by default Avoid `git-annex sync --content` etc from operating on cluster nodes by default since syncing with a cluster implicitly syncs with its nodes. This avoids a lot of unnecessary work when a cluster has a lot of nodes just in checking if each node's preferred content is satisfied. And it avoids content being sent to nodes individually, so instead syncing with clusters always fans out uploads to nodes. The downside is that there are situations where a cluster's preferred content settings can be met, but those of its nodes are not. Or where a node does not contain a key, but the cluster does, and there are not enough copies of the key yet, so it would be desirable to send it there. I think that's an acceptable tradeoff. These kind of situations are ones where the cluster itself should probably be responsible for copying content to the node. Which it can do much less expensively than a client can. Part of the balanced preferred content design that I will be working on in a couple of months involves rebalancing clusters, so I expect to revisit this. The use of annex-sync config does allow running git-annex sync with a specific node, or nodes, and it will sync with it. And it's also possible to set annex-sync git configs to make it sync with a node by default. (Although that will require setting up an explicit git remote for the node rather than relying on the proxied remote.) Logs.Cluster.Basic is needed because Remote.Git cannot import Logs.Cluster due to a cycle. And the Annex.Startup load of clusters happens too late for Remote.Git to use that. This does mean one redundant load of the cluster log, though only when there is a proxy.	2024-06-25 10:24:38 -04:00
Joey Hess	d34326ab76	factor out Annex.Proxy	2024-06-18 10:51:37 -04:00
Joey Hess	f0d6114286	refactor cluster code into own module	2024-06-18 10:36:04 -04:00
Joey Hess	780367200b	remove dead nodes when loading the cluster log This is to avoid inserting a cluster uuid into the location log when only dead nodes in the cluster contain the content of a key. One reason why this is necessary is Remote.keyLocations, which excludes dead repositories from the list. But there are probably many more. Implementing this was challenging, because Logs.Location importing Logs.Cluster which imports Logs.Trust which imports Remote.List resulted in an import cycle through several other modules. Resorted to making Logs.Location not import Logs.Cluster, and instead it assumes that Annex.clusters gets populated when necessary before it's called. That's done in Annex.Startup, which is run by the git-annex command (but not other commands) at early startup in initialized repos. Or, is run after initialization. Note that in Remote.Git, it is unable to import Annex.Startup, because Remote.Git importing Logs.Cluster leads to the same import cycle. So ensureInitialized is not passed annexStartup in there. Other commands, like git-annex-shell currently don't run annexStartup either. So there are cases where Logs.Location will not see clusters. So it won't add any cluster UUIDs when loading the log. That's ok, the only reason to do that is to make display of where objects are located include clusters, and to make commands like git-annex get --from treat keys as being located in a cluster. git-annex-shell certainly does not do anything like that, and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo) don't either.	2024-06-16 14:39:44 -04:00
Joey Hess	570ceffe8d	broke out initcluster One benefit of this is that a typo in annex-cluster-node config won't init a new cluster. Also it gets the cluster description set and is consistent with initremote.	2024-06-14 17:23:11 -04:00
Joey Hess	bbf261487d	add git-annex updatecluster command Seems to work fine, making the right changes to the git-annex branch.	2024-06-14 15:02:01 -04:00
Joey Hess	aa56d433d5	implement cluster.log Not used yet. (Or tested.) I did consider making the log start with the uuid of the node, followed by the cluster uuid (or uuids). That would perhaps mean a smaller write to the git-annex branch when adding a node, but overall the log file would be larger, and it will be read and cached near to startup on most git-annex runs.	2024-06-13 16:00:58 -04:00
Joey Hess	501d65eeab	started implementing git-annex-shell proxy So far, it negotiates VERSION with both parties. This is a tricky dance. Untested.	2024-06-10 18:01:36 -04:00
Joey Hess	f97f4b8bdb	Added updateproxy command and remote.name.annex-proxy configuration So far this only records proxy information on the git-annex branch.	2024-06-04 14:52:03 -04:00
Joey Hess	aeedca70ca	prep release	2024-05-30 17:53:33 -04:00
Joey Hess	424afe46d7	fix incremental push to preserve existing bundle keys in manifest Also broke Manifest out to its own type with a smart constructor. Sponsored-by: mycroft on Patreon	2024-05-13 09:47:05 -04:00
Joey Hess	ff5193c6ad	Merge branch 'master' into git-remote-annex	2024-05-10 14:20:36 -04:00
Joey Hess	e1447dc2e2	add git bundle interface Sponsored-by: mycroft on Patreon	2024-05-07 14:22:41 -04:00
Joey Hess	c7731cdbd9	add Backend.GitRemoteAnnex Making GITBUNDLE be in the backend list allows those keys to be hashed to verify, both when git-remote-annex downloads them, and by other transfers and by git fsck. GITMANIFEST is not in the backend list, because those keys will never be stored in .git/annex/objects and can't be verified in any case. This does mean that git-annex version will include GITBUNDLE in the list of backends. Also documented these in backends.mdwn Sponsored-by: Kevin Mueller on Patreon	2024-05-07 13:54:08 -04:00
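
Commits b83fdf66df, 3a53c60121 and ebe81ccdfa all concern `writeTMVar`, which only exists in stm-2.5.1 and later. A minimal sketch of an equivalent for older stm versions follows. It assumes only `tryTakeTMVar` and `putTMVar`, both long-standing stm functions. The name `writeTMVarCompat` is hypothetical, and this is not necessarily how the actual patch is written.

```haskell
import Control.Concurrent.STM

-- | Hypothetical stand-in for stm-2.5.1's writeTMVar: fill the TMVar if it
-- is empty, or replace its value if it is full. Emptying it first with
-- tryTakeTMVar means the following putTMVar never blocks.
writeTMVarCompat :: TMVar a -> a -> STM ()
writeTMVarCompat v new = do
  _ <- tryTakeTMVar v -- discard any existing value
  putTMVar v new
```

Both steps run inside a single STM transaction, so other threads never see the TMVar empty between the take and the put.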
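
Commits 06064f897c and 99a126bebb describe an in-memory `Annex.reposizes` that starts from the reposize database and is adjusted as location logs change, but only after it has been populated. The sketch below models that rule as a `Maybe` around a map from UUID to size. The types and names (`RepoSizes`, `LogChange`, `updateRepoSizes`) are hypothetical simplifications, not git-annex's own definitions.

```haskell
import qualified Data.Map.Strict as M

type UUID = String

-- | Hypothetical in-memory size cache. Nothing means it has not been
-- loaded from the reposize database yet.
type RepoSizes = Maybe (M.Map UUID Integer)

data LogChange = AddedCopy | RemovedCopy

-- | Apply a location log change for a key of the given size. When the cache
-- is still unpopulated there is nothing to update, since current sizes will
-- be read from the database the first time they are needed.
updateRepoSizes :: UUID -> LogChange -> Integer -> RepoSizes -> RepoSizes
updateRepoSizes u change keysize = fmap (M.insertWith (+) u delta)
  where
    delta = case change of
      AddedCopy -> keysize
      RemovedCopy -> negate keysize
```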
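
Commit 745bc5c547 mentions a `balancedPicker` bug: it picked too high an index when full repositories were excluded. One way to picture the fix is to reduce the key's hash modulo the number of *remaining* candidates rather than the size of the whole group. This is only a sketch. The `Repo` record, `hasRoom` and `pickBalanced` are hypothetical, sorting by UUID is an assumed way to get a stable order, and the hash is passed in rather than computed.

```haskell
import Data.List (sortOn)

-- | Hypothetical repository record; field names are illustrative only.
data Repo = Repo
  { repoUUID :: String
  , repoSize :: Integer -- ^ bytes currently stored
  , repoMaxSize :: Maybe Integer -- ^ configured maxsize, if any
  }
  deriving (Show, Eq)

-- | Can this repository accept a key of the given size without
-- exceeding its maxsize?
hasRoom :: Integer -> Repo -> Bool
hasRoom keysize r = case repoMaxSize r of
  Nothing -> True
  Just m -> repoSize r + keysize <= m

-- | Pick among repositories that still have room. The index is taken modulo
-- the length of the filtered list, so it can never point past its end.
pickBalanced :: Integer -> Integer -> [Repo] -> Maybe Repo
pickBalanced keyhash keysize repos =
  case filter (hasRoom keysize) (sortOn repoUUID repos) of
    [] -> Nothing
    candidates ->
      Just (candidates !! fromInteger (keyhash `mod` toInteger (length candidates)))
```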
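
Commit bd5affa362 keys an HMAC-SHA256 with all of a group's UUIDs combined and uses the key as the message. As a result, any change in group membership reshuffles which repository each key maps to. Below is a rough sketch of that idea. It assumes the crypton (or older cryptonite) package for `Crypto.MAC.HMAC` and the memory package for `Data.ByteArray`. `pickRepo`, the sorting, and the plain concatenation of UUIDs are illustrative assumptions. Unlike `balancedPicker`, this sketch recomputes the combined UUIDs on every call rather than memoizing them.

```haskell
import Crypto.Hash.Algorithms (SHA256)
import Crypto.MAC.HMAC (HMAC, hmac, hmacGetDigest)
import qualified Data.ByteArray as BA
import qualified Data.ByteString.Char8 as B8
import Data.List (sort)

-- | Hypothetical picker: choose which repository in a group gets a key.
pickRepo :: [String] -> String -> Maybe String
pickRepo [] _ = Nothing
pickRepo uuids key = Just (sorted !! idx)
  where
    sorted = sort uuids
    -- The HMAC secret is every UUID in the group, combined.
    secret = B8.pack (concat sorted)
    mac :: HMAC SHA256
    mac = hmac secret (B8.pack key)
    -- Read the 32-byte digest as a big-endian unsigned integer.
    h :: Integer
    h = foldl (\acc w -> acc * 256 + toInteger w) 0 (BA.unpack (hmacGetDigest mac))
    idx = fromInteger (h `mod` toInteger (length sorted))
```

Without knowing every UUID in the group, an attacker cannot precompute keys that land on a chosen index.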
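
Commit 82d66ede5e plans a map of locks by key. Under that plan, `lockcontent` takes a 10 minute retention lock and returns a lock identifier, and `keeplocked` holds the lock for as long as needed. The pure model below sketches that bookkeeping, with times as plain integer seconds. `Locks`, `LockID` and the function names are hypothetical. A real server would also need concurrency control around this state.

```haskell
import qualified Data.Map.Strict as M

type Key = String

-- | Hypothetical identifier returned by lockcontent.
newtype LockID = LockID Int
  deriving (Eq, Ord, Show)

-- | Just t: the lock lapses at time t. Nothing: keeplocked holds it
-- until it is explicitly released.
data Lock = Lock {lockedKey :: Key, lockDeadline :: Maybe Integer}
  deriving (Eq, Show)

data Locks = Locks {nextLockID :: Int, lockMap :: M.Map LockID Lock}
  deriving (Eq, Show)

emptyLocks :: Locks
emptyLocks = Locks 0 M.empty

-- | Retention taken by lockcontent: 10 minutes, per the commit message.
retentionSeconds :: Integer
retentionSeconds = 600

stillHeld :: Integer -> Lock -> Bool
stillHeld now l = maybe True (now <) (lockDeadline l)

-- | lockcontent: take a time-limited lock and return its identifier.
lockContent :: Integer -> Key -> Locks -> (LockID, Locks)
lockContent now k (Locks n m) =
  (lid, Locks (n + 1) (M.insert lid (Lock k (Just (now + retentionSeconds))) m))
  where
    lid = LockID n

-- | keeplocked: turn a still-valid lock into an open-ended one. Returns
-- Nothing if the lock is unknown or has already lapsed.
keepLocked :: Integer -> LockID -> Locks -> Maybe Locks
keepLocked now lid ls = case M.lookup lid (lockMap ls) of
  Just l
    | stillHeld now l ->
        Just ls {lockMap = M.insert lid (l {lockDeadline = Nothing}) (lockMap ls)}
  _ -> Nothing

-- | Release a lock, e.g. when the keeplocked request ends.
unlockContent :: LockID -> Locks -> Locks
unlockContent lid ls = ls {lockMap = M.delete lid (lockMap ls)}

-- | A key is protected while any unexpired lock covers it. Lapsed entries
-- are simply left in the map in this sketch.
isKeyLocked :: Integer -> Key -> Locks -> Bool
isKeyLocked now k ls =
  any (\l -> lockedKey l == k && stillHeld now l) (M.elems (lockMap ls))
```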
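
Commit 1243af4a18 gives `SafeDropProof` a `Maybe POSIXTime`, set to the closest expiry among its `LockedCopy` values. A simplified sketch of that rule follows. These `LockedCopy` and `SafeDropProof` types are stand-ins that only carry an expiry time; they are not git-annex's real definitions.

```haskell
import Data.Time.Clock.POSIX (POSIXTime)

-- | Stand-in: a remote has agreed to retain the content until this time.
newtype LockedCopy = LockedCopy {lockExpiry :: POSIXTime}

-- | Stand-in proof. Nothing means no LockedCopy was involved, so the
-- proof does not expire.
newtype SafeDropProof = SafeDropProof {proofExpiry :: Maybe POSIXTime}

-- | Use the earliest expiry among all locked copies.
mkSafeDropProof :: [LockedCopy] -> SafeDropProof
mkSafeDropProof [] = SafeDropProof Nothing
mkSafeDropProof cs = SafeDropProof (Just (minimum (map lockExpiry cs)))

-- | The proof is treated as expired once the clock reaches that time.
proofExpired :: POSIXTime -> SafeDropProof -> Bool
proofExpired now p = maybe False (now >=) (proofExpiry p)
```

Taking the minimum is conservative in the way the commit message describes: the proof can be reported as expired while another locked copy is still valid, and the user can simply re-run the drop.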


1098 commits