git-annex

Author	SHA1	Message	Date
Joey Hess	8239824d92	consistently omit clusters when calculating RepoSizes updateRepoSize is only called on the UUID of a repository, not any cluster it might be a node of. But overLocationLogs and overLocationLogsJournal were inclusing cluster UUIDs. So it was inconsistent. Currently I don't see any reason to calculate RepoSize for a cluster. It's not even clear what it should mean, the total size of all nodes, or the amount of information stored in the cluster in total?	2024-08-17 11:24:14 -04:00
Joey Hess	8ac2685b33	calcBranchRepoSizes without journal files This will be used to prime the RepoSizes database, which will always contain values that correpond to information in the git-annex branch, so without anything from journal files. Factored out overJournalFileContents which will later be used to update Annex.reposizes to include information from journal files. This will be partitcularly important to support private UUIDs which only ever get to journal files and not to the branch.	2024-08-14 03:19:30 -04:00
Joey Hess	5afbea25e7	avoid counting size of keys that are in the journal twice In calcRepoSizes and also git-annex info, when a key was in the journal, it was passed to the callback twice, so the calculated size was wrong.	2024-08-13 13:23:39 -04:00
Joey Hess	467d80101a	improve handling of unmerged git-annex branches in readonly repo git-annex info was displaying a message that didn't make sense in context. In calcRepoSizes, it seems better to return the information from the git-annex branch, rather than giving up. Especially since balanced preferred content uses it, and we can't just give up evaluating a preferred content expression if git-annex is to be usable in such a readonly repo. Commit `6d7ecd9e5d` nobly wanted git-annex to behave the same with such unmerged branches as it does when it can merge them. But for the purposes of preferred content, it seems to me there's a sense that such an unmerged branch is the same as a remote we have not pulled from. The balanced preferred content will either way operate under outdated information, and so make not the best choices.	2024-08-13 13:13:12 -04:00
Joey Hess	1265d7e5df	implement maxsize log and command * maxsize: New command to tell git-annex how large the expected maximum size of a repository is. * vicfg: Include maxsize configuration.	2024-08-11 15:41:26 -04:00
Joey Hess	3ce2e95a5f	balanced preferred content and --rebalance This all works fine. But it doesn't check repository sizes yet, and without repository size checking, once a repository gets full, there will be no other repository that will want its files. Use of sha2 seems unncessary, probably alder2 or md5 or crc would have been enough. Possibly just summing up the bytes of the key mod the number of repositories would have sufficed. But sha2 is there, and probably hardware accellerated. I doubt very much there is any security benefit to using it though. If someone wants to construct a key that will be balanced onto a given repository, sha2 is certianly not going to stop them.	2024-08-09 14:16:09 -04:00
Joey Hess	169fd414eb	git-remote-annex: use annexLocationsBare There was no good reason for it to be using annexLocationsNonBare, and exporttree=yes annexobjects=yes is going to use annexLocationsBare, so this should as well for consistency. Since all returned ExportLocations are tried when retrieving objects, this won't break backwards compatability.	2024-08-02 13:13:44 -04:00
Joey Hess	3d14e2cf58	http server support for proxies, incomplete Refactored git-annex-shell code so this can use checkCanProxy'. At this point all that remains is opening a proxy connection, and using a proxy connection.	2024-07-25 13:19:24 -04:00
Joey Hess	f9b7ce7224	add Annex worker pool to P2PHttp This will be needed for get and store, since those need to run Annex actions. withLocalP2PConnections will also probably use it.	2024-07-10 12:19:47 -04:00
Joey Hess	86ce3bf1e4	started servant implementation of HTTP P2P protocol	2024-07-07 12:08:10 -04:00
Joey Hess	1243af4a18	toward SafeDropProof expiry checking Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is based on a LockedCopy. If there are several LockedCopies, it uses the closest expiry time. That is not optimal, it may be that the proof expires based on one LockedCopy but another one has not expired. But that seems unlikely to really happen, and anyway the user can just re-run a drop if it fails due to expiry. Pass the SafeDropProof to removeKey, which is responsible for checking it for expiry in situations where that could be a problem. Which really only means in Remote.Git. Made Remote.Git check expiry when dropping from a local remote. Checking expiry when dropping from a P2P remote is not yet implemented. P2P.Protocol.remove has SafeDropProof plumbed through to it for that purpose. Fixing the remaining 2 build warnings should complete this work. Note that the use of a POSIXTime here means that if the clock gets set forward while git-annex is in the middle of a drop, it may say that dropping took too long. That seems ok. Less ok is that if the clock gets turned back a sufficient amount (eg 5 minutes), proof expiry won't be noticed. It might be better to use the Monotonic clock, but that doesn't advance when a laptop is suspended, and while there is the linux Boottime clock, that is not available on other systems. Perhaps a combination of POSIXTime and the Monotonic clock could detect laptop suspension and also detect clock being turned back? There is a potential future flag day where p2pDefaultLockContentRetentionDuration is not assumed, but is probed using the P2P protocol, and peers that don't support it can no longer produce a LockedCopy. Until that happens, when git-annex is communicating with older peers there is a risk of data loss when a ssh connection closes during LOCKCONTENT.	2024-07-04 12:39:06 -04:00
Joey Hess	0033e6c0a6	Tab completion of many commands like info and trust now includes remotes Especially useful with proxied remotes and clusters, where the user may not be entirely familiar with the name and can learn by tab completion.	2024-06-30 12:39:18 -04:00
Joey Hess	07e899c9d3	git-annex-shell: proxy nodes located beyond remote cluster gateways Walking a tightrope between security and convenience here, because git-annex-shell needs to only proxy for things when there has been an explicit, local action to configure them. In this case, the user has to have run `git-annex extendcluster`, which now sets annex-cluster-gateway on the remote. Note that any repositories that the gateway is recorded to proxy for will be proxied onward. This is not limited to cluster nodes, because checking the node log would not add any security; someone could add any uuid to it. The gateway of course then does its own checking to determine if it will allow proxying for the remote.	2024-06-26 12:56:16 -04:00
Joey Hess	0b72b85df5	added git-annex extendcluster This works, but updatecluster does not work yet in multi-gateway clusters, nor do gateways relay to other gateways.	2024-06-26 10:26:54 -04:00
Joey Hess	cec2848e8a	support annex.jobs for clusters	2024-06-25 14:54:20 -04:00
Joey Hess	f916ce4b68	allow proxying to remotes that are nodes of clusters fixes reversion in `ca08f3fcc2`	2024-06-18 17:02:23 -04:00
Joey Hess	ca08f3fcc2	only proxy to a remote when remote.name.annex-proxy is set Avoids someone writing to proxy.log and gaining access to remotes of someone else's repository that they were not intended to be able to proxy to.	2024-06-18 11:43:10 -04:00
Joey Hess	291280ced2	started on git-annex-shell cluster support Works down to P2P protocol. The question now is, how to handle protocol version negotiation for clusters? Connecting to each node to find their protocol versions and using the lowest would be too expensive with a lot of nodes. So it seems that the cluster needs to pick its own protocol version to use with the client. Then it can either negotiate that same version with the nodes when it comes time to use them, or it can translate between multiple protocol versions. That seems complicated. Thinking it would be ok to refuse to use a node if it is not able to negotiate the same protocol version with it as with the client. That will mean that sometimes need nodes to be upgraded when upgrading the cluster's proxy. But protocol versions rarely change.	2024-06-17 15:10:04 -04:00
Joey Hess	3970bbb03b	Merge branch 'master' into proxy	2024-06-17 09:29:34 -04:00
Joey Hess	af79728ac3	tab complete special remotes An oversight.. And with the work in progress proxy and cluster, there can be additional remotes that are not listed in .git/config, but are available. Making those more discoverable is another big benefit of this.	2024-06-17 09:26:03 -04:00
Joey Hess	780367200b	remove dead nodes when loading the cluster log This is to avoid inserting a cluster uuid into the location log when only dead nodes in the cluster contain the content of a key. One reason why this is necessary is Remote.keyLocations, which excludes dead repositories from the list. But there are probably many more. Implementing this was challenging, because Logs.Location importing Logs.Cluster which imports Logs.Trust which imports Remote.List resulted in an import cycle through several other modules. Resorted to making Logs.Location not import Logs.Cluster, and instead it assumes that Annex.clusters gets populated when necessary before it's called. That's done in Annex.Startup, which is run by the git-annex command (but not other commands) at early startup in initialized repos. Or, is run after initialization. Note that is Remote.Git, it is unable to import Annex.Startup, because Remote.Git importing Logs.Cluster leads the the same import cycle. So ensureInitialized is not passed annexStartup in there. Other commands, like git-annex-shell currently don't run annexStartup either. So there are cases where Logs.Location will not see clusters. So it won't add any cluster UUIDs when loading the log. That's ok, the only reason to do that is to make display of where objects are located include clusters, and to make commands like git-annex get --from treat keys as being located in a cluster. git-annex-shell certainly does not do anything like that, and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo) don't either.	2024-06-16 14:39:44 -04:00
Joey Hess	570ceffe8d	broke out initcluster One benefit of this is that a typo in annex-cluster-node config won't init a new cluster. Also it gets the cluster description set and is consistent with initremote.	2024-06-14 17:23:11 -04:00
Joey Hess	bbf261487d	add git-annex updatecluster command Seems to work fine, making the right changes to the git-annex branch.	2024-06-14 15:02:01 -04:00
Joey Hess	60e63fb85b	enable proxying for git-annex-shell p2pstdio	2024-06-11 13:07:04 -04:00
Joey Hess	649b87bedd	Merge branch 'master' into proxy	2024-06-10 14:26:18 -04:00
Joey Hess	9a8391078a	git-annex-shell: block relay requests connRepo is only used when relaying git upload-pack and receive-pack. That's only supposed to be used when git-annex-remotedaemon is serving git-remote-tor-annex connections over tor. But, it was always set, and so could be used in other places possibly. Fixed by making connRepo optional in the P2P protocol interface. In Command.EnableTor, it's not needed, because it only speaks the protocol in order to check that it's able to connect back to itself via the hidden service. So changed that to pass Nothing rather than the git repo. In Remote.Helper.Ssh, it's connecting to git-annex-shell p2pstdio, so is making the requests, so will never need connRepo. In git-annex-shell p2pstdio, it was accepting git upload-pack and receive-pack requests over the P2P protocol, even though nothing sent them. This is arguably a security hole, particularly if the user has set environment variables like GIT_ANNEX_SHELL_LIMITED to prevent git push/pull via git-annex-shell.	2024-06-10 14:16:27 -04:00
Joey Hess	d2576e5f1a	git-annex-shell: accept uuid of remote that proxying is enabled for For NotifyChanges and also for the fallthrough case where git-annex-shell passes a command off to git-shell, proxying is currently ignored. So every remote that is accessed via a proxy will be treated as the same git repository. Every other command listed in cmdsMap will need to check if Annex.proxyremote is set, and if so handle the proxying appropriately. Probably only P2PStdio will need to support proxying. For now, everything else refuses to work when proxying. The part of that I don't like is that there's the possibility a command later gets added to the list that doesn't check proxying. When proxying is not enabled, it's important that git-annex-shell not leak information that it would not have exposed before. Such as the names or uuids of remotes. I decided that, in the case where a repository used to have proxying enabled, but no longer supports any proxies, it's ok to give the user a clear error message indicating that proxying is not configured, rather than a confusing uuid mismatch message. Similarly, if a repository has proxying enabled, but not for the requested repository, give a clear error message. A tricky thing here is how to handle the case where there is more than one remote, with proxying enabled, with the specified uuid. One way to handle that would be to plumb the proxyRemoteName all the way through from the remote git-annex to git-annex-shell, eg as a field, and use only a remote with the same name. That would be very intrusive though. Instead, I decided to let the proxy pick which remote it uses to access a given Remote. And so it picks the least expensive one. The client after all doesn't necessarily know any details about the proxy's configuration. This does mean though, that if the least expensive remote is not accessible, but another remote would have worked, an access via the proxy will fail.	2024-06-10 12:44:35 -04:00
Joey Hess	f97f4b8bdb	Added updateproxy command and remote.name.annex-proxy configuration So far this only records proxy information on the git-annex branch.	2024-06-04 14:52:03 -04:00
Joey Hess	0155abfba4	git-remote-annex: Support urls like annex::https://example.com/foo-repo Using the usual url download machinery even allows these urls to need http basic auth, which is prompted for with git-credential. Which opens the possibility for urls that contain a secret to be used, eg the cipher for encryption=shared. Although the user is currently on their own constructing such an url, I do think it would work. Limited to httpalso for now, for security reasons. Since both httpalso (and retrieving this very url) is limited by the usual annex.security.allowed-ip-addresses configs, it's not possible for an attacker to make one of these urls that sets up a httpalso url that opens the garage door. Which is one class of attacks to keep in mind with this thing. It seems that there could be either a git-config that allows other types of special remotes to be set up this way, or special remotes could indicate when they are safe. I do worry that the git-config would encourage users to set it without thinking through the security implications. One remote config might be safe to access this way, but another config, for one with the same type, might not be. This will need further thought, and real-world examples to decide what to do.	2024-05-30 12:24:16 -04:00
Joey Hess	ecd3487d6d	run cleanupInitialization in all code paths This is just a good idea, I think. But it fixes this specific bug: With buggy git version 2.45.1, git clone from an annex:: url, which has a git-annex branch in it. Then in the repository, git fetch. That left .git/annex/objects/ populated with bundles, since it did not clean up. So later using git-annex failed to autoinit.	2024-05-29 12:57:10 -04:00
Joey Hess	e19916f54b	add config-uuid to annex:: url for --sameas remotes And use it to set annex-config-uuid in git config. This makes using the origin special remote work after cloning. Without the added Logs.Remote.configSet, instantiating the remote will look at the annex-config-uuid's config in the remote log, which will be empty, and so it will fail to find a special remote. The added deletion of files in the alternatejournaldir is just to make 100% sure they don't get committed to the git-annex branch. Now that they contain things that definitely should not be committed.	2024-05-29 12:50:00 -04:00
Joey Hess	e13678780c	fix perms of manifest object With the directory special remote, manifest objects uploaded by git-remote-annex were mode 600. This prevented accessing them from a httpalso special remote, for example. The directory special remote just copies the file perms. Which is fine except in this case the file perms were wrong.	2024-05-28 16:09:52 -04:00
Joey Hess	80d236b789	enable debug output When annex.debug is set, since --debug is not implemented for git-remote-annex.	2024-05-28 15:20:22 -04:00
Joey Hess	2ffe077cc2	git-remote-annex: brought back max-git-bundles config An incremental push that gets converted to a full push due to this config results in the inManifest having just one bundle in it, and the outManifest listing every other bundle. So it actually takes up more space on the special remote. But, it speeds up clone and fetch to not have to download a long series of bundles for incremental pushes.	2024-05-28 13:28:19 -04:00
Joey Hess	0975e792ea	git-remote-annex: Fix error display on clone cleanupInitialization gets run when an exception is thrown, so needs to avoid throwing exceptions itself, as that would hide the error message that the user needs to see.	2024-05-27 13:28:05 -04:00
Joey Hess	e64add7cdf	git-remote-annex: support importrree=yes remotes When exporttree=yes is also set. Probably it would also be possible to support ones with only importtree=yes, by enabling exporttree=yes for the remote only when using git-remote-annex, but let's keep this simple... I'm not sure what gets recorded in .git/annex/ state differently in the two cases that might cause a problem when doing that. Note that the full annex:: urls generated and displayed for such a remote omit the importree=yes. Which is ok, cloning from such an url uses an exporttree=remote, but the git-annex branch doesn't get written by this program, so once the real config is available from the git-annex branch, it will still function as an importree=yes remote.	2024-05-27 12:35:42 -04:00
Joey Hess	126d188812	reorg	2024-05-27 11:58:21 -04:00
Joey Hess	bb7b026b18	remove redundant call to checkSpecialRemoteProblems	2024-05-27 11:57:36 -04:00
Joey Hess	19418e81ee	git-remote-annex: Display full url when using remote with the shorthand url	2024-05-24 17:15:31 -04:00
Joey Hess	04a256a0f8	work around git "defense in depth" breakage with git clone checking for hooks This git bug also broke git-lfs, and I am confident it will be reverted in the next release. For now, cloning from an annex:: url wastes some bandwidth on the next pull by not caching bundles locally. If git doesn't fix this in the next version, I'd be tempted to rethink whether bundle objects need to be cached locally. It would be possible to instead remember which bundles have been seen and their heads, and respond to the list command with the heads, and avoid unbundling them agian in fetch. This might even be a useful performance improvement in the latter case. It would be quite a complication to a currently simple implementation though.	2024-05-24 15:49:53 -04:00
Joey Hess	6ccd09298b	convert srcref to a sha This fixes pushing a new ref that is the same as something already pushed. In findotherprereq, it compares two shas, which didn't work when one is actually not a sha but a ref. This is one of those cases where Sha being an alias for Ref makes it hard to catch mistakes. One of these days those need to be differentiated at the type level, but not today..	2024-05-24 15:33:35 -04:00
Joey Hess	cb59ec3efc	avoid duplicates building up in outManifest Happened exponentially since commit `1a3c60cc8e`	2024-05-24 15:10:56 -04:00
Joey Hess	1a3c60cc8e	git-remote-annex: avoid bundle object leakage in push race or interrupted push Locally record the manifest before uploading it or any bundles, and read it on the next push. Any bundles from the push that are not included in the currently being pushed manifest will get added to the outManifest, and so eventually get deleted. This deals with an interrupted push that is not resumed and instead something else is pushed. And it deals with a push race that overwrites the manifest. Of course, this can't help if one of those situations is followed by the local repo being deleted. But that's equivilant to doing a git-annex copy of a new annexed file to a special remote and then deleting the special repo w/o pushing. In either case the special remote ends up with a object in it that git-annex doesn't know about.	2024-05-24 12:47:32 -04:00
Joey Hess	10a60183e1	guard pushEmpty	2024-05-21 12:12:44 -04:00
Joey Hess	969da25d66	simplify	2024-05-21 12:08:34 -04:00
Joey Hess	be535caffc	fix deleting old keys in empty push	2024-05-21 11:53:03 -04:00
Joey Hess	7f768aef77	remove debug	2024-05-21 11:46:14 -04:00
Joey Hess	dc083bf8c8	fix storing outManifest	2024-05-21 11:44:47 -04:00
Joey Hess	b3d7ae51f0	fix edge case where git-annex branch does not have config for enabled special remote One way this could happen is cloning an empty special remote. A later fetch would then fail.	2024-05-21 11:27:49 -04:00
Joey Hess	3e7324bbcb	only delete bundles on pushEmpty This avoids some apparently otherwise unsolveable problems involving races that resulted in the manifest listing bundles that were deleted. Removed the annex-max-git-bundles config because it can't actually result in deleting old bundles. It would still be possible to have a config that controls how often to do a full push, which would avoid needing to download too many bundles on clone, as well as needing to checkpresent too many bundles in verifyManifest. But it would need a different name and description.	2024-05-21 11:13:27 -04:00

1 2 3 4 5 ...

497 commits