There was no good reason for it to be using annexLocationsNonBare,
and exporttree=yes annexobjects=yes is going to use annexLocationsBare,
so this should as well for consistency.
Since all returned ExportLocations are tried when retrieving objects,
this won't break backwards compatibility.
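To illustrate the retry behavior, a minimal sketch with a hypothetical
helper (not git-annex's actual code): retrieval tries each candidate
location in turn, so adding or reordering ExportLocations stays
backwards compatible.

    -- Try each retrieval action until one yields a result.
    firstSucceeding :: [IO (Maybe a)] -> IO (Maybe a)
    firstSucceeding [] = return Nothing
    firstSucceeding (x:xs) =
        x >>= maybe (firstSucceeding xs) (return . Just)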
Added Maybe POSIXTime to SafeDropProof, which gets set when the proof is
based on a LockedCopy. If there are several LockedCopies, it uses the
closest expiry time. That is not optimal: it may be that the proof
expires based on one LockedCopy while another one has not expired. But
that seems unlikely to really happen, and anyway the user can just
re-run a drop if it fails due to expiry.
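A minimal sketch of that choice, with a hypothetical helper (the real
SafeDropProof record differs): when several LockedCopies each carry an
expiry time, the proof as a whole is only safe until the earliest one.

    import Data.Time.Clock.POSIX (POSIXTime)

    -- Combine the expiry times of several LockedCopies; the proof is
    -- only valid until the first one can expire, so use the minimum.
    combineExpiry :: [POSIXTime] -> Maybe POSIXTime
    combineExpiry [] = Nothing
    combineExpiry ts = Just (minimum ts)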
Pass the SafeDropProof to removeKey, which is responsible for checking
it for expiry in situations where that could be a problem, which really
only means in Remote.Git.
Made Remote.Git check expiry when dropping from a local remote.
Checking expiry when dropping from a P2P remote is not yet implemented.
P2P.Protocol.remove has SafeDropProof plumbed through to it for that
purpose.
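The expiry check itself is simple; a hedged sketch with a hypothetical
helper name, since the real code in Remote.Git may differ:

    import Data.Time.Clock.POSIX (POSIXTime, getPOSIXTime)

    -- Refuse to proceed with a drop when the proof's expiry time has
    -- passed. Nothing means the proof carries no expiry.
    proofStillValid :: Maybe POSIXTime -> IO Bool
    proofStillValid Nothing = return True
    proofStillValid (Just expiry) = do
        now <- getPOSIXTime
        return (now < expiry)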
Fixing the remaining 2 build warnings should complete this work.
Note that the use of a POSIXTime here means that if the clock gets set
forward while git-annex is in the middle of a drop, it may say that
dropping took too long. That seems ok. Less ok is that if the clock gets
turned back a sufficient amount (eg 5 minutes), proof expiry won't be
noticed. It might be better to use the Monotonic clock, but that doesn't
advance when a laptop is suspended, and while there is the linux
Boottime clock, that is not available on other systems. Perhaps a
combination of POSIXTime and the Monotonic clock could detect laptop
suspension and also detect clock being turned back?
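A rough sketch of that combination, using the clock package (all names
here are hypothetical; this is not implemented): sample both clocks when
the proof is made and again when checking it, and treat a large
disagreement between the two deltas as either a suspend/resume or the
clock being set back, expiring the proof conservatively.

    import Data.Time.Clock.POSIX (POSIXTime, getPOSIXTime)
    import System.Clock (Clock(Monotonic), TimeSpec, getTime, toNanoSecs)

    data ClockSample = ClockSample POSIXTime TimeSpec

    sampleClocks :: IO ClockSample
    sampleClocks = ClockSample <$> getPOSIXTime <*> getTime Monotonic

    -- True when the two clocks' deltas disagree by more than a second,
    -- indicating a suspend/resume or the clock being turned back.
    clocksDisagree :: ClockSample -> ClockSample -> Bool
    clocksDisagree (ClockSample p0 m0) (ClockSample p1 m1) =
        abs (posixdelta - monodelta) > 1
      where
        posixdelta = realToFrac (p1 - p0) :: Double
        monodelta = fromIntegral (toNanoSecs m1 - toNanoSecs m0) / 1e9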
There is a potential future flag day where
p2pDefaultLockContentRetentionDuration is not assumed, but is probed
using the P2P protocol, and peers that don't support it can no longer
produce a LockedCopy. Until that happens, when git-annex is
communicating with older peers there is a risk of data loss when
a ssh connection closes during LOCKCONTENT.
This is to avoid inserting a cluster uuid into the location log when
only dead nodes in the cluster contain the content of a key.
One reason why this is necessary is Remote.keyLocations, which excludes
dead repositories from the list. But there are probably many more.
Implementing this was challenging, because Logs.Location importing
Logs.Cluster which imports Logs.Trust which imports Remote.List resulted
in an import cycle through several other modules.
Resorted to making Logs.Location not import Logs.Cluster, and instead
it assumes that Annex.clusters gets populated when necessary before it's
called.
That's done in Annex.Startup, which is run by the git-annex command
(but not other commands) at early startup in initialized repos, or
otherwise after initialization.
Note that Remote.Git is unable to import Annex.Startup, because
Remote.Git importing Logs.Cluster leads to the same import cycle.
So ensureInitialized is not passed annexStartup in there.
Other commands, like git-annex-shell, currently don't run annexStartup
either. So there are cases where Logs.Location will not see clusters,
and so it won't add any cluster UUIDs when loading the log. That's ok;
the only reason to do
that is to make display of where objects are located include clusters,
and to make commands like git-annex get --from treat keys as being located
in a cluster. git-annex-shell certainly does not do anything like that,
and I'm pretty sure Remote.Git (and callers to Remote.Git.onLocalRepo)
don't either.
Using the usual url download machinery even allows these urls to need
http basic auth, which is prompted for with git-credential. That opens
the possibility of using urls that contain a secret, eg the cipher
for encryption=shared. Although the user is currently on their own
constructing such an url, I do think it would work.
Limited to httpalso for now, for security reasons. Since both httpalso
and retrieving this very url are limited by the usual
annex.security.allowed-ip-addresses configs, it's not possible for an
attacker to make one of these urls that sets up a httpalso url that
opens the garage door. Which is one class of attacks to keep in mind
with this thing.
It seems that there could be either a git-config that allows other types
of special remotes to be set up this way, or special remotes could
indicate when they are safe. I do worry that the git-config would
encourage users to set it without thinking through the security
implications. One remote config might be safe to access this way, but
another config, for one with the same type, might not be. This will need
further thought, and real-world examples to decide what to do.
This is just a good idea, I think. But it fixes this specific bug:
With buggy git version 2.45.1, git clone from an annex:: url, which has
a git-annex branch in it. Then in the repository, git fetch. That
left .git/annex/objects/ populated with bundles, since it did not clean
up. So later using git-annex failed to autoinit.
And use it to set annex-config-uuid in git config. This makes
using the origin special remote work after cloning.
Without the added Logs.Remote.configSet, instantiating the remote will
look at the annex-config-uuid's config in the remote log, which will be
empty, and so it will fail to find a special remote.
The added deletion of files in the alternatejournaldir is just to make
100% sure they don't get committed to the git-annex branch, now that
they contain things that definitely should not be committed.
With the directory special remote, manifest objects uploaded by
git-remote-annex were mode 600. This prevented accessing them
from a httpalso special remote, for example.
The directory special remote just copies the file perms, which is fine
except that in this case the perms were wrong.
An incremental push that gets converted to a full push due to this
config results in the inManifest having just one bundle in it, and the
outManifest listing every other bundle. So it actually takes up more
space on the special remote. But, it speeds up clone and fetch to not
have to download a long series of bundles for incremental pushes.
cleanupInitialization gets run when an exception is thrown, so needs to
avoid throwing exceptions itself, as that would hide the error message
that the user needs to see.
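The pattern, as a small sketch using plain Control.Exception (the
hypothetical wrapper name is mine): cleanup that runs from an exception
handler has to swallow its own exceptions, or it masks the original
error. Note that catching SomeException like this also swallows async
exceptions; real code should catch only non-async ones.

    import Control.Exception (SomeException, try)

    -- Run a cleanup action, discarding any exception it throws, so
    -- the original error message is not hidden from the user.
    cleanupQuietly :: IO () -> IO ()
    cleanupQuietly a = do
        _ <- try a :: IO (Either SomeException ())
        return ()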
When exporttree=yes is also set. Probably it would also be possible to
support ones with only importtree=yes, by enabling exporttree=yes for
the remote only when using git-remote-annex, but let's keep this
simple... I'm not sure what gets recorded in .git/annex/ state
differently in the two cases that might cause a problem when doing that.
Note that the full annex:: urls generated and displayed for such a
remote omit the importtree=yes. That is ok: cloning from such an url
uses it as an exporttree=yes remote, but the git-annex branch doesn't
get written by this program, so once the real config is available from
the git-annex branch, it will still function as an importtree=yes
remote.
This git bug also broke git-lfs, and I am confident it will be reverted
in the next release.
For now, cloning from an annex:: url wastes some bandwidth on the next
pull by not caching bundles locally.
If git doesn't fix this in the next version, I'd be tempted to rethink
whether bundle objects need to be cached locally. It would be possible to
instead remember which bundles have been seen and their heads, and
respond to the list command with the heads, and avoid unbundling them
again in fetch. This might even be a useful performance improvement in
the latter case. It would be quite a complication to a currently simple
implementation though.
This fixes pushing a new ref that is the same as something already
pushed. In findotherprereq, it compares two shas, which didn't work when
one is actually not a sha but a ref.
This is one of those cases where Sha being an alias for Ref makes it
hard to catch mistakes. One of these days those need to be
differentiated at the type level, but not today.
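For reference, a sketch of the kind of type-level distinction meant
here; git-annex currently has type Sha = Ref, while the wrapper below is
purely illustrative:

    newtype Ref = Ref String
        deriving (Eq, Show)

    -- Wrapping instead of aliasing makes passing a plain Ref where a
    -- Sha is expected a compile error, catching bugs like the one in
    -- findotherprereq.
    newtype Sha = Sha Ref
        deriving (Eq, Show)

    sameSha :: Sha -> Sha -> Bool
    sameSha = (==)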
Locally record the manifest before uploading it or any bundles,
and read it on the next push. Any bundles from the push that are
not included in the currently being pushed manifest will get added
to the outManifest, and so eventually get deleted.
This deals with an interrupted push that is not resumed and instead
something else is pushed. And it deals with a push race that overwrites
the manifest.
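A sketch of that bookkeeping with hypothetical types (the real manifest
representation differs): any bundle listed in the locally recorded
manifest from the last push, but absent from the manifest now being
pushed, is carried over into the new outManifest.

    import qualified Data.Set as S

    type BundleKey = String  -- stand-in for the real key type

    data Manifest = Manifest
        { inManifest :: [BundleKey]
        , outManifest :: S.Set BundleKey
        }

    -- Queue for deletion any bundle from the previous (possibly
    -- interrupted or overwritten) push that the new push no longer
    -- references.
    carryOverStale :: Manifest -> Manifest -> Manifest
    carryOverStale lastpushed new = new
        { outManifest = outManifest new `S.union`
            (S.fromList (inManifest lastpushed)
                `S.difference` S.fromList (inManifest new))
        }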
Of course, this can't help if one of those situations is followed by
the local repo being deleted. But that's equivalent to doing a git-annex
copy of a new annexed file to a special remote and then deleting the
local repo w/o pushing. In either case the special remote ends up with
an object in it that git-annex doesn't know about.
This avoids some apparently otherwise unsolvable problems involving
races that resulted in the manifest listing bundles that were deleted.
Removed the annex-max-git-bundles config because it can't actually
result in deleting old bundles. It would still be possible to have a
config that controls how often to do a full push, which would avoid
needing to download too many bundles on clone, as well as needing to
checkpresent too many bundles in verifyManifest. But it would need a
different name and description.
Added a backup manifest key, which is used if the main manifest key is
not present. When uploading a new Manifest, it makes sure that it never
drops one key except when the other key is present.
It's entirely possible for the two manifest keys to get out of sync, due
to races. The main one wins when it's present, but it is possible for
dropping the main one to expose the backup one, which has a different
push recorded.
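A sketch of the resulting ordering invariant, with hypothetical helpers
standing in for the actual storing and dropping (not the real
uploadManifest): the two keys are replaced one at a time, so at every
step at least one of them holds a complete manifest.

    data ManifestKey = MainManifestKey | BackupManifestKey

    replaceManifest
        :: (ManifestKey -> IO ())  -- drop a key from the remote
        -> (ManifestKey -> IO ())  -- store the new manifest under a key
        -> IO ()
    replaceManifest dropkey storekey = do
        dropkey BackupManifestKey   -- main manifest still present
        storekey BackupManifestKey  -- backup now has the new manifest
        dropkey MainManifestKey     -- backup covers the gap
        storekey MainManifestKey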
On push, first try to drop all outManifest keys listed in the current
manifest file, which resumes from an interrupted push that didn't
get a chance to delete those keys.
The new manifest gets its outManifest populated with the keys that were
in the old manifest, plus any of the keys that were unable to be
dropped.
Note that it would be possible for uploadManifest to not drop old
keys at all; they would get dropped on the next push. But it
seems better to delete stuff immediately rather than waiting. And the
extra work is limited to push and typically is small.
A remote where dropKey always fails will result in an outManifest that
grows longer and longer. It would be possible to check if the remote
has appendonly = True and avoid populating the outManifest. Of course,
an appendonly remote will grow with every git push anyway. And currently
only Remote.GitLFS sets that, which can't be used as a git-remote-annex
remote anyway.
Implemented alternateJournal, which git-remote-annex
uses to avoid any writes to the git-annex branch while setting up
a special remote from an annex:: url.
That prevents the remote.log from being overwritten with the special
remote configuration from the url, which might not be 100% the same as
the existing special remote configuration.
And it prevents an overwrite from deleting other stuff that was
already in the remote.log.
Also, when the branch was created by git-remote-annex, only delete it
at the end if nothing else has been written to it by another command.
This fixes the race condition described in
797f27ab05, where git-remote-annex
set up the branch and git-annex init and other commands were
run at the same time and their writes to the branch were lost.
I hope to support importtree=yes eventually, but it does not currently
work.
Added remote.<name>.allow-encrypted-gitrepo that needs to be set to
allow using it with encrypted git repos.
Note that even encryption=pubkey uses a cipher stored in the git repo
to encrypt the keys stored in the remote. While it would be possible to
not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using
encryption=pubkey, it doesn't currently work, and that would be a
complication that I doubt is worth it.
Updating the remote list needs the config to be written to the git-annex
branch, which was not done for good reasons. While it would be possible
to instead use Remote.List.remoteGen without writing to the branch, I
already have a plan to discard git-annex branch writes made by
git-remote-annex, so the simplest fix is to write the config to the
branch.
Sponsored-by: k0ld on Patreon
Put the annex objects in .git/annex/objects/ inside the export remote.
This way, when importing from the remote, they will be filtered out.
Note that, when importtree=yes, content identifiers are used, and this
means that pushing to a remote updates the git-annex branch. Urk.
Will need to try to prevent that later, but I already had a todo about
that for other reasons.
Untested!
Sponsored-By: Brock Spratlen on Patreon
Otherwise, it can be confusing to clone from a wrong url, since it fails
to download a manifest and so appears as if the remote exists but is empty.
Sponsored-by: Jack Hill on Patreon
This will eventually be used to recover from an interrupted fullPush
and drop the old bundle keys it was unable to delete.
Sponsored-by: Luke T. Shumaker on Patreon
Such as annex::?type=foo&...
I accidentally left out the uuid when creating one,
and the result is it appears to clone an empty repository.
So let's guard against that mistake.
Full pushing will probably work, but is untested.
Incremental pushing is not implemented yet.
While a fairly straightforward port of the shell prototype, the details
of exactly how to get the objects to the remote were tricky. And the
prototype did not consider how to deal with partial failures and
interruptions.
I've taken considerable care to make sure it always leaves things in a
consistent state when interrupted or when it loses access to a remote in
the middle of a push.
Sponsored-by: Leon Schuermann on Patreon
It did not seem possible to avoid creating a git-annex branch while
git-remote-annex is running. Special remotes can even store their own
state in it. So instead, when git-remote-annex had to create the
branch itself, it deletes it at the end.
This does possibly allow a race condition, where git-annex init, and
perhaps other git-annex commands that write to the git-annex branch,
are run at the same time as a git-remote-annex process run by git
fetch/push with a full annex:: url. Those writes would be lost. If the
repository has already been initialized before git-remote-annex runs,
that race won't happen. So it's pretty unlikely.
Sponsored-by: Graham Spencer on Patreon
Also support using annex:: urls that specify the whole special remote
config.
Both of these cases need a special remote to be initialized enough to
use it, which means writing to .git/config but not to the git-annex
branch. When cloning, the remote is left set up in .git/config,
so further use of it, by git-annex or git-remote-annex will work. When
using git with an annex:: url, a temporary remote is written to
.git/config, but then removed at the end.
While that's a little bit ugly, the fact is that the Remote interface
expects that it's ok to set git configs of the remote that is being
initialized. And it's nowhere near as ugly as the alternative of making
a temporary git repository and initializing the special remote in there.
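The shape of that setup and teardown, sketched with hypothetical
helpers standing in for the .git/config writes:

    import Control.Exception (bracket_)

    -- Stand-ins for writing and removing the remote's stanza in
    -- .git/config.
    setupRemoteConfig, removeRemoteConfig :: String -> IO ()
    setupRemoteConfig name = putStrLn ("configure remote " ++ name)
    removeRemoteConfig name = putStrLn ("deconfigure remote " ++ name)

    -- Run an action with a temporary remote configured, removing it
    -- afterwards even when the action fails.
    withTemporaryRemote :: String -> IO a -> IO a
    withTemporaryRemote name = bracket_
        (setupRemoteConfig name)
        (removeRemoteConfig name)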
Cloning from a repository that does not contain a git-annex branch and
then later running git-annex init is currently broken, although I've
gotten most of the way there to supporting it.
See cleanupInitialization FIXME.
Special shout out to git clone for running gitremote-helpers with
GIT_DIR set, but not in the git repository and with GIT_WORK_TREE not
set, resulting in needing the fixupRepo hack.
Sponsored-by: unqueued on Patreon
Tested using a manually populated directory special remote.
Pushing is still to be done. So is fetching from special remotes
configured via the annex:: url.
Sponsored-by: Brock Spratlen on Patreon
Not quite there yet.
Also, changed the format of GITBUNDLE keys to use only one '-'
after the UUID. A sha256 does not contain that character, so the key
name can just be split at the last one.
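A minimal sketch of the parsing this enables (hypothetical helper, not
git-annex's actual parser): since a sha256 is hex and never contains
'-', while a UUID does, splitting the key name at its last '-' cleanly
separates the two.

    -- Split "<uuid>-<sha256>" at the last '-'.
    splitBundleName :: String -> Maybe (String, String)
    splitBundleName name =
        case break (== '-') (reverse name) of
            (rsha, '-':ruuid) -> Just (reverse ruuid, reverse rsha)
            _ -> Nothing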
Amusingly, the sha256 will probably not actually be verified. A git
bundle contains its own checksums that git uses to verify it. And if
someone wanted to replace the content of a GITBUNDLE object, they
could just edit the manifest to use a new one whose sha256 does verify.
Sponsored-by: Nicholas Golder-Manning