On push, first try to drop all the outManifest keys listed in the
current manifest file. This resumes from an interrupted push that
didn't get a chance to delete those keys.
The new manifest gets its outManifest populated with the keys that were
in the old manifest, plus any keys that could not be dropped.
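A minimal sketch of that flow (illustrative stand-in types and a
hypothetical tryDropKey, not git-annex's real interface):

    import Control.Monad (filterM)

    type Key = String
    data Manifest = Manifest { inManifest :: [Key], outManifest :: [Key] }

    -- hypothetical: returns False when the remote fails to drop the key
    tryDropKey :: Key -> IO Bool
    tryDropKey _ = return True

    -- Drop what the old manifest scheduled for deletion. Keys that could
    -- not be dropped get carried forward into the new outManifest, along
    -- with the old manifest's keys (assuming a full push that replaces
    -- every bundle).
    pushCleanup :: Manifest -> [Key] -> IO Manifest
    pushCleanup old newkeys = do
        undropped <- filterM (fmap not . tryDropKey) (outManifest old)
        return (Manifest newkeys (inManifest old ++ undropped))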
Note that it would be possible for uploadManifest to not drop the old
keys at all; they would get dropped on the next push. But it seems
better to delete stuff immediately rather than waiting, and the extra
work is limited to push and is typically small.
A remote where dropKey always fails will result in an outManifest that
grows longer and longer. It would be possible to check if the remote
has appendonly = True and avoid populating the outManifest. Of course,
an appendonly remote will grow with every git push anyway. And
currently only Remote.GitLFS sets that, and it can't be used as a
git-remote-annex remote in the first place.
Implemented alternateJournal, which git-remote-annex
uses to avoid any writes to the git-annex branch while setting up
a special remote from an annex:: url.
That prevents the remote.log from being overwritten with the special
remote configuration from the url, which might not be 100% the same as
the existing special remote configuration.
And it prevents an overwrite from deleting other information that was
already in the remote.log.
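The idea, generically (illustrative code, not the real Annex types):
journal writes consult an optional alternate location, and everything
written there can later be discarded wholesale.

    import System.FilePath ((</>))

    -- pick where a journal file gets written; a Just value redirects
    -- all writes away from the real journal
    journalFile :: Maybe FilePath -> FilePath -> FilePath
    journalFile (Just alternate) f = alternate </> f
    journalFile Nothing f = ".git" </> "annex" </> "journal" </> f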
Also, when the branch was created by git-remote-annex, only delete it
at the end if nothing else has been written to it by another command.
This fixes the race condition described in
797f27ab05, where git-remote-annex
set up the branch and git-annex init and other commands were
run at the same time and their writes to the branch were lost.
I hope to support importtree=yes eventually, but it does not currently
work.
Added remote.<name>.allow-encrypted-gitrepo, which needs to be set to
allow using git-remote-annex with encrypted git repos.
Note that even encryption=pubkey uses a cipher stored in the git repo
to encrypt the keys stored in the remote. While it would be possible to
not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using
encryption=pubkey, it doesn't currently work, and that would be a
complication that I doubt is worth it.
Updating the remote list needs the config to be written to the
git-annex branch, which had been avoided until now for good reasons.
While it would be possible
to instead use Remote.List.remoteGen without writing to the branch, I
already have a plan to discard git-annex branch writes made by
git-remote-annex, so the simplest fix is to write the config to the
branch.
Sponsored-by: k0ld on Patreon
Put the annex objects in .git/annex/objects/ inside the export remote.
This way, when importing from the remote, they will be filtered out.
Note that, when importtree=yes, content identifiers are used, and this
means that pushing to a remote updates the git-annex branch. Urk.
Will need to try to prevent that later, but I already had a todo about
that for other reasons.
Untested!
Sponsored-By: Brock Spratlen on Patreon
Otherwise, it can be confusing to clone from the wrong url, since it fails
to download a manifest and so appears as if the remote exists but is empty.
Sponsored-by: Jack Hill on Patreon
This will eventually be used to recover from an interrupted fullPush
and drop the old bundle keys it was unable to delete.
Sponsored-by: Luke T. Shumaker on Patreon
Such as annex::?type=foo&...
I accidentally left out the uuid when creating one, and the result is
that it appears to clone an empty repository.
So let's guard against that mistake.
Full pushing will probably work, but is untested.
Incremental pushing is not implemented yet.
While this is a fairly straightforward port of the shell prototype, the
details of exactly how to get the objects to the remote were tricky. And the
prototype did not consider how to deal with partial failures and
interruptions.
I've taken considerable care to make sure it always leaves things in a
consistent state when interrupted or when it loses access to a remote in
the middle of a push.
Sponsored-by: Leon Schuermann on Patreon
It did not seem possible to avoid creating a git-annex branch while
git-remote-annex is running. Special remotes can even store their own
state in it. So instead, if it didn't exist before git-remote-annex
created it, it deletes it at the end.
This does allow a possible race condition: if git-annex init, or
perhaps another command that writes to the git-annex branch, runs at
the same time as a git-remote-annex process started by git fetch/push
with a full annex:: url, those writes would be lost. If the repository
has already been initialized before git-remote-annex runs, that race
can't happen. So it's pretty unlikely.
Sponsored-by: Graham Spencer on Patreon
Also support using annex:: urls that specify the whole special remote
config.
Both of these cases need a special remote to be initialized enough to
use it, which means writing to .git/config but not to the git-annex
branch. When cloning, the remote is left set up in .git/config,
so further use of it, by git-annex or git-remote-annex will work. When
using git with an annex:: url, a temporary remote is written to
.git/config, but then removed at the end.
While that's a little bit ugly, the fact is that the Remote interface
expects that it's ok to set git configs of the remote that is being
initialized. And it's nowhere near as ugly as the alternative of making
a temporary git repository and initializing the special remote in there.
Cloning from a repository that does not contain a git-annex branch and
then later running git-annex init is currently broken, although I've
gotten most of the way there to supporting it.
See cleanupInitialization FIXME.
Special shout out to git clone for running gitremote-helpers with
GIT_DIR set, but not in the git repository, and with GIT_WORK_TREE not
set, resulting in the need for the fixupRepo hack.
Sponsored-by: unqueued on Patreon
Tested using a manually populated directory special remote.
Pushing is still to be done. So is fetching from special remotes
configured via the annex:: url.
Sponsored-by: Brock Spratlen on Patreon
Not quite there yet.
Also, changed the format of GITBUNDLE keys to use only one '-'
after the UUID. A sha256 does not contain that character, so the key
can simply be split at the last one.
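A sketch of that split (illustrative function, not the real key parser):

    import Data.List (elemIndices)

    -- split "<uuid>-<sha256>" at the last '-'; safe because a sha256
    -- never contains that character, while a UUID does
    splitGitBundleKey :: String -> Maybe (String, String)
    splitGitBundleKey s = case elemIndices '-' s of
        [] -> Nothing
        is -> let i = last is
              in Just (take i s, drop (i + 1) s)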
Amusingly, the sha256 will probably not actually be verified. A git
bundle contains its own checksums that git uses to verify it. And if
someone wanted to replace the content of a GITBUNDLE object, they
could just edit the manifest to use a new one whose sha256 does verify.
Sponsored-by: Nicholas Golder-Manning
Changed the format of the url to use annex: rather than annex::
The reason is that, in the future, I might want to support an url that
includes an uriAuthority part, eg:
annex://foo@example.com:42/358ff77e-0bc3-11ef-bc49-872e6695c0e3?type=directory&encryption=none&directory=/mnt/foo/
To parse that foo@example.com:42 as an uriAuthority it needs to start with
annex: rather than annex::
Supporting that would also need something to be done with the
uriAuthority, and the uriPath (the UUID) is prefixed with "/" in that
example, so the current parser won't handle it. But this leaves the
possibility open for future expansion.
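A hedged illustration with Network.URI (not git-annex's actual parser)
of why the single colon matters; an authority part is only recognized
after "scheme://":

    import Network.URI (parseURI, uriAuthority)

    main :: IO ()
    main = do
        let url = "annex://foo@example.com:42/358ff77e-0bc3-11ef-bc49-872e6695c0e3?type=directory"
        -- the foo@example.com:42 part parses as the uriAuthority
        print (uriAuthority <$> parseURI url)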
Sponsored-by: Joshua Antonishen on Patreon
Currently, this can only be used to change an url from being used by a
special remote to being used by the web remote.
This could have been a --move-from option to registerurl. But that
would have complicated its option parsing and --batch processing, and
would also have complicated unregisterurl, which is implemented on top of
Command.Registerurl. So, a separate command was actually less complicated
to implement.
The generic description of the command is because I want to make this
command a catch-all for other url updating kind of things, if there are
ever any more. Also because it was hard to come up with a good name for the
specific action. I considered `git-annex moveurl`, but that seems to
indicate data is perhaps actually being moved, and seems to sit at the same
level as addurl and rmurl, and this command is at the plumbing
level of registerurl and unregisterurl.
Sponsored-by: Dartmouth College's DANDI project
This will allow distributed migration: Start a migration in one clone of
a repo, and then update other clones.
commitMigration is a bit of a bear... There is some inversion of control
that needs some TMVars. Also, streamLogFile's finalizer does not handle
recording the trees, so an interrupt at just the wrong time can cause
migration.log to be emptied but the git-annex branch not updated.
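A generic sketch of that inversion-of-control pattern (not
commitMigration's actual code): a TMVar lets a callback running inside
a streaming function hand values back out to the caller.

    import Control.Concurrent.STM

    -- hypothetical streaming function that only exposes a callback,
    -- in the style of streamLogFile
    streamItems :: (Int -> IO ()) -> IO ()
    streamItems consume = mapM_ consume [1, 2, 3]

    main :: IO ()
    main = do
        v <- atomically (newTMVar [])
        -- the callback accumulates results through the TMVar...
        streamItems $ \x -> atomically $ do
            xs <- takeTMVar v
            putTMVar v (x : xs)
        -- ...and the caller reads them back afterwards
        xs <- atomically (takeTMVar v)
        print (reverse xs)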
Sponsored-by: Graham Spencer on Patreon
Implementation was simple because it's equivalent to
--from=foo --to=remote for each other remote, followed by
--to=remote when there's a local copy.
(Or, in the edge case of --from-anywhere --to=here,
it's the same as --to=here.)
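Illustratively (hypothetical names, not the real command code), the
expansion looks like:

    -- expand --from-anywhere --to=dest into the simpler transfers;
    -- Nothing as the source means "send the local copy"
    expandFromAnywhere :: String -> [String] -> Bool -> [(Maybe String, String)]
    expandFromAnywhere dest remotes localcopy =
        [ (Just r, dest) | r <- remotes, r /= dest ]
        ++ [ (Nothing, dest) | localcopy ]

    -- eg: expandFromAnywhere "bar" ["r1", "r2"] True
    --   == [(Just "r1","bar"), (Just "r2","bar"), (Nothing,"bar")]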
Note that, when the local repo does not have a copy,
fromToPerform gets it from a remote, sends it to the destination,
and drops the local copy. Another call to that for a second remote
will notice that the dest now has a copy, and simply drop from the
second remote, avoiding a second transfer.
Also note that, when numcopies doesn't allow dropping it from
everywhere, it will drop it from the cheapest remotes first
(maybe not ideal) up to more expensive remotes, and finally from the local
repo. So the local repo will generally end up holding a copy. Maybe not
ideal in all cases either, but it seems no worse to do that than to end up
with a copy undropped from a remote.
And I'm not entirely happy with the output, eg:

    copy bigfile (from r3...) ok
    copy bigfile ok
That makes sense if you think of the second line as being
the same as what is output by `git-annex copy bigfile --to bar`,
but it's less clear in this context. Maybe add "(from here...)"?
Also the --json output doesn't have a machine-readable field for
the "from" uuid, and maybe it should?
Sponsored-by: Dartmouth College's DANDI project
Factored out overLocationLogs from CmdLine.Seek, which can calculate this
pretty fast even in a large repo. In my big repo, the time to run git-annex
info went up from 1.33s to 8.5s.
Note that the "backend usage" stats are for annexed files in the working
tree only, not all annexed files. This new data source would let that be
changed, but that would be a confusing behavior change. And I cannot
retitle it either, out of fear something uses the current title (eg parsing
the json).
Also note that, while time says "402108maxresident" in my big repo now,
up from "54092maxresident", top shows the RES constant at 64mb, and it
was 48mb before. So I don't think there is a memory leak. I tried using
deepseq to force full evaluation of addKeyCopies and memory use didn't
change, which also says no memory leak. And indeed, not calling
addKeyCopies at all resulted in the same memory use. Probably the increased memory
usage is buffering the stream of data from git in overLocationLogs.
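For reference, the deepseq check mentioned above looks like this
(generic sketch, not the actual info code):

    import Control.DeepSeq (force)
    import Control.Exception (evaluate)

    main :: IO ()
    main = do
        let stats = map (* 2) [1 .. 1000000 :: Int]
        -- fully evaluate the value to rule out thunk buildup
        _ <- evaluate (force stats)
        print (length stats)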
Sponsored-by: Brett Eisenberg on Patreon
Fix more breakage caused by git's fix for CVE-2022-24765, this time
involving a remote (either local or ssh) that is a repository not owned by
the current user.
Sponsored-by: Dartmouth College's DANDI project
The tricky thing about this turned out to be handling renames and reverts.
For that, it has to make two passes over the git log. To avoid
buffering a possibly huge amount of log output in memory (ie, the whole
git log of an entire repository!), it simply runs git log twice.
(It might be possible to speed this up by asking git log to show a diff,
and so avoid needing to use catKey.)
Sponsored-By: Brock Spratlen on Patreon
Currently it only displays explanations of options like --in and --copies.
In the future, it should explain preferred content expression evaluation
and other decisions.
The explanations of a few things could be better. In particular,
"standard" will just appear as-is (or as "!standard" if it doesn't
match), rather than explaining why the standard preferred content expression
for the group matches or not.
As currently implemented, it goes to stdout, and so commands like
git-annex find that have custom output will not display --explain
information. Perhaps that should change, dunno.
Sponsored-by: Dartmouth College's DANDI project
This ended up having an interface like sync, rather than like get/copy/drop.
That let it be implemented in terms of sync, which took a lot less code.
Also, it lets it handle many of the edge cases that sync does, such as
getting files that are not visible in a --hide-missing branch, and sending
files to exporttree remotes.
As well as being easier to implement, `git-annex satisfy myremote` makes
sense as it satisfies the preferred content settings of the remote.
`git-annex satisfy somefile` does not form a sentence that makes sense. So
while -C can be a little bit annoying, it still makes sense to have this
syntax.
Note that, while I initially thought this would also satisfy numcopies, it
does not. Arguably it ought to. But, sync does not send files in order to
satisfy numcopies, it only sends files to satisfy preferred content. And
it's important that this transfer the same files as sync does, because
it will probably be used in a workflow where the user sometimes syncs and
sometimes satisfies, and does not expect satisfy to do things that sync
would not do.
(Also opened a new bug that affects sync et al too, not only this command.)
Sponsored-by: Nicholas Golder-Manning on Patreon
Commit b6642dde8a broke it by enabling
non-concurrent display mode while leaving concurrency set in the config
and having already started concurrency earlier.
(I don't actually know if that commit was a good idea.)
Sponsored-By: Brett Eisenberg on Patreon
The obvious way to fix this would be to adapt lines to split on null.
However, it's actually nontrivial to rewrite lines. In particular it has a
weird implementation to avoid a space leak. See:
https://gitlab.haskell.org/ghc/ghc/-/issues/4334
Also, while that is a small amount of code, it's covered by a rather
complex copyright and I'd have to include that copyright in git-annex.
So, I opted to filter out the trailing empty string instead.
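A minimal sketch of that fix (hypothetical name, assuming lazy
ByteString input where items are NUL-terminated):

    import qualified Data.ByteString.Lazy.Char8 as L

    -- split on NUL and drop the trailing empty field produced by a
    -- terminating NUL, rather than reimplementing lines
    splitNul :: L.ByteString -> [L.ByteString]
    splitNul b
        | not (null ws) && L.null (last ws) = init ws
        | otherwise = ws
      where
        ws = L.split '\0' b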
Sponsored-by: Dartmouth College's Datalad project
assist: New command, which is the same as git-annex sync but with
new files added and content transferred by default.
(Also this fixes another reversion in git-annex sync:
--commit, --no-commit, and --message were not enabled, oops.)
See added comment for why git-annex assist does commit staged
changes elsewhere in the work tree, but only adds files under
the cwd.
Note that it does not support --no-commit, --no-push, --no-pull
like sync does. My thinking is, why should it? If you want that
level of control, use git commit, git annex push, git annex pull.
Sync only got those options because pull and push were not split
out.
Sponsored-by: k0ld on Patreon
Split out two new commands, git-annex pull and git-annex push. Those plus a
git commit are equivalent to git-annex sync.
In a sense, git-annex sync conflates 3 things, and it would have been
better to have push and pull from the beginning and not sync. Although
note that git-annex sync --content is faster than a pull followed by a
push, because it only has to walk the tree once, look at preferred
content once, etc. So there is some value in git-annex sync in speed, as
well as user convenience.
And it would have been hard to split pull and push out of sync, as far
as the implementation goes. Implementing them inside sync was easy:
just adjust SyncOptions so it does the right thing.
Note that the new commands default to syncing content, unless
annex.synccontent is explicitly set to false. I'd like sync to also do
that, but that's a hard transition to make. As a start to that
transition, I added a note to git-annex-sync.mdwn that it may start to
do so in a future version of git-annex. But a real transition would
necessarily involve displaying warnings when sync is used without
--content, and time.
Sponsored-by: Kevin Mueller on Patreon
Had to convert uninit to do everything that can error out inside a
CommandStart. This was harder than feels nice.
(Also, in passing, converted CommandCheck to use a data type, rather
than a weird number whose uniqueness was unclear.)
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
When a nonexistent file is passed to a command and --json-error-messages
is enabled, output a JSON object indicating the problem.
(But git ls-files --error-unmatch still displays errors about such files in
some situations.)
I don't like the duplication of the name of the command introduced by this,
but I can't see a great way around it. One way would be to pass the Command
instead.
When json is not enabled, the stderr is unchanged. This is necessary
because some commands like find have custom output. So displaying
"find foo not found" would be wrong. So had to complicate things with
toplevelFileProblem having different output with and without json.
When not using --json-error-messages but still using --json, it displays
the error to stderr, but does display a json object without the error. It
does have an errorid though. Unsure how useful that behavior is.
Sponsored-by: Dartmouth College's Datalad project
This reverts commit a325524454.
Turns out this was predicated on an incorrect belief that json output
didn't already sometimes lack the "key" field. Since json output already
can when `giveup` was used, it seems unnecessary to add a whole new
option for this.