git-annex

Author	SHA1	Message	Date
Joey Hess	3e7324bbcb	only delete bundles on pushEmpty This avoids some apparently otherwise unsolveable problems involving races that resulted in the manifest listing bundles that were deleted. Removed the annex-max-git-bundles config because it can't actually result in deleting old bundles. It would still be possible to have a config that controls how often to do a full push, which would avoid needing to download too many bundles on clone, as well as needing to checkpresent too many bundles in verifyManifest. But it would need a different name and description.	2024-05-21 11:13:27 -04:00
Joey Hess	5f4ad2a5de	refactor	2024-05-21 09:51:19 -04:00
Joey Hess	3a38520aac	avoid interrupted push leaving remote without a manifest Added a backup manifest key, which is used if the main manifest key is not present. When uploading a new Manifest, it makes sure that it never drops one key except when the other key is present. It's entirely possible for the two manifest keys to get out of sync, due to races. The main one wins when it's present, it is possible for the main one being dropped to expose the backup one, which has a different push recorded.	2024-05-20 15:41:09 -04:00
Joey Hess	57b303148b	remove outdated note	2024-05-20 14:00:29 -04:00
Joey Hess	34a6db4f15	improve recovery from interrupted push On push, first try to drop all outManifest keys listed in the current manifest file, which resumes from an interrupted push that didn't get a chance to delete those keys. The new manifest gets its outManifest populated with the keys that were in the old manifest, plus any of the keys that were unable to be dropped. Note that it would be possible for uploadManifest to skip dropping old keys at all. The old keys would get dropped on the next push. But it seems better to delete stuff immediately rather than waiting. And the extra work is limited to push and typically is small. A remote where dropKey always fails will result in an outManifest that grows longer and longer. It would be possible to check if the remote has appendonly = True and avoid populating the outManifest. Of course, an appendonly remote will grow with every git push anyway. And currently only Remote.GitLFS sets that, which can't be used as a git-remote-annex remote anyway.	2024-05-20 13:49:45 -04:00
Joey Hess	4ce70533e9	don't delete manifest from remote on pushEmpty I missed this in commit `3f848564ac`, the absence of a manifest prevents fetching.	2024-05-20 13:01:13 -04:00
Joey Hess	adcebbae47	clean up git-remote-annex git-annex branch handling Implemented alternateJournal, which git-remote-annex uses to avoid any writes to the git-annex branch while setting up a special remote from an annex:: url. That prevents the remote.log from being overwritten with the special remote configuration from the url, which might not be 100% the same as the existing special remote configuration. And it prevents an overwrite deleting of other stuff that was already in the remote.log. Also, when the branch was created by git-remote-annex, only delete it at the end if nothing else has been written to it by another command. This fixes the race condition described in `797f27ab05`, where git-remote-annex set up the branch and git-annex init and other commands were run at the same time and their writes to the branch were lost.	2024-05-15 17:33:38 -04:00
Joey Hess	2dfffa0621	bugfix When pushing branch foo, we don't want to delete other tracking branches. In particular, a full push needs all the tracking branches.	2024-05-14 16:17:27 -04:00
Joey Hess	0bf72ef103	max-git-bundles config for git-remote-annex	2024-05-14 14:23:40 -04:00
Joey Hess	6f1039900d	prevent using git-remote-annex with unsuitable special remote configs I hope to support importtree=yes eventually, but it does not currently work. Added remote.<name>.allow-encrypted-gitrepo that needs to be set to allow using it with encrypted git repos. Note that even encryption=pubkey uses a cipher stored in the git repo to encrypt the keys stored in the remote. While it would be possible to not encrypt the GITBUNDLE and GITMANIFEST keys, and then allow using encryption=pubkey, it doesn't currently work, and that would be a complication that I doubt is worth it.	2024-05-14 13:52:20 -04:00
Joey Hess	ddf05c271b	fix cloning from an annex:: remote with exporttree=yes Updating the remote list needs the config to be written to the git-annex branch, which was not done for good reasons. While it would be possible to instead use Remote.List.remoteGen without writing to the branch, I already have a plan to discard git-annex branch writes made by git-remote-annex, so the simplest fix is to write the config to the branch. Sponsored-by: k0ld on Patreon	2024-05-13 14:35:17 -04:00
Joey Hess	34eae54ff9	git-remote-annex support exporttree=yes remotes Put the annex objects in .git/annex/objects/ inside the export remote. This way, when importing from the remote, they will be filtered out. Note that, when importtree=yes, content identifiers are used, and this means that pushing to a remote updates the git-annex branch. Urk. Will need to try to prevent that later, but I already had a todo about that for other reasons. Untested! Sponsored-By: Brock Spratlen on Patreon	2024-05-13 11:48:00 -04:00
Joey Hess	3f848564ac	refuse to fetch from a remote that has no manifest Otherwise, it can be confusing to clone from a wrong url, since it fails to download a manifest and so appears as if the remote exists but is empty. Sponsored-by: Jack Hill on Patreon	2024-05-13 09:47:21 -04:00
Joey Hess	424afe46d7	fix incremental push to preserve existing bundle keys in manifest Also broke Manifest out to its own type with a smart constructor. Sponsored-by: mycroft on Patreon	2024-05-13 09:47:05 -04:00
Joey Hess	97b309b56e	extend manifest with keys to be deleted This will eventually be used to recover from an interrupted fullPush and drop the old bundle keys it was unable to delete. Sponsored-by: Luke T. Shumaker on Patreon	2024-05-13 09:09:33 -04:00
Joey Hess	4d0543932e	pushEmpty: upload empty manifest	2024-05-10 14:40:38 -04:00
Joey Hess	ff5193c6ad	Merge branch 'master' into git-remote-annex	2024-05-10 14:20:36 -04:00
Joey Hess	1250bb26a0	reject annex:: url that omits a uuid Such as annex::?type=foo&... I accidentially left out the uuid when creating one, and the result is it appears to clone an empty repository. So let's guard against that mistake.	2024-05-10 13:59:35 -04:00
Joey Hess	ef5e9aa082	git-remote-annex working A few bugfixes. Have not tested extensively, but a push followed by a clone worked. Sponsored-by: Nicholas Golder-Manning on Patreon	2024-05-10 13:55:46 -04:00
Joey Hess	3039331529	git-remote-annex: incremental pushing Untested Sponsored-by: Joshua Antonishen on Patreon	2024-05-10 13:32:37 -04:00
Joey Hess	f2d17cf154	git-remote-annex: mostly implemented pushing Full pushing will probably work, but is untested. Incremental pushing is not implemented yet. While a fairly straightforward port of the shell prototype, the details of exactly how to get the objects to the remote were tricky. And the prototype did not consider how to deal with partial failures and interruptions. I've taken considerable care to make sure it always leaves things in a consistent state when interrupted or when it loses access to a remote in the middle of a push. Sponsored-by: Leon Schuermann on Patreon	2024-05-09 16:18:10 -04:00
Joey Hess	797f27ab05	handle cloning from a special remote that does not contain a git-annex branch It did not seem possible to avoid creating a git-annex branch while git-remote-annex is running. Special remotes can even store their own state in it. So instead, if it didn't exist before git-remote-annex created it, it deletes it at the end. This does possibly allow a race condition, where git-annex init and perhaps other git-annex writing commands are run, that writes to the git-annex branch, at the same time a git-remote-annex process is being run by git fetch/push with a full annex:: url. Those writes would be lost. If the repository has already been initialized before git-remote-annex, that race won't happen. So it's pretty unlikely. Sponsored-by: Graham Spencer on Patreon	2024-05-08 18:37:43 -04:00
Joey Hess	59fc2005ec	git clone support for git-remote-annex Also support using annex:: urls that specify the whole special remote config. Both of these cases need a special remote to be initialized enough to use it, which means writing to .git/config but not to the git-annex branch. When cloning, the remote is left set up in .git/config, so further use of it, by git-annex or git-remote-annex will work. When using git with an annex:: url, a temporary remote is written to .git/config, but then removed at the end. While that's a little bit ugly, the fact is that the Remote interface expects that it's ok to set git configs of the remote that is being initialized. And it's nowhere near as ugly as the alternative of making a temporary git repository and initializing the special remote in there. Cloning from a repository that does not contain a git-annex branch and then later running git-annex init is currently broken, although I've gotten most of the way there to supporting it. See cleanupInitialization FIXME. Special shout out to git clone for running gitremote-helpers with GIT_DIR set, but not in the git repository and with GIT_WORK_TREE not set. Resulting in needing the fixupRepo hack. Sponsored-by: unqueued on Patreon	2024-05-08 17:07:33 -04:00
Joey Hess	df5011ec43	git-remote-annex: fix hang on fetch Sponsored-by: k0ld on Patreon	2024-05-07 15:34:55 -04:00
Joey Hess	cdcf2fe3a2	git-remote-annex can fetch from an existing special remote Tested using a manually populated directory special remote. Pushing is still to be done. So is fetching from special remotes configured via the annex:: url. Sponsored-by: Brock Spratlen on Patreon	2024-05-07 15:13:41 -04:00
Joey Hess	947cf1c345	back to annex:: for git-remote-annex url Oh, turns out git needs two colons to use a gitremote-helper. Ok.	2024-05-07 14:37:29 -04:00
Joey Hess	483887591d	working toward git-remote-annex using a special remote Not quite there yet. Also, changed the format of GITBUNDLE keys to use only one '-' after the UUID. A sha256 does not contain that character, so can just split at the last one. Amusingly, the sha256 will probably not actually be verified. A git bundle contains its own checksums that git uses to verify it. And if someone wanted to replace the content of a GITBUNDLE object, they could just edit the manifest to use a new one whose sha256 does verify. Sponsored-by: Nicholas Golder-Manning	2024-05-06 16:28:04 -04:00
Joey Hess	f4ba6e0c1e	add annex: url parser Changed the format of the url to use annex: rather than annex:: The reason is that in the future, might want to support an url that includes an uriAuthority part, eg: annex://foo@example.com:42/358ff77e-0bc3-11ef-bc49-872e6695c0e3?type=directory&encryption=none&directory=/mnt/foo/" To parse that foo@example.com:42 as an uriAuthority it needs to start with annex: rather than annex:: That would also need something to be done with uriAuthority, and also the uriPath (the UUID) is prefixed with "/" in that example. So the current parser won't handle that example currently. But this leaves the possibility for expansion. Sponsored-by: Joshua Antonishen on Patreon	2024-05-06 14:50:41 -04:00
Joey Hess	4b94fc371e	implement gitremote-helpers protocol parsing Sponsored-by: Leon Schuermann on Patreon	2024-05-06 14:07:27 -04:00
Joey Hess	a01d64a4ad	add git-remote-annex stub and build machinery Renamed git-remote-annex.sh, keeping it around for now for reference. Sponsored-by: Graham Spencer on Patreon	2024-05-06 13:05:58 -04:00
Joey Hess	90b389369f	fix name of gitremote-helpers The git man page has that name.	2024-05-06 12:07:05 -04:00
Gergely Risko	cb541b9ecd	Change --copies' meta parameter to NUMBER	2024-04-30 15:16:22 -04:00
Joey Hess	016d1bee88	add reregisterurl command What this can currently be used for is only to change an url from being used by a special remote to being used by the web remote. This could have been a --move-from option to registerurl. But, that would have complicated its option and --batch processing, and also would have complicated unregisterurl, which is implemented on top of Command.Registerurl. So, a separate command was actually less complicated to implement. The generic description of the command is because I want to make this command a catch-all for other url updating kind of things, if there are ever any more. Also because it was hard to come up with a good name for the specific action. I considered `git-annex moveurl`, but that seems to indicate data is perhaps actually being moved, and seems to sit at the same level as addurl and rmurl, and this command is at the plumbing level of registerurl and unregisterurl. Sponsored-by: Dartmouth College's DANDI project	2024-03-05 15:06:14 -04:00
Joey Hess	b9e147d282	Added --expected-present file matching option	2024-01-25 12:56:41 -04:00
Joey Hess	0bd8b17b59	log migration trees to git-annex branch This will allow distributed migration: Start a migration in one clone of a repo, and then update other clones. commitMigration is a bit of a bear.. There is some inversion of control that needs some TMVars. Also streamLogFile's finalizer does not handle recording the trees, so an interrupt at just the wrong time can cause migration.log to be emptied but the git-annex branch not updated. Sponsored-by: Graham Spencer on Patreon	2023-12-06 15:40:03 -04:00
Joey Hess	b55efc179a	add startAction parameter for KeySha I have a use planned for this in Command.Migrate. Sponsored-by: unqueued on Patreon	2023-12-06 13:28:02 -04:00
Joey Hess	1e31bf8122	copy/move --from-anywhere --to remote Implementation was simple because it's equivilant to --from=foo --to remote for each other remote, followed by --to remote when there's a local copy. (Or, in the edge case of --from-anywhere --to=here, it's the same as --to=here.) Note that, when the local repo does not have a copy, fromToPerform gets it from a remote, sends it to the destination, and drops the local copy. Another call to that for a second remote will notice that the dest now has a copy, and simply drop from the second remote, avoiding a second transfer. Also note that, when numcopies doesn't allow dropping it from everywhere, it will drop it from the cheapest remotes first (maybe not ideal) up to more expensive remotes, and finally from the local repo. So the local repo will generally end up holding a copy. Maybe not ideal in all cases either, but it seems no worse to do that than to end up with a copy undropped from a remote. And I'm not entirely happy with the output, eg: copy bigfile (from r3...) ok copy bigfile ok That makes sense if you think of the second line as being the same as what is output by `git-annex copy bigfile --to bar`, but it's less clear in this context. Maybe add "(from here...)"? Also the --json output doesn't have a machine-readable field for the "from" uuid, and maybe it should? Sponsored-by: Dartmouth College's DANDI project	2023-11-30 16:34:30 -04:00
Joey Hess	f3f864fc6d	findkeys: Support --largerthan and --smallerthan Sponsored-by: Brett Eisenberg on Patreon	2023-11-28 11:51:43 -04:00
Joey Hess	11cc9f1933	info: Added calculation of combined annex size of all repositories Factored out overLocationLogs from CmdLine.Seek, which can calculate this pretty fast even in a large repo. In my big repo, the time to run git-annex info went up from 1.33s to 8.5s. Note that the "backend usage" stats are for annexed files in the working tree only, not all annexed files. This new data source would let that be changed, but that would be a confusing behavior change. And I cannot retitle it either, out of fear something uses the current title (eg parsing the json). Also note that, while time says "402108maxresident" in my big repo now, up from "54092maxresident", top shows the RES constant at 64mb, and it was 48mb before. So I don't think there is a memory leak. I tried using deepseq to force full evaluation of addKeyCopies and memory use didn't change, which also says no memory leak. And indeed, not even calling addKeyCopies resulted in the same memory use. Probably the increased memory usage is buffering the stream of data from git in overLocationLogs. Sponsored-by: Brett Eisenberg on Patreon	2023-11-08 13:35:11 -04:00
Joey Hess	b0302c937d	avoid double RawFilePath conversion Minor optimisation.	2023-10-26 13:41:17 -04:00
Joey Hess	baf8e4f6ed	Override safe.bareRepository for git remotes Fix using git remotes that are bare when git is configured with safe.bareRepository = explicit Sponsored-by: Dartmouth College's DANDI project	2023-09-07 14:56:26 -04:00
Joey Hess	cbfd214993	set safe.directory when getting config for git-annex-shell or git remotes Fix more breakage caused by git's fix for CVE-2022-24765, this time involving a remote (either local or ssh) that is a repository not owned by the current user. Sponsored-by: Dartmouth College's DANDI project	2023-09-07 14:40:50 -04:00
Joey Hess	cf8b30c914	oldkeys: New command that lists the keys used by old versions of a file The tricky thing about this turned out to be handling renames and reverts. For that, it has to make two passes over the git log, and to avoid buffering a possibly huge amount of logs in memory (ie the whole git log of an entire repository!), runs git log twice. (It might be possible to speed this up by asking git log to show a diff, and so avoid needing to use catKey.) Sponsored-By: Brock Spratlen on Patreon	2023-08-22 14:51:06 -04:00
Joey Hess	fa92383993	onlyingroup * Support "onlyingroup=" in preferred content expressions. * Support --onlyingroup= matching option. Sponsored-by: Jack Hill on Patreon	2023-07-31 14:43:58 -04:00
Joey Hess	f25eeedeac	initial implementation of --explain Currently it only displays explanations of options like --in and --copies. In the future, it should explain preferred content expression evaluation and other decisions. The explanations of a few things could be better. In particular, "standard" will just appear as-is (or as "!standard" if it doesn't match), rather than explaining why the standard preferred content expression for the group matches or not. Currently as implemented, it goes to stdout, and so commands like git-annex find that have custom output will not display --explain information. Perhaps that should change, dunno. Sponsored-by: Dartmouth College's DANDI project	2023-07-25 16:52:57 -04:00
Joey Hess	e1fc9e204e	added git-annex satisfy This ended up having an interface like sync, rather than like get/copy/drop. That let it be implemented in terms of sync, which took a lot less code. Also, it lets it handle many of the edge cases that sync does, such as getting files that are not visible in a --hide-missing branch, and sending files to exporttree remotes. As well as being easier to implement, `git-annex satisfy myremote` makes sense as it satisfies the preferred content settings of the remote. `git-annex satisfy somefile` does not form a sentence that makes sense. So while -C can be a little bit annoying, it still makes sense to have this syntax. Note that, while I initially thought this would also satisfy numcopies, it does not. Arguably it ought to. But, sync does not send files in order to satisfy numcopies, it only sends files to satisfy preferred content. And it's important that this transfer the same files as sync does, because it will probably be used in a workflow where the user sometimes syncs and sometimes satisfies, and does not expect satisfy to do things that sync would not do. (Also opened a new bug that also affects sync et all, not only this command.) Sponsored-by: Nicholas Golder-Manning on Patreon	2023-06-29 15:34:53 -04:00
Joey Hess	114a2d7504	Fix display when run with -J1 Commit `b6642dde8a` broke it by enabling non-concurrent display mode while leaving concurrency set in the config and having already started concurrency earlier. (I don't actually know if that commit was a good idea.) Sponsored-By: Brett Eisenberg on Patreon	2023-06-15 10:07:54 -04:00
Joey Hess	c4ad9b1446	Fix bug in -z handling of trailing NUL in input The obvious way to fix this would be to adapt lines to split on null. However, it's actually nontrivial to rewrite lines. In particular it has a weird implementation to avoid a space leak. See: https://gitlab.haskell.org/ghc/ghc/-/issues/4334 Also, while that is a small amount of code, it's covered by a rather complex copyright and I'd have to include that copyright in git-annex. So, I opted to filter out the trailing empty string instead. Sponsored-by: Dartmouth College's Datalad project	2023-05-19 14:34:02 -04:00
Joey Hess	e955912ad0	git-annex assist assist: New command, which is the same as git-annex sync but with new files added and content transferred by default. (Also this fixes another reversion in git-annex sync, --commit --no-commit, and --message were not enabled, oops.) See added comment for why git-annex assist does commit staged changes elsewhere in the work tree, but only adds files under the cwd. Note that it does not support --no-commit, --no-push, --no-pull like sync does. My thinking is, why should it? If you want that level of control, use git commit, git annex push, git annex pull. Sync only got those options because pull and push were not split out. Sponsored-by: k0ld on Patreon	2023-05-18 14:37:43 -04:00
Joey Hess	31d21578c3	improve description of --debugfilter	2023-05-17 12:38:22 -04:00
Joey Hess	5df89d58c7	git-annex pull and push Split out two new commands, git-annex pull and git-annex push. Those plus a git commit are equivilant to git-annex sync. In a sense, git-annex sync conflates 3 things, and it would have been better to have push and pull from the beginning and not sync. Although note that git-annex sync --content is faster than a pull followed by a push, because it only has to walk the tree once, look at preferred content once, etc. So there is some value in git-annex sync in speed, as well as user convenience. And it would be hard to split out pull and push from sync, as far as the implementaton goes. The implementation inside sync was easy, just adjust SyncOptions so it does the right thing. Note that the new commands default to syncing content, unless annex.synccontent is explicitly set to false. I'd like sync to also do that, but that's a hard transition to make. As a start to that transition, I added a note to git-annex-sync.mdwn that it may start to do so in a future version of git-annex. But a real transition would necessarily involve displaying warnings when sync is used without --content, and time. Sponsored-by: Kevin Mueller on Patreon	2023-05-16 16:51:07 -04:00
Joey Hess	271f3b1ab4	uninit: Support --json and --json-error-messages Had to convert uninit to do everything that can error out inside a CommandStart. This was harder than feels nice. (Also, in passing, converted CommandCheck to use a data type, not a weird number that it was not clear how it managed to be unique.) Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-05-11 13:43:02 -04:00
Joey Hess	be36e208c2	json object for FileNotFound When a nonexistant file is passed to a command and --json-error-messages is enabled, output a JSON object indicating the problem. (But git ls-files --error-unmatch still displays errors about such files in some situations.) I don't like the duplication of the name of the command introduced by this, but I can't see a great way around it. One way would be to pass the Command instead. When json is not enabled, the stderr is unchanged. This is necessary because some commands like find have custom output. So dislaying "find foo not found" would be wrong. So had to complicate things with toplevelFileProblem having different output with and without json. When not using --json-error-messages but still using --json, it displays the error to stderr, but does display a json object without the error. It does have an errorid though. Unsure how useful that behavior is. Sponsored-by: Dartmouth College's Datalad project	2023-04-25 19:26:20 -04:00
Joey Hess	91ba0cc7fd	Revert "--json-exceptions" This reverts commit `a325524454`. Turns out this was predicated on an incorrect belief that json output didn't already sometimes lack the "key" field. Since json output already can when `giveup` was used, it seems unncessary to add a whole new option for this.	2023-04-25 17:37:34 -04:00
Joey Hess	a325524454	--json-exceptions Added a --json-exceptions option, which makes some exceptions be output in json. The distinction is that --json-error-messages is for messages relating to a particular ActionItem, while --json-exceptions is for messages that are not, eg ones for a file that does not exist. It's unfortunate that we need two switches with such a fine distinction between them, but I'm worried about maintaining backwards compatability in the json output, to avoid breaking anything that parses it, and this was the way to make sure I didn't. toplevelWarning is generally used for the latter kind of message. And the other calls to toplevelWarning could be converted to showException. The only possible gotcha is that if toplevelWarning is ever called after starting acting on a file, it will add to the --json-error-messages of the json displayed for that file and converting to showException would be a behavior change. That seems unlikely, but I didn't convery everything to avoid needing to satisfy myself it was not a concern. Sponsored-by: Dartmouth College's Datalad project	2023-04-25 17:05:33 -04:00
Joey Hess	9155ed1072	configremote New command, currently limited to changing autoenable= setting of a special remote. It will probably never be used for more than that given the limitations on it. Sponsored-by: Brock Spratlen on Patreon	2023-04-18 15:30:49 -04:00
Joey Hess	3290a09a70	filter out control characters in warning messages Converted warning and similar to use StringContainingQuotedPath. Most warnings are static strings, some do refer to filepaths that need to be quoted, and others don't need quoting. Note that, since quote filters out control characters of even UnquotedString, this makes all warnings safe, even when an attacker sneaks in a control character in some other way. When json is being output, no quoting is done, since json gets its own quoting. This does, as a side effect, make warning messages in json output not be indented. The indentation is only needed to offset warning messages underneath the display of the file they apply to, so that's ok. Sponsored-by: Brett Eisenberg on Patreon	2023-04-10 15:55:44 -04:00
Joey Hess	cd544e548b	filter out control characters in error messages giveup changed to filter out control characters. (It is too low level to make it use StringContainingQuotedPath.) error still does not, but it should only be used for internal errors, where the message is not attacker-controlled. Changed a lot of existing error to giveup when it is not strictly an internal error. Of course, other exceptions can still be thrown, either by code in git-annex, or a library, that include some attacker-controlled value. This does not guard against those. Sponsored-by: Noam Kremen on Patreon	2023-04-10 13:50:51 -04:00
Joey Hess	2b940f7725	registerurl, unregisterurl: Added --remote option This serves two purposes. --remote=web bypasses other special remotes that claim the url, same as addurl --raw. And, specifying some other remote allows making sure that an url is claimed by the remote you expect, which makes then using setpresentkey not be fragile. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-04-05 15:54:41 -04:00
Yaroslav Halchenko	84b0a3707a	Apply codespell -w throughout	2023-03-17 15:14:58 -04:00
Joey Hess	54ad1b4cfb	Windows: Support long filenames in more (possibly all) of the code Works around this bug in unix-compat: https://github.com/jacobstanley/unix-compat/issues/56 getFileStatus and other FilePath using functions in unix-compat do not do UNC conversion on Windows. Made Utility.RawFilePath use convertToWindowsNativeNamespace to do the necessary conversion on windows to support long filenames. Audited all imports of System.PosixCompat.Files to make sure that no functions that operate on FilePath were imported from it. Instead, use the equvilants from Utility.RawFilePath. In particular the re-export of that module in Common had to be removed, which led to lots of other changes throughout the code. The changes to Build.Configure, Build.DesktopFile, and Build.TestConfig make Utility.Directory not be needed to build setup. And so let it use Utility.RawFilePath, which depends on unix, which cannot be in setup-depends. Sponsored-by: Dartmouth College's Datalad project	2023-03-01 15:55:58 -04:00
Joey Hess	c209e0f643	add FIELD?=GLOB to git-annex view usage And also to vadd usage. Also added some other things to the usage that were omitted before to save space. Adding even FIELD?=GLOB made the git-annex --help list of commands grow too wide for an 80 column display. So, removed the description of parameters from that list of commands. Sponsored-By: Brock Spratlen on Patreon	2023-02-07 18:09:10 -04:00
Joey Hess	a6c1d9752b	move/copy: option parsing for --from with --to Allowing --from and --to as an alternative to --from or --to is hard to do with optparse-applicative! The obvious approach of (pfrom <\|> pto <\|> pfromandto) does not work when pfromandto uses the same option names as pfrom and pto do. It compiles but the generated parser does not work for all desired combinations. Instead, have to parse optionally from and optionally to. When neither is provided, the parser succeeds, but it's a result that can't be handled. So, have to giveup after option parsing. There does not seem to be a way to make an optparse-applicative Parser give up internally either. Also, need seek' because I first tried making fto be a where binding, but that resulted in a hang when git-annex move was run without --from or --to. I think because startConcurrency was not expecting the stages value to contain an exception and so ended up blocking. Sponsored-by: Dartmouth College's DANDI project	2023-01-18 14:42:39 -04:00
Joey Hess	f8bc208e89	findkeys: New command, very similar to git-annex find but operating on keys I've long been asked for `git-annex find --all` or something like that, but pushed back on it because I feel that the command is analagous to find(1) and so it would be surprising for it to list keys rather than files. So instead, add a new findkeys subcommand. Note that the use of withKeyOptions is rather strange because usually that is used to fall back to --all rather than listing files, but here it's made to default to --all like behavior and never list files. A performance thing that could be improved is that withKeyOptions always reads and caches location logs. But findkeys with no options does not need them, so it could be made faster. That caching does speed up options like --in though. This is really just a subset of a more general performance thing that --all reads location logs sometimes unncessarily. Anyway, it needs to read the location log in order to checkDead, and it seems good that findkeys does skip dead keys. Also, cleaned up comments on git-annex-find man page asking for --all option. Sponsored-by: Dartmouth College's DANDI project	2023-01-17 14:51:57 -04:00
Joey Hess	a522a41a42	fix misleading helper function name noworktreeitems was false for NoWorkTreeItems which was hard to understand. Sponsored-by: Dartmouth College's DANDI project	2023-01-17 14:25:38 -04:00
Joey Hess	0b2dd374d8	--anything and --nothing Added --anything (and --nothing). Eg, git-annex find --anything will list all annexed files whether or not the content is present. This is slightly faster and clearer than --include=* or --exclude=* While I can't imagine how --nothing will be used, preferred content expressions already had anything and nothing, so might as well support both as matching options as well. Sponsored-by: Dartmouth College's Datalad project	2022-12-20 15:44:09 -04:00
Joey Hess	731e806c96	use lookupKeyStaged in --batch code paths Make --batch mode handle unstaged annexed files consistently whether the file is unlocked or not. Before this, a unstaged locked file would have the symlink on disk examined and operated on in --batch mode, while an unstaged unlocked file would be skipped. Note that, when not in batch mode, unstaged files are skipped over too. That is actually somewhat new behavior; as late as 7.20191114 a command like `git-annex whereis .` would operate on unstaged locked files and skip over unstaged unlocked files. That changed during optimisation of CmdLine.Seek with apparently little fanfare or notice. Turns out that rmurl still behaved that way when given an unstaged file on the command line. It was changed to use lookupKeyStaged to handle its --batch mode. That also affected its non-batch mode, but since that's just catching up to the change earlier made to most other commands, I have not mentioed that in the changelog. It may be that other uses of lookupKey should also change to lookupKeyStaged. But it may also be that would slow down some things, or lead to unwanted behavior changes, so I've kept the changes minimal for now. An example of a place where the use of lookupKey is better than lookupKeyStaged is in Command.AddUrl, where it looks to see if the file already exists, and adds the url to the file when so. It does not matter there whether the file is staged or not (when it's locked). The use of lookupKey in Command.Unused likewise seems good (and faster). Sponsored-by: Nicholas Golder-Manning on Patreon	2022-10-26 14:43:06 -04:00
Joey Hess	b2ee2496ee	remove whenAnnexed and ifAnnexed In preparation for adding a new variation on lookupKey. Sponsored-by: Max Thoursie on Patreon	2022-10-26 14:06:32 -04:00
Joey Hess	ba7ecbc6a9	avoid flushing keys db queue after each Annex action The flush was only done Annex.run' to make sure that the queue was flushed before git-annex exits. But, doing it there means that as soon as one change gets queued, it gets flushed soon after, which contributes to excessive writes to the database, slowing git-annex down. (This does not yet speed git-annex up, but it is a stepping stone to doing so.) Database queues do not autoflush when garbage collected, so have to be flushed explicitly. I don't think it's possible to make them autoflush (except perhaps if git-annex sqitched to using ResourceT..). The comment in Database.Keys.closeDb used to be accurate, since the automatic flushing did mean that all writes reached the database even when closeDb was not called. But now, closeDb or flushDb needs to be called before stopping using an Annex state. So, removed that comment. In Remote.Git, change to using quiesce everywhere that it used to use stopCoProcesses. This means that uses on onLocal in there are just as slow as before. I considered only calling closeDb on the local git remotes when git-annex exits. But, the reason that Remote.Git calls stopCoProcesses in each onLocal is so as not to leave git processes running that have files open on the remote repo, when it's on removable media. So, it seemed to make sense to also closeDb after each one, since sqlite may also keep files open. Although that has not seemed to cause problems with removable media so far. It was also just easier to quiesce in each onLocal than once at the end. This does likely leave performance on the floor, so could be revisited. In Annex.Content.saveState, there was no reason to close the db, flushing it is enough. The rest of the changes are from auditing for Annex.new, and making sure that quiesce is called, after any action that might possibly need it. After that audit, I'm pretty sure that the change to Annex.run' is safe. The only concern might be that this does let more changes get queued for write to the db, and if git-annex is interrupted, those will be lost. But interrupting git-annex can obviously already prevent it from writing the most recent change to the db, so it must recover from such lost data... right? Sponsored-by: Dartmouth College's Datalad project	2022-10-12 14:12:23 -04:00
Joey Hess	85dbc21c1c	fix typo	2022-10-07 12:30:07 -04:00
Joey Hess	70d2ece381	improve usage These commands operate on not only remotes, but any way a repository can be specified, including "here" etc. Sponsored-by: Graham Spencer on Patreon	2022-10-03 13:49:42 -04:00
Joey Hess	2478e9e03a	restage: New git-annex command, handles restaging unlocked files This is much easier and less failure-prone than having the user run git update-index --refresh themselves. Sponsored-by: Dartmouth College's DANDI project	2022-09-23 16:29:59 -04:00
Joey Hess	66bd4f80b3	Improved handling of --time-limit when combined with -J When concurrency is enabled, there can be worker threads still running when the time limit is checked. Exiting right there does not give those threads time to finish what they're doing. Instead, the seeking is wrapped up, and git-annex then shuts down cleanly. The whole point of --time-limit existing, rather than using timeout(1) when running git-annex is to let git-annex finish the action(s) it is working on when the time limit is reached, and shut down cleanly. I noticed this problem when investigating why restagePointerFile might not have run after get/drop of an unlocked file. With --time-limit -J, a worker thread may have finished updating a work tree file, and be killed by the time limit check before it can run restagePointerFile. So despite --time-limit running the shutdown actions, the work tree file didn't get restaged. Sponsored-by: Dartmouth College's DANDI project	2022-09-22 12:54:52 -04:00
Joey Hess	eefc026370	fix reversion on skipping dead keys in --all/bare Fix a reversion that made dead keys not be skipped when operating on all keys via --all or in a bare repo. (Introduced in version 8.20200720) Also, improved the documentation of git-annex-dead, it does not only apply to fsck --all. Also, made git-annex fsck, when run on a file whose key is dead, display that. Before, it displayed that only when run with --all, but with this fix, it skips dead keys with --all. But it can still be run on a file that uses a dead key, and displaying "This key is dead" explains to the user why it does not consider missing content for it to be a problem. Sponsored-by: k0ld on Patreon	2022-09-13 14:38:13 -04:00
Joey Hess	8a4cfd4f2d	use getSymbolicLinkStatus not getFileStatus to avoid crash on broken symlink Fix crash importing from a directory special remote that contains a broken symlink. The crash was in listImportableContentsM but some other places in Remote.Directory also seemed like they could have the same problem. Also audited for other places that have such a problem. Not all calls to getFileStatus are bad, in some cases it's better to crash on something unexpected. For example, `git-annex import path` when the path is a broken symlink should crash, the same as when it does not exist. Many of the getFileStatus calls are like that, particularly when they involve .git/annex/objects which should never have a broken symlink in it. Fixed a few other possible cases of the problem. Sponsored-by: Lawrence Brogan on Patreon	2022-09-05 13:46:32 -04:00
Joey Hess	ed39979ac8	import: Avoid following symbolic links inside directories being imported Too big a footgun. This does not prevent attackers who can write to the directory being imported from racing the check. But they can cause anything to be imported anyway, so would be limited to getting the legacy import to follow into a directory they do not write to, and move files out of it into the annex. (The directory special remote does not have that problem since it does not move files.) Sponsored-by: Jack Hill on Patreon	2022-08-19 13:31:16 -04:00
Joey Hess	3a513cfe73	add --dry-run: New option This is intended for users who want to see what it would output in order to eg, check if a file would be added to git or the annex. It is not intended as a way for scripts to get information. Sponsored-by: Dartmouth College's Datalad project	2022-08-03 11:16:04 -04:00
Joey Hess	be19a68276	new matching options --want-get-by and --want-drop-by Sponsored-by: Graham Spencer on Patreon	2022-07-28 13:26:03 -04:00
Joey Hess	b223988e22	remove --backend from global options --backend is no longer a global option, and is only accepted by commands that actually need it. Three commands that used to support backend but don't any longer are watch, webapp, and assistant. It would be possible to make them support it, but I doubt anyone used the option with these. And in the case of webapp and assistant, the option was handled inconsistently, only taking affect when the command is run with an existing git-annex repo, not when it creates a new one. Also, renamed GlobalOption etc to AnnexOption. Because there are many options of this type that are not actually global (any more) and get added to commands that need them. Sponsored-by: Kevin Mueller on Patreon	2022-06-29 13:33:25 -04:00
Joey Hess	8040ecf9b8	final readonly values moves to AnnexRead At this point I've checked all AnnexState values and these were all that remained that could move. Pity that Annex.repo can't move, but it gets modified sometimes.. A couple of AnnexState values are set by options and could be AnnexRead, but happen to use Annex when being set. Sponsored-by: Max Thoursie on Patreon	2022-06-28 16:04:58 -04:00
Joey Hess	cb9cf30c48	move several readonly values to AnnexRead This improves performance to a small extent in several places. Sponsored-by: Tobias Ammann on Patreon	2022-06-28 15:40:19 -04:00
Joey Hess	d266a41f8d	prevent numcopies or mincopies being configured to 0 Ignore annex.numcopies set to 0 in gitattributes or git config, or by git-annex numcopies or by --numcopies, since that configuration would make git-annex easily lose data. Same for mincopies. This is a continuation of the work to make data only be able to be lost when --force is used. It earlier led to the --trust option being disabled, and similar reasoning applies here. Most numcopies configs had docs that strongly discouraged setting it to 0 anyway. And I can't imagine a use case for setting to 0. Not that there might not be one, but it's just so far from the intended use case of git-annex, of managing and storing your data, that it does not seem like it makes sense to cater to such a hypothetical use case, where any git-annex drop can lose your data at any time. Using a smart constructor makes sure every place avoids 0. Note that this does mean that NumCopies is for the configured desired values, and not the actual existing number of copies, which of course can be 0. The name configuredNumCopies is used to make that clear. Sponsored-by: Brock Spratlen on Patreon	2022-03-28 15:20:34 -04:00
Joey Hess	ce91f10132	fix annex.skipunknown false error propagation Propagate nonzero exit status from git ls-files when a specified file does not exist, or a specified directory does not contain any files checked into git. The recent completion of the annex.skipunknown transition exposed this bug, that has unfortunately been lurking all along. It is also possible that git ls-files errors out for some other reason -- perhaps a permission problem -- and this will also fix error propagation in such situations. Sponsored-by: Dartmouth College's Datalad project	2022-02-28 12:54:56 -04:00
Joey Hess	5b373a9dd2	read a consistent amount from pointer file A few places were reading the max symlink size of a pointer file, then passing tp parseLinkTargetOrPointer. Which is fine currently, but to support pointer files with lines of data after the pointer, enough has to be read that parseLinkTargetOrPointer can be assured of seeing enough of that data to know if it's correctly formatted. Sponsored-by: Dartmouth College's Datalad project	2022-02-23 12:52:34 -04:00
Joey Hess	835c50966a	reject batch options combined with non-batch options Reject combinations of --batch (or --batch-keys) with options like --all or --key or with filenames. Most commands ignored the non-batch items when batch mode was enabled. For some reason, addurl and dropkey both processed first the specified non-batch items, followed by entering batch mode. Changed them to also error out, for consistency. Sponsored-by: Dartmouth College's Datalad project	2022-01-26 13:00:19 -04:00
Joey Hess	7f6b2ca49c	handle overBranchFileContents with read-only unmerged git-annex branches This makes --all error out in that situation. Which is better than ignoring information from the branches. To really handle the branches right, overBranchFileContents would need to both query all the branches and union merge file contents (or perhaps not provide any file content), as well as diffing between branches to find files that are only present in the unmerged branches. And also, it would need to handle transitions.. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 14:30:51 -04:00
Joey Hess	d9d0fe5fa4	disable precaching git-annex branch when there are unmerged branches in a read-only repo The way precaching works, it can't merge in information from those branches efficiently, so just disable it and fall back to Annex.Branch.get in order to get the correct information. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 14:08:50 -04:00
Joey Hess	68257e9076	add git-annex filter-process filter-process: New command that can make git add/checkout faster when there are a lot of unlocked annexed files or non-annexed files, but that also makes git add of large annexed files slower. Use it by running: git config filter.annex.process 'git-annex filter-process' Fully tested and working, but I have not benchmarked it at all. And, incremental hashing is not done when git add uses it, so extra work is done in that case. Sponsored-by: Mark Reidenbach on Patreon	2021-11-04 15:02:36 -04:00
Joey Hess	afe5d9e137	fix duplicated commands in git-annex-shell usage display	2021-10-11 15:42:27 -04:00
Joey Hess	7bdc7350a5	remove git-annex-shell compat code * Removed support for accessing git remotes that use versions of git-annex older than 6.20180312. * git-annex-shell: Removed several commands that were only needed to support git-annex versions older than 6.20180312. (lockcontent, recvkey, sendkey, transferinfo, commit) The P2P protocol was added in that version, and used ever since, so this code was only needed for interop with older versions. "git-annex-shell commit" is used by newer git-annex versions, though unnecessarily so, because the p2pstdio command makes a single commit at shutdown. Luckily, it was run with stderr and stdout sent to /dev/null, and non-zero exit status or other exceptions are caught and ignored. So, that was able to be removed from git-annex-shell too. git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by gcrypt special remotes accessed over ssh, so those had to be kept. It would probably be possible to convert that to using the P2P protocol, but it would be another multi-year transition. Some git-annex-shell fields were able to be removed. I hoped to remove all of them, and the very concept of them, but unfortunately autoinit is used by git-annex sync, and gcrypt uses remoteuuid. The main win here is really in Remote.Git, removing piles of hairy fallback code. Sponsored-by: Luke Shumaker	2021-10-11 15:36:51 -04:00
Joey Hess	ab7b5a492c	--batch-keys New --batch-keys option added to these commands: get, drop, move, copy, whereis git-annex-matching-options had to be reworded since some of its options can be used to match on keys, not only files. Sponsored-by: Luke Shumaker on Patreon	2021-08-25 14:21:12 -04:00
Joey Hess	fa62c98910	simplify and speed up Utility.FileSystemEncoding This eliminates the distinction between decodeBS and decodeBS', encodeBS and encodeBS', etc. The old implementation truncated at NUL, and the primed versions had to do extra work to avoid that problem. The new implementation does not truncate at NUL, and is also a lot faster. (Benchmarked at 2x faster for decodeBS and 3x for encodeBS; more for the primed versions.) Note that filepath-bytestring 1.4.2.1.8 contains the same optimisation, and upgrading to it will speed up to/fromRawFilePath. AFAIK, nothing relied on the old behavior of truncating at NUL. Some code used the faster versions in places where I was sure there would not be a NUL. So this change is unlikely to break anything. Also, moved s2w8 and w82s out of the module, as they do not involve filesystem encoding really. Sponsored-by: Shae Erisson on Patreon	2021-08-11 12:13:31 -04:00
Joey Hess	47d3dccf19	whereused implemented except --historical Sponsored-by: Jack Hill on Patreon	2021-07-14 14:27:21 -04:00
Joey Hess	8a13bbedd6	--size-limit exit 101 Sponsored-by: Mark Reidenbach on Patreon	2021-06-04 16:43:47 -04:00
Joey Hess	771a122c9e	add --size-limit option When this option is not used, there should be effectively no added overhead, thanks to the optimisation in `b3cd0cc6ba`. When an action fails on a file, the size of the file still counts toward the size limit. This was necessary to support concurrency, but also generally seems like the right choice. Most commands that operate on annexed files support the option. export and import do not, and I don't know if it would make sense for export to.. Why would you want an incomplete export? sync doesn't, and while it would be easy to make it support it for transferring files, it's not clear if dropping files should also take the size limit into account. Commands like add that don't operate on annexed files don't support the option either. Exiting 101 not yet implemented. Sponsored-by: Denis Dzyubenko on Patreon	2021-06-04 16:16:53 -04:00
Joey Hess	b3cd0cc6ba	minor optimisation Avoid a second mvar access. Sponsored-by: Jochen Bartl on Patreon	2021-06-04 14:57:19 -04:00
Joey Hess	b5f5475ed6	New matching options --excludesamecontent and --includesamecontent The normalisation of filenames turns out to be the tricky part here, because the associated files coming out of the keys db may look like "./foo/bar" or "../bar". For the former to match a glob like "foo/", it needs to be normalised. Note that, on windows, normalise "./foo/bar" = "foo\\bar" which a glob like "foo/" won't match. So the glob is matched a second time, on the toInternalGitPath, so allowing the user to provide a glob with the slashes in either direction. However, this still won't support some wacky edge cases like the user providing a glob of "foo/bar\\*" Sponsored-by: Dartmouth College's Datalad project	2021-05-25 13:08:18 -04:00
Joey Hess	a58c90ccf4	skeleton of filter-branch command, with option parser	2021-05-14 10:59:48 -04:00
Joey Hess	0e830b6bb5	make remoteKeyToRemoteName safer If it's passed a ConfigKey such as annex.version, avoid returning an empty remote name and return Nothing instead. Also, foo.bar.baz is not treated as a remote named "bar".	2021-04-23 13:29:21 -04:00
Joey Hess	653b719472	fix --all to include not yet committed files from the journal Fix bug caused by recent optimisations that could make git-annex not see recently recorded status information when configured with annex.alwayscommit=false. This does mean that --all can end up processing the same key more than once, but before the optimisations that introduced this bug, it used to also behave that way. So I didn't try to fix that; it's an edge case and anyway git-annex behaves well when run on the same key repeatedly. I am not too happy with the use of a MVar to buffer the list of files in the journal. I guess it doesn't defeat lazy streaming of the list, if that list is actually generated lazily, and anyway the size of the journal is normally capped and small, so if configs are changed to make it huge and this code path fire, git-annex using enough memory to buffer it all is not a large problem.	2021-04-21 15:40:32 -04:00

1 2 3 4 5 ...

498 commits