git-annex

Author	SHA1	Message	Date
Joey Hess	eefc026370	fix reversion on skipping dead keys in --all/bare Fix a reversion that made dead keys not be skipped when operating on all keys via --all or in a bare repo. (Introduced in version 8.20200720) Also, improved the documentation of git-annex-dead, it does not only apply to fsck --all. Also, made git-annex fsck, when run on a file whose key is dead, display that. Before, it displayed that only when run with --all, but with this fix, it skips dead keys with --all. But it can still be run on a file that uses a dead key, and displaying "This key is dead" explains to the user why it does not consider missing content for it to be a problem. Sponsored-by: k0ld on Patreon	2022-09-13 14:38:13 -04:00
Joey Hess	d2c842e9a1	don't force use of conduit in withUrlOptionsPromptingCreds Use curl for downloads from git remotes when annex.url-options and other git configs are set. If the url needs a password, curl will fail, and git credential will not be used to prompt for it. But the user can set --netrc in url-options and put the password in the netrc file. This also means that url-options settings like -4 will take effect. That was the case before commit `1883f7ef8f` forced conduit to be used.	2022-09-09 16:07:32 -04:00
Joey Hess	9621beabc4	cache credentials in memory when doing http basic auth to a git remote When accessing a git remote over http needs a git credential prompt for a password, cache it for the lifetime of the git-annex process, rather than repeatedly prompting. The git-lfs special remote already caches the credential when discovering the endpoint. And presumably commands like git pull do as well, since they may download multiple urls from a remote. The TMVar CredentialCache is read, so two concurrent calls to getBasicAuthFromCredential will both prompt for a credential. There would already be two concurrent password prompts in such a case, and existing uses of `prompt` probably avoid it. Anyway, it's no worse than before.	2022-09-09 14:20:32 -04:00
Joey Hess	8a4cfd4f2d	use getSymbolicLinkStatus not getFileStatus to avoid crash on broken symlink Fix crash importing from a directory special remote that contains a broken symlink. The crash was in listImportableContentsM but some other places in Remote.Directory also seemed like they could have the same problem. Also audited for other places that have such a problem. Not all calls to getFileStatus are bad, in some cases it's better to crash on something unexpected. For example, `git-annex import path` when the path is a broken symlink should crash, the same as when it does not exist. Many of the getFileStatus calls are like that, particularly when they involve .git/annex/objects which should never have a broken symlink in it. Fixed a few other possible cases of the problem. Sponsored-by: Lawrence Brogan on Patreon	2022-09-05 13:46:32 -04:00
Joey Hess	a93163d6f7	optimise linker in linux standalone tarballs Trick the linker into not doing unnecessary work searching for optimised libraries that are not present, by symlinking the directories where optimised libs would be to the main lib dir. This reduces the number of ENOENT errors in git-annex init by about 1/2. The linker always finds the files where it looks first time now. I have not looked at what the wall clock speedup might be, it's probably rather small. If a x86-64-v5 comes to be, the list will need to be extended. And there may be other directories used on some machines that I have missed. Not done for arm64 yet, or any uncommon architectures. Sponsored-by: Dartmouth College's Datalad project	2022-08-30 15:20:04 -04:00
Joey Hess	78440ca37d	move assistant and webapp build-depends into main build-depends For some reason, cabal 3.4.1.0 builds w/o the assistant and webapp, even when the flag is explicitly turned on. Moving the build-depends from inside the if flag section to the main build-depends somehow fixes this. Since the webapp build deps are thus always available, there is no reason not to build the webapp when building the assistant. So, got rid of the webapp build flag. Kept the assistant build flag for now, since building without it does at least still speed up the build. Sponsored-by: Brock Spratlen on Patreon	2022-08-29 15:23:49 -04:00
Joey Hess	cbac6c680b	remove changelog entry about reverted stack.yaml change	2022-08-22 12:02:22 -04:00
Joey Hess	e801634875	prep release	2022-08-22 12:02:04 -04:00
Yaroslav Halchenko	0151976676	Typo fix unncessary -> unnecessary. Detected while reading recent CHANGELOG entry but then decided to apply to entire codebase and docs since why not?	2022-08-20 09:40:19 -04:00
Joey Hess	ed39979ac8	import: Avoid following symbolic links inside directories being imported Too big a footgun. This does not prevent attackers who can write to the directory being imported from racing the check. But they can cause anything to be imported anyway, so would be limited to getting the legacy import to follow into a directory they do not write to, and move files out of it into the annex. (The directory special remote does not have that problem since it does not move files.) Sponsored-by: Jack Hill on Patreon	2022-08-19 13:31:16 -04:00
Joey Hess	94029995fa	fix git-annex add regression on deleted file Fix a regression in 10.20220624 that caused git-annex add to crash when there was an unstaged deletion. Sponsored-by: Dartmouth College's Datalad project	2022-08-19 12:55:49 -04:00
Joey Hess	840bd50390	make it easier to use curl for unusual url schemes Use curl when annex.security.allowed-url-schemes includes an url scheme not supported by git-annex internally, as long as annex.security.allowed-ip-addresses is configured to allow using curl. Sponsored-by: Luke Shumaker on Patreon	2022-08-15 12:22:13 -04:00
Joey Hess	e60766543f	add annex.dbdir (WIP) WIP: This is mostly complete, but there is a problem: createDirectoryUnder throws an error when annex.dbdir is set to outside the git repo. annex.dbdir is a workaround for filesystems where sqlite does not work, due to eg, the filesystem not properly supporting locking. It's intended to be set before initializing the repository. Changing it in an existing repository can be done, but would be the same as making a new repository and moving all the annexed objects into it. While the databases get recreated from the git-annex branch in that situation, any information that is in the databases but not stored in the branch gets lost. It may be that no information ever gets stored in the databases that cannot be reconstructed from the branch, but I have not verified that. Sponsored-by: Dartmouth College's Datalad project	2022-08-11 16:58:53 -04:00
Joey Hess	2530012fa3	fix wording	2022-08-10 12:32:49 -04:00
Joey Hess	abd417d4fe	Avoid running multiple bup split processes concurrently Since bup split is not concurrency safe. Used a lock file so that 2 git-annex processes only run one bup split between them (per bup repo). (Concurrent writes from different git-annex repository clones to the same bup repo could still have concurrency problems.) Sponsored-by: Noam Kremen on Patreon	2022-08-08 18:54:06 -04:00
Joey Hess	5bc70e2da5	When bup split fails, display its stderr It seems worth noting here that I emailed bup's author about bup split being noisy on stderr even with -q in approximately 2011. That never got fixed. Its current repo on github only accepts pull requests, not bug reports. Needing to add such complexity to deal with such a longstanding unfixed issue is not fun. Sponsored-by: Kevin Mueller on Patreon	2022-08-05 13:57:20 -04:00
Joey Hess	f94908f2a6	improve output when storing to bup bup split outputs to stderr even with -q. This was discarded when using -J, but it was still outputting when not using -J, and so was git-annex. Sponsored-by: Nicholas Golder-Manning on Patreon	2022-08-05 12:29:33 -04:00
Joey Hess	a23fd7349f	work around git segfault Work around bug in git 2.37 that causes a segfault when core.untrackedCache is set, and broke git-annex init. Depending on when git gets fixed and how widely the buggy versions are used, this could be reverted quite soon, or need to linger for a long time. It only makes git-annex init a tiny bit slower in a new repo. Sponsored-by: Max Thoursie on Patreon	2022-08-04 14:20:57 -04:00
Joey Hess	3a513cfe73	add --dry-run: New option This is intended for users who want to see what it would output in order to eg, check if a file would be added to git or the annex. It is not intended as a way for scripts to get information. Sponsored-by: Dartmouth College's Datalad project	2022-08-03 11:16:04 -04:00
Joey Hess	570b1aa6a1	Allow find --branch to be used in a bare repository, the same as the deprecated findref can be This will allow later fully deprecating and removing findref. Sponsored-by: Erik Bjäreholt on Patreon	2022-07-29 12:52:12 -04:00
Joey Hess	be19a68276	new matching options --want-get-by and --want-drop-by Sponsored-by: Graham Spencer on Patreon	2022-07-28 13:26:03 -04:00
Joey Hess	b5dc04099e	stack.yaml: Updated to lts-19.16 Last try at this broke on windows with a problem installing ghc, but I wanted to try again. Also this has a version of aws that allows using aeson 2.0, which has a potential security fix.	2022-07-26 16:04:49 -04:00
Joey Hess	d905232842	use ResourcePool for hash-object handles Avoid starting an unnecessary number of git hash-object processes when concurrency is enabled. Sponsored-by: Dartmouth College's DANDI project	2022-07-25 17:32:39 -04:00
Joey Hess	63cef2ae0b	v8 repositories automatically upgrade to v9 (And v9 later on to v10.) When v9/v10 were added, making v8 automatically upgrade was deferred "for a few months" to prevent interoperability problems if users also have an old version of git-annex. Of course that could still be the case, but there has been a good amount of time and this can't be put off forever. Allow setting annex.autoupgraderepository to false to avoid this upgrade. Previously, that only prevented upgrades from no longer supported git-annex versions, but v8 is still supported, and users may want to keep on v8 to interoperate with an old git-annex version. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2022-07-25 16:20:04 -04:00
Joey Hess	a0e788c94a	releasing package git-annex version 10.20220724	2022-07-25 14:07:20 -04:00
Joey Hess	4e88137a28	prevent appends except when annex.alwayscompact=false I would like for a new repo version to enable appends, but to do so safely would need a v11 followed by a 1 year delay followed by a v12 that does it. Since a similar v9 and v10 transition is currently happening, and is less than 6 months along in most repos, it does not feel wise to stack up another year-long transition behind that. What if I need to hurry up a new repo version for some other change? Added todo so I remember to make this change at some time when a v11 and probably v12 repo version do make sense. Sponsored-by: Dartmouth College's DANDI project	2022-07-20 13:23:55 -04:00
Joey Hess	36f0bdcd57	add annex.alwayscompact Added annex.alwayscompact setting which can be unset to speed up writes to the git-annex branch in some cases. Sponsored-by: Dartmouth College's DANDI project	2022-07-18 16:39:19 -04:00
Joey Hess	a2b1f369d1	disable journalIgnorable in enableInteractiveBranchAccess Fix a reversion that prevented --batch commands (and the assistant) from noticing data written to the journal by other commands. I have not identified which commit broke this for sure, but probably it was `aeca7c2207`. --batch commands that wrote to the journal avoided the problem since journalIgnorable gets unset on write. It's a little bit surprising that nobody noticed that query --batch commands did not see data written by other commands. Sponsored-by: Dartmouth College's DANDI project	2022-07-15 13:48:41 -04:00
Joey Hess	093ad89ead	S3: Avoid writing or checking the uuid file in the S3 bucket when importtree=yes or exporttree=yes It does not make sense for either; importing from an existing bucket should not write to it. And the user may not have write access at all. And exporting to a bucket should not write other files. Also this prevents the uuid file being imported after being written. Sponsored-by: Dartmouth College's DANDI project	2022-07-14 15:05:51 -04:00
Joey Hess	50c2cac7e7	adb: Added configuration setting oldandroid=true To avoid using find -printf, which was first supported in Android around 2019-2020. Probing seems too fragile, and execing stat once per file is too slow to do when there's a faster way available, which brought me to an option... Sponsored-by: Brett Eisenberg on Patreon	2022-07-13 18:00:47 -04:00
Joey Hess	fbc3c223a6	filter-process: Fix protocol for empty files This caused git to complain that filter-process failed and kill it with signal 15. Because it wrote an extra flushPkt for an empty file, which git did not expect, and so git saw an unexpected response to the next request. Luckily, filter-process is only used by default in v9 and up, and v8 is still the default. Also, git had to be updating an empty file, followed by another file, which is a fairly unlikely situation. And git restarts filter-process after this happens and uses it to filter the rest of the files. So this isn't a crippling bug. Sponsored-by: Luke Shumaker on Patreon	2022-07-13 17:13:54 -04:00
Joey Hess	201e41cffd	add: Fix reversion when adding an annex link that has been moved to another directory Fixes commit `f259be7f39` Sponsored-by: Dartmouth College's Datalad project	2022-07-05 16:22:41 -04:00
Joey Hess	d01530ac21	Revert "lts-19.13 (ghc 9.0.2)" This reverts commit `d2bc268317`. That seemed to break building on windows, before it starts building git-annex at all, it tried to install ghc and something blew up: Processing archive: C:\Users\runneradmin\AppData\Local\Programs\stack\x86_64-windows\ghc-9.0.2.tar.xz Extracting ghc-9.0.2.tar ... Extracted total of 11790 files from ghc-9.0.2.tar C:\Users\runneradmin\AppData\Local\Programs\stack\x86_64-windows\ghc-9.0.2-tmp-6d0fbe7f3b29e56c\ghc-9.0.2\: renameDirectory:pathIsDirectory:CreateFile "\\\\?\\C:\\Users\\runneradmin\\AppData\\Local\\Programs\\stack\\x86_64-windows\\ghc-9.0.2-tmp-6d0fbe7f3b29e56c\\ghc-9.0.2\\": does not exist (The system cannot find the file specified.) Hopefully a newer ghc version or updated stackage version will fix this at some point, in the meantime revert it.	2022-07-05 13:13:25 -04:00
Joey Hess	02ef3d6a64	fix build with assistant disabled and webapp enabled The webapp modules cannot build with the assistant disabled, so make the webapp be under the assistant build flag. Sponsored-by: Jarkko Kniivilä on Patreon	2022-06-29 14:19:18 -04:00
Joey Hess	b223988e22	remove --backend from global options --backend is no longer a global option, and is only accepted by commands that actually need it. Three commands that used to support backend but don't any longer are watch, webapp, and assistant. It would be possible to make them support it, but I doubt anyone used the option with these. And in the case of webapp and assistant, the option was handled inconsistently, only taking effect when the command is run with an existing git-annex repo, not when it creates a new one. Also, renamed GlobalOption etc to AnnexOption. Because there are many options of this type that are not actually global (any more) and get added to commands that need them. Sponsored-by: Kevin Mueller on Patreon	2022-06-29 13:33:25 -04:00
Joey Hess	21c50c0f72	fix parallel copy from/to a local git repo Improve handling of parallelization with -J when copying content from/to a git remote that is a local path. Sponsored-by: Nicholas Golder-Manning on Patreon	2022-06-29 12:40:12 -04:00
Joey Hess	d2bc268317	lts-19.13 (ghc 9.0.2)	2022-06-28 14:49:33 -04:00
Joey Hess	c1b9ea2759	The 23 never happened release. It's 24 somewhere..	2022-06-23 13:55:54 -04:00
Joey Hess	57d088e9c2	fix release version	2022-06-23 13:35:14 -04:00
Joey Hess	86968a4047	releasing package git-annex version 10.20220526	2022-06-23 13:31:36 -04:00
Joey Hess	f259be7f39	fix overwrite race with small file that got large When adding a small file, it does not get locked down, so can be modified after git-annex checks that it's small. The use of queued git add made the race window nice and wide too. Fixed by checking if the file has changed, and by not using git add. Instead, have to recapitulate git add's handling of things like symlinks and executable files. Sponsored-by: Jochen Bartl on Patreon	2022-06-14 16:38:56 -04:00
Joey Hess	78a3d44ea0	get rid of racy addLink The remaining callers all did not rely on it checking gitignore, so were easy to convert. They were susceptible to the same overwrite race as add and fix, although less likely to have it and a narrower window than add's race. Command.Rekey in passing got an unnecessary call to removeFile deleted. addSymlink handles deleting any existing worktree file.	2022-06-14 14:47:15 -04:00
Joey Hess	64c7f60f7a	fixed overwrite race with git-annex fix Similar to git-annex add, git-annex fix queued git add, so if a file got modified before git add ran, the wrong content would be staged, perhaps a large file content. Sponsored-by: Brock Spratlen on Patreon	2022-06-14 14:19:58 -04:00
Joey Hess	dd6dec4eb1	fix add overwrite race with git-annex add to annex This is not a complete fix for all such races, only the one where a large file gets changed while adding and gets added to git rather than to the annex. addLink needs to go away, any caller of it is probably subject to the same kind of race. (Also, addLink itself fails to check gitignore when symlinks are not supported.) ingestAdd no longer checks gitignore. (It didn't check it consistently before either, since there were cases where it did not run git add!) When git-annex import calls it, it's already checked gitignore itself earlier. When git-annex add calls it, it's usually on files found by withFilesNotInGit, which handles checking ignores. There was one other case, when git-annex add --batch calls it. In that case, old git-annex behaved rather badly, it would seem to add the file, but git add would later fail, leaving the file as an unstaged annex symlink. That behavior has also been fixed. Sponsored-by: Brett Eisenberg on Patreon	2022-06-14 13:37:19 -04:00
Joey Hess	6d0b243d9d	avoid cleaning up move log when drop from remote fails move: Improve resuming a move that succeeded in transferring the content, but where dropping failed due to eg a network problem, in cases where numcopies checks prevented the resumed move from dropping the object from the source repository. This was earlier done for moves that got interrupted during the drop stage. Sponsored-by: Svenne Krap on Patreon	2022-06-09 15:26:25 -04:00
Joey Hess	13fc6a9b6a	fix to use 1 chunk for empty file Fix retrieval of an empty file that is stored in a special remote with chunking enabled. The speculative chunk stuff caused a reversion by adding an empty list for the empty file. Which is just wrong; the empty file is still stored on the remote, and should be retrieved like any other file. It uses 1 chunk, so `max 1` is the simple fix. Sponsored-by: Noam Kremen on Patreon	2022-06-09 14:24:56 -04:00
Joey Hess	14584e7a38	initremote type=git probe uuid rather than matching path of an existing remote to find the uuid. The main benefit of this is that locations not using ssh:// will work now, including both paths and host:/path The other benefit is that it's a simpler interface, no need to have an existing remote with the same url and some other name. Although that will still work of course. This does rely on tryGitConfigRead working when given a Git.Repo that is not a remote. Luckily, it works fine that way. Also, tryGitConfigRead will auto-init a local repo that has a git-annex branch. I did not enable auto-init of ssh repos though. The uuid discovery actually happens twice; initremote discovers it, and uses it to store the special remote config, but does not set it in the git remote it creates. So the next run of git-annex does uuid discovery again, and caches it that time. This could be improved for a tiny speedup, but I didn't want to complicate things for that in this commit. Sponsored-by: Dartmouth College's DANDI project	2022-06-09 13:16:50 -04:00
Joey Hess	c59ea5b1ca	info: Added --autoenable option Use cases include using git-annex init --no-autoenable and then going back and enabling the special remotes that have autoenable configured. As well as just querying to remember which ones have it enabled. It lists all special remotes that have autoenable=yes whether currently enabled or not. And it can be used with --json. I pondered making this "git-annex info autoenable", but that seemed wrong because then if the user has a directory named "autoenable", it's unclear what they are asking for. (Although "git-annex info remote" may be similarly unclear.) Making it an option does mean that it can't be provided via --batch though. Sponsored-by: Dartmouth College's Datalad project	2022-06-01 14:20:38 -04:00
Joey Hess	0d50c90794	init: Added --no-autoenable option Someone may disagree with what repositories are set to autoenable and it's good to have local overrides. See https://github.com/datalad/datalad/issues/6634 Sponsored-by: Dartmouth College's Datalad project	2022-06-01 13:27:49 -04:00
Joey Hess	b60d85c4c0	releasing package git-annex version 10.20220525	2022-05-25 14:01:31 -04:00
Joey Hess	85f9193167	fix git-annex test -p test: When limiting tests to run with -p, work around tasty limitation by automatically including dependent tests. This fixes a reversion because it didn't used to use dependencies and forced tasty to run the init tests first. That changed when parallelizing the test suite. It will sometimes do a little more work than strictly required, because it adds init tests deps when limited to eg quickcheck tests, which don't depend on them. But this only adds a few seconds work. Sponsored-by: Dartmouth College's Datalad project	2022-05-23 14:24:54 -04:00
Joey Hess	af0d854460	deal with git's changes for CVE-2022-24765 Deal with git's recent changes to fix CVE-2022-24765, which prevent using git in a repository owned by someone else. That makes git config --list not list the repo's configs, only global configs. So annex.uuid and annex.version are not visible to git-annex. It displayed a message about that, which is not right for this situation. Detect the situation and display a better message, similar to the one other git commands display. Also, git-annex init when run in that situation would overwrite annex.uuid with a new one, since it couldn't see the old one. Add a check to prevent it running too in this situation. It may be that this fix has security implications, if a config set by the malicious user who owns the repo causes git or git-annex to run code. I don't think any git-annex configs get run by git-annex init. It may be that some git config of a command does get run by one of the git commands that git-annex init runs. ("git status" is the command that prompted the CVE-2022-24765, since core.fsmonitor can cause it to run a command). Since I don't know how to exploit this, I'm not treating it as a security fix for now. Note that passing --git-dir makes git bypass the security check. git-annex does pass --git-dir to most calls to git, which it does to avoid needing chdir to the directory containing a git repository when accessing a remote. So, it's possible that somewhere in git-annex it gets as far as running git with --git-dir, and git reads some configs that are unsafe (what CVE-2022-24765 is about). This seems unlikely, it would have to be part of git-annex that runs in git repositories that have no (visible) annex.uuid, and git-annex init is the only one that I can think of that then goes on to run git, as discussed earlier. But I've not fully ruled out there being others.. 
The git developers seem mostly worried about "git status" or a similar command implicitly run by a shell prompt, not an explicit use of git in such a repository. For example, Ævar Arnfjörð Bjarma wrote: > * There are other bits of config that also point to executable things, > e.g. core.editor, aliases etc, but nothing has been found yet that > provides the "at a distance" effect that the core.fsmonitor vector > does. > > I.e. a user is unlikely to go to /tmp/some-crap/here and run "git > commit", but they (or their shell prompt) might run "git status", and > if you have a /tmp/.git ... Sponsored-by: Jarkko Kniivilä on Patreon	2022-05-20 14:38:27 -04:00
Joey Hess	aa414d97c9	make fsck normalize object locations The purpose of this is to fix situations where the annex object file is stored in a directory structure other than where annex symlinks point to. But it will also move object files from the hashdirmixed back to hashdirlower if the repo configuration makes that the normal location. It would have been more work to avoid that than to let it do it. Sponsored-by: Dartmouth College's Datalad project	2022-05-16 15:38:06 -04:00
Joey Hess	54809e9eb3	fix untrustworthiness of import/export remotes Commit `36133f27c0` had a boolean flip in it, aaargh. Special remotes with importtree=yes or exporttree=yes are once again treated as untrusted, since files stored in them can be deleted or modified at any time. Sponsored-by: Kevin Mueller on Patreon	2022-05-09 15:53:23 -04:00
Joey Hess	e8a601aa24	incremental verification for retrieval from import remotes Sponsored-by: Dartmouth College's Datalad project	2022-05-09 15:39:43 -04:00
Joey Hess	d1cce869ed	implement dataUnits finally Added support for "megabit" and related bandwidth units in annex.stalldetection and everywhere else that git-annex parses data units. Note that the short form is "Mbit" not "Mb" because that differs from "MB" only in case, and git-annex parses units case-insensitively. It would be horrible if two different versions of git-annex parsed the same value differently, so I don't think "Mb" can be supported. See comment for bonus sad story from my childhood. Sponsored-by: Nicholas Golder-Manning	2022-05-05 15:25:11 -04:00
Joey Hess	4e4c44ed8e	hah, I mean 0504 of course	2022-05-04 11:47:40 -04:00
Joey Hess	cb0e89bf77	releasing package git-annex version 10.20220404	2022-05-04 11:46:56 -04:00
Joey Hess	0406c33f58	fix git-annex repair false positive Avoid treating refs/annex/last-index or other refs that are not commit objects as evidence of repository corruption. The repair code checks to find bad refs by trying to run `git log` on them, and assumes that no output means something is broken. But git log on a tree object is empty. This was worth fixing generally, not as a special case, since it's certainly possible that other things store tree or other objects in refs. Sponsored-by: Max Thoursie on Patreon	2022-05-04 11:32:23 -04:00
Joey Hess	43701759a3	disable shellescape for rsync 3.2.4 rsync 3.2.4 broke backwards-compatibility by preventing exposing filenames to the shell. Made the rsync and gcrypt special remotes detect this and disable shellescape. An alternative fix would have been to always set RSYNC_OLD_ARGS=1. Which would avoid the overhead of probing rsync --help for each affected remote. But that is really very fast to run, and it seemed better to switch to the modern code path rather than keeping on using the bad old code path. Sponsored-by: Tobias Ammann on Patreon	2022-05-03 12:12:41 -04:00
Joey Hess	280d41b58f	Fix a build failure with ghc 9.2.2 Thanks, gnezdo for the patch.	2022-05-02 14:21:48 -04:00
Joey Hess	17b20a2450	Fix test failure on NFS when cleaning up gpg temp directory Using removePathForcibly avoids concurrent removal problems. The i386ancient build still uses an old version of ghc and directory that do not include removePathForcibly though. Sponsored-by: Dartmouth College's Datalad project	2022-04-19 13:33:33 -04:00
Joey Hess	fd65de0eb9	multicast: Support uftp 5.0 by switching from aes256-cbc to aes256-gcm aes256-gcm is supported by both 4.x and 5.x, while 5.x dropped aes256-cbc. Sponsored-by: Graham Spencer on Patreon	2022-04-19 12:02:10 -04:00
Joey Hess	ff6b36c706	assistant prompt pushing of manual commits to remotes assistant: When annex.autocommit is set, notice commits that the user makes manually, and push them out to remotes promptly. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2022-03-31 13:02:16 -04:00
Joey Hess	d266a41f8d	prevent numcopies or mincopies being configured to 0 Ignore annex.numcopies set to 0 in gitattributes or git config, or by git-annex numcopies or by --numcopies, since that configuration would make git-annex easily lose data. Same for mincopies. This is a continuation of the work to make data only be able to be lost when --force is used. It earlier led to the --trust option being disabled, and similar reasoning applies here. Most numcopies configs had docs that strongly discouraged setting it to 0 anyway. And I can't imagine a use case for setting to 0. Not that there might not be one, but it's just so far from the intended use case of git-annex, of managing and storing your data, that it does not seem like it makes sense to cater to such a hypothetical use case, where any git-annex drop can lose your data at any time. Using a smart constructor makes sure every place avoids 0. Note that this does mean that NumCopies is for the configured desired values, and not the actual existing number of copies, which of course can be 0. The name configuredNumCopies is used to make that clear. Sponsored-by: Brock Spratlen on Patreon	2022-03-28 15:20:34 -04:00
Joey Hess	959beeea9f	releasing package git-annex version 10.20220322	2022-03-22 13:56:45 -04:00
Joey Hess	a460aa8b70	Removed the NetworkBSD build flag Debian stable and the i386ancient build both have a new enough network to not need this flag any longer. Sponsored-by: Svenne Krap on Patreon	2022-03-22 11:52:52 -04:00
Joey Hess	982eb7ed0d	remove vendored http-client-restricted Removed vendored copy of http-client-restricted, and removed the HttpClientRestricted build flag that avoided that dependency. http-client-restricted is in Debian stable, and the i386ancient build also uses it, so I think this vendored copy is no longer needed. Sponsored-by: Noam Kremen on Patreon	2022-03-22 11:50:06 -04:00
Joey Hess	42b6f24e67	reorder	2022-03-21 16:02:24 -04:00
Joey Hess	6079b0c72c	fix reversion add: Avoid unnecessarily converting a newly unlocked file to be stored in git when it is not modified, even when annex.largefiles does not match it. This fixes a reversion in version 10.20220222, where git-annex unlock followed by git-annex add, followed by git commit file could result in git thinking the file was modified after the commit. I do have half a mind to remove the withUnmodifiedUnlockedPointers part of git-annex add. It seems weird, despite that old bug report arguing a case of consistency that it ought to behave that way. When git-annex add surprises me, it seems likely it's wrong.. But for now, this is the smallest possible fix. Sponsored-by: Dartmouth College's Datalad project	2022-03-21 15:54:04 -04:00
Joey Hess	3e2f1f73cb	add back inode to directory special remote ContentIdentifier Directory special remotes with importtree=yes have changed to once more take inodes into account. This will cause extra work when importing from a directory on a FAT filesystem that changes inodes on every mount. To avoid that extra work, set ignoreinodes=yes when initializing a new directory special remote, or change the configuration of your existing remote: git-annex enableremote foo ignoreinodes=yes This will mean a one-time re-import of all contents from every directory special remote due to the changed setting. `73df633a62` thought it was too unlikely that there would be modifications that the inode number was needed to notice. That was probably right; it's very unlikely that a file will get modified and end up with the same size and mtime as before. But, what was not considered is that a program like NextCloud might write two files with different content so closely together that they share the mtime. The inode is necessary to detect that situation. Sponsored-by: Max Thoursie on Patreon	2022-03-21 13:12:02 -04:00
Joey Hess	025c18128b	test: Added --jobs option Default to the number of CPU cores, which seems about optimal on my laptop. Using one more saves me 2 seconds actually. Better packing of workers improves speed significantly. In 2 test runs, I saw segfaulting workers despite my attempt to work around that issue. So detect when a worker does, and re-run it. Removed installSignalHandlers again, because I was seeing an error "lost signal due to full pipe", which I guess was somehow caused by using it. Sponsored-by: Dartmouth College's Datalad project	2022-03-16 14:42:07 -04:00
Joey Hess	b1934cc794	changelog	2022-03-02 18:27:20 -04:00
Joey Hess	2fc46e1871	git-annex test from standalone speedup Avoid git-annex test being very slow when run from within the standalone linux tarball or OSX app. It may not really be necessary to add to PATH the directory where the git-annex binary resides, but it can't hurt. Most places where the test suite or git-annex run git-annex, they use programPath, so won't need a modified PATH. But I'm not sure if that's always the case. Sponsored-by: Dartmouth College's Datalad project	2022-03-01 16:08:55 -04:00
Joey Hess	ce91f10132	fix annex.skipunknown false error propagation Propagate nonzero exit status from git ls-files when a specified file does not exist, or a specified directory does not contain any files checked into git. The recent completion of the annex.skipunknown transition exposed this bug, that has unfortunately been lurking all along. It is also possible that git ls-files errors out for some other reason -- perhaps a permission problem -- and this will also fix error propagation in such situations. Sponsored-by: Dartmouth College's Datalad project	2022-02-28 12:54:56 -04:00
Joey Hess	f4b046252a	Run annex.thawcontent-command before deleting an object file In case annex.freezecontent-command did something that would prevent deletion. Sponsored-by: Dartmouth College's Datalad project	2022-02-24 14:11:02 -04:00
Joey Hess	28bc5ce232	ignore write bits being set when there is a freeze hook When annex.freezecontent-command is set, and the filesystem does not support removing write bits, avoid treating it as a crippled filesystem. The hook may be enough to prevent writing on its own, and some filesystems ignore attempts to remove write bits. Sponsored-by: Dartmouth College's Datalad project	2022-02-24 13:28:31 -04:00
Joey Hess	64ccb4734e	smudge: Warn when encountering a pointer file that has other content appended to it It will then proceed to add the file the same as if it were any other file containing possibly annexable content. Usually the file is one that was annexed before, so the new, probably corrupt content will also be added to the annex. If the file was not annexed before, the content will be added to git. It's not possible for the smudge filter to throw an error here, because git then just adds the file to git anyway. Sponsored-by: Dartmouth College's Datalad project	2022-02-23 15:17:08 -04:00
Joey Hess	67245ae00f	fully specify the pointer file format This format is designed to detect accidental appends, while having some room for future expansion. Detect when an unlocked file whose content is not present has gotten some other content appended to it, and avoid treating it as a pointer file, so that appended content will not be checked into git, but will be annexed like any other file. Dropped the max size of a pointer file down to 32kb, it was around 80 kb, but without any good reason and certainly there are no valid pointer files anywhere that are larger than 8kb, because it's just been specified what it means for a pointer file with additional data even looks like. I assume 32kb will be good enough for anyone. ;-) Really though, it needs to be some smallish number, because that much of a file in git gets read into memory when eg, catting pointer files. And since we have no use cases for the extra lines of a pointer file yet, except possibly to add some human-visible explanation that it is a git-annex pointer file, 32k seems as reasonable an arbitrary number as anything. Increasing it would be possible, eg to 64k, as long as users of such jumbo pointer files didn't mind upgrading all their git-annex installations to one that supports the new larger size. Sponsored-by: Dartmouth College's Datalad project	2022-02-23 14:20:31 -04:00
Joey Hess	1c4b0b4c2b	releasing package git-annex version 10.20220222	2022-02-22 13:33:45 -04:00
Joey Hess	ce1b3a9699	info: Allow using matching options in more situations File matching options like --include will be rejected in situations where there is no filename to match against. (Or where there is a filename but it's not relative to the cwd, or otherwise seemed too bothersome to match against.) The addition of listKeys' was necessary to avoid using more memory in the common case of "git-annex info". Adding a filterM would have caused the list to buffer in memory and not stream. This is an ugly hack, but listKeys had previously run Annex operations inside unsafeInterleaveIO (for direct mode). And matching against a matcher should hopefully not change any Annex state. This does allow for eg `git-annex info somefile --include=*.ext` although why someone would want to do that I don't really know. But it seems to make sense to allow it. But, consider: `git-annex info ./somefile --include=somefile` This does not match, so will not display info about somefile. If the user really wants to, they can `--include=./somefile`. Using matching options like --copies or --in=remote seems likely to be slower than git-annex find with those options, because unlike such commands, info does not have optimised streaming through the matcher. Note that `git-annex info remote` is not the same as `git-annex info --in remote`. The former shows info about all files in the remote. The latter shows local keys that are also in that remote. The output should make that clear, but this still seems like a point where users could get confused. Sponsored-by: Jochen Bartl on Patreon	2022-02-21 14:46:07 -04:00
Joey Hess	faf84aa5c2	Avoid git status taking a long time after git-annex unlock of many files. Implemented by making Git.Queue have a FlushAction, which can accumulate along with another action on files, and runs only once the other action has run. This lets git-annex unlock queue up git update-index actions, without conflicting with the restagePointerFiles FlushActions. In a repository with filter-process enabled, git-annex unlock will often not take any more time than before, though it may when the files are large. Either way, it should always slow down less than git-annex status speeds up. When filter-process is not enabled, git-annex unlock will slow down as much as git status speeds up. Sponsored-by: Jochen Bartl on Patreon	2022-02-18 15:06:40 -04:00
Joey Hess	07215cfeb5	complete annex.skipunknown transition annex.skipunknown now defaults to false, so commands like `git annex get foo*` will not silently skip over files/dirs that are not checked into git. Sponsored-by: Brock Spratlen on Patreon	2022-02-18 13:18:05 -04:00
Joey Hess	0edf01d7d4	registerurl,unregisterurl: rework output and support --json * registerurl, unregisterurl: Improved output when reading from stdin to be more like other batch commands. * registerurl, unregisterurl: Added --json and --json-error-messages options. Note that this did change the --batch output in a way that could possibly break something that expected the old output to never change. I think it's acceptable to break that because there has never been a guarantee of unchanging output format except with --batch for most commands. The old output was just really weird too! One possible wart is that "git-annex registerurl" with no options now seems to just hang, since it's waiting for stdin input. Before, it said "registerurl (stdin)" which was clearer about what's happening. But this is a deprecated mode anyway, --batch makes clear what's happening. If anything, this problem would be a reason to eventually remove the support for reading from stdin w/o --batch. Sponsored-by: Dartmouth College's Datalad project	2022-02-14 13:29:20 -04:00
Joey Hess	6992250d63	fix obviously wrong attoparsec parser takeByteString can only be used at the end of a parser, not before other input. This was a dumb enough mistake that I audited the rest of the code base for similar mistakes. Pity that attoparsec cannot avoid it at the type level. Fixes git-annex forget propagation between repositories. (reversion introduced in version 7.20190122) Sponsored-by: Brock Spratlen on Patreon	2022-02-07 14:15:17 -04:00
Joey Hess	46d5098ff4	Pass --no-textconv when running git diff internally Seems that --no-ext-diff and -c diff.external= are not enough to disable external diff command when gitattributes textconv specifies it. I'm pretty sure that --no-ext-diff and -c diff.external= are not both needed, but not 100%. Something about -G may need the latter to fully disable diffs in some cases. So kept that part as it was. Sponsored-by: Dartmouth College's Datalad project	2022-02-01 13:43:18 -04:00
Joey Hess	a32ff6cef0	adb: Avoid find failing with "Argument list too long" The "+" argument only runs the command once, so is not safe to use. Using ";" instead would have been the simplest fix, but also the slowest. Since my phone has an xargs that supports -0, I piped find to xargs instead. Unsure how portable this will be, perhaps some androids don't have xargs -0 or find -printf to send null terminated output. The business with pipefail is necessary to make a failure of find cause the import to fail. Probably this works on all androids, but if not, it will probably just result in a failure of find being ignored. It would be possible to make ignorefinderror just disable setting pipefail, but then if some android has a shell that has pipefail enabled by default, ignorefinderror would not work, so I kept the || true approach for that. Sponsored-by: Max Thoursie on Patreon	2022-01-31 13:19:09 -04:00
Joey Hess	e6e60b644b	releasing package git-annex version 10.20220127	2022-01-27 14:53:22 -04:00
Joey Hess	835c50966a	reject batch options combined with non-batch options Reject combinations of --batch (or --batch-keys) with options like --all or --key or with filenames. Most commands ignored the non-batch items when batch mode was enabled. For some reason, addurl and dropkey both processed first the specified non-batch items, followed by entering batch mode. Changed them to also error out, for consistency. Sponsored-by: Dartmouth College's Datalad project	2022-01-26 13:00:19 -04:00
Joey Hess	213185c788	clarify	2022-01-24 15:01:08 -04:00
Joey Hess	47084b8a1d	enable filter.annex.process in v9 This has tradeoffs, but is generally a win, and users who it causes git add to slow down unacceptably for can just disable it again. It needed to happen in an upgrade, since there are git-annex versions that do not support it, and using such an old version with a v8 repository with filter.annex.process set will cause bad behavior. By enabling it in v9, it's guaranteed that any git-annex version that can use the repository does support it. Although, this is not a perfect protection against problems, since an old git-annex version, if it's used with a v9 repository, will cause git add to try to run git-annex filter-process, which will fail. But at least, the user is unlikely to have an old git-annex in path if they are using a v9 repository, since it won't work in that repository. Sponsored-by: Dartmouth College's Datalad project	2022-01-21 13:11:18 -04:00
Joey Hess	7e7a7140ce	update for v10 Sponsored-by: Dartmouth College's Datalad project	2022-01-21 12:32:44 -04:00
Joey Hess	f54c58f0df	Avoid crashing when run in a bare git repo that somehow contains an index file Do not populate the keys database with associated files, because a bare repo has no working tree, and so it does not make sense to populate it. Queries of associated files in the keys database always return empty lists in a bare repo, even if it's somehow populated. One way it could be populated is if a user converts a non-bare repo to a bare repo. Note that Git.Config.isBare does a string comparison, so this is not free! But, that string comparison is very small compared to a sqlite query. Sponsored-by: Erik Bjäreholt on Patreon	2022-01-11 13:01:49 -04:00
Joey Hess	525473aa5a	adb: Added ignorefinderror configuration parameter On a phone with Calyxos, adb find in /sdcard complains: find: ./Android/data/com.android.providers.downloads.ui: Permission denied But otherwise works, so this option makes import and export work ok, except for that one app's data. Sponsored-by: Graham Spencer	2022-01-10 21:17:00 -04:00
Joey Hess	e95747a149	fix handling of corrupted data received from git remote Recover from corrupted content being received from a git remote due eg to a wire error, by deleting the temporary file when it fails to verify. This prevents a retry from failing again. Reversion introduced in version 8.20210903, when incremental verification was added. Only the git remote seems to be affected, although it is certainly possible that other remotes could later have the same issue. This only affects things passed to getViaTmp that return (False, UnVerified) due to verification failing. As far as getViaTmp can tell, that could just as well mean that the transfer failed in a way that would resume, so it cannot delete the temp file itself. Remote.Git and P2P.Annex use getViaTmp internally, while other remotes do not, which is why only it seems affected. A better fix perhaps would be to improve the types of the callback passed to getViaTmp, so that some other value could be used to indicate the state where the transfer succeeded but verification failed. Sponsored-by: Boyd Stephen Smith Jr.	2022-01-07 13:25:33 -04:00
Joey Hess	21c0d5be6e	comment	2022-01-07 12:27:19 -04:00
Joey Hess	e416635021	renameremote: Better handling of case where there are multiple special remotes with a name Instead of renaming one at random, error out and ask that a uuid be specified. Sponsored-by: Brett Eisenberg on Patreon	2022-01-05 15:24:02 -04:00
Joey Hess	58afb00f6e	enableremote: Better handling of the unusual case where multiple special remotes have been initialized with the same name Before it would pick one at random, though preferring ones that were not dead over dead ones. Now, if one is dead and the other not, it will use the non-dead one. But if both are not dead, or both dead, it will error out, suggesting the user clarify what they want to enable. Sponsored-by: Luke Shumaker on Patreon	2022-01-05 15:12:11 -04:00
Joey Hess	7e2f5edd68	avoid exporting non-annexed symlinks So that importing does not replace them with plain files. This works similarly to how the previous handling of submodules and matchers did, except that annexed symlinks still get exported as plain files of course, it's only non-annexed symlinks that it does not make sense to export. When symlinks have previously been exported, updating the export will unexport them after upgrading to this commit. Sponsored-by: Kevin Mueller on Patreon	2022-01-03 14:21:50 -04:00
Joey Hess	479ec0d533	releasing package git-annex version 8.20211231	2021-12-31 15:11:50 -04:00
Joey Hess	6d7ecd9e5d	merge git-annex branch in memory in read-only repository Improved support for using git-annex in a read-only repository, git-annex branch information from remotes that cannot be merged into the git-annex branch will now not crash it, but will be merged in memory. To avoid this making git-annex behave one way in a read-only repository, and another way when it can write, it's important that Annex.Branch.get return the same thing (modulo log file compaction) in both cases. This manages that mostly. There are some exceptions: - When there is a transition in one of the remote git-annex branches that has not yet been applied to the local or other git-annex branches. Transitions are not handled. - `git-annex log` runs git log on the git-annex branch, and so it will not be able to show information coming from the other, not yet merged branches. - Annex.Branch.files only looks at files in the git-annex branch and not unmerged branches. This affects git-annex info output. - Annex.Branch.hs.overBranchFileContents ditto. Affects --all and also importfeed (but importfeed cannot work in a read-only repo anyway). - CmdLine.Seek.seekFilteredKeys when precaching location logs. Note use of Annex.Branch.fullname - Database.ContentIdentifier.needsUpdateFromLog and updateFromLog These warts make this not suitable to be merged yet. This readonly code path is more expensive, since it has to query several branches. The value does get cached, but still large queries will be slower in a read-only repository when there are unmerged git-annex branches. When annex.merge-annex-branches=false, updateTo skips doing anything, and so the read-only repository code does not get triggered. So a user who is bothered by the extra work can set that. Other writes to the repository can still result in permissions errors. This includes the initial creation of the git-annex branch, and of course any writes to the git-annex branch. Sponsored-by: Dartmouth College's Datalad project	2021-12-27 13:21:15 -04:00
Joey Hess	5ff55f622d	improve sync message in export edge case sync: Better error message when unable to export to a remote because remote.name.annex-tracking-branch is configured to a ref that does not exist. It does not suggest how to fix the problem because there are several possible solutions: Change the git config to point to something that does exist, git add some files, or put files on the special remote that will be imported and so populate the ref. I considered just silently not doing anything, which is what it does when annex-tracking-branch = master and nothing has been committed to master yet. But it seems better to be explicit about it, since this is a fairly confusing situation to find yourself in. Sponsored-By: Max Thoursie on Patreon	2021-12-23 14:45:01 -04:00
Joey Hess	c2e46f4707	improve git command queue flushing with time limit So that eg, addurl of several large files that take time to download will update the index for each file, rather than deferring the index updates to the end. In cases like an add of many smallish files, where a new file is being added every few seconds, the queue will still build up a lot of changes which are flushed at once, for best performance. Since the default queue size is 10240, often it only gets flushed once at the end, same as before. (Notice that updateQueue updated _lastchanged when adding a new item to the queue without flushing it; that is necessary to avoid it flushing the queue every 5 minutes in this case.) But, when it takes more than 5 minutes to add a file, the overhead of updating the index immediately is probably small, so do it after each file. This avoids git-annex potentially taking a very very long time indeed to stage newly added files, which can be annoying to the user who would like to get on with doing something with the files it's already added, eg using git mv to rename them to a better name. This is only likely to cause a problem if it takes say, 30 seconds to update the index; doing an extra 30 seconds of work after every 5 minute file add would be less optimal. Normally, updating the index takes significantly less time than that. On a SSD with 100k files it takes less than 1 second, and the index write time is bound by disk read and write so is not too much worse on a hard drive. So I hope this will not impact users, although if it does turn out to, the time limit could be made configurable. A perhaps better way to do it would be to have a background worker thread that wakes up every 60 seconds or so and flushes the queue. That is made somewhat difficult because the queue can contain Annex actions and so this would add a new source of concurrency issues. So I'm trying to avoid that approach if possible. Sponsored-by: Erik Bjäreholt on Patreon	2021-12-14 12:23:19 -04:00
Joey Hess	dbba231e06	Improve error message display when autoinit fails Due to eg, a permissions problem.	2021-12-09 14:38:12 -04:00
Joey Hess	4b19626a36	Fix build with ghc 9.0.1 Continuing along the same lines as commit `2739adc258`, it seems that while Remote -> Retriever expands to the same data type this changes it to, ghc 9.0.1 refuses to consider them equivalent. I guess it has something to do with the forall? The rest of the build all succeeds, although the stack build then crashes: Linking .stack-work/dist/x86_64-linux-tinfo6/Cabal-3.4.0.0/build/git-annex/git-annex ... Completed 233 action(s). Prelude.chr: bad argument: 2214592520 This issue seems likely to be about it: https://github.com/commercialhaskell/stack/pull/5508 I'm building with stack from debian, version 2.3.3, so a newer stack probably avoids that. Anyway, despite that stack problem, the git-annex binary is built, and works. The stack.yaml I used for this build was patched as follows: diff --git a/stack.yaml b/stack.yaml index 8dac87c15..62c4b5b9d 100644 --- a/stack.yaml +++ b/stack.yaml @@ -1,6 +1,6 @@ flags: git-annex: - production: true + production: false assistant: true pairing: true torrentparser: true @@ -14,7 +14,7 @@ flags: httpclientrestricted: true packages: - '.' -resolver: lts-18.13 +resolver: nightly-2021-09-07 extra-deps: - IfElse-0.85 - aws-0.22 Sponsored-by: Graham Spencer on Patreon	2021-12-08 15:08:02 -04:00
Joey Hess	ae4c56b28a	Revert "fix too early close of shared lock file" This reverts commit `66b2536ea0`. I misunderstood commit `ac56a5c2a0` and caused a FD leak when pid locking is not used. A LockHandle contains an action that will close the underlying lock file, and that action is run when it is closed. In the case of a shared lock, the lock file is opened once for each LockHandle, and only the one for the LockHandle that is being closed will be closed.	2021-12-06 12:51:28 -04:00
Joey Hess	ed0afbc36b	avoid concurrent threads trying to take pid lock at same time Seems there are several races that happen when 2 threads run PidLock.tryLock at the same time. One involves checkSaneLock of the side lock file, which may be deleted by another process that is dropping the lock, causing checkSaneLock to fail. And even with the deletion disabled, it can still fail, probably due to linkToLock failing when a second thread overwrites the lock file. The same can happen when 2 processes do, but then one process just fails to take the lock, which is fine. But with 2 threads, some actions were failing even though the process as a whole had the pid lock held. Utility.LockPool.PidLock already maintains a STM lock, and since it uses LockShared, 2 threads can hold the pidlock at the same time, and when the first thread drops the lock, it will remain held by the second thread, and so the pid lock file should not get deleted until the last thread to hold it drops the lock. Which is the right behavior, and why a LockShared STM lock is used in the first place. The problem is that each time it takes the STM lock, it then also calls PidLock.tryLock. So that was getting called repeatedly and concurrently. Fixed by noticing when the shared lock is already held, and stop calling PidLock.tryLock again, just use the pid lock that already exists then. Also, LockFile.PidLock.tryLock was deleting the pid lock when it failed to take the lock, which was entirely wrong. It should only drop the side lock. Sponsored-by: Dartmouth College's Datalad project	2021-12-01 17:14:39 -04:00
Joey Hess	66b2536ea0	fix too early close of shared lock file This fixes a reversion introduced in commit `ac56a5c2a0`. I didn't notice there that it was handling the case of a shared lock file that was still open elsewhere by not running the close action. This was especially deadly when annex.pidlock is set, as it caused early deletion of the pid lock file. Sponsored-by: Dartmouth College's Datalad project	2021-12-01 17:06:28 -04:00
Joey Hess	567f63ba47	export: Avoid unnecessarily re-exporting non-annexed files that were already exported Commit `b6e4ed9aa7` made non-annexed files be re-uploaded every time, since they're not tracked in the location log, and it made it check the location log. Don't do that for non-annexed files. Sponsored-by: Brock Spratlen on Patreon	2021-11-29 14:02:38 -04:00
Joey Hess	01a5ee6998	addurl, youtube-dl: When --check-raw prevents downloading an url, still continue with any downloads that come after it, rather than erroring out Sponsored-By: Mark Reidenbach on Patreon	2021-11-28 19:40:06 -04:00
Joey Hess	1d513540e9	Fix build with old versions of feed library	2021-11-23 16:06:51 -04:00
Joey Hess	74fcc389d8	releasing package git-annex version 8.20211123	2021-11-23 15:20:24 -04:00
Joey Hess	5a7f253974	support git 2.34.0's handling of merge conflict between annexed and non-annexed file This version of git -- or its new default "ort" resolver -- handles such a conflict by staging two files, one with the original name and the other named file~ref. Use unmergedSiblingFile when the latter is detected. (It doesn't do that when the conflict is between a directory and a file or symlink though, so see previous commit for how that case is handled.) The sibling file has to be deleted separately, because cleanConflictCruft may not delete it -- that only handles files that are annex links, but the sibling file may be the non-annexed file side of the conflict. The graftin code had assumed that, when the other side of a conflict is a symlink, the file in the work tree will contain the non-annexed content that we want it to contain. But that is not the case with the new git; the file may be the annex link and needs to be replaced with the content, while the annex link will be written as a -variant file. (The weird doesDirectoryExist check in graftin turns out to still be needed, test suite failed when I tried to remove it.) Test suite passes with new git with ort resolver default. Have not tried it with old git or other defaults. Sponsored-by: Noam Kremen on Patreon	2021-11-22 16:10:24 -04:00
Joey Hess	766720d093	soften language in changelog This bug mostly would happen when the downloads ran very fast or were all failing (how I reproduced it), because there have to be two downloads that finish very close to the same time to trigger the race. So most users of -J probably would not see much impact from the bug.	2021-11-19 12:52:22 -04:00
Joey Hess	623a775609	fix cat-file leak in get with -J Bugfix: When -J was enabled, getting files leaked an ever-growing number of git cat-file processes. (Since commit `dd39e9e255`) The leak happened when mergeState called stopNonConcurrentSafeCoProcesses. While stopNonConcurrentSafeCoProcesses usually manages to stop everything, there was a race condition where cat-file processes were leaked. Because catFileStop modifies Annex.catfilehandles in a non-concurrency safe way, and could clobber modifications made in between. Which should have been ok, since originally catFileStop was only used at shutdown. Note the comment on catFileStop saying it should only be used when nothing else is using the handles. It would be possible to make catFileStop race-safe, but it should just not be used in a situation where a race is possible. So I didn't bother. Instead, the fix is just not to stop any processes in mergeState. Because in order for mergeState to be called, dupState must have been run, and it enables concurrency mode, stops any non-concurrent processes, and so all processes that are running are concurrency safe. So there is no need to stop them when merging state. Indeed, stopping them would be extra work, even if there was not this bug. Sponsored-by: Dartmouth College's Datalad project	2021-11-19 12:51:08 -04:00
Joey Hess	31be0770a5	importfeed: Display url before starting youtube-dl download It was displaying a blank line before.	2021-11-17 13:23:55 -04:00
Joey Hess	c3af94eff4	releasing package git-annex version 8.20211117	2021-11-17 12:20:29 -04:00
Joey Hess	2bd778a46e	importfeed: Fix a crash when used in a non-unicode locale See comment for analysis. At first I thought I'd need to convert all T.unpack in git-annex, but luckily not -- so long as the Text is read from a file, the filesystem encoding is applied and T.unpack is fine. It's only when using Feed that the filesystem encoding is not applied. While this fixes the crash, it does result in some mojibake, eg: itemid=http://www.manager-tools.com/2014/01/choosing-a-company-work-chapter-7-��-questions/ Have not tracked that down, but it must be unrelated, because I've verified that it roundtrips when using encodeUtf8: joey@darkstar:~/src/git-annex>LANG=C ghci Utility/FileSystemEncoding.hs ghci> useFileSystemEncoding ghci> Just f <- Text.Feed.Import.parseFeedFromFile "/home/joey/tmp/career_tools_podcasts.xml" ghci> Just (_, x) = Text.Feed.Query.getItemId (Text.Feed.Query.feedItems f !! 0) ghci> decodeBS (Data.Text.Encoding.encodeUtf8 x) "http://www.manager-tools.com/2014/01/choosing-a-company-work-chapter-7-\56546\56448\56467-questions/" ghci> writeFile "foo" $ decodeBS (Data.Text.Encoding.encodeUtf8 x) Writes a file containing the ENDASH character. Sponsored-by: Jochen Bartl on Patreon	2021-11-15 15:04:21 -04:00
Joey Hess	aa6e54ac6e	Fix a typo in the name of youtube-dl (reversion introduced in version 8.20210903)	2021-11-13 08:58:36 -04:00
Joey Hess	51b73ea1fc	migrate: New --remove-size option While intended for converting URL keys added by addurl --fast to be as if added by addurl --relaxed, it can also be used to remove size from other types of keys. Although that is not likely to be useful for checksummed keys, I suppose it could be used for WORM or other non-checksum keys. Specifying the --remove-size option does not prevent other migrations from taking effect if there's a key upgrade to perform, or if the backend has changed. So --backend=URL needs to be used to prevent migrating an URL key to the default backend. Note that it's not possible to use git-annex migrate to convert from a non-URL key to an URL key, as URL keys cannot be generated, except by addurl. So while this can get the same effect as --relaxed would have when addurl --fast was used, when --fast was not used, it won't work, or if --backend=URL is not used will remove the size but not prevent checksum verification, which is not useful. Due to this complexity, I decided not to mention it in the git-annex addurl man page. Sponsored-by: Jochen Bartl on Patreon	2021-11-12 13:28:28 -04:00
Joey Hess	f3326b8b5a	git-lfs gitlab interoperability fix git-lfs: Fix interoperability with gitlab's implementation of the git-lfs protocol, which requests Content-Encoding chunked. Sponsored-by: Dartmouth College's Datalad project	2021-11-10 13:51:11 -04:00
Joey Hess	9d3ce224e3	uninit edge cases * uninit: Avoid error message when no commits have been made to the repository yet. * uninit: Avoid error message when there is no git-annex branch. Sponsored-by: Svenne Krap on Patreon	2021-11-08 16:47:00 -04:00
Joey Hess	68257e9076	add git-annex filter-process filter-process: New command that can make git add/checkout faster when there are a lot of unlocked annexed files or non-annexed files, but that also makes git add of large annexed files slower. Use it by running: git config filter.annex.process 'git-annex filter-process' Fully tested and working, but I have not benchmarked it at all. And, incremental hashing is not done when git add uses it, so extra work is done in that case. Sponsored-by: Mark Reidenbach on Patreon	2021-11-04 15:02:36 -04:00
Joey Hess	438e5b56aa	tighter --json parsing for metadata metadata --batch --json: Reject input whose "fields" does not consist of arrays of strings. Such invalid input used to be silently ignored. Used to be that parseJSON for a JSONActionItem ran parseJSON separately for the itemAdded, and if that failed, did not propagate the error. That allowed different items with differently named fields to be parsed. But it was actually only used to parse "fields" for metadata, so that flexibility is not needed. The fix is just to parse "fields" as-is. AddJSONActionItemFields is needed only because of the wonky way Command.MetaData adds onto the started json object. Note that this line got a dummy type signature added, just because the type checker needs it to be some type. itemFields = Nothing :: Maybe Bool Since it's Nothing, it doesn't really matter what type it is, and the value gets turned into json and is then thrown away. Sponsored-by: Kevin Mueller on Patreon	2021-11-01 14:42:37 -04:00
Joey Hess	80f1354685	metadata --batch: Avoid crashing when a non-annexed file is input Turns out that CommandStart actions do not have their exceptions caught, which is why the giveup was causing a crash. Mostly these actions do not do very much work on their own, but it does seem possible there are other commands whose CommandStart also throws an exception. So, my first attempt at a fix was to catch those exceptions. But, --json-error-messages then causes a difficulty, because in order to output a json error message, an action needs to have been started; that sets up the json object that the error message will be included in a field of. While it would be possible to output an object with just an error field, this would be json output of a format that the user has no reason to expect, that happens only in an exceptional circumstance. That is something I have always wanted to avoid with the json output; while git-annex man pages don't document what the json looks like, the output has always been made to be self-describing. Eg, it includes "error-messages":[] even when there's no errors. With that ruled out, it doesn't seem a good idea to catch CommandStart exceptions and display the error to stderr when --json-error-messages is set. And so I don't know if it makes sense to catch exceptions from that at all. Maybe I'd have a different opinion if --json-error-messages did not exist though. So instead, output a blank line like other batch commands do. This also leaves open the possibility of implementing support for matching object with metadata --json, which would also want to output a blank line when the input didn't match. Sponsored-by: Dartmouth College's DANDI project	2021-11-01 13:40:43 -04:00
Joey Hess	c260833a6b	releasing package git-annex version 8.20211028	2021-10-28 12:00:56 -04:00
Joey Hess	eb95ed4863	fix addurl concurrency issue addurl: Support adding the same url to multiple files at the same time when using -J with --batch --with-files. Implementation was easier than expected, was able to reuse OnlyActionOn. While it will download the url's content multiple times, that seems like the best thing to do; see my comment for why. Sponsored-by: Dartmouth College's DANDI project	2021-10-27 16:15:41 -04:00
Joey Hess	669037862a	avoid redundant freezeContent call This opens the potential for the object file to be in place but git-annex is interrupted before it can freeze it. git-annex fsck already fixes that situation, which can also occur when lockContentForRemoval thaws content. Also improve comment to not be Windows-specific.	2021-10-27 14:18:10 -04:00
Joey Hess	0756625e1b	update, bugfix also fixed git-annex info	2021-10-27 12:22:02 -04:00
Joey Hess	b2c48fb86b	Fix using lookupkey inside a subdirectory Caused by dirContains ".." "foo" being incorrectly False. Also added a test of dirContains, which includes all the previous bug fixes I could find and some obvious cases. Reversion in version 8.20211011 Sponsored-by: Brett Eisenberg on Patreon	2021-10-26 15:00:45 -04:00
Joey Hess	5a9e6b1fd4	when private journal file exists, still read from git-annex branch Fix bug that caused stale git-annex branch information to be read when annex.private or remote.name.annex-private is set. The private journal file should not prevent reading more current information from the git-annex branch, but used to. Note that overBranchFileContents has to do additional work now: when there's a private journal file, it reads from the branch redundantly and more slowly. Sponsored-by: Jack Hill on Patreon	2021-10-26 13:43:50 -04:00
Joey Hess	2801528eb2	oops, I misread, still happens for adjusted branches	2021-10-20 13:45:56 -04:00
Joey Hess	f7b5a5c9ed	changelog A user tested `0f38ad9a69` on WSL, and it seems to have fixed the problem.	2021-10-20 13:26:01 -04:00
Joey Hess	f4bdecc4ec	improve sqlite MultiWriter handling of read after write This removes a messy caveat that was easy to forget and caused at least one bug. The price paid is that, after a write to a MultiWriter db, it has to close the db connection that it had been using to read, and open a new connection. So it might be a little bit slower. But, writes are usually batched together, so there's often only a single write, and so there should not be much of a slowdown. Notice that SingleWriter already closed the db connection after a write, so paid the same overhead. This is the second try at fixing a bug: git-annex get when run as the first git-annex command in a new repo did not populate all unlocked files. (Reversion in version 8.20210621) Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-10-19 15:13:29 -04:00
Joey Hess	29d687dce9	When retrieval from a chunked remote fails, display the error that occurred when downloading the chunk Rather than the error that occurred when trying to download the unchunked content, which is less likely to actually be stored in the remote. Sponsored-by: Boyd Stephen Smith Jr. on Patreon	2021-10-14 12:45:05 -04:00
Joey Hess	b36cc0320e	avoid crashing tilde expansion on user who does not exist git does not crash when there's a remote configured for a user who does not exist, and this prevents git-annex from crashing too. Consider that a user might exist on one system but not another, and the git repo be moved between systems. So not crashing is desirable. Note that git fetch seems to mishandle a remote path like ~foo/bar when the user does not exist. While it does access ./~foo/bar, and gets as far as running git-upload-pack on the path, it then complains there is no such repo. So different parts of git seem to be doing different things in that edge case. Anyway, git-annex does not need to be bug-for-bug compatible with git. Sponsored-by: Jack Hill on Patreon	2021-10-13 09:16:36 -04:00
Joey Hess	c2a44eab50	move gpg tmp home to system temp dir test: Put gpg temp home directory in system temp directory, not filesystem being tested. Since I've found indications gpg can fail talking to the agent when the socket ends up on eg, fat. And to hopefully fix this bug report I've followed up on. The main risk in using the system temp dir is that TMPDIR could be set to a long directory path, which is too long to put a unix socket in. To partially ameliorate that risk, it uses either an absolute or a relative path, whichever is shorter. (Hopefully gpg will not convert it to a longer form of the path..) If the user sets TMPDIR to something so long a path to it + "S.gpg-agent" is too long, I suppose that's their issue to deal with. Sponsored-by: Dartmouth College's Datalad project	2021-10-12 13:29:56 -04:00
Joey Hess	17a0fa3dbc	negotiate P2P protocol version for tor remotes This negotiation is not supported by versions of git-annex older than 6.20180312. Well, maybe really 6.20180227 or so, but using that in the changelog simplifies things since it was the version for the other changes as well. See commit `c81768d425` for the back story. As well as allowing for future protocol improvements, this will result in negotiating protocol version 1, which is an improvement over default version 0. In fact, it looks like no supported version of git-annex will use protocol version 0, since version 1 was introduced in 6.20180227. Still, removing the code for version 0 seems unnecessary. See commit `31e1adc005`. Sponsored-by: Brett Eisenberg on Patreon.	2021-10-11 15:58:51 -04:00
Joey Hess	f8816d2b92	remove list of removed commands the list was wrong and also users shouldn't need to know	2021-10-11 15:43:19 -04:00
Joey Hess	7bdc7350a5	remove git-annex-shell compat code * Removed support for accessing git remotes that use versions of git-annex older than 6.20180312. * git-annex-shell: Removed several commands that were only needed to support git-annex versions older than 6.20180312. (lockcontent, recvkey, sendkey, transferinfo, commit) The P2P protocol was added in that version, and used ever since, so this code was only needed for interop with older versions. "git-annex-shell commit" is used by newer git-annex versions, though unnecessarily so, because the p2pstdio command makes a single commit at shutdown. Luckily, it was run with stderr and stdout sent to /dev/null, and non-zero exit status or other exceptions are caught and ignored. So, that was able to be removed from git-annex-shell too. git-annex-shell inannex, recvkey, sendkey, and dropkey are still used by gcrypt special remotes accessed over ssh, so those had to be kept. It would probably be possible to convert that to using the P2P protocol, but it would be another multi-year transition. Some git-annex-shell fields were able to be removed. I hoped to remove all of them, and the very concept of them, but unfortunately autoinit is used by git-annex sync, and gcrypt uses remoteuuid. The main win here is really in Remote.Git, removing piles of hairy fallback code. Sponsored-by: Luke Shumaker	2021-10-11 15:36:51 -04:00
Joey Hess	e28cf82b45	releasing package git-annex version 8.20211011	2021-10-11 12:53:17 -04:00
Joey Hess	022bb6174c	Merge branch 'borgchunks'	2021-10-08 13:26:45 -04:00
Joey Hess	69f8e6c7c0	ImportableContentsChunkable This improves the borg special remote memory usage, by letting it only load one archive's worth of filenames into memory at a time, and building up a larger tree out of the chunks. When a borg repository has many archives, git-annex could easily OOM before. Now, it will use only memory proportional to the number of annexed keys in an archive. Minor implementation wart: Each new chunk re-opens the content identifier database, and also a new vector clock is used for each chunk. This is a minor inefficiency only; the use of continuations makes it hard to avoid, although putting the database handle into a Reader monad would be one way to fix it. It may later be possible to extend the ImportableContentsChunkable interface to remotes that are not third-party populated. However, that would perhaps need an interface that does not use continuations. The ImportableContentsChunkable interface currently does not allow populating the top of the tree with anything other than subtrees. It would be easy to extend it to allow putting files in that tree, but borg doesn't need that so I left it out for now. Sponsored-by: Noam Kremen on Patreon	2021-10-08 13:15:22 -04:00
Joey Hess	1c11dd4793	avoid cursor jitter when updating progress display When the progress display gets longer, and then shorter again, it causes the cursor to jitter back and forth. Somehow I never noticed this until this morning, but then it became intolerable to watch. To fix it, pad the progress display to the maximum length it's occupied. Sponsored-by: Svenne Krap on Patreon	2021-10-07 11:16:41 -04:00
Joey Hess	45dfddd33f	convert ExportLocation to ShortByteString to avoid PINNED memory fragmentation This adds the overhead of a copy whenever converting to/from ExportLocation and ImportLocation. borg: Some improvements to memory use when importing a lot of archives. (It's still pretty bad.) Sponsored-by: Mark Reidenbach on Patreon	2021-10-05 14:51:55 -04:00
Joey Hess	9012fa0187	reinject: Fix crash when reinjecting a file from outside the repository Commit `4bf7940d6b` introduced this problem, but was otherwise doing a good thing. Problem being that fileRef "/foo" used to return ":./foo", which was actually wrong, but as long as there was no foo in the local repository, catKey could operate on it without crashing. After that fix though, fileRef would return eg "../../foo", resulting in fileRef returning ":./../../foo", which will make git cat-file crash since that's not a valid path in the repo. Fix is simply to make fileRef detect paths outside the repo and return Nothing. Then catKey can be skipped. This needed several bugfixes to dirContains as well, in previous commits. In Command.Smudge, this led to needing to check for Nothing. That case should actually never happen, because the fileoutsiderepo check will detect it earlier. Sponsored-by: Brock Spratlen on Patreon	2021-10-01 14:06:34 -04:00
Joey Hess	b9a1cc512d	avoid unnecessary call to inAnnex sync --content: Avoid a redundant checksum of a file that was incrementally verified, when used on NTFS and perhaps other filesystems. When sync has just gotten the content, it does not need to check inAnnex a second time. On NTFS, for some reason the write of the inode cache after it gets the content is not immediately able to be read, and with an empty/non-matching inode cache due to that stale data, inAnnex falls back to hashing the whole object to determine if it's present. Sponsored-by: Brock Spratlen on Patreon	2021-10-01 12:02:35 -04:00
Joey Hess	b9aa2ce8d1	resume properly when copying a file to/from a local git remote is interrupted (take 2) This method avoids breaking test_readonly. Just check if the dest file exists, and avoid CoW probing when it does, so when CoW probing fails, it can resume where the previous non-CoW copy left off. If CoW has been probed already to work, delete the dest file since a CoW copy will presumably work. It seems like it would be almost as good to just skip CoW copying in this case too, but consider that the dest file might have started to be copied from some other remote, not using CoW, but CoW has been probed to work to copy from the current place. Sponsored-by: Dartmouth College's Datalad project	2021-09-27 16:03:01 -04:00
Joey Hess	7ccf642863	revert change that broke test_readonly commit `63d508e885` broke test_readonly. When a local git remote is readonly, tryCopyCoW run to copy a file from it failed at withOtherTmp. Sponsored-by: Dartmouth College's Datalad project	2021-09-27 16:02:41 -04:00
Joey Hess	9ea8106bb0	sped up git-annex smudge --clean by 25% Disabling git-annex branch update for this command is ok, because it does not use any information from the branch, but only logs the location when it adds a key. Sponsored-by: Dartmouth College's Datalad project	2021-09-24 14:15:20 -04:00

1498 commits