git-annex

Author	SHA1	Message	Date
Joey Hess	28720c795f	limit url downloads to whitelisted schemes Security fix! Allowing any schemes, particularly file: and possibly others like scp: allowed file exfiltration by anyone who had write access to the git repository, since they could add an annexed file using such an url, or using an url that redirected to such an url, and wait for the victim to get it into their repository and send them a copy. * Added annex.security.allowed-url-schemes setting, which defaults to only allowing http and https URLs. Note especially that file:/ is no longer enabled by default. * Removed annex.web-download-command, since its interface does not allow supporting annex.security.allowed-url-schemes across redirects. If you used this setting, you may want to instead use annex.web-options to pass options to curl. With annex.web-download-command removed, nearly all url accesses in git-annex are made via Utility.Url via http-client or curl. http-client only supports http and https, so no problem there. (Disabling one and not the other is not implemented.) Used curl --proto to limit the allowed url schemes. Note that this will cause git annex fsck --from web to mark files using a disallowed url scheme as not being present in the web. That seems acceptable; fsck --from web also does that when a web server is not available. youtube-dl already disabled file: itself (probably for similar reasons). The scheme check was also added to youtube-dl urls for completeness, although that check won't catch any redirects it might follow. But youtube-dl goes off and does its own thing with other protocols anyway, so that's fine. Special remotes that support other domain-specific url schemes are not affected by this change. In the bittorrent remote, aria2c can still download magnet: links. The download of the .torrent file is otherwise now limited by annex.security.allowed-url-schemes. This does not address any external special remotes that might download an url themselves. Current thinking is all external special remotes will need to be audited for this problem, although many of them will use http libraries that only support http and not curl's menagerie. The related problem of accessing private localhost and LAN urls is not addressed by this commit. This commit was sponsored by Brett Eisenberg on Patreon.	2018-06-16 11:57:50 -04:00
Joey Hess	0b7f6d24d3	rename BlobType and add submodule to it This was badly named; it's not necessarily a blob, but anything that a tree can refer to. Also removed the Show instance which was used for serialization to git format; fmtTreeItemType is used instead. This commit was supported by the NSF-funded DataLad project.	2018-05-14 14:45:41 -04:00
Joey Hess	89e1a05a8f	Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale As long as all code imports Utility.Aeson rather than Data.Aeson, and no Strings that may contain utf-8 characters are used for, e.g., object keys via T.pack, this is guaranteed to fix the problem everywhere that git-annex generates json. It's kind of annoying to need to wrap ToJSON with a ToJSON', especially since every data type that has a ToJSON instance has to be ported over. However, that only took 50 lines of code, which is worth it to ensure full coverage. I initially tried an alternative approach of a newtype FileEncoded, which had to be used everywhere a String was fed into aeson, and chasing down all the sites would have been far too hard. I did consider creating an intentionally overlapping instance ToJSON String, and letting ghc fail to build anything that passed in a String, but am not sure that wouldn't pollute some library that git-annex depends on that happens to use ToJSON String internally. This commit was supported by the NSF-funded DataLad project.	2018-04-16 16:21:21 -04:00
Joey Hess	6063b3df3f	Dial back optimisation when building on arm Prevent ghc and llc from running out of memory when optimising some files. Sean Whitton reported that doing this only in Test.hs was insufficient, the build still OOMed by the time it got to Test.hs. He had earlier found the build worked when these options are applied globally. See https://ghc.haskell.org/trac/ghc/ticket/14821 for why it needs -O1; once that's fixed it may suffice to use "GHC-Options: -O2 -optlo-O2", although it may also be that the -O1 prevents ghc from using/leaking as much memory. os(arm) should match armel, armhf, armeb, and arm. It probably also matches arm64, somewhat unfortunately since arm64 systems probably tend to have more memory. See list of arches in https://hackage.haskell.org/package/Cabal-1.22.2.0/docs/src/Distribution-System.html This commit was sponsored by Henrik Riomar on Patreon.	2018-03-04 19:48:07 -04:00
Joey Hess	8ccfbd14d0	Split Test.hs and avoid optimising it much, to need less memory to compile. The ghc options were found by Sean Whitton; the debian arm autobuilders need those to build w/o OOM, and it seems to involve llvm using too much memory to optimize Test. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2018-02-18 11:48:48 -04:00
Joey Hess	25703e1413	finally really add back custom-setup stanza Fourth or fifth try at this and finally found a way to make it work. Absurd amount of busy-work forced on me by change in cabal's behavior. Split up Utility modules that need posix stuff out of ones used by Setup. Various other hacks around inability for Setup to use anything that ifdefs a use of unix. Probably lost a full day of my life to this. This is how build systems make their users hate them. Just saying.	2017-12-31 16:36:39 -04:00
Joey Hess	1f5bf73af0	Revert "git-annex.cabal: Add back custom-setup stanza, so cabal new-build works." This reverts commit `51228c2306`. No, still doesn't work when built with cabal. It did with stack; stack must somehow make the unix package implicitly available. With cabal, System.Posix.Process and System.Posix.Env are both missing.	2017-12-31 14:09:41 -04:00
Joey Hess	51228c2306	git-annex.cabal: Add back custom-setup stanza, so cabal new-build works. Seems I had all the work in past commits to make this build, at least on linux. I'm actually surprised it does; without a unix dep, Utility.Env still builds ok somehow despite using System.Posix.Env. This commit was sponsored by Fernando Jimenez on Patreon.	2017-12-31 13:54:41 -04:00
Joey Hess	79857d7e9f	Removed the testsuite build flag Test suite is always included. Building with this flag disabled has actually been broken for some time, since Command.TestRemote uses tasty. Fewer build flags are better, so good time to drop it. This commit was sponsored by Thomas Hochstein on Patreon.	2017-12-20 12:25:03 -04:00
Joey Hess	308cd1383c	fold Build/SysConfig.hs into BuildInfo via include This avoids warnings from stack about the module not being listed in the cabal file. So, the generated file is also renamed to Build/SysConfig. Note that the setup program seems to be cached despite these changes; I had to cabal clean to get cabal to update it so that Build/SysConfig was written. This commit was sponsored by Jochen Bartl on Patreon.	2017-12-14 12:46:57 -04:00
Joey Hess	3cc94c1667	.noannex file A top-level .noannex file will prevent git-annex init from being used in a repository. This is useful for repositories that have a policy reason not to use git-annex. The content of the file will be displayed to the user who tries to run git-annex init. This also affects git annex reinit and initialization via the webapp. It does not affect automatic inits, when there's a sibling git-annex branch already. This commit was supported by the NSF-funded DataLad project.	2017-12-13 14:34:32 -04:00
Joey Hess	429132f496	try again to avoid directory removal issues on NFS `af6068525a` seems to not have worked; though the keys database should not have any files open after closeDb, NFS seems to be creating some files while the directory is being removed, which causes the removal to fail. So instead, try renaming the directory out of the way. This commit was supported by the NSF-funded DataLad project.	2017-12-05 14:25:51 -04:00
Joey Hess	1b6cbb63e9	still can't express custom-setup deps They need unix on non-windows, for Utility.Env, which Build.Configure uses, but cabal can't express that in a custom-setup stanza. To avoid this problem, Utility.Env would need to be moved into unix-compat.	2017-11-14 14:59:51 -04:00
Joey Hess	8d68112be5	split out setEnv to avoid adding dep Windows needs the setenv package in custom-setup, but I don't want to pull it in on unix, which would probably break some builds and need more work. Instead, split out setEnv to a separate module. Quite likely, unix-compat will get a portable environment layer, and then both modules can be removed from here. This commit was sponsored by Øyvind Andersen Holm.	2017-11-14 14:28:49 -04:00
Joey Hess	e696b086dc	avoid build warning on windows	2017-10-24 12:24:06 -04:00
Joey Hess	5c32196a37	fix process and FD leak Fix process and file descriptor leak that was exposed when git-annex was built with ghc 8.2.1. Apparently ghc has changed its behavior of GC of open file handles that are pipes to running processes. That broke git-annex test on OSX due to running out of FDs. Audited for all uses of Annex.new and made stopCoProcesses be called once it's done with the state. Fixed several places that might have leaked in other situations than running the test suite. This commit was sponsored by Ewen McNeill.	2017-09-29 22:36:08 -04:00
Joey Hess	f84e34883c	test: Fix reversion that made it only run inside a git repository. Using annexeval to run probeCrippledFileSystem' caused Git.CurrentRepo.get to be run. Fixed easily since probeCrippledFileSystem' had no need to use the Annex monad. This commit was sponsored by Ethan Aubin.	2017-09-29 15:08:18 -04:00
Joey Hess	db2a06b66f	init: Display an additional message when it detects a filesystem that allows writing to files whose write bit is not set.	2017-08-28 13:21:18 -04:00
Joey Hess	d39c120afa	add annex-ignore-command and annex-sync-command configs Added remote configuration settings annex-ignore-command and annex-sync-command, which are dynamic equivalents of the annex-ignore and annex-sync configurations. For this I needed a new DynamicConfig infrastructure. Its implementation should be as fast as before when there is no dynamic config, and it caches so shell commands are only run once. Note that annex-ignore-command exits nonzero when the remote should be ignored. While that may seem backwards, it allows using the same command for it as for annex-sync-command when you want to disable both. This commit was sponsored by Trenton Cronholm on Patreon.	2017-08-17 13:54:14 -04:00
Joey Hess	8526cd7c92	test: Avoid most situations involving failure to delete test directories By forking a worker process and only deleting the test directory once it exits. This way, if a test leaves files open, they'll get closed when the worker exits, thus avoiding failure to delete open files on Windows, and failure to delete directories due to NFS lock files. If a test leaves a git worker process running, the closed pipes should cause the worker to exit too, also avoiding the problem there. The 10 second sleep ought to give plenty of time for such worker processes to exit, although this is of course a race. Finally, even if the test directory still fails to be deleted, it won't appear as if the last test in the test suite failed; the error will be displayed at the very end. This commit was supported by the NSF-funded DataLad project.	2017-08-14 16:29:47 -04:00
Joey Hess	af6068525a	Fix a git-annex test failure when run on NFS due to NFS lock files preventing directory removal. Should fix this: lock (v6 --force): FAIL Exception: .git/annex/keys: removeDirectoryRecursive: unsatisfied constraints (Directory not empty) Verified that the test case still catches the regression it's meant to. This commit was supported by the NSF-funded DataLad project.	2017-08-14 15:11:42 -04:00
Joey Hess	2cecc8d2a3	Added GIT_ANNEX_VECTOR_CLOCK environment variable Can be used to override the default timestamps used in log files in the git-annex branch. This is a dangerous environment variable; use with caution. Note that this only affects writing to the logs on the git-annex branch. It is not used for metadata in git commits (other env vars can be set for that). There are many other places where timestamps are still used that don't get committed to git, but do touch disk, including regular timestamps of files, and timestamps embedded in some files in .git/annex/, such as the last fsck timestamp and timestamps in transfer log files. A good way to find such things in git-annex is to grep for getPOSIXTime and getCurrentTime, although some of the results are of course false positives that never hit disk (unless git-annex gets swapped out). So this commit does NOT necessarily make git-annex comply with some HIPAA privacy regulations; it's up to the user to determine if they can use it in a way compliant with such regulations. Benchmarking: It takes 0.00114 milliseconds to call getEnv "GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log files can be written with an added overhead of only 0.114 seconds. That should be by far swamped by the actual overhead of writing the log files and making the commit containing them. This commit was supported by the NSF-funded DataLad project.	2017-08-14 14:19:58 -04:00
Joey Hess	da8e84efe9	fix failing quickcheck properties QuickCheck 2.10 found a counterexample: "\929184" broke the property. As far as I can tell, Git.Filename is matching how git handles encoding of strange high unicode characters in filenames for display. Git does not display high unicode characters, and instead displays the C-style escaped form of each byte. This is ambiguous, but since git is not unicode aware, it doesn't need to roundtrip parse it. So, making Git.Filename's roundtrip test only chars < 256 seems fine. Utility.Format.format uses encode_c, in order to mimic git, so that's ok. Utility.Format.gen uses decode_c, but only so that stuff like "\n" in the format string is handled. If the format string contains C-style octal escapes, they will be converted to ascii characters, and not combined into unicode characters, but that should not be a problem. If the user wants unicode characters, they can include them in the format string, without escaping them. Finally, decode_c is used by Utility.Gpg.secretKeys, because gpg --with-colons hex-escapes some characters, in particular ':' and '\\'. gpg passes unicode through, so this use of decode_c is not a problem. This commit was sponsored by Henrik Riomar on Patreon.	2017-06-17 16:48:00 -04:00
Joey Hess	61b7af97ec	fix test suite breakage caused by GIT_ANNEX_USE_GIT_SSH	2017-04-07 16:56:04 -04:00
Joey Hess	df2639218a	improve git-annex-shell exit status propagation	2017-03-20 12:45:10 -04:00
Joey Hess	1484422e19	test move with ssh remote	2017-03-17 19:18:45 -04:00
Joey Hess	002513e194	test suite infra for testing mocked ssh remotes This commit was supported by the NSF-funded DataLad project.	2017-03-17 19:14:41 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	ca0daa8bb8	factor non-type stuff out of Key	2017-02-24 13:42:30 -04:00
Edward Betts	0750913136	correct spelling mistakes	2017-02-12 17:30:23 -04:00
Joey Hess	e7e36b6e72	import: Changed how --deduplicate, --skip-duplicates, and --clean-duplicates determine if a file is a duplicate Before, only content known to be present somewhere was considered a duplicate. Now, any content that has been annexed before will be considered a duplicate, even if all annexed copies of the data have been lost. Note that --clean-duplicates and --deduplicate still check numcopies, so won't delete duplicate files unless there's an annexed copy. This makes import use the same method as reinject --known. The man page already said that duplicate meant "its content is either present in the local repository already, or git-annex knows of another repository that contains it, or it was present in the annex before but has been removed now". So, this is really only bringing the implementation into line with the man page. This commit was sponsored by Jochen Bartl on Patreon.	2017-02-07 17:41:58 -04:00
Joey Hess	8484c0c197	Always use filesystem encoding for all file and handle reads and writes. This is a big scary change. I have convinced myself it should be safe. I hope!	2016-12-24 14:46:31 -04:00
Joey Hess	ee309d6941	lock: Fix edge cases where data loss could occur in v6 mode. In the case where the pointer file is in place, and not the content of the object, lock's performNew was called with filemodified=True, which caused it to try to repopulate the object from an unmodified associated file, of which there were none. So, the content of the object got thrown away incorrectly. This was the cause (although not the root cause) of data loss in https://github.com/datalad/datalad/issues/1020. The same problem could also occur when the work tree file is modified, but the object is not, and lock is called with --force. Added a test case for this, since it's exercising the same code path and is easier to set up than the problem above. Note that this only occurred when the keys database did not have an inode cache recorded for the annex object. Normally, the annex object would be in there, but there are of course circumstances where the inode cache is out of sync with reality, since it's only a cache. Fixed by checking if the object is unmodified; if so we don't need to try to repopulate it. This does add an additional checksum to the unlock path, but it's already checksumming the worktree file in another case, so it doesn't slow it down overall. Further investigation found a similar problem occurred when smudge --clean is called on a file and the inode cache is not populated. cleanOldKeys deleted the unmodified old object file in this case. This was also fixed by checking if the object is unmodified. In general, use of getInodeCaches and sameInodeCache is potentially dangerous if the inode cache has not gotten populated for some reason. Better to use isUnmodified. I briefly audited other places that check the inode cache, and did not see any immediate problems, but it would be easy to miss this kind of problem.	2016-10-17 13:58:43 -04:00
Joey Hess	66ebf1a8f9	add test case for sync_in_adjusted_branch_deleted_recently_added_files This commit was sponsored by Denis Dzyubenko on Patreon.	2016-10-11 14:22:49 -04:00
Joey Hess	aee5db0d47	squelch build warning on windows	2016-09-06 14:59:32 -04:00
Joey Hess	870873bdaa	Removed dependency on json library; all JSON is now handled by aeson. I've eyeballed all --json commands, and the only difference should be that some fields are re-ordered.	2016-07-26 19:15:34 -04:00
Joey Hess	0c713a94bd	uninit: Fix crash due to trying to write to deleted keys db. Reversion introduced by v6 mode support, affects v5 too. Also fix a similar crash when the webapp is used to delete a repository.	2016-07-12 14:18:35 -04:00
Joey Hess	b0682f2b5f	add test case for http://git-annex.branchable.com/bugs/Assistant_keeps_deleting_all_the_files_in_my_repo/	2016-06-13 12:59:10 -04:00
Joey Hess	b66e517b28	reproduced	2016-06-13 12:38:11 -04:00
Joey Hess	0ea1969275	update test suite for lock/unlock with missing file content change in v6	2016-06-09 16:16:39 -04:00
Joey Hess	5ed3e6df3c	better failure diagnosis	2016-06-03 12:59:11 -04:00
Joey Hess	b9ce477fa2	plumb RemoteGitConfig through to decryptCipher	2016-05-23 17:33:32 -04:00
Joey Hess	80b86ff78d	fix recent test suite reversion git annex adjust --force will overwrite any current adjusted branch. I didn't document this because for the user, deleting the branch is just as good.	2016-05-23 11:23:30 -04:00
Joey Hess	8f1525e35b	fix test suite breakage	2016-05-23 11:06:36 -04:00
Joey Hess	0273cd5005	adjusted branches need git 2.2.0 or newer When git-annex is used with a git version older than 2.2.0, disable support for adjusted branches, since GIT_COMMON_DIR is needed to update them and was first added in that version of git.	2016-04-22 12:29:32 -04:00
Joey Hess	d13819b503	use a separate tmp dir for the test home	2016-04-20 15:27:59 -04:00
Joey Hess	1de54ac671	a couple of tests chdir in ways that need an absolute path in the overridden HOME	2016-04-20 15:09:46 -04:00
Joey Hess	c3fdaf764d	Isolate test suite from global git config settings.	2016-04-20 15:04:38 -04:00
Joey Hess	ded990be8f	add simple test for conflict resolution in adjusted branch This is not really extensive enough, but a start..	2016-04-12 13:01:31 -04:00
Joey Hess	3120b63560	don't assume git-annex is in path when calling itself from test suite	2016-02-16 16:29:04 -04:00
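Commit 28720c795f restricts downloads to the schemes listed in annex.security.allowed-url-schemes (http and https by default) and stresses that the check has to hold across redirects. The Python sketch below illustrates only that idea; it is not git-annex's Haskell code. The function names, the assumption that the setting is a space-separated list, and modelling a redirect chain as a plain list of URLs are all hypothetical choices made for illustration.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Default per the commit message: only http and https are allowed.
DEFAULT_ALLOWED_SCHEMES = frozenset({"http", "https"})


def parse_allowed_schemes(setting: str | None) -> frozenset[str]:
    """Parse a space-separated scheme list (assumed format); unset or blank means the default."""
    if setting is None or not setting.strip():
        return DEFAULT_ALLOWED_SCHEMES
    return frozenset(s.lower() for s in setting.split())


def url_allowed(url: str, allowed: frozenset[str]) -> bool:
    """True if the URL's scheme is whitelisted; urlsplit already lowercases the scheme."""
    return urlsplit(url).scheme in allowed


def redirect_chain_allowed(chain: list[str], allowed: frozenset[str]) -> bool:
    """Check the original URL and every redirect target, not just the first hop."""
    return all(url_allowed(u, allowed) for u in chain)


if __name__ == "__main__":
    allowed = parse_allowed_schemes(None)
    print(url_allowed("https://example.com/file", allowed))  # True
    print(url_allowed("file:///etc/passwd", allowed))        # False
    # An http URL that redirects to a file: URL must be rejected as a whole.
    print(redirect_chain_allowed(["http://example.com/x", "file:///etc/passwd"], allowed))  # False
```

Checking every hop is the crux of the design: according to the commit message, annex.web-download-command was removed precisely because its interface could not enforce the whitelist across redirects.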
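Commit da8e84efe9 notes that git displays high Unicode characters in filenames as the C-style escape of each byte, so an escaped name cannot be reassembled one character at a time. The Python sketch below models that byte-wise behaviour. It is a simplified, hypothetical model: c_escape_bytes and c_unescape_bytes are illustrative names, not ports of git-annex's encode_c/decode_c, and they escape only backslash, double quote, newline, tab, and bytes outside printable ASCII.

```python
# Hypothetical helpers with simplified escaping rules; not git's or git-annex's exact code.
_ESCAPES = {ord("\\"): "\\\\", ord('"'): '\\"', ord("\n"): "\\n", ord("\t"): "\\t"}
_UNESCAPES = {"\\": "\\", '"': '"', "n": "\n", "t": "\t"}
_OCTAL = set("01234567")


def c_escape_bytes(raw: bytes) -> str:
    """Escape a filename's bytes C-style; every non-ASCII byte becomes a 3-digit octal escape."""
    out = []
    for b in raw:
        if b in _ESCAPES:
            out.append(_ESCAPES[b])
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append("\\%03o" % b)
    return "".join(out)


def c_unescape_bytes(text: str) -> bytes:
    """Reverse c_escape_bytes. Returns bytes: each octal escape yields one byte, not one character."""
    out = bytearray()
    i = 0
    while i < len(text):
        c = text[i]
        if c != "\\":
            out.extend(c.encode("utf-8"))
            i += 1
            continue
        digits = text[i + 1:i + 4]
        if len(digits) == 3 and set(digits) <= _OCTAL and int(digits, 8) <= 0xFF:
            out.append(int(digits, 8))
            i += 4
        elif text[i + 1:i + 2] in _UNESCAPES:
            out.extend(_UNESCAPES[text[i + 1]].encode("ascii"))
            i += 2
        else:
            raise ValueError(f"bad escape at offset {i}")
    return bytes(out)


if __name__ == "__main__":
    escaped = c_escape_bytes("é".encode("utf-8"))
    print(escaped)                                      # \303\251
    print(c_unescape_bytes(escaped).decode("utf-8"))    # é
```

Because the decoder returns bytes, the two octal escapes only become "é" again once those bytes are decoded as UTF-8; mapping each escape straight to a character would instead produce two separate characters below 256, which matches the behaviour the commit message describes for octal escapes in format strings.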


242 commits