git-annex

Author	SHA1	Message	Date
Joey Hess	7e69063a29	support annex.shared-sop-command for encryption=shared This works well, and it interoperates with gpg in my testing (although some SOP commands might choose to use a profile that does not so caveat emptor). Note that for creating the Cipher, gpg --gen-random is still used. SOP does not have an equivalent, and as long as the user has gpg around, which seems likely, it doesn't matter that it uses gpg here, it's not being used for encryption. That seemed better than implementing a second way to get high quality entropy, at least for now. The need for the sop command to run in an empty directory has each call to encrypt and decrypt creating a new temporary directory. That is some unnecessary overhead, though probably swamped by the overhead of running the sop command. This could be improved in the future by passing an already empty directory to them, or a sufficiently empty directory (.git/annex/tmp would probably suffice). Sponsored-by: Brett Eisenberg on Patreon	2024-01-12 13:31:18 -04:00
Joey Hess	dd3e779020	more groundwork for StatelessOpenPGP no behavior changes	2024-01-12 13:11:36 -04:00
Joey Hess	790600f7b2	close send side of password pipe on exec This avoids a hang approximately 1% of the time when running the test suite on StatelessOpenPGP. Since I've not seen git-annex hang when running git like that, I guess git probably does something that avoids hanging similarly. Still, fixed the same problem in Utility.Gpg too. Sponsored-by: Kevin Mueller on Patreon	2024-01-10 17:31:58 -04:00
Joey Hess	d98f02a5b0	test annex.shared-sop-command Test a specified Stateless OpenPGP command with eg: git-annex test --test-git-config annex.shared-sop-command=sqop Also documented that config and another one, but so far only the test suite uses the configs, have not yet implemented using it for actual symmetric encryption. Sponsored-by: Joshua Antonishen on Patreon	2024-01-10 16:30:38 -04:00
Joey Hess	812cbf0e17	Stateless OpenPGP interface Implemented according to https://www.ietf.org/archive/id/draft-dkg-openpgp-stateless-cli-09.html#name-encrypt-encrypt-a-message Not yet used by git-annex. Sponsored-by: Leon Schuermann on Patreon	2024-01-10 15:59:35 -04:00
Joey Hess	b728e935bc	correct comment	2024-01-10 15:59:16 -04:00
Joey Hess	478f0870d1	update comment	2024-01-10 13:24:09 -04:00
Joey Hess	de6a297d36	assistant: When generating a gpg secret key, avoid hardcoding the key algorithm and size This aims to future-proof gpg key generation. OpenPGP is in flux with a conflict over standards ongoing. It seems not unlikely that different systems will have different gpg commands that support different algorithms. This also simplifies the code by using the --quick-gen-key interface rather than the experimental batch interface. It seems less likely that --quick-gen-key will break than an experimental interface (whose documentation I can no longer find). --quick-gen-key is supported since gpg 2.1.0 (2014). Sponsored-by: Graham Spencer on Patreon	2024-01-09 15:31:53 -04:00
Joey Hess	0e9bc41588	Revert "import Data.Time.Clock to build with time-1.9.1" This reverts commit `7484191284`. Not necessary after all.	2023-12-27 19:11:15 -04:00
Joey Hess	7484191284	import Data.Time.Clock to build with time-1.9.1 In more recent versions Data.Time exports secondsToNominalDiffTime etc, but originally it did not.	2023-12-27 16:58:40 -04:00
Joey Hess	4b52657b37	fix build with old version of time package Can't truncate timestamp resolution with that version.	2023-12-27 15:33:46 -04:00
Joey Hess	c64db46b7f	refactor	2023-12-18 21:35:00 -04:00
Joey Hess	9a67ed0f10	importtree: support preferred content expressions needing keys When importing from a special remote, support preferred content expressions that use terms that match on keys (eg "present", "copies=1"). Such terms are ignored when importing, since the key is not known yet. When "standard" or "groupwanted" is used, the terms in those expressions also get pruned accordingly. This does allow setting preferred content to "not (copies=1)" to make a special remote into a "source" type of repository. Importing from it will import all files. Then exporting to it will drop all files from it. In the case of setting preferred content to "present", it's pruned on import, so everything gets imported from it. Then on export, it's applied, and everything in it is left on it, and no new content is exported to it. Since the old behavior on these preferred content expressions was for importtree to error out, there's no backwards compatibility to worry about. Except that sync/pull/etc will now import where before it errored out.	2023-12-18 16:27:59 -04:00
Joey Hess	eb59da9dd2	Lower precision of timestamps in git-annex branch This can reduce the size of the branch by up to 8%. My test was running git-annex add 1000 times on one file each. Lots of different high-resolution timestamps were recorded before and eliminating those, after packing, the git repo was 8% smaller. Due to the use of vector clocks, high resolution timestamps are not necessary to make clear which information is most recent when eg, a value is changed repeatedly in the same second. In such a case, the vector clock will be advanced to the next second after the last modification. For example, running git-annex numcopies 1; git-annex numcopies 2 The first will record the current second, while the next records the second after that even if it runs in the same second. As for conflicting information written to two different clones of the repository, this will make git-annex sometimes pick information that was written earlier in a second over information written later in the same second. Usually git-annex does not write conflicting information, but there are some cases where it could. Eg, storing an object on a remote can update the remote state log with some state. If two repos both store the same object, and end up storing different remote state for some reason, this can result in one that ran a tiny bit later winning. Such a situation seems unlikely to be user visible. And a small amount of clock skew could already result in such things. The only case I can think of where this might be a user visible change is if a configuration command like git-annex numcopies is being run in 2 clones of a repository on the same machine at very close to the same time. Then the user will know which they ran last, and git-annex won't. If that did become a problem, this could be dialed back to eg log milliseconds with still some space saving.	2023-12-11 15:04:06 -04:00
Joey Hess	d06aee7ce0	make commitMigration interruption safe Fixed inversion of control issue, so the tree is recorded in streamLogFile finalizer. Sponsored-by: Leon Schuermann on Patreon	2023-12-06 16:29:58 -04:00
Joey Hess	0bd8b17b59	log migration trees to git-annex branch This will allow distributed migration: Start a migration in one clone of a repo, and then update other clones. commitMigration is a bit of a bear.. There is some inversion of control that needs some TMVars. Also streamLogFile's finalizer does not handle recording the trees, so an interrupt at just the wrong time can cause migration.log to be emptied but the git-annex branch not updated. Sponsored-by: Graham Spencer on Patreon	2023-12-06 15:40:03 -04:00
Joey Hess	1a586f80e6	remove debug print	2023-12-05 15:56:58 -04:00
Joey Hess	a6eb7d7339	prevent relatedTemplate from truncating a filename to end in "." Avoid a problem with temp file names ending in "." on certain filesystems that have problems with such filenames. relatedTemplate is quite an ugly hack really; since it doesn't know the max filename length of the filesystem it can only assume that the filename is max allowed length. When given the input "lh.aparc.DKTatlas.annot", it wants to reserve 20 characters for tempfile so it truncates to "lh.". That ending period is apparently a problem on some filesystem (FAT eats it, but does not throw EINVAL; ntfs does not seem bothered by it, I don't know what FUSE filesystem the bug reporter was really using). Sponsored-by: Brett Eisenberg on Patreon	2023-12-05 12:38:14 -04:00
Joey Hess	c1037da2e5	attribution armoring Read a fun paper where they got chatgpt to emit large chunks of code https://not-just-memorization.github.io/extracting-training-data-from-chatgpt.html "ChatGPT memorized significant fractions of its training dataset"	2023-11-29 14:42:44 -04:00
Joey Hess	f1c2e18b8d	improve attribution armoring Split out an author parameter, will make it easier to add authors and reads better. Got rid of the function without the copyright year, because an adversary could have mechanically changed the function with a copyright year to the one without, and so bypassed the protection of LLM copyright year hallucination. Sponsored-by: Luke T. Shumaker on Patreon	2023-11-21 11:34:21 -04:00
Joey Hess	e901d31feb	exhaustiveness check fix	2023-11-20 21:34:29 -04:00
Joey Hess	dab9687184	improve attribution armoring	2023-11-20 21:20:37 -04:00
Joey Hess	d5d570a96c	avoid replacing otherwise While authorJoeyHess is True same as otherwise, ghc's exhaustiveness checker turns out to special case otherwise. So this avoids warnings.	2023-11-20 20:25:51 -04:00
Joey Hess	cda3e85164	make my authorship explicit in the code This is intended to guard against LLM code theft, which is the current bubble technology du jour. Note that authorJoeyHess' with a year older than the year I began developing git-annex will behave badly, by intention. Eg, it will spin and eventually crash. This is not the first anti-LLM protection in git-annex. For example see `9562da790f`. That method, while much harder for an adversary to detect and remove, also complicates code somewhat significantly, and needs extensions to be enabled. There are also probably significantly fewer ways to implement that method in Haskell. This new approach, by contrast, will be easy to add throughout the code base, with very little effort, and without complicating reading or maintaining it any more than noticing that yes, I am the author of this code. An adversary could of course remove all calls to these functions before feeding code into their LLM-based laundry facility. I think this would need to be done manually, or with the help of some fairly advanced Haskell parsing though. In some cases, authorJoeyHess needs to be removed, while in other places it needs to be replaced with a value. Also a monadic use of authorJoeyHess' may involve other added monadic machinery which would need to be eliminated to keep the code compiling. Alternatively, an adversary could replace my name with something innocuous. This would be clear intent to remove author attribution from my code, even more than running it through an LLM laundry is. If you work for a large company that is laundering my code through an LLM, please do us a favor and use your immense privilege to quit and go do something socially beneficial. I will not explain further developments of this code in such detail, and you have better things to do than playing cat and mouse with me as I explore directions such as extending this approach to the type level. Sponsored-by: k0ld on Patreon	2023-11-20 12:29:12 -04:00
Joey Hess	c41ca6c832	convert StorableCipher to ByteString This allows getting rid of the ugly and error prone handling of "bag of bytes" String in Remote.Helper.Encryptable. Avoiding breakage like that dealt with by commit `9862d64bf9` And allows converting Utility.Gpg to use ByteString for IO, which is a welcome change. Tested the new git-annex interoperability with old, using all 3 encryption= types. Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project	2023-11-01 14:39:49 -04:00
Joey Hess	ea2876ae77	add PackageImports This makes loading it in ghci work when both crypton and cryptonite are installed.	2023-10-30 14:10:46 -04:00
Joey Hess	0f3b78ec29	simplify	2023-10-26 14:00:02 -04:00
Joey Hess	c873586e14	eliminate s2w8 and w82s Note that the use of s2w8 in genUUIDInNameSpace made it truncate unicode characters. Luckily, genUUIDInNameSpace is only ever used on ASCII strings as far as I can determine. In particular, git-remote-gcrypt's gcrypt-id is an ASCII string.	2023-10-26 13:12:57 -04:00
Joey Hess	3742263c99	simplify base64 to only use ByteString Note the use of fromString and toString from Data.ByteString.UTF8 dated back to commit `9b93278e8a`. Back then it was using the dataenc package for base64, which operated on Word8 and String. But with the switch to sandi, it uses ByteString, and indeed fromB64' and toB64' were already using ByteString without that complication. So I think there is no risk of such an encoding related breakage. I also tested the case that `9b93278e8a` fixed: git-annex metadata -s foo='a …' x git-annex metadata x metadata x foo=a … In Remote.Helper.Encryptable, it was avoiding using Utility.Base64 because of that UTF8 conversion. Since that's no longer done, it can just use it now.	2023-10-26 13:10:05 -04:00
Joey Hess	6a61c7ff45	Fix crash of enableremote when the special remote has embedcreds=yes The crash occurred because writeCreds got called twice, and writeFileProtected neglected to close its file handle, so the file was open for write when written the second time. It seems unnecessary and suboptimal that writeCreds gets called twice. One call is from getRemoteCredPair and the other from setRemoteCredPair'. What happens is that in the enableremote case, code that also runs at initremote does unnecessary work. Might be possible to improve that, but I've gone for the simple fix. Sponsored-by: k0ld on Patreon	2023-10-20 13:19:12 -04:00
Joey Hess	54da44d42a	Support being built with crypton rather than cryptonite crypton is a fork of cryptonite, and cryptonite's github repo has been archived. Some deps are already using crypton so it's clearly the way forward. Added a build flag without a default, so cabal configure will select on its own which to use. stack files pin to cryptonite for now. Sponsored-by: Nicholas Golder-Manning on Patreon	2023-09-21 12:43:42 -04:00
Joey Hess	50300a47fe	Removed the vendored git-lfs and the GitLfs build flag AFAICS all git-annex builds are using the git-lfs library not the vendored copy. Debian stable now includes a new enough haskell-git-lfs package as well. Last time this was tried it did not.	2023-08-28 13:12:31 -04:00
Joey Hess	88b0bb5793	fix build on windows thanks to jkniiv	2023-08-18 13:03:47 -04:00
Joey Hess	10b5f79e2d	fix empty tree import when directory does not exist Fix behavior when importing a tree from a directory remote when the directory does not exist. An empty tree was imported, rather than the import failing. Merging that tree would delete every file in the branch, if those files had been exported to the directory before. The problem was that dirContentsRecursive returned [] when the directory did not exist. Better for it to throw an exception. But in commit `74f0d67aa3` back in 2012, I made it never throw exceptions, because exceptions thrown inside unsafeInterleaveIO become untrappable when the list is being traversed. So, changed it to list the contents of the directory before entering unsafeInterleaveIO. So exceptions are thrown for the directory. But still not if it's unable to list the contents of a subdirectory. That's less of a problem, because the subdirectory does exist (or if not, it got removed after being listed, and it's ok to not include it in the list). A subdirectory that has permissions that don't allow listing it will have its contents omitted from the list still. (Might be better to have it return a type that includes indications of errors listing contents of subdirectories?) The rest of the changes are making callers of dirContentsRecursive use emptyWhenDoesNotExist when they relied on the behavior of it not throwing an exception when the directory does not exist. Note that it's possible some callers of dirContentsRecursive that used to ignore permissions problems listing a directory will now start throwing exceptions on them. The fix to the directory special remote consisted of not making its call in listImportableContentsM use emptyWhenDoesNotExist. So it will throw an exception as desired. Sponsored-by: Joshua Antonishen on Patreon	2023-08-15 12:57:41 -04:00
Joey Hess	9aac41f86c	remove unused imports	2023-08-15 12:43:26 -04:00
Joey Hess	be028f10e5	split out Utility.Url.Parse This is mostly for git-repair which can't include all of Utility.Url without adding many dependencies that are not really necessary.	2023-08-14 12:28:10 -04:00
Joey Hess	adda6c1088	Add git-annex remote refs that are not newer to the merged refs list Significant startup speed increase by avoiding repeatedly checking if some remote git-annex branch refs need to be merged when it is not newer. One way this could happen is when there are 2 remotes that are themselves connected. The git-annex branch on the first remote gets updated. Then the second remote pulls from the first, and merges in its git-annex branch. Then the local repo pulls from the second remote, and merges its git-annex branch. At this point, a pull from the first remote will get a git-annex branch that is not newer, but is not on the merged refs list. In my big repo, git-annex startup time dropped from 4 seconds to 0.1 seconds. There were 5 to 10 such remote refs out of 18 remotes. Sponsored-by: Graham Spencer on Patreon	2023-08-09 13:31:36 -04:00
Joey Hess	85aadcfa1e	windows back to lts-18.13 temporarily I can't seem to get stack to resolve dependencies with Win32-2.13.4.0, no matter what I try. Why it blows up, I don't know. And allow-newer: true actually causes it to downgrade Win32 to the one version that won't build. Unbelievable that allows downgrades. So just gonna have to wait for that to get into stackage nightly, and then stack.yaml can be updated to use that, and the changes in this commit reverted.	2023-08-02 12:49:38 -04:00
Joey Hess	461330c585	remove support for building with older Win32 No need to preserve this since the cabal file depends on the newer one.	2023-08-02 11:59:57 -04:00
Joey Hess	9a60f5b65f	fix build on windows	2023-08-02 10:43:20 -04:00
Joey Hess	8adafdd013	avoid cpp failure on windows Seems that while the module is not imported by anything on windows, it still gets cpped, and MIN_VERSION_unix is not defined so it failed to preprocess.	2023-08-02 10:08:00 -04:00
Joey Hess	68c9b08faf	fix build with unix-2.8.0 Changed the parameters to openFd. So needed to add a small wrapper library to keep supporting older versions as well.	2023-08-01 18:41:27 -04:00
Joey Hess	63f76d0ac3	fix build with unix-2.8.0 It made UserInfo into a pattern to discourage manually constructing them, so just to use UserInfo in a type signature of a function that consumes them, have to import the new ByteString module.	2023-08-01 17:47:30 -04:00
Joey Hess	d76f088dc4	fix build on windows	2023-08-01 17:39:24 -04:00
Joey Hess	fb640bc2f4	support building with unix-compat 0.7 It removed System.PosixCompat.User.	2023-08-01 15:17:43 -04:00
Joey Hess	08071a1b90	improve match result display simplifier Sponsored-by: Dartmouth College's DANDI project	2023-07-26 15:28:57 -04:00
Joey Hess	70de4a7e6d	fix bug in match result display simplifier Sponsored-by: Dartmouth College's DANDI project	2023-07-26 15:28:49 -04:00
Joey Hess	518a51a8a0	--explain for preferred/required content matching And annex.largefiles and annex.addunlocked. Also git-annex matchexpression --explain explains why its input expression matches or fails to match. When there is no limit, avoid explaining why the lack of limit matches. This is also done when no preferred content expression is set, although in a few cases it defaults to a non-empty matcher, which will be explained. Sponsored-by: Dartmouth College's DANDI project	2023-07-26 14:50:04 -04:00
Joey Hess	f25eeedeac	initial implementation of --explain Currently it only displays explanations of options like --in and --copies. In the future, it should explain preferred content expression evaluation and other decisions. The explanations of a few things could be better. In particular, "standard" will just appear as-is (or as "!standard" if it doesn't match), rather than explaining why the standard preferred content expression for the group matches or not. Currently as implemented, it goes to stdout, and so commands like git-annex find that have custom output will not display --explain information. Perhaps that should change, dunno. Sponsored-by: Dartmouth College's DANDI project	2023-07-25 16:52:57 -04:00
Joey Hess	cf40e2d4b6	Revert "use existing debug machinery for explain" This reverts commit `409572c9e4`.	2023-07-25 15:53:50 -04:00
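
Note on commit eb59da9dd2: the whole-second timestamp behavior it describes (a second change made within the same second is recorded as the following second) can be sketched roughly as below. This is an illustrative Python sketch only, not git-annex's Haskell implementation; the function name next_clock and its parameters are assumptions made for the example.

```python
import time
from typing import Optional


def next_clock(last_recorded: Optional[int], now: Optional[float] = None) -> int:
    """Return a whole-second timestamp strictly newer than last_recorded.

    Hypothetical helper for illustration; not part of git-annex.
    """
    if now is None:
        now = time.time()
    truncated = int(now)  # drop sub-second precision
    if last_recorded is not None and truncated <= last_recorded:
        # A value was already recorded in this second, so advance to the
        # second after the last modification to keep the ordering clear.
        return last_recorded + 1
    return truncated


# Two changes made within the same wall-clock second still get distinct,
# ordered timestamps (compare "git-annex numcopies 1; git-annex numcopies 2").
first = next_clock(None, now=1702321446.25)
second = next_clock(first, now=1702321446.90)
print(first, second)
```

The sketch covers only changes made in a single clone. As the commit message notes, two clones writing conflicting information within the same second can no longer be ordered by timestamp alone.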

1685 commits