This deals with the possible security problem that someone could make an
unusually low UUID and generate keys that are all constructed to hash to
a number that, mod the number of repositories in the group, == 0.
So balanced preferred content would always put those keys in the
repository with the low UUID as long as the group contains the
number of repositories that the attacker anticipated.
Presumably the attacker then holds the data for ransom? Dunno.
Anyway, the partial solution is to use HMAC (sha256) with all the UUIDs
combined together as the "secret", and the key as the "message". Now any
change in the set of UUIDs in a group will invalidate the attacker's
constructed keys from hashing to anything in particular.
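As a minimal sketch (this is not the actual balancedPicker code; pickRepo
is a hypothetical name, and UUIDs and keys are treated as plain
ByteStrings), the hashing now looks something like:

    import Crypto.Hash.Algorithms (SHA256)
    import Crypto.MAC.HMAC (HMAC, hmac)
    import qualified Data.ByteArray as BA
    import qualified Data.ByteString.Char8 as B
    import Data.List (foldl')

    -- Pick a repository for a key; assumes a non-empty group.
    -- The combined UUIDs are the HMAC secret and the key is the
    -- message, so changing the set of UUIDs in the group invalidates
    -- any constructed keys.
    pickRepo :: [B.ByteString] -> B.ByteString -> Int
    pickRepo uuids key = fromInteger (digest `mod` n)
      where
        n = toInteger (length uuids)
        combineduuids = B.concat uuids
        mac = hmac combineduuids key :: HMAC SHA256
        digest = foldl' (\acc w -> acc * 256 + toInteger w) 0 (BA.unpack mac)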
Given that there are plenty of other things someone can do if they can
write to the repository -- including modifying preferred content so only
their repository wants files, and numcopies so other repositories drop
them -- this seems like safeguard enough.
Note that, in balancedPicker, combineduuids is memoized.
This all works fine. But it doesn't check repository sizes yet, and
without repository size checking, once a repository gets full, there
will be no other repository that will want its files.
Use of sha2 seems unnecessary; probably adler32 or md5 or crc would have
been enough. Possibly just summing up the bytes of the key mod the number
of repositories would have sufficed. But sha2 is there, and probably
hardware accelerated. I doubt very much there is any security benefit
to using it though. If someone wants to construct a key that will be
balanced onto a given repository, sha2 is certainly not going to stop
them.
Better to advance while suspended, that way a stale content retention
lock will expire while suspended.
The fallback to Monotonic to support older kernels assumes that there
are not systems where Boottime sometimes succeeds and sometimes fails.
If that ever happened, the clock would probably not monotonically
advance as it read from different clocks on different calls! I don't see
any indication in the clock_gettime(2) errno list that it can fail
intermittently, so probably this is ok.
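A hedged sketch of the fallback's shape, using the clock package's
System.Clock API and assuming a failed clock_gettime surfaces as an
IOException (currentMonotonicTime is a hypothetical name):

    import Control.Exception (IOException, try)
    import System.Clock (Clock(Boottime, Monotonic), TimeSpec, getTime)

    -- Prefer Boottime, which advances while suspended; fall back to
    -- Monotonic on kernels too old to support CLOCK_BOOTTIME.
    currentMonotonicTime :: IO TimeSpec
    currentMonotonicTime = do
        r <- try (getTime Boottime) :: IO (Either IOException TimeSpec)
        either (const (getTime Monotonic)) return r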
Client side support for SUCCESS-PLUS and ALREADY-HAVE-PLUS
is complete; when a PUT stores to more repositories
than the expected one, the location log is updated with the
additional UUIDs that contain the content.
Started implementing PUT fanout to multiple remotes for clusters.
It is untested, and I fear fencepost errors in the relative
offset calculations. And it is missing proxying for the protocol
after DATA.
The getSocket comment that mentioned using ":port"
in the hostname seems to have been incorrect, or is out of date.
After all, the bug report came when the user first tried doing that,
and it didn't work.
Sponsored-by: the NIH-funded NICEMAN (ReproNim TR&D3) project
This is groundwork for using watchFileSize for downloads from external
special remotes.
In Annex.Content.downloadUrl, this potentially avoids jitter in the
progress meter. When downloading with conduit, the meter gets updated based
on both the size of the file, and on the data flowing through conduit.
If that has not yet been flushed to the file, it seems possible for the
meter to run backwards when the meter is updated with the file size.
It's probably only a few kb of jitter, so may not be visible.
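A minimal sketch of one way to keep the meter monotonic (mkMonotonicMeter
is a hypothetical helper, not the actual watchFileSize code): remember a
high-water mark and drop any update that would move the meter backwards.

    import Control.Monad (when)
    import Data.IORef (atomicModifyIORef', newIORef)

    -- Wrap a meter update callback so the shown value never decreases,
    -- even with two competing update sources.
    mkMonotonicMeter :: (Integer -> IO ()) -> IO (Integer -> IO ())
    mkMonotonicMeter update = do
        highwater <- newIORef 0
        return $ \n -> do
            grew <- atomicModifyIORef' highwater $ \old ->
                if n > old then (n, True) else (old, False)
            when grew (update n)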
Sponsored-by: Dartmouth College's DANDI project
This works well, and it interoperates with gpg in my testing (although some
SOP commands might choose to use a profile that does not interoperate,
so caveat emptor).
Note that for creating the Cipher, gpg --gen-random is still used. SOP
does not have an equivalent, and as long as the user has gpg around,
which seems likely, it doesn't matter that it uses gpg here; it's not being
used for encryption. That seemed better than implementing a second way
to get high quality entropy, at least for now.
The need for the sop command to run in an empty directory has each call
to encrypt and decrypt creating a new temporary directory. That is some
unnecessary overhead, though probably swamped by the overhead of running
the sop command. This could be improved in the future by passing an
already empty directory to them, or a sufficiently empty directory
(.git/annex/tmp would probably suffice).
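The current shape is roughly this (a sketch; runSopInEmptyDir is a
hypothetical name, "sqop" and the arguments are illustrative, and the
real code has to handle errors and binary data):

    import System.IO.Temp (withSystemTempDirectory)
    import System.Process (CreateProcess(cwd), proc, readCreateProcess)

    -- Run a sop command with a fresh, empty working directory, which
    -- is created and thrown away on every call.
    runSopInEmptyDir :: [String] -> String -> IO String
    runSopInEmptyDir args input =
        withSystemTempDirectory "sop" $ \d ->
            readCreateProcess ((proc "sqop" args) { cwd = Just d }) input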
Sponsored-by: Brett Eisenberg on Patreon
This avoids a hang approximately 1% of the time when running the test
suite on StatelessOpenPGP.
Since I've not seen git-annex hang when running git like that, I guess
git probably does something that avoids hanging similarly. Still, fixed
the same problem in Utility.Gpg too.
Sponsored-by: Kevin Mueller on Patreon
Test a specified Stateless OpenPGP command with eg:
git-annex test --test-git-config annex.shared-sop-command=sqop
Also documented that config and another one. But so far only the test
suite uses the configs; using them for actual symmetric encryption is
not yet implemented.
Sponsored-by: Joshua Antonishen on Patreon
This aims to future-proof gpg key generation. OpenPGP is in flux with a
conflict over standards ongoing. It seems not unlikely that different
systems will have different gpg commands that support different algorithms.
This also simplifies the code by using the --quick-gen-key interface rather
than the experimental batch interface. It seems less likely that
--quick-gen-key will break than an experimental interface (whose
documentation I can no longer find).
--quick-gen-key is supported since gpg 2.1.0 (2014).
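The invocation is roughly of this shape (a sketch, not git-annex's exact
arguments; quickGenKey and the user id are illustrative):

    import System.Process (callProcess)

    -- Let gpg pick its default algorithm and usage, with no expiry.
    -- --batch with an empty passphrase needs loopback pinentry.
    quickGenKey :: String -> IO ()
    quickGenKey userid = callProcess "gpg"
        [ "--batch", "--pinentry-mode", "loopback", "--passphrase", ""
        , "--quick-gen-key", userid, "default", "default", "never"
        ]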
Sponsored-by: Graham Spencer on Patreon
When importing from a special remote, support preferred content expressions
that use terms that match on keys (eg "present", "copies=1"). Such terms
are ignored when importing, since the key is not known yet.
When "standard" or "groupwanted" is used, the terms in those
expressions also get pruned accordingly.
This does allow setting preferred content to "not (copies=1)" to make a
special remote into a "source" type of repository. Importing from it will
import all files. Then exporting to it will drop all files from it.
In the case of setting preferred content to "present", it's pruned on
import, so everything gets imported from it. Then on export, it's applied,
and everything in it is left on it, and no new content is exported to it.
Since the old behavior on these preferred content expressions was for
importtree to error out, there's no backwards compatibility to worry about.
Except that sync/pull/etc will now import where before it errored out.
This can reduce the size of the branch by up to 8%. My test was
running git-annex add 1000 times, on one file each time.
Lots of different high-resolution timestamps were recorded before;
eliminating those made the git repo, after packing, 8% smaller.
Due to the use of vector clocks, high resolution timestamps are
not necessary to make clear which information is most recent when,
eg, a value is changed repeatedly in the same second. In such a
case, the vector clock will be advanced to the next second after
the last modification. For example, running
git-annex numcopies 1; git-annex numcopies 2
The first will record the current second, while the next records
the second after that even if it runs in the same second.
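In sketch form, assuming whole-second timestamps (advanceClock is a
hypothetical name; the real vector clock code is more involved):

    -- Advance the clock for a new write: use the current time if it
    -- is ahead of the last recorded clock, otherwise bump by 1 second
    -- so the new value still sorts as most recent.
    advanceClock :: Integer -> Integer -> Integer
    advanceClock currenttime lastclock
        | currenttime > lastclock = currenttime
        | otherwise = lastclock + 1

So the second numcopies run records lastclock + 1 even within the same
wall-clock second.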
As for conflicting information written to two different clones of the
repository, this will make git-annex sometimes pick information that
was written earlier in a second over information written later in the
same second. Usually git-annex does not write conflicting information,
but there are some cases where it could. Eg, storing an object on a remote
can update the remote state log with some state. If two repos both store the
same object, and end up storing different remote state for some reason,
this can result in one that ran a tiny bit later winning. Such a situation
seems unlikely to be user visible. And a small amount of clock skew could
already result in such things.
The only case I can think of where this might be a user visible change
is if a configuration command like git-annex numcopies is being run
in 2 clones of a repository on the same machine at very
close to the same time. Then the user will know which they ran last,
and git-annex won't.
If that did become a problem, this could be dialed back to eg log
milliseconds with still some space saving.
This will allow distributed migration: Start a migration in one clone of
a repo, and then update other clones.
commitMigration is a bit of a bear.. There is some inversion of control
that needs some TMVars. Also streamLogFile's finalizer does not handle
recording the trees, so an interrupt at just the wrong time can cause
migration.log to be emptied but the git-annex branch not updated.
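The TMVar pattern in sketch form (generic, not the actual commitMigration
code; collectStreamed is a hypothetical name): a callback-style function
hands values out for use after it returns.

    import Control.Concurrent.STM

    -- Collect everything a callback-style streaming function produces,
    -- so it can be processed after the streaming function returns.
    collectStreamed :: ((a -> IO ()) -> IO ()) -> IO [a]
    collectStreamed stream = do
        v <- newTMVarIO []
        stream $ \x -> atomically $ do
            xs <- takeTMVar v
            putTMVar v (x : xs)
        reverse <$> atomically (takeTMVar v)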
Sponsored-by: Graham Spencer on Patreon
Avoid a problem with temp file names ending in "." on certain filesystems
that have problems with such filenames.
relatedTemplate is quite an ugly hack really; since it doesn't know the max
filename length of the filesystem, it can only assume that the filename is
already at the max allowed length. When given the input
"lh.aparc.DKTatlas.annot", it wants to reserve 20 characters for the
tempfile, so it truncates to "lh.". That
ending period is apparently a problem on some filesystem (FAT eats it, but
does not throw EINVAL; ntfs does not seem bothered by it, I don't know what
FUSE filesystem the bug reporter was really using).
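One plausible shape for the fix (a sketch only; templateFor is a
hypothetical name, and the real relatedTemplate still has to deal with
filename length limits properly):

    import Data.List (dropWhileEnd)

    -- Truncate to leave room for the random part of the temp file
    -- name, then strip trailing periods so the template cannot end
    -- with ".".
    templateFor :: FilePath -> FilePath
    templateFor f = dropWhileEnd (== '.') (take len f)
      where
        len = max 1 (length f - 20) -- 20 chars reserved for tempfile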
Sponsored-by: Brett Eisenberg on Patreon
Split out an author parameter, which will make it easier to add authors,
and reads better.
Got rid of the function without the copyright year, because an adversary
could have mechanically changed the function with a copyright year to
the one without, and so bypassed the protection of LLM copyright
year hallucination.
Sponsored-by: Luke T. Shumaker on Patreon
This is intended to guard against LLM code theft, which is the current
bubble technology du jour.
Note that authorJoeyHess' with a year older than the year I began
developing git-annex will behave badly, by intention. Eg, it will spin
and eventually crash.
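Purely illustrative of the shape described above (not the real
definition; 2010 here is an assumption for when development began):

    -- A guard that only behaves when given a plausible copyright year.
    authorJoeyHess' :: Int -> Bool
    authorJoeyHess' year
        | year >= 2010 = True
        | otherwise = authorJoeyHess' year -- spins, by intention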
This is not the first anti-LLM protection in git-annex. For example see
9562da790f. That method, while much harder
for an adversary to detect and remove, also complicates code somewhat
significantly, and needs extensions to be enabled. There are also
probably significantly fewer ways to implement that method in Haskell.
This new approach, by contrast, will be easy to add throughout the code
base, with very little effort, and without complicating reading or
maintaining it any more than noticing that yes, I am the author of this
code.
An adversary could of course remove all calls to these functions
before feeding code into their LLM-based laundry facility. I think this
would need to be done manually, or with the help of some fairly advanced
Haskell parsing though. In some cases, authorJoeyHess needs to be
removed, while in other places it needs to be replaced with a value.
Also a monadic use of authorJoeyHess' may involve other added monadic
machinery which would need to be eliminated to keep the code compiling.
Alternatively, an adversary could replace my name with something
innocuous. This would be clear intent to remove author attribution
from my code, even more than running it through an LLM laundry is.
If you work for a large company that is laundering my code through an
LLM, please do us a favor and use your immense privilege to quit and go
do something socially beneficial. I will not explain further
developments of this code in such detail, and you have better things to
do than playing cat and mouse with me as I explore directions such as
extending this approach to the type level.
Sponsored-by: k0ld on Patreon
This allows getting rid of the ugly and error prone handling of
"bag of bytes" String in Remote.Helper.Encryptable.
Avoiding breakage like that dealt with by commit
9862d64bf9
And allows converting Utility.Gpg to use ByteString for IO, which is
a welcome change.
Tested the new git-annex interoperability with old, using all 3
encryption= types.
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
Note that the use of s2w8 in genUUIDInNameSpace made it truncate unicode
characters. Luckily, genUUIDInNameSpace is only ever used on ASCII
strings as far as I can determine. In particular, git-remote-gcrypt's
gcrypt-id is an ASCII string.
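The truncation comes down to this (a simplified equivalent of s2w8, not
the original definition):

    import Data.Char (ord)
    import Data.Word (Word8)

    -- Each Char is narrowed to a byte, so code points above 255 lose
    -- their high bits: s2w8 "\8364" (the euro sign) yields [172].
    s2w8 :: String -> [Word8]
    s2w8 = map (fromIntegral . ord)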
Note the use of fromString and toString from Data.ByteString.UTF8 dated
back to commit 9b93278e8a. Back then it
was using the dataenc package for base64, which operated on Word8 and
String. But with the switch to sandi, it uses ByteString, and indeed
fromB64' and toB64' were already using ByteString without that
complication. So I think there is no risk of such an encoding related
breakage.
I also tested the case that 9b93278e8a
fixed:
git-annex metadata -s foo='a …' x
git-annex metadata x
metadata x
foo=a …
In Remote.Helper.Encryptable, it was avoiding using Utility.Base64
because of that UTF8 conversion. Since that's no longer done, it can
just use it now.
The crash occurred because writeCreds got called twice, and writeFileProtected
neglected to close its file handle, so the file was open for write when
written the second time.
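The shape of the fix (a minimal sketch; writeFileClosed is a hypothetical
name, and the real writeFileProtected also restricts the file mode):
bracket the write so the handle is guaranteed closed.

    import System.IO (IOMode(WriteMode), hPutStr, withFile)

    -- withFile closes the handle even on exception, so a second
    -- write to the same file cannot find it still open.
    writeFileClosed :: FilePath -> String -> IO ()
    writeFileClosed f content = withFile f WriteMode $ \h ->
        hPutStr h content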
It seems unnecessary and suboptimal that writeCreds gets called twice.
One call is from getRemoteCredPair and the other from setRemoteCredPair'.
What happens is that in the enableremote case, code that also runs at
initremote does unnecessary work. Might be possible to improve that, but
I've gone for the simple fix.
Sponsored-by: k0ld on Patreon
crypton is a fork of cryptonite, and cryptonite's github repo has been
archived. Some deps are already using crypton, so it's clearly the way
forward.
Added a build flag without a default, so cabal configure will select on its
own which to use. stack files pin to cryptonite for now.
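A sketch of the shape of such a flag in the cabal file (illustrative,
not the exact stanza):

    Flag Crypton
      Description: Use crypton rather than cryptonite
      -- Manual defaults to False, so the solver is free to flip this
      -- flag to whichever dependency it can satisfy.

    Library
      if flag(Crypton)
        Build-Depends: crypton
      else
        Build-Depends: cryptonite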
Sponsored-by: Nicholas Golder-Manning on Patreon
AFAICS all git-annex builds are using the git-lfs library not the vendored
copy.
Debian stable now includes a new enough haskell-git-lfs package as well.
Last time this was tried it did not.