git-annex

Author	SHA1	Message	Date
Joey Hess	19e78816f0	convert Key to ShortByteString This adds the overhead of a copy when serializing and deserializing keys. I have not benchmarked much, but runtimes seem barely changed at all by that. When a lot of keys are in memory, it improves memory use. And, it prevents keys sometimes getting PINNED in memory and failing to GC, which is a problem ByteString has sometimes. In particular, git-annex sync from a borg special remote had that problem and this improved its memory use by a large amount. Sponsored-by: Shae Erisson on Patreon	2021-10-05 20:20:08 -04:00
Joey Hess	ed684f651e	add incremental hashing interface to Backend As yet unused. Backend.External could perhaps implement it too, although that would involve sending chunks of data to it via a pipe or something, so likely to be slow.	2021-02-09 15:00:51 -04:00
Joey Hess	9b0dde834e	convert getFileSize to RawFilePath Lots of nice wins from this in avoiding unncessary work, and I think nothing got slower. This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.	2020-11-05 11:32:57 -04:00
Joey Hess	c4cc2cdf4c	rename getKey to genKey for consistency with external backend protocol	2020-07-20 14:06:05 -04:00
Joey Hess	172743728e	move cryptographicallySecure into Backend type This is groundwork for external backends, but also makes sense to keep this information with the rest of a Backend's implementation. Also, removed isVerifiable. I noticed that the same information is encoded by whether a Backend implements verifyKeyContent or not.	2020-07-20 12:17:42 -04:00
Joey Hess	d010ab04be	sped up the --all option by 2x to 16x by using git cat-file --buffer This assumes that no location log files will have a newline or carriage return in their name. catObjectStream skips any such files due to cat-file not supporting them. Keys have been prevented from containing newlines since 2011, commit `480495beb4`. If some old repo had a key with a newline in it, --all will just skip processing that key. Other things, like .git/annex/unused files certianly assume no newlines in keys too, and AFAICR, such keys never actually worked. Carriage return is escaped by preSanitizeKeyName since 2013. WORM keys generated before that point could perhaps contain a CR. (URL probably not, http probably doesn't support an URL with a raw CR in it.) So, added a warning in fsck about such keys. Although, fsck --all will naturally skip them, so won't be able to warn about them. Not entirely satisfactory, but I'll bet there are not really any such keys in existence. Thanks to Lukey for finding this optimisation.	2020-07-07 13:54:04 -04:00
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	c31e1be781	convert KeySource to RawFilePath	2020-02-21 10:04:44 -04:00
Joey Hess	bdec7fed9c	convert TopFilePath to use RawFilePath Adds a dependency on filepath-bytestring, an as yet unreleased fork of filepath that operates on RawFilePath. Git.Repo also changed to use RawFilePath for the path to the repo. This does eliminate some RawFilePath -> FilePath -> RawFilePath conversions. And filepath-bytestring's </> is probably faster. But I don't expect a major performance improvement from this. This is mostly groundwork for making Annex.Location use RawFilePath, which will allow for a conversion-free pipleline.	2019-12-09 15:07:21 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster whereis is 3% faster get when all files are already present is 5% faster Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	8355dba5cc	plumb MeterUpdate into getKey No behavior changes, but this shows everywhere that a progress meter could be displayed when hashing a file to add to the annex. Many of the places don't make sense to display a progress meter though, eg when importing the copy of the file probably swamps the hashing of the file.	2019-06-25 11:43:24 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of sources files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	727767e1e2	make everything build again after ByteString Key changes	2019-01-11 16:39:46 -04:00
Joey Hess	4ecba916a1	annex.maxextensionlength Added annex.maxextensionlength for use cases where extensions longer than 4 characters are needed. This commit was sponsored by Henrik Riomar on Patreon.	2018-09-24 12:10:18 -04:00
Joey Hess	96c055eda2	migrate: WORM keys containing spaces will be migrated to not contain spaces anymore To work around the problem that the external special remote protocol does not support keys containing spaces. This commit was sponsored by Denis Dzyubenko on Patreon.	2017-08-17 15:09:38 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	cad3349001	rename fsckKey to verifyKeyContent No behavior changes.	2015-10-01 13:29:17 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	4f657aa14e	add getFileSize, which can get the real size of a large file on Windows Avoid using fileSize which maxes out at just 2 gb on Windows. Instead, use hFileSize, which doesn't have a bounded size. Fixes support for files > 2 gb on Windows. Note that the InodeCache code only needs to compare a file size, so it doesn't matter it the file size wraps. So it has been left as-is. This was necessary both to avoid invalidating existing inode caches, and because the code passed FileStatus around and would have become more expensive if it called getFileSize. This commit was sponsored by Christian Dietrich.	2015-01-20 17:09:24 -04:00
Joey Hess	c0f2b992ed	Generate shorter keys for WORM and URL, avoiding keys that are longer than used for SHA256, so as to not break on systems like Windows that have very small maximum path length limits.	2015-01-06 17:58:57 -04:00
Joey Hess	9711d529c8	WORM backend: Switched to include the relative path to the file inside the repository, rather than just the file's base name. Note that if you're relying on such things to keep files separate with WORM, you should really be using a better backend.	2014-09-11 14:50:18 -04:00
Joey Hess	f0df660570	WORM backend: When adding a file in a subdirectory, avoid including the subdirectory in the key name.	2014-08-12 14:38:53 -04:00
Joey Hess	13bbb61a51	add key stability checking interface Needed for resuming from chunks. Url keys are considered not stable. I considered treating url keys with a known size as stable, but just don't feel that is enough information.	2014-07-27 12:33:46 -04:00
Joey Hess	d751591ac8	add chunk metadata to Key Added new fields for chunk number, and chunk size. These will not appear in normal keys ever, but will be used for chunked data stored on special remotes. This commit was sponsored by Jouni K Seppanen.	2014-07-24 13:36:23 -04:00
Joey Hess	9d71903c2f	migrate: Avoid re-checksumming when migrating from hashE to hash backend.	2014-07-10 17:06:04 -04:00
Joey Hess	1be4d281d6	Better sanitization of problem characters when generating URL and WORM keys. FAT has a lot of characters it does not allow in filenames, like ? and * It's probably the worst offender, but other filesystems also have limitiations. In 2011, I made keyFile escape : to handle FAT, but missed the other characters. It also turns out that when I did that, I was also living dangerously; any existing keys that contained a : had their object location change. Oops. So, adding new characters to escape to keyFile is out. Well, it would be possible to make keyFile behave differently on a per-filesystem basis, but this would be a real nightmare to get right. Consider that a rsync special remote uses keyFile to determine the filenames to use, and we don't know the underlying filesystem on the rsync server.. Instead, I have gone for a solution that is backwards compatable and simple. Its only downside is that already generated URL and WORM keys might not be able to be stored on FAT or some other filesystem that dislikes a character used in the key. (In this case, the user can just migrate the problem keys to a checksumming backend. If this became a big problem, fsck could be made to detect these and suggest a migration.) Going forward, new keys that are created will escape all characters that are likely to cause problems. And if some filesystem comes along that's even worse than FAT (seems unlikely, but here it is 2013, and people are still using FAT!), additional characters can be added to the set that are escaped without difficulty. (Also, made WORM limit the part of the filename that is embedded in the key, to deal with filesystem filename length limits. This could have already been a problem, but is more likely now, since the escaping of the filename can make it longer.) This commit was sponsored by Ian Downes	2013-10-05 15:01:49 -04:00
Joey Hess	abe8d549df	fix permission damage (thanks, Windows)	2013-05-11 23:54:25 -04:00
Joey Hess	18bdff3fae	clean up from windows porting	2013-05-11 18:23:41 -04:00
Joey Hess	3c7e30a295	git-annex now builds on Windows (doesn't work)	2013-05-11 15:03:00 -05:00
Joey Hess	e71f85645e	handle shasum's leading \ in checksum with certian unsual filenames Bugfix: Remove leading \ from checksums output by shasum commands, when the filename contains \ or a newline. Closes: #696384 fsck: Still accept checksums with a leading \ as valid, now that above bug is fixed. * migrate: Remove leading \ in checksums	2012-12-20 17:07:10 -04:00
Joey Hess	e0fdfb2e70	maintain set of files pendingAdd Kqueue needs to remember which files failed to be added due to being open, and retry them. This commit gets the data in place for such a retry thread. Broke KeySource out into its own file, and added Eq and Ord instances so it can be stored in a Set.	2012-06-20 16:31:46 -04:00
Joey Hess	d3cee987ca	separate source of content from the filename associated with the key when generating a key This already made migrate's code a lot simpler.	2012-06-05 19:51:03 -04:00
Joey Hess	d36525e974	convert fsckKey to a Maybe This way it's clear when a backend does not implement its own fsck checks.	2012-01-19 13:51:30 -04:00
Joey Hess	4a02c2ea62	type alias cleanup	2011-12-31 04:11:58 -04:00
Joey Hess	6a6ea06cee	rename	2011-10-05 16:02:51 -04:00
Joey Hess	cfe21e85e7	rename	2011-10-04 00:59:08 -04:00
Joey Hess	8ef2095fa0	factor out common imports no code changes	2011-10-03 23:29:48 -04:00
Joey Hess	e784757376	hlint tweaks Did all sources except Remotes/* and Command/*	2011-07-15 03:12:05 -04:00
Joey Hess	9f1577f746	remove unused backend machinery The only remaining vestiage of backends is different types of keys. These are still called "backends", mostly to avoid needing to change user interface and configuration. But everything to do with storing keys in different backends was gone; instead different types of remotes are used. In the refactoring, lots of code was moved out of odd corners like Backend.File, to closer to where it's used, like Command.Drop and Command.Fsck. Quite a lot of dead code was removed. Several data structures became simpler, which may result in better runtime efficiency. There should be no user-visible changes.	2011-07-05 19:57:46 -04:00
Joey Hess	703c437bd9	rename modules for data types into Types/ directory	2011-06-01 21:56:04 -04:00
Joey Hess	c43e3b5c78	check key size when available, no matter the backend Now that SHA and other backends can have size info, fsck should check it whenever available.	2011-03-23 02:10:59 -04:00
Joey Hess	4594bd51c1	rename file	2011-03-15 22:04:50 -04:00
Joey Hess	9d49fe2c17	first pass at using new keys It compiles. It sorta works. Several subcommands are FIXME marked and broken, because things that used to accept separate --backend and --key params need to be changed to accept just a --key that encodes all the key info, now that there is metadata in keys.	2011-03-15 21:34:13 -04:00
Joey Hess	72d2684016	Rethink filename encoding handling for display. Since filename encoding may or may not match locale settings, any attempt to decode filenames will fail for some files. So instead, do all output in binary mode.	2011-03-12 15:30:17 -04:00
Joey Hess	a3daac8a8b	only enable SHA backends that configure finds support for	2011-03-02 13:47:45 -04:00
Joey Hess	5a50a7cf13	update unicode FilePath handling Based on http://hackage.haskell.org/trac/ghc/ticket/3307 , whether FilePath contains decoded unicode varies by OS. So, add a configure check for it. Also, renamed showFile to filePathToString	2011-02-11 15:37:37 -04:00
Joey Hess	fe55b4644e	Fix display of unicode filenames. Internally, the filenames are stored as un-decoded unicode. I tried decoding them, but then haskell tries to access the wrong files. Hmm. So, I've unhappily chosen option "B", which is to decode filenames before they are displayed.	2011-02-10 14:21:44 -04:00
Joey Hess	167523f09d	better directory handling Rename Locations functions for better consitency, and make their values more consistent too. Used </> rather than manually building paths. There are still more places that manually do so, but are tricky, due to the behavior of </> when the second FilePath is absolute. So I only changed places where it obviously was relative.	2011-01-27 17:00:32 -04:00
Joey Hess	616d1d4a20	rename TypeInternals to BackendTypes Now that it only contains types used by the backends	2011-01-26 00:37:50 -04:00

1 2

65 commits