git-annex

Author	SHA1	Message	Date
Joey Hess	3334d3831b	change retrieveExport and getKey to throw exception retrieveExport is part of ongoing transition to make remote methods throw exceptions, rather than silently hide them. getKey very rarely fails, and when it does it's always for the same reason (user configured annex.backend to url for some reason). So, this will avoid dealing with Nothing everywhere it's used. This commit was sponsored by Ilya Shlyakhter on Patreon.	2020-05-15 13:45:53 -04:00
Joey Hess	81d402216d	cache the serialization of a Key This will speed up the common case where a Key is deserialized from disk, but is then serialized to build eg, the path to the annex object. Previously attempted in `4536c93bb2` and reverted in `96aba8eff7`. The problems mentioned in the latter commit are addressed now: Read/Show of KeyData is backwards-compatible with Read/Show of Key from before this change, so Types.Distribution will keep working. The Eq instance is fixed. Also, Key has smart constructors, avoiding needing to remember to update the cached serialization. Used git-annex benchmark: find is 7% faster; whereis is 3% faster; get when all files are already present is 5% faster. Generally, the benchmarks are running 0.1 seconds faster per 2000 files, on a ram disk in my laptop.	2019-11-22 17:49:16 -04:00
Joey Hess	8355dba5cc	plumb MeterUpdate into getKey No behavior changes, but this shows everywhere that a progress meter could be displayed when hashing a file to add to the annex. Many of the places don't make sense to display a progress meter though, eg when importing the copy of the file probably swamps the hashing of the file.	2019-06-25 11:43:24 -04:00
Joey Hess	40ecf58d4b	update licenses from GPL to AGPL This does not change the overall license of the git-annex program, which was already AGPL due to a number of source files being AGPL already. Legally speaking, I'm adding a new license under which these files are now available; I already released their current contents under the GPL license. Now they're dual licensed GPL and AGPL. However, I intend for all my future changes to these files to only be released under the AGPL license, and I won't be tracking the dual licensing status, so I'm simply changing the license statement to say it's AGPL. (In some cases, others wrote parts of the code of a file and released it under the GPL; but in all cases I have contributed a significant portion of the code in each file and it's that code that is getting the AGPL license; the GPL license of other contributors allows combining with AGPL code.)	2019-03-13 15:48:14 -04:00
Joey Hess	9c4650358c	add KeyVariety type Where before the "name" of a key and a backend was a string, this makes it a concrete data type. This is groundwork for allowing some varieties of keys to be disabled in file2key, so git-annex won't use them at all. Benchmarks ran in my big repo: old git-annex info: real 0m3.338s user 0m3.124s sys 0m0.244s; new git-annex info: real 0m3.216s user 0m3.024s sys 0m0.220s; new git-annex find: real 0m7.138s user 0m6.924s sys 0m0.252s; old git-annex find: real 0m7.433s user 0m7.240s sys 0m0.232s. Surprising result; I'd have expected it to be slower since it now parses all the key varieties. But, the parser is very simple and perhaps sharing KeyVarieties uses less memory or something like that. This commit was supported by the NSF-funded DataLad project.	2017-02-24 15:16:56 -04:00
Joey Hess	737e45156e	remove 163 lines of code without changing anything except imports	2016-01-20 16:36:33 -04:00
Joey Hess	cad3349001	rename fsckKey to verifyKeyContent No behavior changes.	2015-10-01 13:29:17 -04:00
Joey Hess	77c43a388e	fromkey, registerurl: Allow urls to be specified instead of keys, and generate URL keys. This is especially useful because the caller doesn't need to generate valid url keys, which involves some escaping of characters, and may involve taking a md5sum of the url if it's too long.	2015-05-22 22:41:36 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	c0f2b992ed	Generate shorter keys for WORM and URL, avoiding keys that are longer than used for SHA256, so as to not break on systems like Windows that have very small maximum path length limits.	2015-01-06 17:58:57 -04:00
Joey Hess	13bbb61a51	add key stability checking interface Needed for resuming from chunks. Url keys are considered not stable. I considered treating url keys with a known size as stable, but just don't feel that is enough information.	2014-07-27 12:33:46 -04:00
Joey Hess	9d71903c2f	migrate: Avoid re-checksumming when migrating from hashE to hash backend.	2014-07-10 17:06:04 -04:00
Joey Hess	1be4d281d6	Better sanitization of problem characters when generating URL and WORM keys. FAT has a lot of characters it does not allow in filenames, like ? and * It's probably the worst offender, but other filesystems also have limitations. In 2011, I made keyFile escape : to handle FAT, but missed the other characters. It also turns out that when I did that, I was also living dangerously; any existing keys that contained a : had their object location change. Oops. So, adding new characters to escape to keyFile is out. Well, it would be possible to make keyFile behave differently on a per-filesystem basis, but this would be a real nightmare to get right. Consider that a rsync special remote uses keyFile to determine the filenames to use, and we don't know the underlying filesystem on the rsync server.. Instead, I have gone for a solution that is backwards compatible and simple. Its only downside is that already generated URL and WORM keys might not be able to be stored on FAT or some other filesystem that dislikes a character used in the key. (In this case, the user can just migrate the problem keys to a checksumming backend. If this became a big problem, fsck could be made to detect these and suggest a migration.) Going forward, new keys that are created will escape all characters that are likely to cause problems. And if some filesystem comes along that's even worse than FAT (seems unlikely, but here it is 2013, and people are still using FAT!), additional characters can be added to the set that are escaped without difficulty. (Also, made WORM limit the part of the filename that is embedded in the key, to deal with filesystem filename length limits. This could have already been a problem, but is more likely now, since the escaping of the filename can make it longer.) This commit was sponsored by Ian Downes	2013-10-05 15:01:49 -04:00
Joey Hess	ddd46db09a	Fix a few bugs involving filenames that are at or near the filesystem's maximum filename length limit. Started with a problem when running addurl on a really long url, because the whole url is munged into the filename. Ended up doing a fairly extensive review for places where filenames could get too large, although it's hard to say I've not missed any. Backend.Url had a 128 character limit, which is fine when the limit is 255, but not if it's a lot shorter on some systems. So check the pathconf() limit. Note that this could result in fromUrl creating different keys for the same url, if run on systems with different limits. I don't see this as likely to cause any problems. That can already happen when using addurl --fast, or if the content of an url changes. Both Command.AddUrl and Backend.Url assumed that urls don't contain a lot of multi-byte unicode, and would fail to truncate an url that did properly. A few places use a filename as the template to make a temp file. While that's nice in that the temp file name can be easily related back to the original filename, it could lead to `git annex add` failing to add a filename that was at or close to the maximum length. Note that in Command.Add.lockdown, the template is still derived from the filename, just with enough space left to turn it into a temp file. This is an important optimisation, because the assistant may lock down a bunch of files all at once, and using the same template for all of them would cause openTempFile to iterate through the same set of names, looking for an unused temp file. I'm not very happy with the relatedTemplate hack, but it avoids that slowdown. Backend.WORM does not limit the filename stored in the key. I have not tried to change that; so git annex add will fail on really long filenames when using the WORM backend. It seems better to preserve the invariant that a WORM key always contains the complete filename, since the filename is the only unique material in the key, other than mtime and size. Since nobody has complained about add failing (I think I saw it once?) on WORM, probably it's ok, or nobody but me uses it. There may be compatibility problems if using git annex addurl --fast or the WORM backend on a system with the 255 limit and then trying to use that repo in a system with a smaller limit. I have not tried to deal with those. This commit was sponsored by Alexander Brem. Thanks!	2013-07-30 19:18:29 -04:00
Joey Hess	e71f85645e	handle shasum's leading \ in checksum with certain unusual filenames Bugfix: Remove leading \ from checksums output by shasum commands, when the filename contains \ or a newline. Closes: #696384 fsck: Still accept checksums with a leading \ as valid, now that above bug is fixed. * migrate: Remove leading \ in checksums	2012-12-20 17:07:10 -04:00
Joey Hess	2172cc586e	where indenting	2012-11-11 00:51:07 -04:00
Joey Hess	d3cee987ca	separate source of content from the filename associated with the key when generating a key This already made migrate's code a lot simpler.	2012-06-05 19:51:03 -04:00
Joey Hess	8f9b501515	handle really long urls Using the whole url as a key can make the filename too long. Truncate and use a md5sum for uniqueness if necessary.	2012-02-16 02:05:06 -04:00
Joey Hess	17fed709c8	addurl --fast: Verifies that the url can be downloaded (only getting its head), and records the size in the key.	2012-02-10 19:23:46 -04:00
Joey Hess	d36525e974	convert fsckKey to a Maybe This way it's clear when a backend does not implement its own fsck checks.	2012-01-19 13:51:30 -04:00
Joey Hess	4a02c2ea62	type alias cleanup	2011-12-31 04:11:58 -04:00
Joey Hess	6a6ea06cee	rename	2011-10-05 16:02:51 -04:00
Joey Hess	cfe21e85e7	rename	2011-10-04 00:59:08 -04:00
Joey Hess	8ef2095fa0	factor out common imports no code changes	2011-10-03 23:29:48 -04:00
Joey Hess	dede05171b	addurl: --fast can be used to avoid immediately downloading the url. The tricky part about this is that to generate a key, the file must be present already. Worked around by adding (back) an URL key type, which is used for addurl --fast.	2011-08-06 14:57:22 -04:00
Joey Hess	2cdacfbae6	remove URL backend	2011-07-01 16:01:04 -04:00
Joey Hess	703c437bd9	rename modules for data types into Types/ directory	2011-06-01 21:56:04 -04:00
Joey Hess	6246b807f7	migrate: Support migrating v1 SHA keys to v2 SHA keys with size information that can be used for free space checking.	2011-03-23 17:57:10 -04:00
Joey Hess	7b5b127608	Fix dropping of files using the URL backend.	2011-03-17 11:49:21 -04:00
Joey Hess	da504f647f	fromkey, and url backend download work now	2011-03-15 22:28:18 -04:00
Joey Hess	4594bd51c1	rename file	2011-03-15 22:04:50 -04:00
Joey Hess	a3daac8a8b	only enable SHA backends that configure finds support for	2011-03-02 13:47:45 -04:00
Joey Hess	fcdc4797a9	use ShellParam type So, I have a type checked safe handling of filenames starting with dashes, throughout the code.	2011-02-28 16:18:55 -04:00
Joey Hess	836e71297b	Support filenames that start with a dash; when such a file is passed to a utility it will be escaped to avoid it being interpreted as an option.	2011-02-25 01:13:01 -04:00
Joey Hess	e1d213d6e3	make filename available to fsck messages	2011-01-26 20:37:46 -04:00
Joey Hess	616d1d4a20	rename TypeInternals to BackendTypes Now that it only contains types used by the backends	2011-01-26 00:37:50 -04:00
Joey Hess	082b022f9a	successfully split Annex and AnnexState out of TypeInternals	2011-01-25 21:49:04 -04:00
Joey Hess	109a719b03	parameterize Backend type This allows the Backend type to not depend on the Annex type, and so the Annex type can later be moved out of TypeInternals.	2011-01-25 21:02:34 -04:00
Joey Hess	653ad35a9f	In .gitattributes, the git-annex-numcopies attribute can be used to control the number of copies to retain of different types of files.	2010-11-28 15:28:20 -04:00
Joey Hess	5fa25a812a	fsck improvements * fsck: Check if annex.numcopies is satisfied. * fsck: Verify the sha1 of files when the SHA1 backend is used. * fsck: Verify the size of files when the WORM backend is used. * fsck: Allow specifying individual files to fsck if fscking everything is not desired. * fsck: Fix bug, introduced in 0.04, in detection of unused data.	2010-11-13 14:59:27 -04:00
Joey Hess	070e8530c1	refactoring, no code changes really	2010-11-08 15:15:21 -04:00
Joey Hess	cf4c926f2e	more Wall cleaning	2010-10-31 16:00:32 -04:00
Joey Hess	7c0777c60d	avoid unnecessary newlines before progress in quiet mode	2010-10-29 14:10:55 -04:00
Joey Hess	d92f186fc4	convert safeSystem to boolSystem to fix ctrl-c handling	2010-10-29 14:07:26 -04:00
Joey Hess	833d4b342e	copyright statements	2010-10-27 16:53:54 -04:00
Joey Hess	c7664588f8	use safesystem	2010-10-19 01:19:56 -04:00
Joey Hess	f3dcc8489d	gratuitous rename	2010-10-18 02:06:27 -04:00
Joey Hess	8398b9ab4a	cleanup output	2010-10-17 13:17:34 -04:00
Joey Hess	909f619c07	tweaks	2010-10-16 16:20:49 -04:00
Joey Hess	5de102d5b9	rename backends more	2010-10-15 19:33:10 -04:00

50 commits