git-annex

Author	SHA1	Message	Date
Joey Hess	a1730cd6af	adeiu, MissingH Removed dependency on MissingH, instead depending on the split library. After laying groundwork for this since 2015, it was mostly straightforward. Added Utility.Tuple and Utility.Split. Eyeballed System.Path.WildMatch while implementing the same thing. Since MissingH's progress meter display was being used, I re-implemented my own. Bonus: Now progress is displayed for transfers of files of unknown size. This commit was sponsored by Shane-o on Patreon.	2017-05-16 01:03:52 -04:00
Joey Hess	6dd806f1ad	stop using MissingH for MD5 Cryptonite is faster and allocates less, and I want to get rid of MissingH use. Note that the new dependency on memory is free; it's a dependency of cryptonite. This commit was supported by the NSF-funded DataLad project.	2017-05-15 21:36:03 -04:00
Joey Hess	23d71423e1	work around ghc segfault hSetEncoding of a closed handle segfaults. https://ghc.haskell.org/trac/ghc/ticket/7161 `8484c0c197` introduced the crash. In particular, stdin may get closed (by eg, getContents) and then trying to set its encoding will crash. We didn't need to adjust stdin's encoding anyway, but only stderr, to work around https://github.com/yesodweb/persistent/issues/474 Thanks to Mesar Hameed for assistance related to reproducing this bug.	2016-12-30 18:14:19 -04:00
Joey Hess	8484c0c197	Always use filesystem encoding for all file and handle reads and writes. This is a big scary change. I have convinced myself it should be safe. I hope!	2016-12-24 14:46:31 -04:00
Joey Hess	4224fae71f	optimise read and write for Keys database (untested) Writes are optimised by queueing up multiple writes when possible. The queue is flushed after the Annex monad action finishes. That makes it happen on program termination, and also whenever a nested Annex monad action finishes. Reads are optimised by checking once (per AnnexState) if the database exists. If the database doesn't exist yet, all reads return mempty. Reads also cause queued writes to be flushed, so reads will always be consistent with writes (as long as they're made inside the same Annex monad). A future optimisation path would be to determine when that's not necessary, which is probably most of the time, and avoid flushing unncessarily. Design notes for this commit: - separate reads from writes - reuse a handle which is left open until program exit or until the MVar goes out of scope (and autoclosed then) - writes are queued - queue is flushed periodically - immediate queue flush before any read - auto-flush queue when database handle is garbage collected - flush queue on exit from Annex monad (Note that this may happen repeatedly for a single database connection; or a connection may be reused for multiple Annex monad actions, possibly even concurrent ones.) - if database does not exist (or is empty) the handle is not opened by reads; reads instead return empty results - writes open the handle if it was not open previously	2015-12-23 19:18:52 -04:00
Joey Hess	04e150abb3	use intercalate instead of MissingH's join The two functions are identical.	2015-11-17 17:27:24 -04:00
Joey Hess	e953be11af	avoid throwing exception when String is not encoded using the filesystem encoding Since _encodeFilePath generates a String that doesn't use the filesystem encoding, when this exception is caught, we know we already have such a String, and can just return it as-is.	2015-08-12 10:57:48 -04:00
Joey Hess	4e4e11849a	fix test suite fail in LANG=C This was caused by `23e9d3bb77` an Arbitrary String is not necessarily encoded using the filesystem encoding, and in a non-utf8 locale, encodeBS throws an exception on such a string. All I could think to do is limit test data to ascii. This shouldn't be a problem in practice, because the all Strings in git-annex that are not generated by Arbitrary should be loaded in a way that does apply the filesystem encoding.	2015-08-12 10:36:51 -04:00
Joey Hess	23e9d3bb77	Fix setting/setting/viewing metadata that contains unicode or other special characters, when in a non-unicode locale. Oh boy, not again. So, another place that the filesystem encoding needs to be applied. Yay. In passing, I changed decodeBS so if a NUL is embedded in the input, the resulting FilePath doesn't get truncated at that NUL. This was needed to make prop_b64_roundtrips pass, and on reviewing the callers of decodeBS, I didn't see any where this wouldn't make sense. When a FilePath is used to operate on the filesystem, it'll get truncated at a NUL anyway, whereas if a String is being used for something else, it might conceivably have a NUL in it, and we wouldn't want it to get truncated when going through decodeBS. (NB: There may be a speed impact from this change.)	2015-08-11 18:40:59 -04:00
Joey Hess	ed4fe02896	disable horrible tab warning, needed in every file that Setup.hs pulls in This is certianly a cabal bug for not passing the build options in the cabal file when building Setup.hs. And, why oh why did ghc enable this warning by default? So unhappy with this choice.	2015-05-10 16:31:50 -04:00
Joey Hess	9b93278e8a	metadata: Fix encoding problem that led to mojibake when storing metadata strings that contained both unicode characters and a space (or '!') character. The fix is to stop using w82s, which does not properly reconstitute unicode strings. Instrad, use utf8 bytestring to get the [Word8] to base64. This passes unicode through perfectly, including any invalid filesystem encoded characters. Note that toB64 / fromB64 are also used for creds and cipher embedding. It would be unfortunate if this change broke those uses. For cipher embedding, note that ciphers can contain arbitrary bytes (should really be using ByteString.Char8 there). Testing indicated it's not safe to use the new fromB64 there; I think that characters were incorrectly combined. For credpair embedding, the username or password could contain unicode. Before, that unicode would fail to round-trip through the b64. So, I guess this is not going to break any embedded creds that worked before. This bug may have affected some creds before, and if so, this change will not fix old ones, but should fix new ones at least.	2015-03-04 12:54:30 -04:00
Joey Hess	afc5153157	update my email address and homepage url	2015-01-21 12:50:09 -04:00
Joey Hess	7b50b3c057	fix some mixed space+tab indentation This fixes all instances of " \t" in the code base. Most common case seems to be after a "where" line; probably vim copied the two space layout of that line. Done as a background task while listening to episode 2 of the Type Theory podcast.	2014-10-09 15:09:11 -04:00
Joey Hess	2427832bed	relicense general utility library code to BSD Omitted a couple of files what have had significant contributions from others.	2014-05-10 11:01:27 -03:00
Joey Hess	1052eeface	Windows: Fix some filename encoding bugs. http://git-annex.branchable.com/bugs/Unicode_file_names_ignored_on_Windows/ Not a complete fix yet.	2014-03-19 15:57:56 -04:00
Joey Hess	ddd46db09a	Fix a few bugs involving filenames that are at or near the filesystem's maximum filename length limit. Started with a problem when running addurl on a really long url, because the whole url is munged into the filename. Ended up doing a fairly extensive review for places where filenames could get too large, although it's hard to say I'm not missed any.. Backend.Url had a 128 character limit, which is fine when the limit is 255, but not if it's a lot shorter on some systems. So check the pathconf() limit. Note that this could result in fromUrl creating different keys for the same url, if run on systems with different limits. I don't see this is likely to cause any problems. That can already happen when using addurl --fast, or if the content of an url changes. Both Command.AddUrl and Backend.Url assumed that urls don't contain a lot of multi-byte unicode, and would fail to truncate an url that did properly. A few places use a filename as the template to make a temp file. While that's nice in that the temp file name can be easily related back to the original filename, it could lead to `git annex add` failing to add a filename that was at or close to the maximum length. Note that in Command.Add.lockdown, the template is still derived from the filename, just with enough space left to turn it into a temp file. This is an important optimisation, because the assistant may lock down a bunch of files all at once, and using the same template for all of them would cause openTempFile to iterate through the same set of names, looking for an unused temp file. I'm not very happy with the relatedTemplate hack, but it avoids that slowdown. Backend.WORM does not limit the filename stored in the key. I have not tried to change that; so git annex add will fail on really long filenames when using the WORM backend. It seems better to preserve the invariant that a WORM key always contains the complete filename, since the filename is the only unique material in the key, other than mtime and size. Since nobody has complained about add failing (I think I saw it once?) on WORM, probably it's ok, or nobody but me uses it. There may be compatability problems if using git annex addurl --fast or the WORM backend on a system with the 255 limit and then trying to use that repo in a system with a smaller limit. I have not tried to deal with those. This commit was sponsored by Alexander Brem. Thanks!	2013-07-30 19:18:29 -04:00
Joey Hess	60c31afc38	add decodeW8	2012-09-13 19:14:29 -04:00
Joey Hess	ee2acd474f	[Word8] to filesystem encoded String My, GHC makes this hard.	2012-06-20 12:51:25 -04:00
Joey Hess	f9d44cccd9	perhaps more clear type	2012-03-10 11:38:38 -04:00
Joey Hess	bca3fd65b9	fix key directory hash calculation code Fix Key directory hash calculation code to behave as it did before version 3.20120227 when a key contains non-ascii. The hash directories for a given Key are based on its md5sum. Prior to ghc 7.4, Keys contained raw, undecoded bytes, so the md5sum was taken of each byte in turn. With the ghc 7.4 filename encoding change, keys contains decoded unicode characters (possibly with surrigates for undecodable bytes). This changes the result of the md5sum, since the md5sum used is pure haskell and supports unicode. And that won't do, as git-annex will start looking in a different hash directory for the content of a key. The surrigates are particularly bad, since that's essentially a ghc implementation detail, so could change again at any time. Also, changing the locale changes how the bytes are decoded, which can also change the md5sum. Symptoms would include things like: * git annex fsck would complain that no copies existed of a file, despite its symlink pointing to the content that was locally present * git annex fix would change the symlink to use the wrong hash directory. Only WORM backend is likely to have been affected, since only it tends to include much filename data (SHA1E could in theory also be affected). I have not tried to support the hash directories used by git-annex versions 3.20120227 to 3.20120308, so things added with those versions with WORM will require manual fixups. Sorry for the inconvenience!	2012-03-09 20:03:51 -04:00
Joey Hess	d6e77595ba	factor out Utility.FileSystemEncoding	2012-03-09 19:08:10 -04:00
Joey Hess	789254747b	refactor	2012-03-09 18:52:03 -04:00

22 commits