1268 commits

Author SHA1 Message Date
Joey Hess
96aba8eff7
Revert "cache the serialization of a Key"
This reverts commit 4536c93bb2.

That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.

It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.

Also, the Eq instance would need to compare keys with and without a
cached serialization the same.
2019-01-16 16:21:59 -04:00
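The message lists several ways the cache could come back. Here is a minimal sketch of the first option, a custom Read/Show that preserves back-compat. It assumes a simplified, hypothetical KeyFields record in place of the real Key type. A hypothetical CachedKey wrapper carries the cached serialization, while Eq, Show and Read look only at the key fields. This keeps the shown form unchanged and makes keys with and without a cache compare equal:

```haskell
-- Hypothetical stand-in for the fields of a Key; the real type differs.
data KeyFields = KeyFields
  { keyBackend :: String
  , keySize :: Maybe Integer
  , keyName :: String
  } deriving (Eq, Show, Read)

-- | A key plus an optional cached serialization (hypothetical wrapper).
data CachedKey = CachedKey
  { ckFields :: KeyFields
  , ckSerialization :: Maybe String
  }

-- Keys with and without a cached serialization compare equal.
instance Eq CachedKey where
  a == b = ckFields a == ckFields b

-- Show produces exactly what showing the bare fields would, so the
-- shown form does not change when a cache is present.
instance Show CachedKey where
  showsPrec d = showsPrec d . ckFields

-- Read accepts that same form; the cache starts out empty.
instance Read CachedKey where
  readsPrec d s = [ (CachedKey k Nothing, rest) | (k, rest) <- readsPrec d s ]
```

The other options in the message, such as changing how GitAnnexDistribution is serialized, would avoid a custom Read instance entirely, at the cost of a format change.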
Joey Hess
0e44985210
remove duplicate import 2019-01-14 18:26:38 -04:00
Joey Hess
e0c4ac99b5
convert serializeKey' to strict ByteString
The builder produces a lazy ByteString, and L.toStrict has to copy it,
but needing to use the builder is no longer the common case; the
serialization will normally be cached already as a strict ByteString,
and this avoids keyFile' needing to use L.toStrict . serializeKey'
2019-01-14 17:03:46 -04:00
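The fast path described here can be sketched as follows. This assumes a hypothetical, simplified key record holding only backend, size and name plus the cache, and uses the backend-sSIZE--name layout of git-annex keys. It illustrates the idea and is not the actual serializeKey' implementation:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Char8 as S8
import qualified Data.ByteString.Lazy as L

-- Hypothetical simplified key with an optional cached serialization.
data SimpleKey = SimpleKey
  { skBackend :: S.ByteString
  , skSize :: Maybe Integer
  , skName :: S.ByteString
  , skCached :: Maybe S.ByteString
  }

-- Build the serialization, e.g. "SHA256E-s1234--abc.txt".
buildKey :: SimpleKey -> B.Builder
buildKey k = B.byteString (skBackend k)
  <> maybe mempty (\sz -> B.string7 "-s" <> B.integerDec sz) (skSize k)
  <> B.string7 "--"
  <> B.byteString (skName k)

-- Common case: return the cached strict ByteString without copying.
-- Uncommon case: run the Builder and copy its lazy output into a
-- strict ByteString with L.toStrict.
serializeKey' :: SimpleKey -> S.ByteString
serializeKey' k = case skCached k of
  Just s -> s
  Nothing -> L.toStrict (B.toLazyByteString (buildKey k))

-- Hypothetical constructor that fills in the cache up front.
mkSimpleKey :: S.ByteString -> Maybe Integer -> S.ByteString -> SimpleKey
mkSimpleKey b sz n = k { skCached = Just (serializeKey' k) }
  where
    k = SimpleKey b sz n Nothing
```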
Joey Hess
5d98cba923
use ByteStrings when reading annex symlinks and pointers
Now there's a ByteString used all the way from disk to Key.

The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.

Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
2019-01-14 15:37:08 -04:00
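Parsing with either path separator can be shown with Data.ByteString.Char8.splitWith. The helper names below are hypothetical, and real annex link targets are validated more carefully than this. The sketch only shows the separator handling that takes the place of munging with fromInternalGitPath:

```haskell
import qualified Data.ByteString.Char8 as B8

-- | Split a path on either '/' or '\\', so a link target or pointer
-- written with Windows separators parses the same as a POSIX one.
splitEitherSep :: B8.ByteString -> [B8.ByteString]
splitEitherSep = B8.splitWith (\c -> c == '/' || c == '\\')

-- | Hypothetical helper: the last non-empty path component, which for
-- an annex object path is the key's file name.
lastComponent :: B8.ByteString -> Maybe B8.ByteString
lastComponent p = case filter (not . B8.null) (splitEitherSep p) of
  [] -> Nothing
  cs -> Just (last cs)
```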
Joey Hess
fc21cccf1c
slight optimisation more 2019-01-11 19:56:31 -04:00
Joey Hess
16c798b5ef
switch MetaValue to ByteString and MetaField to Text
MetaField was already limited to alphanumerics, so it makes sense to use
Text for it.

Note that technically a UUID can contain invalid UTF-8, and so
remoteMetaDataPrefix's use of T.pack . fromUUID could replace non-UTF8
values with '?' or whatever. In practice, a UUID is usually also text;
I only kept open the possibility of it containing invalid UTF-8 to avoid
breaking parsing of strange UUIDs in git-annex branch files. So, I
decided to let this edge case slip by.

Have not updated the rest of the code base yet for this change, as the
change took 2.5 hours longer than I expected to get working properly.
2019-01-07 14:18:24 -04:00
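The UUID edge case can be demonstrated with a lenient UTF-8 decoder. The commit itself uses T.pack . fromUUID. This sketch swaps in Data.Text.Encoding.decodeUtf8With with lenientDecode, and the function name is hypothetical. The effect is similar: bytes that are not valid UTF-8 get replaced (here by U+FFFD), so distinct UUIDs can end up as the same Text:

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import Data.Text.Encoding (decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)

-- | Hypothetical: convert raw UUID bytes to Text for use in a metadata
-- field name. Invalid UTF-8 bytes become U+FFFD, so two different
-- malformed UUIDs could map to the same Text.
uuidBytesToText :: B.ByteString -> T.Text
uuidBytesToText = decodeUtf8With lenientDecode
```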
Joey Hess
a80922a594
support for ByteStrings 2019-01-07 12:29:25 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
f574d8af10
comment typo 2019-01-03 00:22:05 -04:00
Joey Hess
3ba6e9bb96
use attoparsec parser for String parsing, 10x speedup
This is not as efficient as using ByteStrings throughout, but converting
the String to ByteString is actually significantly faster than the old
parser.

    benchmarking parse/old
    time                 9.657 μs   (9.600 μs .. 9.732 μs)
                         1.000 R²   (0.999 R² .. 1.000 R²)
    mean                 9.703 μs   (9.645 μs .. 9.785 μs)
    std dev              231.6 ns   (161.5 ns .. 323.7 ns)
    variance introduced by outliers: 25% (moderately inflated)

    benchmarking parse/new
    time                 834.6 ns   (797.1 ns .. 886.9 ns)
                         0.987 R²   (0.976 R² .. 0.999 R²)
    mean                 816.4 ns   (802.7 ns .. 845.1 ns)
    std dev              62.39 ns   (37.66 ns .. 108.4 ns)
    variance introduced by outliers: 82% (severely inflated)

There is a small behavior change from the old parsePOSIXTime,
which accepted any amount of trailing whitespace after the timestamp.
That behavior was not documented, and it doesn't seem anything relied on it.
2019-01-02 13:28:44 -04:00
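The accepted timestamp format can be sketched without attoparsec. This version swaps in plain Data.ByteString.Char8 functions for the attoparsec parser the commit describes, and the names parseTimeStampBS and parseTimeStampString are hypothetical. It accepts an optional leading '-', digits, an optional fraction and an optional trailing 's'. It rejects anything else, including the trailing whitespace the old parser allowed:

```haskell
import qualified Data.ByteString.Char8 as B8
import Data.Char (isDigit)
import Data.Ratio ((%))
import Data.Time.Clock.POSIX (POSIXTime)

-- | Hypothetical parser for timestamps such as "1431286201.113452s".
parseTimeStampBS :: B8.ByteString -> Maybe POSIXTime
parseTimeStampBS s0
  | B8.null intDigits = Nothing
  | not (B8.null leftover) = Nothing -- trailing junk, including whitespace
  | otherwise = Just (applySign (fromRational (whole + frac)))
  where
    (negative, s1) = case B8.uncons s0 of
      Just ('-', rest) -> (True, rest)
      _ -> (False, s0)
    (intDigits, s2) = B8.span isDigit s1
    (fracDigits, s3) = case B8.uncons s2 of
      Just ('.', rest) -> B8.span isDigit rest
      _ -> (B8.empty, s2)
    leftover = case B8.uncons s3 of
      Just ('s', rest) -> rest
      _ -> s3
    whole = fromInteger (digitsToInteger intDigits)
    frac = digitsToInteger fracDigits % (10 ^ B8.length fracDigits)
    applySign t = if negative then negate t else t

digitsToInteger :: B8.ByteString -> Integer
digitsToInteger = B8.foldl' (\acc c -> acc * 10 + toInteger (fromEnum c - fromEnum '0')) 0

-- | String front end: convert, then parse. B8.pack keeps only the low
-- 8 bits of each Char, which is fine for ASCII timestamps.
parseTimeStampString :: String -> Maybe POSIXTime
parseTimeStampString = parseTimeStampBS . B8.pack
```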
Joey Hess
3c74dcd4e1
attoparsec parser for POSIXTime
(Not yet used anywhere.)

Benchmarking

{-# LANGUAGE OverloadedStrings #-}

import Criterion.Main
import Utility.TimeStamp
import Data.Attoparsec.ByteString

main = defaultMain
	[ bgroup "parse"
		[ bench "new" $ whnf (parseOnly (parserPOSIXTime <* endOfInput)) "1431286201.113452s"
		, bench "old" $ whnf parsePOSIXTime "1431286201.113452s"
		]
	]

benchmarking parse/new
time                 643.6 ns   (640.2 ns .. 646.7 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 645.3 ns   (642.1 ns .. 650.9 ns)
std dev              14.59 ns   (9.194 ns .. 22.07 ns)
variance introduced by outliers: 29% (moderately inflated)

benchmarking parse/old
time                 9.657 μs   (9.600 μs .. 9.732 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 9.703 μs   (9.645 μs .. 9.785 μs)
std dev              231.6 ns   (161.5 ns .. 323.7 ns)
variance introduced by outliers: 25% (moderately inflated)

So old took 9703 ns to parse, and new 643 ns.
2019-01-02 12:48:53 -04:00
Joey Hess
ba2c0663f9
comments 2019-01-01 22:48:14 -04:00
Joey Hess
ec1b9da72f
avoid abusing from/toRawFilePath for non-FilePaths 2019-01-01 22:44:04 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
1b44426805
avoid conflicting definitions of Template type
When both modules are imported and then re-exported.
2018-12-30 15:03:31 -04:00
Joey Hess
5480b3a9af
fix bogus ghc 8.6.3 build warning
ghc warned that the guards did not cover all values of h, but they
clearly do, and when rewritten as a case statement the warning goes
away.

Probably a ghc bug, but I kind of prefer the case statement over the
guards anyway.
2018-12-30 14:43:27 -04:00
Joey Hess
14971414dc
Make test suite work better when the temp directory is on NFS.
Deleting directories is one of the great unsolved problems of CS, thanks to
abominations like NFS lock files and Windows and races with other processes
cleaning up after themselves in the background. The gpg test harness
sometimes failed to delete its temp directory on NFS. Avoid the problem
class by not deleting it at all, and putting it inside the tmp repo being
tested. The test suite's more robust (and/or nonsensical) workarounds for
deleting its test dir will thus be used, hopefully avoiding the problem
until an OS finds a new way to violate POSIX and the laws of nature.

Note that this means that the .gnupg directory will be on whatever
filesystem the test suite is being run on, which may be a lesser quality
filesystem than gpg is really expecting. Gpg does not seem to need to
write sockets etc to there so this seems ok. The only known problem is
that if the filesystem forces a directory mode like 777, gpg will warn
about unsafe home directory perms, but it still works.
2018-12-19 12:44:56 -04:00
Joey Hess
850d19d038
add dropFromEnd 2018-11-23 11:24:05 -04:00
Joey Hess
9127fe4821
add DebugLocks build flag
Using the method described in
https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell
but my own code to implement it, and with callstacks added.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-19 15:02:43 -04:00
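As we understand the general idea of the linked post, it catches the exceptions GHC's runtime throws at threads that are blocked forever and reports where they happened. What follows is a minimal sketch of that idea, not git-annex's DebugLocks code; debugLocks and deadlockDemo are hypothetical names:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Concurrent.MVar (newEmptyMVar, takeMVar)
import Control.Exception
import GHC.Stack (HasCallStack, callStack, prettyCallStack)
import System.IO (hPutStrLn, stderr)

-- | Run an action that may block on an MVar or in STM. If the runtime
-- decides the thread can never be woken, print the exception along
-- with the call stack of the debugLocks call site, then rethrow.
debugLocks :: HasCallStack => IO a -> IO a
debugLocks a = a `catches`
  [ Handler (\(e :: BlockedIndefinitelyOnMVar) -> report e)
  , Handler (\(e :: BlockedIndefinitelyOnSTM) -> report e)
  ]
  where
    report :: Exception e => e -> IO b
    report e = do
      hPutStrLn stderr (show e ++ "\n" ++ prettyCallStack callStack)
      throwIO e

-- | Deliberate deadlock: nothing will ever fill this MVar.
deadlockDemo :: IO ()
deadlockDemo = debugLocks (newEmptyMVar >>= takeMVar)
```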
Joey Hess
ff9bd9620e
Fix resume of download of url when the whole file content is already actually downloaded
Don't much like that there's no way to distinguish between having the whole
content and having an old version of the file that's bigger, but of course
resuming a http transfer can always yield the wrong result if the file on
the http server is changing, and git-annex will detect that when it
verifies the downloaded content.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-12 16:08:47 -04:00
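HTTP range semantics offer one way to recognize the already-complete case. A request for the bytes starting at the local file's size gets 206 Partial Content if there is more to fetch. It gets 416 Range Not Satisfiable if the server's file is no longer than that, which covers both "whole content already present" and "local file is bigger". That is the ambiguity the message mentions. A minimal sketch, assuming the logic works along these lines; the names and status handling are hypothetical:

```haskell
-- | Hypothetical classification of the response to a resumed download.
data ResumeOutcome
  = ResumeAppend     -- 206: the server sent the remaining bytes
  | AlreadyComplete  -- 416: nothing past the local size; assume done
  | RestartFromZero  -- anything else, e.g. 200 with the whole body
  deriving (Eq, Show)

-- | Range header value requesting everything from the given offset on.
rangeFrom :: Integer -> String
rangeFrom offset = "bytes=" ++ show offset ++ "-"

-- | A 416 also arrives when the local file is larger than the server's
-- (e.g. an old version), so the content still needs verifying after.
classifyResume :: Int -> ResumeOutcome
classifyResume 206 = ResumeAppend
classifyResume 416 = AlreadyComplete
classifyResume _ = RestartFromZero
```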
Joey Hess
051dfcb3be
Revert "fix comment"
This reverts commit bac7d34e71.

The comment was right; ARG_MAX is the total length of all arguments.
2018-11-06 17:26:20 -04:00
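Since ARG_MAX limits the combined size of all arguments (on Linux, together with the environment), a command run over many files has to batch them by total length rather than by count. The sketch below uses a hypothetical batchArgs helper. It counts each argument's characters plus a terminating NUL and ignores the environment and pointer overhead that a real limit must also allow for:

```haskell
-- | Hypothetical: group arguments so each batch's total size stays
-- within the budget. Size is counted as characters plus one NUL per
-- argument; a real limit is in bytes and includes the environment.
batchArgs :: Int -> [String] -> [[String]]
batchArgs budget = go
  where
    go [] = []
    go xs = let (batch, rest) = takeBatch 0 xs in batch : go rest

    -- Always take at least one argument, so an oversized one still
    -- gets a batch of its own instead of looping forever.
    takeBatch :: Int -> [String] -> ([String], [String])
    takeBatch _ [] = ([], [])
    takeBatch used (x:xs)
      | used > 0 && used' > budget = ([], x : xs)
      | otherwise = let (b, r) = takeBatch used' xs in (x : b, r)
      where
        used' = used + length x + 1
```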
Joey Hess
bac7d34e71
fix comment 2018-11-06 11:42:31 -04:00
Joey Hess
5ad5d45d4c
make Arbitrary POSIXTime include decimal half the time 2018-10-31 16:27:55 -04:00
Joey Hess
2ca408dc33
Increase minimum QuickCheck version. 2018-10-31 15:53:22 -04:00
Joey Hess
f00b329e0c
remove unused import 2018-10-30 13:38:29 -04:00
Joey Hess
86df2a08fe
fix windows build 2018-10-30 11:09:45 -04:00
Joey Hess
5ab0f48ffb
high-res mtimes
Cache high-resolution mtimes for improved detection of modified files in v7
(and direct mode).

Including on Windows.

With back-compat support so old low-res mtimes won't break anything, and
so the new information also won't break old versions of git-annex.
2018-10-30 00:41:26 -04:00
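Below is a minimal sketch of comparing high- and low-resolution mtimes without breaking back-compat. When either side only has whole seconds, the comparison happens at whole-second resolution. The MTime type and sameMTime function are assumptions for illustration, not necessarily how git-annex's inode cache is written:

```haskell
import Data.Time.Clock.POSIX (POSIXTime)

-- | Hypothetical mtime representation; old caches recorded whole seconds.
data MTime
  = MTimeLowRes Integer
  | MTimeHighRes POSIXTime
  deriving (Show)

-- | Compare at the finest resolution both values have.
sameMTime :: MTime -> MTime -> Bool
sameMTime (MTimeHighRes a) (MTimeHighRes b) = a == b
sameMTime a b = wholeSeconds a == wholeSeconds b

wholeSeconds :: MTime -> Integer
wholeSeconds (MTimeLowRes s) = s
wholeSeconds (MTimeHighRes t) = floor t
```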
Joey Hess
48af284872
fix parse of negative posix time
Should never happen, but...
2018-10-29 23:40:34 -04:00
Joey Hess
a8ad577d1d
fix parsing of timestamp w/o trailing 's'
Luckily, this did not affect any git-annex log files, since they all
include the trailing 's' for backwards compatibility reasons.

But, if I later want to drop that, this is the first commit where
git-annex can be trusted to parse that right.

The misparse caused it to be off by up to 10 seconds.
2018-10-29 23:36:47 -04:00
Joey Hess
3d1b22dc8e
factor out another function 2018-10-29 23:33:56 -04:00
Joey Hess
2e9f128dea
moved module and relicensed 2018-10-29 23:13:36 -04:00
Joey Hess
5d97898a7c
touch files with high-resolution timestamp
Needs unix 2.7.2, but that was included in ghc 8.0.1 (and much older)
so not really a new dep.
2018-10-29 22:25:21 -04:00
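The unix package's System.Posix.Files module provides setFileTimesHiRes and modificationTimeHiRes for this. Below is a small POSIX-only sketch that touches a file with a sub-second timestamp and reads it back; touchHiRes and getMTimeHiRes are hypothetical names:

```haskell
import Data.Time.Clock.POSIX (POSIXTime)
import System.Posix.Files (getFileStatus, modificationTimeHiRes, setFileTimesHiRes)

-- | Set both access and modification time, keeping sub-second
-- precision where the filesystem supports it.
touchHiRes :: FilePath -> POSIXTime -> IO ()
touchHiRes f t = setFileTimesHiRes f t t

-- | Read the modification time back at full resolution.
getMTimeHiRes :: FilePath -> IO POSIXTime
getMTimeHiRes f = modificationTimeHiRes <$> getFileStatus f
```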
Joey Hess
94b7968f1f
forgot to remove this when dropping support for old ghc 2018-10-29 22:01:06 -04:00
Joey Hess
595fb98473
add small delay to avoid problems on systems with low-resolution mtime
I've seen intermittent failures of the test suite with v6 for a long time,
and it seems to have possibly gotten worse with the changes around v7. Or I've
just been unlucky; all tests failed today.

Seen on amd64 and i386 builders, repeatedly but intermittently:

	unused: FAIL (4.86s)
	Test.hs:928:
	git diff did not show changes to unlocked file

And I think there were other such failures, all involving v7/v6 mode tests.

I managed to reproduce the unused failure with --keep-failures,
and inside the repo, git diff was indeed not showing any changes for
the modified unlocked file.

The two stats will be the same other than mtime; the old and new files have
the same size and inode, since the test case writes to the file and then
overwrites it.

Indeed, notice the identical timestamps:

	builder@orca:~/gitbuilder/build/.t/tmprepo335$ echo 1 > foo; stat foo; echo 2 > foo; stat foo
	  File: foo
	  Size: 2         	Blocks: 8          IO Block: 4096   regular file
	Device: 801h/2049d	Inode: 3546179     Links: 1
	Access: (0644/-rw-r--r--)  Uid: ( 1000/ builder)   Gid: ( 1000/ builder)
	Access: 2018-10-29 22:14:10.894942036 +0000
	Modify: 2018-10-29 22:14:10.894942036 +0000
	Change: 2018-10-29 22:14:10.894942036 +0000
	 Birth: -
	  File: foo
	  Size: 2         	Blocks: 8          IO Block: 4096   regular file
	Device: 801h/2049d	Inode: 3546179     Links: 1
	Access: (0644/-rw-r--r--)  Uid: ( 1000/ builder)   Gid: ( 1000/ builder)
	Access: 2018-10-29 22:14:10.894942036 +0000
	Modify: 2018-10-29 22:14:10.898942036 +0000
	Change: 2018-10-29 22:14:10.898942036 +0000
	 Birth: -

I'm seeing this in Linux VMs; it doesn't happen on my laptop. I've also
not experienced the intermittent test suite failures on my laptop.

So, I hope that this small delay will avoid the problem.

Update: It didn't; indeed, I then reproduced the same failure on my
laptop, so it must be due to something else. But I'm keeping this change anyway,
since not needing to worry about lowish-resolution mtime in the test suite seems
worthwhile.
2018-10-29 19:31:26 -04:00
Joey Hess
234842a347
v7
Install new git hooks in this version.

This does beg the question of what to do if git later gets e.g. a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.

I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.

The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.

newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.

This commit was sponsored by John Pellman on Patreon.
2018-10-25 18:24:23 -04:00
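With a newtype over Int, version checks become a single ordered comparison. The sketch below assumes that v6 and later support unlocked pointer files; the field name and the exact threshold are illustrative, not taken from git-annex's source:

```haskell
-- | Repository version as a number (hypothetical field name).
newtype RepoVersion = RepoVersion { fromRepoVersion :: Int }
  deriving (Eq, Ord, Show)

-- | One comparison covers v6, v7 and any later version, instead of
-- separate equality checks against each supported version.
versionSupportsUnlockedPointers :: RepoVersion -> Bool
versionSupportsUnlockedPointers v = v >= RepoVersion 6
```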
Joey Hess
38d691a10f
removed the old Android app
Running git-annex linux builds in termux seems to work well enough that the
only reason to keep the Android app would be to support Android 4-5, which
the old Android app supported, and which I don't know if the termux method
works on (although I see no reason why it would not).
According to [1], Android 4-5 remains on around 29% of devices, down from
51% one year ago.

[1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/

This is a rather large commit, but mostly very straightforward removal of
android ifdefs and patches and associated cruft.

Also, removed support for building with very old ghc < 8.0.1, and with
yesod < 1.4.3, and without concurrent-output, which were only being used
by the cross build.

Some documentation specific to the Android app (screenshots etc) needs
to be updated still.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-10-13 01:41:11 -04:00
Joey Hess
45e09ea7f3
debug the full adjusted Request
So that the user-agent etc are included in the debug.
2018-10-04 13:45:27 -04:00
Joey Hess
303d10cee6
Improve display when git config download from a http remote fails.
The error message displayed used to only come from curl/wget and perhaps
was clearer than the one displayed now that http-client is used. In any
case, it does make sense to hide it because git-annex prints its own
warning message.

This commit was sponsored by Jake Vosloo on Patreon.
2018-10-03 12:31:09 -04:00
Joey Hess
502c5a4917
remove support for old http-client version
git-annex already bumped to a newer version for the http security fix.

This commit was sponsored by mo on Patreon.
2018-10-03 12:00:07 -04:00
Joey Hess
c88e8c8249
unify error display 2018-10-03 11:56:52 -04:00
Joey Hess
26a02cb386
display error when an invalid url is downloaded
download is documented as displaying an error when download fails, but
it didn't when the url was not valid at all. That leads to confusing
behavior.

Also, display the url with --debug
2018-09-25 13:38:20 -04:00
Joey Hess
cc82f81227
More FreeBSD build fixes.
Untested on FreeBSD, but enough to fix the listed build errors.

It seems that System.Posix.Files must once have exported this stuff, and it
was later split out.

This commit was sponsored by Peter on Patreon.
2018-09-24 11:25:56 -04:00
Joey Hess
ceee7758a5
fix \ escaping 2018-09-22 11:33:08 -04:00
Joey Hess
d2c351f547
update windows NUL for ghc 8.6.1
This should also work with older ghc, since the path is a windows device
namespace path.
2018-09-22 11:31:55 -04:00
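The device namespace form of the null device is \\.\NUL, and every backslash must be doubled inside a Haskell string literal. A minimal sketch; nullDevice is a hypothetical name, and the platform check via System.Info.os is an assumption:

```haskell
import System.Info (os)

-- | Windows device namespace path for the null device. After escape
-- processing, this literal is the 7 characters \\.\NUL
windowsNullDevice :: FilePath
windowsNullDevice = "\\\\.\\NUL"

-- | Pick the null device for the running platform.
nullDevice :: FilePath
nullDevice
  | os == "mingw32" = windowsNullDevice
  | otherwise = "/dev/null"
```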
Joey Hess
2aae6e84af
Support newlines in filenames.
Work around git cat-file --batch's protocol not supporting newlines by
running git cat-file not batched and passing the filename as a
parameter.

Of course this is quite a lot less efficient, especially because it
currently runs it multiple times to query for different pieces of
information.

Also, it has subtly different behavior when the batch process was
started and then some changes were made, in which case the batch process
sees the old index but this workaround sees the current index. Since
that batch behavior is mostly a problem that affects the assistant and has
to be worked around in it, I think I can get away with this difference.

I don't know of any other problems with newlines in filenames, everything
else in git I can think of supports -z. And git-annex's json output
supports newlines in filenames so downstream parsers from git-annex will be ok.
git-annex commands that use --batch themselves don't support newlines
in input filenames; using --json --batch is currently a way around that
problem.

This commit was sponsored by Ewen McNeill on Patreon.
2018-09-20 13:45:44 -04:00
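The fallback amounts to choosing between two ways of asking git cat-file about a file in a ref. The names below are hypothetical. The -p flag is also an assumption for illustration; the commit only says that several pieces of information are queried:

```haskell
-- | How to ask git cat-file about ref:file.
data CatFileQuery
  = BatchLine String       -- a line to write to a running --batch process
  | SeparateRun [String]   -- arguments for a one-off git invocation
  deriving (Eq, Show)

-- | The --batch protocol is newline-delimited, so a filename containing
-- a newline cannot be sent that way; pass it as a parameter instead.
catFileQuery :: String -> FilePath -> CatFileQuery
catFileQuery ref file
  | '\n' `elem` object = SeparateRun ["cat-file", "-p", object]
  | otherwise = BatchLine (object ++ "\n")
  where
    object = ref ++ ":" ++ file
```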
Yaroslav Halchenko
b976eb5353
BF(minor): missing space after "Unsupported url scheme" msg before the scheme 2018-09-18 18:19:20 -04:00
Joey Hess
b3c9c59d3d
--debug urls
When git-annex used wget and curl, --debug would show urls. So there can't
be any new security problem with doing so.

This commit was sponsored by John Pellman on Patreon.
2018-09-14 12:46:39 -04:00
Joey Hess
b18fb1e343
clean P2P protocol shutdown on EOF
Avoids "git-annex-shell: <stdin>: hGetChar: end of file"
being displayed by the test suite, due to the way it
runs git-annex-shell without using ssh.

git-annex-shell over ssh was not affected because git-annex hangs up the
ssh connection and so never sees the error message that git-annex-shell
probably did emit.

This commit was sponsored by Ryan Newton on Patreon.
2018-09-13 10:46:37 -04:00
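Here is a minimal sketch of treating end of input as a clean shutdown rather than an error, by checking for EOF before reading a protocol line. readProtocolLine and serveLines are hypothetical names, and the real P2P protocol code is structured differently:

```haskell
import System.IO

-- | Read one protocol line; Nothing means the other side hung up,
-- which is treated as a normal end of the session.
readProtocolLine :: Handle -> IO (Maybe String)
readProtocolLine h = do
  eof <- hIsEOF h
  if eof
    then return Nothing
    else Just <$> hGetLine h

-- | Collect lines until EOF (a stand-in for real message dispatch).
serveLines :: Handle -> IO [String]
serveLines h = do
  ml <- readProtocolLine h
  case ml of
    Nothing -> return []
    Just l -> (l :) <$> serveLines h
```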
Joey Hess
872640549b
comment typo 2018-09-05 13:57:06 -04:00
Joey Hess
f4788f3853
clarify comment
haskell-mountpoints contains android specific code, but it's not used
when git-annex was built for linux and is running on android.
2018-09-05 11:22:27 -04:00