When downloading an url and the destination file exists but is empty,
avoid using http range to resume, since a range "bytes=0-" is an unusual
edge case that it's best to avoid relying on working.
This is known to fix a case where importfeed downloaded a partial feed from
such a server. Since importfeed uses withTmpFile, the destination always exists
empty, so it would particularly tickle such problem servers. Resuming from 0
is otherwise possible, but unlikely.
Add back support for ftp urls, which was disabled as part of the fix for
security hole CVE-2018-10857 (except for configurations which enabled curl
and bypassed public IP address restrictions). Now it will work if allowed
by annex.security.allowed-ip-addresses.
The NonEmpty instance was moved out of QuickCheck and into a package
with more deps than I want to drag in, so I'm providing my own instance,
but with older QuickCheck, use theirs to avoid overlapping.
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
An empty list of [ContentIdenfier] serialized to the same thing
as a single ContentIdentifier "". Avoid this ambiguity by requiring the
list be non-empty.
It got removed from network-3.0.0.0 and nothing in the haskell ecosystem
currently provides it (which seems it ought to be fixed).
Tested new code on both little-endian and big-endian with:
ghci> hostAddressToTuple $ fromJust $ embeddedIpv4 (0,0,0,0,0,0xffff,0x7f00,1)
(127,0,0,1)
The gitAnnexTmpOtherDir cleanup made it be deleted too early sometimes,
and so the test suite failed. Also there was a report of a similar
failure which likely had a similar cause and hopwfully this fixes that
too.
This reverts commit 4536c93bb2.
That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.
It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.
Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
The builder produces a lazy ByteString, and L.toStrict has to copy it,
but needing to use the builder is no longer to common case; the
serialization will normally be cached already as a strict ByteString,
and this avoids keyFile' needing to use L.toStrict . serializeKey'
Now there's a ByteString used all the way from disk to Key.
The main complication in this conversion was the use of fromInternalGitPath
in several places to munge things on Windows. The things that used that
were changed to parse the ByteString using either path separator.
Also some code that had read from files to a String lazily was changed
to read a minimal strict ByteString.
MetaField was already limited to alphanumerics, so it makes sense to use
Text for it.
Note that technically a UUID can contain invalid UTF-8, and so
remoteMetaDataPrefix's use of T.pack . fromUUID could replace non-UTF8
values with '?' or whatever. In practice, a UUID is usually also text,
I only kept open the possibility of it containing invalid UTF-8 to avoid
breaking parsing of strange UUIDs in git-annex branch files. So, I
decided to let this edge case slip by.
Have not updated the rest of the code base yet for this change, as the
change took 2.5 hours longer than I expected to get working properly.
This is not as efficient as using ByteStrings throughout, but converting
the String to ByteString is actually significantly faster than the old
parser.
benchmarking parse/old
time 9.657 μs (9.600 μs .. 9.732 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 9.703 μs (9.645 μs .. 9.785 μs)
std dev 231.6 ns (161.5 ns .. 323.7 ns)
variance introduced by outliers: 25% (moderately inflated)
benchmarking parse/new
time 834.6 ns (797.1 ns .. 886.9 ns)
0.987 R² (0.976 R² .. 0.999 R²)
mean 816.4 ns (802.7 ns .. 845.1 ns)
std dev 62.39 ns (37.66 ns .. 108.4 ns)
variance introduced by outliers: 82% (severely inflated)
There is a small behavior change from the old parsePOSIXTime,
which accepted any amount of trailing whitespace after the timestamp.
That behavior was not documented, and it doesn't seem anything relied on it.
ghc warned that the guards did not cover all values of h, but they
clearly do, and when rewritten as a case statement the warning goes
away.
Probably a ghc bug, but I kind of prefer the case statement over the
guards anyway.
Deleting directories is one of the great unsolved problems of CS, thanks to
abominations like NFS lock files and Windows and races with other processes
cleaning up after themselves in the background. The gpg test harness
sometimes failed to delete its temp directory on NFS. Avoid the problem
class by not deleting it at all, and putting it inside the tmp repo being
tested. The test suite's more robust (and/or nonsensical) workarounds for
deleting its test dir will thus be used, hopefully avoiding the problem
until an OS finds a new way to violate POSIX and the laws of nature.
Note that this means that the .gnupg directory will be on whatever
filesystem the test suite is being run on, which may be a lesser quality
filesystem than gpg is really expecting. Gpg does not seem to need to
write sockets etc to there so this seems ok. The only known problem is
that if the filesystem forces a directory mode like 777, gpg will warn
about unsafe home directory perms, but it still works.
Don't much like that there's no way to distinguish between having the whole
content and having an old version of the file that's bigger, but of course
resuming a http transfer can always yield the wrong result if the file on
the http server is changing, and git-annex will detect that when it
verifies the downloaded content.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
Cache high-resolution mtimes for improved detection of modified files in v7
(and direct mode).
Including on Windows.
With back-compat support so old low-res mtimes won't break anything, and
so the new information also won't break old versions of git-annex.
Luckily, this did not affect any git-annex log files, since they all
include the trailing 's' for backwards compatability reasons.
But, if I later want to drop that, this is the first commit where
git-annex can be trusted to parse that right.
The misparse caused it to be off by up to 10 seconds.
I've seen intermittent failures of the test suite with v6 for a long time,
it seems to have possibly gotten worse with the changes around v7. Or just
being unlucky; all tests failed today.
Seen on amd64 and i386 builders, repeatedly but intermittently:
unused: FAIL (4.86s)
Test.hs:928:
git diff did not show changes to unlocked file
And I think other such failures, all involving v7/v6 mode tests.
I managed to reproduce the unused failure with --keep-failures,
and inside the repo, git diff was indeed not showing any changes for
the modified unlocked file.
The two stats will be the same other than mtime; the old and new files have
the same size and inode, since the test case writes to the file and then
overwrites it.
Indeed, notice the identical timestamps:
builder@orca:~/gitbuilder/build/.t/tmprepo335$ echo 1 > foo; stat foo; echo 2 > foo; stat foo
File: foo
Size: 2 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 3546179 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ builder) Gid: ( 1000/ builder)
Access: 2018-10-29 22:14:10.894942036 +0000
Modify: 2018-10-29 22:14:10.894942036 +0000
Change: 2018-10-29 22:14:10.894942036 +0000
Birth: -
File: foo
Size: 2 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 3546179 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ builder) Gid: ( 1000/ builder)
Access: 2018-10-29 22:14:10.894942036 +0000
Modify: 2018-10-29 22:14:10.898942036 +0000
Change: 2018-10-29 22:14:10.898942036 +0000
Birth: -
I'm seeing this in Linux VMs; it doesn't happen on my laptop. I've also
not experienced the intermittent test suite failures on my laptop.
So, I hope that this small delay will avoid the problem.
Update: I didn't, indeed I then reproduced the same failure on my
laptop, so it must be due to something else. But keeping this change anyway
since not needing to worry about lowish-resolution mtime in the test suite seems
worthwhile.
Install new git hooks in this version.
This does beg the question of what to do if git later gets eg a
post-smudge hook, that could run git-annex smudge --update. I think the
thing to do in that case would be to make git-annex smudge --update
install the new hooks. That way, as the user uses git-annex, the hook
would be created pretty quickly and without needing any extra syscalls
except for when git-annex smudge --update is called.
I considered doing something like that for installation of the
post-checkout and post-merge hooks, which would have avoided the need
for v7. But the only place it was cheap to do it would be in git-annex smudge
which could cheaply notice that smudge.log didn't exist yet and so know
the hooks needed to be installed. But since smudge used to populate pointer
files, it would be quite surprising if a single git checkout/merge failed
to update the work tree, and so that idea didn't work out.
The other reason for v7 is psychological -- users don't need to worry
about whether they might be running an old version of git-annex that
doesn't support their v7 repository very well. And bug reports about
"v6" have gotten a bit of a bad association in my head since they often
hit one of the known limitations and didn't realize it was experimental.
newtyped RepoVersion Int to avoid needing 2 comparisons in
versionSupportsUnlockedPointers etc. Also it's just nicer.
This commit was sponsored by John Pellman on Patreon.
Running git-annex linux builds in termux seems to work well enough that the
only reason to keep the Android app would be to support Android 4-5, which
the old Android app supported, and which I don't know if the termux method
works on (although I see no reason why it would not).
According to [1], Android 4-5 remains on around 29% of devices, down from
51% one year ago.
[1] https://www.statista.com/statistics/271774/share-of-android-platforms-on-mobile-devices-with-android-os/
This is a rather large commit, but mostly very straightfoward removal of
android ifdefs and patches and associated cruft.
Also, removed support for building with very old ghc < 8.0.1, and with
yesod < 1.4.3, and without concurrent-output, which were only being used
by the cross build.
Some documentation specific to the Android app (screenshots etc) needs
to be updated still.
This commit was sponsored by Brett Eisenberg on Patreon.