In 2013, I wrote "Cryptohash benchmarks 90 to 101% faster than external
hashers". Re-benchmarking today, I found cryptonite's sha256 consistently
outperformed coreutils by 10% for large files. Tested 10 mb, 100 mb, 1 gb
files with both sha256 and sha512. And for smaller files, the external
process startup time swamps the hash time.
Perhaps cryptonite has improved. Or it could just do better on my
current CPU Intel(R) Pentium(R) CPU 4410Y @ 1.50GHz). Anyway, even if cryptonite
is slower in some situations, seems likely it would only be marginally slower;
it's got the same class of highly optimised C code under the hood as coreutils.
The main difference between the two sha256 implementations seems to be
how much of the inner loop they unroll..
This commit was sponsored by Henrik Riomar on Patreon.
This way, autobuilders can run it before building, in a tree where
git-annex was already built, and get the current git rev embedded in the
build.
Before, the version number only updated when the setup program was run,
which was up to cabal and not on every build.
This turns out to have been prolimatic because building from a git tag
would make a git-annex that claimed a different version than the tag's
version.
The git rev is still included in the version, and people who want
to know the exact tree that was built can just use that, so there's no
real benefit to automatically advancing the date part of the version.
* For url downloads, git-annex now defaults to using a http library,
rather than wget or curl. But, if annex.web-options is set, it will
use curl. To use the .netrc file, run:
git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
git-annex builds).
Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.
This commit was supported by the NSF-funded DataLad project.
Fourth or fifth try at this and finally found a way to make it work.
Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.
Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.
Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.
This commit was sponsored by Jochen Bartl on Patreon.
They need unix on non-windows, for Utility.Env, which Build.Configure uses,
but cabal can't express that in a custom-setup stanza.
To avoid this problem, Utility.Env would need to be moved into
unix-compat..
wget was broken even in the previous old release of the windows bundle,
this is not new breakage. msys-idn-11.dll and probably more would be needed
to use it. git for windows includes msys-idn2-0.dll instead.
Not tested yet.
The EvilLinker workaround is removed. That got fixed in ghc 8.0.1,
(per https://ghc.haskell.org/trac/ghc/ticket/8596)
which will finally be used by the windows autobuilder now.
I have not deleted the EvilLinker yet (or closed its bugs).
This commit was sponsored by John Peloquin on Patreon.
Removed dependency on MissingH, instead depending on the split
library.
After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.
Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.
This commit was sponsored by Shane-o on Patreon.
9c4650358c changed the Read instance for
Key.
I've checked all uses of that instance (by removing it and seeing what
breaks), and they're all limited to the webapp, except one.
That is GitAnnexDistribution's Read instance.
So, 9c4650358c would have broken upgrades
of git-annex from downloads.kitenet.net. Once the .info files there got
updated for a new release, old releases would have failed to parse them
and never upgraded.
To fix this, I found a way to make the .info files that contain
GitAnnexDistribution values be readable by the old version of git-annex.
This commit was sponsored by Ewen McNeill.
This adds one extra line of output when a download is successful,
after the progress bar. I don't much like that, but wget does not provide a
way to show HTTP errors without it.
This allows using functions that generate CreateProcess and passing the
result to processTranscript', which is more flexible, and also simpler
than the old interface.
This commit was sponsored by Riku Voipio.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.
As well as being an optimisation, this helps move toward eliminating use of
missingh.
(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)
I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Almost working, but there's a bug in the relaying.
Also, made tor hidden service setup pick a random port, to make it harder
to port scan.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Currently only done for utf-8 locales because the charset can easily be
told for those. Other locales don't include the charset in their name.
The locale definition is generated under git-annex.linux/locales.
So, this only works if the user can write there.
If locale generation fails for any reason, it's silently skipped.
The git-annex-standalone.deb installs the bundle under /usr, so this locale
generation won't work for non-root users.
Version mismatches between the system locale-archive and the glibc in the
bundle have been observed to cause git crashes.
Unfortunately, this causes locales to not be used in the linux standalone
bundle, as was the case until version 6.20160419.
glibc hardcodes the path to /usr/lib/locale/locale-archive and does not
let an environment variable cause a different locale-archive file to be used.
The only other option to include locales in the bundle would be to include
exploded locale definition directories in the bundle for a number of
locales, generated by localedef. But these take at least 300 kb per locale,
and there are a great many locales; it would be hundreds of megabytes to
include them all.
(Hmm, we could include localdef in the bundle, and check LANG in runshell
and compile the locale directories on the fly. This would need
/usr/share/i18n/ and /usr/lib/locale-archive to be included in the bundle.
It's.. doable.)
I know this is going to once again cause users of the bundle to complain
that eg, ls doesn't show their unicode filenames right. Better than strange
crashes though.
This was originally done in a7ef05a9, but got lost in some change to the
Makefile. Use CROSS_COMPILE=Android to tell configure that it's configuring
for android instead of passing it a parameter.
The code block in git-annex-smudge(1) was misformatted. Code blocks
start with tabs, so replace "\s" with " ". Tested to not affect
anything except git-annex-smudge.1.
This actually runs faster than building the man pages from the makefile
did. But the main purpose is to let Setup.hs import Build.Mans and so not
need the makefile.
The tarball on hackage will include only the files needed for cabal install;
it is NOT the full git-annex source tree. While it's totally obnoxious that
cabal files need every file listed out when basic wildcard support could
avoid hundreds of lines, and have to be maintained when files are added,
this does get the tarball size back down to 1 mb.
This also stops stack from complaining that it found modules not listed in
the cabal file.
debian/changelog, debian/NEWS, debian/copyright: Converted to symlinks
to CHANGELOG, NEWS, and COPYRIGHT, which used to symlink to these instead.
This avoids needing to include debian/ in the hackage tarball.
Setup.hs: Build man pages at install time using make and mdwn2man.
If it fails, which it probably will on windows, just skip installing
them.
It started exporting a isSymbolicLink which supports windows. But,
git-annex does no use symlinks on windows yet and this conflicts with the
function by the same name from unix-compat, so hide it.
Had to workaround various problems in clamscan. Increased its max filesize
a lot, because it's too small to check git-annex. Manual unpacking seemed
to be needed for dmg and tar.gz.
This is pretty complicated, but I have both "git-annex" and "git annex"
working both in the git bash shell even with git not added to path.
And, when git's added to path, both work from MS-DOS prompt window too.
I think that the webapp startup does still need git in path, so
instructions will keep saying to do that. But, users often disregard them,
and hopefully this will reduce support traffic.
Also, switched the wget from the cygwin one to the msys2 one, avoiding the
complication of needing to bundle any cygwin dlls.
Using msysgit with git-annex is no longer supported.
At the same time, I'm updating the rsync.exe in my downloads repository
with the one from msys2.
Note that rsync is currently still being ldded and installed in Git/cmd/
like the other cygwin programs. The ldd fails and this failure is ignored.
It would be better to special case it to go in Git/usr/bin/, so that the
user can't run rsync in a dos prompt window, which doesn't work, as it needs
additional libs. However, as far as git-annex running rsync running ssh,
it works ok in this location.
Removed the ssh.cmd and ssh-keygen.cmd; these are not needed with git for
windows. Keeping them would let ssh be run manually from a dos prompt
window, but that's not really a goal.
It was failing at link time, some problem with terminatePID.
Re-implemented that to not use a C wrapper function, which cleared up the
problem. Removed old EvilLinker hack with must have been related to the
same problem.
Note that I have not tested this with older ghc's. In
f11f7520b5 I mention having tried this
approach before, and getting segfaults.. So, who knows. It seems to work
fine with ghc 7.10 at least.
As a result of the Makefile changes, the Debian package is built
with various hardening options. Although their benefit to a largely
haskell program is unknown.
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.
In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.
Problems noticed while doing this conversion:
* Some uses of Params "oneword" which was entirely unnecessary
overhead.
* A few places that built up a list of parameters with ++
and then used Params to split it!
Test suite passes.
Since DistributionUpdate builds an Annex object, the new relative paths
code caused it to make a git object with paths like ../lib/downloads.
However, it actually runs the system's installed git-annex, and that old
version did not like being run in this situation. Probably it was buggy.
Fixed by chdir to the downloads repo before doing anything else.
Reverts 965e106f24
Unfortunately, this caused breakage on Windows, and possibly elsewhere,
because parentDir and takeDirectory do not behave the same when there is a
trailing directory separator.
parentDir is less safe than takeDirectory, especially when working
with relative FilePaths. It's really only useful in loops that
want to terminate at /
This commit was sponsored by Audric SCHILTKNECHT.
Found these with:
git grep "^ " $(find -type f -name \*.hs) |grep -v ': where'
Unfortunately there is some inline hamlet that cannot use tabs for
indentation.
Also, Assistant/WebApp/Bootstrap3.hs is a copy of a module and so I'm
leaving it as-is.
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.
Done as a background task while listening to episode 2 of the Type Theory
podcast.