https://github.com/haskell/cabal/issues/4655
This means that when a module is conditionally imported via ifdef
depending on the OS or build flags, the cabal file has to mirror the
same logic there to only list the module then.
Since there are lots of OS's and lots of combinations of build flags
here, it's rather difficult to know if the cabal file has been completelty
correctly updated to match the source code.
So I am very unhappy with needing to update things in two places. I've
only tested this on linux with most build flags enables; this will
probably need significant time and testing to catch every cabal file
tweak that this change to Cabal requires. And it will be a continual
source of compile failures going forward when the code is modified and
the cabal file not also updated.
DRY DRY DRY, I repeat myself, but: DRY! Sigh..
(Also, had to remove all Build.* that are standalone programs from the
Other-Modules list, because since cabal passes those modules to ghc when
building git-annex, it complains that they use module Main. Those
modules are only used when building with the Makefile anyway, so this
change shouldn't break anything.)
This commit was sponsored by Thomas Hochstein on Patreon.
Security fix: Disallow hostname starting with a dash, which would get
passed to ssh and be treated an option. This could be used by an attacker
who provides a crafted ssh url (for eg a git remote) to execute arbitrary
code via ssh -oProxyCommand.
No CVE has yet been assigned for this hole.
The same class of security hole recently affected git itself,
CVE-2017-1000117.
Method: Identified all places where ssh is run, by git grep '"ssh"'
Converted them all to use a SshHost, if they did not already, for
specifying the hostname.
SshHost was made a data type with a smart constructor, which rejects
hostnames starting with '-'.
Note that git-annex already contains extensive use of Utility.SafeCommand,
which fixes a similar class of problem where a filename starting with a
dash gets passed to a program which treats it as an option.
This commit was sponsored by Jochen Bartl on Patreon.
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.
For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.
Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.
This commit was sponsored by Trenton Cronholm on Patreon.
Can be used to override the default timestamps used in log files in the
git-annex branch. This is a dangerous environment variable; use with
caution.
Note that this only affects writing to the logs on the git-annex branch.
It is not used for metadata in git commits (other env vars can be set for
that).
There are many other places where timestamps are still used, that don't
get committed to git, but do touch disk. Including regular timestamps
of files, and timestamps embedded in some files in .git/annex/, including
the last fsck timestamp and timestamps in transfer log files.
A good way to find such things in git-annex is to get for getPOSIXTime and
getCurrentTime, although some of the results are of course false positives
that never hit disk (unless git-annex gets swapped out..)
So this commit does NOT necessarily make git-annex comply with some HIPPA
privacy regulations; it's up to the user to determine if they can use it in
a way compliant with such regulations.
Benchmarking: It takes 0.00114 milliseconds to call getEnv
"GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log
files can be written with an added overhead of only 0.114 seconds. That
should be by far swamped by the actual overhead of writing the log files
and making the commit containing them.
This commit was supported by the NSF-funded DataLad project.
Removed dependency on MissingH, instead depending on the split
library.
After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.
Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.
This commit was sponsored by Shane-o on Patreon.
Cryptonite is faster and allocates less, and I want to get rid of
MissingH use.
Note that the new dependency on memory is free; it's a dependency of
cryptonite.
This commit was supported by the NSF-funded DataLad project.
They are handled close the same as they are by git. However, unlike git,
git-annex sometimes needs to pass the -n parameter when using these.
So, this has the potential for breaking some setup, and perhaps there ought
to be a ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that
if possible, so let's see if anyone complains.
Almost all places where "ssh" was run have been changed to support the env
vars. Anything still calling sshOptions does not support them. In
particular, rsync special remotes don't. Seems that annex-rsync-transport
already gives sufficient control there.
(Fixed in passing: Remote.Helper.Ssh.toRepo used to extract
remoteAnnexSshOptions and pass them to sshOptions, which was redundant
since sshOptions also extracts those.)
This commit was sponsored by Jeff Goeke-Smith on Patreon.
sync: When syncing with a local repository located on a crippled
filesystem, run the post-receive hook there, since it wouldn't get run
otherwise. This makes pushing to repos on FAT-formatted removable drives
update them when receive.denyCurrentBranch=updateInstead.
Made Remote.Git export onLocal, which was cleaned up to not have so many
caveats about its use.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
* Added post-recieve hook, which makes updateInstead work with direct
mode and adjusted branches.
* init: Set up the post-receive hook.
This commit was sponsored by Fernando Jimenez on Patreon.
Refactored some common code into initDb.
This only deals with the problem when creating new databases. If a repo
got bad permissions into it, it's up to the user to deal with it.
This commit was sponsored by Ole-Morten Duesund on Patreon.
... to control the default behavior in all clones of a repository.
This includes a new Configurable data type, so the GitConfig type indicates
which values can be configured this way.
The implementation should be quite efficient; the config log is only read
once, and only when a Configurable value has not already been set by
git-config.
Indeed, it would be nice in the future to extend this, so that git-config
is itself only read on demand. Some commands may not need to look at the
git configuration at all.
This commit was sponsored by Trenton Cronholm on Patreon.
This interacts with it using stdio, which is surprisingly hard.
sendFile does not currently work, due to
https://github.com/warner/magic-wormhole/issues/108
Parsing the output to find the magic code is done as robustly as
possible, and should continue to work unless wormhole radically changes
the format of its codes. Presumably it will never output something that
looks like a wormhole code before the actual wormhole code; that would
also break this. It would be better if there was a way to make
wormhole not mix the code with other output, as requested in
https://github.com/warner/magic-wormhole/issues/104
Only exchange of files/directories is supported. To exchange messages,
https://github.com/warner/magic-wormhole/issues/99 would need to be resolved.
I don't need message exchange however.
Added to change notification to P2P protocol.
Switched to a TBChan so that a single long-running thread can be
started, and serve perhaps intermittent requests for change
notifications, without buffering all changes in memory.
The P2P runner currently starts up a new thread each times it waits
for a change, but that should allow later reusing a thread. Although
each connection from a peer will still need a new watcher thread to run.
The dependency on stm-chans is more or less free; some stuff in yesod
uses it, so it was already indirectly pulled in when building with the
webapp.
This commit was sponsored by Francois Marier on Patreon.
Similar to GCrypt remotes, P2P remotes have an url, so Remote.Git has to
separate them out and handle them, passing off to Remote.P2P.
This commit was sponsored by Ignacio on Patreon.
Each worker thread needs to run in the Annex monad, but the
remote-daemon's liftAnnex can only run 1 action at a time. Used
Annex.Concurrent to deal with that.
P2P.Annex is incomplete as of yet.
Almost working, but there's a bug in the relaying.
Also, made tor hidden service setup pick a random port, to make it harder
to port scan.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
This is most of the way to having the p2p protocol working over tor
hidden services, at least enough to do git push/pull.
The free monad was split into two, one for network operations and the
other for local (Annex) operations. This will allow git-remote-tor-annex
to run only an IO action, not needing the Annex monad.
This commit was sponsored by Remy van Elst on Patreon.
For use with tor hidden services, and perhaps other transports later.
Based on Utility.SimpleProtocol, it's a line-based protocol,
interspersed with transfers of bytestrings of a specified size.
Implementation of the local and remote sides of the protocol is done
using a free monad. This lets monadic code be included here, without
tying it to any particular way to get bytes peer-to-peer.
This adds a dependency on the haskell package "free", although that
was probably pulled in transitively from other dependencies already.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
git-annex.cabal: Loosen bounds on persistent to allow 2.5, which on Debian
has been patched to work with esqueleto. This may break cabal's resolver on
non-Debian systems; if so, either use stack to build, or run cabal with
--constraint='persistent ==2.2.4.1' Hopefully this mess with esqueleto will
be resolved soon.
https://github.com/prowdsponsor/esqueleto/issues/137
I've long considered the XMPP support in git-annex a wart.
It's nice to remove it.
(This also removes the NetMessager, which was only used for XMPP, and the
daemonstatus's desynced list (likewise).)
Existing XMPP remotes should be ignored by git-annex.
This commit was sponsored by Brock Spratlen on Patreon.
Tor unfortunately does not come out of the box configured to let hidden
services register themselves on the fly via the ControlPort.
And, changing the config to enable the ControlPort and a particular type
of auth for it may break something already using the ControlPort, or
lessen the security of the system.
So, this leaves only one option to us: Add a hidden service to the
torrc. git-annex enable-tor does so, and picks an unused high port for
tor to listen on for connections to the hidden service.
It's up to the caller to somehow pick a local port to listen on
that won't be used by something else. That may be difficult to do..
This commit was sponsored by Jochen Bartl on Patreon.
This gets rid of quite a lot of ugly hacks around json generation.
I doubt that any real-world json parsers can parse incomplete objects, so
while it's not as nice to need to wait for the complete object, especially
for commands like `git annex info` that take a while, it doesn't seem worth
the added complexity.
This also causes the order of fields within the json objects to be
reordered. Since any real json parser shouldn't care, the only possible
problem would be with ad-hoc parsers of the old json output.
This makes -Jn work with --json and --quiet, where before
setting -Jn disabled those options.
Concurrent json output is currently a mess though since threads output
chunks over top of one-another.
Simplify Solver's task by requesting version 2.2.4.1 of the persistent
package instead of just providing the persistent < 2.5 constraint.
With only the persistent < 2.5 constraint, and with --flags=s3\ webapp
and --max-backjumps=10000, CI timed out after two hours with Solver
still trying to find a solution.
This is a follow-up to 18e458db, since there's been a regression in the
situation between 6.20160619 and 6.20160808, probably simply because
Hackage is a moving target.
Use nextRandom to generate the random UUID, rather than using randomIO.
This gets fixes for the following two bugs in the uuid library.
However, this did not impact git-annex much, so a hard depedency has
not been added on uuid-1.3.12.
https://github.com/aslatter/uuid/issues/15
"v4 UUIDs are not that random"
This doesn't greatly affect git-annex, because even with only
2^64 possible UUIDs, the chance that two git-annex repositories
that are clones of the same git repo get the same UUID is miniscule.
And, git-annex generates only one UUID per run, so preducting
subsequent UUIDs is not a problem.
https://github.com/aslatter/uuid/issues/16
"Remove Random instance for UUID, or mark it as deprecated"
git-annex was using that instance; let's stop before it gets
deprecated or removed.
metadata --json output format has changed, adding a inner json object
named "fields" which contains only the fields and their values.
This should be easier to parse than the old format, which mixed up
metadata fields with other keys in the json object.
Any consumers of the old format will need to be updated.
This adds a dependency on unordered-containers for parsing MetaData
from JSON, but it's a free dependency; aeson pulls in that library.
This actually runs faster than building the man pages from the makefile
did. But the main purpose is to let Setup.hs import Build.Mans and so not
need the makefile.
The tarball on hackage will include only the files needed for cabal install;
it is NOT the full git-annex source tree. While it's totally obnoxious that
cabal files need every file listed out when basic wildcard support could
avoid hundreds of lines, and have to be maintained when files are added,
this does get the tarball size back down to 1 mb.
This also stops stack from complaining that it found modules not listed in
the cabal file.
debian/changelog, debian/NEWS, debian/copyright: Converted to symlinks
to CHANGELOG, NEWS, and COPYRIGHT, which used to symlink to these instead.
This avoids needing to include debian/ in the hackage tarball.
Setup.hs: Build man pages at install time using make and mdwn2man.
If it fails, which it probably will on windows, just skip installing
them.
According to https://github.com/redneb/disk-free-space/issues/3 ,
disk-free-space should be at least as portable as my homegrown code was.
One change I noticed is, getDiskSize was not implemented for windows
in the old code, and should work now.
* Removed the webapp-secure build flag, rolling it into the webapp build
flag.
* Removed the quvi and tahoe build flags, which only adds aeson to
the core dependencies.
* Removed the feed build flag, which only adds feed to the core
dependencies.
Build flags have cost in both code complexity and also make Setup configure
have to work harder to find a usable set of build flags when some
dependencies are missing.
The benchmark shows that the database access is quite fast indeed!
And, it scales linearly to the number of keys, with one exception,
getAssociatedKey.
Based on this benchmark, I don't think I need worry about optimising
for cases where all files are locked and the database is mostly empty.
In those cases, database access will be misses, and according to this
benchmark, should add only 50 milliseconds to runtime.
(NB: There may be some overhead to getting the database opened and locking
the handle that this benchmark doesn't see.)
joey@darkstar:~/src/git-annex>./git-annex benchmark
setting up database with 1000
setting up database with 10000
benchmarking keys database/getAssociatedFiles from 1000 (hit)
time 62.77 μs (62.70 μs .. 62.85 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 62.81 μs (62.76 μs .. 62.88 μs)
std dev 201.6 ns (157.5 ns .. 259.5 ns)
benchmarking keys database/getAssociatedFiles from 1000 (miss)
time 50.02 μs (49.97 μs .. 50.07 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 50.09 μs (50.04 μs .. 50.17 μs)
std dev 206.7 ns (133.8 ns .. 295.3 ns)
benchmarking keys database/getAssociatedKey from 1000 (hit)
time 211.2 μs (210.5 μs .. 212.3 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 211.0 μs (210.7 μs .. 212.0 μs)
std dev 1.685 μs (334.4 ns .. 3.517 μs)
benchmarking keys database/getAssociatedKey from 1000 (miss)
time 173.5 μs (172.7 μs .. 174.2 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 173.7 μs (173.0 μs .. 175.5 μs)
std dev 3.833 μs (1.858 μs .. 6.617 μs)
variance introduced by outliers: 16% (moderately inflated)
benchmarking keys database/getAssociatedFiles from 10000 (hit)
time 64.01 μs (63.84 μs .. 64.18 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 64.85 μs (64.34 μs .. 66.02 μs)
std dev 2.433 μs (547.6 ns .. 4.652 μs)
variance introduced by outliers: 40% (moderately inflated)
benchmarking keys database/getAssociatedFiles from 10000 (miss)
time 50.33 μs (50.28 μs .. 50.39 μs)
1.000 R² (1.000 R² .. 1.000 R²)
mean 50.32 μs (50.26 μs .. 50.38 μs)
std dev 202.7 ns (167.6 ns .. 252.0 ns)
benchmarking keys database/getAssociatedKey from 10000 (hit)
time 1.142 ms (1.139 ms .. 1.146 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.142 ms (1.140 ms .. 1.144 ms)
std dev 7.142 μs (4.994 μs .. 10.98 μs)
benchmarking keys database/getAssociatedKey from 10000 (miss)
time 1.094 ms (1.092 ms .. 1.096 ms)
1.000 R² (1.000 R² .. 1.000 R²)
mean 1.095 ms (1.095 ms .. 1.097 ms)
std dev 4.277 μs (2.591 μs .. 7.228 μs)
Make these features solely dependent on the OS being built on.
This lets stack build on windows w/o XMPP, on OSX w/o DBUS,
and on Linux with everything.
It was failing at link time, some problem with terminatePID.
Re-implemented that to not use a C wrapper function, which cleared up the
problem. Removed old EvilLinker hack with must have been related to the
same problem.
Note that I have not tested this with older ghc's. In
f11f7520b5 I mention having tried this
approach before, and getting segfaults.. So, who knows. It seems to work
fine with ghc 7.10 at least.
While cryptohash has SHA3 support, it has not been updated for the final
version of the spec. Note that cryptonite has not been ported to all arches
that cryptohash builds on yet.
This caused problems building with stackage and ghc 7.6, since cabal
assumes the flag means it wants a newer time. Since time is bundled with
ghc, stackage doesn't pin it.
Since this flag was only there to avoid a dep on old-locale, which is
currently bundled with ghc, let's remove the flag.
This is a work in progress. It compiles and is able to do basic command
dispatch, including git autocorrection, while using optparse-applicative
for the core commandline parsing.
* Many commands are temporarily disabled before conversion.
* Options are not wired in yet.
* cmdnorepo actions don't work yet.
Also, removed the [Command] list, which was only used in one place.
This reverts commit cf650eaa99.
It's too early to do this; the linux and android autobuilder will need to
be updated to use the new version of shakespeare, and that will require a
complete refresh of them. In the meantime, this has knocked the webapp out
of the autobuilders.
This is a nearly free feature; it piggybacks on the location log lookups
done for the numcopies stats. So, the only extra overhead is updating
the map of repository sizes.
However, I had to switch to Data.Map.Strict, which needs containers 0.5.
If backporting to wheezy, will probably need to revert this commit.
Did not keep backwards compat for sticky bit records. An incremental fsck
that is already in progress will start over on upgrade to this version.
This is not yet ready for merging. The autobuilders need to have sqlite
installed.
Also, interrupting a fsck --incremental does not commit the database.
So, resuming with fsck --more restarts from beginning.
Memory: Constant during a fsck of tens of thousands of files.
(But, it does seem to buffer whole transation in memory, so
may really scale with number of files.)
CPU: ?
With the following flags:
$ cabal configure --ghc --prefix=/usr --with-compiler=/usr/bin/ghc --with-hc-pkg=/usr/bin/ghc-pkg --prefix=/usr --libdir=/usr/lib64 --libsubdir=git-annex-5.20141203/ghc-7.8.3.20141119 --datadir=/usr/share/ --datasubdir=git-annex-5.20141203/ghc-7.8.3.20141119 --ghc-option=-O2 --ghc-option=+RTS --ghc-option=-H64M --ghc-option=-M4G --ghc-option=-RTS --ghc-option=-O0 --ghc-option=-j4 --ghc-option=-optl-Wl,-O1 --ghc-option=-optl-Wl,--as-needed --ghc-option=-optl-Wl,--hash-style=gnu --disable-executable-stripping --docdir=/usr/share/doc/git-annex-5.20141203 --verbose --sysconfdir=/etc --disable-library-stripping --flags=-android --flags=-androidsplice --flags=-assistant --flags=cryptohash --flags=dbus --flags=-desktop-notify --flags=dns --flags=-ekg --flags=-feed --flags=-inotify --flags=pairing --flags=production --flags=-quvi --flags=s3 --flags=tahoe --flags=tdfa --flags=-testsuite --flags=-webapp --flags=-webapp-secure --flags=-webdav --flags=-xmpp
ghc detects missing module (used directly by Remote.S3):
Remote/Helper/Http.hs:16:8:
Could not find module ‘Network.HTTP.Client’
It is a member of the hidden package ‘http-client-0.3.8.2’.
Perhaps you need to add ‘http-client’ to the build-depends in your .cabal file.
Use -v to see a list of the files searched for.
Signed-off-by: Sergei Trofimovich <siarheit@google.com>
Now that deps are sorted out in hackage, cabal is unlikely to try to
install a too old AWS, so I don't think this flag is worth the bother of
being completely correct with the dependency versioning.
This avoids me needing to enable to flag on the autobuilders..
I'm a little stuck on getting the list of etags of the parts.
This seems to require taking the md5 of each part locally,
which doesn't get along well with lazily streaming in the part from the
file. It would need to read the file twice, or lose laziness and buffer a
whole part -- but parts might be quite large.
This seems to be a problem with the API provided; S3 is supposed to return
an etag, but that is not exposed. I have filed a bug:
https://github.com/aristidb/aws/issues/141
Didn't know that this library existed!
This includes making git-annex not re-exec itself on start on windows, and
making the test suite on Windows run tests without forking.
This needs optparse-applicative 0.10. Dropped support for 0.9 and older,
but kept 0.9.1 working since autobuilders and debian testing still use it.
(The display is not perfect with 0.9.1.)
This is needed only because of the new MonadMask needed for bracket
in the new version. Ifdefing it everywhere is not practical, since the
Setup.hs uses it.
The hoary old HTTP library was only used when checking if an url exists,
when curl was not available. It had many problems, including not supporting
https at all.
Now, this is done using http-conduit for all urls that it supports. Falls
back to curl for any url that http-conduit doesn't like (probably ftp etc,
but could also be an url that its parser chokes on for whatever reason).
This adds a new dependency on http-conduit, but webdav support already
indirectly depended on that, and the s3-aws branch also uses it.
This opens up the possibility of using http-conduit for large file
downloads, but for now I've left it using wget/curl.
This commit was sponsored by Paul Tötterman.
The hoary old HTTP library was only used when checking if an url exists,
when curl was not available. It had many problems, including not supporting
https at all.
Now, this is done using http-conduit for all urls that it supports. Falls
back to curl for any url that http-conduit doesn't like (probably ftp etc,
but could also be an url that its parser chokes on for whatever reason).
This adds a new dependency on http-conduit, but webdav support already
indirectly depended on that, and the s3-aws branch also uses it.
Currently, initremote works, but not the other operations. They should be
fairly easy to add from this base.
Also, https://github.com/aristidb/aws/issues/119 blocks internet archive
support.
Note that since http-conduit is used, this also adds https support to S3.
Although git-annex encrypts everything anyway, so that may not be extremely
useful. It is not enabled by default, because existing S3 special remotes
have port=80 in their config. Setting port=443 will enable it.
This commit was sponsored by Daniel Brockman.
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
This speeds up the webdav special remote somewhat, since it often now
groups actions together in a single http connection when eg, storing a
file.
Legacy chunks are still supported, but have not been sped up.
This depends on a as-yet unreleased version of DAV.
This commit was sponsored by Thomas Hochstein.
This only performs some basic tests so far; no testing of chunking or
resuming. Also, the existing encryption type of the remote is used; it
would be good later to derive an encrypted and a non-encrypted version of
the remote and test them both.
This commit was sponsored by Joseph Liu.
This version of wai changed the type of Middleware, so I cannot seem
to liftIO inside it. So, got rid of a lot of not really needed
complexity to use System.Log.Logger's logging stuff, and just use
the standard wai stdout logger when debug logging is enabled.
Format may change some, and it logs http to stdout instead of stderr
now. Doesn't matter for the webapp since both go to the same log anyway.
Motivation: Hook scripts for nautilus or other file managers
need to provide the user with feedback that a file is being downloaded.
This commit was sponsored by THM Schoemaker.
Debian stable's warp-tls is too old to support the new https feature well,
so only use http with that old version.
Note that the webapp still depends on warp-tls, because the TLSSettings
type is used.
The ctrl-c hack used before didn't actually seem to work.
No haskell libraries expose TerminateProcess. I tried just calling it via
FFI, but got segfaults, probably to do with the wacky process handle not
being managed correctly. Moving it all into one C function worked.
This was hell. The EvilLinker hack was just final icing on the cake.
We all know what the cake was made of.
Known problems:
1. Tries to tahoe start when daemon is already running.
2. If multiple tahoe remotes are set up on the same computer,
they will have the same node.url configured by default,
and this confuses tahoe commands.
This commit was sponsored by LeastAuthority.com
Fixed up a number of things that had worked around there not being a way to
get that.
Most notably, transfer info files on windows now include the process id,
since no locking is currently done. This means the file format varies
between windows and unix.
The gcc response file should make it build with webdav (fingers crossed).
webapp is waiting on a haskell platform upgrade on the autobuilder.
Current one has a too old version of network for hxt to install.
This used to work, but now hsc2hs is failing with a usage message.
Since I have not changed my windows build environment at all, it must be
some change due to a change in the cabal file. Perhaps too make flags are
causing it to hit a windows command line length limit?
Anyway, these hsc files did nothing on Windows, so can be omitted and not
built to work around yet another epic windows weirdness.
Thought was that this would be faster than a map, since a vector can be
updated more efficiently. It turns out to not seem to matter; runtime and
memory usage are basically identical.
This is a massive win on OSX, which doesn't have a sha256sum normally.
Only use external hash commands when the file is > 1 mb,
since cryptohash is quite close to them in speed.
SHA is still used to calculate HMACs. I don't quite understand
cryptohash's API for those.
Used the following benchmark to arrive at the 1 mb number.
1 mb file:
benchmarking sha256/internal
mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950
std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
4 (4.0%) high mild
1 (1.0%) high severe
variance introduced by outliers: 10.415%
variance is moderately inflated by outliers
benchmarking sha256/external
mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950
std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
2 (2.0%) high mild
1 (1.0%) high severe
2 mb file:
benchmarking sha256/internal
mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950
std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950
variance introduced by outliers: 35.540%
variance is moderately inflated by outliers
benchmarking sha256/external
mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950
std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950
found 6 outliers among 100 samples (6.0%)
import Crypto.Hash
import Data.ByteString.Lazy as L
import Criterion.Main
import Common
testfile :: FilePath
testfile = "/run/shm/data" -- on ram disk
main = defaultMain
[ bgroup "sha256"
[ bench "internal" $ whnfIO internal
, bench "external" $ whnfIO external
]
]
sha256 :: L.ByteString -> Digest SHA256
sha256 = hashlazy
internal :: IO String
internal = show . sha256 <$> L.readFile testfile
external :: IO String
external = do
s <- readProcess "sha256sum" [testfile]
return $ fst $ separate (== ' ') s