This is a massive win on OSX, which doesn't have a sha256sum normally.
Only use external hash commands when the file is > 1 mb,
since cryptohash is quite close to them in speed.
SHA is still used to calculate HMACs. I don't quite understand
cryptohash's API for those.
Used the following benchmark to arrive at the 1 mb number.
1 mb file:
benchmarking sha256/internal
mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950
std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
4 (4.0%) high mild
1 (1.0%) high severe
variance introduced by outliers: 10.415%
variance is moderately inflated by outliers
benchmarking sha256/external
mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950
std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
2 (2.0%) high mild
1 (1.0%) high severe
2 mb file:
benchmarking sha256/internal
mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950
std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950
variance introduced by outliers: 35.540%
variance is moderately inflated by outliers
benchmarking sha256/external
mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950
std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950
found 6 outliers among 100 samples (6.0%)
import Crypto.Hash
import Data.ByteString.Lazy as L
import Criterion.Main
import Common
testfile :: FilePath
testfile = "/run/shm/data" -- on ram disk
main = defaultMain
[ bgroup "sha256"
[ bench "internal" $ whnfIO internal
, bench "external" $ whnfIO external
]
]
sha256 :: L.ByteString -> Digest SHA256
sha256 = hashlazy
internal :: IO String
internal = show . sha256 <$> L.readFile testfile
external :: IO String
external = do
s <- readProcess "sha256sum" [testfile]
return $ fst $ separate (== ' ') s
Now the webapp can generate a gpg key that is dedicated for use by
git-annex. Since the key is single use, much of the complexity of
generating gpg keys is avoided.
Note that the key has no password, because gpg-agent is not available
everywhere the assistant is installed. This is not a big security problem
because the key is going to live on the same disk as the git annex
repository, so an attacker with access to it can look directly in the
repository to see the same files that get stored in the encrypted
repository on the removable drive.
There is no provision yet for backing up keys.
This commit sponsored by Robert Beaty.
Note that Utility.Format.prop_idempotent_deencode does not hold
now that hex escaped characters are supported. quickcheck fails to notice
this, so I have left it as-is for now.
Cipher is now a datatype
data Cipher = Cipher String | MacOnlyCipher String
which makes more precise its interpretation MAC-only vs. MAC + used to
derive a key for symmetric crypto.
With the initremote parameters "encryption=pubkey keyid=788A3F4C".
/!\ Adding or removing a key has NO effect on files that have already
been copied to the remote. Hence using keyid+= and keyid-= with such
remotes should be used with care, and make little sense unless the point
is to replace a (sub-)key by another. /!\
Also, a test case has been added to ensure that the cipher and file
contents are encrypted as specified by the chosen encryption scheme.
Having one module that knows about all the filenames used on the branch
allows working back from an arbitrary filename to enough information about
it to implement dropping dead remotes and doing other log file compacting
as part of a forget transition.
/!\ It is to be noted that revoking a key does NOT necessarily prevent
the owner of its private part from accessing data on the remote /!\
The only sound use of `keyid-=` is probably to replace a (sub-)key by
another, where the private part of both is owned by the same
person/entity:
git annex enableremote myremote keyid-=2512E3C7 keyid+=788A3F4C
Reference: http://git-annex.branchable.com/bugs/Using_a_revoked_GPG_key/
* Other change introduced by this patch:
New keys now need to be added with option `keyid+=`, and the scheme
specified (upon initremote only) with `encryption=`. The motivation for
this change is to open for new schemes, e.g., strict asymmetric
encryption.
git annex initremote myremote encryption=hybrid keyid=2512E3C7
git annex enableremote myremote keyid+=788A3F4C
Instead of populating the second-level Bloom filter with every key
referenced in every Git reference, consider only those which differ
from what's referenced in the index.
Incidentaly, unlike with its old behavior, staged
modifications/deletion/... will now be detected by 'unused'.
Credits to joeyh for the algorithm. :-)
When quvi is installed, git-annex addurl automatically uses it to detect
when an page is a video, and downloads the video file.
web special remote: Also support using quvi, for getting files,
or checking if files exist in the web.
This commit was sponsored by Mark Hepburn. Thanks!
<RichiH> i richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn % rm /home/richih/work/git/debconf-share/.git/annex/tmp/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG
<RichiH> richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn % git annex get P8060008.JPG
<RichiH> get P8060008.JPG (from website...) --2013-08-21 21:42:45-- http://annex.debconf.org/debconf-share/.git//annex/objects/1a4/67d/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG
<RichiH> Resolving annex.debconf.org (annex.debconf.org)... 5.153.231.227, 2001:41c8:1000:19::227:2
<RichiH> Connecting to annex.debconf.org (annex.debconf.org)|5.153.231.227|:80... connected.
<RichiH> HTTP request sent, awaiting response... 404 Not Found
<RichiH> 2013-08-21 21:42:45 ERROR 404: Not Found.
<RichiH> File `/home/richih/work/git/debconf-share/.git/annex/tmp/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG' already there; not retrieving.
<RichiH> Unable to access these remotes: website
<RichiH> Try making some of these repositories available:
<RichiH> 3e0356ac-0743-11e3-83a5-1be63124a102 -- website (annex.debconf.org)
<RichiH> a7495021-9f2d-474e-80c7-34d29d09fec6 -- chrysn@hephaistos:~/data/projects/debconf13/debconf-share
<RichiH> eb8990f7-84cd-4e6b-b486-a5e71efbd073 -- joeyh passport usb drive
<RichiH> f415f118-f428-4c68-be66-c91501da3a93 -- joeyh laptop
<RichiH> failed
<RichiH> git-annex: get: 1 failed
<RichiH> richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn %
I was not able to reproduce the failure, but I did reproduce that
wget -O http://404/ results in an empty file being written.
This runs git-cat-file in non-batch mode for all files with spaces.
If a directory tree has a lot of them, and is in direct mode, even "git
annex add" when there are few new files will need a *lot* of forks!
The only reason buffering the whole file content to get the sha is not a
memory leak is that git-annex only ever uses this on symlinks.
This needs to be reverted as soon as a fix is available in git!
A git pathspec is a filename, except when it starts with ':', it's taken
to refer to a branch, etc. Rather than special case ':', any filename
starting with anything unusual is prefixed with "./"
This could have been a real mess to deal with, but luckily SafeCommand
is already extensively used and so we know at the type level the difference
between parameters that are files, and parameters that are command options.
Testing did show that Git.Queue was not using SafeCommand on
filenames fed to xargs. (Filenames starting with '-' worked before only
because -- was used to separate filenames from options when calling eg git
add.)
The test suite now passes with filenames starting with ':'. However, I did
not keep that change to it, because such filenames are probably not legal
on windows, and I have enough ugly windows ifdefs in there as it is.
This commit was sponsored by Otavio Salvador. Thanks!
Started with a problem when running addurl on a really long url,
because the whole url is munged into the filename. Ended up doing
a fairly extensive review for places where filenames could get too large,
although it's hard to say I'm not missed any..
Backend.Url had a 128 character limit, which is fine when the limit is 255,
but not if it's a lot shorter on some systems. So check the pathconf()
limit. Note that this could result in fromUrl creating different keys
for the same url, if run on systems with different limits. I don't see
this is likely to cause any problems. That can already happen when using
addurl --fast, or if the content of an url changes.
Both Command.AddUrl and Backend.Url assumed that urls don't contain a
lot of multi-byte unicode, and would fail to truncate an url that did
properly.
A few places use a filename as the template to make a temp file.
While that's nice in that the temp file name can be easily related back to
the original filename, it could lead to `git annex add` failing to add a
filename that was at or close to the maximum length.
Note that in Command.Add.lockdown, the template is still derived from the
filename, just with enough space left to turn it into a temp file.
This is an important optimisation, because the assistant may lock down
a bunch of files all at once, and using the same template for all of them
would cause openTempFile to iterate through the same set of names,
looking for an unused temp file. I'm not very happy with the relatedTemplate
hack, but it avoids that slowdown.
Backend.WORM does not limit the filename stored in the key.
I have not tried to change that; so git annex add will fail on really long
filenames when using the WORM backend. It seems better to preserve the
invariant that a WORM key always contains the complete filename, since
the filename is the only unique material in the key, other than mtime and
size. Since nobody has complained about add failing (I think I saw it
once?) on WORM, probably it's ok, or nobody but me uses it.
There may be compatability problems if using git annex addurl --fast
or the WORM backend on a system with the 255 limit and then trying to use
that repo in a system with a smaller limit. I have not tried to deal with
those.
This commit was sponsored by Alexander Brem. Thanks!
The icon files will be installed when running make install or cabal
install. Did not try to run update-icon-caches, since I think it's debian
specific, and dh_icons will take care of that for the Debian package.
Using the favicon as a 16x16 icon. At 24x24 the svg displays pretty well,
although the dotted lines are rather faint. The svg is ok at all higher
resolutions.
The standalone linux build auto-installs the desktop and autostart files
when run. I have not made it auto-install the icon file too, because
a) that would take more work to include them in the tarball and find them
b) it would need to be an install to ~/.icons/, and I don't know if that
really works!
This is a compromise. I would like to nice every thread except for the
webapp thread, but it's not practical to do so. That would need every
thread to run as a bound thread, which could add significant overhead.
And any forkIO would escape the nice level.