Commit graph

534 commits

Author SHA1 Message Date
Joey Hess
7390f08ef9 Use cryptohash rather than SHA for hashing.
This is a massive win on OSX, which doesn't have a sha256sum normally.

Only use external hash commands when the file is > 1 mb,
since cryptohash is quite close to them in speed.

SHA is still used to calculate HMACs. I don't quite understand
cryptohash's API for those.

Used the following benchmark to arrive at the 1 mb number.

1 mb file:

benchmarking sha256/internal
mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950
std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
  4 (4.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 10.415%
variance is moderately inflated by outliers

benchmarking sha256/external
mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950
std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
  2 (2.0%) high mild
  1 (1.0%) high severe

2 mb file:

benchmarking sha256/internal
mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950
std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950
variance introduced by outliers: 35.540%
variance is moderately inflated by outliers

benchmarking sha256/external
mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950
std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950
found 6 outliers among 100 samples (6.0%)

import Crypto.Hash
import Data.ByteString.Lazy as L
import Criterion.Main
import Common

testfile :: FilePath
testfile = "/run/shm/data" -- on ram disk

main = defaultMain
        [ bgroup "sha256"
                [ bench "internal" $ whnfIO internal
                , bench "external" $ whnfIO external
                ]
        ]

sha256 :: L.ByteString -> Digest SHA256
sha256 = hashlazy

internal :: IO String
internal = show . sha256 <$> L.readFile testfile

external :: IO String
external = do
	s <- readProcess "sha256sum" [testfile]
        return $ fst $ separate (== ' ') s
2013-09-22 20:06:02 -04:00
Joey Hess
9de189e788 webapp gpg key generation
Now the webapp can generate a gpg key that is dedicated for use by
git-annex. Since the key is single use, much of the complexity of
generating gpg keys is avoided.

Note that the key has no password, because gpg-agent is not available
everywhere the assistant is installed. This is not a big security problem
because the key is going to live on the same disk as the git annex
repository, so an attacker with access to it can look directly in the
repository to see the same files that get stored in the encrypted
repository on the removable drive.

There is no provision yet for backing up keys.

This commit sponsored by Robert Beaty.
2013-09-17 15:36:15 -04:00
Joey Hess
26baae8967 fix build with haskell DNS 1.0.0 2013-09-17 11:54:09 -04:00
Joey Hess
7936cc646d gpg secret key generation 2013-09-16 13:22:43 -04:00
Joey Hess
e4290c61d7 gpg secret keys list parsing
Note that Utility.Format.prop_idempotent_deencode does not hold
now that hex escaped characters are supported. quickcheck fails to notice
this, so I have left it as-is for now.
2013-09-16 12:57:39 -04:00
Joey Hess
b33bddd753 fix comment 2013-09-07 19:08:28 -04:00
Joey Hess
0a2f5f3993 gpg: Force --no-textmode in case the user has it turned on in config. 2013-09-07 13:06:36 -04:00
Joey Hess
cbc5aa623d fix windows build 2013-09-06 17:05:41 -04:00
guilhem
ac9807c887 Leverage an ambiguities between Ciphers
Cipher is now a datatype

    data Cipher = Cipher String | MacOnlyCipher String

which makes more precise its interpretation MAC-only vs. MAC + used to
derive a key for symmetric crypto.
2013-09-05 11:09:08 -04:00
Joey Hess
08f026e886 keep Utility.Gpg free of dependencies on git-annex 2013-09-04 23:16:33 -04:00
Joey Hess
2fcae0348f Merge branch 'master' into encryption 2013-09-04 18:08:47 -04:00
guilhem
8293ed619f Allow public-key encryption of file content.
With the initremote parameters "encryption=pubkey keyid=788A3F4C".

/!\ Adding or removing a key has NO effect on files that have already
been copied to the remote. Hence using keyid+= and keyid-= with such
remotes should be used with care, and make little sense unless the point
is to replace a (sub-)key by another. /!\

Also, a test case has been added to ensure that the cipher and file
contents are encrypted as specified by the chosen encryption scheme.
2013-09-03 14:34:16 -04:00
Joey Hess
62beaa1a86 refactor git-annex branch log filename code into central location
Having one module that knows about all the filenames used on the branch
allows working back from an arbitrary filename to enough information about
it to implement dropping dead remotes and doing other log file compacting
as part of a forget transition.
2013-08-29 19:13:00 -04:00
guilhem
53ce59021a Allow revocation of OpenPGP keys.
/!\ It is to be noted that revoking a key does NOT necessarily prevent
the owner of its private part from accessing data on the remote /!\

The only sound use of `keyid-=` is probably to replace a (sub-)key by
another, where the private part of both is owned by the same
person/entity:

    git annex enableremote myremote keyid-=2512E3C7 keyid+=788A3F4C

Reference: http://git-annex.branchable.com/bugs/Using_a_revoked_GPG_key/

* Other change introduced by this patch:

New keys now need to be added with option `keyid+=`, and the scheme
specified (upon initremote only) with `encryption=`. The motivation for
this change is to open for new schemes, e.g., strict asymmetric
encryption.

    git annex initremote myremote encryption=hybrid keyid=2512E3C7
    git annex enableremote myremote keyid+=788A3F4C
2013-08-29 14:31:33 -04:00
guilhem
f15fda60ed Speed up the 'unused' command.
Instead of populating the second-level Bloom filter with every key
referenced in every Git reference, consider only those which differ
from what's referenced in the index.

Incidentaly, unlike with its old behavior, staged
modifications/deletion/... will now be detected by 'unused'.

Credits to joeyh for the algorithm. :-)
2013-08-25 21:02:13 -04:00
Joey Hess
de58067785 better error message 2013-08-22 21:12:41 -04:00
Joey Hess
07d172e01d cleanup 2013-08-22 18:56:08 -04:00
Joey Hess
46b6d75274 Youtube support! (And 53 other video hosts)
When quvi is installed, git-annex addurl automatically uses it to detect
when an page is a video, and downloads the video file.

web special remote: Also support using quvi, for getting files,
or checking if files exist in the web.

This commit was sponsored by Mark Hepburn. Thanks!
2013-08-22 18:50:43 -04:00
Joey Hess
d603f536bd Set --clobber when running wget to ensure resuming works properly. 2013-08-21 18:19:01 -04:00
Joey Hess
0912e752b5 Revert "Delete empty downloaded file when wget fails, to work around reported resume failure."
This reverts commit 98886e3fbf.

Better fix forthcoming
2013-08-21 18:17:48 -04:00
Joey Hess
98886e3fbf Delete empty downloaded file when wget fails, to work around reported resume failure.
<RichiH> i richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn % rm /home/richih/work/git/debconf-share/.git/annex/tmp/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG
<RichiH> richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn % git annex get P8060008.JPG
<RichiH> get P8060008.JPG (from website...) --2013-08-21 21:42:45--  http://annex.debconf.org/debconf-share/.git//annex/objects/1a4/67d/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG
<RichiH> Resolving annex.debconf.org (annex.debconf.org)... 5.153.231.227, 2001:41c8:1000:19::227:2
<RichiH> Connecting to annex.debconf.org (annex.debconf.org)|5.153.231.227|:80... connected.
<RichiH> HTTP request sent, awaiting response... 404 Not Found
<RichiH> 2013-08-21 21:42:45 ERROR 404: Not Found.
<RichiH> File `/home/richih/work/git/debconf-share/.git/annex/tmp/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG' already there; not retrieving.
<RichiH>   Unable to access these remotes: website
<RichiH>   Try making some of these repositories available:
<RichiH>    3e0356ac-0743-11e3-83a5-1be63124a102 -- website (annex.debconf.org)
<RichiH>     a7495021-9f2d-474e-80c7-34d29d09fec6 -- chrysn@hephaistos:~/data/projects/debconf13/debconf-share
<RichiH>     eb8990f7-84cd-4e6b-b486-a5e71efbd073 -- joeyh passport usb drive
<RichiH>     f415f118-f428-4c68-be66-c91501da3a93 -- joeyh laptop
<RichiH> failed
<RichiH> git-annex: get: 1 failed
<RichiH> richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn %

I was not able to reproduce the failure, but I did reproduce that
wget -O http://404/ results in an empty file being written.
2013-08-21 16:05:51 -04:00
Joey Hess
a3224ce35b avoid more build warnings on Windows 2013-08-04 14:05:36 -04:00
Joey Hess
a837ed47f7 Windows: Added support for encrypted special remotes. 2013-08-04 13:03:34 -04:00
Joey Hess
0a52f02f8e fix syntax 2013-08-02 12:42:14 -04:00
Joey Hess
da012e1eeb fix Windows breakage 2013-08-02 12:37:45 -04:00
Joey Hess
93f2371e09 get rid of __WINDOWS__, use mingw32_HOST_OS
The latter is harder for me to remember, but avoids build failures in code
used by the configure program.
2013-08-02 12:27:32 -04:00
Joey Hess
d16114d024 Slow and ugly work around for bug #718517 in git, which broke git-cat-file --batch for filenames containing spaces.
This runs git-cat-file in non-batch mode for all files with spaces.
If a directory tree has a lot of them, and is in direct mode, even "git
annex add" when there are few new files will need a *lot* of forks!

The only reason buffering the whole file content to get the sha is not a
memory leak is that git-annex only ever uses this on symlinks.

This needs to be reverted as soon as a fix is available in git!
2013-08-01 17:30:47 -04:00
Joey Hess
ebd778c519 Escape ':' in file/directory names to avoid it being treated as a pathspec by some git commands
A git pathspec is a filename, except when it starts with ':', it's taken
to refer to a branch, etc. Rather than special case ':', any filename
starting with anything unusual is prefixed with "./"

This could have been a real mess to deal with, but luckily SafeCommand
is already extensively used and so we know at the type level the difference
between parameters that are files, and parameters that are command options.

Testing did show that Git.Queue was not using SafeCommand on
filenames fed to xargs. (Filenames starting with '-' worked before only
because -- was used to separate filenames from options when calling eg git
add.)

The test suite now passes with filenames starting with ':'. However, I did
not keep that change to it, because such filenames are probably not legal
on windows, and I have enough ugly windows ifdefs in there as it is.

This commit was sponsored by Otavio Salvador. Thanks!
2013-08-01 15:15:49 -04:00
Joey Hess
ddd46db09a Fix a few bugs involving filenames that are at or near the filesystem's maximum filename length limit.
Started with a problem when running addurl on a really long url,
because the whole url is munged into the filename. Ended up doing
a fairly extensive review for places where filenames could get too large,
although it's hard to say I'm not missed any..

Backend.Url had a 128 character limit, which is fine when the limit is 255,
but not if it's a lot shorter on some systems. So check the pathconf()
limit. Note that this could result in fromUrl creating different keys
for the same url, if run on systems with different limits. I don't see
this is likely to cause any problems. That can already happen when using
addurl --fast, or if the content of an url changes.

Both Command.AddUrl and Backend.Url assumed that urls don't contain a
lot of multi-byte unicode, and would fail to truncate an url that did
properly.

A few places use a filename as the template to make a temp file.
While that's nice in that the temp file name can be easily related back to
the original filename, it could lead to `git annex add` failing to add a
filename that was at or close to the maximum length.

Note that in Command.Add.lockdown, the template is still derived from the
filename, just with enough space left to turn it into a temp file.
This is an important optimisation, because the assistant may lock down
a bunch of files all at once, and using the same template for all of them
would cause openTempFile to iterate through the same set of names,
looking for an unused temp file. I'm not very happy with the relatedTemplate
hack, but it avoids that slowdown.

Backend.WORM does not limit the filename stored in the key.
I have not tried to change that; so git annex add will fail on really long
filenames when using the WORM backend. It seems better to preserve the
invariant that a WORM key always contains the complete filename, since
the filename is the only unique material in the key, other than mtime and
size. Since nobody has complained about add failing (I think I saw it
once?) on WORM, probably it's ok, or nobody but me uses it.

There may be compatability problems if using git annex addurl --fast
or the WORM backend on a system with the 255 limit and then trying to use
that repo in a system with a smaller limit. I have not tried to deal with
those.

This commit was sponsored by Alexander Brem. Thanks!
2013-07-30 19:18:29 -04:00
Joey Hess
7e66d260ea importfeed: git-annex becomes a podcatcher in 150 LOC 2013-07-28 16:55:42 -04:00
Joey Hess
4c164c8c70 cleanup 2013-07-20 20:56:04 -04:00
Joey Hess
9c68e06276 refactor and unify code
This fixes several bugs in both modules.
2013-07-19 19:39:14 -04:00
Joey Hess
b8da297c77 pluralize 1.1 kilobytes etc 2013-07-19 14:05:27 -04:00
Joey Hess
647a938f6a Display byte sizes with more precision. 2013-07-19 12:21:44 -04:00
Joey Hess
8b644e74fa catch does not exist error when adding a watch
This could be thrown due to eg, the directory being moved or deleted, so
the error should not be propigated.
2013-07-17 15:32:24 -04:00
Joey Hess
00e6663128 linux standalone auto-install icons 2013-07-09 20:50:41 -04:00
Joey Hess
80b390560e install to ~/.local/icons, not ~/icons
Apparently the Icon Theme Specification no longer matches reality,
as implemented by XFCE and xdg-icon-resource.
2013-07-09 20:16:07 -04:00
Joey Hess
19b8bcbe30 Install XDG desktop icon files.
The icon files will be installed when running make install or cabal
install. Did not try to run update-icon-caches, since I think it's debian
specific, and dh_icons will take care of that for the Debian package.

Using the favicon as a 16x16 icon. At 24x24 the svg displays pretty well,
although the dotted lines are rather faint. The svg is ok at all higher
resolutions.

The standalone linux build auto-installs the desktop and autostart files
when run. I have not made it auto-install the icon file too, because
a) that would take more work to include them in the tarball and find them
b) it would need to be an install to ~/.icons/, and I don't know if that
   really works!
2013-07-09 19:56:30 -04:00
Joey Hess
c8936038dc reorg 2013-07-08 14:51:43 -04:00
Joey Hess
79e1a0c571 Pass -f to curl when downloading a file with it, so it propigates failure. 2013-07-06 00:55:00 -04:00
Joey Hess
86b845ff8d Windows: Look for .exe extension when searching for a command in path. 2013-07-06 00:48:47 -04:00
Joey Hess
3a843b165d fix a build failure on android 2013-06-27 15:25:28 -04:00
Joey Hess
ff4f008591 clean up build warnings with yesod 1.2, while still building with 1.1 2013-06-27 01:15:28 -04:00
Joey Hess
b44c978e2c webapp: Fix bug that caused the webapp to hang when built with yesod 1.2. 2013-06-27 00:01:31 -04:00
Joey Hess
579446aed4 assistant: Fix bug that prevented adding files written by gnucash, and more generally support adding hard links to files. However, other operations on hard links are still unsupported. 2013-06-26 12:30:37 -04:00
Joey Hess
2f9b0ba351 simpler ifdef for linux 2013-06-21 13:09:09 -04:00
Joey Hess
6e309b63f8 assistant: On Linux, the expensive transfer scan is run niced.
This is a compromise. I would like to nice every thread except for the
webapp thread, but it's not practical to do so. That would need every
thread to run as a bound thread, which could add significant overhead.
And any forkIO would escape the nice level.
2013-06-20 22:25:41 -04:00
Joey Hess
8830360829 fix regression test on windows 2013-06-18 13:08:28 -04:00
Joey Hess
287fb00163 make withQuietOutput work on Windows 2013-06-17 21:26:06 -04:00
Joey Hess
0527c74c0f assistant: In direct mode, objects are now only dropped when all associated files are unwanted. This avoids a repreated drop/get loop of a file that has a copy in an archive directory, and a copy not in an archive directory. (Indirect mode still has some buggy behavior in this area, since it does not keep track of associated files.) Closes: #712060 2013-06-15 14:44:43 -04:00