Commit graph

543 commits

Author SHA1 Message Date
Joey Hess
93dbb7842e watcher: Detect at startup time when there is a stale .git/lock, and remove it so it does not interfere with the automatic commits of changed files. 2013-10-03 16:57:21 -04:00
Joey Hess
20fb905bb6 allow building w/o cryptohash
Mostly for the debian stable autobuilds, which have a too old version
to use the Crypto.Hash module.
2013-10-03 12:33:38 -04:00
Joey Hess
547a18019f ensure that hash representations don't change in future 2013-10-01 21:11:47 -04:00
Joey Hess
a05b763b01 Added SKEIN256 and SKEIN512 backends
SHA3 is still waiting for final standardization.
Although this is looking less likely given
https://www.cdt.org/blogs/joseph-lorenzo-hall/2409-nist-sha-3

In the meantime, cryptohash implements skein, and it's used by some of the
haskell ecosystem (for yesod sessions, IIRC), so this implementation is
likely to continue working. Also, I've talked with the cryprohash author
and he's a reasonable guy.

It makes sense to have an alternate high security hash, in case some
horrible attack is found against SHA2 tomorrow, or in case SHA3 comes out
and worst fears are realized.

I'd also like to support using skein for HMAC. But no hurry there and
a new version of cryptohash has much nicer HMAC code, so I will probably
wait until I can use that version.
2013-10-01 20:34:36 -04:00
Joey Hess
6b37fcffd8 assistant: More robust inotify handling; avoid crashing if a directory cannot be read. 2013-09-30 13:11:26 -04:00
Joey Hess
12f6b9693a Send a git-annex user-agent when downloading urls.
Overridable with --user-agent option.

Not yet done for S3 or WebDAV due to limitations of libraries used --
nether allows a user-agent header to be specified.

This commit sponsored by Michael Zehrer.
2013-09-28 14:35:21 -04:00
Joey Hess
57d49a6d04 remove *>=> and >=*> ; use <$$> instead
I forgot I had <$$> hidden away in Utility.Applicative.
It allows doing the same kind of currying as does >=*>
and I found using it made the code more readable for me.

(*>=> was not used)
2013-09-27 19:58:48 -04:00
Joey Hess
c6032b0dab clean up some ugly code 2013-09-27 19:52:36 -04:00
Joey Hess
98fc7e8a19 add, import, assistant: Better preserve the mtime of symlinks, when when adding content that gets deduplicated.
Note that this turned out to remove a syscall, not add any expense.
Otherwise, I would not have done it.
2013-09-25 16:07:11 -04:00
Joey Hess
7390f08ef9 Use cryptohash rather than SHA for hashing.
This is a massive win on OSX, which doesn't have a sha256sum normally.

Only use external hash commands when the file is > 1 mb,
since cryptohash is quite close to them in speed.

SHA is still used to calculate HMACs. I don't quite understand
cryptohash's API for those.

Used the following benchmark to arrive at the 1 mb number.

1 mb file:

benchmarking sha256/internal
mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950
std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
  4 (4.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 10.415%
variance is moderately inflated by outliers

benchmarking sha256/external
mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950
std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
  2 (2.0%) high mild
  1 (1.0%) high severe

2 mb file:

benchmarking sha256/internal
mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950
std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950
variance introduced by outliers: 35.540%
variance is moderately inflated by outliers

benchmarking sha256/external
mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950
std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950
found 6 outliers among 100 samples (6.0%)

import Crypto.Hash
import Data.ByteString.Lazy as L
import Criterion.Main
import Common

testfile :: FilePath
testfile = "/run/shm/data" -- on ram disk

main = defaultMain
        [ bgroup "sha256"
                [ bench "internal" $ whnfIO internal
                , bench "external" $ whnfIO external
                ]
        ]

sha256 :: L.ByteString -> Digest SHA256
sha256 = hashlazy

internal :: IO String
internal = show . sha256 <$> L.readFile testfile

external :: IO String
external = do
	s <- readProcess "sha256sum" [testfile]
        return $ fst $ separate (== ' ') s
2013-09-22 20:06:02 -04:00
Joey Hess
9de189e788 webapp gpg key generation
Now the webapp can generate a gpg key that is dedicated for use by
git-annex. Since the key is single use, much of the complexity of
generating gpg keys is avoided.

Note that the key has no password, because gpg-agent is not available
everywhere the assistant is installed. This is not a big security problem
because the key is going to live on the same disk as the git annex
repository, so an attacker with access to it can look directly in the
repository to see the same files that get stored in the encrypted
repository on the removable drive.

There is no provision yet for backing up keys.

This commit sponsored by Robert Beaty.
2013-09-17 15:36:15 -04:00
Joey Hess
26baae8967 fix build with haskell DNS 1.0.0 2013-09-17 11:54:09 -04:00
Joey Hess
7936cc646d gpg secret key generation 2013-09-16 13:22:43 -04:00
Joey Hess
e4290c61d7 gpg secret keys list parsing
Note that Utility.Format.prop_idempotent_deencode does not hold
now that hex escaped characters are supported. quickcheck fails to notice
this, so I have left it as-is for now.
2013-09-16 12:57:39 -04:00
Joey Hess
b33bddd753 fix comment 2013-09-07 19:08:28 -04:00
Joey Hess
0a2f5f3993 gpg: Force --no-textmode in case the user has it turned on in config. 2013-09-07 13:06:36 -04:00
Joey Hess
cbc5aa623d fix windows build 2013-09-06 17:05:41 -04:00
guilhem
ac9807c887 Leverage an ambiguities between Ciphers
Cipher is now a datatype

    data Cipher = Cipher String | MacOnlyCipher String

which makes more precise its interpretation MAC-only vs. MAC + used to
derive a key for symmetric crypto.
2013-09-05 11:09:08 -04:00
Joey Hess
08f026e886 keep Utility.Gpg free of dependencies on git-annex 2013-09-04 23:16:33 -04:00
Joey Hess
2fcae0348f Merge branch 'master' into encryption 2013-09-04 18:08:47 -04:00
guilhem
8293ed619f Allow public-key encryption of file content.
With the initremote parameters "encryption=pubkey keyid=788A3F4C".

/!\ Adding or removing a key has NO effect on files that have already
been copied to the remote. Hence using keyid+= and keyid-= with such
remotes should be used with care, and make little sense unless the point
is to replace a (sub-)key by another. /!\

Also, a test case has been added to ensure that the cipher and file
contents are encrypted as specified by the chosen encryption scheme.
2013-09-03 14:34:16 -04:00
Joey Hess
62beaa1a86 refactor git-annex branch log filename code into central location
Having one module that knows about all the filenames used on the branch
allows working back from an arbitrary filename to enough information about
it to implement dropping dead remotes and doing other log file compacting
as part of a forget transition.
2013-08-29 19:13:00 -04:00
guilhem
53ce59021a Allow revocation of OpenPGP keys.
/!\ It is to be noted that revoking a key does NOT necessarily prevent
the owner of its private part from accessing data on the remote /!\

The only sound use of `keyid-=` is probably to replace a (sub-)key by
another, where the private part of both is owned by the same
person/entity:

    git annex enableremote myremote keyid-=2512E3C7 keyid+=788A3F4C

Reference: http://git-annex.branchable.com/bugs/Using_a_revoked_GPG_key/

* Other change introduced by this patch:

New keys now need to be added with option `keyid+=`, and the scheme
specified (upon initremote only) with `encryption=`. The motivation for
this change is to open for new schemes, e.g., strict asymmetric
encryption.

    git annex initremote myremote encryption=hybrid keyid=2512E3C7
    git annex enableremote myremote keyid+=788A3F4C
2013-08-29 14:31:33 -04:00
guilhem
f15fda60ed Speed up the 'unused' command.
Instead of populating the second-level Bloom filter with every key
referenced in every Git reference, consider only those which differ
from what's referenced in the index.

Incidentaly, unlike with its old behavior, staged
modifications/deletion/... will now be detected by 'unused'.

Credits to joeyh for the algorithm. :-)
2013-08-25 21:02:13 -04:00
Joey Hess
de58067785 better error message 2013-08-22 21:12:41 -04:00
Joey Hess
07d172e01d cleanup 2013-08-22 18:56:08 -04:00
Joey Hess
46b6d75274 Youtube support! (And 53 other video hosts)
When quvi is installed, git-annex addurl automatically uses it to detect
when an page is a video, and downloads the video file.

web special remote: Also support using quvi, for getting files,
or checking if files exist in the web.

This commit was sponsored by Mark Hepburn. Thanks!
2013-08-22 18:50:43 -04:00
Joey Hess
d603f536bd Set --clobber when running wget to ensure resuming works properly. 2013-08-21 18:19:01 -04:00
Joey Hess
0912e752b5 Revert "Delete empty downloaded file when wget fails, to work around reported resume failure."
This reverts commit 98886e3fbf.

Better fix forthcoming
2013-08-21 18:17:48 -04:00
Joey Hess
98886e3fbf Delete empty downloaded file when wget fails, to work around reported resume failure.
<RichiH> i richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn % rm /home/richih/work/git/debconf-share/.git/annex/tmp/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG
<RichiH> richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn % git annex get P8060008.JPG
<RichiH> get P8060008.JPG (from website...) --2013-08-21 21:42:45--  http://annex.debconf.org/debconf-share/.git//annex/objects/1a4/67d/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG
<RichiH> Resolving annex.debconf.org (annex.debconf.org)... 5.153.231.227, 2001:41c8:1000:19::227:2
<RichiH> Connecting to annex.debconf.org (annex.debconf.org)|5.153.231.227|:80... connected.
<RichiH> HTTP request sent, awaiting response... 404 Not Found
<RichiH> 2013-08-21 21:42:45 ERROR 404: Not Found.
<RichiH> File `/home/richih/work/git/debconf-share/.git/annex/tmp/SHA256E-s3044235--693b74fcb12db06b5e79a8b99d03e2418923866506ee62d24a4e9ae8c5236758.JPG' already there; not retrieving.
<RichiH>   Unable to access these remotes: website
<RichiH>   Try making some of these repositories available:
<RichiH>    3e0356ac-0743-11e3-83a5-1be63124a102 -- website (annex.debconf.org)
<RichiH>     a7495021-9f2d-474e-80c7-34d29d09fec6 -- chrysn@hephaistos:~/data/projects/debconf13/debconf-share
<RichiH>     eb8990f7-84cd-4e6b-b486-a5e71efbd073 -- joeyh passport usb drive
<RichiH>     f415f118-f428-4c68-be66-c91501da3a93 -- joeyh laptop
<RichiH> failed
<RichiH> git-annex: get: 1 failed
<RichiH> richih@eudyptes (git)-[master] ~git/debconf-share/debconf13/photos/chrysn %

I was not able to reproduce the failure, but I did reproduce that
wget -O http://404/ results in an empty file being written.
2013-08-21 16:05:51 -04:00
Joey Hess
a3224ce35b avoid more build warnings on Windows 2013-08-04 14:05:36 -04:00
Joey Hess
a837ed47f7 Windows: Added support for encrypted special remotes. 2013-08-04 13:03:34 -04:00
Joey Hess
0a52f02f8e fix syntax 2013-08-02 12:42:14 -04:00
Joey Hess
da012e1eeb fix Windows breakage 2013-08-02 12:37:45 -04:00
Joey Hess
93f2371e09 get rid of __WINDOWS__, use mingw32_HOST_OS
The latter is harder for me to remember, but avoids build failures in code
used by the configure program.
2013-08-02 12:27:32 -04:00
Joey Hess
d16114d024 Slow and ugly work around for bug #718517 in git, which broke git-cat-file --batch for filenames containing spaces.
This runs git-cat-file in non-batch mode for all files with spaces.
If a directory tree has a lot of them, and is in direct mode, even "git
annex add" when there are few new files will need a *lot* of forks!

The only reason buffering the whole file content to get the sha is not a
memory leak is that git-annex only ever uses this on symlinks.

This needs to be reverted as soon as a fix is available in git!
2013-08-01 17:30:47 -04:00
Joey Hess
ebd778c519 Escape ':' in file/directory names to avoid it being treated as a pathspec by some git commands
A git pathspec is a filename, except when it starts with ':', it's taken
to refer to a branch, etc. Rather than special case ':', any filename
starting with anything unusual is prefixed with "./"

This could have been a real mess to deal with, but luckily SafeCommand
is already extensively used and so we know at the type level the difference
between parameters that are files, and parameters that are command options.

Testing did show that Git.Queue was not using SafeCommand on
filenames fed to xargs. (Filenames starting with '-' worked before only
because -- was used to separate filenames from options when calling eg git
add.)

The test suite now passes with filenames starting with ':'. However, I did
not keep that change to it, because such filenames are probably not legal
on windows, and I have enough ugly windows ifdefs in there as it is.

This commit was sponsored by Otavio Salvador. Thanks!
2013-08-01 15:15:49 -04:00
Joey Hess
ddd46db09a Fix a few bugs involving filenames that are at or near the filesystem's maximum filename length limit.
Started with a problem when running addurl on a really long url,
because the whole url is munged into the filename. Ended up doing
a fairly extensive review for places where filenames could get too large,
although it's hard to say I'm not missed any..

Backend.Url had a 128 character limit, which is fine when the limit is 255,
but not if it's a lot shorter on some systems. So check the pathconf()
limit. Note that this could result in fromUrl creating different keys
for the same url, if run on systems with different limits. I don't see
this is likely to cause any problems. That can already happen when using
addurl --fast, or if the content of an url changes.

Both Command.AddUrl and Backend.Url assumed that urls don't contain a
lot of multi-byte unicode, and would fail to truncate an url that did
properly.

A few places use a filename as the template to make a temp file.
While that's nice in that the temp file name can be easily related back to
the original filename, it could lead to `git annex add` failing to add a
filename that was at or close to the maximum length.

Note that in Command.Add.lockdown, the template is still derived from the
filename, just with enough space left to turn it into a temp file.
This is an important optimisation, because the assistant may lock down
a bunch of files all at once, and using the same template for all of them
would cause openTempFile to iterate through the same set of names,
looking for an unused temp file. I'm not very happy with the relatedTemplate
hack, but it avoids that slowdown.

Backend.WORM does not limit the filename stored in the key.
I have not tried to change that; so git annex add will fail on really long
filenames when using the WORM backend. It seems better to preserve the
invariant that a WORM key always contains the complete filename, since
the filename is the only unique material in the key, other than mtime and
size. Since nobody has complained about add failing (I think I saw it
once?) on WORM, probably it's ok, or nobody but me uses it.

There may be compatability problems if using git annex addurl --fast
or the WORM backend on a system with the 255 limit and then trying to use
that repo in a system with a smaller limit. I have not tried to deal with
those.

This commit was sponsored by Alexander Brem. Thanks!
2013-07-30 19:18:29 -04:00
Joey Hess
7e66d260ea importfeed: git-annex becomes a podcatcher in 150 LOC 2013-07-28 16:55:42 -04:00
Joey Hess
4c164c8c70 cleanup 2013-07-20 20:56:04 -04:00
Joey Hess
9c68e06276 refactor and unify code
This fixes several bugs in both modules.
2013-07-19 19:39:14 -04:00
Joey Hess
b8da297c77 pluralize 1.1 kilobytes etc 2013-07-19 14:05:27 -04:00
Joey Hess
647a938f6a Display byte sizes with more precision. 2013-07-19 12:21:44 -04:00
Joey Hess
8b644e74fa catch does not exist error when adding a watch
This could be thrown due to eg, the directory being moved or deleted, so
the error should not be propigated.
2013-07-17 15:32:24 -04:00
Joey Hess
00e6663128 linux standalone auto-install icons 2013-07-09 20:50:41 -04:00
Joey Hess
80b390560e install to ~/.local/icons, not ~/icons
Apparently the Icon Theme Specification no longer matches reality,
as implemented by XFCE and xdg-icon-resource.
2013-07-09 20:16:07 -04:00
Joey Hess
19b8bcbe30 Install XDG desktop icon files.
The icon files will be installed when running make install or cabal
install. Did not try to run update-icon-caches, since I think it's debian
specific, and dh_icons will take care of that for the Debian package.

Using the favicon as a 16x16 icon. At 24x24 the svg displays pretty well,
although the dotted lines are rather faint. The svg is ok at all higher
resolutions.

The standalone linux build auto-installs the desktop and autostart files
when run. I have not made it auto-install the icon file too, because
a) that would take more work to include them in the tarball and find them
b) it would need to be an install to ~/.icons/, and I don't know if that
   really works!
2013-07-09 19:56:30 -04:00
Joey Hess
c8936038dc reorg 2013-07-08 14:51:43 -04:00
Joey Hess
79e1a0c571 Pass -f to curl when downloading a file with it, so it propigates failure. 2013-07-06 00:55:00 -04:00
Joey Hess
86b845ff8d Windows: Look for .exe extension when searching for a command in path. 2013-07-06 00:48:47 -04:00