Commit graph

447 commits

Author SHA1 Message Date
Joey Hess
3e68c1c2fd add remote state logs
This allows a remote to store a piece of arbitrary state associated with a
key. This is needed to support Tahoe, where the file-cap is calculated from
the data stored in it, and used to retrieve a key later. Glacier also would
be much improved by using this.

GETSTATE and SETSTATE are added to the external special remote protocol.

Note that the state is left as-is even when a key is removed from a remote.
It's up to the remote to decide when it wants to clear the state.

The remote state log, $KEY.log.rmt, is a UUID-based log. However,
rather than using the old UUID-based log format, I created a new variant
of that format. The new varient is more space efficient (since it lacks the
"timestamp=" hack, and easier to parse (and the parser doesn't mess with
whitespace in the value), and avoids compatability cruft in the old one.

This seemed worth cleaning up for these new files, since there could be a
lot of them, while before UUID-based logs were only used for a few log
files at the top of the git-annex branch. The transition code has also
been updated to handle these new UUID-based logs.

This commit was sponsored by Daniel Hofer.
2014-01-03 16:35:57 -04:00
Joey Hess
f7727d2df1 Remotes can now be made read-only, by setting remote.<name>.annex-readonly 2014-01-02 13:12:32 -04:00
Joey Hess
8e3032df2d added GETWANTED, SETWANTED for Tobias's flickr remote
This was unexpectedly difficult because of a depdenency cycle. To parse a
preferred content expression involves several things that need to operate
on the list of remotes. Which needs Remote.External. The only way to avoid
this cycle (I tried breaking it at several points) was to skip parsing the
expression in SETWANTED.

That's sorta ok, because git-annex already has to deal with unparsable
preferred content expressions being stored, in order to handle eg,
upgrades. But I'm still not very happy that I cannot check it.

I feel this is a strong indication that I need to beware of further
bloating the special remote protocol interface.
2014-01-01 20:12:20 -04:00
Joey Hess
ed1fcab6d7 external special remote protocol: Added GETUUID. 2013-12-31 13:50:18 -04:00
Joey Hess
054e4f17e2 implement PREPARE-FAILURE for Tobias 2013-12-29 13:39:25 -04:00
Joey Hess
aa97a33dde better error messages when external special remote exits unexpectedly or is not in PATH 2013-12-27 17:14:44 -04:00
Joey Hess
445b7b41b9 add credential storage support for external special remotes & update example 2013-12-27 16:01:43 -04:00
Joey Hess
551573570f better protocol error message, indicate if the command was able to be parsed or was misplaced 2013-12-27 14:03:35 -04:00
Joey Hess
21342cae63 flush handle after writing message 2013-12-27 13:22:06 -04:00
Joey Hess
fa6f404a5f fix deadlock when state TMVar is empty 2013-12-27 13:17:22 -04:00
Joey Hess
9125a25738 defer SETSTATE and GETSTATE for now
TAHOE-LAFS may use these eventually, but that's TBD and none of git-annex's
own special remotes need that, except for the web special remote's urls.
2013-12-27 13:07:56 -04:00
Joey Hess
a7f3724e21 implement GETCONFIG and SETCONFIG
Changed protocol spec to make SETCONFIG only store it persistently when run
during INITREMOTE. I see no reason to support storing it persistently at
other times, and doing so would unnecessarily complicate the code.

Also, letting that be done would probably result in use for storing data that
doesn't really belong there, and special remote authors who don't
understand how the union merging works would probably be surprised the
results.
2013-12-27 12:37:23 -04:00
Joey Hess
91c9e98168 support encryption 2013-12-27 12:21:55 -04:00
Joey Hess
5d8ff64dc1 make --debug show transcript of special remote protocol messages 2013-12-27 03:10:00 -04:00
Joey Hess
3289155e28 don't send PREPARE before INITREMOTE
That complicated special remote programs, because they had to avoid making
PREPARE fail if some configuration is missing, because the remote might not
be initialized yet. Instead, complicate git-annex slightly by only sending
PREPARE immediately before some other request other than INITREMOTE (or
PREPARE of course).
2013-12-27 02:49:10 -04:00
Joey Hess
6d504b57e7 make some requests optional, simplify and future-proof protocol more 2013-12-27 02:11:06 -04:00
Joey Hess
6c565ec905 external special remotes mostly implemented (untested)
This has not been tested at all. It compiles!

The only known missing things are support for encryption, and for get/set
of special remote configuration, and of key state. (The latter needs
separate work to add a new per-key log file to store that state.)

Only thing I don't much like is that initremote needs to be passed both
type=external and externaltype=foo. It would be better to have just
type=foo

Most of this is quite straightforward code, that largely wrote itself given
the types. The only tricky parts were:

* Need to lock the remote when using it to eg make a request, because
  in theory git-annex could have multiple threads that each try to use
  a remote at the same time. I don't think that git-annex ever does
  that currently, but better safe than sorry.

* Rather than starting up every external special remote program when
  git-annex starts, they are started only on demand, when first used.
  This will avoid slowdown, especially when running fast git-annex query
  commands. Once started, they keep running until git-annex stops, currently,
  which may not be ideal, but it's hard to know a better time to stop them.

* Bit of a chicken and egg problem with caching the cost of the remote,
  because setting annex-cost in the git config needs the remote to already
  be set up. Managed to finesse that.

This commit was sponsored by Lukas Anzinger.
2013-12-26 18:23:13 -04:00
Joey Hess
8803e36814 future-proofing 2013-12-25 20:04:31 -04:00
Joey Hess
1dc930063a basic data types and serialization for external special remote protocol
This is mostly straightforward, but did turn out quite nicely stronly
typed, and with a quite nice automatic tokenization and parsing of received
messages.

Made a few minor changes to the protocol to clear up ambiguities and make
it easier to parse. Note particularly that setting remote configuration
is moved to a separate command, which allows a remote to set arbitrary data.
2013-12-25 17:54:57 -04:00
Joey Hess
011b8bc7ec pull in Win32-extras, to be able to get current process id in Windows
Fixed up a number of things that had worked around there not being a way to
get that.

Most notably, transfer info files on windows now include the process id,
since no locking is currently done. This means the file format varies
between windows and unix.
2013-12-11 00:15:10 -04:00
Joey Hess
e425a966ed Deal with box.com changing the url of their webdav endpoint.
Use new url when making new remotes.

Transparently rewrite old url to new for existing remotes.
2013-12-02 16:01:20 -04:00
Joey Hess
0a63ed563f rsync special remote: Fix fallback mode for rsync remotes that use hashDirMixed. Closes: #731142 2013-12-02 12:53:39 -04:00
Joey Hess
58db042033 map: Work when there are gcrypt remotes. 2013-11-04 14:14:44 -04:00
Joey Hess
2203690822 really fix gcrypt for 7be69a2491
Fixed all the other ones, but forgot to fix gcrypt!
2013-11-02 20:10:54 -04:00
Joey Hess
b2cca95d1c clean import list 2013-11-02 19:55:18 -04:00
Joey Hess
a04fe350b8 fix build 2013-11-02 19:54:59 -04:00
Joey Hess
7be69a2491 gcrypt, bup: Fix bug that prevented using these special remotes with encryption=pubkey.
I think both of these are all that's affected, but I went ahead and fixed
all the remotes that set their config to M.empty to instead store the
actual config. Who knows what will expect it to be actually present in
future, the Remote instance of getGpgEncParams came to..
2013-11-02 16:37:28 -04:00
Joey Hess
7ed8e87a34 assistant: Support repairing git remotes that are locally accessible
(eg, on removable drives)

gcrypt remotes are not yet handled.

This commit was sponsored by Sören Brunk.
2013-10-27 15:38:59 -04:00
Joey Hess
5756636486 directory, webdav: Fix bug introduced in version 4.20131002 that caused the chunkcount file to not be written. Work around repositories without such a file, so files can still be retreived from them. 2013-10-26 15:03:12 -04:00
Joey Hess
06ea92282f fix inverted logic when determining whether to write a chunkcount file
late-night hlint bit me on this one..
Reviewed c1990702e9 and
the rest of it seems ok
2013-10-26 14:08:29 -04:00
Joey Hess
c76c94a0da S3: Try to ensure bucket name is valid for archive.org. 2013-10-16 16:35:47 -04:00
Joey Hess
a6e9386d39 fix remote fsck to run in remote 2013-10-14 15:05:29 -04:00
Joey Hess
c78aaed317 ye olde inverted logic 2013-10-14 12:26:46 -04:00
Joey Hess
1ffb3bb0ba add remote fsck interface
Currently only implemented for local git remotes. May try to add support
to git-annex-shell for ssh remotes later. Could concevably also be
supported by some special remote, although that seems unlikely.

Cronner user this when available, and when not falls back to
fsck --fast --from remote

git annex fsck --from does not itself use this interface.
To do so, I would need to pass --fast and all other options that influence
fsck on to the git annex fsck that it runs inside the remote. And that
seems like a lot of work for a result that would be no better than
cd remote; git annex fsck
This may need to be revisited if git-annex-shell gets support, since it
may be the case that the user cannot ssh to the server to run git-annex
fsck there, but can run git-annex-shell there.

This commit was sponsored by Damien Diederen.
2013-10-11 16:03:18 -04:00
Joey Hess
747f5b123c url size fixes
addurl: Improve message when adding url with wrong size to existing file.
Before the message suggested the url didn't exist.

Fixed handling of URL keys that have no recorded size. Before, if the key
has no size, the url also had to not declare any size, which was unlikely
and wrong, or it was taken to not exist. This probably would mostly affect
keys that were added to the annex with addurl --relaxed.
2013-10-11 13:05:00 -04:00
Joey Hess
571fe4999b remove __WINDOWS__ ifdef 2013-10-06 17:23:30 -04:00
Joey Hess
0ede6b7def typoe and debug info 2013-10-01 19:10:45 -04:00
Joey Hess
bddfbef8be git-annex-shell gcryptsetup command
This was the least-bad alternative to get dedicated key gcrypt repos
working in the assistant.
2013-10-01 17:20:51 -04:00
Joey Hess
1536ebfe47 Disable receive.denyNonFastForwards when setting up a gcrypt special remote
gcrypt needs to be able to fast-forward the master branch. If a git
repository is set up with git init --shared --bare, it gets that set, and
pushing to it will then fail, even when it's up-to-date.
2013-10-01 15:23:48 -04:00
Joey Hess
101099f7b5 fix probing for local gcrypt repos 2013-10-01 14:38:20 -04:00
Joey Hess
995e1e3c5d fix transferring to gcrypt repo from direct mode repo
recvkey was told it was receiving a HMAC key from a direct mode repo,
and that confused it into rejecting the transfer, since it has no way to
verify a key using that backend, since there is no HMAC backend.

I considered making recvkey skip verification in the case of an unknown
backend. However, that could lead to bad results; a key can legitimately be
in the annex with a backend that the remote git-annex-shell doesn't know
about. Better to keep it rejecting if it cannot verify.

Instead, made the gcrypt special remote not set the direct mode flag when
sending (and receiving) files.

Also, added some recvkey messages when its checks fail, since otherwise
all that is shown is a confusing error message from rsync when the remote
git-annex-shell exits nonzero.
2013-10-01 14:19:24 -04:00
Joey Hess
12f6b9693a Send a git-annex user-agent when downloading urls.
Overridable with --user-agent option.

Not yet done for S3 or WebDAV due to limitations of libraries used --
nether allows a user-agent header to be specified.

This commit sponsored by Michael Zehrer.
2013-09-28 14:35:21 -04:00
Joey Hess
c6032b0dab clean up some ugly code 2013-09-27 19:52:36 -04:00
Joey Hess
e864c8d033 blind enabling gcrypt repos on rsync.net
This pulls off quite a nice trick: When given a path on rsync.net, it
determines if it is an encrypted git repository that the user has
the key to decrypt, and merges with it. This is works even when
the local repository had no idea that the gcrypt remote exists!

(As previously done with local drives.)

This commit sponsored by Pedro Côrte-Real
2013-09-27 16:21:56 -04:00
Joey Hess
e0b99f3960 support ssh://host/~/dir
When generating the path for rsync, /~/ is not valid, so change to
just host:dir

Note that git remotes specified in host:dir form are internally converted
to the ssh:// url form, so this was especially needed..
2013-09-26 15:02:27 -04:00
Joey Hess
c1990702e9 hlint 2013-09-25 23:19:01 -04:00
Joey Hess
3192b059b5 add back lost check that git-annex-shell supports gcrypt 2013-09-24 17:51:12 -04:00
Joey Hess
4c954661a1 git-annex-shell: Added support for operating inside gcrypt repositories.
* Note that the layout of gcrypt repositories has changed, and
  if you created one you must manually upgrade it.
  See http://git-annex.branchable.com/upgrades/gcrypt/
2013-09-24 17:25:47 -04:00
Joey Hess
f9e438c1bc factor out more ssh stuff from git remote
This has the dual benefits of making Remote.Git shorter, and letting
Remote.GCrypt use these utilities.
2013-09-24 13:37:41 -04:00
Joey Hess
7390f08ef9 Use cryptohash rather than SHA for hashing.
This is a massive win on OSX, which doesn't have a sha256sum normally.

Only use external hash commands when the file is > 1 mb,
since cryptohash is quite close to them in speed.

SHA is still used to calculate HMACs. I don't quite understand
cryptohash's API for those.

Used the following benchmark to arrive at the 1 mb number.

1 mb file:

benchmarking sha256/internal
mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950
std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
  4 (4.0%) high mild
  1 (1.0%) high severe
variance introduced by outliers: 10.415%
variance is moderately inflated by outliers

benchmarking sha256/external
mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950
std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
  2 (2.0%) high mild
  1 (1.0%) high severe

2 mb file:

benchmarking sha256/internal
mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950
std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950
variance introduced by outliers: 35.540%
variance is moderately inflated by outliers

benchmarking sha256/external
mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950
std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950
found 6 outliers among 100 samples (6.0%)

import Crypto.Hash
import Data.ByteString.Lazy as L
import Criterion.Main
import Common

testfile :: FilePath
testfile = "/run/shm/data" -- on ram disk

main = defaultMain
        [ bgroup "sha256"
                [ bench "internal" $ whnfIO internal
                , bench "external" $ whnfIO external
                ]
        ]

sha256 :: L.ByteString -> Digest SHA256
sha256 = hashlazy

internal :: IO String
internal = show . sha256 <$> L.readFile testfile

external :: IO String
external = do
	s <- readProcess "sha256sum" [testfile]
        return $ fst $ separate (== ' ') s
2013-09-22 20:06:02 -04:00