This allows a remote to store a piece of arbitrary state associated with a
key. This is needed to support Tahoe, where the file-cap is calculated from
the data stored in it, and used to retrieve a key later. Glacier also would
be much improved by using this.
GETSTATE and SETSTATE are added to the external special remote protocol.
Note that the state is left as-is even when a key is removed from a remote.
It's up to the remote to decide when it wants to clear the state.
The remote state log, $KEY.log.rmt, is a UUID-based log. However,
rather than using the old UUID-based log format, I created a new variant
of that format. The new varient is more space efficient (since it lacks the
"timestamp=" hack, and easier to parse (and the parser doesn't mess with
whitespace in the value), and avoids compatability cruft in the old one.
This seemed worth cleaning up for these new files, since there could be a
lot of them, while before UUID-based logs were only used for a few log
files at the top of the git-annex branch. The transition code has also
been updated to handle these new UUID-based logs.
This commit was sponsored by Daniel Hofer.
This was unexpectedly difficult because of a depdenency cycle. To parse a
preferred content expression involves several things that need to operate
on the list of remotes. Which needs Remote.External. The only way to avoid
this cycle (I tried breaking it at several points) was to skip parsing the
expression in SETWANTED.
That's sorta ok, because git-annex already has to deal with unparsable
preferred content expressions being stored, in order to handle eg,
upgrades. But I'm still not very happy that I cannot check it.
I feel this is a strong indication that I need to beware of further
bloating the special remote protocol interface.
Changed protocol spec to make SETCONFIG only store it persistently when run
during INITREMOTE. I see no reason to support storing it persistently at
other times, and doing so would unnecessarily complicate the code.
Also, letting that be done would probably result in use for storing data that
doesn't really belong there, and special remote authors who don't
understand how the union merging works would probably be surprised the
results.
That complicated special remote programs, because they had to avoid making
PREPARE fail if some configuration is missing, because the remote might not
be initialized yet. Instead, complicate git-annex slightly by only sending
PREPARE immediately before some other request other than INITREMOTE (or
PREPARE of course).
This has not been tested at all. It compiles!
The only known missing things are support for encryption, and for get/set
of special remote configuration, and of key state. (The latter needs
separate work to add a new per-key log file to store that state.)
Only thing I don't much like is that initremote needs to be passed both
type=external and externaltype=foo. It would be better to have just
type=foo
Most of this is quite straightforward code, that largely wrote itself given
the types. The only tricky parts were:
* Need to lock the remote when using it to eg make a request, because
in theory git-annex could have multiple threads that each try to use
a remote at the same time. I don't think that git-annex ever does
that currently, but better safe than sorry.
* Rather than starting up every external special remote program when
git-annex starts, they are started only on demand, when first used.
This will avoid slowdown, especially when running fast git-annex query
commands. Once started, they keep running until git-annex stops, currently,
which may not be ideal, but it's hard to know a better time to stop them.
* Bit of a chicken and egg problem with caching the cost of the remote,
because setting annex-cost in the git config needs the remote to already
be set up. Managed to finesse that.
This commit was sponsored by Lukas Anzinger.
This is mostly straightforward, but did turn out quite nicely stronly
typed, and with a quite nice automatic tokenization and parsing of received
messages.
Made a few minor changes to the protocol to clear up ambiguities and make
it easier to parse. Note particularly that setting remote configuration
is moved to a separate command, which allows a remote to set arbitrary data.
Fixed up a number of things that had worked around there not being a way to
get that.
Most notably, transfer info files on windows now include the process id,
since no locking is currently done. This means the file format varies
between windows and unix.
I think both of these are all that's affected, but I went ahead and fixed
all the remotes that set their config to M.empty to instead store the
actual config. Who knows what will expect it to be actually present in
future, the Remote instance of getGpgEncParams came to..
Currently only implemented for local git remotes. May try to add support
to git-annex-shell for ssh remotes later. Could concevably also be
supported by some special remote, although that seems unlikely.
Cronner user this when available, and when not falls back to
fsck --fast --from remote
git annex fsck --from does not itself use this interface.
To do so, I would need to pass --fast and all other options that influence
fsck on to the git annex fsck that it runs inside the remote. And that
seems like a lot of work for a result that would be no better than
cd remote; git annex fsck
This may need to be revisited if git-annex-shell gets support, since it
may be the case that the user cannot ssh to the server to run git-annex
fsck there, but can run git-annex-shell there.
This commit was sponsored by Damien Diederen.
addurl: Improve message when adding url with wrong size to existing file.
Before the message suggested the url didn't exist.
Fixed handling of URL keys that have no recorded size. Before, if the key
has no size, the url also had to not declare any size, which was unlikely
and wrong, or it was taken to not exist. This probably would mostly affect
keys that were added to the annex with addurl --relaxed.
gcrypt needs to be able to fast-forward the master branch. If a git
repository is set up with git init --shared --bare, it gets that set, and
pushing to it will then fail, even when it's up-to-date.
recvkey was told it was receiving a HMAC key from a direct mode repo,
and that confused it into rejecting the transfer, since it has no way to
verify a key using that backend, since there is no HMAC backend.
I considered making recvkey skip verification in the case of an unknown
backend. However, that could lead to bad results; a key can legitimately be
in the annex with a backend that the remote git-annex-shell doesn't know
about. Better to keep it rejecting if it cannot verify.
Instead, made the gcrypt special remote not set the direct mode flag when
sending (and receiving) files.
Also, added some recvkey messages when its checks fail, since otherwise
all that is shown is a confusing error message from rsync when the remote
git-annex-shell exits nonzero.
Overridable with --user-agent option.
Not yet done for S3 or WebDAV due to limitations of libraries used --
nether allows a user-agent header to be specified.
This commit sponsored by Michael Zehrer.
This pulls off quite a nice trick: When given a path on rsync.net, it
determines if it is an encrypted git repository that the user has
the key to decrypt, and merges with it. This is works even when
the local repository had no idea that the gcrypt remote exists!
(As previously done with local drives.)
This commit sponsored by Pedro Côrte-Real
When generating the path for rsync, /~/ is not valid, so change to
just host:dir
Note that git remotes specified in host:dir form are internally converted
to the ssh:// url form, so this was especially needed..
This is a massive win on OSX, which doesn't have a sha256sum normally.
Only use external hash commands when the file is > 1 mb,
since cryptohash is quite close to them in speed.
SHA is still used to calculate HMACs. I don't quite understand
cryptohash's API for those.
Used the following benchmark to arrive at the 1 mb number.
1 mb file:
benchmarking sha256/internal
mean: 13.86696 ms, lb 13.83010 ms, ub 13.93453 ms, ci 0.950
std dev: 249.3235 us, lb 162.0448 us, ub 458.1744 us, ci 0.950
found 5 outliers among 100 samples (5.0%)
4 (4.0%) high mild
1 (1.0%) high severe
variance introduced by outliers: 10.415%
variance is moderately inflated by outliers
benchmarking sha256/external
mean: 14.20670 ms, lb 14.17237 ms, ub 14.27004 ms, ci 0.950
std dev: 230.5448 us, lb 150.7310 us, ub 427.6068 us, ci 0.950
found 3 outliers among 100 samples (3.0%)
2 (2.0%) high mild
1 (1.0%) high severe
2 mb file:
benchmarking sha256/internal
mean: 26.44270 ms, lb 26.23701 ms, ub 26.63414 ms, ci 0.950
std dev: 1.012303 ms, lb 925.8921 us, ub 1.122267 ms, ci 0.950
variance introduced by outliers: 35.540%
variance is moderately inflated by outliers
benchmarking sha256/external
mean: 26.84521 ms, lb 26.77644 ms, ub 26.91433 ms, ci 0.950
std dev: 347.7867 us, lb 210.6283 us, ub 571.3351 us, ci 0.950
found 6 outliers among 100 samples (6.0%)
import Crypto.Hash
import Data.ByteString.Lazy as L
import Criterion.Main
import Common
testfile :: FilePath
testfile = "/run/shm/data" -- on ram disk
main = defaultMain
[ bgroup "sha256"
[ bench "internal" $ whnfIO internal
, bench "external" $ whnfIO external
]
]
sha256 :: L.ByteString -> Digest SHA256
sha256 = hashlazy
internal :: IO String
internal = show . sha256 <$> L.readFile testfile
external :: IO String
external = do
s <- readProcess "sha256sum" [testfile]
return $ fst $ separate (== ' ') s