Added a mkUnavailable method, which a Remote can use to generate a version
of itself that is not available. Implemented for several, but not yet all
remotes.
This allows testing that checkPresent properly throws an exceptions when
it cannot check if a key is present or not. It also allows testing that the
other methods don't throw exceptions in these circumstances.
This immediately found several bugs, which this commit also fixes!
* git remotes using ssh accidentially had checkPresent return
an exception, rather than throwing it
* The chunking code accidentially returned False rather than
propigating an exception when there were no chunks and
checkPresent threw an exception for the non-chunked key.
This commit was sponsored by Carlo Matteo Capocasa.
This will allow special remotes to eg, open a http connection and reuse it,
while checking if chunks are present, or removing chunks.
S3 and WebDAV both need this to support chunks with reasonable speed.
Note that a special remote might want to cache a http connection across
multiple requests. A simple case of this is that CheckPresent is typically
called before Store or Remove. A remote using this interface can certianly
use a Preparer that eg, uses a MVar to cache a http connection.
However, it's up to the remote to then deal with things like stale or
stalled http connections when eg, doing a series of downloads from a remote
and other places. There could be long delays between calls to a remote,
which could lead to eg, http connection stalls; the machine might even
move to a new network, etc.
It might be nice to improve this interface later to allow
the simple case without needing to handle the full complex case.
One way to do it would be to have a `Transaction SpecialRemote cache`,
where SpecialRemote contains methods for Storer, Retriever, Remover, and
CheckPresent, that all expect to be passed a `cache`.
I tend to prefer moving toward explicit exception handling, not away from
it, but in this case, I think there are good reasons to let checkPresent
throw exceptions:
1. They can all be caught in one place (Remote.hasKey), and we know
every possible exception is caught there now, which we didn't before.
2. It simplified the code of the Remotes. I think it makes sense for
Remotes to be able to be implemented without needing to worry about
catching exceptions inside them. (Mostly.)
3. Types.StoreRetrieve.Preparer can only work on things that return a
Bool, which all the other relevant remote methods already did.
I do not see a good way to generalize that type; my previous attempts
failed miserably.
Make the byteRetriever be passed the callback that consumes the bytestring.
This way, there's no worries about the lazy bytestring not all being read
when the resource that's creating it is closed.
Which in turn lets bup, ddar, and S3 each switch from using an unncessary
fileRetriver to a byteRetriever. So, more efficient on chunks and encrypted
files.
The only remaining fileRetrievers are hook and external, which really do
retrieve to files.
The forall a. in Preparer made resourcePrepare not seem to be usable, so
I specialized a to Bool. Which works for both Preparer Storer and
Preparer Retriever, but wouldn't let the Preparer be used for hasKey
as it currently stands.
Needed for eg, Remote.External.
Generally, any Retriever that stores content in a file is responsible for
updating the meter, while ones that procude a lazy bytestring cannot update
the meter, so are not asked to.
Some remotes like External need to run store and retrieve actions in Annex,
not IO. In order to do that lift, I had to dive pretty deep into the
utilities, making Utility.Gpg and Utility.Tmp be partly converted to using
MonadIO, and Control.Monad.Catch for exception handling.
There should be no behavior changes in this commit.
This commit was sponsored by Michael Barabanov.
This will allow things like WebDAV to opean a single persistent connection
and reuse it for all the chunked data.
The crazy types allow for some nice code reuse.
Push it down from needing to be done in every Storer,
to being checked once inside ChunkedEncryptable.
Also, catch exceptions from PrepareStorer and PrepareRetriever,
just in case..
Moved old legacy chunking code, and cleaned up the directory and webdav
remotes use of it, so when no chunking is configured, that code is not
used.
The config for new style chunking will be chunk=1M instead of chunksize=1M.
There should be no behavior changes from this commit.
This commit was sponsored by Andreas Laas.
I think both of these are all that's affected, but I went ahead and fixed
all the remotes that set their config to M.empty to instead store the
actual config. Who knows what will expect it to be actually present in
future, the Remote instance of getGpgEncParams came to..
Currently only implemented for local git remotes. May try to add support
to git-annex-shell for ssh remotes later. Could concevably also be
supported by some special remote, although that seems unlikely.
Cronner user this when available, and when not falls back to
fsck --fast --from remote
git annex fsck --from does not itself use this interface.
To do so, I would need to pass --fast and all other options that influence
fsck on to the git annex fsck that it runs inside the remote. And that
seems like a lot of work for a result that would be no better than
cd remote; git annex fsck
This may need to be revisited if git-annex-shell gets support, since it
may be the case that the user cannot ssh to the server to run git-annex
fsck there, but can run git-annex-shell there.
This commit was sponsored by Damien Diederen.
To support this, a core.gcrypt-id is stored by git-annex inside the git
config of a local gcrypt repository, when setting it up.
That is compared with the remote's cached gcrypt-id. When different, a
drive has been changed. git-annex then looks up the remote config for
the uuid mapped from the core.gcrypt-id, and tweaks the configuration
appropriately. When there is no known config for the uuid, it will refuse to
use the remote.
This is a git-remote-gcrypt encrypted special remote. Only sending files
in to the remote works, and only for local repositories.
Most of the work so far has involved making initremote work. A particular
problem is that remote setup in this case needs to generate its own uuid,
derivied from the gcrypt-id. That required some larger changes in the code
to support.
For ssh remotes, this will probably just reuse Remote.Rsync's code, so
should be easy enough. And for downloading from a web remote, I will need
to factor out the part of Remote.Git that does that.
One particular thing that will need work is supporting hot-swapping a local
gcrypt remote. I think it needs to store the gcrypt-id in the git config of the
local remote, so that it can check it every time, and compare with the
cached annex-uuid for the remote. If there is a mismatch, it can change
both the cached annex-uuid and the gcrypt-id. That should work, and I laid
some groundwork for it by already reading the remote's config when it's
local. (Also needed for other reasons.)
This commit was sponsored by Daniel Callahan.
With the initremote parameters "encryption=pubkey keyid=788A3F4C".
/!\ Adding or removing a key has NO effect on files that have already
been copied to the remote. Hence using keyid+= and keyid-= with such
remotes should be used with care, and make little sense unless the point
is to replace a (sub-)key by another. /!\
Also, a test case has been added to ensure that the cipher and file
contents are encrypted as specified by the chosen encryption scheme.
Most remotes have meters in their implementations of retrieveKeyFile
already. Simply hooking these up to the transfer log makes that information
available. Easy peasy.
This is particularly valuable information for encrypted remotes, which
otherwise bypass the assistant's polling of temp files, and so don't have
good progress bars yet.
Still some work to do here (see progressbars.mdwn changes), but this
is entirely an improvement from the lack of progress bars for encrypted
downloads.
There was confusion in different parts of the progress bar code about
whether an update contained the total number of bytes transferred, or the
number of bytes transferred since the last update. One way this bug
showed up was progress bars that seemed to stick at zero for a long time.
In order to fix it comprehensively, I add a new BytesProcessed data type,
that is explicitly a total quantity of bytes, not a delta.
Note that this doesn't necessarily fix every problem with progress bars.
Particularly, buffering can now cause progress bars to seem to run ahead
of transfers, reaching 100% when data is still being uploaded.
git annex init probes for crippled filesystems, and sets direct mode, as
well as `annex.crippledfilesystem`.
Avoid manipulating permissions of files on crippled filesystems.
That would likely cause an exception to be thrown.
Very basic support in Command.Add for cripped filesystems; avoids the lock
down entirely since doing it needs both permissions and hard links.
Will make this better soon.
Ensure that each file has something written to it, even if the bytestring
chunk size is greater than the configured chunksize.
This means we may write a bit larger than the configured value, but only
when the configured value is very small; ie, < 8 kb.
Files are now written to a tmp directory in the remote, and once all
chunks are written, etc, it's moved into the final place atomically.
For now, checkpresent still checks every single chunk of a file, because
the old method could leave partially transferred files with some chunks
present and others not.