Reusing http connection when operating on chunks is not done yet,
I had to submit some patches to DAV to support that. However, this is no
slower than old-style chunking was.
Note that it's a fileRetriever and a fileStorer, despite DAV using
bytestrings that would allow streaming. As a result, upload/download of
encrypted files is made a bit more expensive, since it spools them to temp
files. This was needed to get the progress meters to work.
There are probably ways to avoid that.. But it turns out that the current
DAV interface buffers the whole file content in memory, and I have
sent in a patch to DAV to improve its interfaces. Using the new interfaces,
it's certainly going to need to be a fileStorer, in order to read the file
size from the file (getting the size of a bytestring would destroy
laziness). It should be possible to use the new interface to make it be a
byteRetriever, so I'll change that when I get to it.
This commit was sponsored by Andreas Olsson.
I tend to prefer moving toward explicit exception handling, not away from
it, but in this case, I think there are good reasons to let checkPresent
throw exceptions:
1. They can all be caught in one place (Remote.hasKey), and we know
every possible exception is caught there now, which we didn't before.
2. It simplified the code of the Remotes. I think it makes sense for
Remotes to be able to be implemented without needing to worry about
catching exceptions inside them. (Mostly.)
3. Types.StoreRetrieve.Preparer can only work on things that return a
Bool, which all the other relevant remote methods already did.
I do not see a good way to generalize that type; my previous attempts
failed miserably.
Moved old legacy chunking code, and cleaned up the directory and webdav
remotes use of it, so when no chunking is configured, that code is not
used.
The config for new style chunking will be chunk=1M instead of chunksize=1M.
There should be no behavior changes from this commit.
This commit was sponsored by Andreas Laas.
Currently only implemented for local git remotes. May try to add support
to git-annex-shell for ssh remotes later. Could concevably also be
supported by some special remote, although that seems unlikely.
Cronner user this when available, and when not falls back to
fsck --fast --from remote
git annex fsck --from does not itself use this interface.
To do so, I would need to pass --fast and all other options that influence
fsck on to the git annex fsck that it runs inside the remote. And that
seems like a lot of work for a result that would be no better than
cd remote; git annex fsck
This may need to be revisited if git-annex-shell gets support, since it
may be the case that the user cannot ssh to the server to run git-annex
fsck there, but can run git-annex-shell there.
This commit was sponsored by Damien Diederen.
To support this, a core.gcrypt-id is stored by git-annex inside the git
config of a local gcrypt repository, when setting it up.
That is compared with the remote's cached gcrypt-id. When different, a
drive has been changed. git-annex then looks up the remote config for
the uuid mapped from the core.gcrypt-id, and tweaks the configuration
appropriately. When there is no known config for the uuid, it will refuse to
use the remote.
This is a git-remote-gcrypt encrypted special remote. Only sending files
in to the remote works, and only for local repositories.
Most of the work so far has involved making initremote work. A particular
problem is that remote setup in this case needs to generate its own uuid,
derivied from the gcrypt-id. That required some larger changes in the code
to support.
For ssh remotes, this will probably just reuse Remote.Rsync's code, so
should be easy enough. And for downloading from a web remote, I will need
to factor out the part of Remote.Git that does that.
One particular thing that will need work is supporting hot-swapping a local
gcrypt remote. I think it needs to store the gcrypt-id in the git config of the
local remote, so that it can check it every time, and compare with the
cached annex-uuid for the remote. If there is a mismatch, it can change
both the cached annex-uuid and the gcrypt-id. That should work, and I laid
some groundwork for it by already reading the remote's config when it's
local. (Also needed for other reasons.)
This commit was sponsored by Daniel Callahan.
With the initremote parameters "encryption=pubkey keyid=788A3F4C".
/!\ Adding or removing a key has NO effect on files that have already
been copied to the remote. Hence using keyid+= and keyid-= with such
remotes should be used with care, and make little sense unless the point
is to replace a (sub-)key by another. /!\
Also, a test case has been added to ensure that the cipher and file
contents are encrypted as specified by the chosen encryption scheme.
Most remotes have meters in their implementations of retrieveKeyFile
already. Simply hooking these up to the transfer log makes that information
available. Easy peasy.
This is particularly valuable information for encrypted remotes, which
otherwise bypass the assistant's polling of temp files, and so don't have
good progress bars yet.
Still some work to do here (see progressbars.mdwn changes), but this
is entirely an improvement from the lack of progress bars for encrypted
downloads.
There was confusion in different parts of the progress bar code about
whether an update contained the total number of bytes transferred, or the
number of bytes transferred since the last update. One way this bug
showed up was progress bars that seemed to stick at zero for a long time.
In order to fix it comprehensively, I add a new BytesProcessed data type,
that is explicitly a total quantity of bytes, not a delta.
Note that this doesn't necessarily fix every problem with progress bars.
Particularly, buffering can now cause progress bars to seem to run ahead
of transfers, reaching 100% when data is still being uploaded.
Pity that the library does not provide a function to extract the status
code from the StatusCodeException, so when they had to add a new field, it
breaks every single place that does it.
Files are now written to a tmp directory in the remote, and once all
chunks are written, etc, it's moved into the final place atomically.
For now, checkpresent still checks every single chunk of a file, because
the old method could leave partially transferred files with some chunks
present and others not.
Both the directory and webdav special remotes used to have to buffer
the whole file contents before it could be decrypted, as they read
from chunks. Now the chunks are streamed through gpg with no buffering.
This allows deleting all chunks for a file with a single http command,
so it's a win after all.
However, does not look in the mixed case hash directories, which were
in the past used by the directory, etc remotes.
The benefit of using a compatable directory structure does not outweigh the
cost in complexity of handling the multiple locations content can be stored
in directory special remotes. And this also allows doing away with the parent
directories, which can't be made unwritable in DAV, so have no benefit
there. This will save 2 http calls per file store.
But, kept the directory hashing, just in case.