This threw an unusual exception w/o an error message when probing to see if
the bucket exists yet. So rather than relying on tryS3, catch all
exceptions.
This does mean that it might get an exception for some transient network
error, think this means the bucket DNE yet, and try to create it, and then
fail when it already exists.
When uploading the last part of a file, which was 640229 bytes, S3 rejected
that part: "Your proposed upload is smaller than the minimum allowed size"
I don't know what the minimum is, but the fix is just to include the last
part into the previous part. Since this can result in a part that's
double-sized, use half-sized parts normally.
Unfortunately, I don't fully understand why it was leaking using the old
method of a lazy bytestring. I just know that it was leaking, despite
neither hGetUntilMetered nor byteStringPopper seeming to leak by
themselves.
The new method avoids the lazy bytestring, and simply reads chunks from the
handle and streams them out to the http socket.
Untested and not even compiled yet.
Testing should include checks that file content streams through without
buffering in memory.
Note that CL.consume causes all the etags to be buffered in memory.
This is probably nearly unavoidable, since a request has to be constructed
that contains the list of etags in its body. (While it might be possible to
stream generation of the body, that would entail making a http request that
dribbles out parts of the body as the multipart uploads complete, which is
not likely to work well..
To limit this being a problem, it's best for partsize to be set to some
suitably large value, like 1gb. Then a full terabyte file will need only
1024 etags to be stored, which will probably use around 1 mb of memory.
I'm a little stuck on getting the list of etags of the parts.
This seems to require taking the md5 of each part locally,
which doesn't get along well with lazily streaming in the part from the
file. It would need to read the file twice, or lose laziness and buffer a
whole part -- but parts might be quite large.
This seems to be a problem with the API provided; S3 is supposed to return
an etag, but that is not exposed. I have filed a bug:
https://github.com/aristidb/aws/issues/141
That and S3 are all that uses creds currently, except that external
remotes can use creds. I have not handled showing info about external
remote creds because they can have 0, 1, or more separate cred pairs, and
there's no way for info to enumerate them or know how they're used.
So it seems ok to leave out creds info for external remotes.
This is intended to let the user easily tell if a remote's creds are
coming from info embedded in the repository, or instead from the
environment, or perhaps are locally stored in a creds file.
This commit was sponsored by Frédéric Schütz.
Now `git annex info $remote` shows info specific to the type of the remote,
for example, it shows the rsync url.
Remote types that support encryption or chunking also include that in their
info.
This commit was sponsored by Ævar Arnfjörð Bjarmason.
Found these with:
git grep "^ " $(find -type f -name \*.hs) |grep -v ': where'
Unfortunately there is some inline hamlet that cannot use tabs for
indentation.
Also, Assistant/WebApp/Bootstrap3.hs is a copy of a module and so I'm
leaving it as-is.
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.
Done as a background task while listening to episode 2 of the Type Theory
podcast.
See 2f3c3aa01f for backstory about how a repo
could be in this state.
When decryption fails, the repo must be using non-encrypted creds. Note
that creds are encrypted/decrypted using the encryption cipher which is
stored in the repo, so the decryption cannot fail due to missing gpg keys
etc. (For !shared encryptiom, the cipher is iteself encrypted using some
gpg key(s), and the decryption of the cipher happens earlier, so not
affected by this change.
Print a warning message for !shared repos, and continue on using the
cipher. Wrote a page explaining what users hit by this bug should do.
This commit was sponsored by Samuel Tardieu.
encryptionSetup must be called before setRemoteCredPair. Otherwise,
the RemoteConfig doesn't have the cipher in it, and so no cipher is used to
encrypt the embedded creds.
This is a security fix for non-shared encryption methods!
For encryption=shared, there's no security problem, just an
inconsistentency in whether the embedded creds are encrypted.
This is very important to get right, so used some types to help ensure that
setRemoteCredPair is only run after encryptionSetup. Note that the external
special remote bypasses the type safety, since creds can be set after the
initial remote config, if the external special remote program requests it.
Also note that IA remotes never use encryption, so encryptionSetup is not
run for them at all, and again the type safety is bypassed.
This leaves two open questions:
1. What to do about S3 and glacier remotes that were set up
using encryption=pubkey/hybrid with embedcreds?
Such a git repo has a security hole embedded in it, and this needs to be
communicated to the user. Is the changelog enough?
2. enableremote won't work in such a repo, because git-annex will
try to decrypt the embedded creds, which are not encrypted, so fails.
This needs to be dealt with, especially for ecryption=shared repos,
which are not really broken, just inconsistently configured.
Noticing that problem for encryption=shared is what led to commit
fbdeeeed5f, which tried to
fix the problem by not decrypting the embedded creds.
This commit was sponsored by Josh Taylor.
This reverts commit fbdeeeed5f.
I can find no basis for that commit and think that I made it in error.
setRemoteCredPair always encrypts using the cipher from remoteCipher,
even when the cipher is shared.