For use with tor hidden services, and perhaps other transports later.
Based on Utility.SimpleProtocol, it's a line-based protocol,
interspersed with transfers of bytestrings of a specified size.
Implementation of the local and remote sides of the protocol is done
using a free monad. This lets monadic code be included here, without
tying it to any particular way to get bytes peer-to-peer.
This adds a dependency on the haskell package "free", although that
was probably pulled in transitively from other dependencies already.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.
Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.
This commit was sponsored by Ethan Aubin.
* S3: Support the special case endpoint needed for the cn-north-1 region.
* Webapp: Don't list the Frankfurt region, as this (and some other new
regions) need V4 authorization which the aws library does not yet use.
This commit was sponsored by Nick Daly on Patreon.
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.
Noisy due to some refactoring into Types/
Removed the instance LensGpgEncParams RemoteConfig because it encouraged
code that does not take the RemoteGitConfig into account.
RemoteType's setup was changed to take a RemoteGitConfig,
although the only place that is able to provide a non-empty one is
enableremote, when it's changing an existing remote. This led to several
folow-on changes, and got RemoteGitConfig plumbed through.
This is useful for makking a special remote that anyone with a clone of the
repo and your public keys can upload files to, but only you can decrypt the
files stored in it.
The naming is unofrtunately not consistent, but the gnupg-options
were only used for encrypting, and it's too late to change that.
It would be nice to have a third setting that is always passed to gnupg,
but ~/.gnupg/options can be used to specify such global options when really
needed.
The direct flag is also set when sending unlocked content, to support old
versions of git-annex-shell. At some point, the direct flag will be
removed, and only the unlocked flag will be used.
/dev/null stderr; ssh is still able to display a password prompt
despite this
Show some messages so the user knows it's locking a remote, and
knows if that locking failed.
In c6632ee5c8, it actually only handled
uploading objects to a shared repository. To avoid verification when
downloading objects from a shared repository, was a lot harder.
On the plus side, if the process of downloading a file from a remote
is able to verify its content on the side, the remote can indicate this
now, and avoid the extra post-download verification.
As of yet, I don't have any remotes (except Git) using this ability.
Some more work would be needed to support it in special remotes.
It would make sense for tahoe to implicitly verify things downloaded from it;
as long as you trust your tahoe server (which typically runs locally),
there's cryptographic integrity. OTOH, despite bup being based on shas,
a bup repo under an attacker's control could have the git ref used for an
object changed, and so a bup repo shouldn't implicitly verify. Indeed,
tahoe seems unique in being trustworthy enough to implicitly verify.
When gpg.program is configured, it's used to get the command to run for
gpg. Useful on systems that have only a gpg2 command or want to use it
instead of the gpg command.
This only makes sense for public repos, that are not chunked, so
that there's a 1:1 from Key in the git-annex repo to file on the remote.
Rather than making every remote implementation deal with that, just disable
whereisKey when it doesn't make sense.
"checkPresent baser" was wrong; the baser has a dummy checkPresent action
not the real one. So, to fix this, we need to call preparecheckpresent to
get a checkpresent action that can be used to check if chunks are present.
Note that, for remotes like S3, this means that the preparer is run,
which opens a S3 handle, that will be used for each checkpresent of a
chunk. That's a good thing; if we're resuming an upload that's already many
chunks in, it'll reuse that same http connection for each chunk it checks.
Still, it's not a perfectly ideal thing, since this is a different http
connection that the one that will be used to upload chunks. It would be
nice to improve the API so that both use the same http connection.
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.
In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.
Problems noticed while doing this conversion:
* Some uses of Params "oneword" which was entirely unnecessary
overhead.
* A few places that built up a list of parameters with ++
and then used Params to split it!
Test suite passes.
The one exception is in Utility.Daemon. As long as a process only
daemonizes once, which seems reasonable, and as long as it avoids calling
checkDaemon once it's already running as a daemon, the fcntl locking
gotchas won't be a problem there.
Annex.LockFile has it's own separate lock pool layer, which has been
renamed to LockCache. This is a persistent cache of locks that persist
until closed.
This is not quite done; lockContent stil needs to be converted.
I've tested all the dataenc to sandi conversions except Assistant.XMPP,
and all have unchanged behavior, including behavior on large unicode code
points.
Came up with a generic way to filter out progress messages while keeping
errors, for commands that use stderr for both.
--json mode will disable command outputs too.
The fix is to stop using w82s, which does not properly reconstitute unicode
strings. Instrad, use utf8 bytestring to get the [Word8] to base64. This
passes unicode through perfectly, including any invalid filesystem encoded
characters.
Note that toB64 / fromB64 are also used for creds and cipher
embedding. It would be unfortunate if this change broke those uses.
For cipher embedding, note that ciphers can contain arbitrary bytes (should
really be using ByteString.Char8 there). Testing indicated it's not safe to
use the new fromB64 there; I think that characters were incorrectly
combined.
For credpair embedding, the username or password could contain unicode.
Before, that unicode would fail to round-trip through the b64.
So, I guess this is not going to break any embedded creds that worked
before.
This bug may have affected some creds before, and if so,
this change will not fix old ones, but should fix new ones at least.
Avoid using fileSize which maxes out at just 2 gb on Windows.
Instead, use hFileSize, which doesn't have a bounded size.
Fixes support for files > 2 gb on Windows.
Note that the InodeCache code only needs to compare a file size,
so it doesn't matter it the file size wraps. So it has been
left as-is. This was necessary both to avoid invalidating existing inode
caches, and because the code passed FileStatus around and would have become
more expensive if it called getFileSize.
This commit was sponsored by Christian Dietrich.
* info: Can now display info about a given uuid.
* Added to remote/uuid info: Count of the number of keys present
on the remote, and their size. This is rather expensive to calculate,
so comes last and --fast will disable it.
* Git remote info now includes the date of the last sync with the remote.
Unfortunately, I don't fully understand why it was leaking using the old
method of a lazy bytestring. I just know that it was leaking, despite
neither hGetUntilMetered nor byteStringPopper seeming to leak by
themselves.
The new method avoids the lazy bytestring, and simply reads chunks from the
handle and streams them out to the http socket.
Now `git annex info $remote` shows info specific to the type of the remote,
for example, it shows the rsync url.
Remote types that support encryption or chunking also include that in their
info.
This commit was sponsored by Ævar Arnfjörð Bjarmason.
Found these with:
git grep "^ " $(find -type f -name \*.hs) |grep -v ': where'
Unfortunately there is some inline hamlet that cannot use tabs for
indentation.
Also, Assistant/WebApp/Bootstrap3.hs is a copy of a module and so I'm
leaving it as-is.
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.
Done as a background task while listening to episode 2 of the Type Theory
podcast.
See 2f3c3aa01f for backstory about how a repo
could be in this state.
When decryption fails, the repo must be using non-encrypted creds. Note
that creds are encrypted/decrypted using the encryption cipher which is
stored in the repo, so the decryption cannot fail due to missing gpg keys
etc. (For !shared encryptiom, the cipher is iteself encrypted using some
gpg key(s), and the decryption of the cipher happens earlier, so not
affected by this change.
Print a warning message for !shared repos, and continue on using the
cipher. Wrote a page explaining what users hit by this bug should do.
This commit was sponsored by Samuel Tardieu.
encryptionSetup must be called before setRemoteCredPair. Otherwise,
the RemoteConfig doesn't have the cipher in it, and so no cipher is used to
encrypt the embedded creds.
This is a security fix for non-shared encryption methods!
For encryption=shared, there's no security problem, just an
inconsistentency in whether the embedded creds are encrypted.
This is very important to get right, so used some types to help ensure that
setRemoteCredPair is only run after encryptionSetup. Note that the external
special remote bypasses the type safety, since creds can be set after the
initial remote config, if the external special remote program requests it.
Also note that IA remotes never use encryption, so encryptionSetup is not
run for them at all, and again the type safety is bypassed.
This leaves two open questions:
1. What to do about S3 and glacier remotes that were set up
using encryption=pubkey/hybrid with embedcreds?
Such a git repo has a security hole embedded in it, and this needs to be
communicated to the user. Is the changelog enough?
2. enableremote won't work in such a repo, because git-annex will
try to decrypt the embedded creds, which are not encrypted, so fails.
This needs to be dealt with, especially for ecryption=shared repos,
which are not really broken, just inconsistently configured.
Noticing that problem for encryption=shared is what led to commit
fbdeeeed5f, which tried to
fix the problem by not decrypting the embedded creds.
This commit was sponsored by Josh Taylor.
This reverts commit fbdeeeed5f.
I can find no basis for that commit and think that I made it in error.
setRemoteCredPair always encrypts using the cipher from remoteCipher,
even when the cipher is shared.
Also fixes a test suite failures introduced in recent commits, where
inAnnexSafe failed in indirect mode, since it tried to open the lock file
ReadWrite. This is why the new checkLocked opens it ReadOnly.
This commit was sponsored by Chad Horohoe.
Added a convenience Utility.LockFile that is not a windows/posix
portability shim, but still manages to cut down on the boilerplate around
locking.
This commit was sponsored by Johan Herland.
(With the exception of daemon pid locking.)
This fixes at part of #758630. I reproduced the assistant locking eg, a
removable drive's annex journal lock file and forking a long-running
git-cat-file process that inherited that lock.
This did not affect Windows.
Considered doing a portable Utility.LockFile layer, but git-annex uses
posix locks in several special ways that have no direct Windows equivilant,
and it seems like it would mostly be a complication.
This commit was sponsored by Protonet.
Since encryption=shared, the encryption key is stored in the git repo, so
there is no point at all in encrypting the creds, also stored in the git
repo with that key. So `initremote` doesn't. The creds are simply stored
base-64 encoded.
However, it then tried to always decrypt creds when encryption was used..
Added a mkUnavailable method, which a Remote can use to generate a version
of itself that is not available. Implemented for several, but not yet all
remotes.
This allows testing that checkPresent properly throws an exceptions when
it cannot check if a key is present or not. It also allows testing that the
other methods don't throw exceptions in these circumstances.
This immediately found several bugs, which this commit also fixes!
* git remotes using ssh accidentially had checkPresent return
an exception, rather than throwing it
* The chunking code accidentially returned False rather than
propigating an exception when there were no chunks and
checkPresent threw an exception for the non-chunked key.
This commit was sponsored by Carlo Matteo Capocasa.
Fixes the memory leak on store.. the second oldest open git-annex bug!
Only retrieve remains to be converted.
This commit was sponsored by Scott Robinson.
Currently, initremote works, but not the other operations. They should be
fairly easy to add from this base.
Also, https://github.com/aristidb/aws/issues/119 blocks internet archive
support.
Note that since http-conduit is used, this also adds https support to S3.
Although git-annex encrypts everything anyway, so that may not be extremely
useful. It is not enabled by default, because existing S3 special remotes
have port=80 in their config. Setting port=443 will enable it.
This commit was sponsored by Daniel Brockman.
Removed old extensible-exceptions, only needed for very old ghc.
Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.
Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.
However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.
Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
Reusing http connection when operating on chunks is not done yet,
I had to submit some patches to DAV to support that. However, this is no
slower than old-style chunking was.
Note that it's a fileRetriever and a fileStorer, despite DAV using
bytestrings that would allow streaming. As a result, upload/download of
encrypted files is made a bit more expensive, since it spools them to temp
files. This was needed to get the progress meters to work.
There are probably ways to avoid that.. But it turns out that the current
DAV interface buffers the whole file content in memory, and I have
sent in a patch to DAV to improve its interfaces. Using the new interfaces,
it's certainly going to need to be a fileStorer, in order to read the file
size from the file (getting the size of a bytestring would destroy
laziness). It should be possible to use the new interface to make it be a
byteRetriever, so I'll change that when I get to it.
This commit was sponsored by Andreas Olsson.
This will allow special remotes to eg, open a http connection and reuse it,
while checking if chunks are present, or removing chunks.
S3 and WebDAV both need this to support chunks with reasonable speed.
Note that a special remote might want to cache a http connection across
multiple requests. A simple case of this is that CheckPresent is typically
called before Store or Remove. A remote using this interface can certianly
use a Preparer that eg, uses a MVar to cache a http connection.
However, it's up to the remote to then deal with things like stale or
stalled http connections when eg, doing a series of downloads from a remote
and other places. There could be long delays between calls to a remote,
which could lead to eg, http connection stalls; the machine might even
move to a new network, etc.
It might be nice to improve this interface later to allow
the simple case without needing to handle the full complex case.
One way to do it would be to have a `Transaction SpecialRemote cache`,
where SpecialRemote contains methods for Storer, Retriever, Remover, and
CheckPresent, that all expect to be passed a `cache`.
I tend to prefer moving toward explicit exception handling, not away from
it, but in this case, I think there are good reasons to let checkPresent
throw exceptions:
1. They can all be caught in one place (Remote.hasKey), and we know
every possible exception is caught there now, which we didn't before.
2. It simplified the code of the Remotes. I think it makes sense for
Remotes to be able to be implemented without needing to worry about
catching exceptions inside them. (Mostly.)
3. Types.StoreRetrieve.Preparer can only work on things that return a
Bool, which all the other relevant remote methods already did.
I do not see a good way to generalize that type; my previous attempts
failed miserably.
This reaping of any processes came to cause me problems when redoing the
rsync special remote -- a gpg process that was running gets waited on and
the place that then checks its return code fails.
I cannot reproduce any zombies when using the rsync special remote.
But I still can when using a normal git remote, accessed over ssh.
There is 1 zombie per file downloaded without this horrible hack enabled.
So, move the hack to only be used in that case.
Make the byteRetriever be passed the callback that consumes the bytestring.
This way, there's no worries about the lazy bytestring not all being read
when the resource that's creating it is closed.
Which in turn lets bup, ddar, and S3 each switch from using an unncessary
fileRetriver to a byteRetriever. So, more efficient on chunks and encrypted
files.
The only remaining fileRetrievers are hook and external, which really do
retrieve to files.
Chunking would complicate the assistant's code that checks when a pending
retrieval of a key from glacier is done. It would perhaps be nice to
support it to allow resuming, but not right now.
Converting to the new API still simplifies the code.
The forall a. in Preparer made resourcePrepare not seem to be usable, so
I specialized a to Bool. Which works for both Preparer Storer and
Preparer Retriever, but wouldn't let the Preparer be used for hasKey
as it currently stands.
And fixed a bug found by these tests; retrieveKeyFile would fail
when the dest file was already complete.
This commit was sponsored by Bradley Unterrheiner.
The content of unstable keys can potentially be different in different
repos, so eg, resuming a chunked upload started by another repo would
corrupt data.
This way, when the remote implementation neglects to update progress,
there will still be a somewhat useful progress display, as long as chunks
are used.
No need to read whole FileContent only to write it back out to a file in
this case. Can just rename! Yay.
Also indidentially, fixed an attempt to open a file for write that was
already opened for write, which caused a crash and deadlock.
Putting a callback in the Retriever type allows for the callback to
remove the retrieved file when it's done with it.
I did not really want to make Retriever be fixed to Annex Bool,
but when I tried to use Annex a, I got into some type of type mess.
Needed for eg, Remote.External.
Generally, any Retriever that stores content in a file is responsible for
updating the meter, while ones that procude a lazy bytestring cannot update
the meter, so are not asked to.
Some remotes like External need to run store and retrieve actions in Annex,
not IO. In order to do that lift, I had to dive pretty deep into the
utilities, making Utility.Gpg and Utility.Tmp be partly converted to using
MonadIO, and Control.Monad.Catch for exception handling.
There should be no behavior changes in this commit.
This commit was sponsored by Michael Barabanov.