The fix is to stop using w82s, which does not properly reconstitute unicode
strings. Instead, use utf8 bytestring to get the [Word8] to base64. This
passes unicode through perfectly, including any invalid filesystem encoded
characters.
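
A minimal sketch of why a byte-per-character conversion loses unicode. It assumes w82s behaves like `map (chr . fromIntegral)`. The names naiveBytesToString, utf8Bytes and utf8String are hypothetical, and the text package stands in for whichever UTF-8 library is actually used. It does not cover invalid filesystem encoded characters.

```haskell
import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import Data.Char (chr)
import Data.Word (Word8)

-- Hypothetical stand-in for w82s: every byte becomes one Char.
naiveBytesToString :: [Word8] -> String
naiveBytesToString = map (chr . fromIntegral)

-- Encode a String as UTF-8 bytes (the bytes that would be fed to base64).
utf8Bytes :: String -> [Word8]
utf8Bytes = B.unpack . TE.encodeUtf8 . T.pack

-- Decode UTF-8 bytes back to a String, recombining multi-byte sequences.
utf8String :: [Word8] -> String
utf8String = T.unpack . TE.decodeUtf8 . B.pack

-- True when the byte-per-Char conversion gives back the original string.
roundTripsNaive :: String -> Bool
roundTripsNaive s = naiveBytesToString (utf8Bytes s) == s

-- True when proper UTF-8 decoding gives back the original string.
roundTripsUtf8 :: String -> Bool
roundTripsUtf8 s = utf8String (utf8Bytes s) == s
```

Pure ASCII survives both conversions. A string such as "T\248nnesen" only survives the UTF-8 one, because its non-ASCII character is encoded as two bytes.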
Note that toB64 / fromB64 are also used for creds and cipher
embedding. It would be unfortunate if this change broke those uses.
For cipher embedding, note that ciphers can contain arbitrary bytes (should
really be using ByteString.Char8 there). Testing indicated it's not safe to
use the new fromB64 there; I think that characters were incorrectly
combined.
For credpair embedding, the username or password could contain unicode.
Before, that unicode would fail to round-trip through the b64.
So, I guess this is not going to break any embedded creds that worked
before.
This bug may have affected some creds before, and if so,
this change will not fix old ones, but should fix new ones at least.
Eliminated complexity and future proofed. The most important change is that
all functions over Difference are now total; any Difference that can be
expressed should be handled. This avoids the need for sanity checking of
inputs, and avoids version skew with the future.
Also, the difference.log now serializes a [Difference], not a Differences.
This saves space and keeps it simpler.
Note that [Difference] might contain conflicting differences (eg,
[Version5, Version6]). In this case, one of them needs to consistently win
over the others, probably based on Ord.
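
One way the "consistent winner" could look, as a sketch only. The Difference constructors here are hypothetical (taken from the example above, plus a catch-all), and picking the maximum by the derived Ord is an assumption, not a settled design.

```haskell
-- Hypothetical Difference type, only for illustration.
data Difference = Version5 | Version6 | OtherDifference String
  deriving (Eq, Ord, Show)

-- Total: every constructor is handled, so no input needs sanity checking.
isVersion :: Difference -> Bool
isVersion Version5 = True
isVersion Version6 = True
isVersion (OtherDifference _) = False

-- Among conflicting version differences, the greatest by Ord wins,
-- regardless of the order they appear in the list.
winningVersion :: [Difference] -> Maybe Difference
winningVersion ds = case filter isVersion ds of
  [] -> Nothing
  vs -> Just (maximum vs)
```

Because the result depends only on Ord, [Version5, Version6] and [Version6, Version5] resolve the same way.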
* init: Repository tuning parameters can now be passed when initializing a
repository for the first time. For details, see
http://git-annex.branchable.com/tuning/
* merge: Refuse to merge changes from a git-annex branch of a repo
that has been tuned in incompatible ways.
addurl behavior change: When downloading an url ending in .torrent,
it will download files from bittorrent, instead of the old behavior
of adding the torrent file to the repository.
Added Recommends on aria2 and bittornado | bittorrent.
This commit was sponsored by Asbjørn Sloth Tønnesen.
Now `git annex info $remote` shows info specific to the type of the remote,
for example, it shows the rsync url.
Remote types that support encryption or chunking also include that in their
info.
This commit was sponsored by Ævar Arnfjörð Bjarmason.
Added a Default instance for TrustLevel, and was able to use that to clear
up several other parts of the code too.
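
A rough sketch of the idea. The Default class is defined locally here so the example is self-contained, the constructor list and the choice of SemiTrusted as the default are assumptions, and lookupTrust is a hypothetical helper.

```haskell
import qualified Data.Map as M

-- Local stand-in for a Default class.
class Default a where
  def :: a

-- Assumed set of trust levels.
data TrustLevel = Trusted | SemiTrusted | UnTrusted | DeadTrusted
  deriving (Eq, Show)

-- Assumption: a repository with no recorded trust level is semitrusted.
instance Default TrustLevel where
  def = SemiTrusted

-- Hypothetical helper: fall back to the default instead of repeating
-- the fallback value at every call site.
lookupTrust :: M.Map String TrustLevel -> String -> TrustLevel
lookupTrust m u = M.findWithDefault def u m
```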
This commit was sponsored by Stephan Schulz
Found these with:
git grep "^ " $(find -type f -name \*.hs) |grep -v ': where'
Unfortunately there is some inline hamlet that cannot use tabs for
indentation.
Also, Assistant/WebApp/Bootstrap3.hs is a copy of a module and so I'm
leaving it as-is.
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.
Done as a background task while listening to episode 2 of the Type Theory
podcast.
* New annex.hardlink setting. Closes: #758593
* init: Automatically detect when a repository was cloned with --shared,
and set annex.hardlink=true, as well as marking the repository as
untrusted.
Had to reorganize Logs.Trust a bit to avoid a cycle between it and
Annex.Init.
Added a convenience Utility.LockFile that is not a windows/posix
portability shim, but still manages to cut down on the boilerplate around
locking.
This commit was sponsored by Johan Herland.
Added a mkUnavailable method, which a Remote can use to generate a version
of itself that is not available. Implemented for several, but not yet all
remotes.
This allows testing that checkPresent properly throws an exception when
it cannot check if a key is present or not. It also allows testing that the
other methods don't throw exceptions in these circumstances.
This immediately found several bugs, which this commit also fixes!
* git remotes using ssh accidentally had checkPresent return
an exception, rather than throwing it
* The chunking code accidentally returned False rather than
propagating an exception when there were no chunks and
checkPresent threw an exception for the non-chunked key.
This commit was sponsored by Carlo Matteo Capocasa.
I tend to prefer moving toward explicit exception handling, not away from
it, but in this case, I think there are good reasons to let checkPresent
throw exceptions:
1. They can all be caught in one place (Remote.hasKey), and we know
every possible exception is caught there now, which we didn't before.
2. It simplified the code of the Remotes. I think it makes sense for
Remotes to be able to be implemented without needing to worry about
catching exceptions inside them. (Mostly.)
3. Types.StoreRetrieve.Preparer can only work on things that return a
Bool, which all the other relevant remote methods already did.
I do not see a good way to generalize that type; my previous attempts
failed miserably.
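
A simplified sketch of catching everything in one place. It runs in IO rather than Annex and uses String for keys. unavailableCheck is a hypothetical stand-in for the checkPresent of a remote made with mkUnavailable, and this hasKey only illustrates the shape, not the real Remote.hasKey.

```haskell
import Control.Exception (ErrorCall(..), SomeException, throwIO, try)

-- Hypothetical checkPresent for an unavailable remote: it cannot tell
-- whether the key is present, so it throws rather than returning False.
unavailableCheck :: String -> IO Bool
unavailableCheck _ = throwIO (ErrorCall "remote is not available")

-- The single place where exceptions from any checkPresent are caught
-- and turned into a Left carrying the error message.
hasKey :: (String -> IO Bool) -> String -> IO (Either String Bool)
hasKey check key = either (Left . describe) Right <$> try (check key)
  where
    describe :: SomeException -> String
    describe = show
```

With this shape, a remote that cannot be reached yields a Left, which callers can tell apart from Right False (key known to be absent).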
Make the byteRetriever be passed the callback that consumes the bytestring.
This way, there are no worries about the lazy bytestring not all being read
when the resource that's creating it is closed.
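
The callback style can be sketched like this, using a plain file handle as the resource. retrieveBytes and copyTo are hypothetical names, and the real byteRetriever has a different type.

```haskell
import qualified Data.ByteString.Lazy as L
import System.IO (IOMode(ReadMode), withBinaryFile)

-- The lazy bytestring is only handed to the callback while the handle
-- is still open; the handle is closed after the callback returns.
-- The callback is expected to finish reading before it returns.
retrieveBytes :: FilePath -> (L.ByteString -> IO a) -> IO a
retrieveBytes path consume = withBinaryFile path ReadMode $ \h ->
  L.hGetContents h >>= consume

-- Example callback: writes the whole bytestring to a destination file,
-- which forces all of it to be read.
copyTo :: FilePath -> L.ByteString -> IO Bool
copyTo dest b = L.writeFile dest b >> return True
```

Returning the lazy bytestring from withBinaryFile instead would risk the handle being closed before the content was read, which is the problem the callback avoids.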
Which in turn lets bup, ddar, and S3 each switch from using an unnecessary
fileRetriever to a byteRetriever. So, more efficient on chunks and encrypted
files.
The only remaining fileRetrievers are hook and external, which really do
retrieve to files.
The forall a. in Preparer made resourcePrepare not seem to be usable, so
I specialized a to Bool. Which works for both Preparer Storer and
Preparer Retriever, but wouldn't let the Preparer be used for hasKey
as it currently stands.
No need to read whole FileContent only to write it back out to a file in
this case. Can just rename! Yay.
Also incidentally, fixed an attempt to open a file for write that was
already opened for write, which caused a crash and deadlock.
Putting a callback in the Retriever type allows for the callback to
remove the retrieved file when it's done with it.
I did not really want to make Retriever be fixed to Annex Bool,
but when I tried to use Annex a, I got into some type of type mess.
Needed for eg, Remote.External.
Generally, any Retriever that stores content in a file is responsible for
updating the meter, while ones that produce a lazy bytestring cannot update
the meter, so are not asked to.
Some remotes like External need to run store and retrieve actions in Annex,
not IO. In order to do that lift, I had to dive pretty deep into the
utilities, making Utility.Gpg and Utility.Tmp be partly converted to using
MonadIO, and Control.Monad.Catch for exception handling.
There should be no behavior changes in this commit.
This commit was sponsored by Michael Barabanov.
Leverage the new chunked remotes to automatically resume uploads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundary.
But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!
This implementation will also allow starting an upload from one repository,
interrupting it, and then resuming the upload to the same remote from
an entirely different repository.
Note that I added a comment that storeKey should atomically move the content
into place once it's all received. This was already an undocumented
requirement -- it's necessary for hasKey to work reliably. This resume code
just uses hasKey to find the first chunk that's missing.
Note that if there are two uploads of the same key to the same chunked remote,
one might resume at the point the other had gotten to, but both will then
redundantly upload. As before.
In the non-resume case, this adds one hasKey call per storeKey, and only
if the remote is configured to use chunks. Future work: Try to eliminate that
hasKey. Notice that eg, `git annex copy --to` checks if the key is present
before sending it, so is already running hasKey.. which could perhaps
be cached and reused.
However, this additional overhead is not very large compared with
transferring an entire large file, and the ability to resume
is certainly worth it. There is an optimisation in place for small files,
that avoids trying to resume if the whole file fits within one chunk.
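
The resume logic described above can be sketched as follows. The names chunkCount and resumePoint are hypothetical, chunks are assumed to be numbered from 1, chunkSize is assumed to be positive, and isPresent stands in for a hasKey call on the chunk's key.

```haskell
-- Number of chunks needed for a file (at least one).
chunkCount :: Integer -> Integer -> Integer
chunkCount fileSize chunkSize =
  max 1 ((fileSize + chunkSize - 1) `div` chunkSize)

-- Find the chunk number to start uploading from.
resumePoint :: Monad m => (Integer -> m Bool) -> Integer -> Integer -> m Integer
resumePoint isPresent fileSize chunkSize
  -- Small file optimisation: one chunk, so do not check anything.
  | fileSize <= chunkSize = return 1
  | otherwise = go 1
  where
    total = chunkCount fileSize chunkSize
    go n
      -- Every chunk is already present; nothing left to send.
      | n > total = return n
      | otherwise = do
          present <- isPresent n
          if present then go (n + 1) else return n
```

In the non-resume case the first chunk is missing, so only one presence check is made before the upload starts at chunk 1.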
This commit was sponsored by Georg Bauer.
Leverage the new chunked remotes to automatically resume downloads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundary.
But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!
This implementation will also properly handle starting a download
from one remote, interrupting, and resuming from another one, and so on.
(Resuming interrupted chunked uploads is similarly doable, although
slightly more expensive.)
This commit was sponsored by Thomas Djärv.
Needed for resuming from chunks.
Url keys are considered not stable. I considered treating url keys with a
known size as stable, but just don't feel that is enough information.
This avoids a proliferation of hash directories when using new-style
chunking, and should improve performance since chunks are accessed
in sequence and so should have a common locality.
Of course, when a chunked key is encrypted, its hash directories have no
relation to the parent key.
This commit was sponsored by Christian Kellermann.
Added new fields for chunk number, and chunk size. These will not appear
in normal keys ever, but will be used for chunked data stored on special
remotes.
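
Roughly, the shape of a key with the new fields might look like this. The record below is a simplified assumption, not the actual Key definition, and chunkKey, nonChunkKey and isChunkKey are hypothetical helpers.

```haskell
import Data.Maybe (isJust)

-- Simplified key record; the two chunk fields are Nothing in normal keys.
data Key = Key
  { keyName :: String
  , keyBackendName :: String
  , keySize :: Maybe Integer
  , keyChunkSize :: Maybe Integer
  , keyChunkNum :: Maybe Integer
  } deriving (Eq, Show)

-- Derive the key for one chunk of a parent key.
chunkKey :: Integer -> Integer -> Key -> Key
chunkKey size num k = k { keyChunkSize = Just size, keyChunkNum = Just num }

-- Strip the chunk fields, recovering the parent key.
nonChunkKey :: Key -> Key
nonChunkKey k = k { keyChunkSize = Nothing, keyChunkNum = Nothing }

-- True only for keys that describe a chunk.
isChunkKey :: Key -> Bool
isChunkKey k = isJust (keyChunkSize k) && isJust (keyChunkNum k)
```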
This commit was sponsored by Jouni K Seppanen.