Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.
Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)
To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.
And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.
(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
using git credential to get the password
One thing this doesn't do is wrap the password prompting inside the prompt
action. So with -J, the output can be a bit garbled.
Special remote programs that use GETCONFIG/SETCONFIG are recommended
to implement it.
The description is not yet used, but will be useful later when adding a way
to make initremote list all accepted configs.
configParser now takes a RemoteConfig parameter. Normally, that's not
needed, because configParser returns a parter, it does not parse it
itself. But, it's needed to look at externaltype and work out what
external remote program to run for LISTCONFIGS.
Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported
used to use NoUUID. The code that now checks for Nothing used to behave
in some undefined way if the external program made requests that
triggered it.
Also, note that in externalSetup, once it generates external,
it parses the RemoteConfig strictly. That generates a
ParsedRemoteConfig, which is thrown away. The reason it's ok to throw
that away, is that, if the strict parse succeeded, the result must be
the same as the earlier, lenient parse.
initremote of an external special remote now runs the program three
times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again
LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least
one of those, and it should be possible to only run the program once.
Using field functions consistently avoids possibility of typos and also
helps ensure that all fields are added to RemoteConfigParsers (as long
as I have remembered to add them when writing the functions).
Avoids parse error when the fields are added to RemoteConfig at setup
time and it then gets parsed, also at setup time. After setup time, such
internally added fields are not a problem, because they're Accepted. So
it may not be necessary in all cases to list such internally added
fields, but I think it's a good idea to always do so.
This is a first step toward that goal, using the ProposedAccepted type
in RemoteConfig lets initremote/enableremote reject bad parameters that
were passed in a remote's configuration, while avoiding enableremote
rejecting bad parameters that have already been stored in remote.log
This does not eliminate every place where a remote config is parsed and a
default value is used if the parse false. But, I did fix several
things that expected foo=yes/no and so confusingly accepted foo=true but
treated it like foo=no. There are still some fields that are parsed with
yesNo but not not checked when initializing a remote, and there are other
fields that are parsed in other ways and not checked when initializing a
remote.
This also lays groundwork for rejecting unknown/typoed config keys.
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.
Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:
Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.
The Eq instance is fixed.
Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.
Used git-annex benchmark:
find is 7% faster
whereis is 3% faster
get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
Convert Utility.Url to return Either String so the error message can be
displated in the annex monad and so captured.
(When curl is used, its errors are still not caught.)
This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
I found a way to avoid inheritance complicating anything outside of
Logs.Remote. It seems fine to require all inherited values to be
inherited and not set in the sameas remote's config. Since inherited
values will be used for stuff like encryption and perhaps chunking, which
control the actual content stored on the remote, it seems likely that
there will not be any reason to need them to vary between two remotes
that access the same underlying data store.
The newer version of containers is free; the minimum ghc version is
bundled with a newer version than that.
Prompted by the test suite on windows failing to with "export foo failed"
and no information about what went wrong.
Note that only storeExportWithContentIdentifier has been converted.
storeExport still returns a Bool and so exceptions may be hidden.
However, storeExportWithContentIdentifier has many more failure modes,
since it needs to avoid overwriting modified files. So it's more
important it have better error display.
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.
The only remaining version ifdefs in the entire code base are now a couple
for aws!
This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.
This commit was sponsored by Ilya Shlyakhter.
* Added mimeencoding= term to annex.largefiles expressions.
This is probably mostly useful to match non-text files with eg
"mimeencoding=binary"
* git-annex matchexpression: Added --mimeencoding option.
Unfortunately, "port" has to be set by default, or the old git-annex
will crash when trying to enable the S3 remote.
So, when protocol=https is specified, it needs to override port=80,
since it may be a default setting.
protocol=https implies port=443 and
port=443 implies protocol=https
-- this was necessary because the existing configs set port=443, but
with a protocol setting, users will naturally want to use it, and then
there's no need for them to supply the default https port. So we keep
back-compat, add a nicer way to enable https, and also add support for
non-standard https ports.
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
Avoid a warning message when renameExport is not supported, and just
fallback to deleting with a subsequent re-upload. Especially needed for
importtree remotes, where renameExport needs to be disabled.
This changes the external special remote protocol, but in a
backwards-compatible way. A reply of UNSUPPORTED-REQUEST to an older
version of git-annex will cause it to make renameExport return False.
This gets back any speed lost in commit
9cebfd7002, and speeds up all uses of S3
remotes that operate on them more than once.
This commit was sponsored by Brett Eisenberg on Patreon.
Pushed the ResourceT out into larger code blocks, and made sure that
the the http result from a sendS3Handle is processed inside the same
ResourceT block.
I don't think this fixes any bugs, but it allows getting rid of a scary
comment.
This commit was sponsored by Eric Drechsel on Patreon.
Purifying exportActions will allow introspecting and modifying it,
which is needed to add progress bar display to it.
Only S3 and WebDAV ran an Annex action while constructing ExportActions.
There was a small performance gain from them doing that, since a
resource was able to be prepared and reused for multiple actions by
Command.Export.
As seen in commit 809cfbbd8a and
5d394023eb S3 and WebDAV actually create a
new handle for each access in normal, non-export use. It doesn't seem
worth making export use of them marginally more efficient than normal
use. It would be better to do that work upfront when constructing the
remote. Or perhaps use a MVar to cache a handle.
This commit was sponsored by Nick Piper on Patreon.
I seem to have thought that a Preparer was only run once when a remote
is accessed multiple times, but that is not in fact the case. prepareS3Handle
is run once per access. So, there is no point to it.
That there is some duplicate work done on each access is now apparent.
Luckily, the http manager is reused, so only one http connection is
made. But the S3 creds are loaded repeatedly. Room for improvement here.
This commit was sponsored by Jack Hill on Patreon.
Because when git-annex lacks S3 version IDs for files stored in the bucket,
deleting them would cause data loss.
Also because git-annex is not able to download unversioned objects from a bucket
when versioning=yes.
This also prevents setting versioning=no. While that would perhaps be
possible to do safely, it would add complexity, and would mean that if
the user accidentially did enableremote versioning=no, they would not be
able to undo it.
This commit was sponsored by Trenton Cronholm on Patreon.
Needs not yet released version 0.22 of aws library; with older versions
asks the user to configure the bucket versioning themselves.
Note that S3 endpoints that don't support versioning will cause putBucketVersioning
to throw an exception, so initremote will fail.
This commit was sponsored by Jake Vosloo on Patreon.
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.
Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.