This solves the problem of sameas remotes trampling over per-remote
state. Used for:
* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
could in theory generate the same content identifier for two different
peices of content
While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.
External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.
Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
I found a way to avoid inheritance complicating anything outside of
Logs.Remote. It seems fine to require all inherited values to be
inherited and not set in the sameas remote's config. Since inherited
values will be used for stuff like encryption and perhaps chunking, which
control the actual content stored on the remote, it seems likely that
there will not be any reason to need them to vary between two remotes
that access the same underlying data store.
The newer version of containers is free; the minimum ghc version is
bundled with a newer version than that.
Prompted by the test suite on windows failing to with "export foo failed"
and no information about what went wrong.
Note that only storeExportWithContentIdentifier has been converted.
storeExport still returns a Bool and so exceptions may be hidden.
However, storeExportWithContentIdentifier has many more failure modes,
since it needs to avoid overwriting modified files. So it's more
important it have better error display.
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.
The only remaining version ifdefs in the entire code base are now a couple
for aws!
This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.
This commit was sponsored by Ilya Shlyakhter.
* Added mimeencoding= term to annex.largefiles expressions.
This is probably mostly useful to match non-text files with eg
"mimeencoding=binary"
* git-annex matchexpression: Added --mimeencoding option.
Unfortunately, "port" has to be set by default, or the old git-annex
will crash when trying to enable the S3 remote.
So, when protocol=https is specified, it needs to override port=80,
since it may be a default setting.
protocol=https implies port=443 and
port=443 implies protocol=https
-- this was necessary because the existing configs set port=443, but
with a protocol setting, users will naturally want to use it, and then
there's no need for them to supply the default https port. So we keep
back-compat, add a nicer way to enable https, and also add support for
non-standard https ports.
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.
Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.
(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
Avoid a warning message when renameExport is not supported, and just
fallback to deleting with a subsequent re-upload. Especially needed for
importtree remotes, where renameExport needs to be disabled.
This changes the external special remote protocol, but in a
backwards-compatible way. A reply of UNSUPPORTED-REQUEST to an older
version of git-annex will cause it to make renameExport return False.
This gets back any speed lost in commit
9cebfd7002, and speeds up all uses of S3
remotes that operate on them more than once.
This commit was sponsored by Brett Eisenberg on Patreon.
Pushed the ResourceT out into larger code blocks, and made sure that
the the http result from a sendS3Handle is processed inside the same
ResourceT block.
I don't think this fixes any bugs, but it allows getting rid of a scary
comment.
This commit was sponsored by Eric Drechsel on Patreon.
Purifying exportActions will allow introspecting and modifying it,
which is needed to add progress bar display to it.
Only S3 and WebDAV ran an Annex action while constructing ExportActions.
There was a small performance gain from them doing that, since a
resource was able to be prepared and reused for multiple actions by
Command.Export.
As seen in commit 809cfbbd8a and
5d394023eb S3 and WebDAV actually create a
new handle for each access in normal, non-export use. It doesn't seem
worth making export use of them marginally more efficient than normal
use. It would be better to do that work upfront when constructing the
remote. Or perhaps use a MVar to cache a handle.
This commit was sponsored by Nick Piper on Patreon.
I seem to have thought that a Preparer was only run once when a remote
is accessed multiple times, but that is not in fact the case. prepareS3Handle
is run once per access. So, there is no point to it.
That there is some duplicate work done on each access is now apparent.
Luckily, the http manager is reused, so only one http connection is
made. But the S3 creds are loaded repeatedly. Room for improvement here.
This commit was sponsored by Jack Hill on Patreon.
Because when git-annex lacks S3 version IDs for files stored in the bucket,
deleting them would cause data loss.
Also because git-annex is not able to download unversioned objects from a bucket
when versioning=yes.
This also prevents setting versioning=no. While that would perhaps be
possible to do safely, it would add complexity, and would mean that if
the user accidentially did enableremote versioning=no, they would not be
able to undo it.
This commit was sponsored by Trenton Cronholm on Patreon.
Needs not yet released version 0.22 of aws library; with older versions
asks the user to configure the bucket versioning themselves.
Note that S3 endpoints that don't support versioning will cause putBucketVersioning
to throw an exception, so initremote will fail.
This commit was sponsored by Jake Vosloo on Patreon.
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.
Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
When public access is used for the remote, it complained that the user
needed to set creds to use it, which was just wrong.
When creds were being used, it fell back from trying to use the version ID
to just accessing the key in the bucket, which was ok for non-export
remotes, but wrong for buckets.
In both cases, display a hopefully useful warning.
This should only come up when an existing S3 remote has been exported
to, and then later versioning was enabled.
Note that it would perhaps be possible to fall back from trying to use
retrieveKeyFile when it fails and instead use retrieveKeyFileFromExport,
which may work when S3 version ID is missing. But there are problems
with that approach; how to tell when retrieveKeyFile has failed due to this
rather than a network problem etc? Anyway, that approach would only work
until the file in the export got overwritten, and then it would no
longer be accessible. And with versioning enabled, the user wants old
versions of objects to remain accessible, so it seems better to warn
about the problem as soon as possible, so they can go back and add S3
version IDs.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
* rmurl: Fix a case where removing the last url left git-annex thinking
content was still present in the web special remote.
* SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING
used to update the presence information of the external special remote
that called them; this was not documented behavior and is no longer done.
Done by making setUrlPresent and setUrlMissing only update presence info
for the web, and only when the url is a web url. See the comment for
reasoning about why that's the right thing to do.
In AddUrl, had to make it update location tracking, to handle the
non-web-url case.
This commit was sponsored by Ewen McNeill on Patreon.
When the publicurl has been set to an url that does not end with a slash,
we need to add one in between it and the rest of the url.
As far as I can see, git-annex does not default to such publicurls; it's
careful to end them with slashes. But this was observed in the wild, and
there may be documentation that doesn't include the slash. And it's an easy
mistake to make in any case.
This commit was sponsored by Eric Drechsel on Patreon.
S3: Multipart uploads are now only supported when git-annex is built
with aws-0.16.0 or later, as earlier versions of the library don't
support versioning with multipart uploads.
This will affect the android build, and debian stable also has a too old
aws to support both features at the same time.
This commit was sponsored by Nick Piper on Patreon.
Makes git annex whereis display the versionId urls.
And, when a s3 remote is enabled without creds, git-annex will use the
versionId urls to access its contents.
This commit was sponsored by Fernando Jimenez on Patreon.
Since the same key can be stored in a versioned S3 bucket multiple times
with different version IDs, this allows tracking them all. Not currently
needed, but if we ever want to drop from a versioned S3 bucket, we'll
need to know them all.
This commit was supported by the NSF-funded DataLad project.
Have to store the S3 object along with the version ID, so retrieval can
use the same object.
This commit was supported by the NSF-funded DataLad project.