When public access was used for the remote, it complained that the user
needed to set creds to use it, which was just wrong.
When creds were being used, it fell back from trying to use the version ID
to just accessing the key in the bucket, which was ok for non-export
remotes, but wrong for buckets.
In both cases, display a hopefully useful warning.
This should only come up when an existing S3 remote has been exported
to, and then later versioning was enabled.
Note that it would perhaps be possible to fall back from trying to use
retrieveKeyFile when it fails and instead use retrieveKeyFileFromExport,
which may work when S3 version ID is missing. But there are problems
with that approach; how to tell when retrieveKeyFile has failed due to this
rather than a network problem etc? Anyway, that approach would only work
until the file in the export got overwritten, and then it would no
longer be accessible. And with versioning enabled, the user wants old
versions of objects to remain accessible, so it seems better to warn
about the problem as soon as possible, so they can go back and add S3
version IDs.
This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
* rmurl: Fix a case where removing the last url left git-annex thinking
content was still present in the web special remote.
* SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING
used to update the presence information of the external special remote
that called them; this was not documented behavior and is no longer done.
Done by making setUrlPresent and setUrlMissing only update presence info
for the web, and only when the url is a web url. See the comment for
reasoning about why that's the right thing to do.
In AddUrl, had to make it update location tracking, to handle the
non-web-url case.
This commit was sponsored by Ewen McNeill on Patreon.
When the publicurl has been set to an url that does not end with a slash,
we need to add one in between it and the rest of the url.
As far as I can see, git-annex does not default to such publicurls; it's
careful to end them with slashes. But this was observed in the wild, and
there may be documentation that doesn't include the slash. And it's an easy
mistake to make in any case.
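A minimal sketch of the slash handling described above. joinPublicUrl is a hypothetical name used for illustration, not the function git-annex actually uses, and the urls in the usage note are made up.

```haskell
import Data.List (isSuffixOf)

-- Hypothetical helper (not git-annex's real code): joins a configured
-- public url and an object path, adding a slash between them only when
-- the configured url does not already end with one.
joinPublicUrl :: String -> String -> String
joinPublicUrl base path
    | "/" `isSuffixOf` base = base ++ path
    | otherwise = base ++ "/" ++ path
```

With this, both "https://example.com/bucket" and "https://example.com/bucket/" joined with "key" give the same url.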
This commit was sponsored by Eric Drechsel on Patreon.
S3: Multipart uploads are now only supported when git-annex is built
with aws-0.16.0 or later, as earlier versions of the library don't
support versioning with multipart uploads.
This will affect the android build, and debian stable also has a too old
aws to support both features at the same time.
This commit was sponsored by Nick Piper on Patreon.
Makes git annex whereis display the versionId urls.
And, when an S3 remote is enabled without creds, git-annex will use the
versionId urls to access its contents.
This commit was sponsored by Fernando Jimenez on Patreon.
Since the same key can be stored in a versioned S3 bucket multiple times
with different version IDs, this allows tracking them all. Not currently
needed, but if we ever want to drop from a versioned S3 bucket, we'll
need to know them all.
This commit was supported by the NSF-funded DataLad project.
Have to store the S3 object along with the version ID, so retrieval can
use the same object.
This commit was supported by the NSF-funded DataLad project.
Only done when versioning=yes is configured. It could always do it when
S3 sends back a version id, but there may be buckets that have
versioning enabled by accident, so it seemed better to honor the
configuration.
S3's docs say version IDs are "randomly generated", so presumably
storing the same content twice gets two different ones, not the same one.
So I considered storing a list of version IDs for a key. That would
allow removing the key completely. But the way Logs.RemoteState works,
when there are multiple writers, the last writer wins. So storing a list
would need a different log format that merges, which seemed overkill to support
removing a key from an append-only remote.
Note that Logs.RemoteState for S3 is now dedicated to version IDs.
If something else needs to be stored, a new log will be needed to do it.
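To illustrate why a list stored in a last-writer-wins log would lose entries, here is a small sketch. Entry, Log and mergeLastWriterWins are hypothetical names, and this is not the real Logs.RemoteState format; it only models "newest timestamp wins" per key.

```haskell
import qualified Data.Map as M

-- Hypothetical log entry: when it was written, and the state stored.
data Entry = Entry { stamp :: Integer, state :: [String] }
    deriving (Eq, Show)

-- Hypothetical log: one entry per key.
type Log = M.Map String Entry

-- Merge two copies of the log. For each key, the entry with the newest
-- timestamp replaces the other one entirely; nothing is combined.
mergeLastWriterWins :: Log -> Log -> Log
mergeLastWriterWins = M.unionWith newest
  where
    newest a b
        | stamp a >= stamp b = a
        | otherwise = b
```

If one writer records ["idA"] at time 10 and another records ["idB"] at time 20 for the same key, the merged log holds only ["idB"]. Keeping both would need a merge that unions the lists, which is the different log format mentioned above.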
This commit was supported by the NSF-funded DataLad project.
Does nothing yet.
Considered making bup readonly, but while the content can't be removed,
it is able to delete a branch, so didn't.
This commit was supported by the NSF-funded DataLad project.
This will be used to protect against CVE-2018-10859, where an encrypted
special remote is fed the wrong encrypted data, and so tricked into
decrypting something that the user encrypted with their gpg key and did
not store in git-annex.
It also protects against CVE-2018-10857, where a remote follows a http
redirect to a file:// url or to a local private web server. While that's
already been prevented in git-annex's own use of http, external special
remotes, hooks, etc use other http implementations and could still be
vulnerable.
The policy is not yet enforced, this commit only adds the appropriate
metadata to remotes.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
This is groundwork for letting a repo be instantiated the first time
it's actually used, instead of at startup.
The only behavior change is that some old special cases for xmpp remotes
were removed. Where before git-annex silently did nothing with those
no-longer supported remotes, it may now fail in some way.
The additional IO action should have no performance impact as long as
it's simply return.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Remote.S3 and Remote.Helper.Http both had similar code to sink a
http-conduit Response to a file; refactor out sinkResponseFile.
downloadC downloads an url to a file using http-conduit, and supports
resuming. Falls back to curl to handle urls that http-conduit does not
support. This is not used yet, but the goal is to replace download with
it.
git-annex.cabal: conduit-extra was not actually used for a long time,
remove the dep. conduit moves into the main dependency list, but since
http-conduit was already in there, and it depends on conduit, that's not
really adding a new build dep.
This commit was supported by the NSF-funded DataLad project.
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.
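The create-once, reuse-afterwards pattern can be sketched without http-conduit. Shared, newShared and getShared are hypothetical names; in the real code the cached value would be the http-conduit Manager, and where it is stored is not shown here.

```haskell
import Control.Concurrent.MVar

-- Hypothetical cache for a resource that is created on first use.
newtype Shared a = Shared (MVar (Maybe a))

-- Start with nothing cached.
newShared :: IO (Shared a)
newShared = Shared <$> newMVar Nothing

-- Return the cached resource. The creation action only runs the first
-- time; later calls get the same value back.
getShared :: Shared a -> IO a -> IO a
getShared (Shared var) create = modifyMVar var go
  where
    go (Just v) = return (Just v, v)
    go Nothing = do
        v <- create
        return (Just v, v)
```

Calling getShared twice with the same Shared runs the creation action once, which is what lets connections held by the resource be reused.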
Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.
Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.
This commit was supported by the NSF-funded DataLad project.
git annex testremote passes.
exporttree not implemented yet, although the documentation talks about it,
since it will be the main way this remote will be used.
The adb push/pull progress is displayed for now; it would be better
to consume it and use it to update the git-annex progress bar.
This commit was sponsored by andrea rota.
New table needed to look up what filenames are used in the currently
exported tree, for reasons explained in export.mdwn.
Also, added smart constructors for ExportLocation and ExportDirectory to
make sure they contain filepaths with the right direction slashes.
And some code refactoring.
This commit was sponsored by Francois Marier on Patreon.
Not yet called by Command.Export.
WebDAV needs this to clean up empty collections. Also, example.sh turned
out to not be cleaning up directories when removing content
from them, so it made sense for it to use this.
Remote.Directory did not need it, and since its cleanup method for empty
directories is more efficient than what Command.Export will need to do
to find empty directories, it uses Nothing so that extra work can be
avoided.
This commit was sponsored by Thom May on Patreon.
In a test, I uploaded a pdf, and several files were derived from it.
After removing the pdf, the derived files went away after approximately
half an hour. This window does not seem worth warning about every time.
Documented it in the tip.
Since renameExport is allowed to fail for any reason, and its failure is
always recovered from by doing a new upload and deleting the old
content, this avoids unnecessary noise.
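The recovery path described above might look roughly like this. ExportOps and moveExported are hypothetical names, and the Bool results are a simplification; this is not the actual Command.Export code.

```haskell
-- Hypothetical record of the actions a remote provides.
data ExportOps = ExportOps
    { renameOp :: FilePath -> FilePath -> IO Bool
    , storeOp :: FilePath -> IO Bool
    , removeOp :: FilePath -> IO Bool
    }

-- Try a rename first. If it fails for any reason, recover quietly by
-- uploading to the new location and then deleting the old content.
moveExported :: ExportOps -> FilePath -> FilePath -> IO Bool
moveExported ops old new = do
    renamed <- renameOp ops old new
    if renamed
        then return True
        else do
            stored <- storeOp ops new
            if stored
                then removeOp ops old
                else return False
```

Since the fallback always runs, a failed rename by itself never needs to be reported to the user.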
Copying a file on the IA failed, apparently something wrong with their
emulation of S3:
S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "InvalidArgument", s3ErrorMessage = "Invalid Argument", s3ErrorResource = Just "x-(amz|archive)-copy-source header is bad: 'joeyh-public-test2/foo'", s3ErrorHostId = Nothing, s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}
This commit was sponsored by Jake Vosloo on Patreon.
Removal works, only derives are a potential issue, so allow removing
with a warning. This way, unexporting a file works, and behavior is
consistent with IA remotes whether or not exporttree=yes.
Also tested exporting filenames containing unicode, spaces, underscores.
All worked, despite the IA's faq saying it doesn't.
This commit was sponsored by Trenton Cronholm on Patreon.
It opens a http connection per file exported, but then so does git
annex copy --to s3.
Decided not to munge exported filenames for IA. Too large a chance of
the munging having confusing results. Instead, export of files not
supported by IA, eg with spaces in their name, will fail.
This commit was supported by the NSF-funded DataLad project.
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.
This commit was supported by the NSF-funded DataLad project.
* Only export to remotes that were initialized to support it.
* Prevent storing key/value on export remotes.
* Prevent enabling exporttree=yes and encryption in the same remote.
SetupStage Enable was changed to take the old RemoteConfig.
This allowed only setting exporttree when initially setting up a
remote, and not configuring it later after stuff might already be stored
in the remote.
Went with =yes rather than =true for consistency with other parts of
git-annex. Changed docs accordingly.
This commit was supported by the NSF-funded DataLad project.
This will allow disabling exports for remotes that are not configured to
allow them. Also, exportSupported will be useful for the external
special remote to probe.
This commit was supported by the NSF-funded DataLad project.
Implemented so far for the directory special remote.
Several remotes don't make sense to export to. Regular Git remotes,
obviously, do not. Bup remotes almost certainly do not, since bup would
need to be used to extract the export; same story for Ddar. Web and
Bittorrent are download-only. GCrypt is always encrypted so exporting to
it would be pointless. There's probably no point complicating the Hook
remotes with exporting at this point. External, S3, Glacier, WebDAV,
Rsync, and possibly Tahoe should be modified to support export.
Thought about trying to reuse the storeKey/retrieveKeyFile/removeKey
interface, rather than adding a new interface. But, it seemed better to
keep it separate, to avoid a complicated interface that sometimes
encrypts/chunks key/value storage and sometimes uses non-key/value
storage. Any common parts can be factored out.
Note that storeExport is not atomic.
doc/design/exporting_trees_to_special_remotes.mdwn has some things in
the "resuming exports" section that bear on this decision. Basically,
I don't think, at this time, that an atomic storeExport would help with
resuming, because exports are not key/value storage, and we can't be
sure that a partially uploaded file is the same content we're currently
trying to export.
Also, note that ExportLocation will always use unix path separators.
This is important, because users may export from a mix of windows and
unix, and it avoids complicating the API with path conversions,
and ensures that in such a mix, they always use the same locations for
exports.
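A sketch of a smart constructor that enforces this. The type here is a simplified stand-in and mkExportLocation is an assumed name, not necessarily what the code uses. It also converts every backslash, which is an assumption that would mangle unix filenames that legitimately contain one.

```haskell
-- Simplified stand-in for the real type. In a real module the data
-- constructor would not be exported, so values can only be built
-- through the smart constructor below.
newtype ExportLocation = ExportLocation String
    deriving (Eq, Show)

-- Hypothetical smart constructor: always stores unix path separators,
-- whatever platform the path came from.
mkExportLocation :: FilePath -> ExportLocation
mkExportLocation = ExportLocation . map toUnixSep
  where
    toUnixSep '\\' = '/'
    toUnixSep c = c
```

A path written as dir\file on windows and dir/file on unix then maps to the same ExportLocation.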
This commit was sponsored by Bruno BEAUFILS on Patreon.
Removed dependency on MissingH, instead depending on the split
library.
After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.
Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.
This commit was sponsored by Shane-o on Patreon.
The check was broken in two ways. First, nowhere did it error out when
checkUUIDFile found a different UUID already in the file. Instead,
it overwrote the uuid file.
And, checkUUIDFile's implementation was for some reason always failing with
a ConnectionClosed exception. Apparently something to do with using two
different runResourceT's and a response getting GCed in between. I'm pretty
sure that used to work, but changed to a more obviously correct
implementation.
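The decision the check needs to make can be sketched as a pure function. UUIDCheck and checkUUID are hypothetical names, and UUIDs are plain strings here; the real checkUUIDFile also has to fetch the file from the remote, which is left out.

```haskell
-- Hypothetical outcome of comparing the remote's uuid file with ours.
data UUIDCheck
    = WriteUUID        -- no uuid file yet, safe to write ours
    | AlreadyOurs      -- the file already holds our UUID
    | Conflict String  -- the file holds a different UUID; must error out
    deriving (Eq, Show)

-- First argument: UUID found in the remote's uuid file, if any.
-- Second argument: the UUID of the remote being set up.
checkUUID :: Maybe String -> String -> UUIDCheck
checkUUID Nothing _ = WriteUUID
checkUUID (Just found) ours
    | found == ours = AlreadyOurs
    | otherwise = Conflict found
```

The first bug above amounts to treating the Conflict case like WriteUUID.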
This commit was sponsored by Peter Hogg on Patreon.
Most remotes have an idempotent setup that can be reused for
enableremote, but in a few cases, it needs to tell which of the two is
being run, and whether a UUID was provided to setup.
This is groundwork for making initremote be able to provide a UUID.
It should not change any behavior.
Note that it would be nice to make the UUID always be provided to setup,
and make setup not need to generate and return a UUID. What prevented
this simplification is Remote.Git.gitSetup, which needs to reuse the
UUID of the git remote when setting it up, and so has to return that
UUID.
This commit was sponsored by Thom May on Patreon.