Commit graph

215 commits

Author SHA1 Message Date
Joey Hess
669b305de2
S3: Send a Content-Type header when storing objects in S3
So exports to public buckets can be linked to from web pages.

(When git-annex is built with MagicMime support.)

Thanks to Jared Cosulich for the idea.
2019-01-23 13:08:47 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
cb375977a6
follow-on changes from MetaData type changes
Including writing and parsing the metadata log files with
bytestring-builder and attoparsec.
2019-01-07 15:51:05 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
2e069eb9f6
use putBucket to future-proof
New fields can be added to PutBucket in the future.
2018-12-31 13:09:20 -04:00
Joey Hess
4579dd6201
S3: Improve diagnostics when a remote is configured with exporttree and versioning, but no S3 version id has been recorded for a key.
When public access is used for the remote, it complained that the user
needed to set creds to use it, which was just wrong.

When creds were being used, it fell back from trying to use the version ID
to just accessing the key in the bucket, which was ok for non-export
remotes, but wrong for buckets.

In both cases, display a hopefully useful warning.

This should only come up when an existing S3 remote has been exported
to, and then later versioning was enabled.

Note that it would perhaps be possible to fall back from trying to use
retrieveKeyFile when it fails and instead use retrieveKeyFileFromExport,
which may work when S3 version ID is missing. But there are problems
with that approach; how to tell when retrieveKeyFile has failed due to this
rather than a network problem etc? Anyway, that approach would only work
until the file in the export got overwritten, and then it would no
longer be accessible. And with versioning enabled, the user wants old
versions of objects to remain accessible, so it seems better to warn
about the problem as soon as possible, so they can go back and add S3
version IDs.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-12-06 13:44:37 -04:00
Joey Hess
a9dd087074
centralized "yes"/"no" parsing
This commit was sponsored by Jack Hill on Patreon.
2018-10-10 11:14:27 -04:00
Joey Hess
451171b7c1
clean up url removal presence update
* rmurl: Fix a case where removing the last url left git-annex thinking
  content was still present in the web special remote.
* SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING
  used to update the presence information of the external special remote
  that called them; this was not documented behavior and is no longer done.

Done by making setUrlPresent and setUrlMissing only update presence info
for the web, and only when the url is a web url. See the comment for
reasoning about why that's the right thing to do.

In AddUrl, had to make it update location tracking, to handle the
non-web-url case.

This commit was sponsored by Ewen McNeill on Patreon.
2018-10-04 17:35:49 -04:00
Joey Hess
773084c49b
S3: Fix url construction bug
When the publicurl has been set to an url that does not end with a slash,
we need to add one in between it and the rest of the url.

As far as I can see, git-annex does not default to such publicurls; it's
careful to end them with slashes. But this was observed in the wild, and
there may be documentation that doesn't include the slash. And it's an easy
mistake to make in any case.

This commit was sponsored by Eric Drechsel on Patreon.
2018-09-14 12:25:23 -04:00
Joey Hess
677038199c
fix build with older aws
S3: Multipart uploads are now only supported when git-annex is built
with aws-0.16.0 or later, as earlier versions of the library don't
support versioning with multipart uploads.

This will affect the android build, and debian stable also has a too old
aws to support both features at the same time.

This commit was sponsored by Nick Piper on Patreon.
2018-09-13 09:58:39 -04:00
Joey Hess
445ea66732
simplify 2018-09-06 16:07:16 -04:00
Joey Hess
b7daf2685f
support public versioned S3 access
Makes git annex whereis display the versionId urls.

And, when a s3 remote is enabled without creds, git-annex will use the
versionId urls to access its contents.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-09-06 14:31:41 -04:00
Joey Hess
7407a80c27
S3: Support AWS_SESSION_TOKEN
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-09-05 15:53:57 -04:00
Joey Hess
53d839d543
more efficient encoding 2018-08-31 13:49:08 -04:00
Joey Hess
b3d42283ad
use per-remote metadata storage for S3 version ID
Since the same key can be stored in a versioned S3 bucket multiple times
with different version IDs, this allows tracking them all. Not currently
needed, but if we ever want to drop from a versioned S3 bucket, we'll
need to know them all.

This commit was supported by the NSF-funded DataLad project.
2018-08-31 13:27:29 -04:00
Joey Hess
6b75b9c448
turn on appendonly when versioning is enabled 2018-08-31 10:53:07 -04:00
Joey Hess
19dcff2b71
use S3 version ID for retrieval
Have to store the S3 object along with the version ID, so retrieval can
use the same object.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 15:37:08 -04:00
Joey Hess
794e9a7a44
store S3 version IDs
Only done when versioning=yes is configured. It could always do it when
S3 sends back a version id, but there may be buckets that have
versioning enabled by accident, so it seemed better to honor the
configuration.

S3's docs say version IDs are "randomly generated", so presumably
storing the same content twice gets two different ones not the same one.
So I considered storing a list of version IDs for a key. That would
allow removing the key completely. But.. The way Logs.RemoteState works,
when there are multiple writers, the last writer wins. So storing a list
would need a different log format that merges, which seemed overkill to support
removing a key from an append-only remote.

Note that Logs.RemoteState for S3 is now dedicated to version IDs.
If something else needs to be stored, a new log will be needed to do it.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 14:30:56 -04:00
Joey Hess
0ff5a41311
S3 versioning=yes config
Not yet used.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 13:45:28 -04:00
Joey Hess
02630b39ee
add Remote.readonly
Does nothing yet.

Considered making bup readonly, but while the content can't be removed,
it is able to delete a branch, so didn't.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:12:18 -04:00
Joey Hess
2884637cab
S3: Support credential-less download from remotes configured with public=yes exporttree=yes.
This commit was supported by the NSF-funded DataLad project.
2018-07-31 16:32:43 -04:00
Joey Hess
4315bb9e42
add retrievalSecurityPolicy
This will be used to protect against CVE-2018-10859, where an encrypted
special remote is fed the wrong encrypted data, and so tricked into
decrypting something that the user encrypted with their gpg key and did
not store in git-annex.

It also protects against CVE-2018-10857, where a remote follows a http
redirect to a file:// url or to a local private web server. While that's
already been prevented in git-annex's own use of http, external special
remotes, hooks, etc use other http implementations and could still be
vulnerable.

The policy is not yet enforced, this commit only adds the appropriate
metadata to remotes.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-06-21 11:36:36 -04:00
Joey Hess
67e46229a5
change Remote.repo to Remote.getRepo
This is groundwork for letting a repo be instantiated the first time
it's actually used, instead of at startup.

The only behavior change is that some old special cases for xmpp remotes
were removed. Where before git-annex silently did nothing with those
no-longer supported remotes, it may now fail in some way.

The additional IO action should have no performance impact as long as
it's simply return.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2018-06-04 15:30:26 -04:00
Joey Hess
197b1510fa
remove unused import 2018-04-09 13:09:40 -04:00
Joey Hess
0f6775f1ff
refactor sinkResponseFile and add downloadC
Remote.S3 and Remote.Helper.Http both had similar code to sink a
http-conduit Response to a file; refactor out sinkResponseFile.

downloadC downloads an url to a file using http-conduit, and supports
resuming. Falls back to curl to handle urls that http-conduit does not
support. This is not used yet, but the goal is to replace download with
it.

git-annex.cabal: conduit-extra was not actually used for a long time,
remove the dep. conduit moves into the main dependency list, but since
http-conduit was already in there, and it depends on conduit, that's not
really adding a new build dep.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 16:07:08 -04:00
Joey Hess
9b98d3f630
better HTTP connection reuse
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.

This commit was supported by the NSF-funded DataLad project.
2018-04-04 15:39:40 -04:00
Joey Hess
2927618d35
Added adb special remote which allows exporting files to Android devices.
git annex testremote passes.

exportree not implemented yet, although the documentation talks about it,
since it will be the main way this remote will be used.

The adb push/pull progress is displayed for now; it would be better
to consume it and use it to update the git-annex progress bar.

This commit was sponsored by andrea rota.
2018-03-27 14:54:41 -04:00
Joey Hess
a01b0680e3
fix version number 2017-10-11 11:43:03 -04:00
Joey Hess
6679705116
typo 2017-10-11 11:24:51 -04:00
Joey Hess
61dccecad7
Fix build with aws-0.17.
This commit was sponsored by Denis Dzyubenko on Patreon.
2017-10-11 10:57:20 -04:00
Joey Hess
2e69efea8d
git annex sync --content to exports
Assistant still todo.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2017-09-19 14:20:47 -04:00
Joey Hess
b03d77c211
add ExportTree table to export db
New table needed to look up what filenames are used in the currently
exported tree, for reasons explained in export.mdwn.

Also, added smart constructors for ExportLocation and ExportDirectory to
make sure they contain filepaths with the right direction slashes.

And some code refactoring.

This commit was sponsored by Francois Marier on Patreon.
2017-09-18 13:59:59 -04:00
Joey Hess
e1f5c90c92
split out Types.Export 2017-09-15 16:46:03 -04:00
Joey Hess
9f4ffe65e9
implement removeExportDirectory
Not yet called by Command.Export.

WebDAV needs this to clean up empty collections. Also, example.sh turned
out to not be cleaning up directories when removing content
from them, so it made sense for it to use this.

Remote.Directory did not need it, and since its cleanup method for empty
directories is more efficient than what Command.Export will need to do
to find empty directories, it uses Nothing so that extra work can be
avoided.

This commit was sponsored by Thom May on Patreon.
2017-09-15 13:18:21 -04:00
Joey Hess
9c3622882b
export: cache connections for S3 and webdav 2017-09-12 16:59:04 -04:00
Joey Hess
7ef9b7ef46
update copyright year 2017-09-12 13:53:03 -04:00
Joey Hess
088d819cd8
propigate exception in checkPresentExportS3
checkPresentExport is supposed to throw exceptions
2017-09-12 13:46:33 -04:00
Joey Hess
1332e6cec0
stop warning about removals from IA
In a test, I uploaded a pdf, and several files were derived from it.
After removing the pdf, the derived files went away after approximatly
half an hour. This window does not seem worth warning about every time.
Documented it in the tip.
2017-09-12 12:47:43 -04:00
Joey Hess
da23dec7d3
avoid showing error when copy fails
Since renameExport is allowed to fail for any reason, and its failure is
always recovered from by doing a new upload and deleting the old
content, this avoids unnecessary noise.

Copying a file on the IA failed, apparently something wrong with their
emulation of S3:

  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "InvalidArgument", s3ErrorMessage = "Invalid Argument", s3ErrorResource = Just "x-(amz|archive)-copy-source header is bad: 'joeyh-public-test2/foo'", s3ErrorHostId = Nothing, s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

This commit was sponsored by Jake Vosloo on Patreon.
2017-09-12 12:42:44 -04:00
Joey Hess
267f47c473
S3: Allow removing files from IA, but warn about derived versions potentially still existing there.
Removal works, only derives are a potential issue, so allow removing
with a warning. This way, unexporting a file works, and behavior is
consistent with IA remotes whether or not exporttree=yes.

Also tested exporting filenames containing unicode, spaces, underscores.
All worked, despite the IA's faq saying it doesn't.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-09-12 12:35:58 -04:00
Joey Hess
afdff226fb
don't show key urls in whereis for S3 with public=yes and exporttree=yes 2017-09-08 16:44:00 -04:00
Joey Hess
650d0955a0
S3 export finalization
Fixed ACL issue, and updated some documentation.
2017-09-08 16:28:28 -04:00
Joey Hess
44cd5ae313
S3 export (untested)
It opens a http connection per file exported, but then so does git
annex copy --to s3.

Decided not to munge exported filenames for IA. Too large a chance of
the munging having confusing results. Instead, export of files not
supported by IA, eg with spaces in their name, will fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-08 15:46:24 -04:00
Joey Hess
16eb2f976c
prevent exporttree=yes on remotes that don't support exports
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 13:48:44 -04:00
Joey Hess
28e2cad849
implement exporttree=yes configuration
* Only export to remotes that were initialized to support it.
* Prevent storing key/value on export remotes.
* Prevent enabling exporttree=yes and encryption in the same remote.

SetupStage Enable was changed to take the old RemoteConfig.
This allowed only setting exporttree when initially setting up a
remote, and not configuring it later after stuff might already be stored
in the remote.

Went with =yes rather than =true for consistency with other parts of
git-annex. Changed docs accordingly.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 13:09:38 -04:00
Joey Hess
a4328b49d2
refactor ExportActions
This will allow disabling exports for remotes that are not configured to
allow them. Also, exportSupported will be useful for the external
special remote to probe.

This commit was supported by the NSF-funded DataLad project
2017-09-01 13:05:09 -04:00
Joey Hess
e55e445a36
add API for exporting
Implemented so far for the directory special remote.

Several remotes don't make sense to export to. Regular Git remotes,
obviously, do not. Bup remotes almost certianly do not, since bup would
need to be used to extract the export; same store for Ddar. Web and
Bittorrent are download-only. GCrypt is always encrypted so exporting to
it would be pointless. There's probably no point complicating the Hook
remotes with exporting at this point. External, S3, Glacier, WebDAV,
Rsync, and possibly Tahoe should be modified to support export.

Thought about trying to reuse the storeKey/retrieveKeyFile/removeKey
interface, rather than adding a new interface. But, it seemed better to
keep it separate, to avoid a complicated interface that sometimes
encrypts/chunks key/value storage and sometimes users non-key/value
storage. Any common parts can be factored out.

Note that storeExport is not atomic.
doc/design/exporting_trees_to_special_remotes.mdwn has some things in
the "resuming exports" section that bear on this decision. Basically,
I don't think, at this time, that an atomic storeExport would help with
resuming, because exports are not key/value storage, and we can't be
sure that a partially uploaded file is the same content we're currently
trying to export.

Also, note that ExportLocation will always use unix path separators.
This is important, because users may export from a mix of windows and
unix, and it avoids complicating the API with path conversions,
and ensures that in such a mix, they always use the same locations for
exports.

This commit was sponsored by Bruno BEAUFILS on Patreon.
2017-08-29 13:00:41 -04:00
Joey Hess
0a2f7c261f
fix build with old http-client versions 2017-08-17 11:00:48 -04:00
Joey Hess
69dcb08d7a
Disable http-client's default 30 second response timeout when HEADing an url to check if it exists. Some web servers take quite a long time to answer a HEAD request. 2017-08-15 13:56:12 -04:00
Joey Hess
a1730cd6af
adeiu, MissingH
Removed dependency on MissingH, instead depending on the split
library.

After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.

Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.

This commit was sponsored by Shane-o on Patreon.
2017-05-16 01:03:52 -04:00