Commit graph

286 commits

Author SHA1 Message Date
Joey Hess
36133f27c0
move untrust forcing from Logs.Trust into Remote
No behavior changes here, but this is groundwork for letting remotes
such as borg vary untrust forcing depending on configuration.
2020-12-28 15:22:10 -04:00
Joey Hess
46059ab0e5
split off versionedExport from appendonly
S3 uses versionedExport, while GitLFS uses appendonly.

This is groundwork for later changes.
2020-12-28 14:37:15 -04:00
Joey Hess
4f9969d0a1
optimisation for borg
Skip needing to list importable contents when unchanged since last time.
2020-12-22 15:00:05 -04:00
Joey Hess
e1ac42be77
convert listImportableContents to throwing exceptions 2020-12-22 14:24:29 -04:00
Joey Hess
9a2c8757f3
add thirdPartyPopulated interface
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.

So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, that  the tree is recorded in the export log, and gets grafted
into the git-annex branch.

importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.

It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.

Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.

(Untested and unused as of yet.)

This commit was sponsored by Jochen Bartl on Patreon.
2020-12-18 15:23:58 -04:00
Joey Hess
e9db382308
avoid redundant set of a S3 verison ID that is already recorded
I think this could cause unnecessary changes to the git-annex branch,
and retrieveExportWithContentIdentifier is now also used for getting
content from importtree=yes remotes, so it would happen more frequently
so let's avoid.
2020-12-17 16:49:17 -04:00
Joey Hess
9b0dde834e
convert getFileSize to RawFilePath
Lots of nice wins from this in avoiding unncessary work, and I think
nothing got slower.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2020-11-05 11:32:57 -04:00
Joey Hess
5844a54869
aws-0.22 improved its support for setting etags, which improves support for versioned S3 buckets.
Remove placeholder version number I used when implementing the feature in
aws.

This commit was sponsored by Ethan Aubin.
2020-09-14 18:37:49 -04:00
Joey Hess
ddf963d019
deepseq all things returned from ResourceT http
Potentially fixes https://git-annex.branchable.com/bugs/concurrent_git-annex-copy_to_s3_special_remote_fails/
although I don't know if it does.

My thinking is, ResourceT may allocate a resource and then free it,
and a unforced thunk to that resource could result in reading memory
that has since been overwritten by something else, or in a SEGV,
depending. While that seems kind of like a bug in ResourceT to me, if it
is what's happening, this will avoid it. If it's not, this doesn't
really hurt much since the values are all smallish.

This commit was sponsored by Graham Spencer on Patreon.
2020-09-14 18:30:06 -04:00
Joey Hess
ddcab38e4a
no importKey for S3 for now
The Etag is sometimes a md5, but not if eg, there was a multipart
upload.

May revisit later if there's demand.
2020-07-03 13:53:14 -04:00
Joey Hess
3175015d1b
lockContent for S3 (with versioning=yes) and git-lfs
Made several special remotes support locking content on them while
dropping, which allows dropping from another special remote when the
content will only remain on a special remote of these types.

In both cases, verify the content is present actively, because it's
certianly possible for things other than git-annex to have removed it.

Worth thinking about what to do if at some later point, git-lfs gains
support for dropping content, and a content locking operation.
That would probably need a transition; first would need to make lockContent
use the locking operation. Then, once enough time had passed that we can
assume any git-annex operating on the git-lfs remote had that change,
git-annex could finally allow dropping from git-lfs.

Or, it could be that git-lfs gains support for dropping content, but not
locking it. In that case, it seems this commit would need to be reverted,
and then wait long enough for that git-annex to be everywhere, and only
then can git-annex safely support dropping from git-lfs.

So, the assumption made in this commit could lead to bother later.. But I
think it's actually highly unlikely git-lfs does ever support dropping;
it's outside their centralized model. Probably. :) Worth keeping in mind as
the same assumption is made about other special remotes though.

This commit was sponsored by Ethan Aubin.
2020-06-26 13:46:42 -04:00
Joey Hess
ad81feb053
fix implicit embedcreds regression
Fix bug that made creds not be stored in git when a special remote was
initialized with gpg encryption, but without an explicit embedcreds=yes.

(Yet nother regression introduced in version 7.20200202.7. 5th so far.)
2020-06-16 18:00:19 -04:00
Joey Hess
41952204ce
S3: The REDUCED_REDUNDANCY storage class is no longer cheaper
So stop documenting it, and stop offering it as a choice in the assistant.

Removed the code that parses it into S3.ReducedRedundancy, because
S3.OtherStorageClass with the value will work just the same and avoids a
special case for a deprecated this.
2020-06-16 12:04:29 -04:00
Joey Hess
e63dcbf36c
fix embedcreds=yes reversion
Fix bug that made enableremote of S3 and webdav remotes, that have
embedcreds=yes, fail to set up the embedded creds, so accessing the remotes
failed.

(Regression introduced in version 7.20200202.7 in when reworking all the
remote configs to be parsed.)

Root problem is that parseEncryptionConfig excludes all other config keys
except encryption ones, so it is then unable to find the
credPairRemoteField. And since that field is not required to be
present, it proceeds as if it's not, rather than failing in any visible
way.

This causes it to not find any creds, and so it does not cache
them. When when the S3 remote tries to make a S3 connection, it finds no
creds, so assumes it's being used in no-creds mode, and tries to find a
public url. With no public url available, it fails, but the failure doesn't
say a lack of creds is the problem.

Fix is to provide setRemoteCredPair with a ParsedRemoteConfig, so the full
set of configs of the remote can be parsed. A bit annoying to need to
parse the remote config before the full config (as returned by
setRemoteCredPair) is available, but this avoids the problem.

I assume webdav also had the problem by inspection, but didn't try to
reproduce it with it.

Also, getRemoteCredPair used getRemoteConfigValue to get a ProposedAccepted
String, but that does not seem right. Now that it runs that code, it
crashed saying it had just a String.

Remotes that have already been enableremoted, and so lack the cached creds
file will work after this fix, because getRemoteCredPair will extract
the creds from the remote config, writing the missing file.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-21 14:35:30 -04:00
Joey Hess
6361074174
convert renameExport to throw exception
Finishes the transition to make remote methods throw exceptions, rather
than silently hide them.

A bit on the fence about this one, because when renameExport fails,
it falls back to deleting instead, and so does the user care why it failed?

However, it did let me clean up several places in the code.

This commit was sponsored by Ethan Aubin.
2020-05-15 15:08:09 -04:00
Joey Hess
cdbfaae706
change removeExport to throw exception
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

This commit was sponsored by Graham Spencer on Patreon.
2020-05-15 14:15:14 -04:00
Joey Hess
3334d3831b
change retrieveExport and getKey to throw exception
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 13:45:53 -04:00
Joey Hess
4814b444dd
make storeExport throw exceptions 2020-05-15 12:20:02 -04:00
Joey Hess
4be94c67c7
make removeKey throw exceptions 2020-05-14 14:11:05 -04:00
Joey Hess
d9c7f81ba4
make retrieveKeyFile and retrieveKeyFileCheap throw exceptions
Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a
exception when a remote doesn't support it.
2020-05-13 17:07:07 -04:00
Joey Hess
c1cd402081
make storeKey throw exceptions
When storing content on remote fails, always display a reason why.

Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
2020-05-13 14:03:00 -04:00
Joey Hess
b50ee9cd0c
remove Preparer abstraction
That had almost no benefit at all, and complicated things quite a lot.

What I proably wanted this to be was something like ResourceT, but it
was not. The few remotes that actually need some preparation done only
once and reused used a MVar and not Preparer.
2020-05-13 11:56:21 -04:00
Joey Hess
1532d67c3e
S3: Support signature=v4
To use S3 Signature Version 4. Some S3 services seem to require v4, while
others may only support v2, which remains the default.

I'm also not sure if v4 works correctly in all cases, there is this
upstream bug report: https://github.com/aristidb/aws/issues/262
I've only tested it against the default S3 endpoint.
2020-05-07 13:18:11 -04:00
Joey Hess
81e3faf810
Merge branch 'v7' 2020-02-26 18:15:18 -04:00
Joey Hess
8af6d2c3c5
fix encryption of content to gcrypt and git-lfs
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.

Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)

To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.

And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.

(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
2020-02-26 18:05:36 -04:00
Joey Hess
67476fbc54
minor code simplification 2020-02-25 13:06:09 -04:00
Joey Hess
1883f7ef8f
support git remotes that need http basic auth
using git credential to get the password

One thing this doesn't do is wrap the password prompting inside the prompt
action. So with -J, the output can be a bit garbled.
2020-01-22 16:16:19 -04:00
Joey Hess
2be4122bfc
include passthrough params in --describe-other-params 2020-01-20 16:53:27 -04:00
Joey Hess
7038acf96c
add descriptions for all remote config fields
not yet used
2020-01-20 15:20:04 -04:00
Joey Hess
99cb3e75f1
add LISTCONFIGS to external special remote protocol
Special remote programs that use GETCONFIG/SETCONFIG are recommended
to implement it.

The description is not yet used, but will be useful later when adding a way
to make initremote list all accepted configs.

configParser now takes a RemoteConfig parameter. Normally, that's not
needed, because configParser returns a parter, it does not parse it
itself. But, it's needed to look at externaltype and work out what
external remote program to run for LISTCONFIGS.

Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported
used to use NoUUID. The code that now checks for Nothing used to behave
in some undefined way if the external program made requests that
triggered it.

Also, note that in externalSetup, once it generates external,
it parses the RemoteConfig strictly. That generates a
ParsedRemoteConfig, which is thrown away. The reason it's ok to throw
that away, is that, if the strict parse succeeded, the result must be
the same as the earlier, lenient parse.

initremote of an external special remote now runs the program three
times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again
LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least
one of those, and it should be possible to only run the program once.
2020-01-17 16:07:17 -04:00
Joey Hess
907ca937ab
use more field functions
Using field functions consistently avoids possibility of typos and also
helps ensure that all fields are added to RemoteConfigParsers (as long
as I have remembered to add them when writing the functions).
2020-01-15 11:15:07 -04:00
Joey Hess
7f2bfd41d7
include credPairRemoteFields in RemoteConfigParsers
Avoids parse error when the fields are added to RemoteConfig at setup
time and it then gets parsed, also at setup time. After setup time, such
internally added fields are not a problem, because they're Accepted. So
it may not be necessary in all cases to list such internally added
fields, but I think it's a good idea to always do so.
2020-01-15 10:57:45 -04:00
Joey Hess
0706d9d093
finish porting S3 2020-01-15 10:52:28 -04:00
Joey Hess
c4ea3ca40a
ported almost all remotes, until my brain melted
external is not started yet, and S3 is part way through and not
compiling yet
2020-01-14 15:41:34 -04:00
Joey Hess
71ecfbfccf
be stricter about rejecting invalid configurations for remotes
This is a first step toward that goal, using the ProposedAccepted type
in RemoteConfig lets initremote/enableremote reject bad parameters that
were passed in a remote's configuration, while avoiding enableremote
rejecting bad parameters that have already been stored in remote.log

This does not eliminate every place where a remote config is parsed and a
default value is used if the parse false. But, I did fix several
things that expected foo=yes/no and so confusingly accepted foo=true but
treated it like foo=no. There are still some fields that are parsed with
yesNo but not not checked when initializing a remote, and there are other
fields that are parsed in other ways and not checked when initializing a
remote.

This also lays groundwork for rejecting unknown/typoed config keys.
2020-01-10 14:52:48 -04:00
Joey Hess
650a631ef8
include all remotes back in 2019-12-02 12:26:33 -04:00
Joey Hess
81d402216d cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
2019-11-22 17:49:16 -04:00
Joey Hess
890330f0fe
make --json-error-messages capture url download errors
Convert Utility.Url to return Either String so the error message can be
displated in the annex monad and so captured.

(When curl is used, its errors are still not caught.)
2019-11-12 13:52:38 -04:00
Joey Hess
9828f45d85
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:

* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
  could in theory generate the same content identifier for two different
  peices of content

While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.

External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.

Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 13:51:42 -04:00
Joey Hess
c3975ff3b4
sameas RemoteConfig inheritance
I found a way to avoid inheritance complicating anything outside of
Logs.Remote. It seems fine to require all inherited values to be
inherited and not set in the sameas remote's config. Since inherited
values will be used for stuff like encryption and perhaps chunking, which
control the actual content stored on the remote, it seems likely that
there will not be any reason to need them to vary between two remotes
that access the same underlying data store.

The newer version of containers is free; the minimum ghc version is
bundled with a newer version than that.
2019-10-10 15:58:22 -04:00
Joey Hess
d1130ea04a
get rid of hardcoded "name" lookups
Support "sameas-name" being set instead.

In RenameRemote, rename which ever of the two is set.
2019-10-10 13:25:10 -04:00
Joey Hess
708fc6567f
S3: Fix encoding when generating public urls of S3 objects.
This code feels worryingly stringily typed, but using URI does not help
because the uriPath still has to be constructed with the right
uri-encoding.
2019-08-15 12:56:46 -04:00
Joey Hess
5004381dd9
improve error display when storing to an export/import remote fails
Prompted by the test suite on windows failing to with "export foo failed"
and no information about what went wrong.

Note that only storeExportWithContentIdentifier has been converted.
storeExport still returns a Bool and so exceptions may be hidden.

However, storeExportWithContentIdentifier has many more failure modes,
since it needs to avoid overwriting modified files. So it's more
important it have better error display.
2019-08-13 12:05:00 -04:00
Joey Hess
9a5ddda511
remove many old version ifdefs
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.

The only remaining version ifdefs in the entire code base are now a couple
for aws!

This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of  git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.

This commit was sponsored by Ilya Shlyakhter.
2019-07-05 15:09:37 -04:00
Joey Hess
700a3f2787
Merge branch 'master' into import-from-s3 2019-05-01 14:30:52 -04:00
Joey Hess
9dd764e6f7
Added mimeencoding= term to annex.largefiles expressions.
* Added mimeencoding= term to annex.largefiles expressions.
  This is probably mostly useful to match non-text files with eg
  "mimeencoding=binary"
* git-annex matchexpression: Added --mimeencoding option.
2019-04-30 12:17:22 -04:00
Joey Hess
f08cd6a4ac
set S3 version id in retrieveExportWithContentIdentifierS3
This is necessary because of checks for a S3 version id being set
done when deleting the export or overwriting or renaming it.
2019-04-24 15:13:07 -04:00
Joey Hess
a42e7a012a
refuse unsafe store to unversioned exporttree with old aws version
I've developed a patch to aws, once it gets merged, the real version
number of aws can be filled in.
2019-04-23 14:39:30 -04:00
Joey Hess
a7db925f59
typo 2019-04-23 13:19:48 -04:00
Joey Hess
710c2cdbdc
implement rest of missing methods for import from S3 2019-04-23 13:09:27 -04:00