Commit graph

169 commits

Author SHA1 Message Date
Joey Hess
cd544e548b
filter out control characters in error messages
giveup changed to filter out control characters. (It is too low level to
make it use StringContainingQuotedPath.)

error still does not, but it should only be used for internal errors,
where the message is not attacker-controlled.

Changed a lot of existing error to giveup when it is not strictly an
internal error.

Of course, other exceptions can still be thrown, either by code in
git-annex, or a library, that include some attacker-controlled value.
This does not guard against those.

Sponsored-by: Noam Kremen on Patreon
2023-04-10 13:50:51 -04:00
Yaroslav Halchenko
84b0a3707a
Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Joey Hess
cfaae7e931
added an optional cost= configuration to all special remotes
Note that when this is specified and an older git-annex is used to
enableremote such a special remote, it will simply ignore the cost= field
and use whatever the default cost is.

In passing, fixed adb to support the remote.name.cost and
remote.name.cost-command configs.

Sponsored-by: Dartmouth College's DANDI project
2023-01-12 13:42:28 -04:00
Joey Hess
2f2701137d
incremental verification for retrieval from all export remotes
Only for export remotes so far, not export/import.

Sponsored-by: Dartmouth College's Datalad project
2022-05-09 13:49:33 -04:00
Joey Hess
90950a37e5
support incremental verification when retrieving from export/import remotes
None of the special remotes do it yet, but this lays the groundwork.

Added MustFinishIncompleteVerify so that, when an incremental verify is
started but not complete, it can be forced to finish it. Otherwise, it
would have skipped doing it when verification is disabled, but
verification must always be done when retrievin from export remotes
since files can be modified during retrieval.

Note that retrieveExportWithContentIdentifier doesn't support incremental
verification yet. And I'm not sure if it can -- it doesn't know the Key
before it downloads the content. It seems a new API call would need to
be split out of that, which is provided with the key.

Sponsored-by: Dartmouth College's Datalad project
2022-05-09 12:25:04 -04:00
Joey Hess
8034f2e9bb
factor out IncrementalHasher from IncrementalVerifier 2021-11-09 12:33:22 -04:00
Joey Hess
88b63a43fa
distinguish between incremental verification failing and not being done
Sponsored-by: Dartmouth College's DANDI project
2021-08-18 14:38:02 -04:00
Joey Hess
449851225a
refactor
IncrementalVerifier moved to Utility.Hash, which will let Utility.Url
use it later.

It's perhaps not really specific to hashing, but making a separate
module just for the data type seemed unncessary.

Sponsored-by: Dartmouth College's DANDI project
2021-08-18 13:19:02 -04:00
Joey Hess
8613770b06
incremental verify for webdav special remote
Sponsored-by: Dartmouth College's DANDI project
2021-08-16 17:29:32 -04:00
Joey Hess
b1622eb932
incremental verify for directory special remote
Added fileRetriever', which will let the remaining special remotes
eventually also support incremental verify.

Sponsored-by: Dartmouth College's DANDI project
2021-08-16 16:51:33 -04:00
Joey Hess
c20358b671
incremental verify for byteRetriever special remotes
Several special remotes verify content while it is being retrieved,
avoiding a separate checksum pass. They are: S3, bup, ddar, and
gcrypt (with a local repository).

Not done when using chunking, yet.

Complicated by Retriever needing to change to be polymorphic. Which in turn
meant RankNTypes is needed, and also needed some code changes. The
change in Remote.External does not change behavior at all but avoids
the type checking failing because of a "rigid, skolem type" which
"would escape its scope". So I refactored slightly to make the type
checker's job easier there.

Unfortunately, directory uses fileRetriever (except when chunked),
so it is not amoung the improved ones. Fixing that would need a way for
FileRetriever to return a Verification. But, since the file retrieved
may be encrypted or chunked, it would be extra work to always
incrementally checksum the file while retrieving it. Hm.

Some other special remotes use fileRetriever, and so don't get incremental
verification, but could be converted to byteRetriever later. One is
GitLFS, which uses downloadConduit, which writes to the file, so could
verify as it goes. Other special remotes like web could too, but don't
use Remote.Helper.Special and so will need to be addressed separately.

Sponsored-by: Dartmouth College's DANDI project
2021-08-11 14:20:38 -04:00
Joey Hess
f8836306fa
remove "checking remotename" message
This fixes fsck of a remote that uses chunking displaying
(checking remotename) (checking remotename)" for every chunk.

Also, some remotes displayed the message, and others did not, with no
consistency. It was originally displayed only when accessing remotes
that were expensive or might involve a password prompt, I think, but
nothing in the API said when to do it so it became an inconsistent mess.

Originally I thought fsck should always display it. But it only displays
in fsck --from remote, so the user knows the remote is being accessed,
so there is no reason to tell them it's accessing it over and over.

It was also possible for git-annex move to sometimes display it twice,
due to checking if content is present twice. But, the user of move
specifies --from/--to, so it does not need to display when it's
accessing the remote, as the user expects it to access the remote.

git-annex get might display it, but only if the remote also supports
hasKeyCheap, which is really only local git remotes, which didn't
display it always; and in any case nothing displayed it before hasKeyCheap,
which is checked first, so I don't think this needs to display it ever.

mirror is like move. And that's all the main places it would have been
displayed.

This commit was sponsored by Jochen Bartl on Patreon.
2021-04-27 13:05:27 -04:00
Joey Hess
13c090b37a
use fastDebug everywhere it can be used
None of these are likely to yeild a noticable speedup though.
2021-04-06 15:41:24 -04:00
Joey Hess
aaba83795b
switch from hslogger to purpose-built Utility.Debug
This uses a DebugSelector, rather than debug levels, which will allow
for a later option like --debug-from=Process to only
see debuging about running processes.

The module name that contains the thing being debugged is used as the
DebugSelector (in most cases; does not need to be a hard and fast rule).
Debug calls were changed to add that. hslogger did not display
that first parameter to debugM, but the DebugSelector does get
displayed.

Also fastDebug will allow doing debugging in places that are used in
tight loops, with the DebugSelector coming from the Annex Reader
essentially for free. Not done yet.
2021-04-05 13:40:31 -04:00
Joey Hess
5d75cbcdcf
webdav: deal with buggy webdav servers in renameExport
box.com already had a special case, since its renaming was known buggy.
In its case, renaming to the temp file succeeds, but then renaming the temp
file to final destination fails.

Then this 4shared server has buggy handling of renames across directories.
While already worked around with for the temp files when storing exports
now being in the same directory as the final filename, that also affected
renameExport when the file moves between directories.

I'm not entirely clear what happens on the 4shared server when it fails
this way. It kind of looks like it may rename the file to destination and
then still fail.

To handle both, when rename fails, delete both the source and the
destination, and fall back to uploading the content again. In the box.com
case, the temp file is the source, and deleting it makes sure the temp file
gets cleaned up. In the 4shared case, the file may have been renamed to the
destination and so cleaning that up avoids any interference with the
re-upload to the destination.
2021-03-22 13:08:18 -04:00
Joey Hess
0e44c252c8
avoid getting creds from environment during autoenable
When autoenabling special remotes of type S3, weddav, or glacier, do not
take login credentials from environment variables, as the user may not be
expecting the autoenable to happen, and may have those set for other
purposes.
2021-03-17 09:41:12 -04:00
Joey Hess
3e41a8f032
move
move use of </> into DavLocation so it always uses unix filepaths due to
imports
2021-03-12 15:19:11 -04:00
Joey Hess
4f49c29d20
webdav: store temp file in same collection as the final export location
This may work better in some webdav server that gets confused at
cross-collection renamed. I don't know, let's find out.

The only real downside of doing this is that the temp files are not all
in the top-level collection, in case an interrupted run leaves one
behind. But that does not seem especially significant.
2021-03-12 14:52:24 -04:00
Joey Hess
36133f27c0
move untrust forcing from Logs.Trust into Remote
No behavior changes here, but this is groundwork for letting remotes
such as borg vary untrust forcing depending on configuration.
2020-12-28 15:22:10 -04:00
Joey Hess
46059ab0e5
split off versionedExport from appendonly
S3 uses versionedExport, while GitLFS uses appendonly.

This is groundwork for later changes.
2020-12-28 14:37:15 -04:00
Joey Hess
9a2c8757f3
add thirdPartyPopulated interface
This is to support, eg a borg repo as a special remote, which is
populated not by running git-annex commands, but by using borg. Then
git-annex sync lists the content of the remote, learns which files are
annex objects, and treats those as present in the remote.

So, most of the import machinery is reused, to a new purpose. While
normally importtree maintains a remote tracking branch, this does not,
because the files stored in the remote are annex object files, not
user-visible filenames. But, internally, a git tree is still generated,
of the files on the remote that are annex objects. This tree is used
by retrieveExportWithContentIdentifier, etc. As with other import/export
remotes, that  the tree is recorded in the export log, and gets grafted
into the git-annex branch.

importKey changed to be able to return Nothing, to indicate when an
ImportLocation is not an annex object and so should be skipped from
being included in the tree.

It did not seem to make sense to have git-annex import do this, since
from the user's perspective, it's not like other imports. So only
git-annex sync does it.

Note that, git-annex sync does not yet download objects from such
remotes that are preferred content. importKeys is run with
content downloading disabled, to avoid getting the content of all
objects. Perhaps what's needed is for seekSyncContent to be run with these
remotes, but I don't know if it will just work (in particular, it needs
to avoid trying to transfer objects to them), so I skipped that for now.

(Untested and unused as of yet.)

This commit was sponsored by Jochen Bartl on Patreon.
2020-12-18 15:23:58 -04:00
Joey Hess
63839532c9
remove uses of warningIO
It's not concurrent-output safe, and doesn't support
--json-error-messages.

Using Annex.makeRunner is a bit scary, because what if it's run in a
different thread from an active annex action? Normally the same Annex
state is not used concurrently in several threads, and it's not designed
to be fully concurrency safe. (Annex.Concurrent exists to deal with
that.) I think it will be ok in these simple cases though. Eg,
when buffering a warning message to json, Annex.changeState is used,
and it modifies the MVar in a concurrency safe way.

The only warningIO remaining is not a problem.
2020-12-02 14:57:43 -04:00
Joey Hess
e63dcbf36c
fix embedcreds=yes reversion
Fix bug that made enableremote of S3 and webdav remotes, that have
embedcreds=yes, fail to set up the embedded creds, so accessing the remotes
failed.

(Regression introduced in version 7.20200202.7 in when reworking all the
remote configs to be parsed.)

Root problem is that parseEncryptionConfig excludes all other config keys
except encryption ones, so it is then unable to find the
credPairRemoteField. And since that field is not required to be
present, it proceeds as if it's not, rather than failing in any visible
way.

This causes it to not find any creds, and so it does not cache
them. When when the S3 remote tries to make a S3 connection, it finds no
creds, so assumes it's being used in no-creds mode, and tries to find a
public url. With no public url available, it fails, but the failure doesn't
say a lack of creds is the problem.

Fix is to provide setRemoteCredPair with a ParsedRemoteConfig, so the full
set of configs of the remote can be parsed. A bit annoying to need to
parse the remote config before the full config (as returned by
setRemoteCredPair) is available, but this avoids the problem.

I assume webdav also had the problem by inspection, but didn't try to
reproduce it with it.

Also, getRemoteCredPair used getRemoteConfigValue to get a ProposedAccepted
String, but that does not seem right. Now that it runs that code, it
crashed saying it had just a String.

Remotes that have already been enableremoted, and so lack the cached creds
file will work after this fix, because getRemoteCredPair will extract
the creds from the remote config, writing the missing file.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-21 14:35:30 -04:00
Joey Hess
6361074174
convert renameExport to throw exception
Finishes the transition to make remote methods throw exceptions, rather
than silently hide them.

A bit on the fence about this one, because when renameExport fails,
it falls back to deleting instead, and so does the user care why it failed?

However, it did let me clean up several places in the code.

This commit was sponsored by Ethan Aubin.
2020-05-15 15:08:09 -04:00
Joey Hess
037440ef36
convert removeExportDirectory to throw exception
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 14:43:18 -04:00
Joey Hess
cdbfaae706
change removeExport to throw exception
Part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

This commit was sponsored by Graham Spencer on Patreon.
2020-05-15 14:15:14 -04:00
Joey Hess
3334d3831b
change retrieveExport and getKey to throw exception
retrieveExport is part of ongoing transition to make remote methods
throw exceptions, rather than silently hide them.

getKey very rarely fails, and when it does it's always for the same reason
(user configured annex.backend to url for some reason). So, this will
avoid dealing with Nothing everywhere it's used.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-05-15 13:45:53 -04:00
Joey Hess
4814b444dd
make storeExport throw exceptions 2020-05-15 12:20:02 -04:00
Joey Hess
4be94c67c7
make removeKey throw exceptions 2020-05-14 14:11:05 -04:00
Joey Hess
d9c7f81ba4
make retrieveKeyFile and retrieveKeyFileCheap throw exceptions
Converted retrieveKeyFileCheap to a Maybe, to avoid needing to throw a
exception when a remote doesn't support it.
2020-05-13 17:07:07 -04:00
Joey Hess
c1cd402081
make storeKey throw exceptions
When storing content on remote fails, always display a reason why.

Since the Storer used by special remotes already did, this mostly affects
git remotes, but not entirely. For example, if git-lfs failed to connect to
the endpoint, it used to silently return False.
2020-05-13 14:03:00 -04:00
Joey Hess
b50ee9cd0c
remove Preparer abstraction
That had almost no benefit at all, and complicated things quite a lot.

What I proably wanted this to be was something like ResourceT, but it
was not. The few remotes that actually need some preparation done only
once and reused used a MVar and not Preparer.
2020-05-13 11:56:21 -04:00
Joey Hess
4b92bbe8d7
webdav: Made exporttree remotes faster by caching connection to the server
Followed example of Remote.S3.
2020-03-20 12:48:43 -04:00
Joey Hess
8af6d2c3c5
fix encryption of content to gcrypt and git-lfs
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.

Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)

To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.

And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.

(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
2020-02-26 18:05:36 -04:00
Joey Hess
7038acf96c
add descriptions for all remote config fields
not yet used
2020-01-20 15:20:04 -04:00
Joey Hess
99cb3e75f1
add LISTCONFIGS to external special remote protocol
Special remote programs that use GETCONFIG/SETCONFIG are recommended
to implement it.

The description is not yet used, but will be useful later when adding a way
to make initremote list all accepted configs.

configParser now takes a RemoteConfig parameter. Normally, that's not
needed, because configParser returns a parter, it does not parse it
itself. But, it's needed to look at externaltype and work out what
external remote program to run for LISTCONFIGS.

Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported
used to use NoUUID. The code that now checks for Nothing used to behave
in some undefined way if the external program made requests that
triggered it.

Also, note that in externalSetup, once it generates external,
it parses the RemoteConfig strictly. That generates a
ParsedRemoteConfig, which is thrown away. The reason it's ok to throw
that away, is that, if the strict parse succeeded, the result must be
the same as the earlier, lenient parse.

initremote of an external special remote now runs the program three
times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again
LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least
one of those, and it should be possible to only run the program once.
2020-01-17 16:07:17 -04:00
Joey Hess
7f2bfd41d7
include credPairRemoteFields in RemoteConfigParsers
Avoids parse error when the fields are added to RemoteConfig at setup
time and it then gets parsed, also at setup time. After setup time, such
internally added fields are not a problem, because they're Accepted. So
it may not be necessary in all cases to list such internally added
fields, but I think it's a good idea to always do so.
2020-01-15 10:57:45 -04:00
Joey Hess
c4ea3ca40a
ported almost all remotes, until my brain melted
external is not started yet, and S3 is part way through and not
compiling yet
2020-01-14 15:41:34 -04:00
Joey Hess
71ecfbfccf
be stricter about rejecting invalid configurations for remotes
This is a first step toward that goal, using the ProposedAccepted type
in RemoteConfig lets initremote/enableremote reject bad parameters that
were passed in a remote's configuration, while avoiding enableremote
rejecting bad parameters that have already been stored in remote.log

This does not eliminate every place where a remote config is parsed and a
default value is used if the parse false. But, I did fix several
things that expected foo=yes/no and so confusingly accepted foo=true but
treated it like foo=no. There are still some fields that are parsed with
yesNo but not not checked when initializing a remote, and there are other
fields that are parsed in other ways and not checked when initializing a
remote.

This also lays groundwork for rejecting unknown/typoed config keys.
2020-01-10 14:52:48 -04:00
Joey Hess
650a631ef8
include all remotes back in 2019-12-02 12:26:33 -04:00
Joey Hess
9828f45d85
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:

* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
  could in theory generate the same content identifier for two different
  peices of content

While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.

External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.

Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 13:51:42 -04:00
Joey Hess
59908586f4
rename RemoteConfigKey to RemoteConfigField
And some associated renames.
I was going to have some values named fooKeyKey otherwise..
2019-10-10 15:44:05 -04:00
Joey Hess
9a5ddda511
remove many old version ifdefs
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.

The only remaining version ifdefs in the entire code base are now a couple
for aws!

This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of  git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.

This commit was sponsored by Ilya Shlyakhter.
2019-07-05 15:09:37 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
2912429640
better indicate when special remotes do not support renameExport
Avoid a warning message when renameExport is not supported, and just
fallback to deleting with a subsequent re-upload. Especially needed for
importtree remotes, where renameExport needs to be disabled.

This changes the external special remote protocol, but in a
backwards-compatible way. A reply of UNSUPPORTED-REQUEST to an older
version of git-annex will cause it to make renameExport return False.
2019-03-11 12:53:24 -04:00
Joey Hess
ccc0684d21
no remotes support import yet 2019-02-20 16:59:04 -04:00
Joey Hess
60c1b5c994
deal with attempt to export filename with # or ? to webdav
xporting files with '#' or '?' in their name won't work because urls get
truncated on those. Fail in a better way in this case, and avoid failing
when removing such files from the export, so after the user has renamed the
problem files the export will succeed.
2019-02-07 13:47:57 -04:00
Joey Hess
9cebfd7002
purify exportActions
Purifying exportActions will allow introspecting and modifying it,
which is needed to add progress bar display to it.

Only S3 and WebDAV ran an Annex action while constructing ExportActions.
There was a small performance gain from them doing that, since a
resource was able to be prepared and reused for multiple actions by
Command.Export.

As seen in commit 809cfbbd8a and
5d394023eb S3 and WebDAV actually create a
new handle for each access in normal, non-export use. It doesn't seem
worth making export use of them marginally more efficient than normal
use. It would be better to do that work upfront when constructing the
remote. Or perhaps use a MVar to cache a handle.

This commit was sponsored by Nick Piper on Patreon.
2019-01-30 15:11:40 -04:00
Joey Hess
5d394023eb
remove incorrect comment
resourcePrepare does not cause the resource to only be prepared once.

The http manager should be reused, which does avoid http connection
overhead, but not because of the use of resourcePrepare.
2019-01-30 14:38:35 -04:00
Joey Hess
3f587d447a
fix webdav reversion
webdav: When initializing, avoid trying to make a directory at the top of
the webdav server, which could never accomplish anything and failed on
nextcloud servers. (Reversion introduced in version 6.20170925.)

This commit was sponsored by mo on patreon.
2018-12-10 12:49:51 -04:00