Commit graph

1371 commits

Author SHA1 Message Date
Joey Hess
5f5170b22b
remove SafeFilePath
Move sanitizeFilePath call to where fromSafeFilePath had been.
2020-05-11 14:04:56 -04:00
Joey Hess
69e2e4763e
only check --force at init time, not enable time
git-lfs repos that encrypt the annexed content but not the git repo only
need --force passed to initremote, allow enableremote and autoenable of
such remotes without forcing again.

Needing --force again particularly made autoenable of such a repo not work.
And once such a repo has been set up, it seems a second --force when
enabling it elsewhere has little added value. It does tell the user about
the possibly insecure configuration, but if the git repo has already been
pushed to that remote in the clear, data has already been exposed. The goal
of that --force was not to prevent every situation where such an exposure
can happen -- anyone who sets up a public git repo and pushes to it will
expose things similarly and git-annex is not involved. Instead, the purpose
of the --force is to point out to the user that they're asking for a
configuration where encryption is inconsistently applied.
2020-05-07 15:59:29 -04:00
Joey Hess
1532d67c3e
S3: Support signature=v4
To use S3 Signature Version 4. Some S3 services seem to require v4, while
others may only support v2, which remains the default.

I'm also not sure if v4 works correctly in all cases, there is this
upstream bug report: https://github.com/aristidb/aws/issues/262
I've only tested it against the default S3 endpoint.
2020-05-07 13:18:11 -04:00
Joey Hess
f9ed30de3b
avoid beware of the leopard situation
* Display a warning message when a remote uses a protocol, such as
  git://, that git-annex does not support. Silently skipping such a
  remote was confusing behavior.

  It sets annex-ignore, so the warning is only displayed once.

* Also display a warning message when a remote, without a known uuid,
  is located in a directory that does not currently exist, to avoid
  silently skipping such a remote.

  This is a bit more debatable, since git-annex get will say,
  try making repository available. And since it does not set annex-ignore,
  the warning will be displayed repeatedly. It's also an extreme edge case,
  I don't think I've ever seen it happen in real life.
2020-05-04 13:01:11 -04:00
Joey Hess
2aeb79249b
external: stop storing readonly=true in remote.log
readonly=true is used to make an external special remote that does not
need the external program to be installed. It was stored in the
remote.log by default, and so every time it was specified in an
enableremote or initremote, whatever value was used became the new
default for subsequent enableremotes of that remote.

That was surprising, and I consider it to be a bug.

It does not make much sense to pass it to initremote because then how
would you populate that remote with anything? You would have to
enableremote elsewhere, and store content there. I'm assuming nobody
used it that way.

Someone might rely on passing it to enableremote once, and then that
being inherited in other clones. But that is not how it's documented to
be used. It is barely documented in git-annex at all, only in the
external special remote protocol, and the documentation there says to
"Document that this external special remote can be used in readonly
mode." (by the user of it passing readonly=true to enableremote). The
one external special remote that I know of that does document that is
<https://github.com/bgilbert/gcsannex> (the one that motivated adding
it). That one's docs do say to pass it to enableremote.

So, it seemed safe to make this behavior change. If someone was in fact
relying on one of those behaviors, all their current repos will still
work as they configured them (although they will need to deal
with the related change in 9f3c2dfeda).
In new clones, they will find enableremote fails, complaining the
external program is not in path. An easy enough problem to recover from.
2020-04-23 15:21:26 -04:00
Joey Hess
9f3c2dfeda
stop using remote.name.annex-readonly for two distinct things 2020-04-23 14:56:03 -04:00
Joey Hess
cd1676d604
fix bug involving local git remote and out of date location log
get --from, move --from: When used with a local git remote, these used to
silently skip files that the location log thought were present on the
remote, when the remote actually no longer contained them. Since that
behavior could be surprising, now instead display a warning.

I got very confused when I encountered this behavior, since it was silently
skipping a file I needed that whereis said was on the remote.

get without --from already displayed a "unable to access these remotes"
message, which while a bit misleading in that the remote is likely
accessible, but just doesn't contain the file, at least indicated something
went wrong.

Having get --from display a warning makes it in line with get
w/o --from, so seems certianly ok. It might be there are situations where
move --from is used, on eg a whole directory, and the user only wants to
move whatever is present in the remote, and is perfectly ok with files
that are not present being skipped. So I'm less sure about the new warning
being ok there. OTOH, only local git remotes avoiding displaying a warning
in that case too, so this just brings them into line with other remotes.

(Also note that this makes it a little bit faster when dealing with a lot of
files, since it avoids a redundant stat of the file.)
2020-04-21 12:36:58 -04:00
Joey Hess
529f488ec4
fix a thundering herd problem
Avoid repeatedly opening keys db when accessing a local git remote and -J
is used.

What was happening was that Remote.Git.onLocal created a new annex state
as each thread started up. The way the MVar was used did not prevent that.
And that, in turn, led to repeated opening of the keys db, as well as
probably other extra work or resource use.

Also managed to get rid of Annex.remoteannexstate, and it turned out there
was an unncessary Maybe in the keysdbhandle, since the handle starts out
closed.
2020-04-17 17:09:29 -04:00
Joey Hess
f85ca7dc80
fix all remaining -Wincomplete-uni-patterns warnings
A couple of these were probably actual bugs in edge cases. Most of the
changes I'm fine with. The fact that aeson's object returns sometihng
that we know will be an Object, but the type checker does not know is
kind of annoying.
2020-04-15 13:55:08 -04:00
Joey Hess
ca9c6c5f60
Fix a potential failure to parse git config
Git has an obnoxious special case in git config, a line "foo" is the same
as "foo = true". That means there is no way to examine the output of
git config and tell if it was run with --null or not, since a "foo"
in the first line could be such a boolean, or could be followed by its
value on the next line if --null were used.

So, rather than trying to do such a detection, track the style of config
at all the points where it's generated.
2020-04-13 13:05:41 -04:00
Joey Hess
7ebc118776
adb: Better messages when the adb command is not installed
After a user completely ignored the display of the exception probably
because it didn't make sense..

This does make it a little bit slower since it checks adb is in path each
time before running it. Also, it might display a lot of warnings about it
not being installed.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2020-04-02 10:48:28 -04:00
Joey Hess
4b92bbe8d7
webdav: Made exporttree remotes faster by caching connection to the server
Followed example of Remote.S3.
2020-03-20 12:48:43 -04:00
Joey Hess
a9d56a1abd
fix builds build 2020-03-10 13:50:46 -04:00
Joey Hess
6a91471923
GETCONFIG name fix
Fix regression that prevented external special remotes from using GETCONFIG
to query values like "name". (Introduced in version 7.20200202.7.)
2020-03-09 12:38:04 -04:00
Joey Hess
7f992ef59c
mostly finished with createDirectoryUnder conversion
Remaining things needing converted are in the assistant, and Annex.Ssh.

Every other remaining call to createDirectoryIfMissing True has been
audited and is not relevant. The ones in Build/ of course don't get
included in the program. Others included eg, Remote.Tahoe and
Config.Files which both write to dotfiles under the home directory.
2020-03-06 11:57:15 -04:00
Joey Hess
6d58ca94d6
some easy createDirectoryUnder conversions 2020-03-05 15:20:10 -04:00
Joey Hess
ccd8c43dc8
git-annex config: guard against non-repo-global configs
git-annex config: Only allow configs be set that are ones git-annex
actually supports reading from repo-global config, to avoid confused users
trying to set other configs with this.
2020-03-02 15:54:18 -04:00
Joey Hess
2366e7fb84
catch whereisKey exception and provide error messages when external programs neglect to
* whereis: If a remote fails to report on urls where a key
  is located, display a warning, rather than giving up and not displaying
  any information.
* When external special remotes fail but neglect to provide an error
  message, say what request failed, which is better than displaying an
  empty error message to the user.
2020-02-27 14:09:18 -04:00
Joey Hess
81e3faf810
Merge branch 'v7' 2020-02-26 18:15:18 -04:00
Joey Hess
e535da621c
Bugfix to getting content from an export remote with -J, when the export database was not yet populated.
(cherry picked from commit e520341500)
2020-02-26 18:07:20 -04:00
Joey Hess
8af6d2c3c5
fix encryption of content to gcrypt and git-lfs
Fix serious regression in gcrypt and encrypted git-lfs remotes.
Since version 7.20200202.7, git-annex incorrectly stored content
on those remotes without encrypting it.

Problem was, Remote.Git enumerates all git remotes, including git-lfs
and gcrypt. It then dispatches to those. So, Remote.List used the
RemoteConfigParser from Remote.Git, instead of from git-lfs or gcrypt,
and that parser does not know about encryption fields, so did not
include them in the ParsedRemoteConfig. (Also didn't include other
fields specific to those remotes, perhaps chunking etc also didn't
get through.)

To fix, had to move RemoteConfig parsing down into the generate methods
of each remote, rather than doing it in Remote.List.

And a consequence of that was that ParsedRemoteConfig had to change to
include the RemoteConfig that got parsed, so that testremote can
generate a new remote based on an existing remote.

(I would have rather fixed this just inside Remote.Git, but that was not
practical, at least not w/o re-doing work that Remote.List already did.
Big ugly mostly mechanical patch seemed preferable to making git-annex
slower.)
2020-02-26 18:05:36 -04:00
Joey Hess
9050788b66
info: Fix display of the encryption value. (Some debugging junk had crept in.) 2020-02-26 15:02:23 -04:00
Joey Hess
e520341500
Bugfix to getting content from an export remote with -J, when the export database was not yet populated. 2020-02-26 14:57:29 -04:00
Joey Hess
67476fbc54
minor code simplification 2020-02-25 13:06:09 -04:00
Joey Hess
79a0435b77
automate remote.name.skipFetchAll
initremote, enableremote: Set remote.name.skipFetchAll when the remote
cannot be fetched from by git, so git fetch --all will not try to use it.
2020-02-19 13:58:26 -04:00
Joey Hess
69f2d1dd43
remoteConfig rework
remoteAnnexConfig will avoid bugs like
a3a674d15b

Use now more generic remoteConfig in a couple places that built
non-annex config settings manually before.
2020-02-19 13:45:11 -04:00
Joey Hess
399319ccbc
Avoid throwing fatal errors when asked to write to a readonly git remote on http
Test suite found one of them, looking for giveup turned up several more.
2020-02-14 14:38:13 -04:00
Joey Hess
1883f7ef8f
support git remotes that need http basic auth
using git credential to get the password

One thing this doesn't do is wrap the password prompting inside the prompt
action. So with -J, the output can be a bit garbled.
2020-01-22 16:16:19 -04:00
Joey Hess
75059c9f3b
better error message when git config fails to parse remote config
Rather than leaking the name of the temp file, just say the config parse
failed, and where the config was downloaded from.

Not closing the bug report because two issues were reported in the same
bug report, because the universe wants me to continually re-read old
unclosed bug reports to waste my time determining what still needs to be
done.
2020-01-22 13:35:54 -04:00
Joey Hess
d227093002
avoid ugly error message
Http remotes that do expose a git config file, but are not initialized
resulted in an ugly and unncessary error message, now sqelched.

When git-annex-shell configlist is run w/o the autoinit field, it may
not generate a uuid for the repository. So in that case, it's not
unexpected for the config it does list to not include a UUID, and
dumping out the config in a warning message is not needed.

If configlist is asked to autoinit and we don't get back a config with a
UUID in it, that suggests some problem, and what we got back may not be
a config at all but some diagnostic message, so it does make sense to
output it then.
2020-01-22 11:57:20 -04:00
Joey Hess
830c30001b
fix --describe-other-params of external when encryption is not specified
Encryption not being specified makes lenientRemoteConfigParser fail
to parse, and so it was not able to start the external up to get
LISTCONFIGS.
2020-01-20 16:56:34 -04:00
Joey Hess
2be4122bfc
include passthrough params in --describe-other-params 2020-01-20 16:53:27 -04:00
Joey Hess
7038acf96c
add descriptions for all remote config fields
not yet used
2020-01-20 15:20:04 -04:00
Joey Hess
201049cf93
gcrypt inherits shellescape setting from rsync, allow it 2020-01-20 15:13:49 -04:00
Joey Hess
923230ea30
convert RemoteConfigFieldParser to data type 2020-01-20 13:49:30 -04:00
Joey Hess
8406ff8861
speed hack
Avoids the external program being started just to use LISTCONFIGS on an
already accepted config.

So initremote/enableremote will still run the external program an extra
time to use LISTCONFIGS, but everything that uses the special remote after
it's initialized will not any longer.
2020-01-17 17:26:36 -04:00
Joey Hess
5c58f86790
always add the specialRemoteConfigParsers
Was not being added in some places, resulting in error messages about
encryption not being a valid field.
2020-01-17 17:13:44 -04:00
Joey Hess
99cb3e75f1
add LISTCONFIGS to external special remote protocol
Special remote programs that use GETCONFIG/SETCONFIG are recommended
to implement it.

The description is not yet used, but will be useful later when adding a way
to make initremote list all accepted configs.

configParser now takes a RemoteConfig parameter. Normally, that's not
needed, because configParser returns a parter, it does not parse it
itself. But, it's needed to look at externaltype and work out what
external remote program to run for LISTCONFIGS.

Note that, while externalUUID is changed to a Maybe UUID, checkExportSupported
used to use NoUUID. The code that now checks for Nothing used to behave
in some undefined way if the external program made requests that
triggered it.

Also, note that in externalSetup, once it generates external,
it parses the RemoteConfig strictly. That generates a
ParsedRemoteConfig, which is thrown away. The reason it's ok to throw
that away, is that, if the strict parse succeeded, the result must be
the same as the earlier, lenient parse.

initremote of an external special remote now runs the program three
times. First for LISTCONFIGS, then EXPORTSUPPORTED, and again
LISTCONFIGS+INITREMOTE. It would not be hard to eliminate at least
one of those, and it should be possible to only run the program once.
2020-01-17 16:07:17 -04:00
Joey Hess
1ce722d86f
avoid relying on crazy monoid instance
This code worked as intended, but only by accident, because of this
instance:

instance Monoid b => Monoid (x -> b) where mempty = const (mempty :: b)

Let's be explicit that we throw away the error message.
2020-01-17 13:49:12 -04:00
Joey Hess
e78bf29725
avoid getting config parser when there is no config to parse
The benefit here is that external special remotes will need a
LISTCONFIGS request and response to generate their config parser,
and this avoids it being done for all the ones that don't have any
configs.

Note that, a config parser could in theory fail to parse if there are no
configs (none currently do), but a parse failure is already thrown away
when generating the remote list because it's too late. Such problems
have to be caught at initremote/enableremote time, not here.
2020-01-17 13:32:48 -04:00
Joey Hess
465ec9dcd7
ported Remote.External
Not yet added anything to the protocol to get a list of remote config
fields; any fields will be accepted and are available for the external
remote to use as before.

There is one minor behavior change.. Before, GETCONFIG could be passed a
field such as type, externaltype, encryption, etc, and would get the
value of that. Now, GETCONFIG only works on fields that don't have a
defined meaning to git-annex, so are passed through to the external
remote. This seems unlikely to affect any external special remotes in
practice.
2020-01-15 13:01:22 -04:00
Joey Hess
6a982e38eb
a few more field functions 2020-01-15 12:57:56 -04:00
Joey Hess
907ca937ab
use more field functions
Using field functions consistently avoids possibility of typos and also
helps ensure that all fields are added to RemoteConfigParsers (as long
as I have remembered to add them when writing the functions).
2020-01-15 11:15:07 -04:00
Joey Hess
7f2bfd41d7
include credPairRemoteFields in RemoteConfigParsers
Avoids parse error when the fields are added to RemoteConfig at setup
time and it then gets parsed, also at setup time. After setup time, such
internally added fields are not a problem, because they're Accepted. So
it may not be necessary in all cases to list such internally added
fields, but I think it's a good idea to always do so.
2020-01-15 10:57:45 -04:00
Joey Hess
0706d9d093
finish porting S3 2020-01-15 10:52:28 -04:00
Joey Hess
c4ea3ca40a
ported almost all remotes, until my brain melted
external is not started yet, and S3 is part way through and not
compiling yet
2020-01-14 15:41:34 -04:00
Joey Hess
c498269a88
convert configParser to Annex action and add passthrough option
Needed so Remote.External can query the external program for its
configs. When the external program does not support the query,
the passthrough option will make all input fields be available.
2020-01-14 13:52:03 -04:00
Joey Hess
8f142a9279
fix wrong type
Use of Typeable means the type checker can't catch this kind of mistake,
the error is deferred to runtime.

testremote now passes on a directory special remote
2020-01-14 13:05:38 -04:00
Joey Hess
963239da5c
separate RemoteConfig parsing basically working
Many special remotes are not updated yet and are commented out.
2020-01-14 12:35:08 -04:00
Joey Hess
71f78fe45d
wip separate RemoteConfig parsing
Remote now contains a ParsedRemoteConfig. The parsing happens when the
Remote is constructed, rather than when individual configs are used.

This is more efficient, and it lets initremote/enableremote
reject configs that have unknown fields or unparsable values.

It also allows for improved type safety, as shown in
Remote.Helper.Encryptable where things that used to match on string
configs now match on data types.

This is a work in progress, it does not build yet.

The main risk in this conversion is forgetting to add a field to
RemoteConfigParser. That will prevent using that field with
initremote/enableremote, and will prevent remotes that already are set
up from seeing that configuration. So will need to check carefully that
every field that getRemoteConfigValue is called on has been added to
RemoteConfigParser.

(One such case I need to remember is that credPairRemoteField needs to be
included in the RemoteConfigParser.)
2020-01-13 12:39:21 -04:00
Joey Hess
71ecfbfccf
be stricter about rejecting invalid configurations for remotes
This is a first step toward that goal, using the ProposedAccepted type
in RemoteConfig lets initremote/enableremote reject bad parameters that
were passed in a remote's configuration, while avoiding enableremote
rejecting bad parameters that have already been stored in remote.log

This does not eliminate every place where a remote config is parsed and a
default value is used if the parse false. But, I did fix several
things that expected foo=yes/no and so confusingly accepted foo=true but
treated it like foo=no. There are still some fields that are parsed with
yesNo but not not checked when initializing a remote, and there are other
fields that are parsed in other ways and not checked when initializing a
remote.

This also lays groundwork for rejecting unknown/typoed config keys.
2020-01-10 14:52:48 -04:00
Joey Hess
2000e9a4b8
avoid build warning on windows 2020-01-01 14:40:35 -04:00
Joey Hess
fb04cfd0e6
fix windows build 2020-01-01 14:27:03 -04:00
Joey Hess
37467a008f
annex.addunlocked expressions
* annex.addunlocked can be set to an expression with the same format used by
  annex.largefiles, in case you want to default to unlocking some files but
  not others.
* annex.addunlocked can be configured by git-annex config.

Added a git-annex-matching-expression man page, broken out from
tips/largefiles.

A tricky consequence of this is that git-annex add --relaxed
honors annex.addunlocked, but an expression might want to know the size
or content of an url, which it's not going to download. I decided it was
better not to fail, and just dummy up some plausible data in that case.

Performance impact should be negligible. The global config is already
loaded for annex.largefiles. The expression only has to be parsed once,
and in the simple true/false case, it should not do any additional work
matching it.
2019-12-20 15:56:25 -04:00
Joey Hess
4acbb40112
git-annex config annex.largefiles
annex.largefiles can be configured by git-annex config, to more easily set
a default that will also be used by clones, without needing to shoehorn the
expression into the gitattributes file. The git config and gitattributes
override that.

Whenever something is added to git-annex config, we have to consider what
happens if a user puts a purposfully bad value in there. Or, if a new
git-annex adds some new value that an old git-annex can't parse.
In this case, a global annex.largefiles that can't be parsed currently
makes an error be thrown. That might not be ideal, but the gitattribute
behaves the same, and is almost equally repo-global.

Performance notes:

git-annex add and addurl construct a matcher once
and uses it for every file, so the added time penalty for reading the global
config log is minor. If the gitattributes annex.largefiles were deprecated,
git-annex add would get around 2% faster (excluding hashing), because
looking that up for each file is not fast. So this new way of setting
it is progress toward speeding up add.

git-annex smudge does need to load the log every time. As well as checking
the git attribute. Not ideal. Setting annex.gitaddtoannex=false avoids
both overheads.
2019-12-20 13:01:41 -04:00
Joey Hess
686791c4ed
more RawFilePath
Remove dup definitions and just use the RawFilePath one. </> etc are
enough faster that it's probably faster than building a String directly,
although I have not benchmarked.
2019-12-18 17:10:28 -04:00
Joey Hess
c19211774f
use filepath-bytestring for annex object manipulations
git-annex find is now RawFilePath end to end, no string conversions.
So is git-annex get when it does not need to get anything.
So this is a major milestone on optimisation.

Benchmarks indicate around 30% speedup in both commands.

Probably many other performance improvements. All or nearly all places
where a file is statted use RawFilePath now.
2019-12-11 15:25:07 -04:00
Joey Hess
bdec7fed9c
convert TopFilePath to use RawFilePath
Adds a dependency on filepath-bytestring, an as yet unreleased fork of
filepath that operates on RawFilePath.

Git.Repo also changed to use RawFilePath for the path to the repo.

This does eliminate some RawFilePath -> FilePath -> RawFilePath
conversions. And filepath-bytestring's </> is probably faster.
But I don't expect a major performance improvement from this.
This is mostly groundwork for making Annex.Location use RawFilePath,
which will allow for a conversion-free pipleline.
2019-12-09 15:07:21 -04:00
Joey Hess
c20f4704a7
all commands building except for assistant
also, changed ConfigValue to a newtype, and moved it into Git.Config.
2019-12-05 14:41:18 -04:00
Joey Hess
650a631ef8
include all remotes back in 2019-12-02 12:26:33 -04:00
Joey Hess
f3047d7186
include git-annex-shell back in
Also pushed ConfigKey down into the Git modules, which is the bulk of
the changes.
2019-12-02 11:51:52 -04:00
Joey Hess
d7833def66
use ByteString for git config
The parser and looking up config keys in the map should both be faster
due to using ByteString.

I had hoped this would speed up startup time, but any improvement to
that was too small to measure. Seems worth keeping though.

Note that the parser breaks up the ByteString, but a config map ends up
pointing to the config as read, which is retained in memory until every
value from it is no longer used. This can change memory usage
patterns marginally, but won't affect git-annex.
2019-11-27 17:40:09 -04:00
Joey Hess
067aabdd48
wip RawFilePath 2x git-annex find speedup
Finally builds (oh the agoncy of making it build), but still very
unmergable, only Command.Find is included and lots of stuff is badly
hacked to make it compile.

Benchmarking vs master, this git-annex find is significantly faster!
Specifically:

	num files	old	new	speedup
	48500		4.77	3.73	28%
	12500		1.36	1.02	66%
	20		0.075	0.074	0% (so startup time is unchanged)

That's without really finishing the optimization. Things still to do:

* Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
  decodeBS conversions.
* Use versions of IO actions like getFileStatus that take a RawFilePath.
* Eliminate some Data.ByteString.Lazy.toStrict, which is a slow copy.
* Use ByteString for parsing git config to speed up startup.

It's likely several of those will speed up git-annex find further.
And other commands will certianly benefit even more.
2019-11-26 16:01:58 -04:00
Joey Hess
81d402216d cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
2019-11-22 17:49:16 -04:00
Joey Hess
b207d944f3
sync, assistant: Pull and push from git-lfs remotes.
Oversight, forgot to add it to gitSyncableRemote
2019-11-18 16:13:21 -04:00
Joey Hess
5877de5e80
git-lfs: remember urls, and autoenable remotes using known urls
* git-lfs: The url provided to initremote/enableremote will now be
  stored in the git-annex branch, allowing enableremote to be used without
  an url. initremote --sameas can be used to add additional urls.
* git-lfs: When there's a git remote with an url that's known to be
  used for git-lfs, automatically enable the special remote.
2019-11-18 16:09:09 -04:00
Joey Hess
cee14f147a
stop displaying rsync progress, and use git-annex's own progress display for local-to-local repo transfers
Reasons to do this include:

1. I've gotten pretty used to git-annex's own progress display, which is
   used for all transfers over ssh (except to old git-annex-shell),
   and for most special remote transfers. It's getting to seem weird to see
   the rsync progress display instead.
2. When -J was used, the rsync output could not be shown, and so there was
   no progress display. Now there will be.

Progress will also be displayed now when cp CoW is used. But I'd expect a CoW
copy to typically run so fast that the progress display will barely be
noticable.

This commit was sponsored by Peter on Patreon.
2019-11-15 13:21:06 -04:00
Joey Hess
890330f0fe
make --json-error-messages capture url download errors
Convert Utility.Url to return Either String so the error message can be
displated in the annex monad and so captured.

(When curl is used, its errors are still not caught.)
2019-11-12 13:52:38 -04:00
Joey Hess
9e8d40181f
remove some unncessary uses of warningIO
warningIO is not concurrent output safe, and it doesn't go to
--json-error-messages

There are a few more that would be too hard to remove, and there are also
several dozen direct prints to stderr still.
2019-11-12 10:07:27 -04:00
Joey Hess
9828f45d85
add RemoteStateHandle
This solves the problem of sameas remotes trampling over per-remote
state. Used for:

* per-remote state, of course
* per-remote metadata, also of course
* per-remote content identifiers, because two remote implementations
  could in theory generate the same content identifier for two different
  peices of content

While chunk logs are per-remote data, they don't use this, because the
number and size of chunks stored is a common property across sameas
remotes.

External special remote had a complication, where it was theoretically
possible for a remote to send SETSTATE or GETSTATE during INITREMOTE or
EXPORTSUPPORTED. Since the uuid of the remote is typically generate in
Remote.setup, it would only be possible to pass a Maybe
RemoteStateHandle into it, and it would otherwise have to construct its
own. Rather than go that route, I decided to send an ERROR in this case.
It seems unlikely that any existing external special remote will be
affected. They would have to make up a git-annex key, and set state for
some reason during INITREMOTE. I can imagine such a hack, but it doesn't
seem worth complicating the code in such an ugly way to support it.

Unfortunately, both TestRemote and Annex.Import needed the Remote
to have a new field added that holds its RemoteStateHandle.
2019-10-14 13:51:42 -04:00
Joey Hess
35d7ffe128
initremote --sameas fully working
And using sameas remotes is working.

Moved annex-config-uuid setting out of Remote.Helper.Special.
EnableRemote will also have to set it.
2019-10-11 14:19:10 -04:00
Joey Hess
2bd6e81bb0
support annex-config-uuid when generating remote
This is used by a special remote with sameas-uuid=
The remote's uuid is the sameas-uuid, but it needs to get
its RemoteConfig from the annex-config-uuid.
2019-10-11 12:34:11 -04:00
Joey Hess
df5b0ffab3
inherit other fields
I think this is all that need to be inherited.
2019-10-10 16:11:21 -04:00
Joey Hess
c3975ff3b4
sameas RemoteConfig inheritance
I found a way to avoid inheritance complicating anything outside of
Logs.Remote. It seems fine to require all inherited values to be
inherited and not set in the sameas remote's config. Since inherited
values will be used for stuff like encryption and perhaps chunking, which
control the actual content stored on the remote, it seems likely that
there will not be any reason to need them to vary between two remotes
that access the same underlying data store.

The newer version of containers is free; the minimum ghc version is
bundled with a newer version than that.
2019-10-10 15:58:22 -04:00
Joey Hess
59908586f4
rename RemoteConfigKey to RemoteConfigField
And some associated renames.
I was going to have some values named fooKeyKey otherwise..
2019-10-10 15:44:05 -04:00
Joey Hess
d1130ea04a
get rid of hardcoded "name" lookups
Support "sameas-name" being set instead.

In RenameRemote, rename which ever of the two is set.
2019-10-10 13:25:10 -04:00
Joey Hess
92ff30df70
set annex-config-uuid when RemoteConfig contains a sameas-uuid
Initremote sets that, so after both initremote and enableremote,
the git config will be set.

Any remote that does not use Annex.SpecialRemote won't set
annex-config-uuid. But that's only Remote.Git, which doesn't use
RemoteConfig anyway.
2019-10-10 12:58:59 -04:00
Joey Hess
46071a2435
use storeUUIDIn 2019-10-10 12:38:17 -04:00
Joey Hess
06c04ffe29
use storeUUIDIn 2019-10-10 12:12:09 -04:00
Joey Hess
a6c3d1cb6d
avoid unneccesary extra blank line before git-credentials prompt 2019-09-24 18:06:10 -04:00
Joey Hess
bc1b9a2c0a
improved GitLFS api 2019-09-24 18:05:11 -04:00
Joey Hess
6ae0a44c64
git-lfs: Added support for http basic auth 2019-09-24 14:46:20 -04:00
Joey Hess
de564df8b3
git-lfs: Only do endpoint discovery once when concurrency is enabled
This avoids some extra work, but I don't think it was possible for two ssh
endpoint discoveries run concurrently to both prompt for the ssh password;
Annex.Ssh itself deals with concurrency.

This is mostly groundwork for http password prompting.
2019-09-24 13:01:51 -04:00
Joey Hess
53fd746705
avoid some build warnings on windows 2019-09-12 14:11:19 -04:00
Joey Hess
3f0eef4baa
v7 for all repositories
* Default to v7 for new repositories.
* Automatically upgrade v5 repositories to v7.
2019-08-30 14:09:14 -04:00
Joey Hess
e804f48f82
remove a few more isDirect tests 2019-08-28 11:53:10 -04:00
Joey Hess
16f646c9a6
don't hide message when ensureInitialized fails 2019-08-27 12:38:47 -04:00
Joey Hess
bb18bbd426
consolidate calls to ensureInitialized
tryGitConfigRead may run ensureInitialized first, but when checkuuid = false,
that is skipped. So, make sure it's run before all onLocal actions.

ensureInitialized is inexpensive, so the extra call by tryGitConfigRead
is not a big deal. But since it was easy to do, I made it only be run
once by all calls to onLocal.

A few calls to onLocal didn't call ensureInitialized before. Notably,
the checkPresent action didn't, and does now.

That means that there's a guarantee that any necessary repo upgrades
will be run before the checkPresent action runs in the repo. Which is
important especially for the direct mode conversion, because without
that upgrade, the checkPresent action would need to support direct mode
still. Now I can remove the last bits of direct mode support in
Annex.Content without worrying that it will break accessing remotes
that have not been upgraded.

This does necessarily mean that checkPresent needs to write to the disk
when performing such a repo upgrade. The other remote actions already
did, so retrieval from a readonly remote that needed to be upgraded would
fail. Having checkPresent also fail doesn't seem like a large reversion,
especially since it already failed in the default case when checkuuid = true.
2019-08-27 12:18:01 -04:00
Joey Hess
708fc6567f
S3: Fix encoding when generating public urls of S3 objects.
This code feels worryingly stringily typed, but using URI does not help
because the uriPath still has to be constructed with the right
uri-encoding.
2019-08-15 12:56:46 -04:00
Joey Hess
386c0ce90a
close handle so windows can stat the file
windows cannot stat a file that another process has open, which caused
this to crash with an exception
2019-08-13 13:26:25 -04:00
Joey Hess
cfd0b4108e
avoid windows build warning 2019-08-13 13:10:33 -04:00
Joey Hess
5004381dd9
improve error display when storing to an export/import remote fails
Prompted by the test suite on windows failing to with "export foo failed"
and no information about what went wrong.

Note that only storeExportWithContentIdentifier has been converted.
storeExport still returns a Bool and so exceptions may be hidden.

However, storeExportWithContentIdentifier has many more failure modes,
since it needs to avoid overwriting modified files. So it's more
important it have better error display.
2019-08-13 12:05:00 -04:00
Joey Hess
f27c5db5c5
avoid rsync failing with a permissions error
The test suite was intermittently failing with rsync complaining it
could not write to dest.

get foo (from origin...)
SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77
             20 100%    0.00kB/s    0:00:00  ^M             20 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=0/1)
(from origin...)
SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77
             20 100%    0.00kB/s    0:00:00  ^M             20 100%    0.00kB/s    0:00:00 (xfr#1, to-chk=0/1)
rsync: open "/home/joey/src/git-annex/.t/tmprepo1103/.git/annex/tmp/SHA256E-s20--e394a389d787383843decc5d3d99b6d184ffa5fddeec23b911f9ee7fc8b9ea77" failed: Permission denied (13)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1207) [sender=3.1.3]

It seems that the first rsync actually transferred the file, but then for some
reason git-annex thinks it failed, so it retries. The second rsync then fails
because the first rsync copied the file mode over and so the file is not
writable now.

So, this fixes that problem, but leaves open the question of why git-annex
would think rsync failed when it wrote the file and didn't output any
error message. Possibly a bug in rsyncProgress that either hides an
error message, or somehow makes rsync unhappy?
2019-08-09 15:26:58 -04:00
Joey Hess
fb7d92457f
support using gcrypt with git-lfs special remote 2019-08-05 13:43:45 -04:00
Joey Hess
8401b09e32
Allow setting up a gcrypt special remote with encryption=shared
It was documented to work, but seems it has been broken for a
while/forever.
2019-08-05 12:41:05 -04:00
Joey Hess
3f450f0f4a
add encryption warning 2019-08-05 11:35:26 -04:00
Joey Hess
ecf7f34c23
remember sha256 and size when necessary
Using Logs.RemoteState for this means that if the same key gets uploaded
twice to a git-lfs remote, but somehow has different content the two
times (eg it's an URL key with non-stable content), the sha256/size of
the newer content uploaded will overwrite what was remembered before. That
seems ok; it just means that git-annex will request the newer version of
the content when downloading from git-lfs.

It will remember the sha256 and size if both are not known, or if only
the sha256 is not known but the size is known, it only remembers the
sha256, to avoid wasting space on the size. I did not add special case
for when the sha256 is known and the size is not, because it's been a
long time since git-annex created SHA256 keys without a size.
(See doc/upgrades/SHA_size.mdwn)
2019-08-05 11:05:59 -04:00
Joey Hess
f5eb28682a
expand 2019-08-04 13:59:24 -04:00
Joey Hess
408cb0af39
remove unused imports 2019-08-04 12:43:53 -04:00
Joey Hess
9aab851a55
fix reversion
lost check of resp_actions in b82ecf7076
2019-08-04 12:43:16 -04:00
Joey Hess
7269851550
download from LFS working
including resuming
2019-08-04 12:32:36 -04:00
Joey Hess
b82ecf7076
verify that LFS server responds with requested object
The protocol design allows the server to respond with some other object;
if a server for some reason a server did that, it would not be right for
git-annex to download its content. I don't think it would be a security
hole, since git-annex is downloading a specific key and will verify the
key's content. Seems like a good idea to belt-and-suspenders test for
such a misuse of the protocol.
2019-08-03 16:23:47 -04:00
Joey Hess
28c0395d61
start at retieval from LFS
Doesn't yet download the content, which will need to support resuming.
2019-08-03 12:51:16 -04:00
Joey Hess
5be0a35dae
implemented checkPresent for git-lfs 2019-08-03 12:21:28 -04:00
Joey Hess
a16e83eec8
also debug http response status code 2019-08-03 11:30:06 -04:00
Joey Hess
fc09a41ed1
storing objects in git-lfs is working
Still need to record the sha256 and size when they cannot be determined
by inspecting the key.
2019-08-02 13:56:55 -04:00
Joey Hess
6c1130a3bb
lfs endpoint discovery and caching in git-lfs special remote 2019-08-02 12:38:14 -04:00
Joey Hess
1cef791cf3
skeleton git-lfs special remote
This is a special remote and a git remote at the same time; git can pull
and push to it and git-annex can use it as a special remote.

Remote.Git has to check if it's configured as a git-lfs special remote
and sets it up as one if so.

Object methods not implemented yet.
2019-08-01 15:30:12 -04:00
Joey Hess
426053cb6c
Corrected some license statements
In 40ecf58d4b I changed the license of code I
wrote from GPL to AGPL. But, two files containing code I wrote combined
with code by others were updated to say their license is AGPL, while in
fact part of it was (the code I wrote) but part remained under the original
license (the code written by others).

Remote/Ddar.hs is now changed entirely back to GPL 3.

Annex/DirHashes.hs stays AGPL, but I broke out Utility/MD5.hs with the code
not written by me, and corrected its license statement to GPL-2, which
is the actual version of the GPL included with the code in its original
distribution at http://www.cs.ox.ac.uk/people/ian.lynagh/md5/
2019-07-28 14:27:33 -04:00
Joey Hess
21ff5e1e5a
CoW probing
Improved probing when CoW copies can be made between files on the same
drive. Now supports CoW between BTRFS subvolumes. And, falls back to rsync
instead of using cp when CoW won't work, eg copies between repos on the
same EXT4 filesystem.

Rather than trying cp --reflink=always for each file copied to a remote,
it's tried once and if it fails it falls back to using rsync thereafter
for the lifetime of the Remote object. That avoids overhead of calling cp
which while small, will add up over a large number of files.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2019-07-17 14:19:08 -04:00
Joey Hess
9a5ddda511
remove many old version ifdefs
Drop support for building with ghc older than 8.4.4, and with older
versions of serveral haskell libraries than will be included in Debian 10.

The only remaining version ifdefs in the entire code base are now a couple
for aws!

This commit should only be merged after the Debian 10 release.
And perhaps it will need to wait longer than that; it would make
backporting new versions of  git-annex to Debian 9 (stretch) which
has been actively happening as recently as this year.

This commit was sponsored by Ilya Shlyakhter.
2019-07-05 15:09:37 -04:00
Joey Hess
26c54d6ea3
make metered more generic
Allow it to be used when the Key is not known.
2019-06-25 12:33:36 -04:00
Joey Hess
44de3fff0b
avoid rsync/gcrypt ssh startup delay with -J
Avoid a delay at startup when concurrency is enabled and there are
rsync or gcrypt special remotes, which was caused by git-annex
opening a ssh connection to the remote too early.

sshOptions makes a connection to the ssh server if one is not already open,
when concurrency is enabled. Avoid doing that at startup, when the remote
list is being built, but the remote may not be used at all.

Instead, rsync/gcrypt now runs sshOptions once per ssh connection to the
server. This should not be significant overhead since Remote.Git already
has the same overhead (as do Bup and Ddar).
2019-06-13 11:16:38 -04:00
Joey Hess
94cba37f68
fix build 2019-05-28 11:18:05 -04:00
Joey Hess
c1ed0293b0
improve docs about removeExportDirectory 2019-05-28 11:16:01 -04:00
Joey Hess
8960f259b8
make readonly export remotes really be readonly
When a remote is configured to be readonly, don't allow changing what's
exported to it.

This was missed in the original export remote implementation, but it makes
sense for a readonly export remote to not be allowed to change.
2019-05-28 11:04:28 -04:00
Joey Hess
700a3f2787
Merge branch 'master' into import-from-s3 2019-05-01 14:30:52 -04:00
Joey Hess
9dd764e6f7
Added mimeencoding= term to annex.largefiles expressions.
* Added mimeencoding= term to annex.largefiles expressions.
  This is probably mostly useful to match non-text files with eg
  "mimeencoding=binary"
* git-annex matchexpression: Added --mimeencoding option.
2019-04-30 12:17:22 -04:00
Joey Hess
f08cd6a4ac
set S3 version id in retrieveExportWithContentIdentifierS3
This is necessary because of checks for a S3 version id being set
done when deleting the export or overwriting or renaming it.
2019-04-24 15:13:07 -04:00
Joey Hess
a42e7a012a
refuse unsafe store to unversioned exporttree with old aws version
I've developed a patch to aws, once it gets merged, the real version
number of aws can be filled in.
2019-04-23 14:39:30 -04:00
Joey Hess
15bd7d57ca
info: Show when a remote is configured with importtree 2019-04-23 14:27:43 -04:00
Joey Hess
a7db925f59
typo 2019-04-23 13:19:48 -04:00
Joey Hess
710c2cdbdc
implement rest of missing methods for import from S3 2019-04-23 13:09:27 -04:00
Joey Hess
2f79cb4b45
versioned import from S3 is working
Still some bugs and two stubbed methods to implement though.
2019-04-19 15:13:49 -04:00
Joey Hess
9dc7a10448
Drop support for building with aws older than 0.14.
debian stable has 0.14 so lose the complexity for old versions
2019-04-19 14:27:59 -04:00
Joey Hess
55a5d9679a
implemented mkImportableContentsVersioned 2019-04-19 13:39:33 -04:00
Joey Hess
bf6c7ea6b6
starting work on import from S3
Not in a usuable state yet.
2019-04-18 15:20:09 -04:00
Joey Hess
cd86692c95
fix storeExportWithContentIdentifier 2019-04-09 19:15:20 -04:00
Joey Hess
7b6d0da9b8
adb import
As well as adding the necessary methods, a few other changes to the adb
remote:

* Use ".annextmp" extension for temp files, to avoid conflict with other
  temp files.
* Stop using "echo $?" to get exit status of command inside adb.
  There were two problems; first the "echo" just before it meant it was
  always 0! And secondly, it seems kind of random on my phone whether it's
  1 or 0, not dependant on whether the command seems to have succeeded.
2019-04-09 17:52:41 -04:00
Joey Hess
2dc20e3fa4
update design doc with final design choices 2019-04-09 13:05:22 -04:00
Joey Hess
06cbaa4233
fix back-compat with old git-annex
Unfortunately, "port" has to be set by default, or the old git-annex
will crash when trying to enable the S3 remote.

So, when protocol=https is specified, it needs to override port=80,
since it may be a default setting.
2019-03-22 12:27:41 -04:00
Joey Hess
2a99d7ffc0
improve error message 2019-03-22 12:23:59 -04:00
Joey Hess
7d37011a11
S3: Added protocol= initremote setting, to allow https to be used on a non-standard port
protocol=https implies port=443 and
port=443 implies protocol=https
-- this was necessary because the existing configs set port=443, but
with a protocol setting, users will naturally want to use it, and then
there's no need for them to supply the default https port. So we keep
back-compat, add a nicer way to enable https, and also add support for
non-standard https ports.
2019-03-22 12:17:05 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
2912429640
better indicate when special remotes do not support renameExport
Avoid a warning message when renameExport is not supported, and just
fallback to deleting with a subsequent re-upload. Especially needed for
importtree remotes, where renameExport needs to be disabled.

This changes the external special remote protocol, but in a
backwards-compatible way. A reply of UNSUPPORTED-REQUEST to an older
version of git-annex will cause it to make renameExport return False.
2019-03-11 12:53:24 -04:00
Joey Hess
e412129523
concurrency and status messages when downloading from import 2019-03-08 12:33:44 -04:00
Joey Hess
ee5f1422df
remove debug print 2019-03-07 16:08:58 -04:00
Joey Hess
9a72785307
fixes to export db lookup when accessing importtree=yes
Now in a fresh clone with a importtree=yes remote enabled,
git annex fsck --from the remote works.
2019-03-07 14:10:56 -04:00
Joey Hess
93025dd59f
add missing locking of ContentIdentifier database when writing
This is not super efficient; it would be better to lock the database
once and build up a queue of changes and flush once.

But, storeExportWithContentIdentifier is likely going to be the really
expensive part, so let's do the simple thing and only optimise later if
needed.
2019-03-07 13:32:33 -04:00
Joey Hess
3f449f845e
update 2019-03-07 13:28:18 -04:00
Joey Hess
b3d30e7d70
remove unncessary locking of ContentIdentifier db
Remote.Helper.ExportImport only reads from it, and locking is only
needed when writing.
2019-03-06 14:36:57 -04:00
Joey Hess
b23c301820
fix false positive from checkPresentExportWithContentIdentifierM when file does not exist 2019-03-05 17:04:00 -04:00
Joey Hess
dc278c059c
fix STM crash
git-annex: thread blocked indefinitely in an STM transaction
failed

git-annex: sqlite query crashed
CallStack (from HasCallStack):
  error, called at ./Database/Handle.hs:98:42 in main:Database.Handle
failed

This needs further investigation.
2019-03-05 16:37:40 -04:00
Joey Hess
46d33e804a
added checkPresentExportWithContentIdentifier
Ugh, don't like needing to add this, but I can't see a way around it.
2019-03-05 16:03:03 -04:00
Joey Hess
354aafce1a
refactor database handle code
Use same, simpler method to make only one thread open the export db as
is used for the ContentIdentifier db.

And, always update the export db once before using.
2019-03-05 15:42:39 -04:00
Joey Hess
fd2a1aaa17
avoid using renameExport on import remotes 2019-03-05 14:57:48 -04:00
Joey Hess
bec66258a8
minor 2019-03-05 14:50:39 -04:00
Joey Hess
8c54604e67
import+export from directory special remote fully working
Had to add two more API calls to override export APIs that are not safe
for use in combination with import.

It's unfortunate that removeExportDirectory is documented to be allowed
to remove non-empty directories. I'm not entirely sure why it's that
way, my best guess is it was intended to make it easy to implement with
just rm -rf.
2019-03-05 14:20:14 -04:00
Joey Hess
554b7b7f3e
fix todo 2019-03-04 18:20:12 -04:00
Joey Hess
bc509143e5
avoid opening export db until needed
Before, it was opened when constructing the export Remote, even if it
never got used.
2019-03-04 18:11:32 -04:00
Joey Hess
cd3a2b023a
initial try at using storeExportWithContentIdentifier
Untested, and I'm not sure about the locking of the ContentIdentifier db.
2019-03-04 17:50:41 -04:00
Joey Hess
aaacf431d8
handle importtree=yes config
For now, it's only allowed when exporttree=yes is also set.
That simplified the implementation, but could later be changed if
there's a remote that makes sense to be an import but not an export.
However, it may work just as well to make a remote be readonly to
prevent export to it while still allowing import.
2019-03-04 16:07:35 -04:00
Joey Hess
88ccfaa78c
storeExportWithContentIdentifierM for directory special remote
Not sure if my reasoning about the races really holds.

It would certianly be possible to better guard against races by using
Linux-specific renameat2 with RENAME_EXCHANGE or RENAME_NOREPLACE.

Or by using link and relying on it not overwriting existing files -- but
that would need a filesystem that supports hard links and directory can
be used in filesystems that don't.
2019-03-04 14:46:25 -04:00
Joey Hess
3cd19fb4d0
use InodeCache to avoid races in import from directory special remote
This does not avoid all possible races, but it does avoid all likely
ones, and is demonstratably better than git's own handling of races
where files get modified at the same time as it's updating the working
tree.

The main thing this won't detect are not unlikely races where part
of a file gets changed while it's being copied and then the file is
restored to its original condition before the modification check.
No, it's more likely that the limitations of checking inode, size,
and mtime won't detect certian modifications, involving eg mmapped
files.
2019-03-04 13:57:23 -04:00
Joey Hess
e2e57f8556
initial export support for directory special remote
This does not guard against race condition yet, it's only for testing
purposes.
2019-02-27 13:42:34 -04:00
Joey Hess
45aacd888b
import downloader complete (untested)
Made some api changes.

listImportableContents needs to provide the size
of the data, so the downloader can check disk free space.

retrieveExportWithContentIdentifier is passed the filepath to write to

Use temporary "CID" key during download of a ContentIdentifier from a
remote, so withTmp can be used and then move the content to the real key
once it's known.
2019-02-27 13:15:02 -04:00
Joey Hess
760f26ebc6
Merge branch 'master' into importtree 2019-02-26 11:36:36 -04:00
Joey Hess
19f833b0b1
aws-0.21.1
* S3: Support enabling bucket versioning when built with aws-0.21.1.
* stack.yaml: Build with aws-0.21.1
2019-02-24 12:45:09 -04:00
Joey Hess
fd304dce60
split out Types.Import and some changes to the types in it 2019-02-21 13:39:09 -04:00
Joey Hess
ccc0684d21
no remotes support import yet 2019-02-20 16:59:04 -04:00
Joey Hess
e49f3139b5
fix windows build some more 2019-02-18 17:46:21 -04:00
Joey Hess
4c3178aadf
fix windows build 2019-02-18 17:38:21 -04:00
Joey Hess
9f6b7d6258
On Windows, avoid using rsync for file-to-file copies, since rsync is not always available there.
Installing git-annex with stack rsync won't be available.
Also, using the git-annex installer with 64 bit git installs a non-working
rsync binary because it's linked with libraries provided by 32 bit git.
2019-02-18 17:27:34 -04:00
Joey Hess
60c1b5c994
deal with attempt to export filename with # or ? to webdav
xporting files with '#' or '?' in their name won't work because urls get
truncated on those. Fail in a better way in this case, and avoid failing
when removing such files from the export, so after the user has renamed the
problem files the export will succeed.
2019-02-07 13:47:57 -04:00
Joey Hess
7b9701675e
Display progress bar when getting files from export remotes
And moved the progress bar display into storeExport as well.

This commit was sponsored by John Pellman on Patreon.
2019-01-31 13:34:12 -04:00
Joey Hess
ab689cf0cd
Improved speed of S3 remote by only loading S3 creds once
This gets back any speed lost in commit
9cebfd7002, and speeds up all uses of S3
remotes that operate on them more than once.

This commit was sponsored by Brett Eisenberg on Patreon.
2019-01-30 16:20:14 -04:00
Joey Hess
8eb66a5c40
avoid potentually unsafe use of runResourceT
Pushed the ResourceT out into larger code blocks, and made sure that
the the http result from a sendS3Handle is processed inside the same
ResourceT block.

I don't think this fixes any bugs, but it allows getting rid of a scary
comment.

This commit was sponsored by Eric Drechsel on Patreon.
2019-01-30 15:40:13 -04:00
Joey Hess
9cebfd7002
purify exportActions
Purifying exportActions will allow introspecting and modifying it,
which is needed to add progress bar display to it.

Only S3 and WebDAV ran an Annex action while constructing ExportActions.
There was a small performance gain from them doing that, since a
resource was able to be prepared and reused for multiple actions by
Command.Export.

As seen in commit 809cfbbd8a and
5d394023eb S3 and WebDAV actually create a
new handle for each access in normal, non-export use. It doesn't seem
worth making export use of them marginally more efficient than normal
use. It would be better to do that work upfront when constructing the
remote. Or perhaps use a MVar to cache a handle.

This commit was sponsored by Nick Piper on Patreon.
2019-01-30 15:11:40 -04:00
Joey Hess
5d394023eb
remove incorrect comment
resourcePrepare does not cause the resource to only be prepared once.

The http manager should be reused, which does avoid http connection
overhead, but not because of the use of resourcePrepare.
2019-01-30 14:38:35 -04:00
Joey Hess
809cfbbd8a
prepareS3Handle didn't give any benefits, so remove
I seem to have thought that a Preparer was only run once when a remote
is accessed multiple times, but that is not in fact the case. prepareS3Handle
is run once per access. So, there is no point to it.

That there is some duplicate work done on each access is now apparent.
Luckily, the http manager is reused, so only one http connection is
made. But the S3 creds are loaded repeatedly. Room for improvement here.

This commit was sponsored by Jack Hill on Patreon.
2019-01-30 14:23:39 -04:00
Joey Hess
720e5fda5c
export retrieval fallback to handle S3 remote with partially missing version IDs
When key-based retrieval from a S3 remote with exporttree=yes
appendonly=yes fails, fall back to trying to retrieve from the exported
tree. This allows downloads of files that were exported to such a remote
before versioning was enabled on it.

This is useful at least for a transition for users who got into that
situation, so they can download content from their S3 remote. May want to
remove this in the future though, since normally trying to download the
second time is only extra work.

This commit was sponsored by Brock Spratlen on Patreon.
2019-01-30 13:23:03 -04:00
Joey Hess
ad1d422dd7
fix false positive in export conflict detection
Like the earlier fixed one in Command.Export, it occurred when the same
tree was exported by multiple clones. Previous fix was incomplete since
several other places looked at the list of exported trees to detect when
there was an export conflict. Added a single unified function to avoid
missing any places it needed to be fixed.

This commit was sponsored by mo on Patreon.
2019-01-30 12:36:30 -04:00
Joey Hess
8fc6c11cf1
couple fixes 2019-01-29 15:20:22 -04:00
Joey Hess
a8f1add4d1
S3: Detect when version=yes but an exported file lacks versioning, and refuse to delete it, to avoid data loss.
This commit was sponsored by Denis Dzyubenko on Patreon.
2019-01-29 15:07:27 -04:00
Joey Hess
bb9817ceae
enableremote S3: Do not let versioning=yes be set on existing remote
Because when git-annex lacks S3 version IDs for files stored in the bucket,
deleting them would cause data loss.

Also because git-annex is not able to download unversioned objects from a bucket
when versioning=yes.

This also prevents setting versioning=no. While that would perhaps be
possible to do safely, it would add complexity, and would mean that if
the user accidentially did enableremote versioning=no, they would not be
able to undo it.

This commit was sponsored by Trenton Cronholm on Patreon.
2019-01-29 14:09:50 -04:00
Joey Hess
ee011b3cbb
initremote S3: Automatically enable versioning in S3 buckets when configured with versioning=yes.
Needs not yet released version 0.22 of aws library; with older versions
asks the user to configure the bucket versioning themselves.

Note that S3 endpoints that don't support versioning will cause putBucketVersioning
to throw an exception, so initremote will fail.

This commit was sponsored by Jake Vosloo on Patreon.
2019-01-29 13:46:04 -04:00
Joey Hess
c4977ec1ff
refactoring 2019-01-29 13:42:32 -04:00
Joey Hess
669b305de2
S3: Send a Content-Type header when storing objects in S3
So exports to public buckets can be linked to from web pages.

(When git-annex is built with MagicMime support.)

Thanks to Jared Cosulich for the idea.
2019-01-23 13:08:47 -04:00
Joey Hess
d5f2463702
misctmp cleanup
* Switch to using .git/annex/othertmp for tmp files other than partial
  downloads, and make stale files left in that directory when git-annex
  is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
  delete anything lingering in there after it's 1 week old.

Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
2019-01-17 16:02:22 -04:00
Joey Hess
96aba8eff7
Revert "cache the serialization of a Key"
This reverts commit 4536c93bb2.

That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.

It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.

Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
2019-01-16 16:21:59 -04:00
Joey Hess
4536c93bb2
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
2019-01-14 16:37:28 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
727767e1e2
make everything build again after ByteString Key changes 2019-01-11 16:39:46 -04:00
Joey Hess
cb375977a6
follow-on changes from MetaData type changes
Including writing and parsing the metadata log files with
bytestring-builder and attoparsec.
2019-01-07 15:51:05 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
9cc6d5549b
convert UUID from String to ByteString
This should make == comparison of UUIDs somewhat faster, and perhaps a
few other operations around maps of UUIDs etc.

FromUUID/ToUUID are used to convert String, which is still used for all
IO of UUIDs. Eventually the hope is those instances can be removed,
and all git-annex branch log files etc use ByteString throughout, for a
real speed improvement.

Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually
contains only alphanumerics and so could be treated as ascii, it's
conceivable that some git-annex repository has been initialized using
a UUID that is not only not a canonical UUID, but contains high unicode
or invalid unicode. Using the filesystem encoding avoids any problems
with such a thing. However, a NUL in a UUID seems extremely unlikely,
so I didn't use encodeBS / decodeBS to avoid their extra overhead in
handling NULs.

The Read/Show instance for UUID luckily serializes the same way for
ByteString as it did for String.
2019-01-01 14:45:33 -04:00
Joey Hess
2e069eb9f6
use putBucket to future-proof
New fields can be added to PutBucket in the future.
2018-12-31 13:09:20 -04:00
Joey Hess
3fdc6fdfa9
remove unused import 2018-12-30 15:18:49 -04:00
Joey Hess
a26514d67e
Fix doubled progress display when downloading an url when -J is used.
downloadUrl uses meteredFile, which sets up one progress meter,
and Remote.Web also uses metered, so two progress meters are displayed for
the same download.

Reversion introduced with the http-conduit switch in
c34152777b -- I don't know why the extra
call to metered was added there.

When -J is not used, the extra progress meter didn't display,
but an extra blank line did get output, which is also fixed.

This commit was sponsored by John Pellman on Patreon.
2018-12-30 12:29:49 -04:00
Joey Hess
3f587d447a
fix webdav reversion
webdav: When initializing, avoid trying to make a directory at the top of
the webdav server, which could never accomplish anything and failed on
nextcloud servers. (Reversion introduced in version 6.20170925.)

This commit was sponsored by mo on patreon.
2018-12-10 12:49:51 -04:00
Joey Hess
4579dd6201
S3: Improve diagnostics when a remote is configured with exporttree and versioning, but no S3 version id has been recorded for a key.
When public access is used for the remote, it complained that the user
needed to set creds to use it, which was just wrong.

When creds were being used, it fell back from trying to use the version ID
to just accessing the key in the bucket, which was ok for non-export
remotes, but wrong for buckets.

In both cases, display a hopefully useful warning.

This should only come up when an existing S3 remote has been exported
to, and then later versioning was enabled.

Note that it would perhaps be possible to fall back from trying to use
retrieveKeyFile when it fails and instead use retrieveKeyFileFromExport,
which may work when S3 version ID is missing. But there are problems
with that approach; how to tell when retrieveKeyFile has failed due to this
rather than a network problem etc? Anyway, that approach would only work
until the file in the export got overwritten, and then it would no
longer be accessible. And with versioning enabled, the user wants old
versions of objects to remain accessible, so it seems better to warn
about the problem as soon as possible, so they can go back and add S3
version IDs.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-12-06 13:44:37 -04:00
Joey Hess
1308a76bf1
deMaybe credPairRemoteKey
It's always Just
2018-12-04 13:37:43 -04:00
Joey Hess
a25fef36ad
fix json for exportedtrees in conflict
Repeating the same json field with multiple values tends to not be
supported well by json parsers, so list the trees separated by spaces.
2018-12-03 14:43:59 -04:00
Joey Hess
b8f9dea27d
add exportedtree to info
info: When used with an exporttree remote, includes an "exportedtree" info,
which is the tree last exported to the remote. During an export conflict,
multiple values will be listed.

This commit was sponsored by John Pellman on Patreon.
2018-12-03 14:36:00 -04:00
Robert Schütz
32017f8082
replace TORRENT by WITH_TORRENTPARSER 2018-11-27 12:29:25 -04:00
Joey Hess
370757087d
catch lockContentForRemoval exception
removeKey should not throw exceptions, so catch exception there

In Assistant.Unused, keep trying to drop other keys if one drop fails
2018-11-15 15:39:57 -04:00
Joey Hess
d65df7ab21
improve messages around export conflicts
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-11-13 15:50:06 -04:00
Joey Hess
c472c268c4
webapp: Fixed a crash when adding a git remote.
Reversion introduced in 2b66492d6e which added a new cache that needs to be
cleared.
2018-10-29 16:01:08 -04:00
Joey Hess
a622488758
remove CHECKURL-MULTI single url response special case
Removed undocumented special case in handling of a CHECKURL-MULTI response
with only a single file listed. Rather than ignoring the url that was in
the response, use it. This allows external special remotes that want to
provide some better url to do so, although I don't entirely agree with
using CHECKURL-MULTI to accomplish that. I'm more of the feeling that an
undocumented special case that throws data away is just not a good idea.

This could in theory break some external special remote program that relied
on the current behavior, but its seems unlikely that it would because such
a program must already handle the multiple url case, unless it only ever
provides a single url response to CHECKURL-MULTI.

Make addurl --file work with a single item CHECKURL-MULTI response.
It already did for external special remotes due to the special case,
but now it also will for builtin ones like the BitTorrent special remote.

This commit was sponsored by Ilya Shlyakhter on Patron.
2018-10-29 14:52:12 -04:00