Commit graph

66 commits

Author SHA1 Message Date
Joey Hess
963239da5c
separate RemoteConfig parsing basically working
Many special remotes are not updated yet and are commented out.
2020-01-14 12:35:08 -04:00
Joey Hess
71f78fe45d
wip separate RemoteConfig parsing
Remote now contains a ParsedRemoteConfig. The parsing happens when the
Remote is constructed, rather than when individual configs are used.

This is more efficient, and it lets initremote/enableremote
reject configs that have unknown fields or unparsable values.

It also allows for improved type safety, as shown in
Remote.Helper.Encryptable where things that used to match on string
configs now match on data types.

This is a work in progress, it does not build yet.

The main risk in this conversion is forgetting to add a field to
RemoteConfigParser. That will prevent using that field with
initremote/enableremote, and will prevent remotes that already are set
up from seeing that configuration. So will need to check carefully that
every field that getRemoteConfigValue is called on has been added to
RemoteConfigParser.

(One such case I need to remember is that credPairRemoteField needs to be
included in the RemoteConfigParser.)
2020-01-13 12:39:21 -04:00
Joey Hess
71ecfbfccf
be stricter about rejecting invalid configurations for remotes
This is a first step toward that goal, using the ProposedAccepted type
in RemoteConfig lets initremote/enableremote reject bad parameters that
were passed in a remote's configuration, while avoiding enableremote
rejecting bad parameters that have already been stored in remote.log

This does not eliminate every place where a remote config is parsed and a
default value is used if the parse false. But, I did fix several
things that expected foo=yes/no and so confusingly accepted foo=true but
treated it like foo=no. There are still some fields that are parsed with
yesNo but not not checked when initializing a remote, and there are other
fields that are parsed in other ways and not checked when initializing a
remote.

This also lays groundwork for rejecting unknown/typoed config keys.
2020-01-10 14:52:48 -04:00
Joey Hess
81d402216d cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

Previously attempted in 4536c93bb2
and reverted in 96aba8eff7.
The problems mentioned in the latter commit are addressed now:

Read/Show of KeyData is backwards-compatible with Read/Show of Key from before
this change, so Types.Distribution will keep working.

The Eq instance is fixed.

Also, Key has smart constructors, avoiding needing to remember to update
the cached serialization.

Used git-annex benchmark:
  find is 7% faster
  whereis is 3% faster
  get when all files are already present is 5% faster
Generally, the benchmarks are running 0.1 seconds faster per 2000 files,
on a ram disk in my laptop.
2019-11-22 17:49:16 -04:00
Joey Hess
df5b0ffab3
inherit other fields
I think this is all that need to be inherited.
2019-10-10 16:11:21 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
96aba8eff7
Revert "cache the serialization of a Key"
This reverts commit 4536c93bb2.

That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.

It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.

Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
2019-01-16 16:21:59 -04:00
Joey Hess
4536c93bb2
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
2019-01-14 16:37:28 -04:00
Joey Hess
2542fb58ed
fix giveup shadowing 2016-11-16 00:28:10 -04:00
Joey Hess
0a4479b8ec
Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.

Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.

This commit was sponsored by Ethan Aubin.
2016-11-15 21:29:54 -04:00
Joey Hess
b890f3a53d
Fix bug that prevented resuming of uploads to encrypted special remotes that used chunking. This bug could also expose the names of keys to such remotes.
This is a low-severity security hole.
2016-04-27 12:54:43 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Joey Hess
6dad09a823 disable whereisKey for encrypted or chunked remotes
This only makes sense for public repos, that are not chunked, so
that there's a 1:1 from Key in the git-annex repo to file on the remote.
Rather than making every remote implementation deal with that, just disable
whereisKey when it doesn't make sense.
2015-08-19 14:16:01 -04:00
Joey Hess
addc82dab7 removed all uses of undefined from code base
It's a code smell, can lead to hard to diagnose error messages.
2015-04-19 00:38:29 -04:00
Joey Hess
afc5153157 update my email address and homepage url 2015-01-21 12:50:09 -04:00
Joey Hess
4f657aa14e add getFileSize, which can get the real size of a large file on Windows
Avoid using fileSize which maxes out at just 2 gb on Windows.
Instead, use hFileSize, which doesn't have a bounded size.
Fixes support for files > 2 gb on Windows.

Note that the InodeCache code only needs to compare a file size,
so it doesn't matter it the file size wraps. So it has been
left as-is. This was necessary both to avoid invalidating existing inode
caches, and because the code passed FileStatus around and would have become
more expensive if it called getFileSize.

This commit was sponsored by Christian Dietrich.
2015-01-20 17:09:24 -04:00
Joey Hess
a0297915c1 add per-remote-type info
Now `git annex info $remote` shows info specific to the type of the remote,
for example, it shows the rsync url.

Remote types that support encryption or chunking also include that in their
info.

This commit was sponsored by Ævar Arnfjörð Bjarmason.
2014-10-21 14:36:09 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
6adbd50cd9 testremote: Add testing of behavior when remote is not available
Added a mkUnavailable method, which a Remote can use to generate a version
of itself that is not available. Implemented for several, but not yet all
remotes.

This allows testing that checkPresent properly throws an exceptions when
it cannot check if a key is present or not. It also allows testing that the
other methods don't throw exceptions in these circumstances.

This immediately found several bugs, which this commit also fixes!

* git remotes using ssh accidentially had checkPresent return
  an exception, rather than throwing it
* The chunking code accidentially returned False rather than
  propigating an exception when there were no chunks and
  checkPresent threw an exception for the non-chunked key.

This commit was sponsored by Carlo Matteo Capocasa.
2014-08-10 15:02:59 -04:00
Joey Hess
c784ef4586 unify exception handling into Utility.Exception
Removed old extensible-exceptions, only needed for very old ghc.

Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.

Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.

However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.

Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
2014-08-07 22:03:29 -04:00
Joey Hess
b4cf22a388 pushed checkPresent exception handling out of Remote implementations
I tend to prefer moving toward explicit exception handling, not away from
it, but in this case, I think there are good reasons to let checkPresent
throw exceptions:

1. They can all be caught in one place (Remote.hasKey), and we know
   every possible exception is caught there now, which we didn't before.
2. It simplified the code of the Remotes. I think it makes sense for
   Remotes to be able to be implemented without needing to worry about
   catching exceptions inside them. (Mostly.)
3. Types.StoreRetrieve.Preparer can only work on things that return a
   Bool, which all the other relevant remote methods already did.
   I do not see a good way to generalize that type; my previous attempts
   failed miserably.
2014-08-06 13:45:19 -04:00
Joey Hess
b3fe23b552 remove redundant progress meter display code
specialRemote handles all meter display, so this is redundant.
2014-08-03 16:18:40 -04:00
Joey Hess
4b16989e98 roll ChunkedEncryptable into Special and improve interface
Allow disabling progress displays, for eg, rsync.
2014-08-03 15:40:01 -04:00
Joey Hess
00f92a7e59 whitespace 2014-08-03 01:21:38 -04:00
Joey Hess
de0da0aece minor optimisation 2014-08-01 17:18:39 -04:00
Joey Hess
3991327d09 testremote: Test retrieveKeyFile resume
And fixed a bug found by these tests; retrieveKeyFile would fail
when the dest file was already complete.

This commit was sponsored by Bradley Unterrheiner.
2014-08-01 17:16:20 -04:00
Joey Hess
9636cfd9e1 fix a fenchpost bug when resuming chunked store at end
Discovered thanks to testremote command!
2014-08-01 16:29:39 -04:00
Joey Hess
8fce4e4bd7 fix chunk=0
Found by testremote
2014-08-01 15:36:11 -04:00
Joey Hess
89416ba2d9 only chunk stable keys
The content of unstable keys can potentially be different in different
repos, so eg, resuming a chunked upload started by another repo would
corrupt data.
2014-07-30 10:34:39 -04:00
Joey Hess
a963d790d3 update progress after each chunk, at least
This way, when the remote implementation neglects to update progress,
there will still be a somewhat useful progress display, as long as chunks
are used.
2014-07-29 20:31:16 -04:00
Joey Hess
53b87a859e optimise case of remote that retrieves FileContent, when chunks and encryption are not being used
No need to read whole FileContent only to write it back out to a file in
this case. Can just rename! Yay.

Also indidentially, fixed an attempt to open a file for write that was
already opened for write, which caused a crash and deadlock.
2014-07-29 20:10:14 -04:00
Joey Hess
c0dc134cde support chunking for all external special remotes!
Removing code and at the same time adding great features, including
upload/download resuming.

This commit was sponsored by Romain Lenglet.
2014-07-29 18:50:20 -04:00
Joey Hess
bc9e4697b9 better type for Retriever
Putting a callback in the Retriever type allows for the callback to
remove the retrieved file when it's done with it.

I did not really want to make Retriever be fixed to Annex Bool,
but when I tried to use Annex a, I got into some type of type mess.
2014-07-29 18:41:41 -04:00
Joey Hess
47e522979c allow Retriever action to update the progress meter
Needed for eg, Remote.External.

Generally, any Retriever that stores content in a file is responsible for
updating the meter, while ones that procude a lazy bytestring cannot update
the meter, so are not asked to.
2014-07-29 17:18:49 -04:00
Joey Hess
1d263e1e7e lift types from IO to Annex
Some remotes like External need to run store and retrieve actions in Annex,
not IO. In order to do that lift, I had to dive pretty deep into the
utilities, making Utility.Gpg and Utility.Tmp be partly converted to using
MonadIO, and Control.Monad.Catch for exception handling.

There should be no behavior changes in this commit.

This commit was sponsored by Michael Barabanov.
2014-07-29 16:28:44 -04:00
Joey Hess
f5af470875 add ContentSource type, for remotes that act on files rather than ByteStrings
Note that currently nothing cleans up a ContentSource's file, when eg,
retrieving chunks.
2014-07-29 15:16:12 -04:00
Joey Hess
216fdbd6bd fix non-checked hasKeyChunks 2014-07-29 15:07:32 -04:00
Joey Hess
58f727afdd resume interrupted chunked uploads
Leverage the new chunked remotes to automatically resume uploads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundry.

But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!

This implementation will also allow starting an upload from one repository,
interrupting it, and then resuming the upload to the same remote from
an entirely different repository.

Note that I added a comment that storeKey should atomically move the content
into place once it's all received. This was already an undocumented
requirement -- it's necessary for hasKey to work reliably. This resume code
just uses hasKey to find the first chunk that's missing.

Note that if there are two uploads of the same key to the same chunked remote,
one might resume at the point the other had gotten to, but both will then
redundantly upload. As before.

In the non-resume case, this adds one hasKey call per storeKey, and only
if the remote is configured to use chunks. Future work: Try to eliminate that
hasKey. Notice that eg, `git annex copy --to` checks if the key is present
before sending it, so is already running hasKey.. which could perhaps
be cached and reused.

However, this additional overhead is not very large compared with
transferring an entire large file, and the ability to resume
is certianly worth it. There is an optimisation in place for small files,
that avoids trying to resume if the whole file fits within one chunk.

This commit was sponsored by Georg Bauer.
2014-07-28 14:35:52 -04:00
Joey Hess
80cc554c82 add ChunkMethod type and make Logs.Chunk use it, rather than assuming fixed size chunks (so eg, rolling hash chunks can be supported later)
If a newer git-annex starts logging something else in the chunk log, it
won't be used by this version, but it will be preserved when updating the
log.
2014-07-28 13:19:08 -04:00
Joey Hess
9d4a766cd7 resume interrupted chunked downloads
Leverage the new chunked remotes to automatically resume downloads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundry.

But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!

This implementation will also properly handle starting a download
from one remote, interrupting, and resuming from another one, and so on.

(Resuming interrupted chunked uploads is similarly doable, although
slightly more expensive.)

This commit was sponsored by Thomas Djärv.
2014-07-27 18:56:32 -04:00
Joey Hess
2996f0eb05 use existing chunks even when chunk=0
When chunk=0, always try the unchunked key first. This avoids the overhead
of needing to read the git-annex branch to find the chunkcount.

However, if the unchunked key is not present, go on and try the chunks.

Also, when removing a chunked key, update the chunkcounts even when
chunk=0.
2014-07-27 02:13:51 -04:00
Joey Hess
7afb057d60 reorg 2014-07-27 01:24:34 -04:00
Joey Hess
c3af4897c0 faster storeChunks
No need to process each L.ByteString chunk, instead ask it to split.

Doesn't seem to have really sped things up much, but it also made the code
simpler.

Note that this does (and already did) buffer in memory. It seems that only
the directory special remote could take advantage of streaming chunks to
files w/o buffering, so probably won't add an interface to allow for that.
2014-07-27 01:18:38 -04:00
Joey Hess
9a8c4bb21f improve exception handling
Push it down from needing to be done in every Storer,
to being checked once inside ChunkedEncryptable.

Also, catch exceptions from PrepareStorer and PrepareRetriever,
just in case..
2014-07-26 23:26:10 -04:00
Joey Hess
867fd116a7 better exception display 2014-07-26 23:01:44 -04:00
Joey Hess
93be3296fc fix another fallback bug 2014-07-26 22:47:52 -04:00
Joey Hess
86e8532c0a allM has slightly better memory use 2014-07-26 22:34:40 -04:00
Joey Hess
67975bf50d fix fallback to other chunk size when first does not have it 2014-07-26 22:25:50 -04:00
Joey Hess
d4d68f57e5 finish up basic chunked remote groundwork
Chunk retrieval and reassembly, removal, and checking if all necessary
chunks are present.

This commit was sponsored by Damien Raude-Morvan.
2014-07-26 20:11:41 -04:00
Joey Hess
cf83697c33 reorg 2014-07-26 12:04:35 -04:00