Commit graph

852 commits

Author SHA1 Message Date
Joey Hess
0f78f197eb casts; now fully working.. but still leaking
Still seems to buffer the whole partsize in memory, but I'm pretty sure my
code is not what's doing it. See https://github.com/aristidb/aws/issues/142
2014-11-03 21:12:15 -04:00
Joey Hess
f0551578d6 this should avoid leaking memory 2014-11-03 20:49:30 -04:00
Joey Hess
4230b56b79 logic error 2014-11-03 20:15:33 -04:00
Joey Hess
62de9a39bf WIP 3 2014-11-03 20:04:42 -04:00
Joey Hess
d16382e99f WIP 2 2014-11-03 19:50:33 -04:00
Joey Hess
5360417436 WIP try sending using RequestBodyStreamChunked
May not work; if it does this is gonna be the simplest way to get good
memory size and progress reporting.
2014-11-03 19:18:46 -04:00
Joey Hess
8f61bfad51 link to memory leak bug 2014-11-03 17:55:05 -04:00
Joey Hess
711b18a6eb improve info display for multipart 2014-11-03 17:24:53 -04:00
Joey Hess
2c53f331bd fix build 2014-11-03 17:23:46 -04:00
Joey Hess
6a965cf8d7 adjust version check
I assume 0.10.6 will have the fix for the bug I reported, which got fixed
in master already..
2014-11-03 16:23:00 -04:00
Joey Hess
5c3d9d6caa show multipart configuration in git annex info s3remote 2014-11-03 16:07:41 -04:00
Joey Hess
a3ec6ed73b Merge branch 'master' into s3-aws-multipart 2014-11-03 16:05:03 -04:00
Joey Hess
8faeb25076 finish multipart support using unreleased update to aws lib to yield etags
Untested and not even compiled yet.

Testing should include checks that file content streams through without
buffering in memory.

Note that CL.consume causes all the etags to be buffered in memory.
This is probably nearly unavoidable, since a request has to be constructed
that contains the list of etags in its body. (While it might be possible to
stream generation of the body, that would entail making a http request that
dribbles out parts of the body as the multipart uploads complete, which is
not likely to work well..

To limit this being a problem, it's best for partsize to be set to some
suitably large value, like 1gb. Then a full terabyte file will need only
1024 etags to be stored, which will probably use around 1 mb of memory.
2014-11-03 16:04:55 -04:00
Joey Hess
39dd5a2ac3 improve uuid mismatch message 2014-10-28 15:54:44 -04:00
Joey Hess
6e89d070bc WIP multipart S3 upload
I'm a little stuck on getting the list of etags of the parts.
This seems to require taking the md5 of each part locally,
which doesn't get along well with lazily streaming in the part from the
file. It would need to read the file twice, or lose laziness and buffer a
whole part -- but parts might be quite large.

This seems to be a problem with the API provided; S3 is supposed to return
an etag, but that is not exposed. I have filed a bug:
https://github.com/aristidb/aws/issues/141
2014-10-28 14:17:30 -04:00
Joey Hess
8ed1a0afee fix build 2014-10-23 16:52:05 -04:00
Joey Hess
8edf7a0fc3 fix build 2014-10-23 16:51:10 -04:00
Joey Hess
171e677a3c update for aws 0.10's better handling of DNE for HEAD
Kept support for older aws, since Debian has 0.9.2 still.
2014-10-23 16:32:18 -04:00
Joey Hess
fa1318479e rename isIA to configIA
Already done on s3-aws branch, so reduce divergence.
2014-10-23 15:56:35 -04:00
Joey Hess
6acc6863c5 fix build 2014-10-23 15:54:00 -04:00
Joey Hess
7489f516bc one last build fix, yes it builds now 2014-10-23 15:50:41 -04:00
Joey Hess
76ee815e89 needs type families 2014-10-23 15:48:37 -04:00
Joey Hess
f0989cf0bd fix build 2014-10-23 15:41:57 -04:00
Joey Hess
8b48bdfdc8 enable frankfurt
The aws library supports the AWS4-HMAC-SHA256 that it requires.
2014-10-23 11:02:24 -04:00
Joey Hess
4eefc12295 Merge branch 'master' into s3-aws 2014-10-23 11:02:14 -04:00
Joey Hess
e687c61d04 add new frankfurt region to list in webapp
But commented out for now, because:

The authorization mechanism you have provided is not supported. Please use
AWS4-HMAC-SHA256
2014-10-23 11:02:02 -04:00
Joey Hess
35551d0ed0 Merge branch 'master' into s3-aws
Conflicts:
	Remote/S3.hs
2014-10-22 17:14:38 -04:00
Joey Hess
5c15d6d3cc show in info whether a remote uses hybrid encryption or not 2014-10-22 14:39:59 -04:00
Joey Hess
3006b79c86 include creds info for glacier and webdav
That and S3 are all that uses creds currently, except that external
remotes can use creds. I have not handled showing info about external
remote creds because they can have 0, 1, or more separate cred pairs, and
there's no way for info to enumerate them or know how they're used.
So it seems ok to leave out creds info for external remotes.
2014-10-22 13:56:14 -04:00
Joey Hess
1b90838bbd add internet archive item url to info 2014-10-21 15:34:32 -04:00
Joey Hess
9280fe4cbe include creds location in info
This is intended to let the user easily tell if a remote's creds are
coming from info embedded in the repository, or instead from the
environment, or perhaps are locally stored in a creds file.

This commit was sponsored by Frédéric Schütz.
2014-10-21 15:09:40 -04:00
Joey Hess
a0297915c1 add per-remote-type info
Now `git annex info $remote` shows info specific to the type of the remote,
for example, it shows the rsync url.

Remote types that support encryption or chunking also include that in their
info.

This commit was sponsored by Ævar Arnfjörð Bjarmason.
2014-10-21 14:36:09 -04:00
Joey Hess
fced322834 glacier: Fix pipe setup when calling glacier-cli to retrieve an object. 2014-10-20 15:11:01 -04:00
Joey Hess
ef3804bdb3 S3: Fix embedcreds=yes handling for the Internet Archive.
Before, embedcreds=yes did not cause the creds to be stored in remote.log,
but also prevented them being locally cached.
2014-10-12 13:15:52 -04:00
Joey Hess
9fd95d9025 indent with tabs not spaces
Found these with:
git grep "^  " $(find -type  f -name \*.hs) |grep -v ':  where'

Unfortunately there is some inline hamlet that cannot use tabs for
indentation.

Also, Assistant/WebApp/Bootstrap3.hs is a copy of a module and so I'm
leaving it as-is.
2014-10-09 15:09:26 -04:00
Joey Hess
7b50b3c057 fix some mixed space+tab indentation
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.

Done as a background task while listening to episode 2 of the Type Theory
podcast.
2014-10-09 15:09:11 -04:00
Joey Hess
0ed33c8b74 deal with old repositories with non-encrypted creds
See 2f3c3aa01f for backstory about how a repo
could be in this state.

When decryption fails, the repo must be using non-encrypted creds. Note
that creds are encrypted/decrypted using the encryption cipher which is
stored in the repo, so the decryption cannot fail due to missing gpg keys
etc. (For !shared encryptiom, the cipher is iteself encrypted using some
gpg key(s), and the decryption of the cipher happens earlier, so not
affected by this change.

Print a warning message for !shared repos, and continue on using the
cipher. Wrote a page explaining what users hit by this bug should do.

This commit was sponsored by Samuel Tardieu.
2014-09-18 17:58:03 -04:00
Joey Hess
2f3c3aa01f glacier, S3: Fix bug that caused embedded creds to not be encypted using the remote's key.
encryptionSetup must be called before setRemoteCredPair. Otherwise,
the RemoteConfig doesn't have the cipher in it, and so no cipher is used to
encrypt the embedded creds.

This is a security fix for non-shared encryption methods!

For encryption=shared, there's no security problem, just an
inconsistentency in whether the embedded creds are encrypted.

This is very important to get right, so used some types to help ensure that
setRemoteCredPair is only run after encryptionSetup. Note that the external
special remote bypasses the type safety, since creds can be set after the
initial remote config, if the external special remote program requests it.
Also note that IA remotes never use encryption, so encryptionSetup is not
run for them at all, and again the type safety is bypassed.

This leaves two open questions:

1. What to do about S3 and glacier remotes that were set up
   using encryption=pubkey/hybrid with embedcreds?
   Such a git repo has a security hole embedded in it, and this needs to be
   communicated to the user. Is the changelog enough?

2. enableremote won't work in such a repo, because git-annex will
   try to decrypt the embedded creds, which are not encrypted, so fails.
   This needs to be dealt with, especially for ecryption=shared repos,
   which are not really broken, just inconsistently configured.

   Noticing that problem for encryption=shared is what led to commit
   fbdeeeed5f, which tried to
   fix the problem by not decrypting the embedded creds.

This commit was sponsored by Josh Taylor.
2014-09-18 17:26:12 -04:00
Joey Hess
d84eab8a8a Revert "S3, Glacier, WebDAV: Fix bug that prevented accessing the creds when the repository was configured with encryption=shared embedcreds=yes."
This reverts commit fbdeeeed5f.

I can find no basis for that commit and think that I made it in error.
setRemoteCredPair always encrypts using the cipher from remoteCipher,
even when the cipher is shared.
2014-09-18 15:21:47 -04:00
Joey Hess
f7847ae98d Merge branch 'master' into s3-aws
Conflicts:
	Utility/Url.hs
	debian/changelog
	git-annex.cabal
2014-09-18 14:36:20 -04:00
Joey Hess
9964584c34 WebDav: Fix enableremote crash when the remote already exists. (Bug introduced in version 5.20140817.) 2014-09-17 13:04:55 -04:00
Joey Hess
a97c9e43b7 The annex-rsync-transport configuration is now also used when checking if a key is present on a rsync remote, and when dropping a key from the remote. 2014-09-11 13:21:35 -04:00
Joey Hess
b874f84086 New annex.hardlink setting. Closes: #758593
* New annex.hardlink setting. Closes: #758593
* init: Automatically detect when a repository was cloned with --shared,
  and set annex.hardlink=true, as well as marking the repository as
  untrusted.

Had to reorganize Logs.Trust a bit to avoid a cycle between it and
Annex.Init.
2014-09-05 13:44:09 -04:00
Joey Hess
6eb5c3f479 Do not preserve permissions and acls when copying files from one local git repository to another. Timestamps are still preserved as long as cp --preserve=timestamps is supported.
This avoids cp -a overriding the default mode acls that the user might have
set in a git repository.

With GNU cp, this behavior change should not be a breaking change, because
git-anex also uses rsync sometimes in the same situation, and has only ever
preserved timestamps when using rsync.

Systems without GNU cp will no longer use cp -a, but instead just cp.
So, timestamps will no longer be preserved. Preserving timestamps when
copying between repos is not guaranteed anyway.

Closes: #729757
2014-08-26 17:10:25 -07:00
Joey Hess
aebcc395ff use types to enforce that removeAnnex can only be called inside lockContent
This fixed one bug where it needed to be and wasn't (in Assistant.Unused).
And also found one place where lockContent was used unnecessarily (by
drop --from remote).

A few other places like uninit probably don't really need to lockContent,
but it doesn't hurt to do call it anyway.

This commit was sponsored by David Wagner.
2014-08-20 20:13:47 -04:00
Joey Hess
1994771215 more lock file refactoring
Also fixes a test suite failures introduced in recent commits, where
inAnnexSafe failed in indirect mode, since it tried to open the lock file
ReadWrite. This is why the new checkLocked opens it ReadOnly.

This commit was sponsored by Chad Horohoe.
2014-08-20 18:58:14 -04:00
Joey Hess
d279180266 reorganize and refactor lock code
Added a convenience Utility.LockFile that is not a windows/posix
portability shim, but still manages to cut down on the boilerplate around
locking.

This commit was sponsored by Johan Herland.
2014-08-20 16:45:58 -04:00
Joey Hess
96dc423e39 When accessing a local remote, shut down git-cat-file processes afterwards, to ensure that remotes on removable media can be unmounted. Closes: #758630
This does mean that eg, copying multiple files to a local remote will
become slightly slower, since it now restarts git-cat-file after each copy.
Should not be significant slowdown.

The reason git-cat-file is run on the remote at all is to update its
location log. In order to add an item to it, it needs to get the current
content of the log. Finding a way to avoid needing to do that would be a
good path to avoiding this slowdown if it does become a problem somehow.

This commit was sponsored by Evan Deaubl.
2014-08-20 12:07:57 -04:00
Joey Hess
83dc82c232 forgot some lifts 2014-08-20 11:51:47 -04:00
Joey Hess
092041fab0 Ensure that all lock fds are close-on-exec, fixing various problems with them being inherited by child processes such as git commands.
(With the exception of daemon pid locking.)

This fixes at part of #758630. I reproduced the assistant locking eg, a
removable drive's annex journal lock file and forking a long-running
git-cat-file process that inherited that lock.

This did not affect Windows.

Considered doing a portable Utility.LockFile layer, but git-annex uses
posix locks in several special ways that have no direct Windows equivilant,
and it seems like it would mostly be a complication.

This commit was sponsored by Protonet.
2014-08-20 11:37:02 -04:00
Joey Hess
ef01ff1e77 Merge branch 'master' into s3-aws
Conflicts:
	git-annex.cabal
2014-08-15 17:30:40 -04:00
Joey Hess
fbdeeeed5f S3, Glacier, WebDAV: Fix bug that prevented accessing the creds when the repository was configured with encryption=shared embedcreds=yes.
Since encryption=shared, the encryption key is stored in the git repo, so
there is no point at all in encrypting the creds, also stored in the git
repo with that key. So `initremote` doesn't. The creds are simply stored
base-64 encoded.

However, it then tried to always decrypt creds when encryption was used..
2014-08-12 15:35:29 -04:00
Joey Hess
6adbd50cd9 testremote: Add testing of behavior when remote is not available
Added a mkUnavailable method, which a Remote can use to generate a version
of itself that is not available. Implemented for several, but not yet all
remotes.

This allows testing that checkPresent properly throws an exceptions when
it cannot check if a key is present or not. It also allows testing that the
other methods don't throw exceptions in these circumstances.

This immediately found several bugs, which this commit also fixes!

* git remotes using ssh accidentially had checkPresent return
  an exception, rather than throwing it
* The chunking code accidentially returned False rather than
  propigating an exception when there were no chunks and
  checkPresent threw an exception for the non-chunked key.

This commit was sponsored by Carlo Matteo Capocasa.
2014-08-10 15:02:59 -04:00
Joey Hess
5fc54cb182 auto-create IA buckets
Needs my patch to aws which will hopefully be accepted soon.
2014-08-09 22:17:40 -04:00
Joey Hess
445f04472c better memoization 2014-08-09 22:13:03 -04:00
Joey Hess
5ee72b1bae fix meter update 2014-08-09 16:49:31 -04:00
Joey Hess
3659cb9efb S3: finish converting to aws library
Implemented the Retriever.

Unfortunately, it is a fileRetriever and not a byteRetriever.
It should be possible to convert this to a byteRetiever, but I got stuck:
The conduit sink needs to process individual chunks, but a byteRetriever
needs to pass a single L.ByteString to its callback for processing. I
looked into using unsafeInerlaveIO to build up the bytestring lazily,
but the sink is already operating under conduit's inversion of control,
and does not run directly in IO anyway.

On the plus side, no more memory leak..
2014-08-09 15:58:01 -04:00
Joey Hess
57872b457b pass metadata headers and storage class to S3 when putting objects 2014-08-09 14:44:53 -04:00
Joey Hess
1ba1e37be3 remove dead code 2014-08-09 14:30:28 -04:00
Joey Hess
4f007ace87 S3: convert to aws for store, remove, checkPresent
Fixes the memory leak on store.. the second oldest open git-annex bug!

Only retrieve remains to be converted.

This commit was sponsored by Scott Robinson.
2014-08-09 14:26:19 -04:00
Joey Hess
8eac9eab03 Merge branch 'master' into s3-aws 2014-08-09 13:40:21 -04:00
Joey Hess
f69a9274f9 avoid printing really ugly webdav exceptions
The responseheaders can sometimes include the entire input request,
which is several pages of garbage.
2014-08-09 01:38:13 -04:00
Joey Hess
809ee40d76 wording 2014-08-08 21:42:46 -04:00
Joey Hess
ccfb433ab3 cleanup 2014-08-08 20:51:22 -04:00
Joey Hess
cf82b0e1ec cleanup 2014-08-08 20:33:03 -04:00
Joey Hess
4f1ba9a23d fix checkPresent error handling for non-present local git repos
guardUsable r (error "foo") *returned* an error, rather than throwing it
2014-08-08 19:18:08 -04:00
Joey Hess
6fcca2f13e WIP converting S3 special remote from hS3 to aws library
Currently, initremote works, but not the other operations. They should be
fairly easy to add from this base.

Also, https://github.com/aristidb/aws/issues/119 blocks internet archive
support.

Note that since http-conduit is used, this also adds https support to S3.
Although git-annex encrypts everything anyway, so that may not be extremely
useful. It is not enabled by default, because existing S3 special remotes
have port=80 in their config. Setting port=443 will enable it.

This commit was sponsored by Daniel Brockman.
2014-08-08 19:00:53 -04:00
Joey Hess
1dd3232e8e check for 200 response 2014-08-08 17:17:36 -04:00
Joey Hess
0260ee43e6 fix removeKey when not present 2014-08-08 14:57:05 -04:00
Joey Hess
6cb9e5c32f show missing url= parameter error sooner 2014-08-08 14:19:08 -04:00
Joey Hess
c3f8512475 WebDAV: Avoid buffering whole file in memory when downloading.
httpBodyRetriever will later also be used by S3

This commit was sponsored by Ethan Aubin.
2014-08-08 13:40:55 -04:00
Joey Hess
fc17cf852e further break out legacy chunking code 2014-08-08 13:17:24 -04:00
Joey Hess
c784ef4586 unify exception handling into Utility.Exception
Removed old extensible-exceptions, only needed for very old ghc.

Made webdav use Utility.Exception, to work after some changes in DAV's
exception handling.

Removed Annex.Exception. Mostly this was trivial, but note that
tryAnnex is replaced with tryNonAsync and catchAnnex replaced with
catchNonAsync. In theory that could be a behavior change, since the former
caught all exceptions, and the latter don't catch async exceptions.

However, in practice, nothing in the Annex monad uses async exceptions.
Grepping for throwTo and killThread only find stuff in the assistant,
which does not seem related.

Command.Add.undo is changed to accept a SomeException, and things
that use it for rollback now catch non-async exceptions, rather than
only IOExceptions.
2014-08-07 22:03:29 -04:00
Joey Hess
2dd8dab314 WebDAV: Avoid buffering whole file in memory when uploading.
The httpStorer will later also be used by S3.

This commit was sponsored by Torbjørn Thorsen.
2014-08-07 19:32:23 -04:00
Joey Hess
fc4b3cdcce webdav: reuse http connection when operating on the chunks of a file
For both new and legacy chunks.

Massive speed up!

This commit was sponsored by Dominik Wagenknecht.
2014-08-07 18:33:14 -04:00
Joey Hess
0b1b85d9ea use DAV monad
This speeds up the webdav special remote somewhat, since it often now
groups actions together in a single http connection when eg, storing a
file.

Legacy chunks are still supported, but have not been sped up.

This depends on a as-yet unreleased version of DAV.

This commit was sponsored by Thomas Hochstein.
2014-08-07 17:32:57 -04:00
Joey Hess
aacb0b2823 convert WebDAV to new special remote interface, adding new-style chunking support
Reusing http connection when operating on chunks is not done yet,
I had to submit some patches to DAV to support that. However, this is no
slower than old-style chunking was.

Note that it's a fileRetriever and a fileStorer, despite DAV using
bytestrings that would allow streaming. As a result, upload/download of
encrypted files is made a bit more expensive, since it spools them to temp
files. This was needed to get the progress meters to work.

There are probably ways to avoid that.. But it turns out that the current
DAV interface buffers the whole file content in memory, and I have
sent in a patch to DAV to improve its interfaces. Using the new interfaces,
it's certainly going to need to be a fileStorer, in order to read the file
size from the file (getting the size of a bytestring would destroy
laziness). It should be possible to use the new interface to make it be a
byteRetriever, so I'll change that when I get to it.

This commit was sponsored by Andreas Olsson.
2014-08-06 16:57:06 -04:00
Joey Hess
8025decc7f run Preparer to get Remover and CheckPresent actions
This will allow special remotes to eg, open a http connection and reuse it,
while checking if chunks are present, or removing chunks.

S3 and WebDAV both need this to support chunks with reasonable speed.

Note that a special remote might want to cache a http connection across
multiple requests. A simple case of this is that CheckPresent is typically
called before Store or Remove. A remote using this interface can certianly
use a Preparer that eg, uses a MVar to cache a http connection.

However, it's up to the remote to then deal with things like stale or
stalled http connections when eg, doing a series of downloads from a remote
and other places. There could be long delays between calls to a remote,
which could lead to eg, http connection stalls; the machine might even
move to a new network, etc.

It might be nice to improve this interface later to allow
the simple case without needing to handle the full complex case.
One way to do it would be to have a `Transaction SpecialRemote cache`,
where SpecialRemote contains methods for Storer, Retriever, Remover, and
CheckPresent, that all expect to be passed a `cache`.
2014-08-06 14:28:36 -04:00
Joey Hess
b4cf22a388 pushed checkPresent exception handling out of Remote implementations
I tend to prefer moving toward explicit exception handling, not away from
it, but in this case, I think there are good reasons to let checkPresent
throw exceptions:

1. They can all be caught in one place (Remote.hasKey), and we know
   every possible exception is caught there now, which we didn't before.
2. It simplified the code of the Remotes. I think it makes sense for
   Remotes to be able to be implemented without needing to worry about
   catching exceptions inside them. (Mostly.)
3. Types.StoreRetrieve.Preparer can only work on things that return a
   Bool, which all the other relevant remote methods already did.
   I do not see a good way to generalize that type; my previous attempts
   failed miserably.
2014-08-06 13:45:19 -04:00
Joey Hess
22c7a7a41a make local gcrypt storeKey be atomic
Reuse Remote.Directory's code.
2014-08-04 09:35:57 -04:00
Joey Hess
00c1468160 gcrypt: fix removal of key that does not exist
Generalized code from Remote.Directory and reused it.

Test suite now passes for local gcrypt repos.
2014-08-04 09:01:40 -04:00
Joey Hess
6f4592966d make testremote work with gcrypt repos
This involved making Remote.Gcrypt.gen expect a Repo with a regular,
non-gcrypt path. Since tht is what's stored as the Remote's gitrepo,
testremote can then modify it and feed it back into gen.
2014-08-04 08:42:04 -04:00
Joey Hess
d3778e631b remove write bit when storing to local gcrypt repo
Same as is done by rsync, and for regular git repos.
2014-08-03 20:25:44 -04:00
Joey Hess
d12becfdde fix removal from local gcrypt repo that had files stored using rsync
When files are stored using rsync, they have their write bit removed;
so does the directory they're put in. The local repo code did not turn
these bits back on, so failed to remove.
2014-08-03 20:21:46 -04:00
Joey Hess
8601f8f571 when not using rsync (for local gcrypt repo), display own progress meter 2014-08-03 20:19:04 -04:00
Joey Hess
1cd2273035 finally properly fixed ssh zombie leak
The leak was caused by the thread that sshd'd to send transferinfo
not waiting on its ssh. Doh.
2014-08-03 20:14:20 -04:00
Joey Hess
b35f7983ff convert gcrypt to new regime, including chunking
Some reorg of Remote.Rsync code to export the things gcrypt needs.
2014-08-03 17:31:10 -04:00
Joey Hess
f5f961215b finish making rsync support chunking
This breaks gcrypt, which relies on some internals of the rsync remote.
To fix next..
2014-08-03 16:54:57 -04:00
Joey Hess
6c450aad1d move ugly rsync zombie workaround
This reaping of any processes came to cause me problems when redoing the
rsync special remote -- a gpg process that was running gets waited on and
the place that then checks its return code fails.

I cannot reproduce any zombies when using the rsync special remote.
But I still can when using a normal git remote, accessed over ssh.
There is 1 zombie per file downloaded without this horrible hack enabled.

So, move the hack to only be used in that case.
2014-08-03 16:53:29 -04:00
Joey Hess
b3fe23b552 remove redundant progress meter display code
specialRemote handles all meter display, so this is redundant.
2014-08-03 16:18:40 -04:00
Joey Hess
4b16989e98 roll ChunkedEncryptable into Special and improve interface
Allow disabling progress displays, for eg, rsync.
2014-08-03 15:40:01 -04:00
Joey Hess
00f92a7e59 whitespace 2014-08-03 01:21:38 -04:00
Joey Hess
d05b7b9182 better byteRetriever
Make the byteRetriever be passed the callback that consumes the bytestring.

This way, there's no worries about the lazy bytestring not all being read
when the resource that's creating it is closed.

Which in turn lets bup, ddar, and S3 each switch from using an unncessary
fileRetriver to a byteRetriever. So, more efficient on chunks and encrypted
files.

The only remaining fileRetrievers are hook and external, which really do
retrieve to files.
2014-08-03 01:12:24 -04:00
Joey Hess
19b71cfb8f convert ddar to new ChunkedEncryptable API (but do not support chunking)
Since ddar de-deuplicates, I assume there is no benefit from chunking.

This has not been tested!
2014-08-02 18:58:48 -04:00
Joey Hess
b261df735d convert bup to new ChunkedEncryptable API (but do not support chunking)
bup already splits files and does rolling deltas, so there is no reason to
use chunking here.

The new API made it easier to add progress support for storeKey, so that's
done. Unfortunately, bup-split still outputs its own progress with -q,
so a little ugly, but not too bad.

Made dropping remove the branch for an object, for two reasons:

1. The new API calls removeKey to roll back a storeKey when the content
   changed unexpectedly.
2. So that testremote will be happy.

Also, fixed a bug that caused a crash when removing the branch for an
object in rollback.
2014-08-02 18:48:49 -04:00
Joey Hess
7f5cd868d7 hook: use ChunkedEncryptable 2014-08-02 17:25:16 -04:00
Joey Hess
0eb1f057c4 convert glacier to new ChunkedEncryptable API (but do not support chunking)
Chunking would complicate the assistant's code that checks when a pending
retrieval of a key from glacier is done. It would perhaps be nice to
support it to allow resuming, but not right now.

Converting to the new API still simplifies the code.
2014-08-02 16:59:07 -04:00
Joey Hess
32e4368377 S3: support chunking
The assistant defaults to 1MiB chunk size for new S3 special remotes.
Which will work around a couple of bugs:
  http://git-annex.branchable.com/bugs/S3_memory_leaks/
  http://git-annex.branchable.com/bugs/S3_upload_not_using_multipart/
2014-08-02 15:51:58 -04:00
Joey Hess
c3750901d8 specialize Preparer a bit, so resourcePrepare can be added
The forall a. in Preparer made resourcePrepare not seem to be usable, so
I specialized a to Bool. Which works for both Preparer Storer and
Preparer Retriever, but wouldn't let the Preparer be used for hasKey
as it currently stands.
2014-08-02 15:34:09 -04:00
Joey Hess
de0da0aece minor optimisation 2014-08-01 17:18:39 -04:00
Joey Hess
3991327d09 testremote: Test retrieveKeyFile resume
And fixed a bug found by these tests; retrieveKeyFile would fail
when the dest file was already complete.

This commit was sponsored by Bradley Unterrheiner.
2014-08-01 17:16:20 -04:00
Joey Hess
9636cfd9e1 fix a fenchpost bug when resuming chunked store at end
Discovered thanks to testremote command!
2014-08-01 16:29:39 -04:00
Joey Hess
8fce4e4bd7 fix chunk=0
Found by testremote
2014-08-01 15:36:11 -04:00
Joey Hess
b5ac627fee WebDAV: Dropped support for DAV before 0.6.1.
0.6.1 is in testing, and stable does not have DAV at all, so I can dispense
with this compatability code
2014-07-30 11:20:35 -04:00
Joey Hess
89416ba2d9 only chunk stable keys
The content of unstable keys can potentially be different in different
repos, so eg, resuming a chunked upload started by another repo would
corrupt data.
2014-07-30 10:34:39 -04:00
Joey Hess
a963d790d3 update progress after each chunk, at least
This way, when the remote implementation neglects to update progress,
there will still be a somewhat useful progress display, as long as chunks
are used.
2014-07-29 20:31:16 -04:00
Joey Hess
444944c7a9 fix cleanup of FileContents once done when them when retrieving 2014-07-29 20:27:13 -04:00
Joey Hess
53b87a859e optimise case of remote that retrieves FileContent, when chunks and encryption are not being used
No need to read whole FileContent only to write it back out to a file in
this case. Can just rename! Yay.

Also indidentially, fixed an attempt to open a file for write that was
already opened for write, which caused a crash and deadlock.
2014-07-29 20:10:14 -04:00
Joey Hess
c0dc134cde support chunking for all external special remotes!
Removing code and at the same time adding great features, including
upload/download resuming.

This commit was sponsored by Romain Lenglet.
2014-07-29 18:50:20 -04:00
Joey Hess
bc9e4697b9 better type for Retriever
Putting a callback in the Retriever type allows for the callback to
remove the retrieved file when it's done with it.

I did not really want to make Retriever be fixed to Annex Bool,
but when I tried to use Annex a, I got into some type of type mess.
2014-07-29 18:41:41 -04:00
Joey Hess
47e522979c allow Retriever action to update the progress meter
Needed for eg, Remote.External.

Generally, any Retriever that stores content in a file is responsible for
updating the meter, while ones that procude a lazy bytestring cannot update
the meter, so are not asked to.
2014-07-29 17:18:49 -04:00
Joey Hess
1d263e1e7e lift types from IO to Annex
Some remotes like External need to run store and retrieve actions in Annex,
not IO. In order to do that lift, I had to dive pretty deep into the
utilities, making Utility.Gpg and Utility.Tmp be partly converted to using
MonadIO, and Control.Monad.Catch for exception handling.

There should be no behavior changes in this commit.

This commit was sponsored by Michael Barabanov.
2014-07-29 16:28:44 -04:00
Joey Hess
f5af470875 add ContentSource type, for remotes that act on files rather than ByteStrings
Note that currently nothing cleans up a ContentSource's file, when eg,
retrieving chunks.
2014-07-29 15:16:12 -04:00
Joey Hess
216fdbd6bd fix non-checked hasKeyChunks 2014-07-29 15:07:32 -04:00
Joey Hess
58f727afdd resume interrupted chunked uploads
Leverage the new chunked remotes to automatically resume uploads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundry.

But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!

This implementation will also allow starting an upload from one repository,
interrupting it, and then resuming the upload to the same remote from
an entirely different repository.

Note that I added a comment that storeKey should atomically move the content
into place once it's all received. This was already an undocumented
requirement -- it's necessary for hasKey to work reliably. This resume code
just uses hasKey to find the first chunk that's missing.

Note that if there are two uploads of the same key to the same chunked remote,
one might resume at the point the other had gotten to, but both will then
redundantly upload. As before.

In the non-resume case, this adds one hasKey call per storeKey, and only
if the remote is configured to use chunks. Future work: Try to eliminate that
hasKey. Notice that eg, `git annex copy --to` checks if the key is present
before sending it, so is already running hasKey.. which could perhaps
be cached and reused.

However, this additional overhead is not very large compared with
transferring an entire large file, and the ability to resume
is certianly worth it. There is an optimisation in place for small files,
that avoids trying to resume if the whole file fits within one chunk.

This commit was sponsored by Georg Bauer.
2014-07-28 14:35:52 -04:00
Joey Hess
153ace4524 fix handling of removal of keys that are not present 2014-07-28 14:14:01 -04:00
Joey Hess
80cc554c82 add ChunkMethod type and make Logs.Chunk use it, rather than assuming fixed size chunks (so eg, rolling hash chunks can be supported later)
If a newer git-annex starts logging something else in the chunk log, it
won't be used by this version, but it will be preserved when updating the
log.
2014-07-28 13:19:08 -04:00
Joey Hess
9d4a766cd7 resume interrupted chunked downloads
Leverage the new chunked remotes to automatically resume downloads.
Sort of like rsync, although of course not as efficient since this
needs to start at a chunk boundry.

But, unlike rsync, this method will work for S3, WebDAV, external
special remotes, etc, etc. Only directory special remotes so far,
but many more soon!

This implementation will also properly handle starting a download
from one remote, interrupting, and resuming from another one, and so on.

(Resuming interrupted chunked uploads is similarly doable, although
slightly more expensive.)

This commit was sponsored by Thomas Djärv.
2014-07-27 18:56:32 -04:00
Joey Hess
2996f0eb05 use existing chunks even when chunk=0
When chunk=0, always try the unchunked key first. This avoids the overhead
of needing to read the git-annex branch to find the chunkcount.

However, if the unchunked key is not present, go on and try the chunks.

Also, when removing a chunked key, update the chunkcounts even when
chunk=0.
2014-07-27 02:13:51 -04:00
Joey Hess
7afb057d60 reorg 2014-07-27 01:24:34 -04:00
Joey Hess
bffd0e34b3 comment typo 2014-07-27 01:22:51 -04:00
Joey Hess
c3af4897c0 faster storeChunks
No need to process each L.ByteString chunk, instead ask it to split.

Doesn't seem to have really sped things up much, but it also made the code
simpler.

Note that this does (and already did) buffer in memory. It seems that only
the directory special remote could take advantage of streaming chunks to
files w/o buffering, so probably won't add an interface to allow for that.
2014-07-27 01:18:38 -04:00
Joey Hess
f3e47b16a5 better Preparer interface
This will allow things like WebDAV to opean a single persistent connection
and reuse it for all the chunked data.

The crazy types allow for some nice code reuse.
2014-07-27 00:30:04 -04:00
Joey Hess
9a8c4bb21f improve exception handling
Push it down from needing to be done in every Storer,
to being checked once inside ChunkedEncryptable.

Also, catch exceptions from PrepareStorer and PrepareRetriever,
just in case..
2014-07-26 23:26:10 -04:00
Joey Hess
867fd116a7 better exception display 2014-07-26 23:01:44 -04:00
Joey Hess
0d89b65bfc fix key checking when a directory special remote's directory is missing
The best thing to do in this case is return Left, so that anything that
tries to access it will fail.
2014-07-26 22:52:47 -04:00
Joey Hess
93be3296fc fix another fallback bug 2014-07-26 22:47:52 -04:00
Joey Hess
86e8532c0a allM has slightly better memory use 2014-07-26 22:34:40 -04:00
Joey Hess
67975bf50d fix fallback to other chunk size when first does not have it 2014-07-26 22:25:50 -04:00
Joey Hess
adb6ca62ca fix build 2014-07-26 20:21:36 -04:00
Joey Hess
34c6fdf5e3 fix build 2014-07-26 20:21:10 -04:00
Joey Hess
b2922c1d6d convert directory special remote to using ChunkedEncryptable
And clean up legacy chunking code, which is in its own module now.

So much cleaner!

This commit was sponsored by Henrik Ahlgren
2014-07-26 20:19:24 -04:00
Joey Hess
1400cbb032 Support for remotes that are chunkable and encryptable.
I'd have liked to keep these two concepts entirely separate,
but that are entagled: Storing a key in an encrypted and chunked remote
need to generate chunk keys, encrypt the keys, chunk the data, encrypt the
chunks, and send them to the remote. Similar for retrieval, etc.

So, here's an implemnetation of all of that.

The total win here is that every remote was implementing encrypted storage
and retrival, and now it can move into this single place. I expect this
to result in several hundred lines of code being removed from git-annex
eventually!

This commit was sponsored by Henrik Ahlgren.
2014-07-26 20:14:31 -04:00
Joey Hess
d4d68f57e5 finish up basic chunked remote groundwork
Chunk retrieval and reassembly, removal, and checking if all necessary
chunks are present.

This commit was sponsored by Damien Raude-Morvan.
2014-07-26 20:11:41 -04:00
Joey Hess
cf83697c33 reorg 2014-07-26 12:04:35 -04:00
Joey Hess
e4cb50db33 Merge branch 'master' into newchunks 2014-07-26 12:02:48 -04:00
Joey Hess
005aded3e0 Fix cost calculation for non-encrypted remotes.
Encyptable types of remotes that were not actually encrypted still had
the encryptedRemoteCostAdj applied to their configured cost, which was a
bug.
2014-07-25 17:29:59 -04:00
Joey Hess
9e8a4a0950 support new style chunking in directory special remote
Only when storing non-encrypted so far, not retrieving or checking if a key
is present or removing.

This commit was sponsored by Renaud Casenave-Péré.
2014-07-25 16:21:01 -04:00
Joey Hess
ab4cce4114 core implementation of new style chunking
Not yet used by any special remotes, but should not be too hard to add it
to most of them.

storeChunks is the hairy bit! It's loosely based on
Remote.Directory.storeLegacyChunked. The object is read in using a lazy
bytestring, which is streamed though, creating chunks as needed, without
ever buffering more than 1 chunk in memory.

Getting the progress meter update to work right was also fun, since
progress meter values are absolute. Finessed by constructing an offset
meter.

This commit was sponsored by Richard Collins.
2014-07-25 16:20:32 -04:00
Joey Hess
ceea04e77f move meteredWriteFileChunks out of legacy 2014-07-24 16:42:35 -04:00
Joey Hess
e2c44bf656 implement chunk logs
Slightly tricky as they are not normal UUIDBased logs, but are instead maps
from (uuid, chunksize) to chunkcount.

This commit was sponsored by Frank Thomas.
2014-07-24 16:23:36 -04:00
Joey Hess
bbdb2c04d5 improve chunk data types 2014-07-24 15:08:07 -04:00
Joey Hess
9e2d49d441 prepare for new style chunking
Moved old legacy chunking code, and cleaned up the directory and webdav
remotes use of it, so when no chunking is configured, that code is not
used.

The config for new style chunking will be chunk=1M instead of chunksize=1M.

There should be no behavior changes from this commit.

This commit was sponsored by Andreas Laas.
2014-07-24 14:49:22 -04:00
Joey Hess
ec5ed2af9d Set gcrypt-publish-participants when setting up a gcrypt repository, to avoid unncessary passphrase prompts.
This is a security/usability tradeoff. To avoid exposing the gpg key ids
who can decrypt the repository, users can unset
gcrypt-publish-participants.

The gcrypt-publish-participants option is available in my fork of
git-remote-gcrypt.

This commit was sponsored by Christopher Kernahan.
2014-07-15 17:33:14 -04:00
Joey Hess
cdf61071bc optimise handling of unavailable repos
The exception handling resulted in git config --list being run twice
for unavailable repos. This dials it back down to running it only once.
2014-07-15 14:45:27 -04:00
Joey Hess
bd514eb65a catch exception when repo is really not available 2014-07-15 14:39:31 -04:00
Joey Hess
522a0922b8 sync: Fix git sync with local git remotes even when they don't have an annex.uuid set.
Catch an exception when ensureInitialized is run in a non-initted
repository. In this case, just read the git config, so that the Git.Repo
object is not LocalUnknown, which is what is used to represent remotes
on eg, drives that are not connected.

The assistant already got this right, and like with the assistant, this
causes an implicit git-annex init of the local remote on the second sync,
once the git-annex branch has been pushed to it.
See this comment for more analysis:
http://git-annex.branchable.com/todo/Recovering_from_a_bad_sync/#comment-64e469a2c1969829ee149cbb41b1c138

This commit was sponsored by jscit.
2014-07-15 14:27:43 -04:00
Joey Hess
604740b720 S3: Deal with AWS ACL configurations that do not allow creating or checking the location of a bucket, but only reading and writing content to it. 2014-07-11 15:21:43 -04:00
Joey Hess
26ee27915a refactor locking 2014-07-10 00:32:23 -04:00
Joey Hess
a44fd2c019 export CreateProcess fields from Utility.Process
update code to avoid cwd and env redefinition warnings
2014-06-10 19:20:14 -04:00
Joey Hess
2f84659d51 fix build with old versions of bytestring 2014-06-06 14:04:35 -04:00
Joey Hess
0c2a14e4aa fix dodgy use of Char8
I don't know if this was a bug, but I don't know if it was not a bug
either.

See also,
http://git-annex.branchable.com/bugs/Truncated_file_transferred_via_S3/
where the file is not truncated, but mangled..
2014-05-27 20:31:25 -04:00
Joey Hess
c07343e4f7 initremote/enableremote: Basic support for using with regular git remotes
initremote stores the location of an already existing git remote, and
enableremote setups up a remote using its stored location.
2014-05-22 13:42:17 -04:00
Joey Hess
c34b5e09f8 factor out getRemoteGitConfig 2014-05-16 16:08:20 -04:00
Fraser Tweedale
4eb72392b4 execute remote.<name>.annex-shell on remote, if set
It is useful to be able to specify an alternative git-annex-shell
program to execute on the remote, e.g., to run a version not on the
PATH.  Use remote.<name>.annex-shell if specified, instead of the
default "git-annex-shell" i.e., first so-named executable on the
PATH.
2014-05-16 15:46:43 -04:00
Joey Hess
0b899fa2f1 show a much longer message when annex-ignore is automatically set, to help the user fix their problem 2014-05-16 12:58:50 -04:00
Joey Hess
b1cddea7e4 remove odd character that snuck in somehow and broke build 2014-05-15 16:36:19 -04:00
Robie Basak
4184566627 ddar special remote 2014-05-15 16:32:44 -04:00
Joey Hess
f00cb21037 Bring back rsync -p, but only when git-annex is running on a non-crippled file system. This is a better approach to fix #700282 while not unncessarily losing file permissions on non-crippled systems. 2014-04-17 14:31:42 -04:00
Joey Hess
5af30678c7 factored out Utility.SimpleProtocol from the external special remote implementation 2014-04-05 13:29:28 -04:00
Joey Hess
3b8d5f03bb Fix glacier repo creation bug
Version 5.20140227 broke creation of glacier repositories, not including
the datacenter and vault in their configuration. This bug is fixed, but
glacier repositories set up with the broken version of git-annex need to
have the datacenter and vault set in order to be usable. This can be done
using git annex enableremote to add the missing settings. For details, see
http://git-annex.branchable.com/bugs/problems_with_glacier/
2014-03-27 14:30:36 -04:00
Alberto Berti
0f7c2dd39b Fix thaoe remote to work with latest tahoe (v. 1.10.0) 2014-03-26 00:31:02 +01:00
Joey Hess
e426fac273 add desktop notifications
Motivation: Hook scripts for nautilus or other file managers
need to provide the user with feedback that a file is being downloaded.

This commit was sponsored by THM Schoemaker.
2014-03-22 14:12:19 -04:00
Joey Hess
40b599eff2 rsync special remote: Fix slashes when used on Windows. 2014-03-18 13:02:10 -04:00
Joey Hess
b63276309e clean up cleanup action enumeration 2014-03-13 19:06:26 -04:00
Joey Hess
4d06037fdd Fix zombie leak and general inneficiency when copying files to a local git repo.
Benchmarking this with 1000 small files being copied, the time reduced from
15.98s to 14.64s -- an 8% improvement in the non-data-transfer overhead of
git-annex copy.
2014-03-06 17:13:27 -04:00
Joey Hess
aa377ed567 webdav: When built with a new enough haskell DAV (0.6), disable the http response timeout, which was only 5 seconds. 2014-03-05 13:51:54 -04:00
Joey Hess
1f98d6fb00 glacier: Pass --region to glacier checkpresent.
I suppose this is not necessary when it has a local cache, so I didn't
notice it was missing.
2014-03-04 23:22:24 -04:00
Joey Hess
a1432bce2f Put non-object tmp files in .git/annex/misctmp, leaving .git/annex/tmp for only partially transferred objects.
This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO
of temp object files is too annoying (and if you don't want to keep
partially transferred objects across reboots).

.git/annex/misctmp must be on the same filesystem as the git work tree,
since files are moved to there in a way that will not work cross-device,
as well as symlinked into there.

I first wanted to put the tmp objects in .git/annex/objects/tmp, but
that would pose transition problems on upgrade when partially transferred
objects existed.

git annex info does not currently show the size of .git/annex/misctemp,
since it should stay small. It would also be ok to make something clean it
out, periodically.
2014-02-26 16:52:56 -04:00
Joey Hess
2aeb0750f9 more DAV url fixes for windows 2014-02-25 16:16:14 -04:00
Joey Hess
b1931d1cc1 add protocol-level debugging for dav 2014-02-25 15:58:44 -04:00
Joey Hess
2b66aaa763 Windows webdav: Fix DOS path separator bug.
Use posix </> etc for urls.
2014-02-25 15:26:33 -04:00
Joey Hess
360ecb9f35 fix bare repo optimisation on Windows 2014-02-25 13:47:09 -04:00
Joey Hess
06142f4943
fix #740010 properly 2014-02-25 01:55:01 -04:00
Joey Hess
003fc2b7e1
add UrlOptions sum type 2014-02-24 22:00:25 -04:00
Joey Hess
c69d6eb035 Make annex.web-options be used in several places that call curl. 2014-02-24 21:29:37 -04:00
Joey Hess
d5a2b498f6 webdav: When built with DAV 0.6.0, use the new DAV monad to avoid locking files, which is not needed by git-annex's use of webdav, and does not work on Box.com. 2014-02-24 18:21:51 -04:00
Joey Hess
45e7040142 webapp: Fix creation of box.com, S3, and Glacier repositories, broken in 5.20140221. 2014-02-24 15:29:17 -04:00
Joey Hess
ded4ab5704 Fix handling of rsync remote urls containing a username, including rsync.net.
This breakage seems to have been caused way back in a1eded86,
but I am pretty sure rsync.net support has not been entirely
broken since last April. AFAICS, the generated .ssh/config
has not changed since then -- it has never included a Username setting
line. So, I am puzzled at when this reversion was introduced.

Note that the breakage only affected checkpresent and remove. Upload and
download use the ssh connection caching, which includes a -l username.
2014-02-21 13:20:57 -04:00
Joey Hess
7d288d83c9 glacier: Do not try to run glacier value create when an existing glacier remote is enabled. 2014-02-20 15:56:26 -04:00
Joey Hess
4e0be2792b remove Read instance for Ref
Removed instance, got it all to build using fromRef. (With a few things
that really need to show something using a ref for debugging stubbed out.)

Then added back Read instance, and made Logs.View use it for serialization.
This changes the view log format.
2014-02-19 01:19:57 -04:00
Joey Hess
7b19c7d25b cleanup thanks to Utility.PID 2014-02-11 15:39:51 -04:00
Joey Hess
fa24ba2520 plumb creds from webapp to initremote
Avoids abusing setting environment variables, which was always a hack
and won't work on windows.
2014-02-11 14:07:56 -04:00
Joey Hess
e885080d06 Add progress display for transfers to/from external special remotes. 2014-02-10 21:33:22 -04:00
Joey Hess
08afe3a1f6 fix failing test case on Windows
ensure file being modified is all read before it's opened for write
2014-02-03 10:20:18 -04:00
Joey Hess
1572c460e8 avoid using openFile when withFile can be used
Potentially fixes some FD leak if an action on an opened file handle fails
for some reason. There have been some hard to reproduce reports of
git-annex leaking FDs, and this may solve them.
2014-02-03 10:19:06 -04:00
Joey Hess
089c0109a2 Added ways to configure rsync options to be used only when uploading or downloading from a remote. Useful to eg limit upload bandwidth. 2014-02-02 16:06:34 -04:00
Joey Hess
070ed4a766 change a few renameFile's to rename
AFAIK, none of these ever operate on directories, but nor do I want to
explicitly check if they're files and fail if not.
2014-01-29 15:21:02 -04:00
Joey Hess
891c85cd88 use locking on Windows
This is all the easy cases, where there was already a separate lock file.
2014-01-28 14:42:03 -04:00
Joey Hess
74b101d1dd reorg 2014-01-26 16:36:31 -04:00
Joey Hess
1ca111620d reorg 2014-01-26 16:32:55 -04:00
Joey Hess
5fc2d760ea Optimise non-bare http remotes; no longer does a 404 to the wrong url every time before trying the right url. Needs annex-bare to be set to false, which is done when initially probing the uuid of a http remote. 2014-01-26 13:03:25 -04:00
Joey Hess
b40df4f0d0 reorganize numcopies code (no behavior changes)
Move stuff into Logs.NumCopies. Add a NumCopies newtype.

Better names for various serialization classes that are specific to one
thing or another.
2014-01-21 16:08:59 -04:00
Joey Hess
b6ba0bd556 sync --content: New option that makes the content of annexed files be transferred.
Similar to the assistant, this honors any configured preferred content
expressions.

I am not entirely happpy with the implementation. It would be nicer if
the seek function returned a list of actions which included the individual
file gets and copies and drops, rather than the current list of calls to
syncContent. This would allow getting rid of the somewhat reundant display
of "sync file [ok|failed]" after the get/put display.

But, do that, withFilesInGit would need to somehow be able to construct
such a mixed action list. And it would be less efficient than the current
implementation, which is able to reuse several values between eg get and
drop.

Note that currently this does not try to satisfy numcopies when
getting/putting files (numcopies are of course checked when dropping
files!) This makes it like the assistant, and unlike get --auto
and copy --auto, which do duplicate files when numcopies is not yet
satisfied. I don't know if this is the right decision; it only seemed to
make sense to have this parallel the assistant as far as possible to start
with, since I know the assistant works.

This commit was sponsored by Øyvind Andersen Holm.
2014-01-19 17:49:54 -04:00
Joey Hess
0d544649d0 catch exception checking if url exists when network is disconnected
Leads to better failure message (or possibly fallback to another remote).
2014-01-16 21:24:17 -04:00
Joey Hess
207ac67aaa avoid needing a build-dep on hxt for Data.AssocList 2014-01-14 16:42:10 -04:00
Joey Hess
d07f2d7865 Fix a long-standing bug that could cause the wrong index file to be used when committing to the git-annex branch, if GIT_INDEX_FILE is set in the environment. This typically resulted in git-annex branch log files being committed to the master branch and later showing up in the work tree. (These log files can be safely removed.) 2014-01-14 15:36:33 -04:00
Joey Hess
c20f31a1ad add GETAVAILABILITY to external special remote protocol
And some reworking of types, and added an annex-availability git config
setting.
2014-01-13 14:41:10 -04:00
Joey Hess
57edce8ad9 external special remote protocol: Added GETGITDIR. 2014-01-13 14:00:09 -04:00
Joey Hess
d8fb366cf7 forgot to delay for 1 second in busy wait loop 2014-01-08 19:58:47 -04:00
Joey Hess
215ea66471 only run tahoe start once 2014-01-08 19:17:18 -04:00
Joey Hess
93161d0dea copyright year 2014-01-08 16:29:15 -04:00
Joey Hess
85272d8a98 Added tahoe special remote.
Known problems:

1. Tries to tahoe start when daemon is already running.

2. If multiple tahoe remotes are set up on the same computer,
   they will have the same node.url configured by default,
   and this confuses tahoe commands.

This commit was sponsored by LeastAuthority.com
2014-01-08 16:14:41 -04:00
Joey Hess
5e23dfabd6 add DEBUG 2014-01-07 13:23:58 -04:00
Joey Hess
614986b19a show PATH on failure 2014-01-07 12:59:26 -04:00
Joey Hess
3e68c1c2fd add remote state logs
This allows a remote to store a piece of arbitrary state associated with a
key. This is needed to support Tahoe, where the file-cap is calculated from
the data stored in it, and used to retrieve a key later. Glacier also would
be much improved by using this.

GETSTATE and SETSTATE are added to the external special remote protocol.

Note that the state is left as-is even when a key is removed from a remote.
It's up to the remote to decide when it wants to clear the state.

The remote state log, $KEY.log.rmt, is a UUID-based log. However,
rather than using the old UUID-based log format, I created a new variant
of that format. The new varient is more space efficient (since it lacks the
"timestamp=" hack, and easier to parse (and the parser doesn't mess with
whitespace in the value), and avoids compatability cruft in the old one.

This seemed worth cleaning up for these new files, since there could be a
lot of them, while before UUID-based logs were only used for a few log
files at the top of the git-annex branch. The transition code has also
been updated to handle these new UUID-based logs.

This commit was sponsored by Daniel Hofer.
2014-01-03 16:35:57 -04:00
Joey Hess
f7727d2df1 Remotes can now be made read-only, by setting remote.<name>.annex-readonly 2014-01-02 13:12:32 -04:00
Joey Hess
8e3032df2d added GETWANTED, SETWANTED for Tobias's flickr remote
This was unexpectedly difficult because of a depdenency cycle. To parse a
preferred content expression involves several things that need to operate
on the list of remotes. Which needs Remote.External. The only way to avoid
this cycle (I tried breaking it at several points) was to skip parsing the
expression in SETWANTED.

That's sorta ok, because git-annex already has to deal with unparsable
preferred content expressions being stored, in order to handle eg,
upgrades. But I'm still not very happy that I cannot check it.

I feel this is a strong indication that I need to beware of further
bloating the special remote protocol interface.
2014-01-01 20:12:20 -04:00
Joey Hess
ed1fcab6d7 external special remote protocol: Added GETUUID. 2013-12-31 13:50:18 -04:00
Joey Hess
054e4f17e2 implement PREPARE-FAILURE for Tobias 2013-12-29 13:39:25 -04:00
Joey Hess
aa97a33dde better error messages when external special remote exits unexpectedly or is not in PATH 2013-12-27 17:14:44 -04:00
Joey Hess
445b7b41b9 add credential storage support for external special remotes & update example 2013-12-27 16:01:43 -04:00
Joey Hess
551573570f better protocol error message, indicate if the command was able to be parsed or was misplaced 2013-12-27 14:03:35 -04:00
Joey Hess
21342cae63 flush handle after writing message 2013-12-27 13:22:06 -04:00
Joey Hess
fa6f404a5f fix deadlock when state TMVar is empty 2013-12-27 13:17:22 -04:00
Joey Hess
9125a25738 defer SETSTATE and GETSTATE for now
TAHOE-LAFS may use these eventually, but that's TBD and none of git-annex's
own special remotes need that, except for the web special remote's urls.
2013-12-27 13:07:56 -04:00
Joey Hess
a7f3724e21 implement GETCONFIG and SETCONFIG
Changed protocol spec to make SETCONFIG only store it persistently when run
during INITREMOTE. I see no reason to support storing it persistently at
other times, and doing so would unnecessarily complicate the code.

Also, letting that be done would probably result in use for storing data that
doesn't really belong there, and special remote authors who don't
understand how the union merging works would probably be surprised the
results.
2013-12-27 12:37:23 -04:00
Joey Hess
91c9e98168 support encryption 2013-12-27 12:21:55 -04:00
Joey Hess
5d8ff64dc1 make --debug show transcript of special remote protocol messages 2013-12-27 03:10:00 -04:00
Joey Hess
3289155e28 don't send PREPARE before INITREMOTE
That complicated special remote programs, because they had to avoid making
PREPARE fail if some configuration is missing, because the remote might not
be initialized yet. Instead, complicate git-annex slightly by only sending
PREPARE immediately before some other request other than INITREMOTE (or
PREPARE of course).
2013-12-27 02:49:10 -04:00
Joey Hess
6d504b57e7 make some requests optional, simplify and future-proof protocol more 2013-12-27 02:11:06 -04:00
Joey Hess
6c565ec905 external special remotes mostly implemented (untested)
This has not been tested at all. It compiles!

The only known missing things are support for encryption, and for get/set
of special remote configuration, and of key state. (The latter needs
separate work to add a new per-key log file to store that state.)

Only thing I don't much like is that initremote needs to be passed both
type=external and externaltype=foo. It would be better to have just
type=foo

Most of this is quite straightforward code, that largely wrote itself given
the types. The only tricky parts were:

* Need to lock the remote when using it to eg make a request, because
  in theory git-annex could have multiple threads that each try to use
  a remote at the same time. I don't think that git-annex ever does
  that currently, but better safe than sorry.

* Rather than starting up every external special remote program when
  git-annex starts, they are started only on demand, when first used.
  This will avoid slowdown, especially when running fast git-annex query
  commands. Once started, they keep running until git-annex stops, currently,
  which may not be ideal, but it's hard to know a better time to stop them.

* Bit of a chicken and egg problem with caching the cost of the remote,
  because setting annex-cost in the git config needs the remote to already
  be set up. Managed to finesse that.

This commit was sponsored by Lukas Anzinger.
2013-12-26 18:23:13 -04:00
Joey Hess
8803e36814 future-proofing 2013-12-25 20:04:31 -04:00
Joey Hess
1dc930063a basic data types and serialization for external special remote protocol
This is mostly straightforward, but did turn out quite nicely stronly
typed, and with a quite nice automatic tokenization and parsing of received
messages.

Made a few minor changes to the protocol to clear up ambiguities and make
it easier to parse. Note particularly that setting remote configuration
is moved to a separate command, which allows a remote to set arbitrary data.
2013-12-25 17:54:57 -04:00
Joey Hess
011b8bc7ec pull in Win32-extras, to be able to get current process id in Windows
Fixed up a number of things that had worked around there not being a way to
get that.

Most notably, transfer info files on windows now include the process id,
since no locking is currently done. This means the file format varies
between windows and unix.
2013-12-11 00:15:10 -04:00
Joey Hess
e425a966ed Deal with box.com changing the url of their webdav endpoint.
Use new url when making new remotes.

Transparently rewrite old url to new for existing remotes.
2013-12-02 16:01:20 -04:00
Joey Hess
0a63ed563f rsync special remote: Fix fallback mode for rsync remotes that use hashDirMixed. Closes: #731142 2013-12-02 12:53:39 -04:00
Joey Hess
58db042033 map: Work when there are gcrypt remotes. 2013-11-04 14:14:44 -04:00
Joey Hess
2203690822 really fix gcrypt for 7be69a2491
Fixed all the other ones, but forgot to fix gcrypt!
2013-11-02 20:10:54 -04:00
Joey Hess
b2cca95d1c clean import list 2013-11-02 19:55:18 -04:00
Joey Hess
a04fe350b8 fix build 2013-11-02 19:54:59 -04:00
Joey Hess
7be69a2491 gcrypt, bup: Fix bug that prevented using these special remotes with encryption=pubkey.
I think both of these are all that's affected, but I went ahead and fixed
all the remotes that set their config to M.empty to instead store the
actual config. Who knows what will expect it to be actually present in
future, the Remote instance of getGpgEncParams came to..
2013-11-02 16:37:28 -04:00
Joey Hess
7ed8e87a34 assistant: Support repairing git remotes that are locally accessible
(eg, on removable drives)

gcrypt remotes are not yet handled.

This commit was sponsored by Sören Brunk.
2013-10-27 15:38:59 -04:00
Joey Hess
5756636486 directory, webdav: Fix bug introduced in version 4.20131002 that caused the chunkcount file to not be written. Work around repositories without such a file, so files can still be retreived from them. 2013-10-26 15:03:12 -04:00
Joey Hess
06ea92282f fix inverted logic when determining whether to write a chunkcount file
late-night hlint bit me on this one..
Reviewed c1990702e9 and
the rest of it seems ok
2013-10-26 14:08:29 -04:00
Joey Hess
c76c94a0da S3: Try to ensure bucket name is valid for archive.org. 2013-10-16 16:35:47 -04:00
Joey Hess
a6e9386d39 fix remote fsck to run in remote 2013-10-14 15:05:29 -04:00
Joey Hess
c78aaed317 ye olde inverted logic 2013-10-14 12:26:46 -04:00
Joey Hess
1ffb3bb0ba add remote fsck interface
Currently only implemented for local git remotes. May try to add support
to git-annex-shell for ssh remotes later. Could concevably also be
supported by some special remote, although that seems unlikely.

Cronner user this when available, and when not falls back to
fsck --fast --from remote

git annex fsck --from does not itself use this interface.
To do so, I would need to pass --fast and all other options that influence
fsck on to the git annex fsck that it runs inside the remote. And that
seems like a lot of work for a result that would be no better than
cd remote; git annex fsck
This may need to be revisited if git-annex-shell gets support, since it
may be the case that the user cannot ssh to the server to run git-annex
fsck there, but can run git-annex-shell there.

This commit was sponsored by Damien Diederen.
2013-10-11 16:03:18 -04:00
Joey Hess
747f5b123c url size fixes
addurl: Improve message when adding url with wrong size to existing file.
Before the message suggested the url didn't exist.

Fixed handling of URL keys that have no recorded size. Before, if the key
has no size, the url also had to not declare any size, which was unlikely
and wrong, or it was taken to not exist. This probably would mostly affect
keys that were added to the annex with addurl --relaxed.
2013-10-11 13:05:00 -04:00
Joey Hess
571fe4999b remove __WINDOWS__ ifdef 2013-10-06 17:23:30 -04:00
Joey Hess
0ede6b7def typoe and debug info 2013-10-01 19:10:45 -04:00
Joey Hess
bddfbef8be git-annex-shell gcryptsetup command
This was the least-bad alternative to get dedicated key gcrypt repos
working in the assistant.
2013-10-01 17:20:51 -04:00
Joey Hess
1536ebfe47 Disable receive.denyNonFastForwards when setting up a gcrypt special remote
gcrypt needs to be able to fast-forward the master branch. If a git
repository is set up with git init --shared --bare, it gets that set, and
pushing to it will then fail, even when it's up-to-date.
2013-10-01 15:23:48 -04:00
Joey Hess
101099f7b5 fix probing for local gcrypt repos 2013-10-01 14:38:20 -04:00
Joey Hess
995e1e3c5d fix transferring to gcrypt repo from direct mode repo
recvkey was told it was receiving a HMAC key from a direct mode repo,
and that confused it into rejecting the transfer, since it has no way to
verify a key using that backend, since there is no HMAC backend.

I considered making recvkey skip verification in the case of an unknown
backend. However, that could lead to bad results; a key can legitimately be
in the annex with a backend that the remote git-annex-shell doesn't know
about. Better to keep it rejecting if it cannot verify.

Instead, made the gcrypt special remote not set the direct mode flag when
sending (and receiving) files.

Also, added some recvkey messages when its checks fail, since otherwise
all that is shown is a confusing error message from rsync when the remote
git-annex-shell exits nonzero.
2013-10-01 14:19:24 -04:00
Joey Hess
12f6b9693a Send a git-annex user-agent when downloading urls.
Overridable with --user-agent option.

Not yet done for S3 or WebDAV due to limitations of libraries used --
nether allows a user-agent header to be specified.

This commit sponsored by Michael Zehrer.
2013-09-28 14:35:21 -04:00
Joey Hess
c6032b0dab clean up some ugly code 2013-09-27 19:52:36 -04:00
Joey Hess
e864c8d033 blind enabling gcrypt repos on rsync.net
This pulls off quite a nice trick: When given a path on rsync.net, it
determines if it is an encrypted git repository that the user has
the key to decrypt, and merges with it. This is works even when
the local repository had no idea that the gcrypt remote exists!

(As previously done with local drives.)

This commit sponsored by Pedro Côrte-Real
2013-09-27 16:21:56 -04:00
Joey Hess
e0b99f3960 support ssh://host/~/dir
When generating the path for rsync, /~/ is not valid, so change to
just host:dir

Note that git remotes specified in host:dir form are internally converted
to the ssh:// url form, so this was especially needed..
2013-09-26 15:02:27 -04:00