Commit graph

764 commits

Author SHA1 Message Date
Joey Hess
2def1d0a23 other 80% of avoding verification when hard linking to objects in shared repo
In c6632ee5c8, it actually only handled
uploading objects to a shared repository. To avoid verification when
downloading objects from a shared repository, was a lot harder.

On the plus side, if the process of downloading a file from a remote
is able to verify its content on the side, the remote can indicate this
now, and avoid the extra post-download verification.

As of yet, I don't have any remotes (except Git) using this ability.
Some more work would be needed to support it in special remotes.

It would make sense for tahoe to implicitly verify things downloaded from it;
as long as you trust your tahoe server (which typically runs locally),
there's cryptographic integrity. OTOH, despite bup being based on shas,
a bup repo under an attacker's control could have the git ref used for an
object changed, and so a bup repo shouldn't implicitly verify. Indeed,
tahoe seems unique in being trustworthy enough to implicitly verify.
2015-10-02 14:35:12 -04:00
Joey Hess
c6632ee5c8 avoid verification when hard linking to objects in shared repository
Such a repository is implicitly trusted, so there's no point.
2015-10-02 12:36:03 -04:00
Joey Hess
2fb3722ce9 Do verification of checksums of annex objects downloaded from remotes.
* When annex objects are received into git repositories, their checksums are
  verified then too.
* To get the old, faster, behavior of not verifying checksums, set
  annex.verify=false, or remote.<name>.annex-verify=false.
* setkey, rekey: These commands also now verify that the provided file
  matches the key, unless annex.verify=false.
* reinject: Already verified content; this can now be disabled by
  setting annex.verify=false.

recvkey and reinject already did verification, so removed now duplicate
code from them. fsck still does its own verification, which is ok since it
does not use getViaTmp, so verification doesn't happen twice when using fsck
--from.
2015-10-01 15:56:39 -04:00
Joey Hess
807ba6a903 refactor 2015-10-01 14:07:06 -04:00
Joey Hess
9e3ac97608 avoid deprecation warnings when built with http-client >= 0.4.18
Since I want git-annex to keep building on debian stable, I need to still
support the old http-client, which required explicit calls to
closeManager, or use of withManager to get Managers to close at appropriate
times. This is not needed in the new version, and so they added a
deprecation warning. IMHO much too early, because look at the mess I had to
go through to avoid that deprecation warning while supporting both
versions..
2015-10-01 13:48:56 -04:00
Joey Hess
20205b6073 avoid hard dependency on new version of aws 2015-09-22 11:04:26 -04:00
Joey Hess
26d6566307 S3 storage classes expansion
Added support for storageclass=STANDARD_IA to use Amazon's
new Infrequently Accessed storage.

Also allows using storageclass=NEARLINE to use Google's NearLine storage.

The necessary changes to aws to support this are in
https://github.com/aristidb/aws/pull/176
2015-09-17 17:20:01 -04:00
Joey Hess
ffa8221517 annex.hardlink extended to also try to use hard links when copying from the repository to a remote.
Also, it used to only check that one of the repos was not in direct mode;
now when either repo is direct mode, annex.hardlink won't have an effect.
2015-09-14 12:13:38 -04:00
Joey Hess
0390efae8c support gpg.program
When gpg.program is configured, it's used to get the command to run for
gpg. Useful on systems that have only a gpg2 command or want to use it
instead of the gpg command.
2015-09-09 18:06:49 -04:00
Joey Hess
6dad09a823 disable whereisKey for encrypted or chunked remotes
This only makes sense for public repos, that are not chunked, so
that there's a 1:1 from Key in the git-annex repo to file on the remote.
Rather than making every remote implementation deal with that, just disable
whereisKey when it doesn't make sense.
2015-08-19 14:16:01 -04:00
Joey Hess
858104078a make whereis show urls when web remote does not have content
This is needed when external special remotes register an url for a key.
2015-08-17 11:35:34 -04:00
Joey Hess
3a5b7dbaf0 External special remotes can now be built that can be used in readonly mode, where git-annex downloads content from the remote using regular http.
Note that, if an url is added to the web log for such a remote, it's not
distinguishable from another url that might be added for the web remote.
(Because the web log doesn't distinguish which remote owns a plain url.
Urls with a downloader set are distinguishable, but we're not using them
here.)

This seems ok-ish.. In such a case, both remotes will try to use both
urls, and both remotes should be able to.

The only issue I see is that dropping a file from the web remote will
remove both urls in this case. This is not often done, and could even
be considered a feature, I suppose.
2015-08-17 11:22:22 -04:00
Joey Hess
99b9a3f277 export some always failing methods for readonly remotes 2015-08-17 11:21:38 -04:00
Joey Hess
fb9d851258 refactor 2015-08-17 11:21:13 -04:00
Joey Hess
1cd3b7ddf0 refactor 2015-08-17 10:42:14 -04:00
Joey Hess
6bc46e384e Added WHEREIS to external special remote protocol. 2015-08-13 17:27:50 -04:00
Joey Hess
127c3db162 add some debugs to get timings
Note that I had one in Annex.Action.startup too, but it resulted in a weird
message printed by ssh, "channel 2: bad ext data". I don't know why, but
it only happened when transferinfo was run, so I wonder
if 983a95f021 introduced a fragility somehow.
2015-08-13 16:13:16 -04:00
Joey Hess
43aa881b47 --debug is passed along to git-annex-shell when git-annex is in debug mode. 2015-08-13 15:05:39 -04:00
Joey Hess
983a95f021 Sped up downloads of files from ssh remotes, reducing the non-data-transfer overhead 6x. 2015-08-13 14:20:28 -04:00
Joey Hess
f99ae3d713 remove debug print 2015-08-13 13:18:47 -04:00
Joey Hess
0ec9bc2200 Added support for SHA3 hashed keys (in 8 varieties), when git-annex is built using the cryptonite library.
While cryptohash has SHA3 support, it has not been updated for the final
version of the spec. Note that cryptonite has not been ported to all arches
that cryptohash builds on yet.
2015-08-06 15:02:25 -04:00
Joey Hess
c5b8484c2e Simplify setup process for a ssh remote.
Now it suffices to run git remote add, followed by git-annex sync. Now the
remote is automatically initialized for use by git-annex, where before the
git-annex branch had to manually be pushed before using git-annex sync.
Note that this involved changes to git-annex-shell, so if the remote is
using an old version, the manual push is still needed.

Implementation required git-annex-shell be changed, so configlist can
autoinit a repository even when no git-annex branch has been pushed yet.
Unfortunate because we'll have to wait for it to get deployed to servers
before being able to rely on this change in the documentation.

Did consider making git-annex sync push the git-annex branch to repos that
didn't have a uuid, but this seemed difficult to do without complicating it
in messy ways.

It would be cleaner to split a command out from configlist to handle
the initialization. But this is difficult without sacrificing backwards
compatability, for users of old git-annex versions which would not use the
new command.
2015-08-05 13:49:58 -04:00
Joey Hess
c0b598b7f1 remove unused imports 2015-07-31 15:50:07 -04:00
Joey Hess
506452012c Fix rsync special remote to work when -Jn is used for concurrent uploads. 2015-07-30 13:29:45 -04:00
Joey Hess
afe6a53bca Fix bug that prevented uploads to remotes using new-style chunking from resuming after the last successfully uploaded chunk.
"checkPresent baser" was wrong; the baser has a dummy checkPresent action
not the real one. So, to fix this, we need to call preparecheckpresent to
get a checkpresent action that can be used to check if chunks are present.

Note that, for remotes like S3, this means that the preparer is run,
which opens a S3 handle, that will be used for each checkpresent of a
chunk. That's a good thing; if we're resuming an upload that's already many
chunks in, it'll reuse that same http connection for each chunk it checks.
Still, it's not a perfectly ideal thing, since this is a different http
connection that the one that will be used to upload chunks. It would be
nice to improve the API so that both use the same http connection.
2015-07-16 15:01:27 -04:00
Joey Hess
1eb4b47c79 layout 2015-06-15 14:48:38 -04:00
Joey Hess
800e9f2658 tahoe: Use ~/.tahoe-git-annex/ rather than ~/.tahoe/git-annex/ to avoid old versions of tahoe create-client choking. 2015-06-09 15:29:16 -04:00
Joey Hess
f2486b21dd show S3 urls for public repos in whereis
Note that it's possible for a S3 bucket to be configured to allow public
access, but for git-annex to not know that it is. I chose to not show the
url unless public=yes.
2015-06-05 16:52:38 -04:00
Joey Hess
5f0f063a7a S3: Publically accessible buckets can be used without creds. 2015-06-05 16:23:35 -04:00
Joey Hess
4acd28bf21 public=yes config to send AclPublicRead
In my tests, this has to be set when uploading a file to the bucket
and then the file can be accessed using the bucketname.s3.amazonaws.com
url.

Setting it when creating the bucket didn't seem to make the whole bucket
public, or allow accessing files stored in it. But I have gone ahead and
also sent it when creating the bucket just in case that is needed in some
case.
2015-06-05 14:38:01 -04:00
Joey Hess
334fd6d598 groundwork for readonly access
Split S3Info out of S3Handle and added some stubs
2015-06-05 13:12:45 -04:00
Joey Hess
eb33569f9d remove Params constructor from Utility.SafeCommand
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.

In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.

Problems noticed while doing this conversion:

	* Some uses of Params "oneword" which was entirely unnecessary
	  overhead.
	* A few places that built up a list of parameters with ++
	  and then used Params to split it!

Test suite passes.
2015-06-01 13:52:23 -04:00
Joey Hess
77c43a388e fromkey, registerurl: Allow urls to be specified instead of keys, and generate URL keys.
This is especially useful because the caller doesn't need to generate valid
url keys, which involves some escaping of characters, and may involve
taking a md5sum of the url if it's too long.
2015-05-22 22:41:36 -04:00
Joey Hess
ecb0d5c087 use lock pools throughout git-annex
The one exception is in Utility.Daemon. As long as a process only
daemonizes once, which seems reasonable, and as long as it avoids calling
checkDaemon once it's already running as a daemon, the fcntl locking
gotchas won't be a problem there.

Annex.LockFile has it's own separate lock pool layer, which has been
renamed to LockCache. This is a persistent cache of locks that persist
until closed.

This is not quite done; lockContent stil needs to be converted.
2015-05-19 14:09:52 -04:00
Joey Hess
61ccf95004 Avoid accumulating transfer failure log files unless the assistant is being used.
Only the assistant uses these, and only the assistant cleans them up, so
make only git annex transferkeys write them,

There is one behavior change from this. If glacier is being used, and a
manual git annex get --from glacier fails because the file isn't available
yet, the assistant will no longer later see that failed transfer file and
retry the get. Hope no-one depended on that old behavior.
2015-05-12 15:53:38 -04:00
Joey Hess
a812d598ef Take space that will be used by running downloads into account when checking annex.diskreserve. 2015-05-12 15:20:22 -04:00
Joey Hess
e27b97d364 Merge branch 'master' into concurrentprogress
Conflicts:
	Command/Fsck.hs
	Messages.hs
	Remote/Directory.hs
	Remote/Git.hs
	Remote/Helper/Special.hs
	Types/Remote.hs
	debian/changelog
	git-annex.cabal
2015-05-12 13:23:22 -04:00
Joey Hess
9f14f51d63 generalied elem/notElem in ghc 7.10 require some additional type signatures when using OverloadedStrings 2015-05-10 15:41:41 -04:00
Joey Hess
4aba1c74bd remaining dataenc to sandi conversions
I've tested all the dataenc to sandi conversions except Assistant.XMPP,
and all have unchanged behavior, including behavior on large unicode code
points.
2015-05-07 18:07:13 -04:00
Joey Hess
ee78958798 S3: Fix incompatability with bucket names used by hS3; the aws library cannot handle upper-case bucket names. git-annex now converts them to lower case automatically.
For example, it failed to get files from a bucket named S3.

Also fixes `git annex initremote UPPERCASE type=S3`, which failed with the
new aws library, with a signing error message.
2015-04-27 18:00:58 -04:00
Joey Hess
cfbeb1e7b7 Fix bogus failure of fsck --fast. 2015-04-27 17:40:21 -04:00
Joey Hess
22a4e92df7 S3: git annex enableremote will not create a bucket name, which failed since the bucket already exists. 2015-04-23 14:16:53 -04:00
Joey Hess
b3eccec68c S3: git annex info will show additional information about a S3 remote (endpoint, port, storage class) 2015-04-23 14:12:25 -04:00
Joey Hess
ae9bbf25a0 convert all log prorities, not just debug
In particular, error should go to stderr
2015-04-21 15:59:30 -04:00
Joey Hess
3b3aaf0d56 S3: Enable debug logging when annex.debug or --debug is set.
To debug a bug report, but generally useful.
2015-04-21 15:55:42 -04:00
Joey Hess
addc82dab7 removed all uses of undefined from code base
It's a code smell, can lead to hard to diagnose error messages.
2015-04-19 00:38:29 -04:00
Joey Hess
0def1f0b53 Fix fsck --from a git remote in a local directory, and from a directory special remote. This was a reversion caused by the relative path changes in 5.20150113.
The directory special remote was not affected in its normal configuration,
since annex-directory is an absolute path normally. But it could fail
when a relative path was used.

The git remote was affected even when an absolute path to it was used in
.git/config, since git-annex now converts all such paths to relative.
2015-04-18 13:36:12 -04:00
Joey Hess
a2902cdaaf add filename to progress bar, and display ok/failed at end
This needed plumbing an AssociatedFile through retrieveKeyFileCheap.
2015-04-14 16:35:10 -04:00
Joey Hess
dc4de7faf7 add missing progress bar 2015-04-14 16:00:20 -04:00
Joey Hess
86a2f9dc4d Merge branch 'master' into concurrentprogress
Conflicts:
	debian/changelog
2015-04-14 15:35:15 -04:00