Commit graph

1094 commits

Author SHA1 Message Date
Joey Hess
669b305de2
S3: Send a Content-Type header when storing objects in S3
So exports to public buckets can be linked to from web pages.

(When git-annex is built with MagicMime support.)

Thanks to Jared Cosulich for the idea.
2019-01-23 13:08:47 -04:00
Joey Hess
d5f2463702
misctmp cleanup
* Switch to using .git/annex/othertmp for tmp files other than partial
  downloads, and make stale files left in that directory when git-annex
  is interrupted be cleaned up promptly by subsequent git-annex processes.
* The .git/annex/misctmp directory is no longer used and git-annex will
  delete anything lingering in there after it's 1 week old.

Also, in Annex.Ingest, made the filename it uses in the tmp dir be
prefixed with "ingest-" to avoid potentially using a filename used by
some other code.
2019-01-17 16:02:22 -04:00
Joey Hess
96aba8eff7
Revert "cache the serialization of a Key"
This reverts commit 4536c93bb2.

That broke Read/Show of a Key, and unfortunately Key is read in at least
one place; the GitAnnexDistribution data type.

It would be worth bringing this optimisation back, but it would need
either a custom Read/Show instance that preserves back-compat, or
wrapping Key in a data type that contains the serialization, or changing
how GitAnnexDistribution is serialized.

Also, the Eq instance would need to compare keys with and without a
cached seralization the same.
2019-01-16 16:21:59 -04:00
Joey Hess
4536c93bb2
cache the serialization of a Key
This will speed up the common case where a Key is deserialized from
disk, but is then serialized to build eg, the path to the annex object.

It means that every place a Key has any of its fields changed, the cache
has to be dropped. I've grepped and found them all. But, it would be
better to avoid that gotcha somehow..
2019-01-14 16:37:28 -04:00
Joey Hess
d3ab5e626b
rename key2file and file2key
What these generate is not really suitable to be used as a filename,
which is why keyFile and fileKey further escape it. These are just
serializing Keys.

Also removed a quickcheck test that was very unlikely to test anything
useful, since it relied on random chance creating something that looks
like a serialized key. The other test is sufficient for testing what
that was intended to test anyway.
2019-01-14 13:03:35 -04:00
Joey Hess
727767e1e2
make everything build again after ByteString Key changes 2019-01-11 16:39:46 -04:00
Joey Hess
cb375977a6
follow-on changes from MetaData type changes
Including writing and parsing the metadata log files with
bytestring-builder and attoparsec.
2019-01-07 15:51:05 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
9cc6d5549b
convert UUID from String to ByteString
This should make == comparison of UUIDs somewhat faster, and perhaps a
few other operations around maps of UUIDs etc.

FromUUID/ToUUID are used to convert String, which is still used for all
IO of UUIDs. Eventually the hope is those instances can be removed,
and all git-annex branch log files etc use ByteString throughout, for a
real speed improvement.

Note the use of fromRawFilePath / toRawFilePath -- while a UUID usually
contains only alphanumerics and so could be treated as ascii, it's
conceivable that some git-annex repository has been initialized using
a UUID that is not only not a canonical UUID, but contains high unicode
or invalid unicode. Using the filesystem encoding avoids any problems
with such a thing. However, a NUL in a UUID seems extremely unlikely,
so I didn't use encodeBS / decodeBS to avoid their extra overhead in
handling NULs.

The Read/Show instance for UUID luckily serializes the same way for
ByteString as it did for String.
2019-01-01 14:45:33 -04:00
Joey Hess
2e069eb9f6
use putBucket to future-proof
New fields can be added to PutBucket in the future.
2018-12-31 13:09:20 -04:00
Joey Hess
3fdc6fdfa9
remove unused import 2018-12-30 15:18:49 -04:00
Joey Hess
a26514d67e
Fix doubled progress display when downloading an url when -J is used.
downloadUrl uses meteredFile, which sets up one progress meter,
and Remote.Web also uses metered, so two progress meters are displayed for
the same download.

Reversion introduced with the http-conduit switch in
c34152777b -- I don't know why the extra
call to metered was added there.

When -J is not used, the extra progress meter didn't display,
but an extra blank line did get output, which is also fixed.

This commit was sponsored by John Pellman on Patreon.
2018-12-30 12:29:49 -04:00
Joey Hess
3f587d447a
fix webdav reversion
webdav: When initializing, avoid trying to make a directory at the top of
the webdav server, which could never accomplish anything and failed on
nextcloud servers. (Reversion introduced in version 6.20170925.)

This commit was sponsored by mo on patreon.
2018-12-10 12:49:51 -04:00
Joey Hess
4579dd6201
S3: Improve diagnostics when a remote is configured with exporttree and versioning, but no S3 version id has been recorded for a key.
When public access is used for the remote, it complained that the user
needed to set creds to use it, which was just wrong.

When creds were being used, it fell back from trying to use the version ID
to just accessing the key in the bucket, which was ok for non-export
remotes, but wrong for buckets.

In both cases, display a hopefully useful warning.

This should only come up when an existing S3 remote has been exported
to, and then later versioning was enabled.

Note that it would perhaps be possible to fall back from trying to use
retrieveKeyFile when it fails and instead use retrieveKeyFileFromExport,
which may work when S3 version ID is missing. But there are problems
with that approach; how to tell when retrieveKeyFile has failed due to this
rather than a network problem etc? Anyway, that approach would only work
until the file in the export got overwritten, and then it would no
longer be accessible. And with versioning enabled, the user wants old
versions of objects to remain accessible, so it seems better to warn
about the problem as soon as possible, so they can go back and add S3
version IDs.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-12-06 13:44:37 -04:00
Joey Hess
1308a76bf1
deMaybe credPairRemoteKey
It's always Just
2018-12-04 13:37:43 -04:00
Joey Hess
a25fef36ad
fix json for exportedtrees in conflict
Repeating the same json field with multiple values tends to not be
supported well by json parsers, so list the trees separated by spaces.
2018-12-03 14:43:59 -04:00
Joey Hess
b8f9dea27d
add exportedtree to info
info: When used with an exporttree remote, includes an "exportedtree" info,
which is the tree last exported to the remote. During an export conflict,
multiple values will be listed.

This commit was sponsored by John Pellman on Patreon.
2018-12-03 14:36:00 -04:00
Robert Schütz
32017f8082
replace TORRENT by WITH_TORRENTPARSER 2018-11-27 12:29:25 -04:00
Joey Hess
370757087d
catch lockContentForRemoval exception
removeKey should not throw exceptions, so catch exception there

In Assistant.Unused, keep trying to drop other keys if one drop fails
2018-11-15 15:39:57 -04:00
Joey Hess
d65df7ab21
improve messages around export conflicts
When an export conflict prevents accessing a special remote, be clearer
about what the problem is and how to resolve it.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-11-13 15:50:06 -04:00
Joey Hess
c472c268c4
webapp: Fixed a crash when adding a git remote.
Reversion introduced in 2b66492d6e which added a new cache that needs to be
cleared.
2018-10-29 16:01:08 -04:00
Joey Hess
a622488758
remove CHECKURL-MULTI single url response special case
Removed undocumented special case in handling of a CHECKURL-MULTI response
with only a single file listed. Rather than ignoring the url that was in
the response, use it. This allows external special remotes that want to
provide some better url to do so, although I don't entirely agree with
using CHECKURL-MULTI to accomplish that. I'm more of the feeling that an
undocumented special case that throws data away is just not a good idea.

This could in theory break some external special remote program that relied
on the current behavior, but its seems unlikely that it would because such
a program must already handle the multiple url case, unless it only ever
provides a single url response to CHECKURL-MULTI.

Make addurl --file work with a single item CHECKURL-MULTI response.
It already did for external special remotes due to the special case,
but now it also will for builtin ones like the BitTorrent special remote.

This commit was sponsored by Ilya Shlyakhter on Patron.
2018-10-29 14:52:12 -04:00
Joey Hess
fcca7adaff
instrument P2P --debug with connection and thread info
For debugging http://git-annex.branchable.com/bugs/annex_get_-J_16_via_ssh_stalls_/

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-22 15:52:11 -04:00
Joey Hess
c24e255de1
Fix concurrency bug that occurred on the first download from an exporttree remote
Block other threads while the export database is being constructed (or
updated) by the first thread to try to access it.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-10-22 12:59:10 -04:00
Joey Hess
a9dd087074
centralized "yes"/"no" parsing
This commit was sponsored by Jack Hill on Patreon.
2018-10-10 11:14:27 -04:00
Joey Hess
6f0d8870df
Fix crash when exporttree is set to a bad value.
Made it impossible to recover from setting a bad value since enableremote
to change it would crash.

This commit was sponsored by Henrik Riomar on Patreon.
2018-10-10 10:44:54 -04:00
Joey Hess
451171b7c1
clean up url removal presence update
* rmurl: Fix a case where removing the last url left git-annex thinking
  content was still present in the web special remote.
* SETURLPRESENT, SETURIPRESENT, SETURLMISSING, and SETURIMISSING
  used to update the presence information of the external special remote
  that called them; this was not documented behavior and is no longer done.

Done by making setUrlPresent and setUrlMissing only update presence info
for the web, and only when the url is a web url. See the comment for
reasoning about why that's the right thing to do.

In AddUrl, had to make it update location tracking, to handle the
non-web-url case.

This commit was sponsored by Ewen McNeill on Patreon.
2018-10-04 17:35:49 -04:00
Joey Hess
303d10cee6
Improve display when git config download from a http remote fails.
The error message displayed used to only come from curl/wget and perhaps
was clearer than the one displayed now that http-client is used. In any
case, it does make sense to hide it because git-annex prints its own
warning message.

This commit was sponsored by Jake Vosloo on Patreon.
2018-10-03 12:31:09 -04:00
Joey Hess
6134431254
clean P2P protocol shutdown on EOF try 2
Same goal as b18fb1e343 but without
breaking backwards compatability. Just return IO exceptions when running
the P2P protocol, so that git-annex-shell can detect eof and avoid the
ugly message.

This commit was sponsored by Ethan Aubin.
2018-09-25 16:49:59 -04:00
Joey Hess
bc31b93c77
remote.name.annex-security-allow-unverified-downloads
Added remote.name.annex-security-allow-unverified-downloads, a per-remote
setting for annex.security.allow-unverified-downloads.

This commit was sponsored by Brock Spratlen on Patreon.
2018-09-25 15:34:47 -04:00
Joey Hess
773084c49b
S3: Fix url construction bug
When the publicurl has been set to an url that does not end with a slash,
we need to add one in between it and the rest of the url.

As far as I can see, git-annex does not default to such publicurls; it's
careful to end them with slashes. But this was observed in the wild, and
there may be documentation that doesn't include the slash. And it's an easy
mistake to make in any case.

This commit was sponsored by Eric Drechsel on Patreon.
2018-09-14 12:25:23 -04:00
Joey Hess
677038199c
fix build with older aws
S3: Multipart uploads are now only supported when git-annex is built
with aws-0.16.0 or later, as earlier versions of the library don't
support versioning with multipart uploads.

This will affect the android build, and debian stable also has a too old
aws to support both features at the same time.

This commit was sponsored by Nick Piper on Patreon.
2018-09-13 09:58:39 -04:00
Joey Hess
445ea66732
simplify 2018-09-06 16:07:16 -04:00
Joey Hess
b7daf2685f
support public versioned S3 access
Makes git annex whereis display the versionId urls.

And, when a s3 remote is enabled without creds, git-annex will use the
versionId urls to access its contents.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-09-06 14:31:41 -04:00
Joey Hess
7407a80c27
S3: Support AWS_SESSION_TOKEN
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-09-05 15:53:57 -04:00
Joey Hess
53d839d543
more efficient encoding 2018-08-31 13:49:08 -04:00
Joey Hess
b3d42283ad
use per-remote metadata storage for S3 version ID
Since the same key can be stored in a versioned S3 bucket multiple times
with different version IDs, this allows tracking them all. Not currently
needed, but if we ever want to drop from a versioned S3 bucket, we'll
need to know them all.

This commit was supported by the NSF-funded DataLad project.
2018-08-31 13:27:29 -04:00
Joey Hess
6b75b9c448
turn on appendonly when versioning is enabled 2018-08-31 10:53:07 -04:00
Joey Hess
19dcff2b71
use S3 version ID for retrieval
Have to store the S3 object along with the version ID, so retrieval can
use the same object.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 15:37:08 -04:00
Joey Hess
794e9a7a44
store S3 version IDs
Only done when versioning=yes is configured. It could always do it when
S3 sends back a version id, but there may be buckets that have
versioning enabled by accident, so it seemed better to honor the
configuration.

S3's docs say version IDs are "randomly generated", so presumably
storing the same content twice gets two different ones not the same one.
So I considered storing a list of version IDs for a key. That would
allow removing the key completely. But.. The way Logs.RemoteState works,
when there are multiple writers, the last writer wins. So storing a list
would need a different log format that merges, which seemed overkill to support
removing a key from an append-only remote.

Note that Logs.RemoteState for S3 is now dedicated to version IDs.
If something else needs to be stored, a new log will be needed to do it.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 14:30:56 -04:00
Joey Hess
0ff5a41311
S3 versioning=yes config
Not yet used.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 13:45:28 -04:00
Joey Hess
358178fbfb
don't untrust appendonly exports
Make exporttree=yes remotes that are appendonly not be untrusted, and not force
verification of content, since the usual concerns about losing data when an
export is updated by someone else don't apply.

Note that all the remote operations on keys are left as usual for
appendonly export remotes, except for storing content.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:48:04 -04:00
Joey Hess
8b39db20b5
export appendonly support
Make `git annex export` check appendonly when removing a file from an
export, and not update the location log, since the remote still contains
the content.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:18:20 -04:00
Joey Hess
02630b39ee
add Remote.readonly
Does nothing yet.

Considered making bup readonly, but while the content can't be removed,
it is able to delete a branch, so didn't.

This commit was supported by the NSF-funded DataLad project.
2018-08-30 11:12:18 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
2884637cab
S3: Support credential-less download from remotes configured with public=yes exporttree=yes.
This commit was supported by the NSF-funded DataLad project.
2018-07-31 16:32:43 -04:00
Joey Hess
9f3a346f25
fix nested exception bug
Fix reversion introduced in version 6.20180316 that caused git-annex to
stop processing files when unable to contact a ssh remote.

The bug was not in any of the changed lines, but this one in inAnnex:

P2PHelper.checkpresent (Ssh.runProto rmt connpool (cantCheck rmt) fallback) key

cantCheck throws an exception, but that parameter to runProto expects a
value, which it returns. So, inAnnex is returning a Bool containing an
exception. This defeats the usual checks for checkPresent throwing an
exception, crashing git-annex.

Fixed by making runProto take an `Annex a` instead of an `a`, so
passing cantCheck to it doesn't nest exceptions.

This commit was sponsored by andrea rota.
2018-07-03 13:10:43 -04:00
Joey Hess
dc6cb6aa5f
Merge branch 'later' 2018-06-25 21:59:20 -04:00
Joey Hess
a5228ac765
Support configuring remote.web.annex-cost and remote.bittorrent.annex-cost
Seems that has never worked before due to oversight.
2018-06-24 17:31:22 -04:00
Joey Hess
05ecee0db4
set ddar to RetrievalAllKeysSecure
Based on information from Robie Basak.
2018-06-21 16:38:47 -04:00
Joey Hess
f1b29dbeb4
don't assume boto will remain secure
On second thought, best to default to being secure even if boto changes
http libraries to one that happens to follow redirects.
2018-06-21 14:14:56 -04:00
Joey Hess
b657242f5d
enforce retrievalSecurityPolicy
Leveraged the existing verification code by making it also check the
retrievalSecurityPolicy.

Also, prevented getViaTmp from running the download action at all when the
retrievalSecurityPolicy is going to prevent verifying and so storing it.

Added annex.security.allow-unverified-downloads. A per-remote version
would be nice to have too, but would need more plumbing, so KISS.
(Bill the Cat reference not too over the top I hope. The point is to
make this something the user reads the documentation for before using.)

A few calls to verifyKeyContent and getViaTmp, that don't
involve downloads from remotes, have RetrievalAllKeysSecure hard-coded.
It was also hard-coded for P2P.Annex and Command.RecvKey,
to match the values of the corresponding remotes.

A few things use retrieveKeyFile/retrieveKeyFileCheap without going
through getViaTmp.
* Command.Fsck when downloading content from a remote to verify it.
  That content does not get into the annex, so this is ok.
* Command.AddUrl when using a remote to download an url; this is new
  content being added, so this is ok.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-21 13:37:01 -04:00
Joey Hess
4315bb9e42
add retrievalSecurityPolicy
This will be used to protect against CVE-2018-10859, where an encrypted
special remote is fed the wrong encrypted data, and so tricked into
decrypting something that the user encrypted with their gpg key and did
not store in git-annex.

It also protects against CVE-2018-10857, where a remote follows a http
redirect to a file:// url or to a local private web server. While that's
already been prevented in git-annex's own use of http, external special
remotes, hooks, etc use other http implementations and could still be
vulnerable.

The policy is not yet enforced, this commit only adds the appropriate
metadata to remotes.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-06-21 11:36:36 -04:00
Joey Hess
cc4b3b9c06
remove unused import 2018-06-14 12:33:00 -04:00
Joey Hess
760f66829a
display p2pstdio stderr after auth
Display error messages that come from git-annex-shell when the p2p protocol
is used, so that diskreserve messages, IO errors, etc from the remote side
are visible again.

Felt like it should perhaps use outputError, so --json-error-messages would
include these, but as an async IO action, it can't, and this would need
MessageState to be converted to a tvar. Anyway, when not using p2pstdio,
that's not done; nor is it done for stderr from external special remotes
or other commands, so punted on the idea for now.

This commit was sponsored by mo on Patreon.
2018-06-12 14:59:05 -04:00
Joey Hess
90a3afb60f
adb: Android serial numbers are not all 16 characters long, so accept other lengths.
I can't find any documentation of how long it should be. Hard to imagine
it being shorter than 4 characters though, so put that in as a conservative
lower bound.

This commit was sponsored by Nick Piper on Patreon.
2018-06-12 13:56:01 -04:00
Joey Hess
c3c28f7617
add GETINFO to external protocol (for ronnypfa)
External special remotes can now add info to `git annex info $remote`, by
replying to the GETINFO message.

Had to generalize some helpers to allow consuming multiple messages from
the remote.

The code added to Remote/* here is AGPL licensed, thus changed the license
of the files.

This commit was sponsored by Jake Vosloo on Patreon.
2018-06-08 11:56:24 -04:00
Joey Hess
0f566ed242
removal of the rest of remoteGitConfig
In keyUrls, the GitConfig is used only by annexLocations
to support configured Differences. Since such configurations affect all
clones of a repository, the local repo's GitConfig must have the same
information as the remote's GitConfig would have. So, used getGitConfig
to get the local GitConfig, which is cached and so available cheaply.

That actually fixed a bug noone had ever noticed: keyUrls is
used for remotes accessed over http. The full git config of such a
remote is normally not available, so the remoteGitConfig that keyUrls
used would not have the necessary information in it.

In copyFromRemoteCheap', it uses gitAnnexLocation,
which does need the GitConfig of the remote repo itself in order to
check if it's crippled, supports symlinks, etc. So, made the
State include that GitConfig, cached. The use of gitAnnexLocation is
within a (not $ Git.repoIsUrl repo) guard, so it's local, and so
its git config will always be read and available.

(Note that gitAnnexLocation in turn calls annexLocations, so the
Differences config it uses in this case comes from the remote repo's
GitConfig and not from the local repo's GitConfig. As explained above
this is ok since they must have the same value.)

Not very happy with this mess of different GitConfigs not type-safe and
some read only sometimes etc. Very hairy. Think I got it this change
right. Test suite passes..

This commit was sponsored by Ethan Aubin.
2018-06-05 14:48:37 -04:00
Joey Hess
fc5888300f
fix annex-checkuuid
Fixed annex-checkuuid implementation, so that remotes configured that way
can be used. This was 100% broken from the first commit of it, oops.

This commit was sponsored by Øyvind Andersen Holm.
2018-06-04 16:52:22 -04:00
Joey Hess
67e46229a5
change Remote.repo to Remote.getRepo
This is groundwork for letting a repo be instantiated the first time
it's actually used, instead of at startup.

The only behavior change is that some old special cases for xmpp remotes
were removed. Where before git-annex silently did nothing with those
no-longer supported remotes, it may now fail in some way.

The additional IO action should have no performance impact as long as
it's simply return.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2018-06-04 15:30:26 -04:00
Joey Hess
89e1a05a8f
Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.

It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.

This commit was supported by the NSF-funded DataLad project.
2018-04-16 16:21:21 -04:00
Joey Hess
197b1510fa
remove unused import 2018-04-09 13:09:40 -04:00
Joey Hess
c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00
Joey Hess
0791c24221
fix bad refactoring
Reponse BodyReader is not a conduit thing, so can't use the refactored
function here after all. Oops. Put it back how it was.
2018-04-06 16:59:14 -04:00
Joey Hess
0f6775f1ff
refactor sinkResponseFile and add downloadC
Remote.S3 and Remote.Helper.Http both had similar code to sink a
http-conduit Response to a file; refactor out sinkResponseFile.

downloadC downloads an url to a file using http-conduit, and supports
resuming. Falls back to curl to handle urls that http-conduit does not
support. This is not used yet, but the goal is to replace download with
it.

git-annex.cabal: conduit-extra was not actually used for a long time,
remove the dep. conduit moves into the main dependency list, but since
http-conduit was already in there, and it depends on conduit, that's not
really adding a new build dep.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 16:07:08 -04:00
Joey Hess
9b98d3f630
better HTTP connection reuse
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.

This commit was supported by the NSF-funded DataLad project.
2018-04-04 15:39:40 -04:00
Joey Hess
2ec07bc29f
Avoid running annex.http-headers-command more than once. 2018-04-04 15:15:08 -04:00
Joey Hess
46d4316954
implement annex.retry et al
Added annex.retry, annex.retry-delay, and per-remote versions to configure
transfer retries.

This commit was supported by the NSF-funded DataLad project.
2018-03-29 13:04:07 -04:00
Joey Hess
ceee0ea5f1
store probed androidserial for later use by enableremote 2018-03-27 17:38:04 -04:00
Joey Hess
ae75eb06bc
exporttree support for adb special remote
This commit was sponsored by Michael Magin.
2018-03-27 16:28:41 -04:00
Joey Hess
2927618d35
Added adb special remote which allows exporting files to Android devices.
git annex testremote passes.

exportree not implemented yet, although the documentation talks about it,
since it will be the main way this remote will be used.

The adb push/pull progress is displayed for now; it would be better
to consume it and use it to update the git-annex progress bar.

This commit was sponsored by andrea rota.
2018-03-27 14:54:41 -04:00
Joey Hess
31e1adc005
deal with unlocked files
P2P protocol version 1 adds VALID|INVALID after DATA; INVALID means the
file was detected to change content while it was being sent and so we
may not have received the valid content of the file.

Added new MustVerify constructor for Verification, which forces
verification even when annex.verify=false etc. This is used when INVALID
and in protocol version 0.

As well as changing git-annex-shell p2psdio, this makes git-annex tor
remotes always force verification, since they don't yet use protocol
version 1. Previously, annex.verify=false could skip verification when
using tor remotes, and let bad data into the repository.

This commit was sponsored by Jack Hill on Patreon.
2018-03-13 14:27:14 -04:00
Joey Hess
e16b069331
use total size from DATA
Noticed that getting a key whose size is not known resulted in a
progress display that didn't include the percent complete.

Fixed for P2P by making the size sent with DATA be used to update the
meter's total size.

In order for rateLimitMeterUpdate to also learn the total size,
had to make it be passed the Meter, and some other reorg in
Utility.Metered was also done so that --json-progress can construct a
Meter to pass to rateLimitMeterUpdate.

When the fallback rsync is done, the progress display still doesn't
include the percent complete. Only way to fix that seems to be to let rsync
display its output again, but that would conflict with git-annex's
own progress meter, which is also being displayed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-03-12 21:46:58 -04:00
Joey Hess
b96b845ffd
fix nested progress meters when using git-annex-shell fallback
Caused an ugly blank line when the first progress meter was not used,
but also it may have confused -J display.
2018-03-12 19:20:10 -04:00
Joey Hess
7bed3927ba
make way for rsync progress output when it's enabled 2018-03-12 19:10:22 -04:00
Joey Hess
1c2c8995ac
hide rsync progress output when metered but not in other uses of rsync 2018-03-12 18:36:07 -04:00
Joey Hess
cb05ef06bf
fix lost metering for fallback rsyncs
08814327ff accidentially got rid of it,
when it removed commandMetered.
2018-03-12 18:22:48 -04:00
Joey Hess
c3df5d1f10
avoid double-connect to unreachable ssh remote
When git-annex-shell p2pstdio fails with 255, it's because the ssh
server is not reachable. Avoid running the fallback action in this case,
since it would just try a second time to connect, and presumably fail.

Note that the closed P2PSshConnection will not be stored in the pool,
so the next request tries again to connect. This is just the right
behavior; when the remote becomes reachable again, the same git-annex
process will start using it.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-03-12 16:50:21 -04:00
Joey Hess
596af7cbc4
move protocol version stuff to the Net free monad
Needs to be in Net not Local, so that Net actions can take the protocol
version into account.

This commit was sponsored by an anonymous bitcoin donor.
2018-03-12 15:20:51 -04:00
Joey Hess
c81768d425
version the P2P protocol
Unfortunately ReceiveMessage didn't handle unknown messages the way it
was documented to; client sending VERSION would cause the server to
return an ERROR and hang up. Fixed that, but old releases of git-annex
use the P2P protocol for tor and will still have that behavior.

So, version is not negotiated for Remote.P2P connections, only for
Remote.Git connections, which will support VERSION from their first
release. There will need to be a later flag day to change Remote.P2P;
left a commented out line that is the only thing that will need to be
changed then.

Version 1 of the P2P protocol is not implemented yet, but updated
the docs for the DATA change that will be allowed by that version.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-03-12 14:36:35 -04:00
Joey Hess
d7f54671bf
refactoring 2018-03-09 13:48:10 -04:00
Joey Hess
936ab43932
use P2P for locking keys
The P2P protocol is now fully used for git-annex-shell.

This commit was sponsored by Ewen McNeill on Patreon.
2018-03-09 13:42:55 -04:00
Joey Hess
08814327ff
use P2P protocol for checkpresent, retrieve, and store
Note that, due to not using rsync to transfer files to ssh remotes
any longer, permissions and other file metadata of annexed files
will no longer be preserved when copying them to ssh remotes.
Other remotes never supported preserving that information, so
this is not considered a regression. Added NEWS item about this.

Another significant side effect of this is that, even when rsync is run to
retrieve a file, its progress display will no longer be shown, and
instead the native git-annex progress display will appear. It would be
possible to use the rsync process display when rsync is used (old
git-annex-shell and also retrieval from a local repository), but it
would have complicated the code unncessarily, and been inconsistent
behavior.

(I'd been thinking for a while about eliminating the rsync progress
display, since it's got some annoying verbosities, including display of
the key and the "(xfr#1, to-chk=0/1)" bit and was already somewhat
inconsistent.)

retrieveKeyFileCheap still uses rsync, since that ensures that it gets
the actual file content from the remote. Using the P2P protocol would
use the local content, as long as the local and remote size are the
same.

This commit was sponsored by John Pellman on Patreon.
2018-03-09 13:25:16 -04:00
Joey Hess
5bc0ab3f31
going AGPL
Remote/Git.hs now contains AGPL licensed code, thus the license
of git-annex as a whole is AGPL. This was already the case when git-annex
was built with the webapp enabled.

The AGPL license will apply to all code added to Remote/Git.hs in the
future, which is going to include support for using
`git-annex-shell p2pstdio`.
2018-03-09 01:03:46 -04:00
Joey Hess
6a59bc4845
use P2P protocol for drop
Not yet used for everything else, but this is enough to
verify that it works, and do some benchmarking.

Some bugfixes included, which got it working. Also fallback to old
actions has been verified to work correctly.

Benchmarked dropping one thousand files from a ssh remote on localhost.
Using the old git-annex	40.867 seconds.
With the P2P protocol	9.905 seconds!

This commit was sponsored by Jochen Bartl on Patreon.
2018-03-08 16:56:17 -04:00
Joey Hess
16af259209
refactor p2p remote action code
Make a Remote.Helper.P2P using code that was in Remote.P2P, converted to
use generic protocol runner actions.

This will allow it to be reused in Remote.Git.

This commit was sponsored by mo on Patreon.
2018-03-08 16:11:00 -04:00
Joey Hess
c036a380b2
p2p ssh connection pools
Much like Remote.P2P, there's a pool of connections to a peer, in order
to support concurrent operations.

Deals with old git-annex-ssh on the remote that does not support p2pstdio,
by only trying once to use it, and remembering if it's not supported.

Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual
purposes of something to detect to see that the connection is working,
and a way to verify that it's connected to the right uuid.
(There's a redundant uuid check since the uuid field is sent
by git_annex_shell, but I anticipate that being removed later when
the legacy git-annex-shell stuff gets removed.)

Not entirely happy with Remote.Git.runSsh's behavior
when the proto action fails. Running the fallback will work ok, but what
will we do when the fallbacks later get removed? It might be better to
try to reconnect, in case the connection got closed.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-03-08 15:11:31 -04:00
Joey Hess
f4103744c3
make sure that lockContentShared is always paired with an inAnnex check
lockContentShared had a screwy caveat that it didn't verify that the content
was present when locking it, but in the most common case, eg indirect mode,
it failed to lock when the content is not present.

That led to a few callers forgetting to check inAnnex when using it,
but the potential data loss was unlikely to be noticed because it only
affected direct mode I think.

Fix data loss bug when the local repository uses direct mode, and a
locally modified file is dropped from a remote repsitory. The bug
caused the modified file to be counted as a copy of the original file.
(This is not a severe bug because in such a situation, dropping
from the remote and then modifying the file is allowed and has the same
end result.)

And, in content locking over tor, when the remote repository is
in direct mode, it neglected to check that the content was actually
present when locking it. This could cause git annex drop to remove
the only copy of a file when it thought the tor remote had a copy.

So, make lockContentShared do its own inAnnex check. This could perhaps
be optimised for direct mode, to avoid the check then, since locking
the content necessarily verifies it exists there, but I have not bothered
with that.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-03-07 14:23:52 -04:00
Joey Hess
bed6773346
Support exporttree=yes for rsync special remotes.
Renaming is not supported; it might be possible to use --fuzzy to get rsync
to notice the file is being renamed, but that is a bit ..fuzzy.

On the other hand, interrupted transfers of an exported file are resumed,
since rsync is great at that. Had to adjust the exporttree docs, which
said interrupted transfers would restart.

Note that remove no longer makes the empty directory dummy, instead
sending the top-level empty directory. This works just as well and I
noticed the dummy was unncessary when refactoring it into removeGeneric.
Verified that behavior of remove is not changed, and git annex
testremote does pass.

This commit was sponsored by Brock Spratlen on Patreon.
2018-02-28 13:36:20 -04:00
Joey Hess
d884e5b6fe
Added EXTENSIONS to external special remote protocol.
Allows using new special remote messages when git-annex supports them,
and avoiding using them when git-annex is too old. The new INFO is one
such message.

There's also the possibility, currently unused, for the special remote's
reply to include some kind of extensions of its own.

Merging this is blocked by https://github.com/datalad/datalad/issues/2124
since it seems it will break datalad. I checked all the other special
remotes and they will be ok.

This commit was supported by the NSF-funded DataLad project.
2018-02-07 15:02:12 -04:00
Joey Hess
7d9f0e0fbe
Added INFO to external special remote protocol.
It's left up to the special remote to detect when git-annex is new enough
to support the message; an old git-annex will blow up.

This commit was supported by the NSF-funded DataLad project.
2018-02-06 13:03:55 -04:00
Joey Hess
a28c541e23
add remote.<name>.annex-checkuuid
Added remote.<name>.annex-checkuuid config, which can be set to false to
disable the default checking of the uuid of remotes that point to
directories. This can be useful to avoid unncessary drive spin-ups and
automounting.

Note that the UUID check is still done before writing to the repository,
to avoid writing to the wrong repository if it got relocated. Check is
also done before checkPresent to avoid getting confused about what is in
which repo. This is effectively the same as the use of git-annex-shell
with a uuid to check that the remote repository is the expected one.
Did not bother with the check for retrieveKeyFile because it doesn't
matter if the wrong repo is used then.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-01-10 14:21:18 -04:00
Joey Hess
2b66492d6e
Improve startup time for commands that do not operate on remotes
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.

Got rid of the remotes member of Git.Repo. This was a bit painful.

Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.

This commit was sponsored by Jake Vosloo on Patreon.
2018-01-09 16:22:07 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
8a0038ec23
avoid warning when youtube-dl is not installed
If a user does not have it installed, don't warn on every imported item
about it.
2017-11-30 13:43:55 -04:00
Joey Hess
99bebdface
youtube-dl working
Including resuming and cleanup of incomplete downloads.

Still todo: --fast, --relaxed, importfeed, disk reserve checking,
quvi code cleanup.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-11-29 16:40:32 -04:00
Joey Hess
4e7e1fcff4
add gitAnnexTmpWorkDir and withTmpWorkDir
Needed to run youtube-dl in, but could also be useful for other stuff.

The tricky part of this was making the workdir be cleaned up whenever the
tmp object file is cleaned up.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-11-29 13:53:39 -04:00
Joey Hess
595dfb6fe2
avoid build warning with old version of http 2017-11-21 12:45:49 -04:00
Joey Hess
f5edb16729
Display progress meter when uploading a key without size information
Getting the size by statting the content file.

This commit was supported by the NSF-funded DataLad project.
2017-11-14 16:40:49 -04:00