Commit graph

1142 commits

Author SHA1 Message Date
Joey Hess
b657242f5d
enforce retrievalSecurityPolicy
Leveraged the existing verification code by making it also check the
retrievalSecurityPolicy.

Also, prevented getViaTmp from running the download action at all when the
retrievalSecurityPolicy is going to prevent verifying and so storing it.

Added annex.security.allow-unverified-downloads. A per-remote version
would be nice to have too, but would need more plumbing, so KISS.
(Bill the Cat reference not too over the top I hope. The point is to
make this something the user reads the documentation for before using.)

A few calls to verifyKeyContent and getViaTmp, that don't
involve downloads from remotes, have RetrievalAllKeysSecure hard-coded.
It was also hard-coded for P2P.Annex and Command.RecvKey,
to match the values of the corresponding remotes.

A few things use retrieveKeyFile/retrieveKeyFileCheap without going
through getViaTmp.
* Command.Fsck when downloading content from a remote to verify it.
  That content does not get into the annex, so this is ok.
* Command.AddUrl when using a remote to download an url; this is new
  content being added, so this is ok.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-21 13:37:01 -04:00
Joey Hess
4315bb9e42
add retrievalSecurityPolicy
This will be used to protect against CVE-2018-10859, where an encrypted
special remote is fed the wrong encrypted data, and so tricked into
decrypting something that the user encrypted with their gpg key and did
not store in git-annex.

It also protects against CVE-2018-10857, where a remote follows a http
redirect to a file:// url or to a local private web server. While that's
already been prevented in git-annex's own use of http, external special
remotes, hooks, etc use other http implementations and could still be
vulnerable.

The policy is not yet enforced, this commit only adds the appropriate
metadata to remotes.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-06-21 11:36:36 -04:00
Joey Hess
cc4b3b9c06
remove unused import 2018-06-14 12:33:00 -04:00
Joey Hess
760f66829a
display p2pstdio stderr after auth
Display error messages that come from git-annex-shell when the p2p protocol
is used, so that diskreserve messages, IO errors, etc from the remote side
are visible again.

Felt like it should perhaps use outputError, so --json-error-messages would
include these, but as an async IO action, it can't, and this would need
MessageState to be converted to a tvar. Anyway, when not using p2pstdio,
that's not done; nor is it done for stderr from external special remotes
or other commands, so punted on the idea for now.

This commit was sponsored by mo on Patreon.
2018-06-12 14:59:05 -04:00
Joey Hess
90a3afb60f
adb: Android serial numbers are not all 16 characters long, so accept other lengths.
I can't find any documentation of how long it should be. Hard to imagine
it being shorter than 4 characters though, so put that in as a conservative
lower bound.

This commit was sponsored by Nick Piper on Patreon.
2018-06-12 13:56:01 -04:00
Joey Hess
c3c28f7617
add GETINFO to external protocol (for ronnypfa)
External special remotes can now add info to `git annex info $remote`, by
replying to the GETINFO message.

Had to generalize some helpers to allow consuming multiple messages from
the remote.

The code added to Remote/* here is AGPL licensed, thus changed the license
of the files.

This commit was sponsored by Jake Vosloo on Patreon.
2018-06-08 11:56:24 -04:00
Joey Hess
0f566ed242
removal of the rest of remoteGitConfig
In keyUrls, the GitConfig is used only by annexLocations
to support configured Differences. Since such configurations affect all
clones of a repository, the local repo's GitConfig must have the same
information as the remote's GitConfig would have. So, used getGitConfig
to get the local GitConfig, which is cached and so available cheaply.

That actually fixed a bug noone had ever noticed: keyUrls is
used for remotes accessed over http. The full git config of such a
remote is normally not available, so the remoteGitConfig that keyUrls
used would not have the necessary information in it.

In copyFromRemoteCheap', it uses gitAnnexLocation,
which does need the GitConfig of the remote repo itself in order to
check if it's crippled, supports symlinks, etc. So, made the
State include that GitConfig, cached. The use of gitAnnexLocation is
within a (not $ Git.repoIsUrl repo) guard, so it's local, and so
its git config will always be read and available.

(Note that gitAnnexLocation in turn calls annexLocations, so the
Differences config it uses in this case comes from the remote repo's
GitConfig and not from the local repo's GitConfig. As explained above
this is ok since they must have the same value.)

Not very happy with this mess of different GitConfigs not type-safe and
some read only sometimes etc. Very hairy. Think I got it this change
right. Test suite passes..

This commit was sponsored by Ethan Aubin.
2018-06-05 14:48:37 -04:00
Joey Hess
fc5888300f
fix annex-checkuuid
Fixed annex-checkuuid implementation, so that remotes configured that way
can be used. This was 100% broken from the first commit of it, oops.

This commit was sponsored by Øyvind Andersen Holm.
2018-06-04 16:52:22 -04:00
Joey Hess
67e46229a5
change Remote.repo to Remote.getRepo
This is groundwork for letting a repo be instantiated the first time
it's actually used, instead of at startup.

The only behavior change is that some old special cases for xmpp remotes
were removed. Where before git-annex silently did nothing with those
no-longer supported remotes, it may now fail in some way.

The additional IO action should have no performance impact as long as
it's simply return.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2018-06-04 15:30:26 -04:00
Joey Hess
89e1a05a8f
Fix mangling of --json output of utf-8 characters when not running in a utf-8 locale
As long as all code imports Utility.Aeson rather than Data.Aeson,
and no Strings that may contain utf-8 characters are used for eg, object
keys via T.pack, this is guaranteed to fix the problem everywhere that
git-annex generates json.

It's kind of annoying to need to wrap ToJSON with a ToJSON', especially
since every data type that has a ToJSON instance has to be ported over.
However, that only took 50 lines of code, which is worth it to ensure full
coverage. I initially tried an alternative approach of a newtype FileEncoded,
which had to be used everywhere a String was fed into aeson, and chasing
down all the sites would have been far too hard. Did consider creating an
intentionally overlapping instance ToJSON String, and letting ghc fail
to build anything that passed in a String, but am not sure that wouldn't
pollute some library that git-annex depends on that happens to use ToJSON
String internally.

This commit was supported by the NSF-funded DataLad project.
2018-04-16 16:21:21 -04:00
Joey Hess
197b1510fa
remove unused import 2018-04-09 13:09:40 -04:00
Joey Hess
c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00
Joey Hess
0791c24221
fix bad refactoring
Reponse BodyReader is not a conduit thing, so can't use the refactored
function here after all. Oops. Put it back how it was.
2018-04-06 16:59:14 -04:00
Joey Hess
0f6775f1ff
refactor sinkResponseFile and add downloadC
Remote.S3 and Remote.Helper.Http both had similar code to sink a
http-conduit Response to a file; refactor out sinkResponseFile.

downloadC downloads an url to a file using http-conduit, and supports
resuming. Falls back to curl to handle urls that http-conduit does not
support. This is not used yet, but the goal is to replace download with
it.

git-annex.cabal: conduit-extra was not actually used for a long time,
remove the dep. conduit moves into the main dependency list, but since
http-conduit was already in there, and it depends on conduit, that's not
really adding a new build dep.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 16:07:08 -04:00
Joey Hess
9b98d3f630
better HTTP connection reuse
Enable HTTP connection reuse across multiple files, when git-annex
uses http-conduit. Before, a new Manager was created each time
Utility.Url used it. Now, a single Manager gets created the first time,
so connections are reused.

Doesn't help when external programs are used for url download,
but does speed up addurl --fast, fsck --from web, etc.

Testing fsck --fast --from web with 3 files, over high-latency
satellite internet, it sped up from 19.37s to 14.96s.

This commit was supported by the NSF-funded DataLad project.
2018-04-04 15:39:40 -04:00
Joey Hess
2ec07bc29f
Avoid running annex.http-headers-command more than once. 2018-04-04 15:15:08 -04:00
Joey Hess
46d4316954
implement annex.retry et al
Added annex.retry, annex.retry-delay, and per-remote versions to configure
transfer retries.

This commit was supported by the NSF-funded DataLad project.
2018-03-29 13:04:07 -04:00
Joey Hess
ceee0ea5f1
store probed androidserial for later use by enableremote 2018-03-27 17:38:04 -04:00
Joey Hess
ae75eb06bc
exporttree support for adb special remote
This commit was sponsored by Michael Magin.
2018-03-27 16:28:41 -04:00
Joey Hess
2927618d35
Added adb special remote which allows exporting files to Android devices.
git annex testremote passes.

exportree not implemented yet, although the documentation talks about it,
since it will be the main way this remote will be used.

The adb push/pull progress is displayed for now; it would be better
to consume it and use it to update the git-annex progress bar.

This commit was sponsored by andrea rota.
2018-03-27 14:54:41 -04:00
Joey Hess
31e1adc005
deal with unlocked files
P2P protocol version 1 adds VALID|INVALID after DATA; INVALID means the
file was detected to change content while it was being sent and so we
may not have received the valid content of the file.

Added new MustVerify constructor for Verification, which forces
verification even when annex.verify=false etc. This is used when INVALID
and in protocol version 0.

As well as changing git-annex-shell p2psdio, this makes git-annex tor
remotes always force verification, since they don't yet use protocol
version 1. Previously, annex.verify=false could skip verification when
using tor remotes, and let bad data into the repository.

This commit was sponsored by Jack Hill on Patreon.
2018-03-13 14:27:14 -04:00
Joey Hess
e16b069331
use total size from DATA
Noticed that getting a key whose size is not known resulted in a
progress display that didn't include the percent complete.

Fixed for P2P by making the size sent with DATA be used to update the
meter's total size.

In order for rateLimitMeterUpdate to also learn the total size,
had to make it be passed the Meter, and some other reorg in
Utility.Metered was also done so that --json-progress can construct a
Meter to pass to rateLimitMeterUpdate.

When the fallback rsync is done, the progress display still doesn't
include the percent complete. Only way to fix that seems to be to let rsync
display its output again, but that would conflict with git-annex's
own progress meter, which is also being displayed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-03-12 21:46:58 -04:00
Joey Hess
b96b845ffd
fix nested progress meters when using git-annex-shell fallback
Caused an ugly blank line when the first progress meter was not used,
but also it may have confused -J display.
2018-03-12 19:20:10 -04:00
Joey Hess
7bed3927ba
make way for rsync progress output when it's enabled 2018-03-12 19:10:22 -04:00
Joey Hess
1c2c8995ac
hide rsync progress output when metered but not in other uses of rsync 2018-03-12 18:36:07 -04:00
Joey Hess
cb05ef06bf
fix lost metering for fallback rsyncs
08814327ff accidentially got rid of it,
when it removed commandMetered.
2018-03-12 18:22:48 -04:00
Joey Hess
c3df5d1f10
avoid double-connect to unreachable ssh remote
When git-annex-shell p2pstdio fails with 255, it's because the ssh
server is not reachable. Avoid running the fallback action in this case,
since it would just try a second time to connect, and presumably fail.

Note that the closed P2PSshConnection will not be stored in the pool,
so the next request tries again to connect. This is just the right
behavior; when the remote becomes reachable again, the same git-annex
process will start using it.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-03-12 16:50:21 -04:00
Joey Hess
596af7cbc4
move protocol version stuff to the Net free monad
Needs to be in Net not Local, so that Net actions can take the protocol
version into account.

This commit was sponsored by an anonymous bitcoin donor.
2018-03-12 15:20:51 -04:00
Joey Hess
c81768d425
version the P2P protocol
Unfortunately ReceiveMessage didn't handle unknown messages the way it
was documented to; client sending VERSION would cause the server to
return an ERROR and hang up. Fixed that, but old releases of git-annex
use the P2P protocol for tor and will still have that behavior.

So, version is not negotiated for Remote.P2P connections, only for
Remote.Git connections, which will support VERSION from their first
release. There will need to be a later flag day to change Remote.P2P;
left a commented out line that is the only thing that will need to be
changed then.

Version 1 of the P2P protocol is not implemented yet, but updated
the docs for the DATA change that will be allowed by that version.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-03-12 14:36:35 -04:00
Joey Hess
d7f54671bf
refactoring 2018-03-09 13:48:10 -04:00
Joey Hess
936ab43932
use P2P for locking keys
The P2P protocol is now fully used for git-annex-shell.

This commit was sponsored by Ewen McNeill on Patreon.
2018-03-09 13:42:55 -04:00
Joey Hess
08814327ff
use P2P protocol for checkpresent, retrieve, and store
Note that, due to not using rsync to transfer files to ssh remotes
any longer, permissions and other file metadata of annexed files
will no longer be preserved when copying them to ssh remotes.
Other remotes never supported preserving that information, so
this is not considered a regression. Added NEWS item about this.

Another significant side effect of this is that, even when rsync is run to
retrieve a file, its progress display will no longer be shown, and
instead the native git-annex progress display will appear. It would be
possible to use the rsync process display when rsync is used (old
git-annex-shell and also retrieval from a local repository), but it
would have complicated the code unncessarily, and been inconsistent
behavior.

(I'd been thinking for a while about eliminating the rsync progress
display, since it's got some annoying verbosities, including display of
the key and the "(xfr#1, to-chk=0/1)" bit and was already somewhat
inconsistent.)

retrieveKeyFileCheap still uses rsync, since that ensures that it gets
the actual file content from the remote. Using the P2P protocol would
use the local content, as long as the local and remote size are the
same.

This commit was sponsored by John Pellman on Patreon.
2018-03-09 13:25:16 -04:00
Joey Hess
5bc0ab3f31
going AGPL
Remote/Git.hs now contains AGPL licensed code, thus the license
of git-annex as a whole is AGPL. This was already the case when git-annex
was built with the webapp enabled.

The AGPL license will apply to all code added to Remote/Git.hs in the
future, which is going to include support for using
`git-annex-shell p2pstdio`.
2018-03-09 01:03:46 -04:00
Joey Hess
6a59bc4845
use P2P protocol for drop
Not yet used for everything else, but this is enough to
verify that it works, and do some benchmarking.

Some bugfixes included, which got it working. Also fallback to old
actions has been verified to work correctly.

Benchmarked dropping one thousand files from a ssh remote on localhost.
Using the old git-annex	40.867 seconds.
With the P2P protocol	9.905 seconds!

This commit was sponsored by Jochen Bartl on Patreon.
2018-03-08 16:56:17 -04:00
Joey Hess
16af259209
refactor p2p remote action code
Make a Remote.Helper.P2P using code that was in Remote.P2P, converted to
use generic protocol runner actions.

This will allow it to be reused in Remote.Git.

This commit was sponsored by mo on Patreon.
2018-03-08 16:11:00 -04:00
Joey Hess
c036a380b2
p2p ssh connection pools
Much like Remote.P2P, there's a pool of connections to a peer, in order
to support concurrent operations.

Deals with old git-annex-ssh on the remote that does not support p2pstdio,
by only trying once to use it, and remembering if it's not supported.

Made p2pstdio send an AUTH_SUCCESS with its uuid, which serves the dual
purposes of something to detect to see that the connection is working,
and a way to verify that it's connected to the right uuid.
(There's a redundant uuid check since the uuid field is sent
by git_annex_shell, but I anticipate that being removed later when
the legacy git-annex-shell stuff gets removed.)

Not entirely happy with Remote.Git.runSsh's behavior
when the proto action fails. Running the fallback will work ok, but what
will we do when the fallbacks later get removed? It might be better to
try to reconnect, in case the connection got closed.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-03-08 15:11:31 -04:00
Joey Hess
f4103744c3
make sure that lockContentShared is always paired with an inAnnex check
lockContentShared had a screwy caveat that it didn't verify that the content
was present when locking it, but in the most common case, eg indirect mode,
it failed to lock when the content is not present.

That led to a few callers forgetting to check inAnnex when using it,
but the potential data loss was unlikely to be noticed because it only
affected direct mode I think.

Fix data loss bug when the local repository uses direct mode, and a
locally modified file is dropped from a remote repsitory. The bug
caused the modified file to be counted as a copy of the original file.
(This is not a severe bug because in such a situation, dropping
from the remote and then modifying the file is allowed and has the same
end result.)

And, in content locking over tor, when the remote repository is
in direct mode, it neglected to check that the content was actually
present when locking it. This could cause git annex drop to remove
the only copy of a file when it thought the tor remote had a copy.

So, make lockContentShared do its own inAnnex check. This could perhaps
be optimised for direct mode, to avoid the check then, since locking
the content necessarily verifies it exists there, but I have not bothered
with that.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-03-07 14:23:52 -04:00
Joey Hess
bed6773346
Support exporttree=yes for rsync special remotes.
Renaming is not supported; it might be possible to use --fuzzy to get rsync
to notice the file is being renamed, but that is a bit ..fuzzy.

On the other hand, interrupted transfers of an exported file are resumed,
since rsync is great at that. Had to adjust the exporttree docs, which
said interrupted transfers would restart.

Note that remove no longer makes the empty directory dummy, instead
sending the top-level empty directory. This works just as well and I
noticed the dummy was unncessary when refactoring it into removeGeneric.
Verified that behavior of remove is not changed, and git annex
testremote does pass.

This commit was sponsored by Brock Spratlen on Patreon.
2018-02-28 13:36:20 -04:00
Joey Hess
d884e5b6fe
Added EXTENSIONS to external special remote protocol.
Allows using new special remote messages when git-annex supports them,
and avoiding using them when git-annex is too old. The new INFO is one
such message.

There's also the possibility, currently unused, for the special remote's
reply to include some kind of extensions of its own.

Merging this is blocked by https://github.com/datalad/datalad/issues/2124
since it seems it will break datalad. I checked all the other special
remotes and they will be ok.

This commit was supported by the NSF-funded DataLad project.
2018-02-07 15:02:12 -04:00
Joey Hess
7d9f0e0fbe
Added INFO to external special remote protocol.
It's left up to the special remote to detect when git-annex is new enough
to support the message; an old git-annex will blow up.

This commit was supported by the NSF-funded DataLad project.
2018-02-06 13:03:55 -04:00
Joey Hess
a28c541e23
add remote.<name>.annex-checkuuid
Added remote.<name>.annex-checkuuid config, which can be set to false to
disable the default checking of the uuid of remotes that point to
directories. This can be useful to avoid unncessary drive spin-ups and
automounting.

Note that the UUID check is still done before writing to the repository,
to avoid writing to the wrong repository if it got relocated. Check is
also done before checkPresent to avoid getting confused about what is in
which repo. This is effectively the same as the use of git-annex-shell
with a uuid to check that the remote repository is the expected one.
Did not bother with the check for retrieveKeyFile because it doesn't
matter if the wrong repo is used then.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-01-10 14:21:18 -04:00
Joey Hess
2b66492d6e
Improve startup time for commands that do not operate on remotes
And for tab completion, by not unnessessarily statting paths to remotes,
which used to cause eg, spin-up of removable drives.

Got rid of the remotes member of Git.Repo. This was a bit painful.

Remote.Git modifies the list of remotes as it reads their configs,
so still need a persistent list of remotes. So, put it in as
Annex.gitremotes. It's only populated by getGitRemotes, so commands
like examinekey that don't care about remotes won't do so.

This commit was sponsored by Jake Vosloo on Patreon.
2018-01-09 16:22:07 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
8a0038ec23
avoid warning when youtube-dl is not installed
If a user does not have it installed, don't warn on every imported item
about it.
2017-11-30 13:43:55 -04:00
Joey Hess
99bebdface
youtube-dl working
Including resuming and cleanup of incomplete downloads.

Still todo: --fast, --relaxed, importfeed, disk reserve checking,
quvi code cleanup.

This commit was sponsored by Anthony DeRobertis on Patreon.
2017-11-29 16:40:32 -04:00
Joey Hess
4e7e1fcff4
add gitAnnexTmpWorkDir and withTmpWorkDir
Needed to run youtube-dl in, but could also be useful for other stuff.

The tricky part of this was making the workdir be cleaned up whenever the
tmp object file is cleaned up.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-11-29 13:53:39 -04:00
Joey Hess
595dfb6fe2
avoid build warning with old version of http 2017-11-21 12:45:49 -04:00
Joey Hess
f5edb16729
Display progress meter when uploading a key without size information
Getting the size by statting the content file.

This commit was supported by the NSF-funded DataLad project.
2017-11-14 16:40:49 -04:00
Joey Hess
0e4bdd21a8
Fix directory special remote's cleanup of empty export directories.
Was trying to rmdir the file, which had already been deleted, and when that
failed, it skipped trying to delete the parent directories.

Noticed the bug through testremote, but it can't itself detect such
problems as there is no enumeration in the API.

This commit was sponsored by Brock Spratlen on Patreon.
2017-11-08 14:38:24 -04:00
Joey Hess
9d129367db
Web.checkKey: Fix handling of multiple urls
When there are multiple urls for a file, still treat it as being present
in the web when some urls don't work, as long as at least one url does
work.

This is consistent with the other web methods handling of multiple urls.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-11-07 16:15:44 -04:00
Joey Hess
a01b0680e3
fix version number 2017-10-11 11:43:03 -04:00
Joey Hess
6679705116
typo 2017-10-11 11:24:51 -04:00
Joey Hess
9aaf7e2b52
webdav: Avoid unncessisarily creating the collection at the top of the repo
when storing files there, since that collection is created by initremote.
(This seems to work around some brokenness of the box.com webdav server
which was entering a redirect loop.)

Note that the fix makes locationParent return Nothing instead of "."
when there's no parent directory between the path and the top of the webdav
repo.

This commit was sponsored by André Pereira on Patreon.
2017-10-11 11:10:33 -04:00
Joey Hess
61dccecad7
Fix build with aws-0.17.
This commit was sponsored by Denis Dzyubenko on Patreon.
2017-10-11 10:57:20 -04:00
Joey Hess
34bb350724
webdav: Make --debug show all webdav operations. 2017-10-07 14:11:32 -04:00
Joey Hess
5c32196a37
fix process and FD leak
Fix process and file descriptor leak that was exposed when git-annex was
built with ghc 8.2.1. Apparently ghc has changed its behavior of GC
of open file handles that are pipes to running processes. That
broke git-annex test on OSX due to running out of FDs.

Audited for all uses of Annex.new and made stopCoProcesses be called
once it's done with the state. Fixed several places that might have
leaked in other situations than running the test suite.

This commit was sponsored by Ewen McNeill.
2017-09-29 22:36:08 -04:00
Joey Hess
e9e5613e94
external crash fixes
When the external special remote program crashed, a newline
could be output, which messed up the expected output for --batch mode.

Avoid checking EXPORTSUPPORTED for special remotes that are
not configured to use exports. The datalad special remote apparently is/was
buggy and crashed on EXPORTSUPPORTED. Anyway, there's no need to send
it when the configuration doesn't need it.

This commit was supported by the NSF-funded DataLad project.
2017-09-28 15:44:45 -04:00
Joey Hess
f4746da4ca
webdav: Improve error message for failed request to include the request method and path. 2017-09-28 12:01:58 -04:00
Joey Hess
129418615b
refactor 2017-09-20 16:22:32 -04:00
Joey Hess
2e69efea8d
git annex sync --content to exports
Assistant still todo.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2017-09-19 14:20:47 -04:00
Joey Hess
f4be3c3f89
merge changes made on other repos into ExportTree
Now when one repository has exported a tree, another repository can get
files from the export, after syncing.

There's a bug: While the database update works, somehow the database on
disk does not get updated, and so the database update is run the next
time, etc. Wasn't able to figure out why yet.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-09-18 19:21:41 -04:00
Joey Hess
55809081d0
update for ExportTree
Use ExportTree rather than ExportedLocation for retrieveKeyFile and
checkPresent. When another remote exported the content, ExportTree will
be populated, but ExportedLocation will not be.

It would be possible to implement storeKey to exports as well, but it
risks performing a lot of unncessary work when another repository
already stored the key on the export and the local repository doesn't
know about it.

The only way to avoid that work would be for storeKey to use checkPresentExport
before uploading. But, the other repository could have changed the
exported tree as well, so that can't be trusted, and if it were used in
storeKey, could result in bad information getting into the location log.

This commit was sponsored by Bruno BEAUFILS on Patreon.
2017-09-18 14:45:00 -04:00
Joey Hess
b03d77c211
add ExportTree table to export db
New table needed to look up what filenames are used in the currently
exported tree, for reasons explained in export.mdwn.

Also, added smart constructors for ExportLocation and ExportDirectory to
make sure they contain filepaths with the right direction slashes.

And some code refactoring.

This commit was sponsored by Francois Marier on Patreon.
2017-09-18 13:59:59 -04:00
Joey Hess
4a45f34fe1
don't support removing content from export with removeKey
There does not seem to be a use case for supporting that, and it would
need a lot of complication to support it in a way that allows eventual
consistency when two repositories are updating the same export.

This commit was sponsored by Henrik Riomar on Patreon.
2017-09-17 17:56:33 -04:00
Joey Hess
e1f5c90c92
split out Types.Export 2017-09-15 16:46:03 -04:00
Joey Hess
e54a05612e
avoid unncessary db queries when exported directory can't be empty
In rename foo/bar to foo/baz, foo can't be empty.

In delete zxyyz, there's no exported directory (top doesn't count).
2017-09-15 16:30:49 -04:00
Joey Hess
cf51f40f0e
webdav: Changed path used on webdav server for temporary files.
Done to avoid a "tmp" directory appearing in webdav exports.

Also affects non-export webdav remotes, so interrupted uploads using the
old path will not overwrite it. However, PUT is quite likely to be
implemented atomically on web servers anyway, so I doubt this will cause
problems.
2017-09-15 15:52:31 -04:00
Joey Hess
c633144d28
remove empty directories when removing from export
The subtle part of this is what happens when the remote fails to remove
an empty directory. The removal from the export needs to fail in that
case, so the removal will be tried again later. However, removeExportLocation
has already been run and changed the export db, so if the next run
checks getExportLocation, it might decide nothing remains to be done,
leaving the empty directory.

Dealt with that by making removeEmptyDirectories, handle a failure
by calling addExportLocation, reverting the database changes so the next
run will be guaranteed to try deleting the empty directory again.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-09-15 15:22:53 -04:00
Joey Hess
bdcf19b095
add missing case 2017-09-15 14:32:56 -04:00
Joey Hess
9f4ffe65e9
implement removeExportDirectory
Not yet called by Command.Export.

WebDAV needs this to clean up empty collections. Also, example.sh turned
out to not be cleaning up directories when removing content
from them, so it made sense for it to use this.

Remote.Directory did not need it, and since its cleanup method for empty
directories is more efficient than what Command.Export will need to do
to find empty directories, it uses Nothing so that extra work can be
avoided.

This commit was sponsored by Thom May on Patreon.
2017-09-15 13:18:21 -04:00
Joey Hess
bf48ba4ef7
work around box.com webdav rename bug
Apparently box.com renaming is just buggy. I tried a couple of fixes:

* In case the http Manager was opening multiple connections and reaching
  different backend servers, I tried limiting the number of connections
  to 1. Didn't help.
* To make sure it was not a http connection reuse problem, I tried
  rewriting how exportAction works, so that the same http connection
  is clearly open. Didn't help.

So, disable renaming of exports for box.com. It would be good to test it
with some other webdav server.

This commit was sponsored by John Peloquin on Patreon.
2017-09-13 15:26:56 -04:00
Joey Hess
955c616956
fix exporting files in subdirectories to webdav
Use tmp/key when exporting, so the whole export directory structure does
not have to be created under tmp/

This commit was sponsored by Denis Dzyubenko on Patreon.
2017-09-13 15:09:19 -04:00
Joey Hess
28ba158a24
clear exportSupported for non-export remotes
Non-export remotes were being treated as untrusted, so the test suite
failed, and probably other things broke.
2017-09-13 12:05:53 -04:00
Joey Hess
9c3622882b
export: cache connections for S3 and webdav 2017-09-12 16:59:04 -04:00
Joey Hess
e177bb1e25
webdav: Fix lack of url-escaping of filenames.
inDAVLocation does not url-escape, and so exporting a filename with spaces
to box.com at least resulted in a error 400.

It might also have affected storing keys on a webdav remote, if the key
contained a space or other problem character. Pretty unlikely.

I emailed Clint about the inDAVLocation gotcha, but seems best to fix it
here.

This commit was supported by the NSF-funded DataLad project.
2017-09-12 15:45:03 -04:00
Joey Hess
2ca1d3cc01
deal with box.com horrible infinite redirect behavior
webdav: Checking if a non-existent file is present on Box.com triggered a
bug in its webdav support that generates an infinite series of redirects.

It seems to redirect foo to foo/ to foo/index.php to
foo/index.php/index.php ... Why a webdav endpoint would behave this way
who knows.

Deal with such problems by assuming such behavior means the file is not
present.

Can't simply disable following redirects, because the webdav endpoint could
legitimately be redirected to a new endpoint. So, when this happens
10 redirects have to be followed, before it gives up and assumes this means
the file does not exist.

This commit was supported by the NSF-funded DataLad project.
2017-09-12 15:13:42 -04:00
Joey Hess
4d3a464e83
export to webdav
This basically works, but there's a bug when renaming a file that leaves
a .git-annex-temp-content-key file in the webdav store, that never gets
cleaned up.

Also, exporting files with spaces to box.com seems to fail; perhaps it
does not support it?

This commit was supported by the NSF-funded DataLad project.
2017-09-12 14:10:09 -04:00
Joey Hess
7ef9b7ef46
update copyright year 2017-09-12 13:53:03 -04:00
Joey Hess
088d819cd8
propigate exception in checkPresentExportS3
checkPresentExport is supposed to throw exceptions
2017-09-12 13:46:33 -04:00
Joey Hess
1332e6cec0
stop warning about removals from IA
In a test, I uploaded a pdf, and several files were derived from it.
After removing the pdf, the derived files went away after approximatly
half an hour. This window does not seem worth warning about every time.
Documented it in the tip.
2017-09-12 12:47:43 -04:00
Joey Hess
da23dec7d3
avoid showing error when copy fails
Since renameExport is allowed to fail for any reason, and its failure is
always recovered from by doing a new upload and deleting the old
content, this avoids unnecessary noise.

Copying a file on the IA failed, apparently something wrong with their
emulation of S3:

  S3Error {s3StatusCode = Status {statusCode = 400, statusMessage = "Bad Request"}, s3ErrorCode = "InvalidArgument", s3ErrorMessage = "Invalid Argument", s3ErrorResource = Just "x-(amz|archive)-copy-source header is bad: 'joeyh-public-test2/foo'", s3ErrorHostId = Nothing, s3ErrorAccessKeyId = Nothing, s3ErrorStringToSign = Nothing, s3ErrorBucket = Nothing, s3ErrorEndpointRaw = Nothing, s3ErrorEndpoint = Nothing}

This commit was sponsored by Jake Vosloo on Patreon.
2017-09-12 12:42:44 -04:00
Joey Hess
267f47c473
S3: Allow removing files from IA, but warn about derived versions potentially still existing there.
Removal works, only derives are a potential issue, so allow removing
with a warning. This way, unexporting a file works, and behavior is
consistent with IA remotes whether or not exporttree=yes.

Also tested exporting filenames containing unicode, spaces, underscores.
All worked, despite the IA's faq saying it doesn't.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-09-12 12:35:58 -04:00
Joey Hess
afdff226fb
don't show key urls in whereis for S3 with public=yes and exporttree=yes 2017-09-08 16:44:00 -04:00
Joey Hess
650d0955a0
S3 export finalization
Fixed ACL issue, and updated some documentation.
2017-09-08 16:28:28 -04:00
Joey Hess
44cd5ae313
S3 export (untested)
It opens a http connection per file exported, but then so does git
annex copy --to s3.

Decided not to munge exported filenames for IA. Too large a chance of
the munging having confusing results. Instead, export of files not
supported by IA, eg with spaces in their name, will fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-08 15:46:24 -04:00
Joey Hess
a1b195d84c
External special remote protocol extended to support export.
Also updated example.sh to support export.

This commit was supported by the NSF-funded DataLad project.
2017-09-08 14:24:05 -04:00
Joey Hess
16eb2f976c
prevent exporttree=yes on remotes that don't support exports
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 13:48:44 -04:00
Joey Hess
5cd340ce27
rename bug fix 2017-09-06 15:48:14 -04:00
Joey Hess
a1cc9ec0fd
add export infication to git-annex info 2017-09-04 17:01:38 -04:00
Joey Hess
662f2a5ee7
git annex get from exports
Straightforward enough, except for the needed belt-and-suspenders sanity
checks to avoid foot shooting due to exports not being key/value stores.

* Even when annex.verify=false, always verify from exports.
* Only get files from exports that use a backend that supports
  checksum verification.
* Never trust exports, even if the user says to, because then
  `git annex drop` would drop content if the export seemed to contain
  a copy.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 16:39:56 -04:00
Joey Hess
28e2cad849
implement exporttree=yes configuration
* Only export to remotes that were initialized to support it.
* Prevent storing key/value on export remotes.
* Prevent enabling exporttree=yes and encryption in the same remote.

SetupStage Enable was changed to take the old RemoteConfig.
This allowed only setting exporttree when initially setting up a
remote, and not configuring it later after stuff might already be stored
in the remote.

Went with =yes rather than =true for consistency with other parts of
git-annex. Changed docs accordingly.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 13:09:38 -04:00
Joey Hess
a4328b49d2
refactor ExportActions
This will allow disabling exports for remotes that are not configured to
allow them. Also, exportSupported will be useful for the external
special remote to probe.

This commit was supported by the NSF-funded DataLad project
2017-09-01 13:05:09 -04:00
Joey Hess
bb08b1abd2
make storeExport atomic
This avoids needing to deal with the complexity of partially transferred
files in the export. We'd not be able to resume uploading to such a file
anyway, so just avoid them.

The implementation in Remote.Directory is not completely ideal, because
it could leave the temp file hanging around in the export directory.
This only happens if it's killed with -9, or there's a power failure;
normally viaTmp cleans up after itself, even when interrupted. I could
not see a better way to do it though, since the export directory might
be the root of a filesystem.

Also some design thoughts on resuming, which depend on storeExport being
atomic.

This commit was sponsored by Fernando Jimenez on Partreon.
2017-08-31 14:24:32 -04:00
Joey Hess
efe3910c04
remove empty parent dirs when removing from export 2017-08-31 12:32:02 -04:00
Joey Hess
9f3630f4e0
initial export command
Very basic operation works, but of course this is only the beginning.

This commit was sponsored by Nick Daly on Patreon.
2017-08-29 15:10:01 -04:00
Joey Hess
cca2764f91
provide file with content to export
Rather than providing the key to export, provide the file.

When exporting a treeish that contains files that are not annexed,
this will let the content of those files also be exported.

There's still a Key in the interface; it will be used by the external
special remote protocol. A SHA1 key can be used when exporting
non-annexed files.

This commit was sponsored by Brock Spratlen on Patreon.
2017-08-29 13:57:42 -04:00
Joey Hess
e55e445a36
add API for exporting
Implemented so far for the directory special remote.

Several remotes don't make sense to export to. Regular Git remotes,
obviously, do not. Bup remotes almost certianly do not, since bup would
need to be used to extract the export; same store for Ddar. Web and
Bittorrent are download-only. GCrypt is always encrypted so exporting to
it would be pointless. There's probably no point complicating the Hook
remotes with exporting at this point. External, S3, Glacier, WebDAV,
Rsync, and possibly Tahoe should be modified to support export.

Thought about trying to reuse the storeKey/retrieveKeyFile/removeKey
interface, rather than adding a new interface. But, it seemed better to
keep it separate, to avoid a complicated interface that sometimes
encrypts/chunks key/value storage and sometimes users non-key/value
storage. Any common parts can be factored out.

Note that storeExport is not atomic.
doc/design/exporting_trees_to_special_remotes.mdwn has some things in
the "resuming exports" section that bear on this decision. Basically,
I don't think, at this time, that an atomic storeExport would help with
resuming, because exports are not key/value storage, and we can't be
sure that a partially uploaded file is the same content we're currently
trying to export.

Also, note that ExportLocation will always use unix path separators.
This is important, because users may export from a mix of windows and
unix, and it avoids complicating the API with path conversions,
and ensures that in such a mix, they always use the same locations for
exports.

This commit was sponsored by Bruno BEAUFILS on Patreon.
2017-08-29 13:00:41 -04:00
Joey Hess
df11e54788
avoid the dashed ssh hostname class of security holes
Security fix: Disallow hostname starting with a dash, which would get
passed to ssh and be treated an option. This could be used by an attacker
who provides a crafted ssh url (for eg a git remote) to execute arbitrary
code via ssh -oProxyCommand.

No CVE has yet been assigned for this hole.
The same class of security hole recently affected git itself,
CVE-2017-1000117.

Method: Identified all places where ssh is run, by git grep '"ssh"'
Converted them all to use a SshHost, if they did not already, for
specifying the hostname.

SshHost was made a data type with a smart constructor, which rejects
hostnames starting with '-'.

Note that git-annex already contains extensive use of Utility.SafeCommand,
which fixes a similar class of problem where a filename starting with a
dash gets passed to a program which treats it as an option.

This commit was sponsored by Jochen Bartl on Patreon.
2017-08-17 22:11:31 -04:00
Joey Hess
dafafad115
external: nice error message for keys with spaces in their name
External special remotes will refuse to operate on keys with spaces in
their names. That has never worked correctly due to the design of the
external special remote protocol. Display an error message suggesting
migration.

Not super happy with this, but it's a pragmatic solution. Better than
complicating the external special remote interface and all external special
remotes.

Note that I only made it use SafeKey in Request, not Response. git-annex
does not construct a Response, so that would not add any safety. And
presumably, if git-annex avoids feeding any such keys to an external
special remote, it will never have a reason to make a Response using such a
key. If it did, it would result in a protocol error anyway.

There's still a Serializeable instance for Key; it's used by P2P.Protocol.
There, the Key is always in the final position, so it's ok if it contains
spaces.

Note that the protocol documentation has been fixed to say that the File
may contain spaces. One way that can happen, even though the Key can't,
is when using direct mode, and the work tree filename contains spaces.
When sending such a file to the external special remote the worktree
filename is used.

This commit was sponsored by Thom May on Patreon.
2017-08-17 16:18:34 -04:00
Joey Hess
d39c120afa
add annex-ignore-command and annex-sync-command configs
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.

For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.

Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-08-17 13:54:14 -04:00
Joey Hess
0a2f7c261f
fix build with old http-client versions 2017-08-17 11:00:48 -04:00
Joey Hess
69dcb08d7a
Disable http-client's default 30 second response timeout when HEADing an url to check if it exists. Some web servers take quite a long time to answer a HEAD request. 2017-08-15 13:56:12 -04:00
Joey Hess
a1730cd6af
adeiu, MissingH
Removed dependency on MissingH, instead depending on the split
library.

After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.

Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.

This commit was sponsored by Shane-o on Patreon.
2017-05-16 01:03:52 -04:00
Joey Hess
db1600b2de
de-Maybe remoteGitConfig
It's always set, so does not need to be a Maybe.
2017-05-11 16:05:01 -04:00
Joey Hess
57e923b712
gcrypt: Support re-enabling to change eg, encryption parameters.
This was never supported before. And it doesn't re-encrypt the
gcrypt repo to the new gcrypt-participants, but it does at least now not
crash, and set gcrypt-participants.

This commit was sponsored by andrea rota.
2017-04-07 14:10:34 -04:00
Joey Hess
3c8eb59860
When a http remote does not expose an annex.uuid config, only warn about it once, not every time git-annex is run.
Same behavior as for a ssh remote.
2017-03-29 12:43:47 -04:00
Joey Hess
faecd73f32
Support GIT_SSH and GIT_SSH_COMMAND
They are handled close the same as they are by git. However, unlike git,
git-annex sometimes needs to pass the -n parameter when using these.

So, this has the potential for breaking some setup, and perhaps there ought
to be a ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that
if possible, so let's see if anyone complains.

Almost all places where "ssh" was run have been changed to support the env
vars. Anything still calling sshOptions does not support them. In
particular, rsync special remotes don't. Seems that annex-rsync-transport
already gives sufficient control there.

(Fixed in passing: Remote.Helper.Ssh.toRepo used to extract
remoteAnnexSshOptions and pass them to sshOptions, which was redundant
since sshOptions also extracts those.)

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-03-17 16:20:37 -04:00
Joey Hess
c8e1e3dada
AssociatedFile newtype
To prevent any further mistakes like 301aff34c4

This commit was sponsored by Francois Marier on Patreon.
2017-03-10 13:35:31 -04:00
Joey Hess
5358fb992a
Windows: Improve handling of shebang in external special remote program, searching for the program in the PATH.
findShellCommand needs a full path to a file in order to check it for a
shebang on Windows. It was being run with only the base name of the external
special remote program, which would only work when it was in the current
directory.

This is why users in
https://github.com/DanielDent/git-annex-remote-rclone/pull/10 and elsewhere
were complaining that the previous improvements to git-annex didn't make
git-remote-rclone work on Windows.

Also, reworked checkearlytermination, which while it worked, seemed
to rely on a race condition. And, improved its error messages.

This commit was sponsored by Shane-o on Patreon.
2017-03-08 15:59:00 -04:00
Joey Hess
e6857e75a6
sync hack to make updateInstead work on eg FAT
sync: When syncing with a local repository located on a crippled
filesystem, run the post-receive hook there, since it wouldn't get run
otherwise. This makes pushing to repos on FAT-formatted removable drives
update them when receive.denyCurrentBranch=updateInstead.

Made Remote.Git export onLocal, which was cleaned up to not have so many
caveats about its use.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-02-17 15:21:52 -04:00
Joey Hess
00464fbed7
have onLocal stop any coprocesses, not only cat-file
I have not seen any other coprocesses being started, but let's avoid
problems if any do for whatever reason.
2017-02-17 14:30:18 -04:00
Joey Hess
f07af03018
Run ssh with -n whenever input is not being piped into it
... to avoid it consuming stdin that it shouldn't.

This fixes git-annex-checkpresentkey --batch remote, which didn't output
results for all keys passed into it.

Other git-annex commands that communicate with a remote over ssh may also
have been consuming stdin that they shouldn't have, which could have
impacted using them in eg, shell scripts. For example, a shell script
reading files from stdin and passing them to git annex drop would be
impacted by this bug, whenever git annex drop ran git-annex-shell
checkpresent, it would consume part/all of the stdin that the shell script
was supposed to consume.

Fixed by adding a ConsumeStdin parameter to Annex.Ssh.sshOptions, which
is used throughout git-annex to run ssh (in order for ssh connection
caching to work). Every call site was checked to see if it used
CreatePipe for stdin, and if not was marked NoConsumeStdin.
2017-02-15 15:08:46 -04:00
Joey Hess
976676a7b0
S3: Fix check of uuid file stored in bucket, which was not working.
The check was broken in two ways.. First, nowhere did it error out when
checkUUIDFile found a different UUID already in the file. Instead,
it overwrote the uuid file.

And, checkUUIDFile's implementation was for some reason always failing with
a ConnectionClosed exception. Apparently something to do with using two
different runResourceT's and a response getting GCed inbetween. I'm pretty
sure that used to work, but changed to a more obviously correct
implementation.

This commit was sponsored by Peter Hogg on Patreon.
2017-02-13 15:35:24 -04:00
Edward Betts
0750913136
correct spelling mistakes 2017-02-12 17:30:23 -04:00
Joey Hess
5c804cf42e
add SetupStage parameter to RemoteType.setup
Most remotes have an idempotent setup that can be reused for
enableremote, but in a few cases, it needs to tell which, and whether
a UUID was provided to setup was used.

This is groundwork for making initremote be able to provide a UUID.
It should not change any behavior.

Note that it would be nice to make the UUID always be provided to setup,
and make setup not need to generate and return a UUID. What prevented
this simplification is Remote.Git.gitSetup, which needs to reuse the
UUID of the git remote when setting it up, and so has to return that
UUID.

This commit was sponsored by Thom May on Patreon.
2017-02-07 14:55:58 -04:00
Joey Hess
655f707990
Fix build with aws 0.16. Thanks, aristidb. 2017-02-07 13:01:57 -04:00
Joey Hess
9eb10caa27
Some optimisations to string splitting code.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.

As well as being an optimisation, this helps move toward eliminating use of
missingh.

(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)

I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-01-31 19:06:22 -04:00
Joey Hess
f275caf732
Increase default cost for p2p remotes from 200 to 1000. This makes git-annex prefer transferring data from special remotes when possible. 2017-01-06 15:23:30 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
b72352e1b1
fix build warning 2016-12-10 11:41:38 -04:00
Alper Nebi Yasak
93a22a1c97
Remove http-conduit (<2.2.0) constraint
Since https://github.com/aristidb/aws/issues/206 is resolved, this
constraint is no longer necessary. However, http-conduit (>=2.2.0)
requires http-client (>=0.5.0) which introduces some breaking changes.
This commit also implements those changes depending on the version.
Fixes: https://git-annex.branchable.com/bugs/Build_with_aws_head_fails/

Signed-off-by: Alper Nebi Yasak <alpernebiyasak@gmail.com>
2016-12-10 10:45:52 -04:00
Joey Hess
15be5c04a6
git-annex-shell, remotedaemon, git remote: Fix some memory DOS attacks.
The attacker could just send a very lot of data, with no \n and it would
all be buffered in memory until the kernel killed git-annex or perhaps OOM
killed some other more valuable process.

This is a low impact security hole, only affecting communication between
local git-annex and git-annex-shell on the remote system. (With either
able to be the attacker). Only those with the right ssh key can do it. And,
there are probably lots of ways to construct git repositories that make git
use a lot of memory in various ways, which would have similar impact as
this attack.

The fix in P2P/IO.hs would have been higher impact, if it had made it to a
released version, since it would have allowed DOSing the tor hidden
service without needing to authenticate.

(The LockContent and NotifyChanges instances may not be really
exploitable; since the line is read and ignored, it probably gets read
lazily and does not end up staying buffered in memory.)
2016-12-09 13:34:32 -04:00
Joey Hess
58f5d41cac
fix 2016-12-09 12:56:38 -04:00
Joey Hess
0f3a3ff1e5
make clear that log is only updated after successful removal
This does not change behavior, because an exception is thrown on
unsuccessful removal. But is clearer.
2016-12-09 12:54:18 -04:00
Joey Hess
ca1bcdcd7c
improve warning on connection loss 2016-12-09 12:35:45 -04:00
Joey Hess
c6972cb914
better format error 2016-12-08 16:02:26 -04:00
Joey Hess
af41519126
convert P2P runners from Maybe to Either String
So we get some useful error messages when things fail.

This commit was sponsored by Peter Hogg on Patreon.
2016-12-08 15:47:49 -04:00
Joey Hess
ad5ef51040
more p2p progress meters
Display progress meter on send and receive from remote.

Added a new hGetMetered that can read an exact number of bytes (or
less), updating a meter as it goes.

This commit was sponsored by Andreas on Patreon.
2016-12-07 14:25:01 -04:00
Joey Hess
83ea1cec86
update progress meter when sending to p2p remote
This commit was sponsored by Thom May on Patreon.
2016-12-07 13:37:35 -04:00
Joey Hess
757d36f8ca
validate peer uuid each time we talk to it
In case the repo on the peer changes uuid (eg by a new repo being moved
into place).

Also, added some warning messages when unable to communicate with a
peer.

This commit was sponsored by Anthony DeRobertis on Patreon.
2016-12-07 12:39:28 -04:00
Joey Hess
bb5168e894
need to auth with the peer 2016-12-06 15:50:02 -04:00
Joey Hess
f744bd5391
refactor 2016-12-06 15:43:03 -04:00
Joey Hess
26a53fb4a5
finish implementation of Remote.P2P (untested)
Not tested at all, but it just might work.
Only known problem is that progress is not updated when storing to a P2P
remote.

This commit was sponsored by Nick Daly on Patreon.
2016-12-06 15:09:04 -04:00
Joey Hess
b29088b8dc
stub Remote.P2P
Similar to GCrypt remotes, P2P remotes have an url, so Remote.Git has to
separate them out and handle them, passing off to Remote.P2P.

This commit was sponsored by Ignacio on Patreon.
2016-12-06 12:27:58 -04:00
Joey Hess
b88e44ea9a
use P2P auth for git-remote-tor-annex
This changes the environment variable name to the more generic
GIT_ANNEX_P2P_AUTHTOKEN.

This commit was sponsored by andrea rota.
2016-11-30 15:26:55 -04:00
Joey Hess
b08799893f
reorg 2016-11-22 14:37:09 -04:00
Joey Hess
af4d919793
unified AuthToken type between webapp and tor 2016-11-22 14:18:34 -04:00
Joey Hess
57a9484fbc
remove debug 2016-11-21 22:11:53 -04:00
Joey Hess
2da338bb8d
detect EOF on socket and cleanly shutdown the service process 2016-11-21 21:45:56 -04:00
Joey Hess
483dbcdbef
stop cleanly when there's a IO error accessing the Handle
All other exceptions are let through, but IO errors accessing the handle
are to be expected, so quietly ignore.
2016-11-21 21:32:51 -04:00
Joey Hess
ae69ebfc7c
try to gather scattered writes
git upload-pack makes some uncessary writes in sequence, this tries to
gather them together to avoid needing to send multiple DATA packets when
just one will do.

In a small pull, this reduces the average number of DATA packets from
4.5 to 2.5.
2016-11-21 20:56:58 -04:00
Joey Hess
9c311fb564
fix parse of CONNECTDONE 2016-11-21 19:33:57 -04:00
Joey Hess
6b992f672c
pull/push over tor working now
Still a couple bugs:

* Closing the connection to the server leaves git upload-pack /
  receive-pack running, which could be used to DOS.

* Sometimes the data is transferred, but it fails at the end, sometimes
  with:

  git-remote-tor-annex: <socket: 10>: commitBuffer: resource vanished (Broken pipe)

  Must be a race condition around shutdown.
2016-11-21 19:24:55 -04:00
Joey Hess
070fb9e624
Added git-remote-tor-annex, which allows git pull and push to the tor hidden service.
Almost working, but there's a bug in the relaying.

Also, made tor hidden service setup pick a random port, to make it harder
to port scan.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2016-11-21 17:27:38 -04:00
Joey Hess
9cf9ee73f5
improve p2p protocol implementation
Tested it in ghci a little now.
2016-11-20 16:42:18 -04:00
Joey Hess
74691ddf0e
remotedaemon: serve tor hidden service 2016-11-20 15:48:12 -04:00
Joey Hess
d50b0f3bb3
implement p2p protocol for Handle
This is most of the way to having the p2p protocol working over tor
hidden services, at least enough to do git push/pull.

The free monad was split into two, one for network operations and the
other for local (Annex) operations. This will allow git-remote-tor-annex
to run only an IO action, not needing the Annex monad.

This commit was sponsored by Remy van Elst on Patreon.
2016-11-20 12:16:32 -04:00
Joey Hess
0eaad7ca3a
extend p2p protocol to support gitremote-helpers connect
A bit tricky since Proto doesn't support threads. Rather than adding
threading support to it, ended up using a callback that waits for both
data on a Handle, and incoming messages at the same time.

This commit was sponsored by Denis Dzyubenko on Patreon.
2016-11-19 22:39:36 -04:00
Joey Hess
73a6b9b514
Add content locking to P2P protocol
Is content locking needed in the P2P protocol? Based on re-reading
bugs/concurrent_drop--from_presence_checking_failures.mdwn,
I think so: Peers can form cycles, and multiple peers can all be trying
to drop the same content.

So, added content locking to the protocol, with some difficulty.

The implementation is fine as far as it goes, but note the warning
comment for lockContentWhile -- if the connection to the peer is dropped
unexpectedly, the peer will then unlock the content, and yet the local
side will still think it's locked.

To be honest I'm not sure if Remote.Git's lockKey for ssh remotes
doesn't have the same problem. It checks that the
"ssh remote git-annex-shell lockcontent"
process has not exited, but if the connection closes afer that check,
the lockcontent command will unlock it, and yet the local side will
still think it's locked.

Probably this needs to be fixed by eg, making lockcontent catch any
execptions due to the connection closing, and in that case, wait a
significantly long time before dropping the lock.

This commit was sponsored by Anthony DeRobertis on Patreon.
2016-11-18 01:32:24 -04:00
Joey Hess
236ff111a7
rename 2016-11-17 22:10:28 -04:00
Joey Hess
b121078b35
refactor 2016-11-17 22:09:07 -04:00
Joey Hess
27c8a4a229
add CHECKPRESENT
Using SUCCESS to mean the content is present and FAILURE to mean it's not.
2016-11-17 21:56:02 -04:00
Joey Hess
cbffb61083
added REMOVE to protocol 2016-11-17 21:48:59 -04:00
Joey Hess
2b33452bd8
add ALREADY-HAVE response to PUT 2016-11-17 21:37:49 -04:00
Joey Hess
47b7028d7c
pass Len to writeKeyFile so it can detect short reads 2016-11-17 21:32:09 -04:00
Joey Hess
505d1df8ab
refactor 2016-11-17 21:04:35 -04:00
Joey Hess
ae403be24b
avoid setPresent when sending to a peer
This mirrors how git-annex-shell works; recvKey updates location
tracking, but sendKey does not.
2016-11-17 20:54:14 -04:00
Joey Hess
65e903397c
implementation of peer-to-peer protocol
For use with tor hidden services, and perhaps other transports later.

Based on Utility.SimpleProtocol, it's a line-based protocol,
interspersed with transfers of bytestrings of a specified size.

Implementation of the local and remote sides of the protocol is done
using a free monad. This lets monadic code be included here, without
tying it to any particular way to get bytes peer-to-peer.

This adds a dependency on the haskell package "free", although that
was probably pulled in transitively from other dependencies already.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2016-11-17 18:30:50 -04:00
Joey Hess
2542fb58ed
fix giveup shadowing 2016-11-16 00:28:10 -04:00
Joey Hess
0a4479b8ec
Avoid backtraces on expected failures when built with ghc 8; only use backtraces for unexpected errors.
ghc 8 added backtraces on uncaught errors. This is great, but git-annex was
using error in many places for a error message targeted at the user, in
some known problem case. A backtrace only confuses such a message, so omit it.

Notably, commands like git annex drop that failed due to eg, numcopies,
used to use error, so had a backtrace.

This commit was sponsored by Ethan Aubin.
2016-11-15 21:29:54 -04:00
Joey Hess
5343544822
S3: Support the special case endpoint needed for the cn-north-1 region.
* S3: Support the special case endpoint needed for the cn-north-1 region.
* Webapp: Don't list the Frankfurt region, as this (and some other new
  regions) need V4 authorization which the aws library does not yet use.

This commit was sponsored by Nick Daly on Patreon.
2016-11-07 11:49:34 -04:00
Joey Hess
8dcf79694d
enable forwardRetry for command-line transfers
If a transfer fails for some reason, but some data managed to be sent, the
transfer will be retried. (The assistant already did this.)

Possible impacts:

* More ssh prompts if ssh needs to prompt for a password to connect to a
  host, or is prompting about some other problem like a ssh key mismatch.

* More data transfer due to retrying, epecially when a remote does not
  support resuming a transfer.

  In the worst case, a lot of data will be transferred but it fails before
  the end, and then all that data gets transferred again plus one byte more;
  repeat until it manages to get the whole file.
2016-10-26 15:38:27 -04:00
Joey Hess
166d70db77
convert TMVars that are never left empty into TVars
This is probably more efficient, and it avoids mistakenly leaving them
empty.
2016-09-30 19:51:16 -04:00
Joey Hess
37c8c6df99
include external special remote process number in debug
Not actual pid, because System.Process does not expose that.
2016-09-30 14:47:36 -04:00
Joey Hess
5bf4623a1d
allow multiple concurrent external special remote processes
Multiple external special remote processes for the same remote will be
started as needed when using -J.

This should not beak any existing external special remotes, because running
multiple git-annex commands at the same time could already start multiple
processes for the same external special remotes.
2016-09-30 14:29:02 -04:00
Joey Hess
b69dea0ac3
move externalConfig into ExternalState
Groundwork to having multiple processes running at once for an external
special remote; each needs its own externalConfig.
2016-09-30 13:36:50 -04:00
Joey Hess
63e21a607f
remove unnecessary mvar 2016-09-30 13:17:49 -04:00
Joey Hess
312ef4dfae
make --json-progress update meter when getting from git remote with rsync 2016-09-09 16:05:45 -04:00
Joey Hess
f292f78366
Windows: Handle shebang in external special remote program. 2016-09-05 12:09:23 -04:00
Joey Hess
10ddf2c3bd
remove TransferObserver
unused after last commit
2016-08-03 13:46:20 -04:00
Joey Hess
1a0e2c9901
get, move, copy, mirror: Added --failed switch which retries failed copies/moves
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.

Noisy due to some refactoring into Types/
2016-08-03 12:37:12 -04:00
Joey Hess
79704528c0
Support checking presence of content at a http url that redirects to a ftp url. 2016-07-12 16:41:45 -04:00
Joey Hess
d6483deeb1
testremote: Fix crash when testing a freshly made external special remote.
Ignore exceptions when getting the cost and availability for the remote,
and return sane defaults. These defaults are not cached, so if a special
remote program has a transient problem, it will re-query it later.
2016-07-05 16:34:39 -04:00
Joey Hess
f4db181d9b
fix warning 2016-05-27 11:15:52 -04:00
Joey Hess
1b3bde0625
enableremote: Remove annex-ignore configuration from a remote. 2016-05-24 15:58:27 -04:00
Joey Hess
20bfbb28ac
improved refactoring
ghc 8.0.1 didn't like runner because it used Rank2Types or something.
Instead, factor out the feeder action.
2016-05-23 18:47:30 -04:00
Joey Hess
0d0a796d63
plumb RemoteGitConfig through to encryptCipher 2016-05-23 17:48:38 -04:00
Joey Hess
b9ce477fa2
plumb RemoteGitConfig through to decryptCipher 2016-05-23 17:33:32 -04:00
Joey Hess
22c174158c
plumb RemoteGitConfig through to setRemoteCredPair 2016-05-23 17:08:43 -04:00
Joey Hess
91df4c6b53
Pass the various gnupg-options configs to gpg in several cases where they were not before.
Removed the instance LensGpgEncParams RemoteConfig because it encouraged
code that does not take the RemoteGitConfig into account.

RemoteType's setup was changed to take a RemoteGitConfig,
although the only place that is able to provide a non-empty one is
enableremote, when it's changing an existing remote. This led to several
folow-on changes, and got RemoteGitConfig plumbed through.
2016-05-23 17:03:20 -04:00
ilovezfs
fe944a96d3
git-annex: GHC compatibility 2016-05-23 11:02:34 -04:00
Joey Hess
7cacd7888b
Change git annex info remote encryption description to use wording closer to what's used in initremote. 2016-05-11 16:09:39 -04:00
Joey Hess
e219289c83
Added new encryption=sharedpubkey mode for special remotes.
This is useful for makking a special remote that anyone with a clone of the
repo and your public keys can upload files to, but only you can decrypt the
files stored in it.
2016-05-10 16:50:31 -04:00
Joey Hess
3f1aaa84c5
Added annex.gnupg-decrypt-options and remote.<name>.annex-gnupg-decrypt-options, which are passed to gpg when it's decrypting data.
The naming is unofrtunately not consistent, but the gnupg-options
were only used for encrypting, and it's too late to change that.

It would be nice to have a third setting that is always passed to gnupg,
but ~/.gnupg/options can be used to specify such global options when really
needed.
2016-05-10 13:03:56 -04:00
Joey Hess
6659c7ec0e
Propigate GIT_DIR and GIT_WORK_TREE environment to external special remotes.
Since git-annex unsets these when started, they have to be explicitly
propigated. Also, this makes --git-dir and --work-tree settings be
reflected in the environment.

The need for this came up in
https://github.com/DanielDent/git-annex-remote-rclone/issues/3
2016-05-06 12:26:44 -04:00
Joey Hess
dce4b1a189
improve info display of OtherStorageClass 2016-05-05 11:54:59 -04:00
Joey Hess
3b7713b493
use DIRHASH-LOWER for consistency 2016-05-03 14:10:11 -04:00
Joey Hess
4b9ddb9429
Added DIRHASH_LOWER to external special remote protocol. 2016-05-03 13:36:59 -04:00
Joey Hess
bfb4095c13
Improve behavior when a just added http remote is not available during uuid probe. Do not mark it as annex-ignore, so it will be tried again later. 2016-05-03 12:53:42 -04:00
Joey Hess
b890f3a53d
Fix bug that prevented resuming of uploads to encrypted special remotes that used chunking. This bug could also expose the names of keys to such remotes.
This is a low-severity security hole.
2016-04-27 12:54:43 -04:00
Joey Hess
850d0da699
Fix duplicate progress meter display when downloading from a git remote over http with -J. 2016-04-19 13:10:56 -04:00
Joey Hess
2d7e46ea98
fix drop hang reported by musicmatze
Fix hang when dropping content needs to lock the content on a ssh remote,
which occurred when the remote has git-annex version 5.20151019 or newer.

Analysis: `race` runs 2 threads at once, and the hGetLine finishes first.
So, it tries to cancel the waitForProcess, but unfortunately that is making
a foreign call and so cannot be canceled. The remote git-annex-shell
is waiting for a line on stdin before it will exit. Deadlock.

This only occurred sometimes; I reproduced it going from darkstar to
elephant, but not from darkstar to darkstar. Not sure how that fits into
the above analysis -- perhaps a race condition is also involved?

Fixed by not using `race`; now the hGetLine will fail with an exception
if the remote git-annex-shell exits without any output.
2016-04-18 14:04:50 -04:00
Gabor Greif
7f6da40c78
simplify code to make it compilable with ghc v7.11.20150407 2016-04-12 15:26:40 -04:00
Joey Hess
cf06dac2b8
hard links on windows
* annex.thin and annex.hardlink are now supported on Windows.
* unannex --fast now makes hard links on Windows.
2016-04-08 15:25:32 -04:00
Robie Basak
7948110134
ddar remote: fix ssh calls
sshOptions is now designed for working out ssh options only, and may
insert the extra options it is given to the middle. So it is incorrect
to call it with the remote parameters at the end. Instead, append them
to its return value.

This half regressed in 5be7ba7, and presumably regressed fully when
sshOptions was changed some time later.
2016-03-23 11:42:26 -04:00
Joey Hess
5d05aad74c
S3: Allow configuring with requeststyle=path to use path-style bucket access instead of the default DNS-style access.
untested
2016-02-09 15:36:36 -04:00
Joey Hess
c40d14a37d
WebDAV: Remove a bogus trailing slash from the end of the url to the temporary store location for a key. Thanks, wzhd.
That trailing slash is needed for legacy chunked mode, because it puts the
chunks in a subdir under the key. But, outside legacy chunked mode, it's BS
and it's amazing it worked at all with some webdav servers.
2016-02-09 11:50:40 -04:00
Joey Hess
850a645233
WebDAV: Set depth 1 in PROPFIND request, for better compatability with some servers. Thanks, wzhd. 2016-02-09 11:47:35 -04:00
Joey Hess
f051b51645
remove 3 build flags
* Removed the webapp-secure build flag, rolling it into the webapp build
  flag.
* Removed the quvi and tahoe build flags, which only adds aeson to
  the core dependencies.
* Removed the feed build flag, which only adds feed to the core
  dependencies.

Build flags have cost in both code complexity and also make Setup configure
have to work harder to find a usable set of build flags when some
dependencies are missing.
2016-01-26 08:14:57 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00