Commit graph

270 commits

Author SHA1 Message Date
Joey Hess
f5edb16729
Display progress meter when uploading a key without size information
Getting the size by statting the content file.

This commit was supported by the NSF-funded DataLad project.
2017-11-14 16:40:49 -04:00
Joey Hess
129418615b
refactor 2017-09-20 16:22:32 -04:00
Joey Hess
2e69efea8d
git annex sync --content to exports
Assistant still todo.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon
2017-09-19 14:20:47 -04:00
Joey Hess
f4be3c3f89
merge changes made on other repos into ExportTree
Now when one repository has exported a tree, another repository can get
files from the export, after syncing.

There's a bug: While the database update works, somehow the database on
disk does not get updated, and so the database update is run the next
time, etc. Wasn't able to figure out why yet.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-09-18 19:21:41 -04:00
Joey Hess
55809081d0
update for ExportTree
Use ExportTree rather than ExportedLocation for retrieveKeyFile and
checkPresent. When another remote exported the content, ExportTree will
be populated, but ExportedLocation will not be.

It would be possible to implement storeKey to exports as well, but it
risks performing a lot of unncessary work when another repository
already stored the key on the export and the local repository doesn't
know about it.

The only way to avoid that work would be for storeKey to use checkPresentExport
before uploading. But, the other repository could have changed the
exported tree as well, so that can't be trusted, and if it were used in
storeKey, could result in bad information getting into the location log.

This commit was sponsored by Bruno BEAUFILS on Patreon.
2017-09-18 14:45:00 -04:00
Joey Hess
b03d77c211
add ExportTree table to export db
New table needed to look up what filenames are used in the currently
exported tree, for reasons explained in export.mdwn.

Also, added smart constructors for ExportLocation and ExportDirectory to
make sure they contain filepaths with the right direction slashes.

And some code refactoring.

This commit was sponsored by Francois Marier on Patreon.
2017-09-18 13:59:59 -04:00
Joey Hess
4a45f34fe1
don't support removing content from export with removeKey
There does not seem to be a use case for supporting that, and it would
need a lot of complication to support it in a way that allows eventual
consistency when two repositories are updating the same export.

This commit was sponsored by Henrik Riomar on Patreon.
2017-09-17 17:56:33 -04:00
Joey Hess
e1f5c90c92
split out Types.Export 2017-09-15 16:46:03 -04:00
Joey Hess
e54a05612e
avoid unncessary db queries when exported directory can't be empty
In rename foo/bar to foo/baz, foo can't be empty.

In delete zxyyz, there's no exported directory (top doesn't count).
2017-09-15 16:30:49 -04:00
Joey Hess
c633144d28
remove empty directories when removing from export
The subtle part of this is what happens when the remote fails to remove
an empty directory. The removal from the export needs to fail in that
case, so the removal will be tried again later. However, removeExportLocation
has already been run and changed the export db, so if the next run
checks getExportLocation, it might decide nothing remains to be done,
leaving the empty directory.

Dealt with that by making removeEmptyDirectories, handle a failure
by calling addExportLocation, reverting the database changes so the next
run will be guaranteed to try deleting the empty directory again.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-09-15 15:22:53 -04:00
Joey Hess
9f4ffe65e9
implement removeExportDirectory
Not yet called by Command.Export.

WebDAV needs this to clean up empty collections. Also, example.sh turned
out to not be cleaning up directories when removing content
from them, so it made sense for it to use this.

Remote.Directory did not need it, and since its cleanup method for empty
directories is more efficient than what Command.Export will need to do
to find empty directories, it uses Nothing so that extra work can be
avoided.

This commit was sponsored by Thom May on Patreon.
2017-09-15 13:18:21 -04:00
Joey Hess
28ba158a24
clear exportSupported for non-export remotes
Non-export remotes were being treated as untrusted, so the test suite
failed, and probably other things broke.
2017-09-13 12:05:53 -04:00
Joey Hess
9c3622882b
export: cache connections for S3 and webdav 2017-09-12 16:59:04 -04:00
Joey Hess
afdff226fb
don't show key urls in whereis for S3 with public=yes and exporttree=yes 2017-09-08 16:44:00 -04:00
Joey Hess
a1b195d84c
External special remote protocol extended to support export.
Also updated example.sh to support export.

This commit was supported by the NSF-funded DataLad project.
2017-09-08 14:24:05 -04:00
Joey Hess
16eb2f976c
prevent exporttree=yes on remotes that don't support exports
Don't allow "exporttree=yes" to be set when the special remote
does not support exports. That would be confusing since the user would
set up a special remote for exports, but `git annex export` to it would
later fail.

This commit was supported by the NSF-funded DataLad project.
2017-09-07 13:48:44 -04:00
Joey Hess
a1cc9ec0fd
add export infication to git-annex info 2017-09-04 17:01:38 -04:00
Joey Hess
662f2a5ee7
git annex get from exports
Straightforward enough, except for the needed belt-and-suspenders sanity
checks to avoid foot shooting due to exports not being key/value stores.

* Even when annex.verify=false, always verify from exports.
* Only get files from exports that use a backend that supports
  checksum verification.
* Never trust exports, even if the user says to, because then
  `git annex drop` would drop content if the export seemed to contain
  a copy.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 16:39:56 -04:00
Joey Hess
28e2cad849
implement exporttree=yes configuration
* Only export to remotes that were initialized to support it.
* Prevent storing key/value on export remotes.
* Prevent enabling exporttree=yes and encryption in the same remote.

SetupStage Enable was changed to take the old RemoteConfig.
This allowed only setting exporttree when initially setting up a
remote, and not configuring it later after stuff might already be stored
in the remote.

Went with =yes rather than =true for consistency with other parts of
git-annex. Changed docs accordingly.

This commit was supported by the NSF-funded DataLad project.
2017-09-04 13:09:38 -04:00
Joey Hess
a4328b49d2
refactor ExportActions
This will allow disabling exports for remotes that are not configured to
allow them. Also, exportSupported will be useful for the external
special remote to probe.

This commit was supported by the NSF-funded DataLad project
2017-09-01 13:05:09 -04:00
Joey Hess
df11e54788
avoid the dashed ssh hostname class of security holes
Security fix: Disallow hostname starting with a dash, which would get
passed to ssh and be treated an option. This could be used by an attacker
who provides a crafted ssh url (for eg a git remote) to execute arbitrary
code via ssh -oProxyCommand.

No CVE has yet been assigned for this hole.
The same class of security hole recently affected git itself,
CVE-2017-1000117.

Method: Identified all places where ssh is run, by git grep '"ssh"'
Converted them all to use a SshHost, if they did not already, for
specifying the hostname.

SshHost was made a data type with a smart constructor, which rejects
hostnames starting with '-'.

Note that git-annex already contains extensive use of Utility.SafeCommand,
which fixes a similar class of problem where a filename starting with a
dash gets passed to a program which treats it as an option.

This commit was sponsored by Jochen Bartl on Patreon.
2017-08-17 22:11:31 -04:00
Joey Hess
a1730cd6af
adeiu, MissingH
Removed dependency on MissingH, instead depending on the split
library.

After laying groundwork for this since 2015, it
was mostly straightforward. Added Utility.Tuple and
Utility.Split. Eyeballed System.Path.WildMatch while implementing
the same thing.

Since MissingH's progress meter display was being used, I re-implemented
my own. Bonus: Now progress is displayed for transfers of files of
unknown size.

This commit was sponsored by Shane-o on Patreon.
2017-05-16 01:03:52 -04:00
Joey Hess
faecd73f32
Support GIT_SSH and GIT_SSH_COMMAND
They are handled close the same as they are by git. However, unlike git,
git-annex sometimes needs to pass the -n parameter when using these.

So, this has the potential for breaking some setup, and perhaps there ought
to be a ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that
if possible, so let's see if anyone complains.

Almost all places where "ssh" was run have been changed to support the env
vars. Anything still calling sshOptions does not support them. In
particular, rsync special remotes don't. Seems that annex-rsync-transport
already gives sufficient control there.

(Fixed in passing: Remote.Helper.Ssh.toRepo used to extract
remoteAnnexSshOptions and pass them to sshOptions, which was redundant
since sshOptions also extracts those.)

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-03-17 16:20:37 -04:00
Joey Hess
c8e1e3dada
AssociatedFile newtype
To prevent any further mistakes like 301aff34c4

This commit was sponsored by Francois Marier on Patreon.
2017-03-10 13:35:31 -04:00
Joey Hess
f07af03018
Run ssh with -n whenever input is not being piped into it
... to avoid it consuming stdin that it shouldn't.

This fixes git-annex-checkpresentkey --batch remote, which didn't output
results for all keys passed into it.

Other git-annex commands that communicate with a remote over ssh may also
have been consuming stdin that they shouldn't have, which could have
impacted using them in eg, shell scripts. For example, a shell script
reading files from stdin and passing them to git annex drop would be
impacted by this bug, whenever git annex drop ran git-annex-shell
checkpresent, it would consume part/all of the stdin that the shell script
was supposed to consume.

Fixed by adding a ConsumeStdin parameter to Annex.Ssh.sshOptions, which
is used throughout git-annex to run ssh (in order for ssh connection
caching to work). Every call site was checked to see if it used
CreatePipe for stdin, and if not was marked NoConsumeStdin.
2017-02-15 15:08:46 -04:00
Edward Betts
0750913136
correct spelling mistakes 2017-02-12 17:30:23 -04:00
Joey Hess
9eb10caa27
Some optimisations to string splitting code.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.

As well as being an optimisation, this helps move toward eliminating use of
missingh.

(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)

I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2017-01-31 19:06:22 -04:00
Joey Hess
f744bd5391
refactor 2016-12-06 15:43:03 -04:00
Joey Hess
b88e44ea9a
use P2P auth for git-remote-tor-annex
This changes the environment variable name to the more generic
GIT_ANNEX_P2P_AUTHTOKEN.

This commit was sponsored by andrea rota.
2016-11-30 15:26:55 -04:00
Joey Hess
b08799893f
reorg 2016-11-22 14:37:09 -04:00
Joey Hess
af4d919793
unified AuthToken type between webapp and tor 2016-11-22 14:18:34 -04:00
Joey Hess
57a9484fbc
remove debug 2016-11-21 22:11:53 -04:00
Joey Hess
2da338bb8d
detect EOF on socket and cleanly shutdown the service process 2016-11-21 21:45:56 -04:00
Joey Hess
483dbcdbef
stop cleanly when there's a IO error accessing the Handle
All other exceptions are let through, but IO errors accessing the handle
are to be expected, so quietly ignore.
2016-11-21 21:32:51 -04:00
Joey Hess
ae69ebfc7c
try to gather scattered writes
git upload-pack makes some uncessary writes in sequence, this tries to
gather them together to avoid needing to send multiple DATA packets when
just one will do.

In a small pull, this reduces the average number of DATA packets from
4.5 to 2.5.
2016-11-21 20:56:58 -04:00
Joey Hess
9c311fb564
fix parse of CONNECTDONE 2016-11-21 19:33:57 -04:00
Joey Hess
6b992f672c
pull/push over tor working now
Still a couple bugs:

* Closing the connection to the server leaves git upload-pack /
  receive-pack running, which could be used to DOS.

* Sometimes the data is transferred, but it fails at the end, sometimes
  with:

  git-remote-tor-annex: <socket: 10>: commitBuffer: resource vanished (Broken pipe)

  Must be a race condition around shutdown.
2016-11-21 19:24:55 -04:00
Joey Hess
070fb9e624
Added git-remote-tor-annex, which allows git pull and push to the tor hidden service.
Almost working, but there's a bug in the relaying.

Also, made tor hidden service setup pick a random port, to make it harder
to port scan.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2016-11-21 17:27:38 -04:00
Joey Hess
9cf9ee73f5
improve p2p protocol implementation
Tested it in ghci a little now.
2016-11-20 16:42:18 -04:00
Joey Hess
74691ddf0e
remotedaemon: serve tor hidden service 2016-11-20 15:48:12 -04:00
Joey Hess
d50b0f3bb3
implement p2p protocol for Handle
This is most of the way to having the p2p protocol working over tor
hidden services, at least enough to do git push/pull.

The free monad was split into two, one for network operations and the
other for local (Annex) operations. This will allow git-remote-tor-annex
to run only an IO action, not needing the Annex monad.

This commit was sponsored by Remy van Elst on Patreon.
2016-11-20 12:16:32 -04:00
Joey Hess
0eaad7ca3a
extend p2p protocol to support gitremote-helpers connect
A bit tricky since Proto doesn't support threads. Rather than adding
threading support to it, ended up using a callback that waits for both
data on a Handle, and incoming messages at the same time.

This commit was sponsored by Denis Dzyubenko on Patreon.
2016-11-19 22:39:36 -04:00
Joey Hess
73a6b9b514
Add content locking to P2P protocol
Is content locking needed in the P2P protocol? Based on re-reading
bugs/concurrent_drop--from_presence_checking_failures.mdwn,
I think so: Peers can form cycles, and multiple peers can all be trying
to drop the same content.

So, added content locking to the protocol, with some difficulty.

The implementation is fine as far as it goes, but note the warning
comment for lockContentWhile -- if the connection to the peer is dropped
unexpectedly, the peer will then unlock the content, and yet the local
side will still think it's locked.

To be honest I'm not sure if Remote.Git's lockKey for ssh remotes
doesn't have the same problem. It checks that the
"ssh remote git-annex-shell lockcontent"
process has not exited, but if the connection closes afer that check,
the lockcontent command will unlock it, and yet the local side will
still think it's locked.

Probably this needs to be fixed by eg, making lockcontent catch any
execptions due to the connection closing, and in that case, wait a
significantly long time before dropping the lock.

This commit was sponsored by Anthony DeRobertis on Patreon.
2016-11-18 01:32:24 -04:00
Joey Hess
236ff111a7
rename 2016-11-17 22:10:28 -04:00
Joey Hess
b121078b35
refactor 2016-11-17 22:09:07 -04:00
Joey Hess
27c8a4a229
add CHECKPRESENT
Using SUCCESS to mean the content is present and FAILURE to mean it's not.
2016-11-17 21:56:02 -04:00
Joey Hess
cbffb61083
added REMOVE to protocol 2016-11-17 21:48:59 -04:00
Joey Hess
2b33452bd8
add ALREADY-HAVE response to PUT 2016-11-17 21:37:49 -04:00
Joey Hess
47b7028d7c
pass Len to writeKeyFile so it can detect short reads 2016-11-17 21:32:09 -04:00
Joey Hess
505d1df8ab
refactor 2016-11-17 21:04:35 -04:00