Commit graph

120 commits

Author SHA1 Message Date
Joey Hess
093fde5abd
completed the createDirectoryIfMissing conversion
Remaining calls in the assistant and Annex.Ssh have been audited and are ok.
2020-03-06 12:55:03 -04:00
Joey Hess
a490947068
annex.sshcaching warning improvement and allow overridding build time default
* When git-annex is built with a ssh that does not support ssh connection
  caching, default annex.sshcaching to false, but let the user override it.
* Improve warning messages further when ssh connection caching cannot
  be used, to clearly state why.
2020-02-14 14:21:03 -04:00
Joey Hess
5c3636037b
Display a warning when concurrency is enabled but ssh connection caching is not enabled or won't work due to a crippled filesystem
A warning message is unsatisfying. But erroring out is too hard a failure,
especially since it may well work fine if the user has enabled passwordless
ssh.

I did think about falling back to one ssh connection at a time in this
case, but it would have needed a rework of every ssh call, which
seems far overboard for such a niche problem. There's no single place where
git-annex runs ssh, so no one place that it could block a concurrent call
on a semaphore. And, even if it did fall back to one ssh connection at a
time, it seems to me that doing so without warning the user about the
problem just invites bug reports like "git-annex is ignoring my -J2 and
only doing one download at a time". So a warning is needed, and I suppose
is good enough.
2020-01-23 12:35:46 -04:00
Joey Hess
322c542b5c
fix ByteString conversion on windows
the encode' and decode' functions on Windows should not apply the
filesystem encoding, which does not work there. Instead, convert to and
from UTF-8.

Also, avoid exporting encodeW8 and decodeW8. Both use the filesystem
encoding, so won't work as expected on windows.
2019-12-18 13:32:56 -04:00
Joey Hess
82186ca58f
annex.jobs=cpus etc
Added the ability to run one job per CPU (core), by setting annex.jobs=cpus,
or using option --jobs=cpus or -Jcpus.

Built with future expansion in mind, including not defaulting matching on
Concurrency so more constructors can later be added, and using "cpu"
instead of "0".
2019-05-10 13:27:08 -04:00
Joey Hess
40ecf58d4b
update licenses from GPL to AGPL
This does not change the overall license of the git-annex program, which
was already AGPL due to a number of sources files being AGPL already.

Legally speaking, I'm adding a new license under which these files are
now available; I already released their current contents under the GPL
license. Now they're dual licensed GPL and AGPL. However, I intend
for all my future changes to these files to only be released under the
AGPL license, and I won't be tracking the dual licensing status, so I'm
simply changing the license statement to say it's AGPL.

(In some cases, others wrote parts of the code of a file and released it
under the GPL; but in all cases I have contributed a significant portion
of the code in each file and it's that code that is getting the AGPL
license; the GPL license of other contributors allows combining with
AGPL code.)
2019-03-13 15:48:14 -04:00
Joey Hess
7d51b0c109
import Utility.FileSystemEncoding in Common 2019-01-03 11:37:02 -04:00
Joey Hess
b3c69eaaf8
strict bytestring encoders and decoders
Only had lazy ones before.

Already sped up a few parts of the code.
2019-01-01 14:55:15 -04:00
Joey Hess
65bb30bcf5
fix accidental commit 2018-11-20 11:43:33 -04:00
Joey Hess
9c0cece35a
followup 2018-11-19 18:12:03 -04:00
Joey Hess
9127fe4821
add DebugLocks build flag
Using the method described in
https://www.fpcomplete.com/blog/2018/05/pinpointing-deadlocks-in-haskell
but my own code to implement it, and with callstacks added.

This work is supported by the NIH-funded NICEMAN (ReproNim TR&D3) project.
2018-11-19 15:02:43 -04:00
Joey Hess
b94294a43d
remove no longer needed uuid check in prepSocket
Since 3dd43df9c2, the socket warmup does
not run git-annex-shell on the remote host, and the point of this check
was to avoid error messages running git-annex-shell when it was not
installed. So the check is not needed any longer.

Also, this is one of only two uses of remoteGitConfig, which
I want to get rid of for reasons explained in
fc5888300f.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-05 12:51:17 -04:00
Joey Hess
ac6f58d642
fix ssh warmup hang
Fix race condition in ssh warmup that caused git-annex to get stuck and
never process some while when run with high levels of concurrency.

So far, I've isolated the problem to processTranscript, which hangs
reading output from ssh in this situation. I don't yet understand why
processTranscript behaves that way.

Since here we don't care about the ssh output, and only want to /dev/null
it, changed to not use processTranscript, avoiding its problem.

This commit was supported by the NSF-funded DataLad project.
2018-03-15 15:04:15 -04:00
Joey Hess
3dd43df9c2
Better ssh connection warmup when using -J for concurrency.
Avoids ugly messages when forced ssh command is not git-annex-shell.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2018-03-07 17:30:14 -04:00
Joey Hess
25703e1413
finally really add back custom-setup stanza
Fourth or fifth try at this and finally found a way to make it work.

Absurd amount of busy-work forced on me by change in cabal's behavior.
Split up Utility modules that need posix stuff out of ones used by
Setup. Various other hacks around inability for Setup to use anything
that ifdefs a use of unix.

Probably lost a full day of my life to this.
This is how build systems make their users hate them. Just saying.
2017-12-31 16:36:39 -04:00
Joey Hess
308cd1383c
fold Build/SysConfig.hs into BuildInfo via include
This avoids warnings from stack about the module not being listed in the
cabal file. So, the generated file is also renamed to Build/SysConfig.

Note that the setup program seems to be cached despite these changes; I
had to cabal clean to get cabal to update it so that Build/SysConfig was
written.

This commit was sponsored by Jochen Bartl on Patreon.
2017-12-14 12:46:57 -04:00
Joey Hess
fc845e6530
more lambda-case conversion 2017-12-05 15:00:50 -04:00
Joey Hess
df11e54788
avoid the dashed ssh hostname class of security holes
Security fix: Disallow hostname starting with a dash, which would get
passed to ssh and be treated an option. This could be used by an attacker
who provides a crafted ssh url (for eg a git remote) to execute arbitrary
code via ssh -oProxyCommand.

No CVE has yet been assigned for this hole.
The same class of security hole recently affected git itself,
CVE-2017-1000117.

Method: Identified all places where ssh is run, by git grep '"ssh"'
Converted them all to use a SshHost, if they did not already, for
specifying the hostname.

SshHost was made a data type with a smart constructor, which rejects
hostnames starting with '-'.

Note that git-annex already contains extensive use of Utility.SafeCommand,
which fixes a similar class of problem where a filename starting with a
dash gets passed to a program which treats it as an option.

This commit was sponsored by Jochen Bartl on Patreon.
2017-08-17 22:11:31 -04:00
Joey Hess
e23839acf3
Avoid error about git-annex-shell not being found when syncing with -J with a git remote where git-annex-shell is not installed.
This commit was sponsored by andrea rota.
2017-06-06 12:57:27 -04:00
Joey Hess
1d45e47e3f
clear regions before ssh prompt
When built with concurrent-output 1.9, ssh password prompts will no longer
interfere with the -J display.

To avoid flicker, only done when ssh actually does need to prompt;
ssh is first run in batch mode and if that succeeds the connection is up
and no need to clear regions.

This commit was supported by the NSF-funded DataLad project.
2017-05-16 15:50:11 -04:00
Joey Hess
6dd806f1ad
stop using MissingH for MD5
Cryptonite is faster and allocates less, and I want to get rid of
MissingH use.

Note that the new dependency on memory is free; it's a dependency of
cryptonite.

This commit was supported by the NSF-funded DataLad project.
2017-05-15 21:36:03 -04:00
Joey Hess
2c6cfbe503
also serialize ssh password prompting when json or quiet output is enable 2017-05-13 13:13:13 -04:00
Joey Hess
3f4b671486
fix sshCleanup race using STM 2017-05-11 18:29:51 -04:00
Joey Hess
6992fe133b
Ssh password prompting improved when using -J
When ssh connection caching is enabled (and when GIT_ANNEX_USE_GIT_SSH is
not set), only one ssh password prompt will be made per host, and only one
ssh password prompt will be made at a time.

This also fixes a race in prepSocket's stale ssh connection stopping
when run with -J. It was possible for one thread to start a cached ssh
connection, and another thread to immediately stop it, resulting in excess
connections being made.

This commit was supported by the NSF-funded DataLad project.
2017-05-11 17:36:03 -04:00
Joey Hess
a6416ba232
improve comment 2017-05-11 14:37:24 -04:00
Joey Hess
b6f26bac86
Disable git-annex's support for GIT_SSH and GIT_SSH_COMMAND, unless GIT_ANNEX_USE_GIT_SSH=1 is also set in the environment.
This is necessary because as feared, the extra -n parameter that git-annex
passes breaks uses of these environment variables that expect exactly the
parameters that git passes.

For example, see https://github.com/datalad/datalad/issues/1456

It would of course be possible to pre-close stdin before running ssh so not
needing the -n, and I think that would not even break ssh's password
caching. But it would probably involve a lot of work, possibly would need
to deal with some layering violations, and would be error-prone. The really
clean fix would be to make all the ssh stuff return a CreateProcess, which
could have the handle closed when appropriate, but that would be a large
reworing of the code base.

This commit was supported by the NSF-funded DataLad project.
2017-04-07 11:35:27 -04:00
Joey Hess
6af15d0ec9
rest of fix for GIT_SSH_COMMAND -n parameter
c8a6be7eef was incomplete
2017-03-20 23:35:29 -04:00
Joey Hess
faecd73f32
Support GIT_SSH and GIT_SSH_COMMAND
They are handled close the same as they are by git. However, unlike git,
git-annex sometimes needs to pass the -n parameter when using these.

So, this has the potential for breaking some setup, and perhaps there ought
to be a ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that
if possible, so let's see if anyone complains.

Almost all places where "ssh" was run have been changed to support the env
vars. Anything still calling sshOptions does not support them. In
particular, rsync special remotes don't. Seems that annex-rsync-transport
already gives sufficient control there.

(Fixed in passing: Remote.Helper.Ssh.toRepo used to extract
remoteAnnexSshOptions and pass them to sshOptions, which was redundant
since sshOptions also extracts those.)

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-03-17 16:20:37 -04:00
Joey Hess
35915a30d5
mention GIT_SSH_COMMAND 2017-02-20 12:58:08 -04:00
Joey Hess
f07af03018
Run ssh with -n whenever input is not being piped into it
... to avoid it consuming stdin that it shouldn't.

This fixes git-annex-checkpresentkey --batch remote, which didn't output
results for all keys passed into it.

Other git-annex commands that communicate with a remote over ssh may also
have been consuming stdin that they shouldn't have, which could have
impacted using them in eg, shell scripts. For example, a shell script
reading files from stdin and passing them to git annex drop would be
impacted by this bug, whenever git annex drop ran git-annex-shell
checkpresent, it would consume part/all of the stdin that the shell script
was supposed to consume.

Fixed by adding a ConsumeStdin parameter to Annex.Ssh.sshOptions, which
is used throughout git-annex to run ssh (in order for ssh connection
caching to work). Every call site was checked to see if it used
CreatePipe for stdin, and if not was marked NoConsumeStdin.
2017-02-15 15:08:46 -04:00
Joey Hess
8484c0c197
Always use filesystem encoding for all file and handle reads and writes.
This is a big scary change. I have convinced myself it should be safe. I
hope!
2016-12-24 14:46:31 -04:00
Joey Hess
48d9624a2d
Revert ServerAliveInterval
Revert ServerAliveInterval change in 6.20161111, which caused problems
with too many old versions of ssh and unusual ssh configurations.

It should have not been needed anyway since ssh is supposted to
have TCPKeepAlive enabled by default.
2016-12-13 12:12:38 -04:00
Joey Hess
7ed96a2405
Make .git/annex/ssh.config file work with versions of ssh older than 7.3, which don't support Include.
When used with an older version of ssh, any ServerAliveInterval in
~/.ssh/config will be overridden by .git/annex/ssh.config.

This commit was sponsored by Josh Taylor on Patreon.
2016-11-07 10:32:57 -04:00
Joey Hess
0ae08947ac
Run ssh with ServerAliveInterval 60
So that stalled transfers will be noticed within about 3 minutes,
even if TCPKeepAlive is disabled or doesn't work.

Rather than setting with -o, use -F with another config file,
so that any settings in ~/.ssh/config or /etc/ssh/ssh_config overrides this.
2016-10-26 16:41:34 -04:00
Joey Hess
1a8ba7eab4
Improve ssh socket cleanup code to skip over the cruft that NFS sometimes puts in a directory when a file is being deleted. 2016-10-26 13:16:41 -04:00
Joey Hess
b56218f0c2
Fix bug that prevented annex.sshcaching=false configuration from taking effect when on a crippled filesystem. Thanks, divergentdave. 2016-04-20 14:43:43 -04:00
Joey Hess
be2e9427ad
refactor 2016-02-25 13:46:31 -04:00
Joey Hess
737e45156e
remove 163 lines of code without changing anything except imports 2016-01-20 16:36:33 -04:00
Joey Hess
aaf1ef268d
convert from Utility.LockPool to Annex.LockPool everywhere 2015-11-12 18:13:37 -04:00
Joey Hess
7c741302cc
assistant: Pass ssh-options through 3 more git pull/push calls that were missed before.
It was used for regular pull, but not for regular push, tagged push, or the
fallback fetching.
2015-11-10 16:52:30 -04:00
Joey Hess
f7d7995172 clean 2015-08-04 17:07:45 -04:00
Joey Hess
3c971c414e sshopts is never going to be null; the concat of it may be 2015-08-04 16:53:38 -04:00
Joey Hess
a6374b7a3d typo 2015-08-04 15:44:46 -04:00
Joey Hess
f041a65c33 Windows: Fix bug that caused git-annex sync to fail due to missing environment variable.
I think that the problem was caused by windows not having a concept of an
env var that is set, but to the empty string. So, GIT_ANNEX_SSHOPTION
got set to "" and was not seen as set at all.

Easy fix, which also makes git-annex sync a little faster is to not set
GIT_SSH, when GIT_ANNEX_SSHOPTION has no options. Might as well let git use
ssh per usual in this case, no need to run git-annex as the proxy ssh
command..
2015-08-04 15:27:48 -04:00
Joey Hess
eb33569f9d remove Params constructor from Utility.SafeCommand
This removes a bit of complexity, and should make things faster
(avoids tokenizing Params string), and probably involve less garbage
collection.

In a few places, it was useful to use Params to avoid needing a list,
but that is easily avoided.

Problems noticed while doing this conversion:

	* Some uses of Params "oneword" which was entirely unnecessary
	  overhead.
	* A few places that built up a list of parameters with ++
	  and then used Params to split it!

Test suite passes.
2015-06-01 13:52:23 -04:00
Joey Hess
a6d54e49a0 sync, remotedaemon: Pass configured ssh-options even when annex.sshcaching is disabled. 2015-05-30 22:01:52 -04:00
Joey Hess
ecb0d5c087 use lock pools throughout git-annex
The one exception is in Utility.Daemon. As long as a process only
daemonizes once, which seems reasonable, and as long as it avoids calling
checkDaemon once it's already running as a daemon, the fcntl locking
gotchas won't be a problem there.

Annex.LockFile has it's own separate lock pool layer, which has been
renamed to LockCache. This is a persistent cache of locks that persist
until closed.

This is not quite done; lockContent stil needs to be converted.
2015-05-19 14:09:52 -04:00
Joey Hess
450ee53ab6 When re-execing git-annex, use current program location, rather than ~/.config/git-annex/program, when possible.
Most of the time, there will be no discreprancy between programPath and
readProgramFile.

But, the programFile might have been written by an old version of git-annex
that is still installed, while a newer one is currently running. In this
case, we want to run the same one that's currently running.

This is especially important for things like the GIT_SSH=git-annex used for
ssh connection caching.

The only code that still uses readProgramFile directly is the upgrade code,
which needs to know where the standalone git-annex was installed, in order to
upgrade it.
2015-02-28 17:23:13 -04:00
Joey Hess
15107d2c5a propigate ssh-options everywhere ssh caching is used
* sync: Use the ssh-options git config when doing git pull and push.
* remotedaemon: Use the ssh-options git config.

Note that the rename env var means that if a new git-annex calls an old one
for git-annex ssh, or a new calls an old, nothing much will go wrong;
just ssh caching won't happen.
2015-02-12 16:14:53 -04:00
Joey Hess
5be7ba7ee5 The ssh-options git config is now used by gcrypt, rsync, and ddar special remotes that use ssh as a transport. 2015-02-12 15:44:10 -04:00