parentDir is less safe than takeDirectory, especially when working
with relative FilePaths. It's really only useful in loops that
want to terminate at /
This commit was sponsored by Audric SCHILTKNECHT.
This fixes all instances of " \t" in the code base. Most common case
seems to be after a "where" line; probably vim copied the two space layout
of that line.
Done as a background task while listening to episode 2 of the Type Theory
podcast.
Also fixes a test suite failures introduced in recent commits, where
inAnnexSafe failed in indirect mode, since it tried to open the lock file
ReadWrite. This is why the new checkLocked opens it ReadOnly.
This commit was sponsored by Chad Horohoe.
Added a convenience Utility.LockFile that is not a windows/posix
portability shim, but still manages to cut down on the boilerplate around
locking.
This commit was sponsored by Johan Herland.
(With the exception of daemon pid locking.)
This fixes at part of #758630. I reproduced the assistant locking eg, a
removable drive's annex journal lock file and forking a long-running
git-cat-file process that inherited that lock.
This did not affect Windows.
Considered doing a portable Utility.LockFile layer, but git-annex uses
posix locks in several special ways that have no direct Windows equivilant,
and it seems like it would mostly be a complication.
This commit was sponsored by Protonet.
To use, set GIT_ANNEX_SSHASKPASS to point to a fifo or regular file
(FIFO is better, avoids touching disk or multiple readers) that contains
the password. Then set SSH_ASKPASS=git-annex, and when ssh runs it, it will
tell ssh the password.
This is not yet used..
For sync, saves 1 ssh connection per remote. For remotedaemon, the same
ssh connection that is already open to run git-annex-shell notifychanges
is reused to pull from the remote.
Only potential problem is that this also enables connection caching
when the assistant syncs with a ssh remote. Including the sync it does
when a network connection has just come up. In that case, cached ssh
connections are likely to be stale, and so using them would hang.
Until I'm sure such problems have been dealt with, this commit needs to
stay on the remotecontrol branch, and not be merged to master.
This commit was sponsored by Alexandre Dupas.
Old ssh did not check the hostname passed to -O stop, so I had used "any".
But now ssh does check it! I think this happened as part of the client-side
hostname canonicalization changes in 6.5p1, but have not verified that
introduced the problem.
The symptom was that it would try to dns lookup "any", which often caused a
bit of a delay at shutdown. And the old ssh connection kept running, so
it would do it over and over again.
Fixed by using localhost, which hopefully reliably resolves to some address
that ssh will accept.. Also nukeFile the socket after ssh has been asked to
shutdown, just in case.
This guarantees that stopping an existing socket never fails.
This might be the route out of the mess of needing to worry about socket
lengths in general. However, it would need quite a lot of refactoring
to make every place in git-annex that runs ssh run it with a cwd that was
determined by the location of its connection caching socket. If this
wasn't already such a mess, I'd consider even the thought of that API a bad
idea..
The control socket path passed to ssh needs to be 17 characters shorter
than the maximum unix domain socket length, because ssh appends stuff to it
to make a temporary filename. Closes: #725512
Also, take the shorter of the relative and the absolute paths to the
socket. Typically the relative path will be a lot shorter (unless
deep inside a subdirectory of the repository), and so using it will
avoid flirting with the maximum safe socket lenghts in more situations,
and so lead to less breakage if all my attempts at fixing this are
still buggy.
This is ok to do now that the socket filename never needs to be mapped back
to a hostname.
Short hostnames will still appear in the clear, which is less obfuscated.
So this cannot possibly make ssh connection caching fail for a hostname it
used to work for.
Turns out that with -O stop -S socketfile, ssh does not need the real
hostname, or port to be specificed. This is because it simply talks to the
ssh behind the socket and tells it to stop. So, can eliminate the
conversion back from a socketfile to host and port. Which will allow using
shorter filenames for sockets in the future.
Yeah, that didn't actually work. Got error messages like it couldn't read
from the control socket, so probably ssh doesn't really support that on
Windows, at least the cygwin ssh build I'm using.
That's needed in files used to build the configure program.
For the other files, I'm keeping my __WINDOWS__ define, as I find that much easier to type.
I may search and replace it to use the mingw32_HOST_OS thing later.
Introduced a new per-remote option 'annex-rsync-transport' to specify
the remote shell that it to be used with rsync. In case the value is
'ssh', connections are cached unless 'sshcaching' is unset.
Now there's a Config type, that's extracted from the git config at startup.
Note that laziness means that individual config values are only looked up
and parsed on demand, and so we get implicit memoization for all of them.
So this is not only prettier and more type safe, it optimises several
places that didn't have explicit memoization before. As well as getting rid
of the ugly explicit memoization code.
Not yet done for annex.<remote>.* configuration settings.
The standalone build does not bundle its own ssh, so should be built
to support as wide an array of ssh versions as possible, so turn off
connection caching.
Unfortunatly, as implemented this forces a full rebuild when building the
standalone binary, and of course it makes it somewhat slower.
This is not ideal, but neither is probing the ssh version every time it's
run (slow), or once when initializing a repo (fragile).
Baked into the code was an assumption that a repository's git directory
could be determined by adding ".git" to its work tree (or nothing for bare
repos). That fails when core.worktree, or GIT_DIR and GIT_WORK_TREE are
used to separate the two.
This was attacked at the type level, by storing the gitdir and worktree
separately, so Nothing for the worktree means a bare repo.
A complication arose because we don't learn where a repository is bare
until its configuration is read. So another Location type handles
repositories that have not had their config read yet. I am not entirely
happy with this being a Location type, rather than representing them
entirely separate from the Git type. The new code is not worse than the
old, but better types could enforce more safety.
Added support for core.worktree. Overriding it with -c isn't supported
because it's not really clear what to do if a git repo's config is read, is
not bare, and is then overridden to bare. What is the right git directory
in this case? I will worry about this if/when someone has a use case for
overriding core.worktree with -c. (See Git.Config.updateLocation)
Also removed and renamed some functions like gitDir and workTree that
misused git's terminology.
One minor regression is known: git annex add in a bare repository does not
print a nice error message, but runs git ls-files in a way that fails
earlier with a less nice error message. This is because before --work-tree
was always passed to git commands, even in a bare repo, while now it's not.
annex.ssh-options, annex.rsync-options, annex.bup-split-options.
And adjust types to avoid the bugs that broke several config settings
recently. Now "annex." prefixing is enforced at the type level.