This guarantees that stopping an existing socket never fails.
This might be the route out of the mess of needing to worry about socket
lengths in general. However, it would need quite a lot of refactoring
to make every place in git-annex that runs ssh run it with a cwd that was
determined by the location of its connection caching socket. If this
wasn't already such a mess, I'd consider even the thought of that API a bad
idea..
The control socket path passed to ssh needs to be 17 characters shorter
than the maximum unix domain socket length, because ssh appends stuff to it
to make a temporary filename. Closes: #725512
Also, take the shorter of the relative and the absolute paths to the
socket. Typically the relative path will be a lot shorter (unless
deep inside a subdirectory of the repository), and so using it will
avoid flirting with the maximum safe socket lenghts in more situations,
and so lead to less breakage if all my attempts at fixing this are
still buggy.
My implementation does not guard against double locking of the journal. But
it does ensure that the journal is always locked when operated on, by using
a type that is only produced by lockJournal, and which is required as a
parameter of all functions that operate on the journal.
Note that I had to add the fooStale functions for cases where it does not
make sense to lock the journal when querying it. I was more concerned about
ensuring that anything that modifies the journal is locked.
setJournalFile's implementation ensures that any query of the journal will
get one value or the other atomically, even if the journal is being changed
at the time.
This may not strictly be needed -- the transition code bypasses the
journal. However, this ensures that the git-annex branch is only
committed with the journal locked. This will allow for further
improvements.
Overridable with --user-agent option.
Not yet done for S3 or WebDAV due to limitations of libraries used --
nether allows a user-agent header to be specified.
This commit sponsored by Michael Zehrer.
Since 006cf7976f was incomplete, not being
able to get the right mode of the file when the index differs from HEAD,
this is a final workaround. Only buffering the start of the file
in this case avoids leaking memory.
This does not prevent git-cat-file being asked to output the whole file,
which needs to be consumed, and can be slow. But this only happens in a
rare edge case.
Done using a mode witness, which ensures it's fixed everywhere.
Fixing catFileKey was a bear, because git cat-file does not provide a
nice way to query for the mode of a file and there is no other efficient
way to do it. Oh, for libgit2..
Note that I am looking at tree objects from HEAD, rather than the index.
Because I cat-file cannot show a tree object for the index.
So this fix is technically incomplete. The only cases where it matters
are:
1. A new large file has been directly staged in git, but not committed.
2. A file that was committed to HEAD as a symlink has been staged
directly in the index.
This could be fixed a lot better using libgit2.
The second commit had some bad refs which resulted in the race detection
code running. But that commit was unnecessary anyway, it only was there to
merge in the other refs.
Wrote nice pure transition calculator, and ugly code to stage its results
into the git-annex branch. Also had to split up several Log modules
that Annex.Branch needed to use, but that themselves used Annex.Branch.
The transition calculator is limited to looking at and changing one file at
a time. While this made the implementation relatively easy, it precludes
transitions that do stuff like deleting old url log files for keys that are
being removed because they are no longer present anywhere.
Works, more or less. --dead is not implemented, and so far a new branch
is made, but keys no longer present anywhere are not scrubbed.
git annex sync fails to push the synced/git-annex branch after a forget,
because it's not a fast-forward of the existing synced branch. Could be
fixed by making git-annex sync use assistant-style sync branches.
When quvi is installed, git-annex addurl automatically uses it to detect
when an page is a video, and downloads the video file.
web special remote: Also support using quvi, for getting files,
or checking if files exist in the web.
This commit was sponsored by Mark Hepburn. Thanks!
Requires git 1.8.4 or newer. When it's installed, a background
git check-ignore process is run, and used to efficiently check ignores
whenever a new file is added.
Thanks to Adam Spiers, for getting the necessary support into git for this.
A complication is what to do about files that are gitignored but have
been checked into git anyway. git commands assume the ignore has been
overridden in this case, and not need any more overriding to commit a
changed version.
However, for the assistant to do the same, it would have to run git ls-files
to check if the ignored file is in git. This is somewhat expensive. Or it
could use the running git-cat-file process to query the file that way,
but that requires transferring the whole file content over a pipe, so it
can be quite expensive too, for files that are not git-annex
symlinks.
Now imagine if the user knows that a file or directory tree will be getting
frequent changes, and doesn't want the assistant to sync it, so gitignores
it. The assistant could overload the system with repeated ls-files checks!
So, I've decided that the assistant will not automatically commit changes
to files that are gitignored. This is a tradeoff. Hopefully it won't be a
problem to adjust .gitignore settings to not ignore files you want the
assistant to autocommit, or to manually git annex add files that are listed
in .gitignore.
(This could be revisited if git-annex gets access to an interface to check
the content of the index w/o forking a git command. This could be libgit2,
or perhaps a separate git cat-file --batch-check process, so it wouldn't
need to ship over the whole file content.)
This commit was sponsored by Francois Marier. Thanks!
Started with a problem when running addurl on a really long url,
because the whole url is munged into the filename. Ended up doing
a fairly extensive review for places where filenames could get too large,
although it's hard to say I'm not missed any..
Backend.Url had a 128 character limit, which is fine when the limit is 255,
but not if it's a lot shorter on some systems. So check the pathconf()
limit. Note that this could result in fromUrl creating different keys
for the same url, if run on systems with different limits. I don't see
this is likely to cause any problems. That can already happen when using
addurl --fast, or if the content of an url changes.
Both Command.AddUrl and Backend.Url assumed that urls don't contain a
lot of multi-byte unicode, and would fail to truncate an url that did
properly.
A few places use a filename as the template to make a temp file.
While that's nice in that the temp file name can be easily related back to
the original filename, it could lead to `git annex add` failing to add a
filename that was at or close to the maximum length.
Note that in Command.Add.lockdown, the template is still derived from the
filename, just with enough space left to turn it into a temp file.
This is an important optimisation, because the assistant may lock down
a bunch of files all at once, and using the same template for all of them
would cause openTempFile to iterate through the same set of names,
looking for an unused temp file. I'm not very happy with the relatedTemplate
hack, but it avoids that slowdown.
Backend.WORM does not limit the filename stored in the key.
I have not tried to change that; so git annex add will fail on really long
filenames when using the WORM backend. It seems better to preserve the
invariant that a WORM key always contains the complete filename, since
the filename is the only unique material in the key, other than mtime and
size. Since nobody has complained about add failing (I think I saw it
once?) on WORM, probably it's ok, or nobody but me uses it.
There may be compatability problems if using git annex addurl --fast
or the WORM backend on a system with the 255 limit and then trying to use
that repo in a system with a smaller limit. I have not tried to deal with
those.
This commit was sponsored by Alexander Brem. Thanks!
This is ok to do now that the socket filename never needs to be mapped back
to a hostname.
Short hostnames will still appear in the clear, which is less obfuscated.
So this cannot possibly make ssh connection caching fail for a hostname it
used to work for.
Turns out that with -O stop -S socketfile, ssh does not need the real
hostname, or port to be specificed. This is because it simply talks to the
ssh behind the socket and tells it to stop. So, can eliminate the
conversion back from a socketfile to host and port. Which will allow using
shorter filenames for sockets in the future.
If the file is > 8192 bytes, it's certianly not a symlink file.
And if it contains nuls or newlines or whitespace, it's certianly
not a link to annexed content. But it might be a tarball containing
a git-annex repo.
This hack is only needed on FAT filesystems, so there's no point in doing
it the rest of the time. And it's possible for there to be a false
positive, so it's best to avoid the hack when possible.
Test suite on windows failed running git annex init in a bare clone of an
annexed repo. The annex directory didn't exist when it tried to write the
inode sentinal file.
Yeah, that didn't actually work. Got error messages like it couldn't read
from the control socket, so probably ssh doesn't really support that on
Windows, at least the cygwin ssh build I'm using.