Files are now written to a tmp directory in the remote, and once all
chunks are written, etc, it's moved into the final place atomically.
For now, checkpresent still checks every single chunk of a file, because
the old method could leave partially transferred files with some chunks
present and others not.
Inject the required git-remote-xmpp into PATH when running xmpp git push.
Rest of the time it will not be in PATH, and git won't be able to talk to
xmpp remotes.
I now have this topology working:
assistant ---> {bare repo, special remote} <--- assistant
And, I think, also this one:
+----------- bare repo --------+
v v
assistant ---> special remote <--- assistant
While before with assistant <---> assistant connections, both sides got
location info updated after a transfer, in this topology, the bare repo
*might* get its location info updated, but the other assistant has no way to
know that it did. And a special remote doesn't record location info,
so transfers to it won't propigate out location log changes at all.
So, for these to work, after a transfer succeeds, the git-annex branch
needs to be pushed. This is done by recording a synthetic commit has
occurred, which lets the pusher handle pushing out the change (which will
include actually committing any still journalled changes to the git-annex
branch).
Of course, this means rather a lot more syncing action than happened
before. At least the pusher bundles together very close together pushes,
somewhat. Currently it just waits 2 seconds between each push.
I had been using -ignore-package monads-tf to deal with this, but
the XMPP library uses monads-tf, so that also ignores it. Instead,
use PackageImports to force use of mtl in my own code.
This was complicated quite a bit by needing to check numcopies. I optimised
that, so it only looks up numcopies once per file, no matter how many
remotes it checks to drop from. Although it did just occur to me that
it might be better to first check if it wants to drop content, and only
then check numcopies..
When rsyncProgress pipes rsync's stdout, this turns out to cause a ssh
process started by rsync to be left behind as a zombie. I don't know why,
but my recent zombie reaping cleanup was correct, it's just that this other
zombie, that's not directly started by git-annex, was no longer reaped
due to changes in the cleanup. Make rsyncProgress reap the zombie started
by rsync, as a workaround.
FWIW, the process tree looks like this. It seems like the rsync child
is for some reason starting but not waiting on this extra ssh process.
Ssh connection caching may be involved -- disabling it seemed to change
the shape of the tree, but did not eliminate the zombie.
9378 pts/14 S+ 0:00 | \_ rsync -p --progress --inplace -4 -e 'ssh' '-S' ...
9379 pts/14 S+ 0:00 | | \_ ssh ...
9380 pts/14 S+ 0:00 | | \_ rsync -p --progress --inplace -4 -e 'ssh' '-S' ...
9381 pts/14 Z+ 0:00 | \_ [ssh] <defunct>
If the autostart file lists a repository, for which a directory exists,
but there's not actually a valid git repo in there, the web app used to
try to use it, and see it wasn't valid, and then try to autostart again.
The ensuing runaway loop also ate memory, although not as fast as I was led
to belive was happening to someone on IRC yesterday. So that guy may have
had a different problem. But this seems otherwise a reasonable fit for the
circumstances described, if git-annex was started before something that
occurred during desktop login that made the repository available.
Both when queueing downloads, and uploads, consults the preferred content
settings.
I didn't make it check yet when requeing failed transfers or queuing
deferred downloads; dealing with the preferred content settings (or indeed,
other settings) changing while the assistant is running still needs work.
I'm down to 9 places in the code that can produce unwaited for zombies.
Most of these are pretty innocuous, at least for now, are only
used in short-running commands, or commands that run a set of
actions and explicitly reap zombies after each one.
The one from Annex.Branch.files could be trouble later,
since both Command.Fsck and Command.Unused can trigger it,
and the assistant will be doing those eventally. Ditto the one in
Git.LsTree.lsTree, which Command.Unused uses.
The only ones currently affecting the assistant though, are
in Git.LsFiles. Several threads use several of those.
(And yeah, using pipes or ResourceT would be a less ad-hoc approach,
but I don't really feel like ripping my entire code base apart right
now to change a foundation monad. Maybe one of these days..)
Nearly everything that's reading from git is operating on a small
amount of output and has been switched to use that. Only pipeNullSplit
stuff continues using the lazy version that yields zombies.
This includes a full parser for the boolean expressions in the log,
that compiles them into Matchers. Those matchers are not used yet.
A complication is that matching against an expression should never
crash git-annex with an error. Instead, vicfg checks that the expressions
parse. If a bad expression (or an expression understood by some future
git-annex version) gets into the log, it'll be ignored.
Most of the code in Limit couldn't fail anyway, but I did have to make
limitCopies check its parameter first, and return an error if it's bad,
rather than erroring at runtime.
One note: Deleted lines are not currently parsed as config changes.
That makes sense for trust settings. It may make sense to support deleted
lines as a way to clear group settings.
Incomplete; I need to finish parsing and saving. This will also be used
for editing transfer control expresssions.
Removed the group display from the status output, I didn't really
like that format, and vicfg can be used to see as well as edit rempository
group membership.