git-annex/doc/design/assistant/blog/day_99_shotgun.mdwn

2012-10-05 21:09:17 +00:00
Fixed the assistant to wait on all the zombie processes that would sometimes
pile up. I didn't realize this was as bad as it was.

Zombies and git-annex have been a problem since I started developing it,
because back then I made some rather poor choices, due to barely knowing
how to write Haskell. So parts of the code that stream input from git commands
don't clean up after them properly. Not normally a problem, because
git-annex reaps the zombies after each file it processes. But this reaping
is not thread-safe; it cannot be used in the assistant.

If I were starting git-annex today, I'd use one of the new Haskell things like
Conduits, that allow for very clean control over finalization of resources.
But switching it to Conduits now would probably take weeks of work; I've not
yet felt it was worthwhile. (Also it's not clear Conduits are the last,
best thing.)

For now, it keeps track of the pids it needs to wait on, and all the code
run by the assistant is zombie-free. However, some code for fsck and unused
that I anticipate the assistant using eventually still has some lurking
zombies.
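
To make the pid-tracking idea above concrete, here is a minimal sketch in Python rather than Haskell. It is not git-annex's code: the `ChildRegistry` class and its `spawn` and `reap` methods are hypothetical names invented for this illustration. The only point it shows is that remembering each child under a lock lets any thread wait on the children later, without a global "reap everything" call racing against other threads.

```python
import subprocess
import threading


class ChildRegistry:
    """Hypothetical helper: remembers spawned children so they can be waited on."""

    def __init__(self):
        self._lock = threading.Lock()
        self._children = []

    def spawn(self, argv):
        # Start the child with a pipe we can stream from, and record it.
        # The lock makes it safe to call this from several threads.
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
        with self._lock:
            self._children.append(proc)
        return proc

    def reap(self):
        """Wait on every tracked child; return how many were reaped."""
        # Take ownership of the current list under the lock, then wait
        # outside it so other threads can keep spawning meanwhile.
        with self._lock:
            pending, self._children = self._children, []
        for proc in pending:
            # Closing our end of the pipe first avoids blocking forever on
            # a child that is still trying to write output nobody reads.
            proc.stdout.close()
            proc.wait()
        return len(pending)
```

A caller would stream from `spawn(...).stdout` as usual and call `reap()` at a convenient point, such as after finishing a batch of files.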
----
Solved the issue with preferred content expressions and dropping that
I mentioned yesterday. My solution was to add a parameter to specify a set
of repositories where content should be assumed not to be present. When
deciding whether to drop, it can put the current repository in, and then
if the expression fails to match, the content can be dropped.

Using yesterday's example "(not copies=trusted:2) and (not in=usbdrive)",
when the local repo is one of the 2 trusted copies, the drop check will
see only 1 trusted copy, so the expression matches, and so the content will
not be dropped.

I've not tested my solution, but it type checks. :P I'll wire it up to
`get/drop/move --auto` tomorrow and see how it performs.
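
The drop check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Haskell implementation: the matcher helpers (`copies_at_least`, `in_repo`, `not_`, `and_`), the `matches` and `can_drop` functions, and the repository names are all made up here. The `assume_absent` parameter plays the role of the set of repositories where content is assumed not to be present.

```python
from typing import Callable, FrozenSet, Iterable

# A matcher takes the set of repositories holding the content.
Matcher = Callable[[FrozenSet[str]], bool]


def copies_at_least(n: int, group: FrozenSet[str]) -> Matcher:
    # True when at least n repositories from `group` hold the content.
    return lambda present: len(present & group) >= n


def in_repo(name: str) -> Matcher:
    return lambda present: name in present


def not_(m: Matcher) -> Matcher:
    return lambda present: not m(present)


def and_(a: Matcher, b: Matcher) -> Matcher:
    return lambda present: a(present) and b(present)


def matches(expr: Matcher, present: Iterable[str],
            assume_absent: Iterable[str] = ()) -> bool:
    # Evaluate the expression as if the content were missing from
    # every repository listed in assume_absent.
    return expr(frozenset(present) - frozenset(assume_absent))


def can_drop(expr: Matcher, present: Iterable[str], here: str) -> bool:
    # Pretend the local copy is already gone; if the content would then
    # no longer be wanted, dropping it is fine.
    return not matches(expr, present, assume_absent=[here])


# Hypothetical setup: two trusted repositories, both holding the file.
trusted = frozenset({"laptop", "server"})
expr = and_(not_(copies_at_least(2, trusted)), not_(in_repo("usbdrive")))
```

With the file present on both `laptop` and `server`, `matches(expr, {"laptop", "server"})` is false, which on its own would suggest dropping. `can_drop(expr, {"laptop", "server"}, here="laptop")` sees only one trusted copy, so the expression matches and the result is false: the content is kept.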
----
Would preferred content expressions be more readable if they were inverted
(becoming content filtering expressions)?

1. "(not copies=trusted:2) and (not in=usbdrive)" becomes
   "copies=trusted:2 or in=usbdrive"
2. "smallerthan=10mb and include=*.mp3 and exclude=junk/*" becomes
   "largerthan=10mb or exclude=*.mp3 or include=junk/*"
3. "(not group=archival) and (not copies=archival:1)" becomes
   "group=archival or copies=archival:1"

1 and 3 are improved, but 2, less so. It's a trifle weird for "include"
to mean "include in excluded content".

The other reason not to do this is that currently the expressions
can be fed into `git annex find` on the command line, and it'll come
back with the files that would be kept.

Perhaps a middle ground is to make "dontwant" be an alias for "not".
Then we can write "dontwant (copies=trusted:2 or in=usbdrive)".
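
The inversion used in the three examples above is De Morgan's laws applied to the expression: "and" becomes "or", and each term is negated. A small Python sketch follows, using a made-up tuple representation and hypothetical `invert` and `render` helpers (none of this is git-annex code). Following the examples, it assumes `smallerthan`/`largerthan` and `include`/`exclude` can be treated as exact opposites, which glosses over the equal-size edge case.

```python
# Term pairs treated as opposites, as in the examples above (an assumption).
OPPOSITES = {
    "smallerthan": "largerthan",
    "largerthan": "smallerthan",
    "include": "exclude",
    "exclude": "include",
}


def invert(expr):
    """Return the logical negation of an expression tree.

    Trees are tuples: ("term", key, value), ("not", sub),
    ("and", sub, ...) or ("or", sub, ...).
    """
    kind = expr[0]
    if kind == "not":
        # Double negation cancels out.
        return expr[1]
    if kind == "and":
        return ("or",) + tuple(invert(sub) for sub in expr[1:])
    if kind == "or":
        return ("and",) + tuple(invert(sub) for sub in expr[1:])
    key, value = expr[1], expr[2]
    if key in OPPOSITES:
        return ("term", OPPOSITES[key], value)
    return ("not", expr)


def render(expr):
    """Format an expression tree in the style used in this post."""
    kind = expr[0]
    if kind == "term":
        return f"{expr[1]}={expr[2]}"
    if kind == "not":
        return f"(not {render(expr[1])})"
    parts = []
    for sub in expr[1:]:
        text = render(sub)
        # Parenthesize nested and/or groups to keep them unambiguous.
        parts.append(f"({text})" if sub[0] in ("and", "or") else text)
    return f" {kind} ".join(parts)


example1 = ("and",
            ("not", ("term", "copies", "trusted:2")),
            ("not", ("term", "in", "usbdrive")))
```

Here `render(example1)` gives back the original expression from example 1, and `render(invert(example1))` gives the inverted form, "copies=trusted:2 or in=usbdrive".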
----
A user told me this:

> I can confirm that the assistant does what it is supposed to do really well. I
> just hooked up my notebook to the network and it starts syncing from notebook to
> fileserver and the assistant on the fileserver also immediately starts syncing
> to the [..] backup

That makes me happy, it's the first quite so real-world success report I've
heard.