Since the last post, I've worked on speeding up `git annex watch`'s startup time
in a large repository.

The problem was that its initial scan was naively staging every symlink in
the repository, even though most of them are, presumably, staged correctly
already. This was done in case the user copied or moved some symlinks
around while `git annex watch` was not running -- we want to notice and
commit such changes at startup.

Since the scan already has the `stat` info for each symlink, it can look at the
`ctime` to see if the symlink was made recently, and only stage it if so.
This sped up startup in my big repo from longer than I cared to wait (10+
minutes, or half an hour while profiling) to a minute or so. Of course,
inotify events are already serviced during startup, so making it scan
quickly is really only important so people don't think it's a resource hog.
First impressions are important. :)
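The `ctime` filter described above can be sketched roughly as follows. This is a minimal Python illustration, not git-annex's actual (Haskell) code; `symlinks_to_stage` and its `cutoff` parameter are hypothetical names, and `cutoff` stands in for whatever "recently" ends up meaning.

```python
import os
import stat


def symlinks_to_stage(root, cutoff):
    """Yield symlinks under root whose ctime is newer than cutoff.

    cutoff is a Unix timestamp (hypothetical parameter); older symlinks
    are assumed to be staged already and are skipped.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own metadata directory.
        if ".git" in dirnames:
            dirnames.remove(".git")
        # os.walk lists symlinks to directories under dirnames and does
        # not follow them by default, so check both lists.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat: inspect the link, not its target
            if stat.S_ISLNK(info.st_mode) and info.st_ctime > cutoff:
                yield path
```

Using `lstat` matters here because an annexed symlink may point at content that is not present locally, in which case a plain `stat` would fail.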

But what does "made recently" mean exactly? Well, my answer is possibly
over-engineered, but most of it is really groundwork for things I'll need
later anyway. I added a new data structure for tracking the status of the
daemon, which is periodically written to disk by another thread (thread #6!)
to `.git/annex/daemon.status`. Currently it looks like this; I anticipate
adding lots more info as I move into the [[syncing]] stage:

	lastRunning:1339610482.47928s
	scanComplete:True

So, only symlinks made after the daemon was last running need to be
expensively staged on startup. Although, as RichiH pointed out,
this fails if the clock is changed. But I have been planning to have a
cleanup thread anyway that will handle this and other
potential problems, so I think that's ok.
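To make the status file concrete, here is a rough Python sketch of reading it and deriving the cutoff. The `key:value` layout is taken from the sample above; the function names are hypothetical, treating the trailing `s` as a seconds suffix is an assumption, and the daemon's real reader is written in Haskell and may well differ.

```python
def parse_daemon_status(text):
    """Parse key:value lines into a dict, ignoring blank lines."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        status[key] = value
    return status


def last_running(status):
    """Return lastRunning as a float Unix timestamp, or None if absent.

    Assumption: the trailing "s" in the sample is a seconds suffix.
    """
    raw = status.get("lastRunning")
    if raw is None:
        return None
    return float(raw.rstrip("s"))


def needs_staging(symlink_ctime, status):
    """Decide whether a symlink must be staged at startup.

    With no recorded lastRunning (e.g. a first run), stage everything.
    """
    cutoff = last_running(status)
    return cutoff is None or symlink_ctime > cutoff
```

A comparison like `needs_staging` is only as trustworthy as the system clock, which is exactly the weakness RichiH pointed out.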

Stracing its startup scan, it's fairly tight now. There are some repeated
`getcwd` syscalls that could be optimised out for a minor speedup.

----

Added the sanity check thread. Thread #7! It currently only does one sanity
check per day, but the sanity check is a fairly lightweight job,
so I may make it run more frequently. OTOH, it may never ever find a
problem, so once per day seems a good compromise.

Currently it's only checking that all files in the tree are properly staged
in git. I might make it run `git annex fsck` later, but fscking the whole tree
once per day is a bit much. Perhaps it should only fsck a few files per
day? TBD
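A periodic checker along these lines might look like the following Python sketch. It is not the daemon's actual code: `run_periodically`, its `check` callback, and the `stop` event are hypothetical names, and the once-per-day default is simply the figure mentioned above.

```python
import threading

ONE_DAY = 24 * 60 * 60  # seconds; the interval mentioned in the post


def run_periodically(check, interval=ONE_DAY, stop=None):
    """Start a daemon thread that calls check() every interval seconds.

    Returns (thread, stop); set the stop event to end the loop.
    """
    stop = stop or threading.Event()

    def loop():
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop sleeps for interval and exits promptly on stop.
        while not stop.wait(interval):
            check()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop
```

Making `interval` a parameter keeps the "how often" question open: the same loop works whether the check ends up running daily or more frequently. A real check would probably also want to catch and log exceptions so that one failure does not end the thread.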

Currently any problems found in the sanity check are just fixed and logged.
It would be good to do something about getting problems that might indicate
bugs fed back to me, in a privacy-respecting way. TBD

----

I also refactored the code, which was getting far too large to all be in
one module.

I have been thinking about renaming `git annex watch` to `git annex assistant`,
but I think I'll leave the command name as-is. Some users might
want a simple watcher and stager, without the assistant's other features
like syncing and the webapp. So the next stage of the
[[roadmap|design/assistant]] will be a different command that also runs
`watch`.

At this point, I feel I'm done with the first phase of [[inotify]].
It has a couple of known bugs, but it's ready for brave beta testers to try.
I trust it enough to be running it on my live data.