oh wow, why didn't I think of this before?

Joey Hess 2015-04-03 18:19:56 -04:00
parent 1d5bb9a0a7
commit 4e057c4771


@@ -0,0 +1,36 @@
Global fsck updates all location log entries for a repo. This wastes disk
space.
I've now realized that it can be implemented without such waste. Probably
cheaply enough to be the default!
What we need is a new log file, call it fscktimes.log.
This records the time of the last fsck of each repo.
`git annex fsck --expire` no longer needs to look at the location log at
all. It can just check the repo's fscktimes.log entry. If the entry is
recent enough, we know that the repo has fscked recently, and its location
log is good, and nothing needs to be done. Otherwise, we know that the repo
has stopped fscking, and we simply expire *all* its location logs.
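To make the expiry check concrete, here is a minimal sketch in Python. It is not git-annex code: the one-line-per-repo `uuid unix-timestamp` format for fscktimes.log is an assumption, as are the helper names `parse_fscktimes` and `repos_to_expire`. Treating a repo with no entry at all as stale is also an assumption, not something decided above.

```python
from datetime import datetime, timedelta, timezone


def parse_fscktimes(text):
    """Parse hypothetical 'uuid unix-timestamp' lines.

    Returns a dict mapping repo uuid to the newest recorded fsck time.
    Lines that do not match the assumed format are skipped.
    """
    times = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        uuid, stamp = parts
        try:
            when = datetime.fromtimestamp(float(stamp), tz=timezone.utc)
        except ValueError:
            continue
        if uuid not in times or when > times[uuid]:
            times[uuid] = when
    return times


def repos_to_expire(times, known_repos, max_age, now):
    """Return the repos whose location logs should all be expired.

    A repo is expired when its last recorded fsck is older than max_age.
    Assumption: a repo with no entry is treated the same as a stale one.
    """
    cutoff = now - max_age
    return sorted(r for r in known_repos if r not in times or times[r] < cutoff)
```

The point of the sketch is that the decision needs only the one small log file and the list of known repos; no location log is read until a repo has actually been judged stale.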
Note that fscktimes.log is only used by fsck; it does not impact git-annex
generally or make it slower. And, it's very low overhead to update the one
file. Repos could do a fsck --fast on a daily basis and not grow the
git-annex branch much. Maybe on an hourly basis even.
(BTW, there is some overlap with the fsck.log file that is currently used to
hold the timestamp of the last local fsck. May be able to eliminate that
file too.)
----
It might be worth making the fsck.log record --fast and full fscks
separately so we know the last of each for each repo. This would let
--expire require periodic full fscks and more frequent fast fscks.
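A rough Python sketch of how two separate timestamps could drive expiry. Everything here is hypothetical: the function name `is_expired`, the two age limits, and the choice to let a full fsck also count as a fast one are assumptions made for illustration, not a settled design.

```python
from datetime import datetime, timedelta, timezone


def is_expired(last_fast, last_full, fast_max_age, full_max_age, now):
    """Decide whether a repo should be expired, given its last fast and
    last full fsck times (None means no such fsck was ever recorded)."""

    def stale(last, max_age):
        return last is None or now - last > max_age

    # Assumption: a full fsck checks at least as much as a fast one,
    # so the newer of the two timestamps satisfies the fast requirement.
    recorded = [t for t in (last_fast, last_full) if t is not None]
    newest_any = max(recorded, default=None)

    return stale(newest_any, fast_max_age) or stale(last_full, full_max_age)
```

With limits such as 7 days for fast and 30 days for full, a repo that runs a fast fsck daily would still be expired once its last full fsck is more than 30 days old.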
----
It would even be possible to make regular fsck check for expiry of other
repos. This would need the expiration values to be stored in the git-annex
branch. It would not add much overhead, but I don't know if I see any
reason to do that.