An earlier commit (mislabeled) made direct mode fsck check file checksums.
While it's expected for files to change at any time in direct mode, and so
fsck cannot complain every time there's a checksum mismatch, it is possible
for it to detect when a file does not *seem* to have changed, then check
its checksum, and so detect disk corruption or other problems.
This commit improves that, by checking a second time, if the checksum
fails, that the file is still not modified, before taking action. This way,
a direct mode file can be modified while being fscked.
Simplified it using existing functions.
I doubt setSticky needs to return the FileMode; if it does for some
reason, it can be changed to use modifyFileMode'
Converted isSticky to a pure function for consistency with isSymlink.
Note that the sticky bit of a file can be tested thus:
isSticky . fileMode <$> getFileStatus file
This commit includes a paydown on technical debt incurred two years ago,
when I didn't know that it was bad to make custom Read and Show instances
for types. As the routes need Read and Show for Transfer, which includes a
Key, and deriving my own Read instance of key was not practical,
I had to finally clean that up.
So the compact Key read and show functions are now file2key and key2file,
and Read and Show are now derived instances.
Changed all code that used the old instances, compiler checked.
(There were a few places, particularly in Command.Unused, and the test
suite where the Show instance continue to be used for legitimate
comparisons; ie show key_x == show key_y (though really in a bloom filter))
Test suite now passes with -threaded!
I traced back all the hangs with -threaded to System.Cmd.Utils. It seems
it's just crappy/unsafe/outdated, and should not be used. System.Process
seems to be the cool new thing, so converted all the code to use it
instead.
In the process, --debug stopped printing commands it runs. I may try to
bring that back later.
Note that even SafeSystem was switched to use System.Process. Since that
was a modified version of code from System.Cmd.Utils, it needed to be
converted too. I also got rid of nearly all calls to forkProcess,
and all calls to executeFile, which I'm also doubtful about working
well with -threaded.
In order to record a semi-useful filename associated with the key,
this required plumbing the filename all the way through to the remotes'
storeKey and retrieveKeyFile.
Note that there is potential for deadlock here, narrowly avoided.
Suppose the repos are A and B. A sends file foo to B, and at the same
time, B gets file foo from A. So, A locks its upload transfer info file,
and then locks B's download transfer info file. At the same time,
B is taking the two locks in the opposite order. This is only not a
deadlock because the lock code does not wait, and aborts. So one of A or
B's transfers will be aborted and the other transfer will continue.
Whew!
While I was in there, I noticed and fixed a bug in the queue size
calculations. It was never encountered only because Queue.add was
only ever run with 1 file in the list.
This is incomplete, it does not honor it yet for hash directories
and other annex bookkeeping files. Some of that is not needed for a bare
repo; some of it may be.
Now gitattributes are looked up, efficiently, in only the places that
really need them, using the same approach used for cat-file.
The old CheckAttr code seemed very fragile, in the way it streamed files
through git check-attr.
I actually found that cad8824852
was still deadlocking with ghc 7.4, at the end of adding a lot of files.
This should fix that problem, and avoid future ones.
The best part is that this removes withAttrFilesInGit and withNumCopies,
which were complicated Seek methods, as well as simplfying the types
for several other Seek methods that had a Backend tupled in.
Fscking a remote is now supported. It's done by retrieving
the contents of the specified files from the remote, and checking them,
so can be an expensive operation.
(Several optimisations are possible, to speed it up, of course.. This is
the slow and stupid remote fsck to start with.)
Still, if the remote is a special remote, or a git repository that you
cannot run fsck in locally, it's nice to have the ability to fsck it.
If you have any directory special remotes, now would be a good time to
fsck them, in case you were hit by the data loss bug fixed in the
previous release!
Supporting multiple directory hash types will allow converting to a
different one, without a flag day.
gitAnnexLocation now checks which of the possible locations have a file.
This means more statting of files. Several places currently use
gitAnnexLocation and immediately check if the returned file exists;
those need to be optimised.
Many functions took the repo as their first parameter. Changing it
consistently to be the last parameter allows doing some useful things with
currying, that reduce boilerplate.
In particular, g <- gitRepo is almost never needed now, instead
use inRepo to run an IO action in the repo, and fromRepo to get
a value from the repo.
This also provides more opportunities to use monadic and applicative
combinators.
Checks location log information, and file contents.
Does not check that numcopies is satisfied, as .gitattributes information
about numcopies is not available in a bare repository. In practice, that
should not be a problem, since fsck is also run in a checkout and will
check numcopies there.
This new approach allows filtering out checks from the default set that are
not appropriate for a command, rather than having to list every check
that is appropriate. It also reduces some boilerplate.
Haskell does not define Eq for functions, so I had to go a long way around
with each check having a unique id. Meh.
The only remaining vestiage of backends is different types of keys. These
are still called "backends", mostly to avoid needing to change user interface
and configuration. But everything to do with storing keys in different
backends was gone; instead different types of remotes are used.
In the refactoring, lots of code was moved out of odd corners like
Backend.File, to closer to where it's used, like Command.Drop and
Command.Fsck. Quite a lot of dead code was removed. Several data structures
became simpler, which may result in better runtime efficiency. There should
be no user-visible changes.
This was a real PITA to fix, since location logs can be staged in
both the current repo, as well as in local remote's repos, in
which case the cwd will not be in the repo. And git add needs different
params in both cases, when absolute paths are not used.
In passing, git annex fsck now stages location log fixes.
* fsck: Check if annex.numcopies is satisfied.
* fsck: Verify the sha1 of files when the SHA1 backend is used.
* fsck: Verify the size of files when the WORM backend is used.
* fsck: Allow specifying individual files to fsk if fscking everything
is not desired.
* fsck: Fix bug, introduced in 0.04, in detection of unused data.