todo that I decided not to do, recorded for posterity
parent c76d44d7e1
commit 47c010155f
1 changed files with 53 additions and 0 deletions
53
doc/todo/fsverify.mdwn
Normal file
@@ -0,0 +1,53 @@
git-annex could use Linux's [fsverity](https://www.kernel.org/doc/html/latest/filesystems/fsverity.html)
feature as an alternative to hashing and verifying hashes of files itself.

Benefits would include:

* Any read of an annexed file that uses fsverity would check the blocks
  that are read, and the read would fail if the file had gotten corrupted.
* Avoiding any theoretical cases where `git-annex add` is hashing a file
  and something modifies it, causing the file to be added with the wrong
  hash (which `git-annex fsck` will later detect). The
  `FS_IOC_ENABLE_VERITY` ioctl prevents anything else from
  modifying the file while it is being hashed.
* Slightly faster `git-annex fsck`, because it would not need to hash
  verified files. It would suffice to read the file, and if it all read
  successfully, it's valid!
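
The "just read the file" check from the last bullet could look roughly like this. It is only a sketch: `reads_cleanly` is a made-up name, and it assumes the kernel reports a corrupted block of a verity-enabled file as an `EIO` read error. On a file without fsverity it proves nothing beyond readability.

```python
import errno


def reads_cleanly(path, chunk_size=1 << 20):
    """Read the whole file; return False if the kernel reports an I/O error.

    Hypothetical helper. With fsverity enabled on the file, a block that
    fails verification makes read() fail, so a complete successful read
    implies the content still matches the verity hash.
    """
    try:
        with open(path, "rb") as f:
            # Discard the data; only the success of each read matters.
            while f.read(chunk_size):
                pass
    except OSError as e:
        if e.errno == errno.EIO:
            return False
        raise  # anything else (permissions, missing file) is a different problem
    return True
```

A fsck built on this would still need to confirm that verity is actually enabled on the file before trusting the result.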

Since fsverity uses a Merkle tree, its hashes are not the same as simply
using SHA on the whole file. So for git-annex to use the fsverity hash as
the key for the file, it would need to be a separate type of key. That's a
bit problematic because then git-annex would need a way to verify that
Merkle hash itself on systems that do not support fsverity. Also, for large
files, the Merkle tree can get relatively large (1/127th the size of the
file, the docs say). So with a terabyte of annexed files, that's gigabytes
of Merkle hashes, which seems too large to want to store them in git.
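
To check the 1/127th figure, here is a back-of-the-envelope calculation. It assumes SHA-256 (32-byte digests) and 4096-byte blocks, so each tree block holds 128 hashes. The function name and defaults are illustrative only.

```python
def merkle_tree_size(file_size, block_size=4096, digest_size=32):
    """Bytes used by all levels of a hash tree built over file_size bytes."""
    hashes_per_block = block_size // digest_size  # 128 with the defaults
    blocks = -(-file_size // block_size)  # data blocks, rounded up
    total = 0
    # Each level stores one hash per block of the level below, packed into
    # whole blocks, until a level fits in a single block.
    while blocks > 1:
        blocks = -(-blocks // hashes_per_block)
        total += blocks * block_size
    return total


one_tib = 2**40
print(merkle_tree_size(one_tib))             # 8657571840
print(one_tib / merkle_tree_size(one_tib))   # about 127
```

So a tebibyte of annexed files comes with a little over 8 GiB of tree data, which matches the "gigabytes of Merkle hashes" estimate.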

Alternatively, git-annex could hash as usual for the key. This would mean
that `git-annex add` would hash a file twice, once for the git-annex key
and a second time when calling the `FS_IOC_ENABLE_VERITY` ioctl. Slower, but
perhaps these could run in parallel and only use 2x the CPU or so.
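
A rough illustration of running the two hashes side by side. This is not how git-annex works (it is written in Haskell), and `add_with_verity` and its `enable_verity` argument are hypothetical names. The caller supplies whatever function issues the ioctl.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def sha256_file(path, chunk_size=1 << 20):
    """Plain whole-file SHA-256, standing in for the git-annex key hash."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def add_with_verity(path, enable_verity):
    """Hash the file for the key while enable_verity(path) runs concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        key_future = pool.submit(sha256_file, path)
        verity_future = pool.submit(enable_verity, path)
        verity_future.result()  # re-raises if enabling verity failed
        return key_future.result()
```

Both tasks only read the file, so they should not get in each other's way. Whether this really costs about 2x the CPU and no extra wall-clock time would need measuring.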

Since fsverity files are read-only, this would only be useful for locked
files. Unlocking a file would need to either remove the fsverity from it
(if possible?) or copy it.

Using fsverity in this way would not work if the sysctl
`fs.verity.require_signatures` is set, because the annexed files would
not have signatures.

---

Putting all this together, fsverity is not too compelling for use by
git-annex. A user who wants the verification on all reads of a file can
just call `FS_IOC_ENABLE_VERITY` on it themselves after `git-annex add`.
The `annex.freezecontent-command` hook could be used to do that.
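
For anyone who wants to try that, this is roughly what issuing the ioctl by hand involves. Treat it as an untested sketch. The struct layout and constants are transcribed from `<linux/fsverity.h>` as I understand them, and the ioctl number is computed with the common `_IOW` encoding, which some architectures do not use. It needs a filesystem with verity support, and the file must not be open for writing anywhere.

```python
import fcntl
import os
import struct

FS_VERITY_HASH_ALG_SHA256 = 1
ENABLE_ARG_SIZE = 128  # sizeof(struct fsverity_enable_arg)

# _IOW('f', 133, struct fsverity_enable_arg) in the generic ioctl encoding.
FS_IOC_ENABLE_VERITY = (1 << 30) | (ENABLE_ARG_SIZE << 16) | (ord("f") << 8) | 133


def enable_arg(block_size=4096):
    """Pack struct fsverity_enable_arg with no salt and no signature."""
    return struct.pack(
        "=IIIIQIIQ11Q",
        1,                          # version
        FS_VERITY_HASH_ALG_SHA256,  # hash_algorithm
        block_size,                 # block_size
        0,                          # salt_size
        0,                          # salt_ptr
        0,                          # sig_size
        0,                          # __reserved1
        0,                          # sig_ptr
        *([0] * 11),                # __reserved2
    )


def enable_verity(path):
    """Ask the kernel to build the Merkle tree and mark the file as verity."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FS_IOC_ENABLE_VERITY, enable_arg())
    finally:
        os.close(fd)
```

Wrapped in a small script that calls `enable_verity` on its argument, this could be what `annex.freezecontent-command` runs, with `%path` passing the file name. The `fsverity enable` command from fsverity-utils should do the same job without any custom code.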

Then the only benefit of supporting it in git-annex is that perhaps `git-annex
add` could parallelize enabling verification with checksumming, or avoid its
own checksumming, and so run faster than if a hook were used to enable
fsverity. And fsck would use less CPU. Is that worth complicating git-annex for?
--[[Joey]]

> After investigating that, I currently don't think it's compelling, so I'm
> gonna close this. [[done]] --[[Joey]]