todo that I decided not to do, recorded for posterity

Joey Hess 2023-03-14 12:21:59 -04:00
parent c76d44d7e1
commit 47c010155f

doc/todo/fsverify.mdwn

git-annex could use Linux's [fs-verity](https://www.kernel.org/doc/html/latest/filesystems/fsverity.html)
feature as an alternative to hashing files and verifying those hashes itself.
Benefits would include:

* Any read of an annexed file that has fs-verity enabled would check the
  blocks that are read, and the read would fail if the file had gotten
  corrupted.
* Avoiding any theoretical cases where `git-annex add` is hashing a file
  and something modifies it, causing the file to be added with the wrong
  hash (which `git-annex fsck` will later detect). The
  `FS_IOC_ENABLE_VERITY` ioctl prevents anything else from modifying the
  file while the ioctl is hashing it.
* Slightly faster `git-annex fsck`, because it would not need to hash
  verified files. It would suffice to read the file; if it all reads
  successfully, it's valid!

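
For the fsck case, checking a file that already has fs-verity enabled would just mean reading it to the end. Any `EIO` would be treated as corruption, since that is the error the kernel returns when a block fails verification. Here's a minimal Python sketch of that. `verify_by_reading` is a hypothetical helper, not anything in git-annex. On a file without fs-verity enabled it will report success without having checked anything.

```python
import errno


def verify_by_reading(path: str, chunk_size: int = 1 << 20) -> bool:
    """Read path to the end so the kernel verifies every block.

    Assumes fs-verity is already enabled on path. Returns False if a read
    fails with EIO, which is how fs-verity reports a block that does not
    match the Merkle tree; any other error propagates to the caller.
    """
    try:
        with open(path, "rb", buffering=0) as f:
            while f.read(chunk_size):
                pass  # the data itself is discarded; only the reads matter
    except OSError as e:
        if e.errno == errno.EIO:
            return False
        raise
    return True
```

`EIO` can also come from a failing disk, so a `False` here really means "unreadable" rather than specifically "corrupted".
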
Since fs-verity uses a Merkle tree, its hashes are not the same as simply
taking a SHA hash of the whole file. So for git-annex to use the fs-verity
hash as the key for the file, it would need to be a separate type of key.
That's a bit problematic, because git-annex would then need a way to verify
that Merkle hash itself on systems that do not support fs-verity. Also, for
large files, the Merkle tree can get relatively large (about 1/127th the size
of the file, the docs say). So with a terabyte of annexed files, that's
gigabytes of Merkle hashes, which seems too large to want to store in git.

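
To make the size and compatibility concerns concrete, here's a rough Python sketch. It assumes the default 4096-byte blocks, SHA-256 and no salt.

* `merkle_tree_size` counts the tree blocks stored above the data.
* `fsverity_digest` computes the fs-verity file digest in userspace. It takes the Merkle root hash, wraps it in a `struct fsverity_descriptor`, and hashes that. This is what git-annex would need on systems without fs-verity.

Both are hypothetical helpers, written from the kernel documentation's description of the format. The digest should agree with fsverity-utils' `fsverity digest`, but that is an assumption worth checking before relying on it.

```python
import hashlib
import struct

BLOCK_SIZE = 4096  # default fs-verity block size (assumed)
DIGEST_SIZE = 32   # SHA-256 digest length
FS_VERITY_HASH_ALG_SHA256 = 1


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def merkle_tree_size(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes of Merkle tree stored for a file of file_size bytes."""
    blocks = _ceil_div(file_size, block_size)  # number of data blocks
    total = 0
    while blocks > 1:
        # each tree block holds block_size // DIGEST_SIZE hashes of the level below
        blocks = _ceil_div(blocks * DIGEST_SIZE, block_size)
        total += blocks
    return total * block_size


def _hash_block(block: bytes, block_size: int) -> bytes:
    # the last block of each level is zero-padded to a full block before hashing
    return hashlib.sha256(block.ljust(block_size, b"\0")).digest()


def fsverity_digest(data: bytes, block_size: int = BLOCK_SIZE) -> str:
    """fs-verity file digest of data, assuming SHA-256 and no salt."""
    if not data:
        root = bytes(DIGEST_SIZE)  # empty file: the root hash is all zeros
    else:
        level = data
        while len(level) > block_size:  # more than one block: hash another level
            level = b"".join(
                _hash_block(level[i:i + block_size], block_size)
                for i in range(0, len(level), block_size)
            )
        root = _hash_block(level, block_size)  # hash of the single top block
    # struct fsverity_descriptor: version, hash_algorithm, log_blocksize,
    # salt_size, reserved, data_size, root_hash[64], salt[32], reserved[144]
    descriptor = struct.pack(
        "<BBBBIQ64s32s144x",
        1, FS_VERITY_HASH_ALG_SHA256, block_size.bit_length() - 1, 0, 0,
        len(data), root, b"",
    )
    return "sha256:" + hashlib.sha256(descriptor).hexdigest()
```

Each 4096-byte tree block holds 128 SHA-256 hashes. So the tree is about 1/128 + 1/128² + ... ≈ 1/127 of the file, which matches the docs' figure. `merkle_tree_size(10**12)` comes out at about 7.9 GB.
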
Alternatively, git-annex could hash as usual for the key. This would mean
that `git-annex add` would hash a file twice: once for the git-annex key,
and a second time when calling the `FS_IOC_ENABLE_VERITY` ioctl. Slower, but
perhaps these could run in parallel and only use 2x the CPU or so.

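
The parallel version of `git-annex add` could look something like this Python sketch. Both tasks read the file independently, so the I/O and CPU are spent twice, but the wall-clock time can approach that of a single pass.

A few assumptions:

* The key format is the one used by git-annex's SHA256 backend. The default SHA256E backend also appends the file's extension.
* `enable_verity` is a hypothetical callable, for example a wrapper around the `FS_IOC_ENABLE_VERITY` ioctl.
* `annex_sha256_key` and `add_with_verity` are made-up names, not git-annex code.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def annex_sha256_key(path: str, chunk_size: int = 1 << 20) -> str:
    """Key for path in the SHA256 backend's format: SHA256-s<size>--<hex>."""
    h = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
            size += len(chunk)
    return f"SHA256-s{size}--{h.hexdigest()}"


def add_with_verity(path: str, enable_verity: Callable[[str], None]) -> str:
    """Compute the key and enable fs-verity concurrently; return the key.

    Each task opens and reads the file on its own. hashlib releases the GIL
    while hashing large chunks, so the two threads can overlap as long as
    enable_verity also spends its time outside the interpreter.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        key_future = pool.submit(annex_sha256_key, path)
        verity_future = pool.submit(enable_verity, path)
        verity_future.result()  # re-raises any error from enabling fs-verity
        return key_future.result()
```
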
Since files with fs-verity enabled are read-only, this would only be useful
for locked files. Unlocking a file would need to either remove fs-verity from
it (if possible?) or copy it.

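
Copying is the simpler of those two options. Here's a minimal Python sketch of it, assuming a plain regular file rather than git-annex's real locked-file symlinks and unlock machinery. It writes the contents into a fresh inode in the same directory, makes that copy writable, and renames it over the original. `unlock_by_copy` is a hypothetical name.

```python
import os
import shutil
import stat
import tempfile


def unlock_by_copy(path: str) -> None:
    """Replace path with a writable copy of itself (hypothetical helper).

    Rather than trying to remove fs-verity from the file, the contents go
    into a new inode, which has no fs-verity, and that replaces the original.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".unlock-")
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            shutil.copyfileobj(src, dst)
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(tmp, mode | stat.S_IWUSR)  # owner can write the new copy
        os.replace(tmp, path)  # atomic rename within one filesystem
    except BaseException:
        os.unlink(tmp)
        raise
```
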
Using fs-verity in this way would not work if the sysctl
`fs.verity.require_signatures` is set, because the annexed files would
not have signatures.

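
A check for that case could be as simple as reading the sysctl through `/proc/sys`. This Python sketch assumes that a missing file means signatures aren't required, for instance on a kernel built without fs-verity signature support. `verity_requires_signatures` is a hypothetical helper.

```python
from pathlib import Path

# sysctl fs.verity.require_signatures, as exposed under /proc/sys
REQUIRE_SIGNATURES = Path("/proc/sys/fs/verity/require_signatures")


def verity_requires_signatures(path: Path = REQUIRE_SIGNATURES) -> bool:
    """True if the sysctl is set to a nonzero value.

    A missing file is assumed to mean the kernel has no fs-verity signature
    support, so signatures cannot be required.
    """
    try:
        return int(path.read_text()) != 0
    except FileNotFoundError:
        return False
```
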
---
Putting all this together, fs-verity is not too compelling for use by
git-annex. A user who wants verification on all reads of a file can
just call `FS_IOC_ENABLE_VERITY` on it themselves after `git-annex add`.
The `annex.freezecontent-command` hook could be used to do that.

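
For example, the hook could point at a small script like this Python sketch. It would be configured with something like `git config annex.freezecontent-command 'fsverity-freeze %path'`. That assumes `%path` is substituted as the git-annex docs describe for this setting; the script name `fsverity-freeze` is made up.

The ioctl number is hard-coded for the generic Linux ioctl encoding used on x86 and arm, which is an assumption; other architectures encode it differently. Directories are skipped, since the hook may be run on them too. A file that already has fs-verity enabled is left alone.

```python
#!/usr/bin/env python3
"""Hypothetical annex.freezecontent-command hook that enables fs-verity."""
import errno
import fcntl
import os
import struct
import sys

# _IOW('f', 133, struct fsverity_enable_arg) in the generic ioctl encoding
# (x86, arm); other architectures encode the direction bits differently.
FS_IOC_ENABLE_VERITY = 0x40806685
FS_VERITY_HASH_ALG_SHA256 = 1


def enable_arg(block_size: int = 4096) -> bytes:
    # struct fsverity_enable_arg: version, hash_algorithm, block_size,
    # salt_size, salt_ptr, sig_size, __reserved1, sig_ptr, __reserved2[11]
    return struct.pack("=IIIIQIIQ88x", 1, FS_VERITY_HASH_ALG_SHA256,
                       block_size, 0, 0, 0, 0, 0)


def freeze(path: str) -> str:
    if not os.path.isfile(path) or os.path.islink(path):
        return "skipped"  # directories and symlinks are left alone
    fd = os.open(path, os.O_RDONLY)  # the file must not be open for writing
    try:
        fcntl.ioctl(fd, FS_IOC_ENABLE_VERITY, enable_arg())
    except OSError as e:
        if e.errno == errno.EEXIST:  # fs-verity already enabled
            return "already-enabled"
        raise
    finally:
        os.close(fd)
    return "enabled"


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        freeze(arg)
```
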
Then the only benefit of supporting it in git-annex is that perhaps
`git-annex add` could parallelize enabling verification with checksumming,
or avoid its own checksumming, and so run faster than if a hook were used
to enable fs-verity. And fsck would use less CPU. Is that worth
complicating git-annex for?
--[[Joey]]

> After investigating that, I currently don't think it's compelling, so I'm
> gonna close this. [[done]] --[[Joey]]