54 lines
2.7 KiB
Text
54 lines
2.7 KiB
Text
|
git-annex could use linux's [fsverify](https://www.kernel.org/doc/html/latest/filesystems/fsverity.html)
|
||
|
feature as an alternative to hashing and verifying hashes of files itself.
|
||
|
|
||
|
Benefits would include:
|
||
|
|
||
|
* Any read of an annexed file that uses fsverify would check the blocks
|
||
|
that are read, and the read would fail if the file had gotten corrupted.
|
||
|
* Avoiding any theoretical cases where `git-annex add` is hashing a file
|
||
|
and something modifies it, causing the file to be added with the wrong
|
||
|
hash (which `git-annex fsck` will later detect). The
|
||
|
`FS_IOC_ENABLE_VERITY` ioctl prevents anything else from possibly
|
||
|
modifying the file while it's hashing it.
|
||
|
* Slightly faster git-annex fsck, because it would not need to hash
|
||
|
verified files. It would suffice to read the file, and if it all read
|
||
|
successfully, it's valid!
|
||
|
|
||
|
Since fsverify uses a merkle tree, its hashes are not the same as simply
|
||
|
using SHA on the whole file. So for git-annex to use the fsverify hash as
|
||
|
the key for the file, it would need to be a separate type of key. That's a
|
||
|
bit problimatic because then git-annex would need a way to verify that
|
||
|
merkle hash itself on systems that do not support fsverify. Also, for large
|
||
|
files, the merkle tree can get relatively large (1/127th the size of the
|
||
|
file the docs say). So with a terabyte of annexed files, that's gigabytes
|
||
|
of merkle hashes, which seems too large to want to stote them in git.
|
||
|
|
||
|
Alternatively, git-annex could hash as usual for the key. This would mean
|
||
|
that `git-annex add` would hash a file twice, once for the git-annex key
|
||
|
and the second time calling the `FS_IOC_ENABLE_VERITY` ioctl. Slower, but
|
||
|
perhaps these could parallelize and only use 2x the CPU or so.
|
||
|
|
||
|
Since fsverified files are readonly, this would only be useful for locked
|
||
|
files. Unlocking a file would need to either remove the fsverify from it
|
||
|
(if possible?) or copy it.
|
||
|
|
||
|
Using fsverify in this way would not work if the sysctl
|
||
|
`fs.verity.require_signatures` is set, because the annexed files would
|
||
|
not have signatures.
|
||
|
|
||
|
---
|
||
|
|
||
|
Putting all this together, fsverify is not too compelling for use by
|
||
|
git-annex. A user who wants the verification on all reads of a file can
|
||
|
just call `FS_IOC_ENABLE_VERITY` on it themselves after git-annex add.
|
||
|
The annex.freezecontent-command hook could be used to to that.
|
||
|
|
||
|
Then the only benefit of supporting it in git-annex is that perhaps `git-annex
|
||
|
add` could parallize enabling verification with checksumming, or avoid its
|
||
|
own checksumming, and so run faster than if a hook were used to enable
|
||
|
fsverify. And fsck would use less CPU. Is that worth complicating git-annex for?
|
||
|
--[[Joey]]
|
||
|
|
||
|
> After investigating that, I currently don't think it's compelling, so I'm
|
||
|
> gonna close this. [[done]] --[[Joey]]
|