From 47c010155f3e7037db16c913a6b7cc3f7f494e52 Mon Sep 17 00:00:00 2001
From: Joey Hess
Date: Tue, 14 Mar 2023 12:21:59 -0400
Subject: [PATCH] todo that I decided not to do, recorded for posterity

---
 doc/todo/fsverify.mdwn | 53 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 53 insertions(+)
 create mode 100644 doc/todo/fsverify.mdwn

diff --git a/doc/todo/fsverify.mdwn b/doc/todo/fsverify.mdwn
new file mode 100644
index 0000000000..4c16044172
--- /dev/null
+++ b/doc/todo/fsverify.mdwn
@@ -0,0 +1,53 @@
+git-annex could use Linux's [fs-verity](https://www.kernel.org/doc/html/latest/filesystems/fsverity.html)
+feature as an alternative to hashing and verifying hashes of files itself.
+
+Benefits would include:
+
+* Any read of an annexed file that uses fs-verity would check the blocks
+  that are read, and the read would fail if the file had gotten corrupted.
+* Avoiding any theoretical cases where `git-annex add` is hashing a file
+  and something modifies it, causing the file to be added with the wrong
+  hash (which `git-annex fsck` will later detect). The
+  `FS_IOC_ENABLE_VERITY` ioctl prevents anything else from
+  modifying the file while the kernel is hashing it.
+* Slightly faster `git-annex fsck`, because it would not need to hash
+  verified files. It would suffice to read the file, and if it all read
+  successfully, it's valid!
+
+Since fs-verity uses a Merkle tree, its hashes are not the same as simply
+using SHA on the whole file. So for git-annex to use the fs-verity hash as
+the key for the file, it would need to be a separate type of key. That's a
+bit problematic because then git-annex would need a way to verify that
+Merkle hash itself on systems that do not support fs-verity. Also, for large
+files, the Merkle tree can get relatively large (about 1/127th the size of
+the file, according to the docs). So with a terabyte of annexed files, that's
+gigabytes of Merkle hashes, which seems too large to want to store in git.
+
+Alternatively, git-annex could hash as usual for the key. This would mean
+that `git-annex add` would hash a file twice, once for the git-annex key
+and a second time when calling the `FS_IOC_ENABLE_VERITY` ioctl. Slower, but
+perhaps these could be parallelized and only use 2x the CPU or so.
+
+Since fs-verity files are read-only, this would only be useful for locked
+files. Unlocking a file would need to either remove fs-verity from it
+(if possible?) or copy it.
+
+Using fs-verity in this way would not work if the sysctl
+`fs.verity.require_signatures` is set, because the annexed files would
+not have signatures.
+
+---
+
+Putting all this together, fs-verity is not too compelling for use by
+git-annex. A user who wants the verification on all reads of a file can
+just call `FS_IOC_ENABLE_VERITY` on it themselves after `git-annex add`.
+The `annex.freezecontent-command` hook could be used to do that.
+
+Then the only benefit of supporting it in git-annex is that perhaps `git-annex
+add` could parallelize enabling verification with checksumming, or avoid its
+own checksumming, and so run faster than if a hook were used to enable
+fs-verity. And fsck would use less CPU. Is that worth complicating git-annex for?
+--[[Joey]]
+
+> After investigating that, I currently don't think it's compelling, so I'm
+> gonna close this. [[done]] --[[Joey]]
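
For reference, here is a minimal sketch of what a hook calling `FS_IOC_ENABLE_VERITY` might look like, in Python. It is not part of the commit above. The script name `enable-verity.py` and the helpers `build_enable_arg` and `enable_verity` are made up for illustration. The struct layout and ioctl number are transcribed from the kernel's fs-verity documentation, assuming the `_IOW` encoding used on x86 and arm64 (some other architectures encode ioctl numbers differently). It also assumes the file lives on a filesystem with fs-verity support enabled.

```python
#!/usr/bin/env python3
"""Hypothetical hook script: enable fs-verity on one file."""
import fcntl
import os
import struct
import sys

# Layout of struct fsverity_enable_arg from <linux/fsverity.h>:
#   __u32 version, hash_algorithm, block_size, salt_size;
#   __u64 salt_ptr;
#   __u32 sig_size, __reserved1;
#   __u64 sig_ptr;
#   __u64 __reserved2[11];
ENABLE_ARG = struct.Struct("=IIIIQIIQ11Q")  # 128 bytes in total

FS_VERITY_HASH_ALG_SHA256 = 1


def _iow(type_char, nr, size):
    # _IOW() as encoded on x86 and arm64: direction "write" is 1 << 30.
    return (1 << 30) | (size << 16) | (ord(type_char) << 8) | nr


FS_IOC_ENABLE_VERITY = _iow("f", 133, ENABLE_ARG.size)


def build_enable_arg(block_size=4096):
    # No salt and no signature: all pointer, size and reserved fields are zero.
    return ENABLE_ARG.pack(
        1,                          # version (must be 1)
        FS_VERITY_HASH_ALG_SHA256,  # hash_algorithm
        block_size,                 # Merkle tree block size
        0, 0,                       # salt_size, salt_ptr
        0, 0,                       # sig_size, __reserved1
        0,                          # sig_ptr
        *([0] * 11),                # __reserved2
    )


def enable_verity(path):
    # The ioctl requires a read-only descriptor and fails if the file is
    # open for writing anywhere. Raises OSError when fs-verity is unavailable.
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FS_IOC_ENABLE_VERITY, build_enable_arg())
    finally:
        os.close(fd)


if __name__ == "__main__" and len(sys.argv) == 2:
    # The freeze hook may also be handed a directory; skip anything that
    # is not a regular file.
    if os.path.isfile(sys.argv[1]):
        enable_verity(sys.argv[1])
```

Something like `git config annex.freezecontent-command 'enable-verity.py %path'` could wire it up, though that is untested here. Because no signature is supplied, this runs into the same `fs.verity.require_signatures` limitation the note describes.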