comment and todo

Joey Hess 2021-01-19 11:56:14 -04:00
parent 15d3ea5fe9
commit 2b458c2d68
GPG key ID: DB12DB0FF05F8F38
2 changed files with 43 additions and 0 deletions

@@ -0,0 +1,15 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2021-01-19T15:40:49Z"
content="""
I don't think Linux would change inode numbers while the filesystem was
mounted. But each mount of FAT gets new inode numbers, since FAT doesn't
actually have them and Linux makes them up. So the filesystem being
unmounted and remounted for some reason in between the two steps of the
import would explain the behavior.

Note that I've opened
[[todo/import_tree_from_FAT_does_unncessary_work_due_to_inode_instability]]
after thinking of some other consequences of this.
"""]]

@@ -0,0 +1,28 @@
When a FAT filesystem is unmounted and remounted, the inode numbers all
change. This makes import tree from a directory special remote on FAT
think the files have changed, and so it re-imports them. Since the content
is unchanged, the unnecessary work that is done is limited to hashing
the file on the FAT filesystem. But that can be a lot of work when the tree
being imported has a lot of large files in it.

This makes import tree potentially much slower than the legacy import
interface (although that interface also re-hashes when used with
--duplicate/--skip-duplicates).

Also, the content identifier log gets another entry, with a content
identifier with the new inode number. So over time this can bloat the log.
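
The effect on the log can be illustrated with a minimal Python sketch. This is not git-annex's actual code or log format: `Stat`, `content_identifier`, `record`, and the key name are hypothetical, and the inode numbers are made up to simulate a remount.

```python
from collections import namedtuple

# Hypothetical stand-in for the stat fields of a file.
Stat = namedtuple("Stat", ["inode", "size", "mtime"])

def content_identifier(st):
    """Build an identifier from inode, size and mtime (illustrative format)."""
    return f"{st.inode} {st.size} {st.mtime}"

def record(log, key, cid):
    """Append cid to the key's log entries unless it is already there.

    Returns True when the file looked new or changed.
    """
    entries = log.setdefault(key, [])
    if cid in entries:
        return False
    entries.append(cid)
    return True

log = {}
before = Stat(inode=1001, size=4096, mtime=1611070849)
# The same file after a simulated unmount/remount: only the inode differs.
after = before._replace(inode=2002)

first_import = record(log, "KEY-a", content_identifier(before))
second_import = record(log, "KEY-a", content_identifier(after))
```

In this sketch the second import treats the unchanged file as changed, and the log for the key ends up with two entries that differ only in the inode number.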
Would it be better to omit the inode number from the content
identifier for such a filesystem, instead relying on size and mtime?
Although that would risk missing swaps of files with the same size and
mtime, that seems like an unlikely thing, and in any case git-annex would
import the data, and only miss the renaming of the files. It would also
miss modifications that don't change size and preserve the mtime; such
modifications are theoretically possible, but unlikely.
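
The tradeoff can be sketched in Python as follows. This is only an illustration under stated assumptions, not how git-annex computes content identifiers: `Stat`, `cid`, and `changed_files` are hypothetical names, and the snapshots are made-up data.

```python
from collections import namedtuple

# Hypothetical stand-in for the stat fields of a file.
Stat = namedtuple("Stat", ["inode", "size", "mtime"])

def cid(st, use_inode):
    """Identifier with or without the inode number."""
    if use_inode:
        return (st.inode, st.size, st.mtime)
    return (st.size, st.mtime)

def changed_files(old, new, use_inode):
    """Names whose identifier differs between two {name: Stat} snapshots."""
    return sorted(
        name for name in new
        if name not in old or cid(old[name], use_inode) != cid(new[name], use_inode)
    )

old = {
    "a": Stat(inode=11, size=4096, mtime=1611070849),
    "b": Stat(inode=12, size=4096, mtime=1611070849),
}
# Simulated remount: every inode number changes, nothing else does.
remounted = {n: s._replace(inode=s.inode + 100) for n, s in old.items()}
# Simulated swap of a and b within one mount: each name gets the other's inode.
swapped = {"a": old["b"], "b": old["a"]}

remount_with_inode = changed_files(old, remounted, use_inode=True)
remount_without_inode = changed_files(old, remounted, use_inode=False)
swap_with_inode = changed_files(old, swapped, use_inode=True)
swap_without_inode = changed_files(old, swapped, use_inode=False)
```

Without the inode number, the simulated remount no longer makes any file look changed, but the swap of two files with the same size and mtime goes unnoticed as well.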
But how to detect when it's a FAT filesystem with this problem?
The method git-annex uses when running on a FAT filesystem, maintaining
an inode sentinel file and checking it to tell when inodes have changed,
would need importing to write to the drive. That seems strange, and the
drive could even be read-only. Maybe the directory special remote should
just not use inode numbers at all?
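
A rough Python sketch of the sentinel idea shows why it needs write access. It is an assumption-laden illustration, not git-annex's implementation: the file name `sentinel`, the `cache` dict, and both function names are hypothetical, and the remount is faked by altering the cached value.

```python
import os
import tempfile

def write_sentinel(directory, cache):
    """Create the sentinel file and remember its inode number.

    This is the step that needs write access to the drive.
    """
    path = os.path.join(directory, "sentinel")
    with open(path, "w"):
        pass
    cache["sentinel_inode"] = os.stat(path).st_ino

def inodes_changed(directory, cache):
    """Compare the sentinel's current inode number with the cached one."""
    path = os.path.join(directory, "sentinel")
    return os.stat(path).st_ino != cache["sentinel_inode"]

with tempfile.TemporaryDirectory() as d:
    cache = {}
    write_sentinel(d, cache)
    same_mount = inodes_changed(d, cache)
    # Pretend a remount gave the sentinel a different inode number.
    cache["sentinel_inode"] += 1
    after_remount = inodes_changed(d, cache)
```

On a read-only drive the `open(path, "w")` call would fail, which is the problem described above.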