fully specify the pointer file format

This format is designed to detect accidental appends, while having some
room for future expansion.

Detect when an unlocked file whose content is not present has gotten some
other content appended to it, and avoid treating it as a pointer file, so
that appended content will not be checked into git, but will be annexed
like any other file.
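A minimal Python sketch of the append check described above. It rests on an assumed rule, which is hypothetical here; the authoritative rules are on the internals/pointer_file page. The assumed rule is that the first line is `/annex/objects/` followed by the key, and that any further lines must also begin with `/annex/objects/`. Under that rule, a stray line added by something like `echo oops >> foo` disqualifies the file. The names `parse_pointer` and `ANNEX_PREFIX` are illustrative, not git-annex's own.

```python
from typing import Optional

ANNEX_PREFIX = b"/annex/objects/"

def parse_pointer(content: bytes) -> Optional[bytes]:
    """Return the key if content looks like a pointer file, else None.

    Assumed (hypothetical) rule: line 1 is /annex/objects/<key>, and any
    further lines must also start with /annex/objects/.
    """
    lines = content.split(b"\n")
    first = lines[0].rstrip(b"\r")  # tolerate a CRLF line ending
    if not first.startswith(ANNEX_PREFIX):
        return None
    key = first[len(ANNEX_PREFIX):]
    if not key:
        return None
    rest = lines[1:]
    # A trailing newline leaves one empty element at the end; ignore it.
    if rest and rest[-1] == b"":
        rest = rest[:-1]
    for line in rest:
        if not line.startswith(ANNEX_PREFIX):
            # e.g. "oops" from `echo oops >> foo`: no longer a pointer file
            return None
    return key


if __name__ == "__main__":
    ptr = b"/annex/objects/SHA256E-s14169--bdcf6188db530bc3af79c898208ce2a56df6197f59b3872b03613a248ac8faf4\n"
    print(parse_pointer(ptr))              # the key
    print(parse_pointer(ptr + b"oops\n"))  # None
```

When the check returns None, the file is handled like any other file, so its whole content gets annexed instead of being committed to git.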

Dropped the max size of a pointer file down to 32kb. It was around 80kb,
but without any good reason, and certainly there are no valid pointer files
anywhere that are larger than 8kb, because it has only just been specified
what a pointer file with additional data even looks like.

I assume 32kb will be good enough for anyone. ;-) Really though, it needs
to be some smallish number, because that much of a file in git gets read
into memory when, e.g., catting pointer files. And since we have no use cases
for the extra lines of a pointer file yet, except possibly to add
some human-visible explanation that it is a git-annex pointer file, 32k
seems as reasonable an arbitrary number as anything. Increasing it would be
possible, e.g. to 64k, as long as users of such jumbo pointer files didn't
mind upgrading all their git-annex installations to one that supports the
new larger size.
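The reason the limit must stay small can be sketched in Python. The sketch reads at most one byte past the limit, so memory use is bounded no matter how large the file really is. Anything over the limit is rejected outright as not a pointer file. It assumes "32kb" means 32 KiB (32768 bytes). `MAX_POINTER_SIZE` and `read_pointer_candidate` are hypothetical names for illustration.

```python
import io
from typing import BinaryIO, Optional

# Assumption: "32kb" is read as 32 KiB. The name is hypothetical.
MAX_POINTER_SIZE = 32 * 1024

def read_pointer_candidate(f: BinaryIO, limit: int = MAX_POINTER_SIZE) -> Optional[bytes]:
    """Read a possible pointer file, holding at most limit + 1 bytes in memory.

    Returns the content if it fits within limit, or None if the file is
    larger and so cannot be a pointer file.
    """
    data = f.read(limit + 1)  # the one extra byte reveals an oversized file
    if len(data) > limit:
        return None
    return data


if __name__ == "__main__":
    print(read_pointer_candidate(io.BytesIO(b"/annex/objects/KEY\n")))
    print(read_pointer_candidate(io.BytesIO(b"x" * 100_000)))  # None
```

Raising the limit, say to 64k, would only mean changing the constant. Older readers would still reject the larger files, which is why such a change would require upgrading all git-annex installations.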

Sponsored-by: Dartmouth College's Datalad project
Joey Hess 2022-02-23 14:20:31 -04:00
parent 649464619e
commit 67245ae00f
GPG key ID: DB12DB0FF05F8F38
5 changed files with 113 additions and 9 deletions

@ -0,0 +1,30 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2022-02-23T16:45:24Z"
content="""
I've now specified a format in [[internals/pointer_file]], which is
designed to allow detecting accidental appends.
And git-annex will now treat a pointer file that has been appended to as
not a pointer file any longer.
So, for example:

    joey@darkstar:/tmp/r>echo oops >> foo
    joey@darkstar:/tmp/r>cat foo
    /annex/objects/SHA256E-s14169--bdcf6188db530bc3af79c898208ce2a56df6197f59b3872b03613a248ac8faf4
    oops
    joey@darkstar:/tmp/r>git add foo
    joey@darkstar:/tmp/r>git diff --cached foo | tail -n 2
    -/annex/objects/SHA256E-s14169--bdcf6188db530bc3af79c898208ce2a56df6197f59b3872b03613a248ac8faf4
    +/annex/objects/SHA256E-s101--b7da3d6b0ad2f6a2a263e783e59efb60f2520f03bb36cea35a556a684b0d5c9d

Since the file is not a valid pointer file after being appended to,
git add does what it would do with any file, in this case adding the
content to the annex.

So at least it keeps the possibly large appended content out of git now.
I think that's the most important thing. Detecting and warning about
pointer files that are not valid due to appends should be easy from here.
"""]]