fully specify the pointer file format

This format is designed to detect accidental appends, while having some
room for future expansion.

Detect when an unlocked file whose content is not present has gotten some
other content appended to it, and avoid treating it as a pointer file, so
that appended content will not be checked into git, but will be annexed
like any other file.

Dropped the max size of a pointer file down to 32kb, it was around 80 kb,
but without any good reason and certianly there are no valid pointer files
anywhere that are larger than 8kb, because it's just been specified what it
means for a pointer file with additional data even looks like.

I assume 32kb will be good enough for anyone. ;-) Really though, it needs
to be some smallish number, because that much of a file in git gets read
into memory when eg, catting pointer files. And since we have no use cases
for the extra lines of a pointer file yet, except possibly to add
some human-visible explanation that it is a git-annex pointer file, 32k
seems as reasonable an arbitrary number as anything. Increasing it would be
possible, eg to 64k, as long as users of such jumbo pointer files didn't
mind upgrading all their git-annex installations to one that supports the
new larger size.

Sponsored-by: Dartmouth College's Datalad project
This commit is contained in:
Joey Hess 2022-02-23 14:20:31 -04:00
parent 649464619e
commit 67245ae00f
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
5 changed files with 113 additions and 9 deletions

View file

@ -295,17 +295,45 @@ unableToRestage mf = unwords
, "git update-index -q --refresh " ++ fromMaybe "<file>" mf
]
{- Parses a symlink target or a pointer file to a Key. -}
{- Parses a symlink target or a pointer file to a Key.
-
- Makes sure that the pointer file is valid, including not being longer
- than the maximum allowed size of a valid pointer file, and that any
- subsequent lines after the first contain the validPointerLineTag.
- If a valid pointer file gets some other data appended to it, it should
- never be considered valid, unless that data happened to itself be a
- valid pointer file.
-}
parseLinkTargetOrPointer :: S.ByteString -> Maybe Key
parseLinkTargetOrPointer = go . S8.takeWhile (not . lineend)
parseLinkTargetOrPointer b
| S.length b <= maxValidPointerSz =
let (firstline, rest) = S8.span (/= '\n') b
in case parsekey $ droptrailing '\r' firstline of
Just k | restvalid (dropleading '\n' rest) -> Just k
_ -> Nothing
| otherwise = Nothing
where
go l
parsekey l
| isLinkToAnnex l = fileKey $ snd $ S8.breakEnd pathsep l
| otherwise = Nothing
restvalid r
| S.null r = True
| otherwise =
let (l, r') = S8.span (/= '\n') r
in validPointerLineTag `S.isInfixOf` l
&& (not (S8.null r') && S8.head r' == '\n')
&& restvalid (S8.tail r')
dropleading c l
| S.null l = l
| S8.head l == c = S8.tail l
| otherwise = l
lineend '\n' = True
lineend '\r' = True
lineend _ = False
droptrailing c l
| S.null l = l
| S8.last l == c = S8.init l
| otherwise = l
pathsep '/' = True
#ifdef mingw32_HOST_OS
@ -332,9 +360,17 @@ formatPointer k = prefix <> keyFile k <> nl
-
- 8192 bytes is plenty for a pointer to a key. This adds some additional
- padding to allow for pointer files that have lines of additional data
- after the key. -}
- after the key.
-
- One additional byte is used to detect when a valid pointer file
- got something else appended to it.
-}
maxPointerSz :: Int
maxPointerSz = 81920
maxPointerSz = maxValidPointerSz + 1
{- Maximum size of a valid pointer files is 32kb. -}
maxValidPointerSz :: Int
maxValidPointerSz = 32768
maxSymlinkSz :: Int
maxSymlinkSz = 8192
@ -387,3 +423,7 @@ isLinkToAnnex s = p `S.isInfixOf` s
#ifdef mingw32_HOST_OS
p' = toInternalGitPath p
#endif
{- String that must appear on every line of a valid pointer file. -}
validPointerLineTag :: S.ByteString
validPointerLineTag = "/annex/"