verify git sha from ciddb is in git repository

Fix bug importing a tree from a remote after git-annex forget has been
used, that could result in the imported tree missing git blobs.

In the unlikely situation where the ciddb contains a git sha that
is not in the git repository, this makes it just re-download the file from
the remote. That should be no problem, since these are small files.

Sponsored-by: Dartmouth College's DANDI project
Joey Hess 2026-01-02 12:19:50 -04:00
commit 69e6c4d024
GPG key ID: DB12DB0FF05F8F38
3 changed files with 19 additions and 2 deletions

@@ -982,14 +982,22 @@ importKeys remote importtreeconfig importcontent thirdpartypopulated importablec
         ImportSubTree subdir _ ->
             getTopFilePath subdir </> fromImportLocation loc
-    getcidkey cidmap db cid = liftIO $
+    getcidkey cidmap db cid = do
         -- Avoiding querying the database when it's empty speeds up
         -- the initial import.
-        if CIDDb.databaseIsEmpty db
+        l <- liftIO $ if CIDDb.databaseIsEmpty db
             then getcidkeymap cidmap cid
             else CIDDb.getContentIdentifierKeys db rs cid >>= \case
                 [] -> getcidkeymap cidmap cid
                 l -> return l
+        filterM validcidkey l
+    -- Guard against a content identifier containing a git sha that is
+    -- not present in the repository. It's possible that it's not,
+    -- when git-annex forget is used.
+    validcidkey k = case keyGitSha k of
+        Just sha -> isJust <$> catObjectMetaData sha
+        Nothing -> return True
     getcidkeymap cidmap cid =
         atomically $ maybeToList . M.lookup cid <$> readTVar cidmap
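
The guard in the hunk above can be tried in isolation. The following is only a minimal sketch, not git-annex's code: `Sha`, `Key`, `ObjectStore`, `lookupObject`, `validKey` and `validKeys` are hypothetical stand-ins invented for illustration, with a `Data.Map` playing the part of the git repository and `lookupObject` playing the part of `catObjectMetaData`. The local `keyGitSha` only mimics the role of the function of the same name in the diff.

```haskell
import Control.Monad (filterM)
import Data.Maybe (isJust)
import qualified Data.Map.Strict as M

-- Hypothetical stand-ins for git-annex's real types; not its actual API.
newtype Sha = Sha String deriving (Eq, Ord, Show)

data Key
    = GitShaKey Sha   -- small file stored directly as a git blob
    | AnnexKey String -- ordinary annexed content
    deriving (Eq, Show)

-- Toy object store standing in for the git repository: sha -> object size.
type ObjectStore = M.Map Sha Int

-- Mimics the role of keyGitSha in the diff.
keyGitSha :: Key -> Maybe Sha
keyGitSha (GitShaKey sha) = Just sha
keyGitSha (AnnexKey _) = Nothing

-- Mimics the role of catObjectMetaData: Nothing when the object is absent.
lookupObject :: ObjectStore -> Sha -> IO (Maybe Int)
lookupObject store sha = return (M.lookup sha store)

-- Keep a key unless it names a git sha that the store does not have.
validKey :: ObjectStore -> Key -> IO Bool
validKey store k = case keyGitSha k of
    Just sha -> isJust <$> lookupObject store sha
    Nothing -> return True

-- Filter the keys recorded for one content identifier.
validKeys :: ObjectStore -> [Key] -> IO [Key]
validKeys store = filterM (validKey store)

main :: IO ()
main = do
    let store = M.fromList [(Sha "aaa", 12)]
        recorded =
            [ GitShaKey (Sha "aaa") -- present in the store, kept
            , GitShaKey (Sha "bbb") -- missing from the store, dropped
            , AnnexKey "some-annex-key" -- not a git sha, always kept
            ]
    kept <- validKeys store recorded
    print kept
```

When every recorded key is dropped the result is the empty list, which is the case the commit message describes as falling back to re-downloading the file from the remote.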

@@ -11,6 +11,8 @@ git-annex (10.20251216) UNRELEASED; urgency=medium
     on network-multicast or network-info.
   * stack.yaml: Update to lts-24.26.
   * import: Fix display of some import errors.
+  * Fix bug importing a tree from a remote after git-annex forget has been
+    used, that could result in the imported tree missing git blobs.
  -- Joey Hess <id@joeyh.name> Thu, 01 Jan 2026 12:20:29 -0400

@@ -0,0 +1,7 @@
+[[!comment format=mdwn
+ username="joey"
+ subject="""comment 8"""
+ date="2026-01-02T15:59:56Z"
+ content="""
+I've made it deal with the `git-annex forget` scenario now.
+"""]]