fix handling of linked worktrees on filesystems w/o symlinks

Fix bug in handling of linked worktrees on filesystems not supporting
symlinks, that caused annexed file content to be stored in the wrong
location inside the git directory, and also caused pointer files to not get
populated.

This parameterizes functions in Annex.Locations with a GitLocationMaker.
The uses of standardGitLocationMaker are in cases where the path returned
by a function should not change when in a linked worktree. For example,
gitAnnexLink uses standardGitLocationMaker because symlink targets should
always be to ".git/annex/objects" paths, even when in a linked worktree.
Hopefully I have gotten all uses of standardGitLocationMaker right.

This also assumes that all path construction to the annex directory
is done via the functions in Annex.Locations, and there is no other,
ad-hoc construction elsewhere. Thankfully, Annex.Locations has been around
since the beginning, and has been used consistently. I think.

---

In fixupUnusualRepos, when symlinks are supported, the .git file is replaced
with a symlink to the linked worktree git directory. And in that directory,
an "annex" symlink points to the main annex directory. In that case,
it's not necessary to set mainWorkTreePath. It would be ok to set it,
but not setting it in that case allows an optimisation of avoiding reading
the "commondir" file.

The change to make fixupUnusualRepos set mainWorkTreePath when the
repository is not initialized yet is done in case the initialization itself
writes to the annex directory. If that were the case, without setting
mainWorkTreePath, the annex symlink would not be set up yet, and so
it might have created the annex directory in the wrong place. Currently
that didn't happen, but now that mainWorkTreePath is available, using it
here avoids any such later problem.

---

This commit does not deal with the mess of a worktree that has
experienced this bug before. In particular, if `git-annex get` were
run in such a worktree, it would have stored the object files in the
linked worktree's git directory, rather than in the main git directory.
Such misplaced object files need to be dealt with; the plan is to make
git-annex fsck notice and fix them.

A worktree that has experienced this bug before will contain unpopulated
pointer files. Those may eventually get fixed up in regular usage of
git-annex, but git-annex fsck will also fix them up.

---

Finally, this has me pondering if all of git-annex's state files should
really be stored in one common place across all linked worktrees. Should
perhaps state files that are specific to the worktree be stored per-worktree?
That has not been the case when using git-annex on filesystems supporting
symlinks, but it *has* been the case on filesystems not supporting
symlinks. Perhaps this leads to some other buggy behavior in some cases.
Or perhaps to extra work being done.

For example, the keys database has an associated files table. Which depends
on the worktree. But reconcileStaged updates that table, so when git-annex
is used first in one worktree and then in another one, reconcileStaged will
update the table to reflect the current worktree. Which is extra work each
time a different worktree is used. But also, what if two git-annex
processes are running at the same time, in separate worktrees? Probably
this needs more thought and investigation.

So there is a risk that this commit exposes such buggy behavior in a
situation where it didn't happen before, due to the filesystem not
supporting symlinks. But, given how much this bug crippled using linked
worktrees in such a situation, I doubt that many people have been doing
that.
This commit is contained in:
Joey Hess 2025-07-14 13:20:39 -04:00
parent ed3d46a2de
commit 6a9e923c74
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
11 changed files with 140 additions and 68 deletions

View file

@ -69,9 +69,14 @@ fixupDirect r = r
- whether a repo is used as a submodule or not, and wheverever the
- submodule is mounted.
-
- git-worktree directories have a .git file.
- That needs to be converted to a symlink, and .git/annex made a symlink
- to the main repository's git-annex directory.
- git-worktree directories have a .git file which points to a different
- git directory than the main git directory. That needs to be converted to
- a symlink, and .git/annex made a symlink to the main repository's
- git-annex directory so that annex symlinks in the git repository point
- to the object files. When the filesystem does not support symlinks, the
- mainWorkTreePath of the repository is set, so that the git-annex
- directory of the main repository will still be used.
-
- The worktree shares git config with the main repository, so the same
- annex uuid and other configuration will be used in the worktree as in
- the main repository.
@ -85,11 +90,15 @@ fixupDirect r = r
- unlocked branches.
-
- Don't do any of this if the repo has not been initialized for git-annex
- use yet.
- use yet. Except, do set mainWorkTreePath.
-}
fixupUnusualRepos :: Repo -> GitConfig -> IO Repo
fixupUnusualRepos r@(Repo { location = l@(Local { worktree = Just w, gitdir = d }) }) c
| isNothing (annexVersion c) = return r
| isNothing (annexVersion c) =
ifM (needsGitLinkFixup r)
( setworktreepath r
, return r
)
| needsSubmoduleFixup r = do
when (coreSymlinks c) $
(replacedotgit >> unsetcoreworktree)
@ -100,11 +109,13 @@ fixupUnusualRepos r@(Repo { location = l@(Local { worktree = Just w, gitdir = d
}
| otherwise = ifM (needsGitLinkFixup r)
( do
when (coreSymlinks c) $
(replacedotgit >> worktreefixup)
`catchNonAsync` \e -> hPutStrLn stderr $
"warning: unable to convert .git file to symlink that will work with git-annex: " ++ show e
return r'
if coreSymlinks c
then do
(replacedotgit >> worktreefixup)
`catchNonAsync` \e -> hPutStrLn stderr $
"warning: unable to convert .git file to symlink that will work with git-annex: " ++ show e
setworktreepath r'
else setworktreepath r'
, return r
)
where
@ -118,20 +129,30 @@ fixupUnusualRepos r@(Repo { location = l@(Local { worktree = Just w, gitdir = d
-- Unsetting a config fails if it's not set, so ignore failure.
unsetcoreworktree = void $ Git.Config.unset "core.worktree" r
worktreefixup = do
-- git-worktree sets up a "commondir" file that contains
-- the path to the main git directory.
-- Using --separate-git-dir does not.
let commondirfile = fromOsPath (d </> literalOsPath "commondir")
catchDefaultIO Nothing (headMaybe . lines <$> readFile commondirfile) >>= \case
Just gd -> do
-- Make the worktree's git directory
-- contain an annex symlink to the main
-- repository's annex directory.
let linktarget = toOsPath gd </> literalOsPath "annex"
R.createSymbolicLink (fromOsPath linktarget) $
fromOsPath $ dotgit </> literalOsPath "annex"
Nothing -> return ()
-- git-worktree sets up a "commondir" file that contains
-- the path to the main git directory.
-- Using --separate-git-dir does not.
commondirfile = fromOsPath (d </> literalOsPath "commondir")
readcommondirfile = catchDefaultIO Nothing $
fmap toOsPath . headMaybe . lines
<$> readFile commondirfile
setworktreepath r'' = readcommondirfile >>= \case
Just gd -> return $ r''
{ mainWorkTreePath = Just gd
}
Nothing -> return r''
worktreefixup = readcommondirfile >>= \case
Just gd -> do
-- Make the worktree's git directory
-- contain an annex symlink to the main
-- repository's annex directory.
let linktarget = gd </> literalOsPath "annex"
R.createSymbolicLink (fromOsPath linktarget) $
fromOsPath $ dotgit </> literalOsPath "annex"
Nothing -> return ()
-- Repo adjusted, so that symlinks to objects that get checked
-- in will have the usual path, rather than pointing off to the

View file

@ -407,7 +407,8 @@ parseLinkTargetOrPointerLazy' b =
formatPointer :: Key -> S.ByteString
formatPointer k = fromOsPath prefix <> fromOsPath (keyFile k) <> nl
where
prefix = toInternalGitPath $ pathSeparator `OS.cons` objectDir
prefix = toInternalGitPath $
pathSeparator `OS.cons` objectDir standardGitLocationMaker
nl = S8.singleton '\n'
{- Maximum size of a file that could be a pointer to a key.
@ -475,7 +476,7 @@ isLinkToAnnex s = p `OS.isInfixOf` s'
#endif
where
s' = toOsPath s
p = pathSeparator `OS.cons` objectDir
p = pathSeparator `OS.cons` objectDir standardGitLocationMaker
#ifdef mingw32_HOST_OS
p' = toInternalGitPath p
#endif

View file

@ -1,6 +1,6 @@
{- git-annex file locations
-
- Copyright 2010-2024 Joey Hess <id@joeyh.name>
- Copyright 2010-2025 Joey Hess <id@joeyh.name>
-
- Licensed under the GNU AGPL version 3 or higher.
-}
@ -9,6 +9,9 @@
{-# LANGUAGE CPP #-}
module Annex.Locations (
GitLocationMaker(..),
standardGitLocationMaker,
repoGitLocationMaker,
keyFile,
fileKey,
keyPaths,
@ -136,6 +139,24 @@ import Git.FilePath
import Annex.DirHashes
import Annex.Fixup
{- When constructing a path that is usually relative to the
- .git directory, this can be used to relocate the path to
- elsewhere.
-
- This is used when in a linked git worktree, which has its own
- git directory, to make the git-annex directory be located in the
- git directory of the main worktree.
-}
newtype GitLocationMaker = GitLocationMaker (OsPath -> OsPath)
standardGitLocationMaker :: GitLocationMaker
standardGitLocationMaker = GitLocationMaker id
repoGitLocationMaker :: Git.Repo -> GitLocationMaker
repoGitLocationMaker r = case Git.mainWorkTreePath r of
Nothing -> standardGitLocationMaker
Just p -> GitLocationMaker (p </>)
{- Conventions:
-
- Functions ending in "Dir" should always return values ending with a
@ -151,13 +172,15 @@ import Annex.Fixup
{- The directory git annex uses for local state, relative to the .git
- directory -}
annexDir :: OsPath
annexDir = addTrailingPathSeparator (literalOsPath "annex")
annexDir :: GitLocationMaker -> OsPath
annexDir (GitLocationMaker glm) = addTrailingPathSeparator $
glm $ literalOsPath "annex"
{- The directory git annex uses for locally available object content,
- relative to the .git directory -}
objectDir :: OsPath
objectDir = addTrailingPathSeparator $ annexDir </> literalOsPath "objects"
objectDir :: GitLocationMaker -> OsPath
objectDir glm = addTrailingPathSeparator $
annexDir glm </> literalOsPath "objects"
{- Annexed file's possible locations relative to the .git directory
- in a non-bare repository.
@ -165,24 +188,26 @@ objectDir = addTrailingPathSeparator $ annexDir </> literalOsPath "objects"
- Normally it is hashDirMixed. However, it's always possible that a
- bare repository was converted to non-bare, or that the cripped
- filesystem setting changed, so still need to check both. -}
annexLocationsNonBare :: GitConfig -> Key -> [OsPath]
annexLocationsNonBare config key =
map (annexLocation config key) [hashDirMixed, hashDirLower]
annexLocationsNonBare :: GitLocationMaker -> GitConfig -> Key -> [OsPath]
annexLocationsNonBare glm config key =
map (annexLocation glm config key) [hashDirMixed, hashDirLower]
{- Annexed file's possible locations relative to a bare repository. -}
annexLocationsBare :: GitConfig -> Key -> [OsPath]
annexLocationsBare config key =
map (annexLocation config key) [hashDirLower, hashDirMixed]
annexLocationsBare :: GitLocationMaker -> GitConfig -> Key -> [OsPath]
annexLocationsBare glm config key =
map (annexLocation glm config key) [hashDirLower, hashDirMixed]
annexLocation :: GitConfig -> Key -> (HashLevels -> Hasher) -> OsPath
annexLocation config key hasher = objectDir </> keyPath key (hasher $ objectHashLevels config)
annexLocation :: GitLocationMaker -> GitConfig -> Key -> (HashLevels -> Hasher) -> OsPath
annexLocation glm config key hasher =
objectDir glm </> keyPath key (hasher $ objectHashLevels config)
{- For exportree remotes with annexobjects=true, objects are stored
- in this location as well as in the exported tree. -}
exportAnnexObjectLocation :: GitConfig -> Key -> ExportLocation
exportAnnexObjectLocation gc k =
mkExportLocation $
literalOsPath ".git" </> annexLocation gc k hashDirLower
literalOsPath ".git"
</> annexLocation standardGitLocationMaker gc k hashDirLower
{- Number of subdirectories from the gitAnnexObjectDir
- to the gitAnnexLocation. -}
@ -203,14 +228,17 @@ gitAnnexLocation :: Key -> Git.Repo -> GitConfig -> IO OsPath
gitAnnexLocation = gitAnnexLocation' doesPathExist
gitAnnexLocation' :: (OsPath -> IO Bool) -> Key -> Git.Repo -> GitConfig -> IO OsPath
gitAnnexLocation' checker key r config = gitAnnexLocation'' key r config
(annexCrippledFileSystem config)
(coreSymlinks config)
checker
(Git.localGitDir r)
gitAnnexLocation' checker key r config =
gitAnnexLocation'' key glm r config
(annexCrippledFileSystem config)
(coreSymlinks config)
checker
(Git.localGitDir r)
where
glm = repoGitLocationMaker r
gitAnnexLocation'' :: Key -> Git.Repo -> GitConfig -> Bool -> Bool -> (OsPath -> IO Bool) -> OsPath -> IO OsPath
gitAnnexLocation'' key r config crippled symlinkssupported checker gitdir
gitAnnexLocation'' :: Key -> GitLocationMaker -> Git.Repo -> GitConfig -> Bool -> Bool -> (OsPath -> IO Bool) -> OsPath -> IO OsPath
gitAnnexLocation'' key glm r config crippled symlinkssupported checker gitdir
{- Bare repositories default to hashDirLower for new
- content, as it's more portable. But check all locations. -}
| Git.repoIsLocalBare r = checkall annexLocationsBare
@ -225,8 +253,8 @@ gitAnnexLocation'' key r config crippled symlinkssupported checker gitdir
else checkall annexLocationsBare
| otherwise = checkall annexLocationsNonBare
where
only = return . inrepo . annexLocation config key
checkall f = check $ map inrepo $ f config key
only = return . inrepo . annexLocation glm config key
checkall f = check $ map inrepo $ f glm config key
inrepo d = gitdir </> d
check locs@(l:_) = fromMaybe l <$> firstM checker locs
@ -238,7 +266,7 @@ gitAnnexLink file key r config = do
currdir <- getCurrentDirectory
let absfile = absNormPathUnix currdir file
let gitdir = getgitdir currdir
loc <- gitAnnexLocation'' key r config False False (\_ -> return True) gitdir
loc <- gitAnnexLocation'' key standardGitLocationMaker r config False False (\_ -> return True) gitdir
toInternalGitPath <$> relPathDirToFile (parentDir absfile) loc
where
getgitdir currdir
@ -299,16 +327,22 @@ gitAnnexInodeSentinal :: Git.Repo -> OsPath
gitAnnexInodeSentinal r = gitAnnexDir r </> literalOsPath "sentinal"
gitAnnexInodeSentinalCache :: Git.Repo -> OsPath
gitAnnexInodeSentinalCache r = gitAnnexInodeSentinal r <> literalOsPath ".cache"
gitAnnexInodeSentinalCache r =
gitAnnexInodeSentinal r <> literalOsPath ".cache"
{- The annex directory of a repository. -}
gitAnnexDir :: Git.Repo -> OsPath
gitAnnexDir r = addTrailingPathSeparator $ Git.localGitDir r </> annexDir
gitAnnexDir r = addTrailingPathSeparator $
Git.localGitDir r </> annexDir glm
where
glm = repoGitLocationMaker r
{- The part of the annex directory where file contents are stored. -}
gitAnnexObjectDir :: Git.Repo -> OsPath
gitAnnexObjectDir r = addTrailingPathSeparator $
Git.localGitDir r </> objectDir
Git.localGitDir r </> objectDir glm
where
glm = repoGitLocationMaker r
{- .git/annex/tmp/ is used for temp files for key's contents -}
gitAnnexTmpObjectDir :: Git.Repo -> OsPath
@ -337,7 +371,8 @@ gitAnnexTmpWatcherDir r = addTrailingPathSeparator $
{- The temp file to use for a given key's content. -}
gitAnnexTmpObjectLocation :: Key -> Git.Repo -> OsPath
gitAnnexTmpObjectLocation key r = gitAnnexTmpObjectDir r </> keyFile key
gitAnnexTmpObjectLocation key r =
gitAnnexTmpObjectDir r </> keyFile key
{- Given a temp file such as gitAnnexTmpObjectLocation, makes a name for a
- subdirectory in the same location, that can be used as a work area
@ -373,13 +408,12 @@ gitAnnexKeysDbDir r c =
{- Lock file for the keys database. -}
gitAnnexKeysDbLock :: Git.Repo -> GitConfig -> OsPath
gitAnnexKeysDbLock r c = gitAnnexKeysDbDir r c <> literalOsPath ".lck"
gitAnnexKeysDbLock r c = gitAnnexKeysDbDir r c <> literalOsPath ".lck"
{- Contains the stat of the last index file that was
- reconciled with the keys database. -}
gitAnnexKeysDbIndexCache :: Git.Repo -> GitConfig -> OsPath
gitAnnexKeysDbIndexCache r c =
gitAnnexKeysDbDir r c <> literalOsPath ".cache"
gitAnnexKeysDbIndexCache r c = gitAnnexKeysDbDir r c <> literalOsPath ".cache"
{- .git/annex/fsck/uuid/ is used to store information about incremental
- fscks. -}
@ -392,19 +426,23 @@ gitAnnexFsckDir u r mc = case annexDbDir =<< mc of
{- used to store information about incremental fscks. -}
gitAnnexFsckState :: UUID -> Git.Repo -> OsPath
gitAnnexFsckState u r = gitAnnexFsckDir u r Nothing </> literalOsPath "state"
gitAnnexFsckState u r =
gitAnnexFsckDir u r Nothing </> literalOsPath "state"
{- Directory containing database used to record fsck info. -}
gitAnnexFsckDbDir :: UUID -> Git.Repo -> GitConfig -> OsPath
gitAnnexFsckDbDir u r c = gitAnnexFsckDir u r (Just c) </> literalOsPath "fsckdb"
gitAnnexFsckDbDir u r c =
gitAnnexFsckDir u r (Just c) </> literalOsPath "fsckdb"
{- Directory containing old database used to record fsck info. -}
gitAnnexFsckDbDirOld :: UUID -> Git.Repo -> GitConfig -> OsPath
gitAnnexFsckDbDirOld u r c = gitAnnexFsckDir u r (Just c) </> literalOsPath "db"
gitAnnexFsckDbDirOld u r c =
gitAnnexFsckDir u r (Just c) </> literalOsPath "db"
{- Lock file for the fsck database. -}
gitAnnexFsckDbLock :: UUID -> Git.Repo -> GitConfig -> OsPath
gitAnnexFsckDbLock u r c = gitAnnexFsckDir u r (Just c) </> literalOsPath "fsck.lck"
gitAnnexFsckDbLock u r c =
gitAnnexFsckDir u r (Just c) </> literalOsPath "fsck.lck"
{- .git/annex/fsckresults/uuid is used to store results of git fscks -}
gitAnnexFsckResultsLog :: UUID -> Git.Repo -> OsPath

View file

@ -133,7 +133,7 @@ repairStaleGitLocks r = do
repairStaleLocks lockfiles
return $ not $ null lockfiles
where
findgitfiles = dirContentsRecursiveSkipping (== dropTrailingPathSeparator annexDir) True . Git.localGitDir
findgitfiles = dirContentsRecursiveSkipping (== dropTrailingPathSeparator (annexDir standardGitLocationMaker)) True . Git.localGitDir
islock f
| literalOsPath "gc.pid" `OS.isInfixOf` f = False
| literalOsPath ".lock" `OS.isSuffixOf` f = True

View file

@ -2,6 +2,10 @@ git-annex (10.20250631) UNRELEASED; urgency=medium
* p2phttp: Scan multilevel directories with --directory.
* p2phttp: Added --socket option.
* Fix bug in handling of linked worktrees on filesystems not supporting
symlinks, that caused annexed file content to be stored in the wrong
location inside the git directory, and also caused pointer files to not
get populated.
-- Joey Hess <id@joeyh.name> Mon, 07 Jul 2025 15:59:42 -0400

View file

@ -1027,9 +1027,10 @@ keyExportLocations :: Remote -> Key -> GitConfig -> UUID -> Maybe [ExportLocatio
keyExportLocations rmt k cfg uuid
| exportTree (Remote.config rmt) || importTree (Remote.config rmt) =
Just $ map (\p -> mkExportLocation (literalOsPath ".git" </> p)) $
concatMap (`annexLocationsBare` k) cfgs
concatMap (`mkloc` k) cfgs
| otherwise = Nothing
where
mkloc = annexLocationsBare standardGitLocationMaker
-- When git-annex has not been initialized yet (eg, when cloning),
-- the Differences are unknown, so make a version of the GitConfig
-- with and without the OneLevelObjectHash difference.

View file

@ -365,7 +365,7 @@ reconcileStaged dbisnew qh = ifM notneeded
-- prefilter so that's ok.
, Param $ "-G" ++
fromOsPath (toInternalGitPath $
pathSeparator `OS.cons` objectDir)
pathSeparator `OS.cons` objectDir standardGitLocationMaker)
-- Disable rename detection.
, Param "--no-renames"
-- Avoid other complications.

View file

@ -304,5 +304,6 @@ newFrom l = Repo
, gitGlobalOpts = []
, gitDirSpecifiedExplicitly = False
, repoPathSpecifiedExplicitly = False
, mainWorkTreePath = Nothing
}

View file

@ -62,6 +62,9 @@ data Repo = Repo
-- -c safe.directory=* and -c safe.bareRepository=all
-- when using this repository.
, repoPathSpecifiedExplicitly :: Bool
-- When a Repo is a linked git worktree, this is the path
-- from its gitdir to the git directory of the main worktree.
, mainWorkTreePath :: Maybe OsPath
} deriving (Show, Eq, Ord)
type RepoConfig = M.Map ConfigKey ConfigValue

View file

@ -303,7 +303,7 @@ setupRepo gcryptid r
- which is needed for rsync of objects to it to work.
-}
rsyncsetup = Remote.Rsync.withRsyncScratchDir $ \tmp -> do
createAnnexDirectory (tmp </> objectDir)
createAnnexDirectory (tmp </> objectDir standardGitLocationMaker)
dummycfg <- liftIO dummyRemoteGitConfig
let (rsynctransport, rsyncurl, _) = rsyncTransport r dummycfg
let tmpconfig = fromOsPath $ tmp </> literalOsPath "config"
@ -466,7 +466,8 @@ checkKey' repo r rsyncopts accessmethod k
checkshell = Ssh.inAnnex repo k
gCryptTopDir :: Git.Repo -> OsPath
gCryptTopDir repo = toOsPath (Git.repoLocation repo) </> objectDir
gCryptTopDir repo =
toOsPath (Git.repoLocation repo) </> objectDir standardGitLocationMaker
{- Annexed objects are hashed using lower-case directories for max
- portability. -}

View file

@ -467,14 +467,16 @@ keyUrls gc repo r key = map tourl locs'
-- If the remote is known to not be bare, try the hash locations
-- used for non-bare repos first, as an optimisation.
locs
| remoteAnnexBare remoteconfig == Just False = annexLocationsNonBare gc key
| otherwise = annexLocationsBare gc key
| remoteAnnexBare remoteconfig == Just False =
annexLocationsNonBare glm gc key
| otherwise = annexLocationsBare glm gc key
#ifndef mingw32_HOST_OS
locs' = map fromOsPath locs
#else
locs' = map (replace "\\" "/" . fromOsPath) locs
#endif
remoteconfig = gitconfig r
glm = repoGitLocationMaker repo
dropKey :: Remote -> State -> Maybe SafeDropProof -> Key -> Annex ()
dropKey r st proof key = do