back out incorrect IO interleaving change

Fix regression in last release that crashes when using --all or running
git-annex in a bare repository. May have also affected git-annex unused and
git-annex info.

Reversed the order of the (++) in Annex.Branch.files so --all will stream
lazily still when there are not a bunch of uncommitted journal files.
Added a todo to maybe improve this later.

This commit was sponsored by Trenton Cronholm on Patreon.
This commit is contained in:
Joey Hess 2018-05-08 13:54:42 -04:00
parent 348c6d56f9
commit d1961e4498
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
6 changed files with 65 additions and 15 deletions

View file

@ -34,7 +34,6 @@ import qualified Data.Map as M
import Data.Function import Data.Function
import Data.Char import Data.Char
import Control.Concurrent (threadDelay) import Control.Concurrent (threadDelay)
import System.IO.Unsafe (unsafeInterleaveIO)
import Annex.Common import Annex.Common
import Annex.BranchState import Annex.BranchState
@ -335,23 +334,16 @@ commitIndex' jl branchref message basemessage retrynum parents = do
commitIndex' jl committedref racemessage basemessage retrynum' [committedref] commitIndex' jl committedref racemessage basemessage retrynum' [committedref]
{- Lists all files on the branch. including ones in the journal {- Lists all files on the branch. including ones in the journal
- that have not been committed yet. There may be duplicates in the list. - that have not been committed yet. There may be duplicates in the list. -}
- Streams lazily. -}
files :: Annex [FilePath] files :: Annex [FilePath]
files = do files = do
update update
withIndex $ do -- ++ forces the content of the first list to be buffered in memory,
g <- gitRepo -- so use getJournalledFilesStale which should be much smaller most
withJournalHandle (go g) -- of the time. branchFiles will stream as the list is consumed.
where (++)
go g jh = readDirectory jh >>= \case <$> getJournalledFilesStale
Nothing -> branchFiles' g <*> branchFiles
Just file
| dirCruft file -> go g jh
| otherwise -> do
let branchfile = fileJournal file
rest <- unsafeInterleaveIO (go g jh)
return (branchfile:rest)
{- Files in the branch, not including any from journalled changes, {- Files in the branch, not including any from journalled changes,
- and without updating the branch. -} - and without updating the branch. -}

View file

@ -55,6 +55,16 @@ getJournalFileStale :: FilePath -> Annex (Maybe String)
getJournalFileStale file = inRepo $ \g -> catchMaybeIO $ getJournalFileStale file = inRepo $ \g -> catchMaybeIO $
readFileStrict $ journalFile file g readFileStrict $ journalFile file g
{- List of existing journal files, but without locking, may miss new ones
- just being added, or may have false positives if the journal is staged
- as it is run. -}
getJournalledFilesStale :: Annex [FilePath]
getJournalledFilesStale = do
g <- gitRepo
fs <- liftIO $ catchDefaultIO [] $
getDirectoryContents $ gitAnnexJournalDir g
return $ filter (`notElem` [".", ".."]) $ map fileJournal fs
withJournalHandle :: (DirectoryHandle -> IO a) -> Annex a withJournalHandle :: (DirectoryHandle -> IO a) -> Annex a
withJournalHandle a = do withJournalHandle a = do
d <- fromRepo gitAnnexJournalDir d <- fromRepo gitAnnexJournalDir

View file

@ -1,3 +1,11 @@
git-annex (6.20180428) UNRELEASED; urgency=medium
* Fix regression in last release that crashes when using
--all or running git-annex in a bare repository. May have also
affected git-annex unused and git-annex info.
-- Joey Hess <id@joeyh.name> Tue, 08 May 2018 13:51:37 -0400
git-annex (6.20180427) upstream; urgency=medium git-annex (6.20180427) upstream; urgency=medium
* move: Now takes numcopies configuration, and required content * move: Now takes numcopies configuration, and required content

View file

@ -101,3 +101,5 @@ error: git-annex died of signal 11
which also resolves with downgrade of git-annex. which also resolves with downgrade of git-annex.
[[!meta author=yoh]] [[!meta author=yoh]]
> [[fixed|done]] --[[Joey]]

View file

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2018-05-08T17:09:35Z"
content="""
Neither git rev 6316072f0 nor 86b18966f is present in my git-annex git repo,
which makes it kind of hard to look at the diff between whatever versions
of git-annex you're running. Are you building out of a patched git
repository?
My guess is [[!commit bea0ad220a68138dc0a43d0c86bb2353ecf92d2c]]
since it involves both directory reading and unsafeInterleaveIO.
Indeed, it looks like that could defer a readDirStream due to
unsafeInterleaveIO to outside the open/close bracket. And I suppose reading
from a closed file handle might conceivably segfault.
Looks like that would affect bare repositories, or using --all.
I don't think it would affect those commands in a non-bare repository.
Ah, I see youre whereis is indeed in a bare repository, and your drop
used --all, so my analysis seems right.
Fixed by backing out the problematic part of the commit.
"""]]

View file

@ -0,0 +1,14 @@
Using --all, or running in a bare repo, as well as
`git annex unused` and `git annex info` all end up buffering the list of
all keys that have uncommitted journalled changes in memory.
This is due to Annex.Branch.files's call to getJournalledFilesStale which
reads all the files in the directory into a buffer.
Note that the list of keys in the branch *does* stream in, so this
is only really a problem when using annex.alwayscommit=false to build
up big git-annex branch commits via the journal.
An attempt at making it stream via unsafeInterleaveIO failed miserably
and that is not the right approach. This would be a good place to use
ResourceT, but it might need some changes to the Annex monad to allow
combining the two. --[[Joey]]