back out incorrect IO interleaving change
Fix regression in last release that crashes when using --all or running git-annex in a bare repository. May have also affected git-annex unused and git-annex info. Reversed the order of the (++) in Annex.Branch.files so --all will stream lazily still when there are not a bunch of uncommitted journal files. Added a todo to maybe improve this later. This commit was sponsored by Trenton Cronholm on Patreon.
This commit is contained in:
parent
348c6d56f9
commit
d1961e4498
6 changed files with 65 additions and 15 deletions
|
@ -34,7 +34,6 @@ import qualified Data.Map as M
|
||||||
import Data.Function
|
import Data.Function
|
||||||
import Data.Char
|
import Data.Char
|
||||||
import Control.Concurrent (threadDelay)
|
import Control.Concurrent (threadDelay)
|
||||||
import System.IO.Unsafe (unsafeInterleaveIO)
|
|
||||||
|
|
||||||
import Annex.Common
|
import Annex.Common
|
||||||
import Annex.BranchState
|
import Annex.BranchState
|
||||||
|
@ -335,23 +334,16 @@ commitIndex' jl branchref message basemessage retrynum parents = do
|
||||||
commitIndex' jl committedref racemessage basemessage retrynum' [committedref]
|
commitIndex' jl committedref racemessage basemessage retrynum' [committedref]
|
||||||
|
|
||||||
{- Lists all files on the branch. including ones in the journal
|
{- Lists all files on the branch. including ones in the journal
|
||||||
- that have not been committed yet. There may be duplicates in the list.
|
- that have not been committed yet. There may be duplicates in the list. -}
|
||||||
- Streams lazily. -}
|
|
||||||
files :: Annex [FilePath]
|
files :: Annex [FilePath]
|
||||||
files = do
|
files = do
|
||||||
update
|
update
|
||||||
withIndex $ do
|
-- ++ forces the content of the first list to be buffered in memory,
|
||||||
g <- gitRepo
|
-- so use getJournalledFilesStale which should be much smaller most
|
||||||
withJournalHandle (go g)
|
-- of the time. branchFiles will stream as the list is consumed.
|
||||||
where
|
(++)
|
||||||
go g jh = readDirectory jh >>= \case
|
<$> getJournalledFilesStale
|
||||||
Nothing -> branchFiles' g
|
<*> branchFiles
|
||||||
Just file
|
|
||||||
| dirCruft file -> go g jh
|
|
||||||
| otherwise -> do
|
|
||||||
let branchfile = fileJournal file
|
|
||||||
rest <- unsafeInterleaveIO (go g jh)
|
|
||||||
return (branchfile:rest)
|
|
||||||
|
|
||||||
{- Files in the branch, not including any from journalled changes,
|
{- Files in the branch, not including any from journalled changes,
|
||||||
- and without updating the branch. -}
|
- and without updating the branch. -}
|
||||||
|
|
|
@ -55,6 +55,16 @@ getJournalFileStale :: FilePath -> Annex (Maybe String)
|
||||||
getJournalFileStale file = inRepo $ \g -> catchMaybeIO $
|
getJournalFileStale file = inRepo $ \g -> catchMaybeIO $
|
||||||
readFileStrict $ journalFile file g
|
readFileStrict $ journalFile file g
|
||||||
|
|
||||||
|
{- List of existing journal files, but without locking, may miss new ones
|
||||||
|
- just being added, or may have false positives if the journal is staged
|
||||||
|
- as it is run. -}
|
||||||
|
getJournalledFilesStale :: Annex [FilePath]
|
||||||
|
getJournalledFilesStale = do
|
||||||
|
g <- gitRepo
|
||||||
|
fs <- liftIO $ catchDefaultIO [] $
|
||||||
|
getDirectoryContents $ gitAnnexJournalDir g
|
||||||
|
return $ filter (`notElem` [".", ".."]) $ map fileJournal fs
|
||||||
|
|
||||||
withJournalHandle :: (DirectoryHandle -> IO a) -> Annex a
|
withJournalHandle :: (DirectoryHandle -> IO a) -> Annex a
|
||||||
withJournalHandle a = do
|
withJournalHandle a = do
|
||||||
d <- fromRepo gitAnnexJournalDir
|
d <- fromRepo gitAnnexJournalDir
|
||||||
|
|
|
@ -1,3 +1,11 @@
|
||||||
|
git-annex (6.20180428) UNRELEASED; urgency=medium
|
||||||
|
|
||||||
|
* Fix regression in last release that crashes when using
|
||||||
|
--all or running git-annex in a bare repository. May have also
|
||||||
|
affected git-annex unused and git-annex info.
|
||||||
|
|
||||||
|
-- Joey Hess <id@joeyh.name> Tue, 08 May 2018 13:51:37 -0400
|
||||||
|
|
||||||
git-annex (6.20180427) upstream; urgency=medium
|
git-annex (6.20180427) upstream; urgency=medium
|
||||||
|
|
||||||
* move: Now takes numcopies configuration, and required content
|
* move: Now takes numcopies configuration, and required content
|
||||||
|
|
|
@ -101,3 +101,5 @@ error: git-annex died of signal 11
|
||||||
which also resolves with downgrade of git-annex.
|
which also resolves with downgrade of git-annex.
|
||||||
|
|
||||||
[[!meta author=yoh]]
|
[[!meta author=yoh]]
|
||||||
|
|
||||||
|
> [[fixed|done]] --[[Joey]]
|
||||||
|
|
|
@ -0,0 +1,24 @@
|
||||||
|
[[!comment format=mdwn
|
||||||
|
username="joey"
|
||||||
|
subject="""comment 1"""
|
||||||
|
date="2018-05-08T17:09:35Z"
|
||||||
|
content="""
|
||||||
|
Neither git rev 6316072f0 nor 86b18966f is present in my git-annex git repo,
|
||||||
|
which makes it kind of hard to look at the diff between whatever versions
|
||||||
|
of git-annex you're running. Are you building out of a patched git
|
||||||
|
repository?
|
||||||
|
|
||||||
|
My guess is [[!commit bea0ad220a68138dc0a43d0c86bb2353ecf92d2c]]
|
||||||
|
since it involves both directory reading and unsafeInterleaveIO.
|
||||||
|
|
||||||
|
Indeed, it looks like that could defer a readDirStream due to
|
||||||
|
unsafeInterleaveIO to outside the open/close bracket. And I suppose reading
|
||||||
|
from a closed file handle might conceivably segfault.
|
||||||
|
|
||||||
|
Looks like that would affect bare repositories, or using --all.
|
||||||
|
I don't think it would affect those commands in a non-bare repository.
|
||||||
|
Ah, I see youre whereis is indeed in a bare repository, and your drop
|
||||||
|
used --all, so my analysis seems right.
|
||||||
|
|
||||||
|
Fixed by backing out the problematic part of the commit.
|
||||||
|
"""]]
|
14
doc/todo/improve_memory_usage_of_--all.mdwn
Normal file
14
doc/todo/improve_memory_usage_of_--all.mdwn
Normal file
|
@ -0,0 +1,14 @@
|
||||||
|
Using --all, or running in a bare repo, as well as
|
||||||
|
`git annex unused` and `git annex info` all end up buffering the list of
|
||||||
|
all keys that have uncommitted journalled changes in memory.
|
||||||
|
This is due to Annex.Branch.files's call to getJournalledFilesStale which
|
||||||
|
reads all the files in the directory into a buffer.
|
||||||
|
|
||||||
|
Note that the list of keys in the branch *does* stream in, so this
|
||||||
|
is only really a problem when using annex.alwayscommit=false to build
|
||||||
|
up big git-annex branch commits via the journal.
|
||||||
|
|
||||||
|
An attempt at making it stream via unsafeInterleaveIO failed miserably
|
||||||
|
and that is not the right approach. This would be a good place to use
|
||||||
|
ResourceT, but it might need some changes to the Annex monad to allow
|
||||||
|
combining the two. --[[Joey]]
|
Loading…
Reference in a new issue