fix fileJournal

My ByteString rewrite oversimplified it, resulting in any _ in a journal
file turning into a / in the git-annex branch. That was often the wrong
filename, and sometimes (when __ became //) an invalid filename that git
refused to add.
Joey Hess 2019-12-18 11:29:34 -04:00
parent 8ed171c69f
commit 3d38ec9585
GPG key ID: DB12DB0FF05F8F38
2 changed files with 21 additions and 26 deletions

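The failure the commit message describes can be reproduced with a small String-based model of the pre-fix mangling. This is a sketch, not git-annex code: the names `buggyMangle`, `buggyUnmangle` and `buggyRoundTrip` are hypothetical, and it assumes `/` is the only path separator (POSIX), whereas the real code asks `P.isPathSeparator` and `P.pathSeparator`.

```haskell
-- Hypothetical String-based model of the pre-fix ByteString code.
-- Assumption: '/' is the only path separator (POSIX).

-- Pre-fix journalFile mangling: a one-to-one character map that turns
-- every path separator into '_' and leaves everything else alone.
buggyMangle :: String -> String
buggyMangle = map (\c -> if c == '/' then '_' else c)

-- Pre-fix fileJournal: turns every '_' back into '/', including
-- underscores that were already part of the branch filename.
buggyUnmangle :: String -> String
buggyUnmangle = map (\c -> if c == '_' then '/' else c)

-- Branch filename -> journal filename -> branch filename.
buggyRoundTrip :: String -> String
buggyRoundTrip = buggyUnmangle . buggyMangle
```

With made-up filenames, `buggyRoundTrip "a/my_file.log"` gives `"a/my/file.log"` (the wrong filename), and `buggyRoundTrip "a/x__y.log"` gives `"a/x//y.log"`, the kind of `//` path the commit message says git refused to add.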

@@ -100,29 +100,38 @@ journalDirty = do
         `catchIO` (const $ doesDirectoryExist d)
 {- Produces a filename to use in the journal for a file on the branch.
  -
- - The input filename is assumed to not contain any '_' character,
- - since path separators are replaced with that.
  -
  - The journal typically won't have a lot of files in it, so the hashing
  - used in the branch is not necessary, and all the files are put directly
  - in the journal directory.
  -}
 journalFile :: RawFilePath -> Git.Repo -> RawFilePath
-journalFile file repo = gitAnnexJournalDir' repo P.</> S.map mangle file
+journalFile file repo = gitAnnexJournalDir' repo P.</> S.concatMap mangle file
   where
     mangle c
-      | P.isPathSeparator c = fromIntegral (ord '_')
-      | otherwise = c
+      | P.isPathSeparator c = S.singleton underscore
+      | c == underscore = S.pack [underscore, underscore]
+      | otherwise = S.singleton c
+    underscore = fromIntegral (ord '_')
 {- Converts a journal file (relative to the journal dir) back to the
  - filename on the branch. -}
 fileJournal :: RawFilePath -> RawFilePath
-fileJournal = S.map unmangle
+fileJournal = go
   where
-    unmangle c
-      | c == fromIntegral (ord '_') = P.pathSeparator
-      | otherwise = c
+    go b =
+      let (h, t) = S.break (== underscore) b
+      in h <> case S.uncons t of
+        Nothing -> t
+        Just (_u, t') -> case S.uncons t' of
+          Nothing -> t'
+          Just (w, t'')
+            | w == underscore ->
+              S.cons underscore (go t'')
+            | otherwise ->
+              S.cons P.pathSeparator (go t')
+    underscore = fromIntegral (ord '_')
 {- Sentinal value, only produced by lockJournal; required
  - as a parameter by things that need to ensure the journal is

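The key change in the Journal hunk is from `S.map`, which can only swap one byte for one byte, to `S.concatMap`, which lets `_` expand to `__` so it can be told apart from an encoded separator. The sketch below models the fixed scheme on `String`. It is an illustration under assumptions, not git-annex code: `toJournalName` and `fromJournalName` are hypothetical names, and `/` is assumed to be the only path separator.

```haskell
-- Hypothetical String-based model of the fixed mangling.
-- Assumption: '/' is the only path separator (POSIX).

-- Journal name for a branch filename: '/' -> "_", '_' -> "__",
-- every other character unchanged.
toJournalName :: String -> String
toJournalName = concatMap esc
  where
    esc '/' = "_"
    esc '_' = "__"
    esc c   = [c]

-- Branch filename for a journal name, following the same steps as the
-- new fileJournal: copy up to the next '_', then read "__" as '_' and a
-- lone '_' as '/'. A lone '_' at the very end is dropped, as in the diff.
fromJournalName :: String -> String
fromJournalName s =
  let (h, t) = break (== '_') s
  in h ++ case t of
       []               -> []
       [_]              -> []
       (_ : '_' : rest) -> '_' : fromJournalName rest
       (_ : rest)       -> '/' : fromJournalName rest
```

With made-up names, `toJournalName "a/my_file.log"` is `"a_my__file.log"`, and `fromJournalName` turns it back into `"a/my_file.log"`. The model also shows that the encoding is not injective: `a/_b` and `a_/b` both become `a___b`, which decodes to `a_/b`. So the round trip only holds when no `/` is immediately followed by `_` or another `/` and the name does not end in `/`. The diff's `fileJournal` pairs underscores the same way, so the same limit appears to apply to it; whether such names can occur on the git-annex branch is not something the source addresses.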

@@ -11,24 +11,10 @@ than find so the improvement is not as large.
 The `bs` branch is in a mergeable state now, but still needs work:
-* There's a bug impacting WORM keys with / in the keyname.
-  The files stored in the git-annex branch used to have the `/` changed
-  to `_`, but on the bs branch that does not happen. git also outputs
-  a message about "Ignoring" the file.
-  Test case:
-      git config annex.backend WORM
-      git annex addurl http://localhost/~joey/index.html
-  Hmm, that prints out the Ignoring message, and the file does not get
-  written to the git-annex branch. But in my big repo, I saw the message
-  and saw a file in the branch, with `/` in its keyname. Earlier in the
-  branch, the same key used `_`. (Look for "36bfe385607b32c4d5150404c0" to
-  find it again.)
 * Profile various commands and look for hot spots.
 * ByteString.Char8.putStrLn may truncate?
 * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
   decodeBS conversions. Or at least most of them. There are likely
   some places where a value is converted back and forth several times.