fix fileJournal

My ByteString rewrite oversimplified it, resulting in any _ in a journal
file turning into a / in the git-annex branch, which was often the wrong
filename, or sometimes (//) an invalid filename that git
refused to add.
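
To illustrate the failure described above, here is a minimal sketch, not code from the repository. The names `naiveMangle`, `naiveUnmangle`, and `roundTrips` are hypothetical stand-ins for the pre-fix behavior, and `/` is assumed to be the only path separator.

```haskell
import qualified Data.ByteString.Char8 as B

-- Hypothetical stand-in for the pre-fix journalFile mapping:
-- every path separator becomes '_', everything else is kept.
naiveMangle :: B.ByteString -> B.ByteString
naiveMangle = B.map (\c -> if c == '/' then '_' else c)

-- Hypothetical stand-in for the pre-fix fileJournal mapping:
-- every '_' becomes a path separator, including ones that were
-- part of the original filename.
naiveUnmangle :: B.ByteString -> B.ByteString
naiveUnmangle = B.map (\c -> if c == '_' then '/' else c)

-- True when a branch filename survives the journal round trip.
roundTrips :: B.ByteString -> Bool
roundTrips f = naiveUnmangle (naiveMangle f) == f
```

With these definitions, `roundTrips (B.pack "aaa/bbb/key.log")` holds, but `"aaa/my_key.log"` comes back as `"aaa/my/key.log"`, and a journal name containing `__` is turned into a name containing `//`.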
Joey Hess 2019-12-18 11:29:34 -04:00
parent 8ed171c69f
commit 3d38ec9585
2 changed files with 21 additions and 26 deletions

@@ -100,29 +100,38 @@ journalDirty = do
         `catchIO` (const $ doesDirectoryExist d)
 
 {- Produces a filename to use in the journal for a file on the branch.
- -
- - The input filename is assumed to not contain any '_' character,
- - since path separators are replaced with that.
  -
  - The journal typically won't have a lot of files in it, so the hashing
  - used in the branch is not necessary, and all the files are put directly
  - in the journal directory.
  -}
 journalFile :: RawFilePath -> Git.Repo -> RawFilePath
-journalFile file repo = gitAnnexJournalDir' repo P.</> S.map mangle file
+journalFile file repo = gitAnnexJournalDir' repo P.</> S.concatMap mangle file
   where
     mangle c
-        | P.isPathSeparator c = fromIntegral (ord '_')
-        | otherwise = c
+        | P.isPathSeparator c = S.singleton underscore
+        | c == underscore = S.pack [underscore, underscore]
+        | otherwise = S.singleton c
+    underscore = fromIntegral (ord '_')
 
 {- Converts a journal file (relative to the journal dir) back to the
  - filename on the branch. -}
 fileJournal :: RawFilePath -> RawFilePath
-fileJournal = S.map unmangle
+fileJournal = go
   where
-    unmangle c
-        | c == fromIntegral (ord '_') = P.pathSeparator
-        | otherwise = c
+    go b =
+        let (h, t) = S.break (== underscore) b
+        in h <> case S.uncons t of
+            Nothing -> t
+            Just (_u, t') -> case S.uncons t' of
+                Nothing -> t'
+                Just (w, t'')
+                    | w == underscore ->
+                        S.cons underscore (go t'')
+                    | otherwise ->
+                        S.cons P.pathSeparator (go t')
+    underscore = fromIntegral (ord '_')
 
 {- Sentinal value, only produced by lockJournal; required
  - as a parameter by things that need to ensure the journal is
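
The escaping scheme in this hunk can be tried in isolation with the sketch below. It is an approximation under stated assumptions, not the repository's code: `toJournal` and `fromJournal` are hypothetical names, `/` is assumed to be the only path separator, and `Data.ByteString.Char8` is used instead of the `Word8`-based functions in the diff so that the block depends only on the `bytestring` package.

```haskell
import qualified Data.ByteString.Char8 as B

-- Escape a branch filename for the journal (hypothetical name):
-- '/' becomes "_" and a literal '_' becomes "__".
toJournal :: B.ByteString -> B.ByteString
toJournal = B.concatMap mangle
  where
    mangle '/' = B.singleton '_'
    mangle '_' = B.pack "__"
    mangle c   = B.singleton c

-- Undo the escaping (hypothetical name): copy everything up to the
-- next '_', then look one character ahead to decide whether it was
-- an escaped underscore ("__") or a path separator ("_").
-- Unlike the diff, this sketch maps a lone trailing '_' to '/'.
-- A '_' directly after a '/' encodes the same as a '_' directly
-- before a '/', so such names are ambiguous under this scheme.
fromJournal :: B.ByteString -> B.ByteString
fromJournal b =
    let (h, t) = B.break (== '_') b
    in h <> case B.uncons t of
        Nothing -> B.empty
        Just (_, t') -> case B.uncons t' of
            Just ('_', t'') -> B.cons '_' (fromJournal t'')
            _ -> B.cons '/' (fromJournal t')
```

For example, `toJournal (B.pack "aaa/my_key.log")` gives `"aaa_my__key.log"`, and `fromJournal` applied to that result returns the original name.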

@@ -11,24 +11,10 @@ than find so the improvement is not as large.
 The `bs` branch is in a mergeable state now, but still needs work:
-* There's a bug impacting WORM keys with / in the keyname.
-  The files stored in the git-annex branch used to have the `/` changed
-  to `_`, but on the bs branch that does not happen. git also outputs
-  a message about "Ignoring" the file.
-
-  Test case:
-
-        git config annex.backend WORM
-        git annex addurl http://localhost/~joey/index.html
-
-  Hmm, that prints out the Ignoring message, and the file does not get
-  written to the git-annex branch. But in my big repo, I saw the message
-  and saw a file in the branch, with `/` in its keyname. Earlier in the
-  branch, the same key used `_`. (Look for "36bfe385607b32c4d5150404c0" to
-  find it again.)
 * Profile various commands and look for hot spots.
-* ByteString.Char8.putStrLn may truncate?
 * Eliminate all the fromRawFilePath, toRawFilePath, encodeBS,
   decodeBS conversions. Or at least most of them. There are likely
   some places where a value is converted back and forth several times.