cache one more log file for metadata

My worry was that reading the metadata log for a preferred content
expression that matches on metadata would evict the location log from the
cache, causing an expensive re-read when a Seek action later checked the
location log.

That is especially a concern because the --all optimisation in the
previous commit pre-caches the location log.
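To make the worry concrete, here is a small simulation. It is not git-annex code. The hypothetical `access` function models a most-recently-used list cache with a given capacity, and `misses` counts how many reads in a trace miss the cache. The trace `seekTrace` is an assumed read pattern for one key: the location log is pre-cached, a preferred content check reads the metadata log, and Seek then reads the location log again. Capacity 1 stands for the cache before this commit, which the subject line suggests held a single log file.

```haskell
import Data.List (delete)

-- Hypothetical model of a bounded most-recently-used cache of log file
-- names; an illustration only, not git-annex's actual cache code.
access :: Int -> String -> [String] -> (Bool, [String])
access capacity f cache =
  ( f `elem` cache                       -- hit if the file is already cached
  , take capacity (f : delete f cache)   -- move/insert at front, drop overflow
  )

-- Count how many reads in a trace miss the cache, starting from empty.
misses :: Int -> [String] -> Int
misses capacity = go []
  where
    go _ [] = 0
    go cache (f : fs) =
      let (hit, cache') = access capacity f cache
       in (if hit then 0 else 1) + go cache' fs

-- Assumed read pattern while seeking one key.
seekTrace :: [String]
seekTrace = ["location log", "metadata log", "location log"]

main :: IO ()
main = do
  print (misses 1 seekTrace)  -- 3: the metadata read evicts the location log
  print (misses 2 seekTrace)  -- 2: the second location log read is a hit
```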

This also means that the --all optimisation could pre-cache the metadata
log too, if it wanted to, but that is not currently done.

The cache is a list, with the most recently accessed file first. That
optimises it for the common case of reading the same file twice in a row;
e.g. getting a log, examining it, and then setting it reads the file
twice. And sync --content commonly reads the location log 3 times in a row.
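Here is a minimal sketch of such a list cache. It assumes log contents are plain `String`s and uses the hypothetical names `Cache`, `cacheLookup` and `cacheInsert`; git-annex's real cache is implemented differently. A hit moves the entry to the front, so a file that is read twice in a row is found on the first comparison. An insert drops anything beyond the size limit.

```haskell
import Data.List (partition)

-- Hypothetical cache: (file, contents) pairs, most recently accessed first.
type Cache = [(FilePath, String)]

-- Look up a file; on a hit, move its entry to the front of the list.
cacheLookup :: FilePath -> Cache -> (Maybe String, Cache)
cacheLookup f cache =
  case partition ((== f) . fst) cache of
    (hit@(_, content) : _, rest) -> (Just content, hit : rest)
    ([], _) -> (Nothing, cache)

-- Insert freshly read contents at the front, dropping any old entry for
-- the same file and anything beyond the size limit.
cacheInsert :: Int -> FilePath -> String -> Cache -> Cache
cacheInsert size f content cache =
  take size ((f, content) : filter ((/= f) . fst) cache)

main :: IO ()
main = do
  -- Hypothetical file names and contents, for illustration only.
  let c1 = cacheInsert 2 "loc.log" "uuid-a present" []
      c2 = cacheInsert 2 "meta.log" "tag=foo" c1
      (r, c3) = cacheLookup "loc.log" c2
  print r             -- Just "uuid-a present"
  print (map fst c3)  -- ["loc.log","meta.log"]
```

Every miss has to walk the whole list before falling back to reading the file. That cost is why the list is kept short.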

But, as a list, it should not be made too long, since a lookup has to scan
it linearly. I thought about expanding it to 5 items, but that seemed
unlikely to be a win often enough to outweigh the extra time spent
checking the cache.

Clearly there could be some further benchmarking and tuning here.
Joey Hess 2020-07-07 14:18:55 -04:00
parent d010ab04be
commit 9483b10469
GPG key ID: DB12DB0FF05F8F38
3 changed files with 35 additions and 20 deletions

Logs.hs (21 lines changed)

@@ -38,6 +38,26 @@ getLogVariety config f
	| isMetaDataLog f || f `elem` otherLogs = Just OtherLog
	| otherwise = PresenceLog <$> firstJust (presenceLogs config f)

{- Typical number of log files that may be read while processing a single
 - key. This is used to size a cache.
 -
 - The location log is generally read, and the metadata log is read when
 - matching a preferred content expression that matches on metadata,
 - or when using metadata options.
 -
 - When using a remote, the url log, chunk log, remote state log, remote
 - metadata log, and remote content identifier log might each be used,
 - but probably at most 3 out of the 6. However, caching too much slows
 - down all operations because the cache is a linear list, so the cache
 - is not currently sized to include these.
 -
 - The result is that when seeking for files to operate on,
 - the location log will stay in the cache if the metadata log is also
 - read.
 -}
logFilesToCache :: Int
logFilesToCache = 2

{- All the old-format uuid-based logs stored in the top of the git-annex branch. -}
topLevelOldUUIDBasedLogs :: [RawFilePath]
topLevelOldUUIDBasedLogs =
@@ -59,7 +79,6 @@ topLevelNewUUIDBasedLogs =
	[ exportLog
	]

{- All the ways to get a key from a presence log file -}
presenceLogs :: GitConfig -> RawFilePath -> [Maybe Key]
presenceLogs config f =