sqlite database for importfeed

importfeed: Use caching database to avoid needing to list urls on every
run, and avoid using too much memory.

Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.

Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.

Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or would entangle updating two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.

Sponsored-by: unqueued on Patreon
Joey Hess 2023-10-23 16:12:26 -04:00
parent df4a60e28d
commit 8bde6101e3
GPG key ID: DB12DB0FF05F8F38
13 changed files with 287 additions and 82 deletions


@@ -73,18 +73,18 @@ share [mkPersist sqlSettings, mkMigrate "migrateExport"] [persistLowerCase|
 -- Files that have been exported to the remote and are present on it.
 Exported
 key Key
-file SFilePath
+file SByteString
 ExportedIndex key file
 -- Directories that exist on the remote, and the files that are in them.
 ExportedDirectory
-subdir SFilePath
-file SFilePath
+subdir SByteString
+file SByteString
 ExportedDirectoryIndex subdir file
 -- The content of the tree that has been exported to the remote.
 -- Not all of these files are necessarily present on the remote yet.
 ExportTree
 key Key
-file SFilePath
+file SByteString
 ExportTreeKeyFileIndex key file
 ExportTreeFileKeyIndex file key
 -- The tree stored in ExportTree
@@ -139,26 +139,26 @@ addExportedLocation :: ExportHandle -> Key -> ExportLocation -> IO ()
 addExportedLocation h k el = queueDb h $ do
 void $ insertUniqueFast $ Exported k ef
 let edirs = map
-(\ed -> ExportedDirectory (SFilePath (fromExportDirectory ed)) ef)
+(\ed -> ExportedDirectory (SByteString (fromExportDirectory ed)) ef)
 (exportDirectories el)
 putMany edirs
 where
-ef = SFilePath (fromExportLocation el)
+ef = SByteString (fromExportLocation el)
 removeExportedLocation :: ExportHandle -> Key -> ExportLocation -> IO ()
 removeExportedLocation h k el = queueDb h $ do
 deleteWhere [ExportedKey ==. k, ExportedFile ==. ef]
-let subdirs = map (SFilePath . fromExportDirectory)
+let subdirs = map (SByteString . fromExportDirectory)
 (exportDirectories el)
 deleteWhere [ExportedDirectoryFile ==. ef, ExportedDirectorySubdir <-. subdirs]
 where
-ef = SFilePath (fromExportLocation el)
+ef = SByteString (fromExportLocation el)
 {- Note that this does not see recently queued changes. -}
 getExportedLocation :: ExportHandle -> Key -> IO [ExportLocation]
 getExportedLocation (ExportHandle h _) k = H.queryDbQueue h $ do
 l <- selectList [ExportedKey ==. k] []
-return $ map (mkExportLocation . (\(SFilePath f) -> f) . exportedFile . entityVal) l
+return $ map (mkExportLocation . (\(SByteString f) -> f) . exportedFile . entityVal) l
 {- Note that this does not see recently queued changes. -}
 isExportDirectoryEmpty :: ExportHandle -> ExportDirectory -> IO Bool
@@ -166,13 +166,13 @@ isExportDirectoryEmpty (ExportHandle h _) d = H.queryDbQueue h $ do
 l <- selectList [ExportedDirectorySubdir ==. ed] []
 return $ null l
 where
-ed = SFilePath $ fromExportDirectory d
+ed = SByteString $ fromExportDirectory d
 {- Get locations in the export that might contain a key. -}
 getExportTree :: ExportHandle -> Key -> IO [ExportLocation]
 getExportTree (ExportHandle h _) k = H.queryDbQueue h $ do
 l <- selectList [ExportTreeKey ==. k] []
-return $ map (mkExportLocation . (\(SFilePath f) -> f) . exportTreeFile . entityVal) l
+return $ map (mkExportLocation . (\(SByteString f) -> f) . exportTreeFile . entityVal) l
 {- Get keys that might be currently exported to a location.
  -
@@ -183,19 +183,19 @@ getExportTreeKey (ExportHandle h _) el = H.queryDbQueue h $ do
 map (exportTreeKey . entityVal)
 <$> selectList [ExportTreeFile ==. ef] []
 where
-ef = SFilePath (fromExportLocation el)
+ef = SByteString (fromExportLocation el)
 addExportTree :: ExportHandle -> Key -> ExportLocation -> IO ()
 addExportTree h k loc = queueDb h $
 void $ insertUniqueFast $ ExportTree k ef
 where
-ef = SFilePath (fromExportLocation loc)
+ef = SByteString (fromExportLocation loc)
 removeExportTree :: ExportHandle -> Key -> ExportLocation -> IO ()
 removeExportTree h k loc = queueDb h $
 deleteWhere [ExportTreeKey ==. k, ExportTreeFile ==. ef]
 where
-ef = SFilePath (fromExportLocation loc)
+ef = SByteString (fromExportLocation loc)
 -- An action that is passed the old and new values that were exported,
 -- and updates state.