sqlite database for importfeed

importfeed: Use caching database to avoid needing to list urls on every
run, and avoid using too much memory.

Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.

Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.

Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or it would entangle updating the two databases in a
complex way. So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.
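
The general idea of recording which git-annex branch tree the database
was last updated from, and then applying only the difference on the next
run, can be sketched roughly as follows. This is a simplified pure model
and not the code in Database.ImportFeed: the types TreeSha, Key, Url,
BranchTree and Cache, and the functions changedKeys, updateCache and
isKnownUrl, are all hypothetical names made up for illustration, and an
in-memory Map stands in for both the git repository and the sqlite tables.

```haskell
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Hypothetical stand-ins, not git-annex's real types.
type TreeSha = String
type Key = String
type Url = String

-- The urls recorded for each key in one version of the branch.
type BranchTree = M.Map Key (S.Set Url)

-- The cache remembers which tree it was last updated from.
data Cache = Cache
    { recordedTree :: Maybe TreeSha
    , cachedUrls :: M.Map Key (S.Set Url)
    } deriving (Eq, Show)

emptyCache :: Cache
emptyCache = Cache Nothing M.empty

-- Keys whose urls differ between two trees (a stand-in for a git tree diff).
changedKeys :: BranchTree -> BranchTree -> [Key]
changedKeys old new =
    [ k
    | k <- S.toList (M.keysSet old `S.union` M.keysSet new)
    , M.lookup k old /= M.lookup k new
    ]

-- Bring the cache up to date with the current tree, touching only the
-- keys that changed since the recorded tree. The first argument models
-- the repository as a map from tree sha to tree contents.
updateCache :: M.Map TreeSha BranchTree -> TreeSha -> Cache -> Cache
updateCache repo current cache
    | recordedTree cache == Just current = cache
    | otherwise = Cache (Just current)
        (foldr apply (cachedUrls cache) (changedKeys old new))
  where
    old = maybe M.empty lookupTree (recordedTree cache)
    new = lookupTree current
    lookupTree sha = M.findWithDefault M.empty sha repo
    apply k m = case M.lookup k new of
        Just urls -> M.insert k urls m
        Nothing -> M.delete k m

-- Check whether a url is already known, without listing the branch.
isKnownUrl :: Url -> Cache -> Bool
isKnownUrl u = any (S.member u) . M.elems . cachedUrls
```

In this model, a run that finds the recorded tree unchanged does no work
at all, and otherwise the work is proportional to the size of the diff
rather than to the number of urls in the branch.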

Sponsored-by: unqueued on Patreon
Joey Hess 2023-10-23 16:12:26 -04:00
parent df4a60e28d
commit 8bde6101e3
GPG key ID: DB12DB0FF05F8F38
13 changed files with 287 additions and 82 deletions


@@ -79,15 +79,15 @@ instance PersistField ContentIdentifier where
 instance PersistFieldSql ContentIdentifier where
 	sqlType _ = SqlBlob

--- A serialized RawFilePath.
-newtype SFilePath = SFilePath S.ByteString
+-- A serialized bytestring.
+newtype SByteString = SByteString S.ByteString
 	deriving (Eq, Show)

-instance PersistField SFilePath where
-	toPersistValue (SFilePath b) = toPersistValue b
-	fromPersistValue v = SFilePath <$> fromPersistValue v
+instance PersistField SByteString where
+	toPersistValue (SByteString b) = toPersistValue b
+	fromPersistValue v = SByteString <$> fromPersistValue v

-instance PersistFieldSql SFilePath where
+instance PersistFieldSql SByteString where
 	sqlType _ = SqlBlob

 -- A serialized git Sha