Avoid conversion from ByteString to String for urls that will just be
converted right back to ByteString to go into the database.
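A minimal sketch of the pattern (getUrlLog and recordUrlDb are
hypothetical stand-ins for illustration, not git-annex's actual
functions): the url bytes read from the git-annex branch flow into the
database unchanged, instead of round-tripping through String.

    import qualified Data.ByteString as B

    type URLByteString = B.ByteString

    -- Hypothetical source: raw url bytes as read from the git-annex branch.
    getUrlLog :: IO [URLByteString]
    getUrlLog = pure []

    -- Hypothetical sink: writes the bytes to sqlite unchanged. Previously
    -- the url was decoded to String here, only to be re-encoded to go
    -- into the database.
    recordUrlDb :: URLByteString -> IO ()
    recordUrlDb _ = pure ()

    storeUrls :: IO ()
    storeUrls = getUrlLog >>= mapM_ recordUrlDb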
Also setTempUrl is not used by importfeed, so avoid checking for temp
urls in this code path.
This benchmarks as only a small improvement, from 2.99s to 2.78s
when populating a database with 33k urls.
Note that it does not seem worth replacing URLString with URLByteString
generally, because the ways urls are used all entail either parseURI,
which takes a String, or passing a parameter to eg curl, which is also
currently a String.
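For example, parseURI from Network.URI has the type String -> Maybe URI,
so a ByteString url has to cross a String boundary at the point of use
regardless. A sketch (fromUrlBS is a hypothetical helper; git-annex's
own decoding is encoding-aware, unlike plain unpack):

    import Network.URI (URI, parseURI)
    import qualified Data.ByteString.Char8 as B8

    -- Hypothetical decoding helper; B8.unpack is only correct for
    -- ASCII urls, which suffices for illustration.
    fromUrlBS :: B8.ByteString -> String
    fromUrlBS = B8.unpack

    -- Parsing a ByteString url still goes through String.
    parseUrlBS :: B8.ByteString -> Maybe URI
    parseUrlBS = parseURI . fromUrlBS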
Sponsored-by: Leon Schuermann on Patreon
importfeed: Use a caching database to avoid needing to list urls on
every run, and to avoid using too much memory.
Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.
Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.
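The underlying reason: persistent's Template Haskell generates a fresh
entity type in every module that declares a schema, so a function whose
type mentions one module's AnnexBranch table cannot be reused against
the other module's identically-named table. A minimal sketch of the
shape involved (an illustrative schema, not git-annex's actual one):

    {-# LANGUAGE DataKinds, DerivingStrategies, FlexibleInstances #-}
    {-# LANGUAGE GADTs, GeneralizedNewtypeDeriving, MultiParamTypeClasses #-}
    {-# LANGUAGE QuasiQuotes, StandaloneDeriving, TemplateHaskell #-}
    {-# LANGUAGE TypeFamilies, UndecidableInstances #-}

    import Database.Persist
    import Database.Persist.Sqlite (SqlPersistM)
    import Database.Persist.TH
    import Data.Text (Text)

    -- Illustrative table: the sha of the git-annex branch tree that
    -- the database was last updated from.
    share [mkPersist sqlSettings, mkMigrate "migrateDb"] [persistLowerCase|
    AnnexBranch
      tree Text
      UniqueTree tree
    |]

    -- The AnnexBranch type used here is generated by the splice above,
    -- so the same code in another module, with its own schema splice,
    -- gets a different and incompatible type.
    recordAnnexBranchTree :: Text -> SqlPersistM ()
    recordAnnexBranchTree t = do
        deleteWhere ([] :: [Filter AnnexBranch])
        insert_ (AnnexBranch t)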
Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees, or would entangle updating the two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if a metadata database is later added for another
command, accept using a little more disk space and doing a little
redundant work to update it.
Sponsored-by: unqueued on Patreon