git-annex/doc/design
Joey Hess 8bde6101e3
sqlite database for importfeed
importfeed: Use caching database to avoid needing to list urls on every
run, and avoid using too much memory.

Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.

Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.

Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or it would entangle updating two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.
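
As a rough illustration of the caching approach described above, here is a
minimal Python sketch, not the actual Haskell code in Database.ImportFeed.
It assumes a cache that stores the set of known urls plus the git-annex
branch tree it was last updated from, and only diffs when that tree has
changed. The table names, the Python function names (loosely modelled on
recordAnnexBranchTree and getAnnexBranchTree), and the `diff_urls` callback
are all hypothetical.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) the hypothetical url cache database."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS known_urls (url TEXT PRIMARY KEY)")
    # Single-row table remembering which branch tree the cache reflects.
    db.execute(
        "CREATE TABLE IF NOT EXISTS annex_branch ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), tree TEXT NOT NULL)"
    )
    return db


def get_annex_branch_tree(db):
    """Return the tree the cache was last updated from, or None."""
    row = db.execute("SELECT tree FROM annex_branch WHERE id = 1").fetchone()
    return row[0] if row else None


def record_annex_branch_tree(db, tree):
    """Remember the tree that the cache now reflects."""
    db.execute(
        "INSERT OR REPLACE INTO annex_branch (id, tree) VALUES (1, ?)", (tree,)
    )


def update_cache(db, current_tree, diff_urls):
    """Bring the cache up to date with current_tree.

    diff_urls(old_tree, new_tree) is an assumed callback that yields the
    urls added between the two trees (old_tree is None on the first run).
    Returns True if an update was needed, False if the cache was current.
    """
    old_tree = get_annex_branch_tree(db)
    if old_tree == current_tree:
        return False  # nothing changed, so no diffing and no url listing
    with db:  # one transaction: urls and recorded tree are updated together
        for url in diff_urls(old_tree, current_tree):
            db.execute("INSERT OR IGNORE INTO known_urls (url) VALUES (?)", (url,))
        record_annex_branch_tree(db, current_tree)
    return True


def is_known_url(db, url):
    """Check one url without loading the whole url set into memory."""
    row = db.execute("SELECT 1 FROM known_urls WHERE url = ?", (url,)).fetchone()
    return row is not None
```

In this sketch each lookup is a single indexed query, which is the kind of
access pattern that avoids holding every url in memory at once.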

Sponsored-by: unqueued on Patreon
2023-10-23 16:46:22 -04:00
..
adjusted_branches Added a comment: adjusted branche to "focus" on a specific subtree 2016-08-22 14:19:57 +00:00
assistant Typo fix unncessary -> unnecessary. 2022-08-20 09:40:19 -04:00
balanced_preferred_content Added a comment 2023-07-24 13:10:09 +00:00
encryption
exporting_trees_to_special_remotes Added a comment 2018-02-07 20:01:53 +00:00
external_backend_protocol Added a comment: xxHash as the backend 2022-12-12 08:21:35 +00:00
external_special_remote_protocol comment and update todo 2023-06-23 12:25:08 -04:00
git-remote-daemon Added a comment: Rolling hash chunking 2014-04-04 14:16:25 +00:00
iabackup Added a comment: 14 of 21PB, actually 2015-04-30 02:58:05 +00:00
metadata followup 2015-04-09 14:33:11 -04:00
new_repo_versions devblog 2016-05-04 14:39:53 -04:00
p2p_protocol comment 2019-04-03 13:11:34 -04:00
requests_routing Added a comment: Friendly bump to keep on the radar 2019-10-24 09:26:23 +00:00
adjusted_branches.mdwn link to the adjust manpage 2016-06-23 14:39:49 +00:00
assistant.mdwn clarify that this is mostly done (i think?) 2014-04-07 04:41:56 +00:00
balanced_preferred_content.mdwn Fix a typo 2016-02-08 10:54:08 -04:00
caching_database.mdwn sqlite database for importfeed 2023-10-23 16:46:22 -04:00
encryption.mdwn Fix typos "=yet" -> "=yes" 2023-03-10 18:07:20 +01:00
exporting_trees_to_special_remotes.mdwn comment 2022-05-02 14:45:45 -04:00
external_backend_protocol.mdwn this protocol is not draft for some time 2020-10-22 19:55:29 -04:00
external_special_remote_protocol.mdwn let Remote.availability return Unavailable 2023-08-16 14:31:31 -04:00
gcrypt.mdwn
git-remote-daemon.mdwn update 2015-01-15 15:58:56 -04:00
iabackup.mdwn Fix spelling in doc/design/iabackup.mdwn 2018-06-03 12:28:26 +00:00
importing_trees_from_special_remotes.mdwn improve docs about removeExportDirectory 2019-05-28 11:16:01 -04:00
metadata.mdwn update for v6 unlocked files 2015-12-26 14:59:06 -04:00
new_repo_versions.mdwn Typo: sansative -> sensitive 2023-03-17 15:14:50 -04:00
p2p_protocol.mdwn typo 2021-08-09 12:44:20 -04:00
preferred_content.mdwn
requests_routing.mdwn keep track of satisfied requests, and summarize 2014-05-09 16:41:05 -03:00
roadmap.mdwn avoid truncating the list of confirmed items 2023-06-23 16:20:00 -04:00