importfeed: Use caching database to avoid needing to list urls on every
run, and avoid using too much memory.
Benchmarking in my podcasts repo, importfeed got 1.42 seconds faster,
and memory use dropped from 203000k to 59408k.
Database.ImportFeed is Database.ContentIdentifier with the serial number
filed off. There is a bit of code duplication I would like to avoid,
particularly recordAnnexBranchTree, and getAnnexBranchTree. But these use
the persistent sqlite tables, so despite the code being the same, they
cannot be factored out.
Since this database includes the contentidentifier metadata, it will be
slightly redundant if a sqlite database is ever added for metadata. I
did consider making such a generic database and using it for this. But,
that would then need importfeed to update both the url database and the
metadata database, which is twice as much work diffing the git-annex
branch trees. Or would entagle updating two databases in a complex way.
So instead it seems better to optimise the database that
importfeed needs, and if the metadata database is used by another command,
use a little more disk space and do a little bit of redundant work to
update it.
Sponsored-by: unqueued on Patreon
The crash occurred because writeCreds got called twice, and writeFileProtected
neglected to close its file handle, so the file was open for write when
written the second time.
It seems unncessary and suboptimal that writeCreds gets called twice.
One call is from getRemoteCredPair and the other from setRemoteCredPair'.
What happens is that in the enableremote case, code that also runs at
initremote does unncessary work. Might be possible to improve that, but
I've gone for the simple fix.
Sponsored-by: k0ld on Patreon
git-annex only writes regular files there, but other things may drop junk
like empty .DAV directories around the tree. And trying to hash such things
can have weird and hard to understand effects. So it seems best to do a
small amount of work in statting the journal file to make sure it's a
regular file.
Sponsored-by: Jack Hill on Patreon