importfeed: Made checking known urls step around 10% faster.
This was a bit disappointing; I was hoping for a 2x speedup. But I think the metadata lookup is wasting a lot of time and also needs to be made to stream. The changes to catObjectStreamLsTree were benchmarked to also speed up --all by around 3% more. Seems I managed to make it polymorphic after all.
This commit is contained in:
parent
a6afa62a60
commit
535cdc8d48
6 changed files with 58 additions and 42 deletions
@ -0,0 +1,11 @@
git-annex tries to run in a constant amount of memory, however `knownUrls`
loads all urls ever seen into a list, so the more urls there are, the more
memory `git annex importfeed` will need.

This is probably not a big problem in practice, but seems worth doing
something about if somehow possible.

Unfortunately, a bloom filter can't be used, because a false positive would
prevent importing a url that has not been imported before. A sqlite
database would work, but would need to be updated whenever the git-annex
branch is changed. --[[Joey]]
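To make the false-positive problem concrete, here is a minimal sketch (illustrative Python, not git-annex code; the filter is deliberately undersized so it saturates quickly):

```python
import hashlib

class BloomFilter:
    """Tiny bloom filter: k bit positions in an m-bit array per item."""
    def __init__(self, m, k):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

# Deliberately too small: after enough urls, nearly every bit is set,
# so almost *any* url tests as "known".
bf = BloomFilter(m=8, k=2)
for n in range(50):
    bf.add(f"http://example.com/feed/item-{n}")

# A url that was never added can still test positive; importfeed would
# then wrongly skip it as already imported, silently losing the item.
print("http://example.com/never-seen" in bf)
```

A bloom filter only bounds false positives probabilistically, and here even one false positive means a feed item is permanently skipped, which is why the note rules it out.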
@ -61,3 +61,6 @@ looked up efficiently. (Before these changes, the same key lookup was done
speedup when such limits are used. What that optimisation needs is a way to
tell if the current limit needs the key or not. If it does, match on it
after getting the key; otherwise, match before getting the key.
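That ordering can be sketched like this (hypothetical Python with made-up limit and lookup names; git-annex's actual matcher is Haskell and structured differently):

```python
# Sketch: apply cheap limits before the expensive key lookup, and only
# pay for the lookup when some limit actually needs the key.

def match_file(file, limits, lookup_key):
    """Each limit is {'needs_key': bool, 'test': fn(file, key) -> bool}."""
    cheap = [l for l in limits if not l["needs_key"]]
    keyed = [l for l in limits if l["needs_key"]]

    # Limits that don't need the key run first, so files they exclude
    # never trigger the expensive lookup at all.
    if not all(l["test"](file, None) for l in cheap):
        return False
    if not keyed:
        return True

    key = lookup_key(file)  # the expensive step, now deferred
    return all(l["test"](file, key) for l in keyed)

lookups = []
def lookup_key(file):
    lookups.append(file)
    return f"KEY--{file}"

limits = [
    {"needs_key": False, "test": lambda f, k: f.endswith(".mp3")},
    {"needs_key": True,  "test": lambda f, k: k.startswith("KEY--")},
]

print(match_file("a.mp3", limits, lookup_key))  # True, one lookup done
print(match_file("b.txt", limits, lookup_key))  # False, no lookup done
print(lookups)                                  # only a.mp3 was looked up
```

The point of the split is visible in `lookups`: files rejected by a cheap limit never incur the key lookup.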

Also, importfeed could probably be sped up more if knownItems streamed
through cat-file --buffer.
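For reference, `git cat-file --batch` (with `--buffer`) emits one `<sha> <type> <size>` header line followed by the object body and a newline, per object, so a consumer can process items incrementally instead of accumulating a list. A rough sketch of such a consumer (illustrative Python over a synthetic stream, not git-annex's implementation):

```python
import io

def stream_batch_objects(stream):
    """Parse `git cat-file --batch`-style output incrementally:
    a '<sha> <type> <size>' header line, <size> bytes of body, then a
    newline, per object. Only one object is held in memory at a time."""
    while True:
        header = stream.readline()
        if not header:
            return
        sha, _objtype, size = header.decode().split()
        body = stream.read(int(size))
        stream.read(1)  # consume the trailing newline after the body
        yield sha, body

# Synthetic stand-in for the pipe from `git cat-file --batch --buffer`.
fake = io.BytesIO(
    b"1111111111111111111111111111111111111111 blob 5\nurl-1\n"
    b"2222222222222222222222222222222222222222 blob 5\nurl-2\n"
)
for sha, body in stream_batch_objects(fake):
    print(sha[:4], body.decode())
```

Because the generator yields one object at a time, memory use stays constant regardless of how many known items the branch records, which is the property the note is after.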