importfeed: fix bug while also speeding up 12x!

* Fix bug that could make git-annex importfeed not see recently recorded
  state when configured with annex.alwayscommit=false.
* importfeed: Made "checking known urls" phase run 12 times faster.

The massive speedup is because it no longer queries for metadata
accompanying each url. Instead it processes the whole git-annex branch and
checks all metadata files for feed item ids, and uses any it finds.

This could result in a behavior change, in an unlikely situation: If a feed
item id is recorded in a key's metadata, but the url gets removed, the old
code would not see that item id and would re-download the item if it found
a url for it in a feed, while the new code will see the item id and skip
it. I don't think the old behavior was intentional, and it may be that the
new behavior is better. Not gonna worry about this.
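A rough sketch of the two lookup strategies, and of the edge case where they disagree. This is illustrative Python, not the actual Haskell in Command.ImportFeed. The dictionaries standing in for the git-annex branch, the function names, and the "itemid" field name are all assumptions made for the example.

```python
# Hypothetical in-memory stand-ins for data on the git-annex branch.
# url_log:      key -> set of urls currently recorded for that key
# metadata_log: key -> dict of metadata fields for that key
url_log = {
    "KEY-a": {"http://example.com/a.mp3"},
    "KEY-b": set(),  # the url for this key was removed
}
metadata_log = {
    "KEY-a": {"itemid": "item-1"},
    "KEY-b": {"itemid": "item-2"},
}


def known_item_ids_per_url(url_log, metadata_log):
    """Old strategy (as described above): one metadata query per known url."""
    ids = set()
    queries = 0
    for key, urls in url_log.items():
        for _url in urls:
            queries += 1  # each url costs a separate metadata lookup
            item_id = metadata_log.get(key, {}).get("itemid")
            if item_id is not None:
                ids.add(item_id)
    return ids, queries


def known_item_ids_single_pass(metadata_log):
    """New strategy: one pass over all metadata, collecting any item ids."""
    ids = set()
    for fields in metadata_log.values():
        item_id = fields.get("itemid")
        if item_id is not None:
            ids.add(item_id)
    return ids


old_ids, queries = known_item_ids_per_url(url_log, metadata_log)
new_ids = known_item_ids_single_pass(metadata_log)
```

In this toy data the per-url strategy never reaches KEY-b, because it has no url left, so only the single pass reports item-2 as known.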
Joey Hess 2021-04-23 12:36:56 -04:00
parent b689f17062
commit 0547884eb2
GPG key ID: DB12DB0FF05F8F38
3 changed files with 29 additions and 43 deletions

@@ -155,8 +155,8 @@ later write.
 > * [[bugs/git-annex_branch_caching_bug]] was a problem, now fixed.
 > * Any other similar direct accesses of the branch, not going through
 > Annex.Branch, also need to be fixed (and may be missing journal files
-> already?) Command.ImportFeed.knownItems is one. Command.Log behavior
-> needs to be investigated, may be ok.
+> already?) Most fixed now. Command.Log behavior needs to be
+> investigated still.
 >
 > * Need to implement regardingPrivateUUID and privateUUIDsKnown,
 > which need to look at the git config to find the private uuids.