git-annex/doc/todo/podcatching_handling_updated_files.mdwn
Joey Hess 9e25cbde20 importfeed: Avoid downloading a redundant item from a feed whose guid has been downloaded before, even when the url has changed.
To support this, always store itemid in metadata; before this was only done
when annex.genmetadata was set.
2015-03-31 13:30:13 -04:00

Files in feeds can be updated, and if this update includes changing the
url, `importfeed` will treat this as a new file. This results in `foo.mp3`
having a `2_foo.mp3` added next to it.

This seems to happen especially often with feeds using FeedBurner.
I saw several items with the same size, but a different checksum and url.

To detect this, `importfeed` could store the item's guid in the metadata
of the key. Where it currently builds a `Map URLString Key` of all
known items, it could instead build a `Map (Either URLString GUID) Key`.
This would at least prevent the duplication when the feed has guids.
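
A minimal sketch of that map, to make the idea concrete. It is not the
`importfeed` code: `Item`, `remember` and `lookupKnown` are hypothetical
names, and `URLString`, `GUID` and `Key` are simplified stand-ins for
whatever types git-annex really uses.

```haskell
import qualified Data.Map as M

-- Simplified stand-ins (assumptions); the real git-annex types are richer.
type URLString = String
type GUID = String
newtype Key = Key String deriving (Eq, Show)

-- A feed item, reduced to the two fields this sketch needs (hypothetical).
data Item = Item
  { itemUrl :: URLString
  , itemGuid :: Maybe GUID
  }

-- Known items, indexed by url (Left) and by guid (Right).
type Known = M.Map (Either URLString GUID) Key

-- Record a downloaded item under its url and, when it has one, its guid.
remember :: Item -> Key -> Known -> Known
remember i k =
  maybe id (\g -> M.insert (Right g) k) (itemGuid i)
    . M.insert (Left (itemUrl i)) k

-- An item counts as known if its guid or its url has been seen before.
lookupKnown :: Item -> Known -> Maybe Key
lookupKnown i known =
  case itemGuid i >>= \g -> M.lookup (Right g) known of
    Just k -> Just k
    Nothing -> M.lookup (Left (itemUrl i)) known
```

With this shape, an item whose url changed but whose guid is unchanged
is still found, while feeds without guids fall back to matching on url
alone, as before.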
> [[done]] --[[Joey]]

It would be even nicer if the old file could be updated with the new
content. But, since files can be moved around, deleted, tagged, etc,
that only seems practical at all if the file is still in the directory
where `importfeed` created it.