importfeed: Look at not only permalinks, but now also guids to identify previously downloaded files.

I've seen rss feeds that have no permalinks, only guids (which are
sometimes in the form of permalinks, argh/sigh).

I had previously avoided trusting guids to be globally unique, because my
survey of rss feeds that I subscribe to shows a lot of pretty bad
"guids" like "2 at http://serialpodcast.org" or even worse "oth20150401-hq".
The worry was that podcasts generating guids this badly give no guarantee
that they're actually globally unique, so two podcasts could end up using
the same guid.

But, I'm seeing too many url changes that result in redundant files, so
let's try this. If feeds are so broken that guids overlap, they could just
as well incorrectly call them permalinks too.
Joey Hess 2015-07-20 14:56:57 -04:00
parent 3c134ee21a
commit f95a8c8672
2 changed files with 6 additions and 2 deletions


@@ -219,8 +219,7 @@ performDownload opts cache todownload = case location todownload of
 		| otherwise = a
 	knownitemid = case getItemId (item todownload) of
-		-- only when it's a permalink
-		Just (True, itemid) -> S.member itemid (knownitems cache)
+		Just (_, itemid) -> S.member itemid (knownitems cache)
 		_ -> False
 	rundownload url extension getter = do
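To make the behavioral change in the hunk above concrete, here is a minimal standalone sketch. It assumes a simplified `Maybe (Bool, String)` value in place of what `getItemId` returns for a feed item (a permalink flag paired with the guid); the `ItemId` alias and the `knownItemIdOld`/`knownItemIdNew` functions are hypothetical names, and only `Data.Set` from the `containers` package is used. It also shows the risk the commit message accepts: a reused bad guid now counts as known too.

```haskell
import qualified Data.Set as S

-- Hypothetical stand-in for getItemId's result: Just (isPermalink, guid).
type ItemId = Maybe (Bool, String)

-- Before this commit: a guid only counted when flagged as a permalink.
knownItemIdOld :: S.Set String -> ItemId -> Bool
knownItemIdOld known iid = case iid of
  Just (True, itemid) -> S.member itemid known
  _ -> False

-- After this commit: any guid is looked up, permalink or not.
knownItemIdNew :: S.Set String -> ItemId -> Bool
knownItemIdNew known iid = case iid of
  Just (_, itemid) -> S.member itemid known
  _ -> False

main :: IO ()
main = do
  let known = S.fromList ["2 at http://serialpodcast.org", "oth20150401-hq"]
      guidOnly = Just (False, "2 at http://serialpodcast.org")
  print (knownItemIdOld known guidOnly) -- False: non-permalink guid ignored
  print (knownItemIdNew known guidOnly) -- True: now recognized as known
  -- The accepted risk: if another feed reused a bad guid such as
  -- "oth20150401-hq", its item would also be treated as known.
  print (knownItemIdNew known (Just (False, "oth20150401-hq"))) -- True
```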

debian/changelog

@@ -20,6 +20,11 @@ git-annex (5.20150714) UNRELEASED; urgency=medium
   * sync --content: Fix bug that caused files to be uploaded to eg,
     more archive remotes than wanted copies, only to later be dropped
     to satisfy the preferred content settings.
+  * importfeed: Improve detection of known items whose url has changed,
+    and avoid adding redundant files. Where before this only looked at
+    permalinks in rss feeds, it now also looks at guids.
+  * importfeed: Look at not only permalinks, but now also guids
+    to identify previously downloaded files.
  -- Joey Hess <id@joeyh.name> Fri, 10 Jul 2015 16:36:42 -0400
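The changelog's point about items whose url has changed can be illustrated with a similar hypothetical sketch. The `Cache` and `FeedItem` records, `alreadyDownloaded`, and the example.com urls are all made up for illustration (the field names only echo `knownurls`/`knownitems` from the diff), and the sketch assumes, rather than shows, that importfeed skips an item when either its url or its id was already seen.

```haskell
import qualified Data.Set as S

-- Hypothetical, simplified cache of what has already been downloaded;
-- not git-annex's actual cache type.
data Cache = Cache
  { knownurls  :: S.Set String
  , knownitems :: S.Set String
  }

-- Hypothetical feed item: enclosure url plus optional (isPermalink, guid).
data FeedItem = FeedItem
  { enclosureUrl :: String
  , itemId       :: Maybe (Bool, String)
  }

-- Skip an item if either its url or its guid has been seen before,
-- so a changed url alone does not cause a redundant download.
alreadyDownloaded :: Cache -> FeedItem -> Bool
alreadyDownloaded cache i =
  S.member (enclosureUrl i) (knownurls cache) || knownId
  where
    knownId = case itemId i of
      Just (_, guid) -> S.member guid (knownitems cache)
      Nothing -> False

main :: IO ()
main = do
  let cache = Cache
        { knownurls  = S.fromList ["http://example.com/old/ep1.mp3"]
        , knownitems = S.fromList ["oth20150401-hq"]
        }
      moved = FeedItem "http://cdn.example.com/ep1.mp3" (Just (False, "oth20150401-hq"))
      fresh = FeedItem "http://cdn.example.com/ep2.mp3" (Just (False, "oth20150408-hq"))
  print (alreadyDownloaded cache moved) -- True: same guid, new url
  print (alreadyDownloaded cache fresh) -- False: genuinely new item
```

Here the guid acts as a second key: as long as a feed keeps an item's guid stable, moving its enclosure to a new url no longer makes it look like a new item.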