ewen 2017-03-21 08:49:19 +00:00 committed by admin
parent 4652413766
commit 12831ed724

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="ewen"
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
subject="Track GUIDs to avoid duplicate downloads"
date="2017-03-21T08:48:04Z"
content="""
While tracking podcast media URLs *usually* works to avoid duplicate downloads, when it fails it usually fails spectacularly. In particular, if a podcast feed decides to update *all* the URLs (for old and new episodes) to use a different URL scheme, that suddenly looks like a huge volume of new URLs, and all of them get downloaded -- even if the content has actually already been retrieved from a different URL. For instance, the `acast.com` service has changed their URL scheme a couple of times in the last 1-2 years, rewriting all the historical URLs, so I have three copies of many of the episodes of podcasts on their service :-( (Many downloaded; some skipped once I caught the bulk download and stopped it, then reran with `--fast` or `--relaxed` to make placeholders instead. `acast.com` seem to have managed to cause even more confusion by rewriting many of the older `mp3` files with new `id3)
Some (all?) podcast feeds also have a `guid` field, which specifies what should be a unique per-episode
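
To illustrate the idea, here is a minimal sketch of GUID-based duplicate detection. It is only an assumption about how this could be done, not how any existing importer works: the `new_episodes` helper and the `seen_keys` set are hypothetical names, and the sketch assumes a plain RSS 2.0 feed with `item`, `guid` and `enclosure` elements. Episodes are keyed on the `guid` when there is one, and on the media URL only as a fallback.

```python
import xml.etree.ElementTree as ET


def new_episodes(feed_xml, seen_keys):
    # Hypothetical helper: return (key, url) pairs for episodes not seen before.
    # seen_keys is a set of keys from earlier runs; it is updated in place.
    fresh = []
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # no media attached to this entry
        url = enclosure.get("url")
        if not url:
            continue
        # Prefer the feed's guid; fall back to the URL if the guid is missing.
        guid = (item.findtext("guid") or "").strip()
        key = "guid:" + guid if guid else "url:" + url
        if key in seen_keys:
            continue  # already retrieved, even if the URL has since changed
        seen_keys.add(key)
        fresh.append((key, url))
    return fresh
```

With something like this, a feed that rewrites all of its media URLs but keeps the same `guid` values would yield no new episodes on the next run, provided `seen_keys` is stored between runs.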
"""]]