Regression in podcast feed download for one RSS feed
This commit is contained in:
parent
7b7ae02fe5
commit
ff682b6029
1 changed files with 228 additions and 0 deletions
|
@ -0,0 +1,228 @@
|
|||
### Please describe the problem.
|
||||
|
||||
Since upgrading to `6.20180626-gdf91a5cff` (pre-built OS X binary on OS X 10.11) it appears `git annex importfeed` is unable to properly download a RSS feed from one particular site -- it gets back `400 Bad Request` from the website, and then unsurprisingly reports `warning: bad feed content; no enclosures to download`. The relevant RSS feed appears sane when downloaded from the same URL with `wget` or `curl`. The previous `git-annex` version I was using on this machine (before the security update), `6.20170320-g41c5d9d` was able to download this RSS feed (I realise that's a rather large window of changes!).
|
||||
|
||||
It appears the older version (`6.20170320-g41c5d9d`) used `wget` to download the podcast RSS, and the newer version (`6.20180626-gdf91a5cff`) seems to fetch the RSS internally in a way that does not work with this RSS feed (nor does the newer version appear to provide debug output of what it is doing at that step that goes wrong, unlike the previous version which shows the explicit `wget` command run, even if multiple `--debug` and/or `--verbose` options are provided).
|
||||
|
||||
All other podcast RSS feeds work on both versions.
|
||||
|
||||
As best I can tell from packet captures and `telnet` debugging, the issue is that the new `git annex importfeed` does not send a `User-Agent:` header. Possibly some anti-DoS/anti-spam front end is rejecting it as a result?
|
||||
|
||||
Can a suitable `User-Agent:` header be added to the `git annex importfeed` HTTP requests?
|
||||
|
||||
(*Note* the bug title probably needs updating since I seem to have discovered the root cause after starting creating the bug.)
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
Attempt to `git annex importfeed` the feed `'https://theythempodcast.com/episodes?format=RSS'`. The problem also appears reproducible with `'http://theythempodcast.com/episodes?format=RSS'` (ie, unencrypted), which makes debugging a wee bit simpler (ie, packet captures can help).
|
||||
|
||||
Failing, on `6.20180626-gdf91a5cff`:
|
||||
|
||||
ewen@ashram:~/Music/podcasts$ git annex --debug importfeed --template="${TEMPLATE}" 'https://theythempodcast.com/episodes?format=RSS' 2>&1
|
||||
importfeed checking known urls [2018-07-15 13:36:53.298188] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
|
||||
[2018-07-15 13:36:53.306676] process done ExitSuccess
|
||||
[2018-07-15 13:36:53.306768] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
|
||||
[2018-07-15 13:36:53.314672] process done ExitSuccess
|
||||
[2018-07-15 13:36:53.315353] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--stage","-z","--","."]
|
||||
[2018-07-15 13:36:53.325249] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
|
||||
[2018-07-15 13:36:53.32666] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
|
||||
[2018-07-15 13:36:53.768979] process done ExitSuccess
|
||||
ok
|
||||
importfeed https://theythempodcast.com/episodes?format=RSS download failed: <html>
|
||||
<head>
|
||||
<title>400 Bad Request</title>
|
||||
<style> body { background-color: #F2F2F2; color: #3E3E3E; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12px; } pre { word-wrap: break-word; } </style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>400 Bad Request</h1>
|
||||
<p><pre>e64OsAKG/PqPs52sZ @ Sun, 15 Jul 2018 01:36:57 GMT</pre>
|
||||
<p><pre>SEC-43</pre>
|
||||
<p><pre></pre>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
warning: bad feed content; no enclosures to download
|
||||
ok
|
||||
[2018-07-15 13:36:58.076606] process done ExitSuccess
|
||||
[2018-07-15 13:36:58.077183] process done ExitSuccess
|
||||
ewen@ashram:~/Music/podcasts$
|
||||
|
||||
Working, on `6.20170320-g41c5d9d`:
|
||||
|
||||
<pre>
|
||||
ewen@ashram:~/Music/podcasts$ /Applications/OpenSource/git-annex-2017-03-20.app/Contents/MacOS/git-annex importfeed --template="${TEMPLATE}" 'https://theythempodcast.com/episodes?format=RSS'
|
||||
importfeed checking known urls ok
|
||||
importfeed https://theythempodcast.com/episodes?format=RSS
|
||||
/var/folders/p1/gpj 100%[===================>] 32.62K 61.4KB/s in 0.5s
|
||||
2018-07-15 13:31:17 URL:https://theythempodcast.com/episodes?format=RSS [33404/33404] -> "/var/folders/p1/gpjb1g7s00zgp64p87w19pgm0000gn/T/feed16807282475249" [1]
|
||||
ok
|
||||
ewen@ashram:~/Music/podcasts$ /Applications/OpenSource/git-annex-2017-03-20.app/Contents/MacOS/git-annex --debug importfeed --template="${TEMPLATE}" 'https://theythempodcast.com/episodes?format=RSS' 2>&1 | grep -A 1 "format=RSS"
|
||||
importfeed https://theythempodcast.com/episodes?format=RSS
|
||||
[2018-07-15 13:33:05.367548] call: wget ["-nv","--show-progress","--clobber","-c","-O","/var/folders/p1/gpjb1g7s00zgp64p87w19pgm0000gn/T/feed16807282475249","https://theythempodcast.com/episodes?format=RSS","--user-agent","git-annex/6.20170320-g41c5d9d"]
|
||||
|
||||
0K .......... .......... .......... .. 100% 121K=0.3s2018-07-15 13:33:07 URL:https://theythempodcast.com/episodes?format=RSS [33404/33404] -> "/var/folders/p1/gpjb1g7s00zgp64p87w19pgm0000gn/T/feed16807282475249" [1]
|
||||
[2018-07-15 13:33:07.050443] process done ExitSuccess
|
||||
ewen@ashram:~/Music/podcasts$
|
||||
</pre>
|
||||
|
||||
In both cases the `TEMPLATE` value is the same one I've used for years, and works with all the other podcast feeds:
|
||||
|
||||
ewen@ashram:~/Music/podcasts$ set | grep TEMPLATE
|
||||
TEMPLATE='archive/${feedtitle}/${itemtitle}${extension}'
|
||||
ewen@ashram:~/Music/podcasts$
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
OS X 10.11 (El Capitan), using the pre-built OS X binaries downloaded just after the security release came out.
|
||||
|
||||
<pre>
|
||||
ewen@ashram:~/Music/podcasts$ git annex version
|
||||
git-annex version: 6.20180626-gdf91a5cff
|
||||
build flags: Assistant Webapp Pairing S3(multipartupload)(storageclasses) WebDAV FsEvents ConcurrentOutput TorrentParser MagicMime Feeds Testsuite
|
||||
dependency versions: aws-0.17.1 bloomfilter-2.0.1.0 cryptonite-0.23 DAV-1.3.1 feed-0.3.12.0 ghc-8.0.2 http-client-0.5.7.0 persistent-sqlite-2.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.4.5
|
||||
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
|
||||
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar hook external
|
||||
operating system: darwin x86_64
|
||||
supported repository versions: 3 5 6
|
||||
upgrade supported from repository versions: 0 1 2 3 4 5
|
||||
local repository version: 5
|
||||
ewen@ashram:~/Music/podcasts$
|
||||
</pre>
|
||||
|
||||
|
||||
|
||||
### Please provide any additional information below.
|
||||
|
||||
Working request (created with `wget`, then replicated with `telnet`):
|
||||
|
||||
<pre>
|
||||
ewen@ashram:~$ telnet 198.185.159.144 80
|
||||
Trying 198.185.159.144...
|
||||
Connected to 198.185.159.144.
|
||||
Escape character is '^]'.
|
||||
GET /episodes?format=RSS HTTP/1.1
|
||||
User-Agent: Wget/1.19.5 (darwin15.6.0)
|
||||
Accept: */*
|
||||
Accept-Encoding: identity
|
||||
Host: theythempodcast.com
|
||||
Connection: Keep-Alive
|
||||
Range: bytes=0-
|
||||
|
||||
HTTP/1.1 301 Moved Permanently
|
||||
Date: Sun, 15 Jul 2018 02:02:53 GMT
|
||||
X-ServedBy: web012
|
||||
Location: https://theythempodcast.com/episodes?format=RSS
|
||||
Transfer-Encoding: chunked
|
||||
x-contextid: ZBu72tcg/OJSeq1q9
|
||||
x-via: 1.1 echo017
|
||||
|
||||
0
|
||||
|
||||
^]
|
||||
telnet> quit
|
||||
Connection closed.
|
||||
ewen@ashram:~$
|
||||
</pre>
|
||||
|
||||
Failing request (as sent by `git-annex importfeed`):
|
||||
|
||||
ewen@ashram:~$ telnet 198.185.159.144 80
|
||||
Trying 198.185.159.144...
|
||||
Connected to 198.185.159.144.
|
||||
Escape character is '^]'.
|
||||
GET /episodes?format=RSS HTTP/1.1
|
||||
Host: theytheypodcast.com
|
||||
Range: bytes=0-
|
||||
Accept-Encoding: identity
|
||||
|
||||
HTTP/1.1 400 Bad Request
|
||||
content-length: 378
|
||||
x-synthetic: true
|
||||
expires: Thu, 01 Jan 1970 00:00:00 UTC
|
||||
pragma: no-cache
|
||||
cache-control: no-cache, must-revalidate
|
||||
content-type: text/html; charset=UTF-8
|
||||
connection: close
|
||||
date: Sun, 15 Jul 2018 02:01:14 UTC
|
||||
x-contextid: 1WIJdGgE/Y1Tq8lJu
|
||||
x-via: 1.1 echo008
|
||||
|
||||
<html>
|
||||
<head>
|
||||
<title>400 Bad Request</title>
|
||||
<style> body { background-color: #F2F2F2; color: #3E3E3E; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12px; } pre { word-wrap: break-word; } </style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>400 Bad Request</h1>
|
||||
<p><pre>1WIJdGgE/Y1Tq8lJu @ Sun, 15 Jul 2018 02:01:14 GMT</pre>
|
||||
<p><pre>SEC-43</pre>
|
||||
<p><pre></pre>
|
||||
</body>
|
||||
</html>Connection closed by foreign host.
|
||||
ewen@ashram:~$
|
||||
|
||||
Also failing request, with minimal HTTP/1.1 headers:
|
||||
|
||||
ewen@ashram:~$ telnet 198.185.159.144 80
|
||||
Trying 198.185.159.144...
|
||||
Connected to 198.185.159.144.
|
||||
Escape character is '^]'.
|
||||
GET /episodes?format=RSS HTTP/1.1
|
||||
Host: theythempodcast.com
|
||||
|
||||
HTTP/1.1 400 Bad Request
|
||||
content-length: 378
|
||||
x-synthetic: true
|
||||
expires: Thu, 01 Jan 1970 00:00:00 UTC
|
||||
pragma: no-cache
|
||||
cache-control: no-cache, must-revalidate
|
||||
content-type: text/html; charset=UTF-8
|
||||
connection: close
|
||||
date: Sun, 15 Jul 2018 02:07:14 UTC
|
||||
x-contextid: c05RnFlG/h78z0OqB
|
||||
x-via: 1.1 echo034
|
||||
|
||||
<html>
|
||||
<head>
|
||||
<title>400 Bad Request</title>
|
||||
<style> body { background-color: #F2F2F2; color: #3E3E3E; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12px; } pre { word-wrap: break-word; } </style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>400 Bad Request</h1>
|
||||
<p><pre>c05RnFlG/h78z0OqB @ Sun, 15 Jul 2018 02:07:14 GMT</pre>
|
||||
<p><pre>SEC-43</pre>
|
||||
<p><pre></pre>
|
||||
</body>
|
||||
</html>Connection closed by foreign host.
|
||||
ewen@ashram:~$
|
||||
|
||||
Apparently minimal working request:
|
||||
|
||||
ewen@ashram:~$ telnet 198.185.159.144 80
|
||||
Trying 198.185.159.144...
|
||||
Connected to 198.185.159.144.
|
||||
Escape character is '^]'.
|
||||
GET /episodes?format=RSS HTTP/1.1
|
||||
User-Agent: Wget/1.19.5 (darwin15.6.0)
|
||||
Host: theythempodcast.com
|
||||
|
||||
HTTP/1.1 301 Moved Permanently
|
||||
Date: Sun, 15 Jul 2018 02:09:09 GMT
|
||||
X-ServedBy: web025
|
||||
Location: https://theythempodcast.com/episodes?format=RSS
|
||||
Transfer-Encoding: chunked
|
||||
x-contextid: veJm4JC0/sT3ENh27
|
||||
x-via: 1.1 echo017
|
||||
|
||||
0
|
||||
|
||||
^]
|
||||
telnet> quit
|
||||
Connection closed.
|
||||
ewen@ashram:~$
|
||||
|
||||
Note that the only addition was adding a `User-Agent:` header (copied from the `wget` request). The redirect to HTTPS is expected here, as we are initially requesting via HTTP for `telnet`s benefit (I did try to get `openssl s_client` working, but it appeared the TLS connection timed out before I could manually complete a request, and I had no luck piping the request into `openssl s_client`).
|
||||
|
||||
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
|
||||
|
||||
Absolutely :-) `git annex` has been my podcatcher, and media management tool for years. Thanks for writing it!
|
Loading…
Add table
Reference in a new issue