followup
This commit is contained in:
commit
bd6799ebef
3 changed files with 256 additions and 0 deletions
|
@ -0,0 +1,20 @@
|
|||
[[!comment format=mdwn
|
||||
username="joey"
|
||||
subject="""comment 2"""
|
||||
date="2018-07-16T15:27:13Z"
|
||||
content="""
|
||||
Something more is involved than git annex add and sync on an adjusted
|
||||
branch, because when I try that the problem does not appear.
|
||||
(Also tried on OSX in case it was somehow OSX specific.)
|
||||
|
||||
I see that the mode of the files in the problem tree you showed
|
||||
is 120000, which tells me that tree is one that was committed to the master
|
||||
branch, not the adjusted unlocked branch. That suggests that the problem is
|
||||
in reverseAdjustedTree. But I don't think I will be able to find it by
|
||||
staring at the code; I need a way to reproduce the problem.
|
||||
|
||||
What *might* help is if you can show the full directory tree in the
|
||||
repository (with names mangled for privacy as needed). It may be that it's
|
||||
somehow caused by adding a file at a particular location in the tree,
|
||||
since a lot of the complication in the code is around handling such things.
|
||||
"""]]
|
226
doc/bugs/importfeed_bad_request_without_User-Agent__58__.mdwn
Normal file
226
doc/bugs/importfeed_bad_request_without_User-Agent__58__.mdwn
Normal file
|
@ -0,0 +1,226 @@
|
|||
### Please describe the problem.
|
||||
|
||||
Since upgrading to `6.20180626-gdf91a5cff` (pre-built OS X binary on OS X 10.11) it appears `git annex importfeed` is unable to properly download a RSS feed from one particular site -- it gets back `400 Bad Request` from the website, and then unsurprisingly reports `warning: bad feed content; no enclosures to download`. The relevant RSS feed appears sane when downloaded from the same URL with `wget` or `curl`. The previous `git-annex` version I was using on this machine (before the security update), `6.20170320-g41c5d9d` was able to download this RSS feed (I realise that's a rather large window of changes!).
|
||||
|
||||
It appears the older version (`6.20170320-g41c5d9d`) used `wget` to download the podcast RSS, and the newer version (`6.20180626-gdf91a5cff`) seems to fetch the RSS internally in a way that does not work with this RSS feed (nor does the newer version appear to provide debug output of what it is doing at that step that goes wrong, unlike the previous version which shows the explicit `wget` command run, even if multiple `--debug` and/or `--verbose` options are provided).
|
||||
|
||||
All other podcast RSS feeds work on both versions.
|
||||
|
||||
As best I can tell from packet captures and `telnet` debugging, the issue is that the new `git annex importfeed` does not send a `User-Agent:` header. Possibly some anti-DoS/anti-spam front end is rejecting it as a result?
|
||||
|
||||
Can a suitable `User-Agent:` header be added to the `git annex importfeed` HTTP requests?
|
||||
|
||||
### What steps will reproduce the problem?
|
||||
|
||||
Attempt to `git annex importfeed` the feed `'https://theythempodcast.com/episodes?format=RSS'`. The problem also appears reproducible with `'http://theythempodcast.com/episodes?format=RSS'` (ie, unencrypted), which makes debugging a wee bit simpler (ie, packet captures can help).
|
||||
|
||||
Failing, on `6.20180626-gdf91a5cff`:
|
||||
|
||||
ewen@ashram:~/Music/podcasts$ git annex --debug importfeed --template="${TEMPLATE}" 'https://theythempodcast.com/episodes?format=RSS' 2>&1
|
||||
importfeed checking known urls [2018-07-15 13:36:53.298188] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","git-annex"]
|
||||
[2018-07-15 13:36:53.306676] process done ExitSuccess
|
||||
[2018-07-15 13:36:53.306768] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","show-ref","--hash","refs/heads/git-annex"]
|
||||
[2018-07-15 13:36:53.314672] process done ExitSuccess
|
||||
[2018-07-15 13:36:53.315353] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","ls-files","--stage","-z","--","."]
|
||||
[2018-07-15 13:36:53.325249] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch"]
|
||||
[2018-07-15 13:36:53.32666] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)"]
|
||||
[2018-07-15 13:36:53.768979] process done ExitSuccess
|
||||
ok
|
||||
importfeed https://theythempodcast.com/episodes?format=RSS download failed: <html>
|
||||
<head>
|
||||
<title>400 Bad Request</title>
|
||||
<style> body { background-color: #F2F2F2; color: #3E3E3E; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12px; } pre { word-wrap: break-word; } </style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>400 Bad Request</h1>
|
||||
<p><pre>e64OsAKG/PqPs52sZ @ Sun, 15 Jul 2018 01:36:57 GMT</pre>
|
||||
<p><pre>SEC-43</pre>
|
||||
<p><pre></pre>
|
||||
</body>
|
||||
</html>
|
||||
|
||||
warning: bad feed content; no enclosures to download
|
||||
ok
|
||||
[2018-07-15 13:36:58.076606] process done ExitSuccess
|
||||
[2018-07-15 13:36:58.077183] process done ExitSuccess
|
||||
ewen@ashram:~/Music/podcasts$
|
||||
|
||||
Working, on `6.20170320-g41c5d9d`:
|
||||
|
||||
<pre>
|
||||
ewen@ashram:~/Music/podcasts$ /Applications/OpenSource/git-annex-2017-03-20.app/Contents/MacOS/git-annex importfeed --template="${TEMPLATE}" 'https://theythempodcast.com/episodes?format=RSS'
|
||||
importfeed checking known urls ok
|
||||
importfeed https://theythempodcast.com/episodes?format=RSS
|
||||
/var/folders/p1/gpj 100%[===================>] 32.62K 61.4KB/s in 0.5s
|
||||
2018-07-15 13:31:17 URL:https://theythempodcast.com/episodes?format=RSS [33404/33404] -> "/var/folders/p1/gpjb1g7s00zgp64p87w19pgm0000gn/T/feed16807282475249" [1]
|
||||
ok
|
||||
ewen@ashram:~/Music/podcasts$ /Applications/OpenSource/git-annex-2017-03-20.app/Contents/MacOS/git-annex --debug importfeed --template="${TEMPLATE}" 'https://theythempodcast.com/episodes?format=RSS' 2>&1 | grep -A 1 "format=RSS"
|
||||
importfeed https://theythempodcast.com/episodes?format=RSS
|
||||
[2018-07-15 13:33:05.367548] call: wget ["-nv","--show-progress","--clobber","-c","-O","/var/folders/p1/gpjb1g7s00zgp64p87w19pgm0000gn/T/feed16807282475249","https://theythempodcast.com/episodes?format=RSS","--user-agent","git-annex/6.20170320-g41c5d9d"]
|
||||
|
||||
0K .......... .......... .......... .. 100% 121K=0.3s2018-07-15 13:33:07 URL:https://theythempodcast.com/episodes?format=RSS [33404/33404] -> "/var/folders/p1/gpjb1g7s00zgp64p87w19pgm0000gn/T/feed16807282475249" [1]
|
||||
[2018-07-15 13:33:07.050443] process done ExitSuccess
|
||||
ewen@ashram:~/Music/podcasts$
|
||||
</pre>
|
||||
|
||||
In both cases the `TEMPLATE` value is the same one I've used for years, and works with all the other podcast feeds:
|
||||
|
||||
ewen@ashram:~/Music/podcasts$ set | grep TEMPLATE
|
||||
TEMPLATE='archive/${feedtitle}/${itemtitle}${extension}'
|
||||
ewen@ashram:~/Music/podcasts$
|
||||
|
||||
### What version of git-annex are you using? On what operating system?
|
||||
|
||||
OS X 10.11 (El Capitan), using the pre-built OS X binaries downloaded just after the security release came out.
|
||||
|
||||
<pre>
|
||||
ewen@ashram:~/Music/podcasts$ git annex version
|
||||
git-annex version: 6.20180626-gdf91a5cff
|
||||
build flags: Assistant Webapp Pairing S3(multipartupload)(storageclasses) WebDAV FsEvents ConcurrentOutput TorrentParser MagicMime Feeds Testsuite
|
||||
dependency versions: aws-0.17.1 bloomfilter-2.0.1.0 cryptonite-0.23 DAV-1.3.1 feed-0.3.12.0 ghc-8.0.2 http-client-0.5.7.0 persistent-sqlite-2.6.2 torrent-10000.1.1 uuid-1.3.13 yesod-1.4.5
|
||||
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 BLAKE2B256E BLAKE2B256 BLAKE2B512E BLAKE2B512 BLAKE2B160E BLAKE2B160 BLAKE2B224E BLAKE2B224 BLAKE2B384E BLAKE2B384 BLAKE2S256E BLAKE2S256 BLAKE2S160E BLAKE2S160 BLAKE2S224E BLAKE2S224 BLAKE2SP256E BLAKE2SP256 BLAKE2SP224E BLAKE2SP224 SHA1E SHA1 MD5E MD5 WORM URL
|
||||
remote types: git gcrypt p2p S3 bup directory rsync web bittorrent webdav adb tahoe glacier ddar hook external
|
||||
operating system: darwin x86_64
|
||||
supported repository versions: 3 5 6
|
||||
upgrade supported from repository versions: 0 1 2 3 4 5
|
||||
local repository version: 5
|
||||
ewen@ashram:~/Music/podcasts$
|
||||
</pre>
|
||||
|
||||
|
||||
|
||||
### Please provide any additional information below.
|
||||
|
||||
Working request (created with `wget`, then replicated with `telnet`):
|
||||
|
||||
<pre>
|
||||
ewen@ashram:~$ telnet 198.185.159.144 80
|
||||
Trying 198.185.159.144...
|
||||
Connected to 198.185.159.144.
|
||||
Escape character is '^]'.
|
||||
GET /episodes?format=RSS HTTP/1.1
|
||||
User-Agent: Wget/1.19.5 (darwin15.6.0)
|
||||
Accept: */*
|
||||
Accept-Encoding: identity
|
||||
Host: theythempodcast.com
|
||||
Connection: Keep-Alive
|
||||
Range: bytes=0-
|
||||
|
||||
HTTP/1.1 301 Moved Permanently
|
||||
Date: Sun, 15 Jul 2018 02:02:53 GMT
|
||||
X-ServedBy: web012
|
||||
Location: https://theythempodcast.com/episodes?format=RSS
|
||||
Transfer-Encoding: chunked
|
||||
x-contextid: ZBu72tcg/OJSeq1q9
|
||||
x-via: 1.1 echo017
|
||||
|
||||
0
|
||||
|
||||
^]
|
||||
telnet> quit
|
||||
Connection closed.
|
||||
ewen@ashram:~$
|
||||
</pre>
|
||||
|
||||
Failing request (as sent by `git-annex importfeed`):
|
||||
|
||||
ewen@ashram:~$ telnet 198.185.159.144 80
|
||||
Trying 198.185.159.144...
|
||||
Connected to 198.185.159.144.
|
||||
Escape character is '^]'.
|
||||
GET /episodes?format=RSS HTTP/1.1
|
||||
Host: theytheypodcast.com
|
||||
Range: bytes=0-
|
||||
Accept-Encoding: identity
|
||||
|
||||
HTTP/1.1 400 Bad Request
|
||||
content-length: 378
|
||||
x-synthetic: true
|
||||
expires: Thu, 01 Jan 1970 00:00:00 UTC
|
||||
pragma: no-cache
|
||||
cache-control: no-cache, must-revalidate
|
||||
content-type: text/html; charset=UTF-8
|
||||
connection: close
|
||||
date: Sun, 15 Jul 2018 02:01:14 UTC
|
||||
x-contextid: 1WIJdGgE/Y1Tq8lJu
|
||||
x-via: 1.1 echo008
|
||||
|
||||
<html>
|
||||
<head>
|
||||
<title>400 Bad Request</title>
|
||||
<style> body { background-color: #F2F2F2; color: #3E3E3E; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12px; } pre { word-wrap: break-word; } </style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>400 Bad Request</h1>
|
||||
<p><pre>1WIJdGgE/Y1Tq8lJu @ Sun, 15 Jul 2018 02:01:14 GMT</pre>
|
||||
<p><pre>SEC-43</pre>
|
||||
<p><pre></pre>
|
||||
</body>
|
||||
</html>Connection closed by foreign host.
|
||||
ewen@ashram:~$
|
||||
|
||||
Also failing request, with minimal HTTP/1.1 headers:
|
||||
|
||||
ewen@ashram:~$ telnet 198.185.159.144 80
|
||||
Trying 198.185.159.144...
|
||||
Connected to 198.185.159.144.
|
||||
Escape character is '^]'.
|
||||
GET /episodes?format=RSS HTTP/1.1
|
||||
Host: theythempodcast.com
|
||||
|
||||
HTTP/1.1 400 Bad Request
|
||||
content-length: 378
|
||||
x-synthetic: true
|
||||
expires: Thu, 01 Jan 1970 00:00:00 UTC
|
||||
pragma: no-cache
|
||||
cache-control: no-cache, must-revalidate
|
||||
content-type: text/html; charset=UTF-8
|
||||
connection: close
|
||||
date: Sun, 15 Jul 2018 02:07:14 UTC
|
||||
x-contextid: c05RnFlG/h78z0OqB
|
||||
x-via: 1.1 echo034
|
||||
|
||||
<html>
|
||||
<head>
|
||||
<title>400 Bad Request</title>
|
||||
<style> body { background-color: #F2F2F2; color: #3E3E3E; font-family: 'Helvetica Neue', Helvetica, Arial, sans-serif; font-size: 12px; } pre { word-wrap: break-word; } </style>
|
||||
</head>
|
||||
<body>
|
||||
<h1>400 Bad Request</h1>
|
||||
<p><pre>c05RnFlG/h78z0OqB @ Sun, 15 Jul 2018 02:07:14 GMT</pre>
|
||||
<p><pre>SEC-43</pre>
|
||||
<p><pre></pre>
|
||||
</body>
|
||||
</html>Connection closed by foreign host.
|
||||
ewen@ashram:~$
|
||||
|
||||
Apparently minimal working request:
|
||||
|
||||
ewen@ashram:~$ telnet 198.185.159.144 80
|
||||
Trying 198.185.159.144...
|
||||
Connected to 198.185.159.144.
|
||||
Escape character is '^]'.
|
||||
GET /episodes?format=RSS HTTP/1.1
|
||||
User-Agent: Wget/1.19.5 (darwin15.6.0)
|
||||
Host: theythempodcast.com
|
||||
|
||||
HTTP/1.1 301 Moved Permanently
|
||||
Date: Sun, 15 Jul 2018 02:09:09 GMT
|
||||
X-ServedBy: web025
|
||||
Location: https://theythempodcast.com/episodes?format=RSS
|
||||
Transfer-Encoding: chunked
|
||||
x-contextid: veJm4JC0/sT3ENh27
|
||||
x-via: 1.1 echo017
|
||||
|
||||
0
|
||||
|
||||
^]
|
||||
telnet> quit
|
||||
Connection closed.
|
||||
ewen@ashram:~$
|
||||
|
||||
Note that the only addition was adding a `User-Agent:` header (copied from the `wget` request). The redirect to HTTPS is expected here, as we are initially requesting via HTTP for `telnet`s benefit (I did try to get `openssl s_client` working, but it appeared the TLS connection timed out before I could manually complete a request, and I had no luck piping the request into `openssl s_client`).
|
||||
|
||||
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
|
||||
|
||||
Absolutely :-) `git annex` has been my podcatcher, and media management tool for years. Thanks for writing it!
|
|
@ -0,0 +1,10 @@
|
|||
[[!comment format=mdwn
|
||||
username="ewen"
|
||||
avatar="http://cdn.libravatar.org/avatar/605b2981cb52b4af268455dee7a4f64e"
|
||||
subject="User-Agent"
|
||||
date="2018-07-15T02:21:47Z"
|
||||
content="""
|
||||
In theory `User-Agent:` is \"optional\", but [clients `SHOULD` send it](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.43). Looks like some servers are particularly fussy about it being sent... and off the top of my head I can't think of a reason *not* to send a `User-Agent:` header of some sort, as it is *very* widely sent.
|
||||
|
||||
Ewen
|
||||
"""]]
|
Loading…
Reference in a new issue