git-annex/doc/todo/switch_from_quvi_to_youtube-dl.mdwn
2017-11-30 17:08:18 -04:00

quvi does not seem to be maintained (last upstream release in 2013),
and it supports far fewer videos than youtube-dl does.

The difficulty with using youtube-dl is that, by design, it does not
provide a way to probe whether it supports an url, other than running it
and seeing if it finds a video at the url. This would make
`git annex addurl` significantly slower if it ran youtube-dl to probe every url.

It is possible to use youtube-dl to download arbitrary non-video files;
it stores the file to disk just as wget or curl would. But that's well outside
its intended use case, so it does not feel like a good idea to make
git-annex depend on using youtube-dl to download generic urls.
(Also, youtube-dl has bugs with downloading non-video
urls; see for example <http://bugs.debian.org/874321>.)

So, switching to youtube-dl would probably need a new switch, like
`git annex addurl --rip`, that enables using it.
(Importfeed only treats links in the feed as video urls, not enclosures,
so this problem does not affect it and it would not need a new switch.)

That would need changes to users' workflows. git-annex could keep
supporting quvi for some time, and warn when it uses quvi, to
help with the transition.

> Alternatively, git-annex addurl could download the url first, and then
> check the file to see if it looks like html. If so, run youtube-dl (which
> unfortunately has to download it again) and see if it manages to rip
> media from it. This way, addurl of non-html files does not have extra
> overhead, and the redundant download is fairly small compared to ripping
> the media. Only the unusual case where addurl is being used on html that
> does not contain media becomes more expensive.
>
> However, for --relaxed, running youtube-dl --get-filename would be
> significantly more expensive since it hits the network. It seems that
> --relaxed would need to change to not rip videos; users who want that
> could use --fast.
>
> `--fast` already hits the network, but
> if it uses `youtube-dl --get-filename`, it would fall afoul of
> bugs like <http://bugs.debian.org/874321>, although those can be worked
> around (send stderr to /dev/null in case youtube-dl crashes).
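
The download-first idea above can be sketched roughly as follows. This is only an illustration in Python, not how git-annex implements it. The helper names `looks_like_html` and `probe_media_filename` are hypothetical, the html check is a crude heuristic picked for this sketch, and the `run` parameter exists only so the probe can be tried without youtube-dl installed. The only youtube-dl option used is `--get-filename`, as mentioned above.

```python
import subprocess


def looks_like_html(head: bytes) -> bool:
    """Crude heuristic (an assumption of this sketch): does the start of
    an already downloaded file look like an html page?"""
    start = head[:1024].lstrip().lower()
    return start.startswith((b"<!doctype html", b"<html")) or b"<html" in start


def probe_media_filename(url, run=subprocess.run):
    """Ask youtube-dl which filename it would use for url.

    Returns None when youtube-dl is missing, finds no media, or crashes.
    stderr goes to /dev/null so a crash does not show a backtrace.
    """
    try:
        result = run(
            ["youtube-dl", "--get-filename", "--", url],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        return None  # youtube-dl is not installed
    if result.returncode != 0:
        return None
    name = result.stdout.decode("utf-8", "replace").strip()
    return name or None
```

With this shape, only files that pass `looks_like_html` ever cause youtube-dl to be run, so adding non-html urls pays no extra cost.
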

Another gotcha is playlists. youtube-dl downloads playlists automatically.
But, git-annex needs to record an url that downloads a single file so that
`git annex get` works right. So, playlists will need to be disabled when
git-annex runs youtube-dl. But, `--no-playlist` does not always disable
playlists. Best option seems to be `--no-playlist --playlist-items 0` which works for
non-playlists, and downloads only 1 item from playlists (hopefully a fairly
stable item, but who knows..).

(`git annex importfeed` handles youtube playlist downloads, but needs the
user to find the url to the rss feed for the playlist. Youtube still has
these, although it makes them hard to find.)
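
As a small sketch of the playlist workaround, the command line could be assembled like this. The function name `single_file_command` is hypothetical; the two options are the ones named above, and whether they reliably pick one stable item is exactly the open question already noted.

```python
def single_file_command(url):
    """Build a youtube-dl command line meant to fetch at most one file."""
    return [
        "youtube-dl",
        "--no-playlist",          # not always enough on its own
        "--playlist-items", "0",  # extra guard, as described in the text
        "--",                     # the url follows; it is never parsed as an option
        url,
    ]
```

Building the argument list in one place means every caller gets the same playlist handling.
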
Another gotcha is that youtube-dl's `-o` option does not fully determine the
filename it downloads to. Sometimes it will tack on an additional extension
(seen with youtube videos where it added a ".mkv").
And `--get-filename` does not report the actual filename when that happens.
This seems to be due to format merging by ffmpeg; with `-f best`, it does
not merge and so does not do that.
<https://github.com/rg3/youtube-dl/issues/14864>
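
One possible way to cope with the extra extension, if `-f best` is not used, is to look at what actually landed in the download directory. The sketch below assumes youtube-dl was run with `-o` pointing into a directory that holds nothing else of interest. The helper name `find_downloaded_file` is hypothetical, and this is not taken from the issue linked above.

```python
import os


def find_downloaded_file(directory, expected_name):
    """Return the path of the file youtube-dl wrote, or None.

    Accepts expected_name itself, or expected_name with something
    tacked on after a dot (such as ".mkv"). Partial downloads, which
    youtube-dl names with a ".part" suffix, are ignored.
    """
    exact = os.path.join(directory, expected_name)
    if os.path.isfile(exact):
        return exact
    candidates = sorted(
        name
        for name in os.listdir(directory)
        if name.startswith(expected_name + ".") and not name.endswith(".part")
    )
    if len(candidates) == 1:
        return os.path.join(directory, candidates[0])
    return None  # nothing found, or ambiguous
```

Returning None for the ambiguous case keeps the caller from guessing between several candidate files.
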
Checking free disk space will need a different technique than
git-annex normally uses, because youtube-dl does not provide an easy way to
query for size. Could use `--dump-json`, but that would require downloading
the web page yet again, so too expensive.. and, the json seems to have
"filesize: null" for youtube videos. What does work is the `--max-filesize`
option, which makes youtube-dl abort if the file is too big.
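
A rough sketch of turning free disk space into a `--max-filesize` limit follows. The function name `max_filesize_args` and the `reserve_bytes` parameter are hypothetical, and the size is passed in kibibytes with a `k` suffix, the form shown in youtube-dl's own help text.

```python
import shutil


def max_filesize_args(free_bytes, reserve_bytes=1024 * 1024):
    """Return youtube-dl arguments capping the download at the free
    space minus a reserve, or None if there is no usable space."""
    usable_kib = (free_bytes - reserve_bytes) // 1024
    if usable_kib <= 0:
        return None  # caller should not start the download at all
    return ["--max-filesize", f"{usable_kib}k"]


def max_filesize_args_for(directory):
    """Same, using the free space of the filesystem holding directory."""
    return max_filesize_args(shutil.disk_usage(directory).free)
```

The limit is only as good as the moment it was computed; other writers can still fill the disk while the download runs.
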
> [[done]] --[[Joey]]