default to yt-dlp and fix progress parsing bugs

I noticed git-annex was using a lot of CPU when downloading from youtube,
and was not displaying progress. Turns out that yt-dlp (and I think also
youtube-dl) sometimes only knows an estimated size, not the actual size,
and displays the progress output slightly differently for that. That broke
the parser. And, the parser was feeding chunks that failed to parse back
as a remainder, which caused it to try to re-parse the entire output each
time, so it got slower and slower.
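
The slowdown described above can be sketched in a few lines. This is only an illustration in Python, not git-annex's actual (Haskell) parser; the `feed` function and the percentage pattern are hypothetical. The point is that only a trailing incomplete fragment is carried over as the remainder, while complete lines that fail to parse are dropped rather than fed back in.

```python
import re

# Hypothetical pattern for a progress line such as "[download]  42.0% of ~10.00MiB".
# The real output format varies, which is what broke the old parser.
PERCENT = re.compile(r"(\d+(?:\.\d+)?)%")

def feed(remainder, chunk):
    """Parse newly arrived output; return (percentages, new_remainder)."""
    data = remainder + chunk
    # Progress lines are redrawn with carriage returns, so split on both.
    *complete, tail = re.split(r"[\r\n]", data)
    found = []
    for line in complete:
        match = PERCENT.search(line)
        if match:
            found.append(float(match.group(1)))
        # A complete line that fails to parse is dropped here. Carrying it
        # over as remainder would make every later call rescan it, so the
        # work per chunk would grow with the total output seen so far.
    return found, tail
```

With this shape the remainder never grows beyond one partial line, so the cost per chunk stays roughly constant.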

Using --progress-template like this should avoid parsing problems, as well
as future-proof against output changes. But it works only with yt-dlp.
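
To show why a fixed template is easier to consume, here is a hedged Python sketch of a parser for template-shaped lines. The line layout (`ANNEX <downloaded> <estimate> <total> ANNEX`), the `NA` placeholder for a missing field, and the name `parse_progress` are assumptions made for illustration; they are not taken from this commit, and git-annex's real parser is written in Haskell.

```python
def parse_progress(line):
    """Return (downloaded_bytes, total_bytes_or_None), or None if not a progress line.

    Assumes lines shaped like "ANNEX 1024 NA 4096 ANNEX", where a missing
    value shows up as a non-numeric placeholder such as "NA".
    """
    fields = line.split()
    if len(fields) != 5 or fields[0] != "ANNEX" or fields[4] != "ANNEX":
        return None
    downloaded, estimate, total = fields[1:4]
    if not downloaded.isdecimal():
        return None
    # Prefer the exact size; fall back to the estimate when it is missing.
    size = next((int(f) for f in (total, estimate) if f.isdecimal()), None)
    return int(downloaded), size
```

Because the markers and field order are fixed by the caller, anything that does not match can be ignored outright instead of being guessed at.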

So, this seemed like the right time to deprecate youtube-dl, and default
to yt-dlp when available.

git-annex will still use youtube-dl if that's all that's available.
However, since the progress parser for youtube-dl was buggy, and I don't
want to maintain two different progress parsers (especially since
youtube-dl is no longer in Debian unstable, having been replaced by
yt-dlp), git-annex no longer tries to parse youtube-dl's progress output.

Also, updated docs for yt-dlp being default. It did not seem worth
renaming annex.youtube-dl-options and annex.youtube-dl-command.

Note that yt-dlp does not seem to document the fields available in the
progress template. I found them by reading the source and looking at
the templates it uses internally. Also note that the use of "i" (rather
than "s") in progressTemplate makes it display floats as integers;
the estimated total size in particular can be a float. That does not
seem to be documented either, but I assume it is a Python thing?
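
On the question above: in plain Python %-formatting, "i" (like "d") accepts a float and prints only its integer part, truncating toward zero, while "s" prints the float as is. The sketch below shows just that operator; whether yt-dlp's templates pass values through it unchanged is an assumption, and the dictionary keys are made up for the example.

```python
# Hypothetical progress values; the estimate is a float, as noted above.
progress = {"downloaded_bytes": 1048576, "total_bytes_estimate": 10485760.75}

# "i" converts the float to an integer by dropping the fractional part.
as_int = "%(downloaded_bytes)i %(total_bytes_estimate)i" % progress

# "s" keeps the float representation, decimal point included.
as_str = "%(downloaded_bytes)s %(total_bytes_estimate)s" % progress

print(as_int)
print(as_str)
```

Under that assumption the integer form is the easier one to parse, since a consumer only ever has to read whole numbers.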

Sponsored-by: Joshua Antonishen on Patreon
Joey Hess 2023-05-27 12:45:16 -04:00
parent f1cdb79ca4
commit f2db6da938
7 changed files with 79 additions and 73 deletions


@@ -74,7 +74,7 @@ and transferring to your laptop on demand.
## youtube channels
You can also use `git annex importfeed` on youtube channels.
-It will use youtube-dl to automatically
+It will use yt-dlp to automatically
download the videos.
To download a youtube channel, you need to find the feed associated with that
@@ -84,7 +84,7 @@ manually. For a channel url like
"https://www.youtube.com/channel/$foo", the
feed is "https://www.youtube.com/feeds/videos.xml?channel_id=$foo"
-Use of youtube-dl is disabled by default as it can be a security risk.
+Use of yt-dlp is disabled by default as it can be a security risk.
See the documentation of annex.security.allowed-ip-addresses
in [[git-annex]] for details.)


@@ -75,9 +75,9 @@ number takes that many paths from the end.
<a name=videos></a>
There's support for downloading videos from sites like YouTube, Vimeo,
-and many more. This relies on youtube-dl to download the videos.
+and many more. This relies on yt-dlp to download the videos.
-When you have youtube-dl installed, you can just
+When you have yt-dlp installed, you can just
`git annex addurl http://youtube.com/foo` and it will detect that
it is a video and download the video content for offline viewing.
@@ -86,16 +86,14 @@ See the documentation of annex.security.allowed-ip-addresses
in [[git-annex]] for details.)
Later, in another clone of the repository, you can run `git annex get` on
-the file and it will also be downloaded with youtube-dl. This works
+the file and it will also be downloaded with yt-dlp. This works
even if the video host has transcoded or otherwise changed the video
in the meantime; the assumption is that these video files are equivalent.
There is an `annex.youtube-dl-options` configuration setting that can be used
-to pass parameters to quvi. For example, you could set `git config
+to pass parameters to yt-dlp. For example, you could set `git config
annex.youtube-dl-options "--format worst"` to configure it to download low
-quality videos from YouTube. Note that the youtube-dl configuration files
-are not read when git-annex runs youtube-dl, to avoid config settings that
-break its integration.
+quality videos from YouTube.
To download a youtube channel, you need to find the RSS feed associated with
that channel, and pass it to `git annex importfeed`. There does not seem to