I noticed git-annex was using a lot of CPU when downloading from youtube,
and was not displaying progress. Turns out that yt-dlp (and I think also
youtube-dl) sometimes only knows an estimated size, not the actual size,
and displays the progress output slightly differently for that. That broke
the parser. And, the parser was feeding chunks that failed to parse back
as a remainder, which caused it to try to re-parse the entire output each
time, so it got slower and slower.
Using --progress-template like this should avoid parsing problems as well
as future proof against output changes. But it will work with only yt-dlp.
So, this seemed like the right time to deprecate youtube-dl, and default
to yt-dlp when available.
git-annex will still use youtube-dl if that's all that's available.
However, since the progress parser for youtube-dl was buggy, and I don't
want to maintain two different progress parsers (especially since
youtube-dl is no longer in debian unstable having been replaced by
yt-dlp), made git-annex no longer try to parse youtube-dl's progress.
Also, updated docs for yt-dlp being default. It did not seem worth
renaming annex.youtube-dl-options and annex.youtube-dl-command.
Note that yt-dlp does not seem to document the fields available in the
progress template. I found them by reading the source and looking at
the templates it uses internally. Also note that the use of "i" (rather
than "s") in progressTemplate makes it display floats rounded to integers;
particularly the estimated total size can be a float. That also does not
seem to be documented but I assume is a python thing?
Sponsored-by: Joshua Antonishen on Patreon
Note that when this is specified and an older git-annex is used to
enableremote such a special remote, it will simply ignore the cost= field
and use whatever the default cost is.
In passing, fixed adb to support the remote.name.cost and
remote.name.cost-command configs.
Sponsored-by: Dartmouth College's DANDI project
When a web special remote does not have urlinclude/urlexclude
configured, make it respect the configuration of other web special
remotes and avoid using urls that match the config of another.
Note that the other web special remote does not have to be enabled.
That seems ok, it would have been extra work to check for only ones that
are enabled.
The implementation does mean that the web special remote re-parses
its own config once at startup, as well as re-parsing the configs of any
other web special remotes. This should be a very small slowdown
unless there are lots of web special remotes.
Sponsored-by: Dartmouth College's DANDI project