git-annex/Annex
Joey Hess f2db6da938
default to yt-dlp and fix progress parsing bugs
I noticed git-annex was using a lot of CPU when downloading from youtube,
and was not displaying progress. Turns out that yt-dlp (and I think also
youtube-dl) sometimes only knows an estimated size, not the actual size,
and displays the progress output slightly differently for that. That broke
the parser. And, the parser was feeding chunks that failed to parse back
as a remainder, which caused it to try to re-parse the entire output each
time, so it got slower and slower.
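
As a rough illustration of the fix described above (not the actual code in YoutubeDl.hs), an incremental parser can keep only the trailing incomplete line as the remainder and drop complete lines that fail to parse, so nothing is ever re-parsed. The "N bytes" line format and the names splitComplete, parseBytes and feed are hypothetical, chosen only for this sketch:

```haskell
import Data.Char (isDigit)
import Data.Maybe (mapMaybe)

-- Split input on '\r' or '\n'. Returns the complete lines and the
-- trailing fragment that has not yet been terminated.
splitComplete :: String -> ([String], String)
splitComplete s = case break isEOL s of
    (frag, []) -> ([], frag)
    (l, _ : rest) ->
        let (ls, frag) = splitComplete rest
        in (l : ls, frag)
  where
    isEOL c = c == '\r' || c == '\n'

-- Hypothetical line format for this sketch: "<digits> bytes".
parseBytes :: String -> Maybe Integer
parseBytes l = case words l of
    [n, "bytes"] | not (null n) && all isDigit n -> Just (read n)
    _ -> Nothing

-- Feed one chunk of output. Complete lines that fail to parse are
-- discarded; only the unterminated tail is carried to the next call,
-- so the remainder stays small no matter how much output arrives.
feed :: String -> String -> ([Integer], String)
feed remainder chunk =
    let (ls, frag) = splitComplete (remainder ++ chunk)
    in (mapMaybe parseBytes ls, frag)
```

With this shape, feed "" "10 bytes\rgarbage\r20 by" yields the value 10 and carries over only "20 by"; the unparseable "garbage" line is dropped rather than fed back.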

Using --progress-template like this should avoid parsing problems and
future-proof against output changes. But it only works with yt-dlp.

So, this seemed like the right time to deprecate youtube-dl, and default
to yt-dlp when available.

git-annex will still use youtube-dl if that's all that's available.
However, since the progress parser for youtube-dl was buggy, and I don't
want to maintain two different progress parsers (especially since
youtube-dl is no longer in Debian unstable, having been replaced by
yt-dlp), made git-annex no longer try to parse youtube-dl's progress.

Also, updated docs for yt-dlp being default. It did not seem worth
renaming annex.youtube-dl-options and annex.youtube-dl-command.

Note that yt-dlp does not seem to document the fields available in the
progress template. I found them by reading the source and looking at
the templates it uses internally. Also note that the use of "i" (rather
than "s") in progressTemplate makes it display floats rounded to integers;
particularly the estimated total size can be a float. That also does not
seem to be documented, but I assume it is a Python thing?
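
A minimal sketch of why a fixed template makes parsing simple, under several assumptions: the template is taken to emit a marker word followed by three integer fields (downloaded bytes, estimated total, actual total), and a field yt-dlp does not know is assumed to be printed as a non-numeric placeholder such as "NA". The "ANNEX" marker, the field order, and the names Progress and parseProgressLine are illustrative and may not match progressTemplate in YoutubeDl.hs:

```haskell
import Control.Applicative ((<|>))
import Text.Read (readMaybe)

-- Progress as recovered from one templated output line.
data Progress = Progress
    { downloaded :: Integer
    , total :: Maybe Integer
    } deriving (Eq, Show)

-- Assumed line shape: "ANNEX <downloaded> <estimated total> <total>".
-- The actual total is preferred; the estimate is used as a fallback.
-- Any field that is not an integer (e.g. "NA") reads as Nothing.
parseProgressLine :: String -> Maybe Progress
parseProgressLine l = case words l of
    ["ANNEX", d, est, tot] -> do
        n <- readMaybe d
        Just (Progress n (readMaybe tot <|> readMaybe est))
    _ -> Nothing
```

Because every line has the same whitespace-separated shape, lines that do not start with the marker can simply be ignored, and the estimated-size case needs no separate parser.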

Sponsored-by: Joshua Antonishen on Patreon
2023-05-27 13:04:53 -04:00
AdjustedBranch filter out control characters in all other Messages 2023-04-11 12:58:01 -04:00
Branch handle transitions with read-only unmerged git-annex branches 2021-12-28 13:23:32 -04:00
Concurrent differentiate between concurrency enabled at command line and by git config 2020-09-16 11:47:12 -04:00
Content filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Debug implement fastDebug 2021-04-06 15:24:28 -04:00
LockPool avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
MetaData update licenses from GPL to AGPL 2019-03-13 15:48:14 -04:00
SpecialRemote configremote 2023-04-18 15:30:49 -04:00
VectorClock deal better with clock skew situations, using vector clocks 2021-08-04 12:33:46 -04:00
View annex.maxextensionlength for view 2023-03-24 14:01:38 -04:00
Action.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
AdjustedBranch.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
AutoMerge.hs Windows: Support long filenames in more (possibly all) of the code 2023-03-01 15:55:58 -04:00
BloomFilter.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Branch.hs filter out control characters in all other Messages 2023-04-11 12:58:01 -04:00
BranchState.hs disable journalIgnorable in enableInteractiveBranchAccess 2022-07-15 13:48:41 -04:00
CatFile.hs read a consistent amount from pointer file 2022-02-23 12:52:34 -04:00
ChangedRefs.hs Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
CheckAttr.hs mincopies 2021-01-06 14:15:19 -04:00
CheckIgnore.hs move several readonly values to AnnexRead 2022-06-28 15:40:19 -04:00
Common.hs rename Git.Filename to Git.Quote 2023-04-12 17:22:03 -04:00
Concurrent.hs use ResourcePool for hash-object handles 2022-07-25 17:32:39 -04:00
Content.hs avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
CopyFile.hs Copy with a reflink when exporting a tree to a directory special remote 2023-03-28 13:09:14 -04:00
CurrentBranch.hs refactor getCurrentBranch 2018-10-19 17:29:18 -04:00
Debug.hs fix fastDebug to check if debugging is actually enabled 2021-04-06 16:28:37 -04:00
Difference.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
DirHashes.hs Added http special remote, which is useful for accessing other remotes that publish content stored in them via http/https. 2020-09-01 15:16:35 -04:00
Drop.hs prevent numcopies or mincopies being configured to 0 2022-03-28 15:20:34 -04:00
Environment.hs improve comments 2023-04-04 15:23:39 -04:00
Export.hs rename Git.Filename to Git.Quote 2023-04-12 17:22:03 -04:00
ExternalAddonProcess.hs use fastDebug everywhere it can be used 2021-04-06 15:41:24 -04:00
FileMatcher.hs Support "inbackend" in preferred content expressions 2022-09-26 16:06:49 -04:00
Fixup.hs fix a bug that prevented git-annex init from working in a submodule 2021-01-21 15:33:15 -04:00
GitOverlay.hs filter out control characters in error messages 2023-04-10 13:50:51 -04:00
HashObject.hs use ResourcePool for hash-object handles 2022-07-25 17:32:39 -04:00
Hook.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Import.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Ingest.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Init.hs filter out control characters in all other Messages 2023-04-11 12:58:01 -04:00
InodeSentinal.hs fix perms for core.sharedRepository 2023-04-26 16:29:11 -04:00
Journal.hs fix perms for core.sharedRepository 2023-04-26 16:29:11 -04:00
Link.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Locations.hs Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
LockFile.hs avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
LockPool.hs update licenses from GPL to AGPL 2019-03-13 15:48:14 -04:00
Magic.hs Serialize use of C magic library, which is not thread safe. 2020-09-17 17:27:42 -04:00
MetaData.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
Multicast.hs use programPath consistently, not readProgramFile 2020-03-30 16:06:27 -04:00
Notification.hs fix build when dbus is enabled 2022-07-05 13:06:45 -04:00
NumCopies.hs filter out control characters in all other Messages 2023-04-11 12:58:01 -04:00
Path.hs Apply codespell -w throughout 2023-03-17 15:14:58 -04:00
Perms.hs avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
PidLock.hs fix windows build 2022-09-26 12:08:04 -04:00
Queue.hs add restage log 2022-09-23 15:47:24 -04:00
RemoteTrackingBranch.hs refactor 2019-11-11 19:10:52 -04:00
ReplaceFile.hs improve createDirectoryUnder to allow alternate top directories 2022-08-12 12:52:37 -04:00
SpecialRemote.hs init: Avoid autoenabling special remotes that have control characters in their names 2023-04-12 12:37:12 -04:00
Ssh.hs avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
StallDetection.hs bwlimit 2021-09-21 16:58:10 -04:00
TaggedPush.hs Ref ByteString conversion done 2020-04-07 17:41:09 -04:00
Tmp.hs Windows: Support long filenames in more (possibly all) of the code 2023-03-01 15:55:58 -04:00
Transfer.hs avoid annexFileMode special case 2023-04-27 15:58:37 -04:00
TransferrerPool.hs avoid build warning on windows 2023-03-27 12:19:26 -04:00
UntrustedFilePath.hs fix mojibake reversion in display of utf8 2023-04-12 13:53:30 -04:00
UpdateInstead.hs v7 for all repositories 2019-08-30 14:09:14 -04:00
Url.hs filter out control characters in warning messages 2023-04-10 15:55:44 -04:00
UUID.hs simplify and speed up Utility.FileSystemEncoding 2021-08-11 12:13:31 -04:00
VariantFile.hs more RawFilePath 2019-12-18 17:10:28 -04:00
VectorClock.hs deal better with clock skew situations, using vector clocks 2021-08-04 12:33:46 -04:00
Verify.hs filter out control characters in all other Messages 2023-04-11 12:58:01 -04:00
Version.hs v8 repositories automatically upgrade to v9 2022-07-25 16:20:04 -04:00
View.hs annex.maxextensionlength for view 2023-03-24 14:01:38 -04:00
Wanted.hs new matching options --want-get-by and --want-drop-by 2022-07-28 13:26:03 -04:00
WorkerPool.hs start splitting out readonly values from AnnexState 2021-04-02 15:51:44 -04:00
WorkTree.hs use lookupKeyStaged in --batch code paths 2022-10-26 14:43:06 -04:00
YoutubeDl.hs default to yt-dlp and fix progress parsing bugs 2023-05-27 13:04:53 -04:00