# NAME

git-annex importfeed - import files from podcast feeds

# SYNOPSIS

git annex importfeed `[url ...]`

# DESCRIPTION

Imports the contents of podcasts and other RSS and Atom feeds. Only
downloads files whose content has not already been added to the repository
before, so you can delete, rename, etc. the resulting files and repeated
runs won't duplicate them.

When `yt-dlp` is installed, it can be used to download links in the feed.
This allows importing e.g., YouTube playlists.
(However, this is disabled by default as it can be a security risk.
See the documentation of annex.security.allowed-ip-addresses
in [[git-annex]](1) for details.)

To make the import process add metadata to the imported files from the feed,
run `git config annex.genmetadata true`

By default, the downloaded files are put in a directory with the title
of the feed, and files are named based on the title of the item in the
feed. This can be changed using the --template option.

Existing files are not overwritten by this command. If "some feed/foo.mp3"
already exists, it will instead write to "some feed/2\_foo.mp3"
(or 3, 4, etc.). Sometimes a feed will change an item's url,
resulting in the new url being downloaded to such a filename.

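The numbering rule for existing files can be sketched in Python. This is only an illustration of the naming scheme described above, not git-annex's own code; the `unused_name` function and the `existing` set (standing in for the files already present in the working tree) are hypothetical.

```python
import posixpath


def unused_name(path, existing):
    """Return `path`, or a numbered variant if `path` is already taken.

    `existing` is a set of paths already in use (an assumption standing in
    for the working tree). Numbering starts at 2, as in "2_foo.mp3".
    """
    if path not in existing:
        return path
    directory, base = posixpath.split(path)
    n = 2
    while True:
        candidate = posixpath.join(directory, f"{n}_{base}")
        if candidate not in existing:
            return candidate
        n += 1


if __name__ == "__main__":
    taken = {"some feed/foo.mp3"}
    print(unused_name("some feed/foo.mp3", taken))
```

The prefix is applied to the file name only, so the file stays in the same feed directory.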
# OPTIONS

* `--force`

  Force downloading items it's seen before.

* `--fast`, `--relaxed`, `--verifiable`, `--raw`, `--raw-except`

  These options behave the same as when using [[git-annex-addurl]](1).

* `--fast`

  Avoid immediately downloading urls. The url is still checked
  (via HEAD) to verify that it exists, and to get its size if possible.

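As a rough illustration of what a HEAD check can learn without downloading the content, the following Python sketch builds a HEAD request and reads the size from the `Content-Length` header when the server sends one. This is not git-annex's code; the helper names `head_request`, `size_from_headers` and `probe` are hypothetical.

```python
import urllib.request


def head_request(url):
    """Build a request that asks only for headers, not the body."""
    return urllib.request.Request(url, method="HEAD")


def size_from_headers(headers):
    """Return the advertised size in bytes, or None if it is unknown."""
    value = headers.get("Content-Length")
    if value is None:
        return None
    try:
        return int(value)
    except ValueError:
        return None


def probe(url, timeout=30):
    """Check that the url exists and return its size if the server reports it.

    urlopen raises an exception (for example on a 404 response) when the
    url cannot be retrieved, which is how a missing url shows up here.
    """
    with urllib.request.urlopen(head_request(url), timeout=timeout) as response:
        return size_from_headers(response.headers)
```

Not every server reports a size in response to HEAD, which is why the size is only available "if possible".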
* `--relaxed`

  Don't immediately download urls, and avoid storing the size of the
  url's content. This makes git-annex accept whatever content is there
  at a future point.

* `--raw`

  Prevent special handling of urls by yt-dlp, bittorrent, and other
  special remotes. This will, for example, make importfeed
  download a .torrent file and not the contents it points to.

* `--no-raw`

  Require content pointed to by the url to be downloaded using yt-dlp
  or a special remote, rather than the raw content of the url. If that
  cannot be done, the import will fail, and the next import of the feed
  will retry.

* `--scrape`

  Rather than downloading the url and parsing it as an rss/atom feed
  to find files to import, uses yt-dlp to screen scrape the equivalent
  of a feed, and imports what it finds.

* `--template`

  Controls where the files are stored.

  The default template is '${feedtitle}/${itemtitle}${extension}'

  The available variables in the template include these that
  are information about the feed: feedtitle, feedauthor, feedurl

  And these that are information about individual items in the feed:
  itemtitle, itemauthor, itemsummary, itemdescription, itemrights,
  itemid.

  Also, title is itemtitle but falls back to feedtitle if the item has no
  title, and author is itemauthor but falls back to feedauthor.

  (All of the above are also added as metadata when annex.genmetadata is
  set.)

  The extension variable is the extension of the file in the feed,
  or sometimes ".m" if no extension can be determined.

  The template also has some variables for when an item was published.

  itempubyear (YYYY), itempubmonth (MM), itempubday (DD), itempubhour (HH),
  itempubminute (MM), itempubsecond (SS),
  itempubdate (YYYY-MM-DD or if the feed's date cannot be parsed, the raw
  value from the feed).

  (These use the UTC time zone, not the local time zone.)

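The `${name}` placeholders happen to match the syntax of Python's `string.Template`, so template expansion can be sketched as follows. This is an illustration only, not how git-annex implements it: the `expand` and `pubdate_fields` helpers and the feed values are made up, and the sketch does not reproduce any filename sanitizing that git-annex may apply.

```python
from datetime import datetime, timedelta, timezone
from string import Template

DEFAULT_TEMPLATE = "${feedtitle}/${itemtitle}${extension}"


def pubdate_fields(published):
    """Derive the itempub* variables from a timezone-aware datetime.

    The value is converted to UTC first, so the fields do not depend on
    the local time zone.
    """
    utc = published.astimezone(timezone.utc)
    return {
        "itempubyear": utc.strftime("%Y"),
        "itempubmonth": utc.strftime("%m"),
        "itempubday": utc.strftime("%d"),
        "itempubhour": utc.strftime("%H"),
        "itempubminute": utc.strftime("%M"),
        "itempubsecond": utc.strftime("%S"),
        "itempubdate": utc.strftime("%Y-%m-%d"),
    }


def expand(template, fields):
    """Fill in a template; raises KeyError if a variable is missing."""
    return Template(template).substitute(fields)


if __name__ == "__main__":
    # Hypothetical item published late in the evening, five hours behind UTC.
    published = datetime(2024, 1, 30, 23, 30, 0,
                         tzinfo=timezone(timedelta(hours=-5)))
    fields = {
        "feedtitle": "My Podcast",
        "itemtitle": "Episode 1",
        "extension": ".mp3",
    }
    fields.update(pubdate_fields(published))
    print(expand(DEFAULT_TEMPLATE, fields))
    print(expand("${feedtitle}/${itempubdate}-${itemtitle}${extension}", fields))
```

Because the publication time is converted to UTC, an item published late in the day in a western time zone can land on the following calendar date.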
* `--no-check-gitignore`

  By default, gitignores are honored and it will refuse to download a
  url to a file that would be ignored. This makes such files be added
  despite any ignores.

* `--jobs=N` `-JN`

  Runs multiple downloads in parallel. For example: `-J4`

  Setting this to "cpus" will run one job per CPU core.

* `--backend`

  Specifies which key-value backend to use.

* `--json`

  Enable JSON output. This is intended to be parsed by programs that use
  git-annex. Each line of output is a JSON object.

* `--json-progress`

  Include progress objects in JSON output.

* `--json-error-messages`

  Messages that would normally be output to standard error are included in
  the JSON instead.

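Since each line of the JSON output is a separate object, a consuming program can parse it line by line. The Python sketch below shows one way to do that. The field names used here (`success`, `file`) and the sample lines are assumptions for illustration; inspect the real output to see which fields are present.

```python
import json


def parse_json_lines(output):
    """Parse text containing one JSON object per line into a list of dicts."""
    records = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        records.append(json.loads(line))
    return records


def failed(records):
    """Return records not marked successful ("success" is an assumed field)."""
    return [r for r in records if not r.get("success", False)]


if __name__ == "__main__":
    # Hypothetical sample; real output may contain different fields.
    sample = "\n".join([
        '{"file": "some feed/foo.mp3", "success": true}',
        '{"file": "some feed/bar.mp3", "success": false}',
    ])
    for record in failed(parse_json_lines(sample)):
        print("failed:", record.get("file"))
```

Parsing each line separately means results can be processed as they arrive, without waiting for the whole command to finish.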
* Also the [[git-annex-common-options]](1) can be used.

# SEE ALSO

[[git-annex]](1)

[[git-annex-addurl]](1)

# AUTHOR

Joey Hess <id@joeyh.name>

Warning: Automatically converted into a man page by mdwn2man. Edit with care.