Commit graph

25 commits

Author SHA1 Message Date
Joey Hess
0f7143d226
support VURL backend
Not yet implemented is recording hashes on download from web and
verifying hashes.

addurl --verifiable option added with -V short option because I
expect a lot of people will want to use this.

It seems likely that --verifiable will become the default eventually,
and possibly rather soon. While old git-annex versions don't support
VURL, that doesn't prevent using them with keys that use VURL. Of
course, they won't verify the content on transfer, and fsck will warn
that it doesn't know about VURL. So there's not much problem with
starting to use VURL even when interoperating with old versions.

Sponsored-by: Joshua Antonishen on Patreon
2024-02-29 13:48:51 -04:00
Joey Hess
6b38d0c427
addurl, importfeed: Added --raw-except option
--raw-except=web allows using yt-dlp but not any other special remotes.

Currently this option can only be used once, trying to use it repeatedly
will make option parsing fail. Perhaps it ought to support being used more
than once, but it seemed like an unlikely use case to need that.

Note that getParsed is called repeatedly when the option is used with
several urls. While implementing DeferredParseClass would avoid that
innefficiency, it didn't seem worth the added boilerplate since
getParsed only calls byNameWithUUID which does minimal work.

Sponsored-by: Dartmouth College's DANDI project
2024-02-05 15:16:25 -04:00
Joey Hess
90db97d9a2
importfeed: Added --scrape option
Which uses yt-dlp to screen scrape the equivilant of an RSS feed.

Note that youtubedlscraped is a speed optimisation. Since yt-dlp found
the urls, we know it can download them. That avoids calling
youtubeDlSupported on each url, which makes --fast a lot faster.

Almost all the same metadata fields and file formatting fields are
populated, when yt-dlp is able to get the data. Note that yt-dlp has some
additional useful metadata that could be exposed. But, much of it is
specific to particular websites, and it would be hard to document on the
git-annex importfeed man page.

Sponsored-by: unqueued on Patreon
2024-01-30 15:37:29 -04:00
Joey Hess
51b24aac91
importfeed: Add feedurl to the metadata
(And allow it to be used in the --template although that seems unlikely to
be very useful.)

My use case for this is that one of the podcast feeds I subscribe to is
sometimes leaking episodes of some other podcast. The other podcast is also
very close to spam, so this may be a form of intentional spamming. I have
not been able to catch the podcast feed containing those episodes, so I
don't know which one is at fault. So putting this in the metadata will let
me eventually catch it.
2023-07-06 00:11:38 -04:00
Joey Hess
f2db6da938
default to yt-dlp and fix progress parsing bugs
I noticed git-annex was using a lot of CPU when downloading from youtube,
and was not displaying progress. Turns out that yt-dlp (and I think also
youtube-dl) sometimes only knows an estimated size, not the actual size,
and displays the progress output slightly differently for that. That broke
the parser. And, the parser was feeding chunks that failed to parse back
as a remainder, which caused it to try to re-parse the entire output each
time, so it got slower and slower.

Using --progress-template like this should avoid parsing problems as well
as future proof against output changes. But it will work with only yt-dlp.

So, this seemed like the right time to deprecate youtube-dl, and default
to yt-dlp when available.

git-annex will still use youtube-dl if that's all that's available.
However, since the progress parser for youtube-dl was buggy, and I don't
want to maintain two different progress parsers (especially since
youtube-dl is no longer in debian unstable having been replaced by
yt-dlp), made git-annex no longer try to parse youtube-dl's progress.

Also, updated docs for yt-dlp being default. It did not seem worth
renaming annex.youtube-dl-options and annex.youtube-dl-command.

Note that yt-dlp does not seem to document the fields available in the
progress template. I found them by reading the source and looking at
the templates it uses internally. Also note that the use of "i" (rather
than "s") in progressTemplate makes it display floats rounded to integers;
particularly the estimated total size can be a float. That also does not
seem to be documented but I assume is a python thing?

Sponsored-by: Joshua Antonishen on Patreon
2023-05-27 13:04:53 -04:00
Joey Hess
7919349cee
importfeed: Support --json and --json-error-messages and --json-progress
Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:51:16 -04:00
Joey Hess
04ee6c4c6b
importfeed: Support -J (and work toward supporting --json)
Both -J and --json needed importfeed to be refactored to use commandAction.

That was difficult, because of the interrelated nature of downloading feeds
and then downloading files from feeds, both of which needed to use
commandAction. And then checking for problems in feeds has to come after
these actions, which may be run as background jobs.

As for --json support, it's most of the way there, but still has some
warts, so I didn't enable jsonOptions yet. The warts include:

- An initial empty json record is displayed by getCache.
- Input is not populated, should be feed url
- feedProblem at end will not be captured by --json-error-messages
  (see FIXME)

Sponsored-By: the NIH-funded NICEMAN (ReproNim TR&D3) project
2023-05-09 16:13:56 -04:00
Joey Hess
149d12f188
support --backend again in addurl and importfeed
Missed these two when converting from a global option.

Sponsored-by: Dartmouth College's Datalad project
2022-07-05 15:35:43 -04:00
Joey Hess
b8e32e200e
addurl, importfeed: Added --no-raw option
Forces eg, download with youtube-dl without falling back to raw download.

Since youtube-dl failing due to an url not being supported is difficult to
distinguish from it failing due to being blocked in some way, this can be
useful to avoid the fallback of git-annex downloading the raw web page and
adding that.

Since --raw also prevents using special remotes, --no-raw also
allows special remote downloads. Although it's always possible that some
special remote may claim an url and fall back to raw download of the
content, which --no-raw cannot prevent.

Sponsored-by: Boyd Stephen Smith Jr. on Patreon
2021-06-27 11:14:51 -04:00
Joey Hess
b184fc490a
split out common options to its own page and mention it on each subcommand page
Sometimes users would get confused because an option they were looking
for was not mentioned on a subcommand's man page, and they had not
noticed that the main git-annex man page had a list of common options.
This change lets each subcommand mention the common options, similarly
to how the matching options are handled.

This commit was sponsored by Svenne Krap on Patreon.
2021-05-10 15:00:13 -04:00
Joey Hess
d0b06c17c0
Added --no-check-gitignore option for finer grained control than using --force.
add, addurl, importfeed, import: Added --no-check-gitignore option
for finer grained control than using --force.

(--force is used for too many different things, and at least one
of these also uses it for something else. I would like to reduce
--force's footprint until it only forces drops or a few other data
losses. For now, --force still disables checking ignores too.)

addunused: Don't check .gitignores when adding files. This is a behavior
change, but I justify it by analogy with git add of a gitignored file
adding it, asking to add all unused files back should add them all back,
not skip some. The old behavior was surprising.

In Command.Lock and Command.ReKey, CheckGitIgnore False does not change
behavior, it only makes explicit what is done. Since these commands are run
on annexed files, the file is already checked into git, so git add won't
check ignores.
2020-09-18 13:19:13 -04:00
Joey Hess
bf2aa50646
document the 2_ files 2020-06-24 14:31:46 -04:00
Joey Hess
4229713e63
importfeed: Added some additional --template variables for date and time
This commit was sponsored by Ethan Aubin.
2020-06-24 14:24:50 -04:00
Joey Hess
57edd81849
improve docs of template variables 2020-06-24 13:30:36 -04:00
Joey Hess
1871295765
rename annex.security.allowed-http-addresses
Renamed annex.security.allowed-http-addresses to
annex.security.allowed-ip-addresses because it is not really specific to
the http protocol, also limiting eg, git-annex's use of ftp and via
youtube-dl, several other protocols.

The old name for the config will still work.

If both old and new name are set, the new name will win.
2019-05-30 12:43:40 -04:00
Joey Hess
e62c4543c3
default to not using youtube-dl, for security
Pity, but same reasoning as curl applies to it.

This commit was sponsored by Peter on Patreon.
2018-06-17 14:51:02 -04:00
Joey Hess
24f27ec39d
convert importfeed to youtube-dl
Fully working, including --fast/--relaxed.

Note that, while git-annex addurl --relaxed is not going to check
youtube-dl, I kept git annex importfeed --relaxed checking it.
Thinking is that, let's not break people's importfeed cron jobs, and
importfeed does not typically have to check a large number of new items,
so it's ok if it's a little bit slower when used with youtube playlist
feeds.

importfeed's behavior is also improved (?) when a feed has links in it
to non-media files. Before, those were skipped. Now, the content of the
link is downloaded. This had to be done, because trying to use
youtube-dl is slow, and if those were skipped, it would have to check
every time importfeed was run. While this behavior change may not be
desirable for some feeds, that intersperse links to web pages with
enclosures, it will be desirable for other feeds, that have
non-enclosure directy links to media files.

Remove old quvi modules.

This commit was sponsored by Øyvind Andersen Holm.
2017-11-29 17:30:02 -04:00
Joey Hess
d6d8f72957
documentation update for youtube-dl
Code not updated yet.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-11-28 14:05:58 -04:00
Øyvind A. Holm
67f7de5986 doc/*.mdwn: Minor fixes (typos, letter case) 2015-07-26 04:21:06 +02:00
Joey Hess
a3c6762636 fix typo 2015-07-21 12:20:48 -04:00
Øyvind A. Holm
3d643a5707 doc/*.mdwn: Various typo fixes 2015-05-30 16:54:14 +02:00
Antoine Beaupré
21ec5872f4 expand manpages cross-references significantly
i found that most man pages only had references to the main git-annex
manpage, which i stillfind pretty huge and hard to navigate through.

i tried to sift through all the man pages and add cross-references
between relevant pages. my general rule of thumb is that links should
be both ways unless one of the pages is a more general page that would
become ridiculously huge if all backlinks would be added
(git-annex-preferred-content comes to mind).

i have also make the links one per line as this is how it was done in
the metadata pages so far.

i did everything but the plumbing, utility and test commands, although
some of those are linked from the other commands so cross-links were
added there as well.
2015-05-29 12:12:55 -04:00
Joey Hess
aaeefaa74f update importfeed man page, including a mention of annex.genmetadata 2015-03-31 13:48:13 -04:00
Joey Hess
9e25cbde20 importfeed: Avoid downloading a redundant item from a feed whose guid has been downloaded before, even when the url has changed.
To support this, always store itemid in metadata; before this was only done
when annex.genmetadata was set.
2015-03-31 13:30:13 -04:00
Joey Hess
daec4b007a splitting up the man page
Common command man pages all split out and often expanded.

A few sections split out into their own pages.

Still need to do all the other commands..
2015-03-23 15:36:10 -04:00