# NAME

git-annex addurl - add urls to annex

# SYNOPSIS

git annex addurl `[url ...]`

# DESCRIPTION

Downloads each url to its own file, which is added to the annex.

When `yt-dlp` is installed, it can be used to check for a video
embedded in a web page at the url, and that is added to the annex instead.
(However, this is disabled by default as it can be a security risk.
See the documentation of annex.security.allowed-ip-addresses
in [[git-annex]](1) for details.)

Special remotes can add other special handling of particular urls. For
example, the bittorrent special remote makes urls to torrent files
(including magnet links) download the content of the torrent,
using `aria2c`.

Normally the filename is based on the full url, so will look like
"www.example.com_dir_subdir_bigfile". In some cases, addurl is able to
come up with a better filename based on other information. Options can also
be used to get better filenames.
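
The filename rules can be illustrated with a rough Python sketch. It is
not git-annex's implementation: the function name `url_to_filename` is
made up for this example, and it is modelled only on the sample names
given in this page for the default case and for the `--pathdepth` option
described under OPTIONS. Query strings, sanitizing of unusual characters
and truncation are not modelled.

```python
from urllib.parse import urlsplit


def url_to_filename(url, pathdepth=None):
    """Approximate the filename addurl would pick for a url.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # Host followed by the non-empty path components, for example
    # ["www.example.com", "dir", "subdir", "bigfile"].
    bits = [parts.netloc] + [p for p in parts.path.split("/") if p]
    if pathdepth is None:
        # Default: the whole url flattened into a single name.
        return "_".join(bits)
    if pathdepth == 0:
        # This page does not say what a depth of 0 means, so the
        # sketch refuses it rather than guessing.
        raise ValueError("pathdepth 0 is not covered by this sketch")
    # A positive depth skips that many leading parts; a negative depth
    # keeps that many trailing parts. A Python slice does both.
    selected = bits[pathdepth:]
    if not selected:
        raise ValueError("pathdepth leaves no path components")
    return "/".join(selected)


if __name__ == "__main__":
    url = "http://www.example.com/dir/subdir/bigfile"
    for depth in (None, 1, 3, -2):
        print(depth, url_to_filename(url, depth))
```

Running the file prints the name chosen for each depth, which can be
compared against the examples given for `--pathdepth`.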

# OPTIONS

* `--fast`

  Avoid immediately downloading the url. The url is still checked
  (via HEAD) to verify that it exists, and to get its size if possible.

* `--relaxed`

  Don't immediately download the url, and avoid storing the size of the
  url's content. This makes git-annex accept whatever content is there
  at a future point.

  This is the fastest option, but it still has to access the network
  to check if the url contains embedded media. When adding large numbers
  of urls, using `--relaxed --raw` is much faster.

* `--verifiable` `-V`

  This can be used with the `--fast` or `--relaxed` option. It improves
  the safety of the resulting annexed file, by letting its content be
  verified with a checksum when it is transferred between git-annex
  repositories, as well as by things like `git-annex fsck`.

  When used with --relaxed, content from the web will always be accepted,
  even if it has changed, and its checksum is recorded for later
  verification.

  When used with --fast, the checksum is recorded the first time the
  content is downloaded from the web. Once a checksum has been recorded,
  subsequent downloads from the web must have the same checksum.

  When addurl was used without this option before, the file it added
  can be converted to be verifiable by migrating it to the VURL backend.
  For example: `git-annex migrate foo --backend=VURL`

* `--raw`

  Prevent special handling of urls by yt-dlp, and by bittorrent
  and other special remotes. This will, for example, make addurl
  download the .torrent file and not the contents it points to.

* `--no-raw`

  Require content pointed to by the url to be downloaded using yt-dlp
  or a special remote, rather than the raw content of the url. If that
  cannot be done, the add will fail.

* `--raw-except=remote`

  Prevent special handling of urls by all special remotes except
  for the specified one. To allow special handling only
  by yt-dlp, use `--raw-except=web`.

* `--file=name`

  Use with a filename that does not yet exist to add a new file
  with the specified name and the content downloaded from the url.

  If the file already exists, addurl will record that it can be downloaded
  from the specified url(s).

* `--preserve-filename`

  When the web server (or torrent, etc) provides a filename, use it as-is,
  avoiding sanitizing unusual characters, or truncating it to length, or any
  other modifications.

  git-annex will still check the filename for safety, and if the filename
  has a security problem such as path traversal or a control character,
  it will refuse to add it.

* `--pathdepth=N`

  Rather than basing the filename on the whole url, this causes a path to
  be constructed, starting at the specified depth within the path of the
  url.

  For example, adding the url http://www.example.com/dir/subdir/bigfile
  with `--pathdepth=1` will use "dir/subdir/bigfile",
  while `--pathdepth=3` will use "bigfile".

  It can also be negative; `--pathdepth=-2` will use the last
  two parts of the url.

* `--prefix=foo` `--suffix=bar`

  Use to adjust the filenames that are created by addurl. For example,
  `--suffix=.mp3` can be used to add an extension to the file.

* `--no-check-gitignore`

  By default, gitignores are honored and it will refuse to download an
  url to a file that would be ignored. This makes such files be added
  despite any ignores.

* `--jobs=N` `-JN`

  Enables parallel downloads when multiple urls are being added.
  For example: `-J4`

  Setting this to "cpus" will run one job per CPU core.

* `--batch`

  Enables batch mode, in which lines containing urls to add are read from
  stdin.

* `-z`

  Makes the `--batch` input be delimited by nulls instead of the usual
  newlines.

* `--with-files`

  When batch mode is enabled, makes it parse lines of the form: "$url $file"

  That adds the specified url to the specified file, downloading its
  content if the file does not yet exist; the same as
  `git annex addurl $url --file $file`

* `--json`

  Enable JSON output. This is intended to be parsed by programs that use
  git-annex. Each line of output is a JSON object.

* `--json-progress`

  Include progress objects in JSON output.

* `--json-error-messages`

  Messages that would normally be output to standard error are included in
  the JSON instead.

* `--backend`

  Specifies which key-value backend to use.

* Also the [[git-annex-common-options]](1) can be used.
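
As a rough illustration of how a program might drive the `--batch`,
`--with-files`, `-z` and `--json` options together, here is a minimal
Python sketch. The helper names (`batch_input`, `parse_json_lines`,
`addurl_batch`) are invented for this example and are not part of
git-annex. It assumes `git-annex` is on the PATH, that it runs inside an
existing git-annex repository, and that with `-z` each "$url $file" pair
is still separated by a space while the pairs themselves are separated by
nulls. The fields inside each JSON object are not described on this page,
so the sketch only decodes them.

```python
import json
import subprocess


def batch_input(pairs, null_delimited=False):
    """Format (url, file) pairs as "$url $file" items for --with-files."""
    # -z switches the item delimiter from newline to null.
    delimiter = "\0" if null_delimited else "\n"
    return "".join(f"{url} {file}{delimiter}" for url, file in pairs)


def parse_json_lines(output):
    """Decode --json output: one JSON object per non-empty line."""
    return [json.loads(line) for line in output.split("\n") if line.strip()]


def addurl_batch(pairs, repo=".", null_delimited=False):
    """Run addurl in batch mode inside repo and return the decoded objects."""
    cmd = ["git", "annex", "addurl", "--batch", "--with-files",
           "--json", "--json-error-messages"]
    if null_delimited:
        cmd.append("-z")
    result = subprocess.run(
        cmd,
        input=batch_input(pairs, null_delimited),
        cwd=repo,
        capture_output=True,
        text=True,
        check=False,  # failures are reported in the JSON, not via exceptions
    )
    return parse_json_lines(result.stdout)
```

A caller could then use something like
`addurl_batch([("http://www.example.com/dir/subdir/bigfile", "bigfile")])`
and inspect the returned objects. Passing `--json-error-messages` keeps
error text in the same stream, so nothing needs to be scraped from
standard error.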

# CAVEATS

If annex.largefiles is configured, and does not match a file, `git annex
addurl` will add the non-large file directly to the git repository,
instead of to the annex. However, this is not done when --fast or --relaxed
is used.

# SEE ALSO

[[git-annex]](1)

[[git-annex-rmurl]](1)

[[git-annex-registerurl]](1)

[[git-annex-importfeed]](1)

# AUTHOR

Joey Hess <id@joeyh.name>

Warning: Automatically converted into a man page by mdwn2man. Edit with care.