Commit graph

552 commits

Author SHA1 Message Date
Joey Hess
afd57e137b
tweak to fix man page indent 2019-05-04 11:45:55 -04:00
Joey Hess
c0c38e986d
added renameremote command 2019-04-15 13:49:03 -04:00
Joey Hess
e40fcf2530
fix name of --trust-glacier command in man page 2019-04-03 13:00:30 -04:00
Joey Hess
d9ee048d85
doc updates for import 2019-03-09 13:10:30 -04:00
Joey Hess
5bac8babdb
doc updated for import tree
Deprecated git annex export --tracking because it makes sense to have a
single configuration of tracking for both imports and exports.
2019-02-23 15:46:03 -04:00
Ilya_Shlyakhter
b131ba57db clarified annex.maxextensionlength 2019-02-13 17:11:34 +00:00
Joey Hess
11d6e2e260
new improved benchmark command that can benchmark anything git-annex does 2019-01-04 13:46:36 -04:00
Joey Hess
904be4e6be
add --branch option to git-annex find and mildly deprecate findref in favor of it
No deprecation warning at run time, just one on the man page.

One thing findref remains able to do that find cannot is to run in a bare
repo. Find was made to refuse to run in a bare repo because it seemed
confusing for it to not list any files ever in that situation. It would be
better for find --branch to work in a bare repo but not without --branch
but I don't currently have a way to do that.

Probably a better solution would be to make git-annex in a bare repo
default to --branch master or something like that instead of --all.

This commit was sponsored by Denis Dzyubenko on Patreon.
2018-12-09 14:10:37 -04:00
Joey Hess
ab7746a2ae
annex.cachecreds: New config to allow disabling of credentials caching for special remotes.
Note that it does not prevent storing p2p access tokens or multicast
encryption keys, since those are not cached; the previous commit
established the distinction.

How well this works depends on how often getRemoteCredPair is called and
how expensive it is. In some cases setting this will result in an annoying
number of gpg password prompts and/or slowdowns due to reading creds
from the git-annex branch and decrypting, which could be improved by calling
getRemoteCredPair less often.

This commit was sponsored by Ilya Shlyakhter on Patreon.
2018-12-04 14:16:56 -04:00
Joey Hess
7540ca6fca
some more v6 -> v7 doc changes 2018-10-26 13:56:36 -04:00
Joey Hess
3af29b3ba9
When annex.thin is set, allow hard links to be made between executable work tree files and annex objects.
This is safe, because while the annex object ends up executable,
there were already at least two other cases where it ended up executable:

1. git add an an executable file
2. chmod +x of a a non-executable worktree file that was hard linked to the
   annex object

After copy/hard link, it always fixes up the permissions to match the mode
of the worktree file, so when an executable annex object gets hard linked
to a non-executable worktree file, its execute bit gets removed.

Commit b7c8bf5274 already *said* it would do
this; I suspect the line of code I've removed was included in that commit
accidentially.

Also improves annex.thin documentation.

This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
2018-10-26 13:51:43 -04:00
Joey Hess
917a2c6095
defer updating unlocked files until after smudge filter
The smuge filter no longer provides git with annexed file content, to
avoid a git memory leak, and because that did not honor annex.thin.

git annex smudge --update has to be run after a checkout to update
unlocked files in the working tree with annexed file contents.

No hooks yet to run it.

This commit was sponsored by Nick Piper on Patreon.
2018-10-25 15:08:20 -04:00
Joey Hess
6ba3dea566
annex.jobs
Added annex.jobs setting, which is like using the -J option.

Of course, -J overrides annex.jobs.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-10-04 12:47:27 -04:00
Joey Hess
bc31b93c77
remote.name.annex-security-allow-unverified-downloads
Added remote.name.annex-security-allow-unverified-downloads, a per-remote
setting for annex.security.allow-unverified-downloads.

This commit was sponsored by Brock Spratlen on Patreon.
2018-09-25 15:34:47 -04:00
Joey Hess
4ecba916a1
annex.maxextensionlength
Added annex.maxextensionlength for use cases where extensions longer than 4
characters are needed.

This commit was sponsored by Henrik Riomar on Patreon.
2018-09-24 12:10:18 -04:00
Joey Hess
f1f3d15404
response 2018-09-12 15:31:03 -04:00
Joey Hess
ae11394efa
added annex.commitmessage
Added annex.commitmessage config that can specify a commit message for the
git-annex branch instead of the usual "update".

This commit was supported by the NSF-funded DataLad project.
2018-08-02 14:06:06 -04:00
Joey Hess
fd5a392006
cache remotes via annex-speculate-present
Added remote.name.annex-speculate-present config that can be used to
make cache remotes.

Implemented it in Remote.keyPossibilities, which is used by the
get/move/copy/mirror commands, and nothing else. This way, things like
whereis will not show content that's speculatively present.

The assistant and sync --content were not using Remote.keyPossibilities,
and were changed to use it.

The efficiency hit should be small; Remote.keyPossibilities is only
used before transferring a file, which is the expensive operation.
And, it's only doing one lookup of the remoteList and a very cheap
filter over it.

Note that, git-annex still updates the location log when copying content
to a remote with annex-speculate-present set. In this case, the location
tracking will indicate that content is present in the remote. This may
not be wanted for caches, or may not be a real problem for them. TBD.

This commit was supported by the NSF-funded DataLad project.
2018-08-01 14:28:05 -04:00
Joey Hess
b657242f5d
enforce retrievalSecurityPolicy
Leveraged the existing verification code by making it also check the
retrievalSecurityPolicy.

Also, prevented getViaTmp from running the download action at all when the
retrievalSecurityPolicy is going to prevent verifying and so storing it.

Added annex.security.allow-unverified-downloads. A per-remote version
would be nice to have too, but would need more plumbing, so KISS.
(Bill the Cat reference not too over the top I hope. The point is to
make this something the user reads the documentation for before using.)

A few calls to verifyKeyContent and getViaTmp, that don't
involve downloads from remotes, have RetrievalAllKeysSecure hard-coded.
It was also hard-coded for P2P.Annex and Command.RecvKey,
to match the values of the corresponding remotes.

A few things use retrieveKeyFile/retrieveKeyFileCheap without going
through getViaTmp.
* Command.Fsck when downloading content from a remote to verify it.
  That content does not get into the annex, so this is ok.
* Command.AddUrl when using a remote to download an url; this is new
  content being added, so this is ok.

This commit was sponsored by Fernando Jimenez on Patreon.
2018-06-21 13:37:01 -04:00
Joey Hess
e00b3ab3d5
doc typo 2018-06-18 15:57:13 -04:00
Joey Hess
3c0a538335
allow ftp urls by default
They're no worse than http certianly. And, the backport of these
security fixes has to deal with wget, which supports http https and ftp
and has no way to turn off individual schemes, so this will make that
easier.
2018-06-18 15:37:17 -04:00
Joey Hess
e62c4543c3
default to not using youtube-dl, for security
Pity, but same reasoning as curl applies to it.

This commit was sponsored by Peter on Patreon.
2018-06-17 14:51:02 -04:00
Joey Hess
b54b2cdc0e
prevent http connections to localhost and private ips by default
Security fix!

* git-annex will refuse to download content from http servers on
  localhost, or any private IP addresses, to prevent accidental
  exposure of internal data. This can be overridden with the
  annex.security.allowed-http-addresses setting.
* Since curl's interface does not have a way to prevent it from accessing
  localhost or private IP addresses, curl defaults to not being used
  for url downloads, even if annex.web-options enabled it before.
  Only when annex.security.allowed-http-addresses=all will curl be used.

Since S3 and WebDav use the Manager, the same policies apply to them too.

youtube-dl is not handled yet, and a http proxy configuration can bypass
these checks too. Those cases are still TBD.

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2018-06-17 13:30:28 -04:00
Joey Hess
28720c795f
limit url downloads to whitelisted schemes
Security fix! Allowing any schemes, particularly file: and
possibly others like scp: allowed file exfiltration by anyone who had
write access to the git repository, since they could add an annexed file
using such an url, or using an url that redirected to such an url,
and wait for the victim to get it into their repository and send them a copy.

* Added annex.security.allowed-url-schemes setting, which defaults
  to only allowing http and https URLs. Note especially that file:/
  is no longer enabled by default.

* Removed annex.web-download-command, since its interface does not allow
  supporting annex.security.allowed-url-schemes across redirects.
  If you used this setting, you may want to instead use annex.web-options
  to pass options to curl.

With annex.web-download-command removed, nearly all url accesses in
git-annex are made via Utility.Url via http-client or curl. http-client
only supports http and https, so no problem there.
(Disabling one and not the other is not implemented.)

Used curl --proto to limit the allowed url schemes.

Note that this will cause git annex fsck --from web to mark files using
a disallowed url scheme as not being present in the web. That seems
acceptable; fsck --from web also does that when a web server is not available.

youtube-dl already disabled file: itself (probably for similar
reasons). The scheme check was also added to youtube-dl urls for
completeness, although that check won't catch any redirects it might
follow. But youtube-dl goes off and does its own thing with other
protocols anyway, so that's fine.

Special remotes that support other domain-specific url schemes are not
affected by this change. In the bittorrent remote, aria2c can still
download magnet: links. The download of the .torrent file is
otherwise now limited by annex.security.allowed-url-schemes.

This does not address any external special remotes that might download
an url themselves. Current thinking is all external special remotes will
need to be audited for this problem, although many of them will use
http libraries that only support http and not curl's menagarie.

The related problem of accessing private localhost and LAN urls is not
addressed by this commit.

This commit was sponsored by Brett Eisenberg on Patreon.
2018-06-16 11:57:50 -04:00
Joey Hess
c34152777b
Use http-conduit for url downloads by default, annex.web-options enables curl
* For url downloads, git-annex now defaults to using a http library,
  rather than wget or curl. But, if annex.web-options is set, it will
  use curl. To use the .netrc file, run:
    git config annex.web-options --netrc
* git-annex no longer uses wget (and wget is no longer shipped with
  git-annex builds).

Note that curl is always run in silent mode, since the new API for
download has a MeterUpdate and doesn't make way for curl progress
output. It might be worth writing a parser for curl's progress output
to update the meter when using it, but I didn't bother with this edge
case for now.

This commit was supported by the NSF-funded DataLad project.
2018-04-06 17:36:20 -04:00
Joey Hess
2927618d35
Added adb special remote which allows exporting files to Android devices.
git annex testremote passes.

exportree not implemented yet, although the documentation talks about it,
since it will be the main way this remote will be used.

The adb push/pull progress is displayed for now; it would be better
to consume it and use it to update the git-annex progress bar.

This commit was sponsored by andrea rota.
2018-03-27 14:54:41 -04:00
Joey Hess
b3dc4ccff5
add retry configuration (not used yet) 2018-03-24 10:37:25 -04:00
Joey Hess
4015c5679a
force verification when resuming download
When resuming a download and not using a rolling checksummer like rsync,
the partial file we start with might contain garbage, in the case where a
file changed as it was being downloaded. So, disabling verification on
resumes risked a bad object being put into the annex.

Even downloads with rsync are currently affected. It didn't seem worth the
added complexity to special case those to prevent verification, especially
since git-annex is using rsync less often now.

This commit was sponsored by Brock Spratlen on Patreon.
2018-03-13 14:50:49 -04:00
Joey Hess
978078f0fe
fix markdown 2018-03-08 12:54:56 -04:00
Joey Hess
09e73a3ab6
annex.merge-annex-branches
Added annex.merge-annex-branches config setting which can be used to
disable automatic merge of git-annex branches.

I wonder if git-annex merge/sync/assistant should disable this
setting? Not sure yet, so have not done so. May be that users will not set
it in git config, but pass it via -c to commands that need it.

Checking the config setting adds a very small overhead, but it's
only checked once per command so should be insignificant.

This commit was supported by the NSF-funded DataLad project.
2018-02-22 14:25:32 -04:00
Joey Hess
addb91b24b
wording 2018-02-22 13:55:09 -04:00
Joey Hess
a28c541e23
add remote.<name>.annex-checkuuid
Added remote.<name>.annex-checkuuid config, which can be set to false to
disable the default checking of the uuid of remotes that point to
directories. This can be useful to avoid unncessary drive spin-ups and
automounting.

Note that the UUID check is still done before writing to the repository,
to avoid writing to the wrong repository if it got relocated. Check is
also done before checkPresent to avoid getting confused about what is in
which repo. This is effectively the same as the use of git-annex-shell
with a uuid to check that the remote repository is the expected one.
Did not bother with the check for retrieveKeyFile because it doesn't
matter if the wrong repo is used then.

This commit was sponsored by Trenton Cronholm on Patreon.
2018-01-10 14:21:18 -04:00
Joey Hess
67338fd7ac
Added inprogress command for accessing files as they are being downloaded.
Chose to make this only handle files actively being downloaded, not temp
files for downloads that were interrupted or files that have been fully
downloaded.

This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-12-28 11:46:39 -04:00
Joey Hess
31b4d7c6d0
pass git config options to youtube-dl --simulate
Decided not to --ignore-config by default. It the user has something in
their youtube-dl config files that breaks git-annex they can configure
it to use that option.
2017-11-29 20:07:03 -04:00
Joey Hess
d6d8f72957
documentation update for youtube-dl
Code not updated yet.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-11-28 14:05:58 -04:00
Joey Hess
e8eacf96d5
Add day to metadata when annex.genmetadata is enabled.
Thanks, Sean T Parsons
2017-10-25 15:11:38 -04:00
Joey Hess
527f734492
configuration and docs for tracking exports
Not yet handled by sync or assistant.

This commit was sponsored by Nick Daly on Patreon.
2017-09-19 13:05:43 -04:00
Joey Hess
8f35c6584d
documentation for export
This commit was sponsored by Ole-Morten Duesund on Patreon.
2017-08-29 13:25:48 -04:00
Joey Hess
d39c120afa
add annex-ignore-command and annex-sync-command configs
Added remote configuration settings annex-ignore-command and
annex-sync-command, which are dynamic equivilants of the annex-ignore
and annex-sync configurations.

For this I needed a new DynamicConfig infrastructure. Its implementation
should be as fast as before when there is no dynamic config, and it caches
so shell commands are only run once.

Note that annex-ignore-command exits nonzero when the remote should be ignored.
While that may seem backwards, it allows using the same command for it as
for annex-sync-command when you want to disable both.

This commit was sponsored by Trenton Cronholm on Patreon.
2017-08-17 13:54:14 -04:00
Joey Hess
2cecc8d2a3
Added GIT_ANNEX_VECTOR_CLOCK environment variable
Can be used to override the default timestamps used in log files in the
git-annex branch. This is a dangerous environment variable; use with
caution.

Note that this only affects writing to the logs on the git-annex branch.
It is not used for metadata in git commits (other env vars can be set for
that).

There are many other places where timestamps are still used, that don't
get committed to git, but do touch disk. Including regular timestamps
of files, and timestamps embedded in some files in .git/annex/, including
the last fsck timestamp and timestamps in transfer log files.

A good way to find such things in git-annex is to get for getPOSIXTime and
getCurrentTime, although some of the results are of course false positives
that never hit disk (unless git-annex gets swapped out..)

So this commit does NOT necessarily make git-annex comply with some HIPPA
privacy regulations; it's up to the user to determine if they can use it in
a way compliant with such regulations.

Benchmarking: It takes 0.00114 milliseconds to call getEnv
"GIT_ANNEX_VECTOR_CLOCK" when that env var is not set. So, 100 thousand log
files can be written with an added overhead of only 0.114 seconds. That
should be by far swamped by the actual overhead of writing the log files
and making the commit containing them.

This commit was supported by the NSF-funded DataLad project.
2017-08-14 14:19:58 -04:00
Joey Hess
94351daba6
configuration to disable automatic merge conflict resolution
* Added annex.resolvemerge configuration, which can be set to false to
  disable the usual automatic merge conflict resolution done by git-annex
  sync and the assistant.
* sync: Added --no-resolvemerge option.

Note that disabling merge conflict resolution is probably not a good idea
in a direct mode repo or adjusted branch. Since updates to both are done
outside the usual work tree, if it fails the tree is not left in a
conflicted state, and it would be hard to manually resolve the conflict.
Still, made annex.resolvemerge be supported in those cases for consistency.

This commit was sponsored by Riku Voipio.
2017-06-01 12:51:01 -04:00
Joey Hess
4c1e3210fa
annex.backend is the new name for what was annex.backends
It takes a single key-value backend, rather than the unncessary and confusing list.
The old option still works if set.

Simplified some old old code too.

This commit was sponsored by Thomas Hochstein on Patreon.
2017-05-09 15:04:07 -04:00
Joey Hess
dccd7ba6d4
response; update man page 2017-05-09 14:02:48 -04:00
Joey Hess
b6f26bac86
Disable git-annex's support for GIT_SSH and GIT_SSH_COMMAND, unless GIT_ANNEX_USE_GIT_SSH=1 is also set in the environment.
This is necessary because as feared, the extra -n parameter that git-annex
passes breaks uses of these environment variables that expect exactly the
parameters that git passes.

For example, see https://github.com/datalad/datalad/issues/1456

It would of course be possible to pre-close stdin before running ssh so not
needing the -n, and I think that would not even break ssh's password
caching. But it would probably involve a lot of work, possibly would need
to deal with some layering violations, and would be error-prone. The really
clean fix would be to make all the ssh stuff return a CreateProcess, which
could have the handle closed when appropriate, but that would be a large
reworing of the code base.

This commit was supported by the NSF-funded DataLad project.
2017-04-07 11:35:27 -04:00
Joey Hess
29e73f76ef
Added remote.<name>.annex-push and remote.<name>.annex-pull
The former can be useful to make remotes that don't get fully synced with
local changes, which comes up in a lot of situations.

The latter was mostly added for symmetry, but could be useful (though less
likely to be).

Implementing `remote.<name>.annex-pull` was a bit tricky, as there's no one
place where git-annex pulls/fetches from remotes. I audited all
instances of "fetch" and "pull". A few cases were left not checking this
config:

* Git.Repair can try to pull missing refs from a remote, and if the local
  repo is corrupted, that seems a reasonable thing to do even though
  the config would normally prevent it.
* Assistant.WebApp.Gpg and Remote.Gcrypt and Remote.Git do fetches
  as part of the setup process of a remote. The config would probably not
  be set then, and having the setup fail seems worse than honoring it if it
  is already set.

I have not prevented all the code that does a "merge" from merging branches
from remotes with remote.<name>.annex-pull=false. That could perhaps
be done, but it would need a way to map from branch name to remote name,
and the way refspecs work makes that hard to get really correct. So if the
user fetches manually, the git-annex branch will get merged, for example.
Anther way of looking at/justifying this is that the setting is called
"annex-pull", not "annex-merge".

This commit was supported by the NSF-funded DataLad project.
2017-04-05 13:22:35 -04:00
Joey Hess
c3970f6c1a
multicast: New command, uses uftp to multicast annexed files, for eg a classroom setting.
This commit was supported by the NSF-funded DataLad project.
2017-03-30 19:35:30 -04:00
Joey Hess
faecd73f32
Support GIT_SSH and GIT_SSH_COMMAND
They are handled close the same as they are by git. However, unlike git,
git-annex sometimes needs to pass the -n parameter when using these.

So, this has the potential for breaking some setup, and perhaps there ought
to be a ANNEX_USE_GIT_SSH=1 needed to use these. But I'd rather avoid that
if possible, so let's see if anyone complains.

Almost all places where "ssh" was run have been changed to support the env
vars. Anything still calling sshOptions does not support them. In
particular, rsync special remotes don't. Seems that annex-rsync-transport
already gives sufficient control there.

(Fixed in passing: Remote.Helper.Ssh.toRepo used to extract
remoteAnnexSshOptions and pass them to sshOptions, which was redundant
since sshOptions also extracts those.)

This commit was sponsored by Jeff Goeke-Smith on Patreon.
2017-03-17 16:20:37 -04:00
Joey Hess
e53070c1ff
inheritable annex.securehashesonly
* init: When annex.securehashesonly has been set with git-annex config,
  copy that value to the annex.securehashesonly git config.
* config --set: As well as setting value in git-annex branch,
  set local gitconfig. This is needed especially for
  annex.securehashesonly, which is read only from local gitconfig and not
  the git-annex branch.

doc/todo/sha1_collision_embedding_in_git-annex_keys.mdwn has the
rationalle for doing it this way. There's no perfect solution; this
seems to be the least-bad one.

This commit was supported by the NSF-funded DataLad project.
2017-02-27 16:08:23 -04:00
Joey Hess
942e0174b3
make fsck check annex.securehashesonly, and new tip for working around SHA1 collisions with git-annex
This commit was sponsored by andrea rota.
2017-02-27 13:55:15 -04:00
Joey Hess
6346704a04
clarify that annex.backends is used when adding new files
Even if annex.backends does not include a backend, that does not prevent
git-annex commands from acting on a file using the missing backend.

(There's really no reason at all for annex.backends to be a list.)
2017-02-24 11:53:59 -04:00