Cryptographically secure hashes can be forced to be used in a repository,
by setting annex.securehashesonly. This does not prevent the git repository
from containing files with insecure hashes, but it does prevent the content
of such files from being pulled into .git/annex/objects from another
repository.
We want to make sure that at no point does git-annex accept content into
.git/annex/objects that is hashed with an insecure key. Here's how it
was done:
* .git/annex/objects/xx/yy/KEY/ is kept frozen, so nothing can be
written to it normally
* So every place that writes content must call, thawContent or modifyContent.
We can audit for these, and be sure we've considered all cases.
* The main functions are moveAnnex, and linkToAnnex; these were made to
check annex.securehashesonly, and are the main security boundary
for annex.securehashesonly.
* Most other calls to modifyContent deal with other files in the KEY
directory (inode cache etc). The other ones that mess with the content
are:
- Annex.Direct.toDirectGen, in which content already in the
annex directory is moved to the direct mode file, so not relevant.
- fix and lock, which don't add new content
- Command.ReKey.linkKey, which manually unlocks it to make a
copy.
* All other calls to thawContent appear safe.
Made moveAnnex return a Bool, so checked all callsites and made them
deal with a failure in appropriate ways.
linkToAnnex simply returns LinkAnnexFailed; all callsites already deal
with it failing in appropriate ways.
This commit was sponsored by Riku Voipio.
Note that GPGHMAC keys are not cryptographically secure, because their
content has no relation to the name of the key. So, things that use this
function to avoid sending keys to a remote will need to special case in
support for those keys. If GPGHMAC keys were accepted as
cryptographically secure, symlinks using them could be committed to a
git repo, and their content would be accepted into the repo, with no
guarantee that two repos got the same content, which is what we're aiming
to prevent.
9c4650358c changed the Read instance for
Key.
I've checked all uses of that instance (by removing it and seeing what
breaks), and they're all limited to the webapp, except one.
That is GitAnnexDistribution's Read instance.
So, 9c4650358c would have broken upgrades
of git-annex from downloads.kitenet.net. Once the .info files there got
updated for a new release, old releases would have failed to parse them
and never upgraded.
To fix this, I found a way to make the .info files that contain
GitAnnexDistribution values be readable by the old version of git-annex.
This commit was sponsored by Ewen McNeill.
Where before the "name" of a key and a backend was a string, this makes
it a concrete data type.
This is groundwork for allowing some varieties of keys to be disabled
in file2key, so git-annex won't use them at all.
Benchmarks ran in my big repo:
old git-annex info:
real 0m3.338s
user 0m3.124s
sys 0m0.244s
new git-annex info:
real 0m3.216s
user 0m3.024s
sys 0m0.220s
new git-annex find:
real 0m7.138s
user 0m6.924s
sys 0m0.252s
old git-annex find:
real 0m7.433s
user 0m7.240s
sys 0m0.232s
Surprising result; I'd have expected it to be slower since it now parses
all the key varieties. But, the parser is very simple and perhaps
sharing KeyVarieties uses less memory or something like that.
This commit was supported by the NSF-funded DataLad project.
I am not happy that I had to put backend-specific code in file2key. But
it would be very difficult to avoid this layering violation.
Most of the time, when parsing a Key from a symlink target, git-annex
never looks up its Backend at all, so adding this check to a method of
the Backend object would not work.
The Key could be made to contain the appropriate
Backend, but since Backend is parameterized on an "a" that is fixed to
the Annex monad later, that would need Key to change to "Key a".
The only way to clean this up that I can see would be to have the Key
contain a LowlevelBackend, and put the validation in LowlevelBackend.
Perhaps later, but that would be an extensive change, so let's not do
it in this commit which may want to cherry-pick to backports.
This commit was sponsored by Ethan Aubin.
* Added post-recieve hook, which makes updateInstead work with direct
mode and adjusted branches.
* init: Set up the post-receive hook.
This commit was sponsored by Fernando Jimenez on Patreon.
Most remotes have an idempotent setup that can be reused for
enableremote, but in a few cases, it needs to tell which, and whether
a UUID was provided to setup was used.
This is groundwork for making initremote be able to provide a UUID.
It should not change any behavior.
Note that it would be nice to make the UUID always be provided to setup,
and make setup not need to generate and return a UUID. What prevented
this simplification is Remote.Git.gitSetup, which needs to reuse the
UUID of the git remote when setting it up, and so has to return that
UUID.
This commit was sponsored by Thom May on Patreon.
.. which can be set to true to make git annex sync default to --content.
This may become the default at some point in the future.
As well as being configuable by git config, it can be configured by
git-annex config to control the default behavior in all clones of a
repository.
Had to add a separate --no-content switch to we can tell if it's been
explicitly set, and should override annex.synccontent. If --content was the
default, this complication would not be necessary.
This commit was sponsored by Jake Vosloo on Patreon.
... to control the default behavior in all clones of a repository.
This includes a new Configurable data type, so the GitConfig type indicates
which values can be configured this way.
The implementation should be quite efficient; the config log is only read
once, and only when a Configurable value has not already been set by
git-config.
Indeed, it would be nice in the future to extend this, so that git-config
is itself only read on demand. Some commands may not need to look at the
git configuration at all.
This commit was sponsored by Trenton Cronholm on Patreon.
Turns out that Data.List.Utils.split is slow and makes a lot of
allocations. Here's a much simpler single character splitter that behaves
the same (even in wacky corner cases) while running in half the time and
75% the allocations.
As well as being an optimisation, this helps move toward eliminating use of
missingh.
(Data.List.Split.splitOn is nearly as slow as Data.List.Utils.split and
allocates even more.)
I have not benchmarked the effect on git-annex, but would not be surprised
to see some parsing of eg, large streams from git commands run twice as
fast, and possibly in less memory.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
Almost working, but there's a bug in the relaying.
Also, made tor hidden service setup pick a random port, to make it harder
to port scan.
This commit was sponsored by Boyd Stephen Smith Jr. on Patreon.
For use with tor hidden services, and perhaps other transports later.
Based on Utility.SimpleProtocol, it's a line-based protocol,
interspersed with transfers of bytestrings of a specified size.
Implementation of the local and remote sides of the protocol is done
using a free monad. This lets monadic code be included here, without
tying it to any particular way to get bytes peer-to-peer.
This adds a dependency on the haskell package "free", although that
was probably pulled in transitively from other dependencies already.
This commit was sponsored by Jeff Goeke-Smith on Patreon.
This gets rid of quite a lot of ugly hacks around json generation.
I doubt that any real-world json parsers can parse incomplete objects, so
while it's not as nice to need to wait for the complete object, especially
for commands like `git annex info` that take a while, it doesn't seem worth
the added complexity.
This also causes the order of fields within the json objects to be
reordered. Since any real json parser shouldn't care, the only possible
problem would be with ad-hoc parsers of the old json output.
Avoid threads emitting json at the same time and scrambling, which was
still possible even with the buffering, just less likely.
Converted json IO actions to JSONChunk data too.
This makes -Jn work with --json and --quiet, where before
setting -Jn disabled those options.
Concurrent json output is currently a mess though since threads output
chunks over top of one-another.
And any other messages that might be output before a command starts.
Fixes a reversion introduced in version 5.20150727.
During the optparse-applicative conversion, I needed a place to run
per-command global option setters, and I made it get run during the seek stage. But
that is too late to have --json and --quiet disable output produced in the
check stage. Fix is just to run those per-command global option setters at
the same time as the all-command global option setters.
This commit was sponsored by Thom May.
Note that get --from foo --failed will get things that a previous get --from bar
tried and failed to get, etc. I considered making --failed only retry
transfers from the same remote, but it was easier, and seems more useful,
to not have the same remote requirement.
Noisy due to some refactoring into Types/
metadata --json output format has changed, adding a inner json object
named "fields" which contains only the fields and their values.
This should be easier to parse than the old format, which mixed up
metadata fields with other keys in the json object.
Any consumers of the old format will need to be updated.
This adds a dependency on unordered-containers for parsing MetaData
from JSON, but it's a free dependency; aeson pulls in that library.
Removed the instance LensGpgEncParams RemoteConfig because it encouraged
code that does not take the RemoteGitConfig into account.
RemoteType's setup was changed to take a RemoteGitConfig,
although the only place that is able to provide a non-empty one is
enableremote, when it's changing an existing remote. This led to several
folow-on changes, and got RemoteGitConfig plumbed through.
This is useful for makking a special remote that anyone with a clone of the
repo and your public keys can upload files to, but only you can decrypt the
files stored in it.
The naming is unofrtunately not consistent, but the gnupg-options
were only used for encrypting, and it's too late to change that.
It would be nice to have a third setting that is always passed to gnupg,
but ~/.gnupg/options can be used to specify such global options when really
needed.
* add, addurl, import, importfeed: When in a v6 repository on a crippled
filesystem, add files unlocked.
* annex.addunlocked: New configuration setting, makes files always be
added unlocked. (v6 only)
Instead -J will behave as if it was built without concurrent-output support
in this situation. Ie, it will be mostly quiet, except when there's an
error.
Note that it's not a problem for a filename to contain invalid utf-8 when
in a utf-8 locale. That is handled ok by concurrent-output. It's only
displaying unicode characters in a non-unicode locale that doesn't work.
This allows things like Command.Find to use noMessages and generate their
own complete json objects. Previouly, Command.Find managed that only via a
hack, which wasn't compatable with batch mode.
Only Command.Find, Command.Smudge, and Commange.Status use noMessages
currently, and none except for Command.Find are impacted by this change.
Fixes find --json --batch output
Decided it's too scary to make v6 unlocked files have 1 copy by default,
but that should be available to those who need it. This is consistent with
git-annex not dropping unused content without --force, etc.
* Added annex.thin setting, which makes unlocked files in v6 repositories
be hard linked to their content, instead of a copy. This saves disk
space but means any modification of an unlocked file will lose the local
(and possibly only) copy of the old version.
* Enable annex.thin by default on upgrade from direct mode to v6, since
direct mode made the same tradeoff.
* fix: Adjusts unlocked files as configured by annex.thin.