git-annex/doc/todo/git-hook_to_sanity-check_git-annex_branch_pushes.mdwn

IA.BAK and another project both need a way to let untrusted clients push
git-annex branch changes to a central server. The server should only
accept non-malicious pushes; a malicious client could otherwise corrupt
a lot of the information in the branch.

I propose adding a git-annex command that can be used in a git pre-receive
hook to do this. --[[Joey]]
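
As a rough illustration of where such a command would plug in: git feeds a pre-receive hook one line per updated ref on stdin, in the form `<old> <new> <refname>`. A minimal sketch of picking out the git-annex branch updates could look like this. The function name `annex_updates` is made up for the example and is not part of git-annex.

```python
# Sketch only: select the ref updates that touch the git-annex branch.
# `annex_updates` is a hypothetical helper, not an existing git-annex command.

ANNEX_REF = "refs/heads/git-annex"


def annex_updates(lines):
    """Yield (old, new) object names for each update to the git-annex branch.

    `lines` is any iterable of pre-receive input lines, each of the form
    "<old> <new> <refname>".
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            # Not a well-formed update line; git itself never sends these.
            continue
        old, new, ref = parts
        if ref == ANNEX_REF:
            yield old, new
```

In a real hook the iterable would be `sys.stdin`, and each `(old, new)` pair would then be handed to whatever checks are configured; a non-zero exit status from the hook rejects the whole push.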

There are two levels of checking it seems such a command could do:

1. Only allow certain files to be changed. For example, maybe clients are only
   expected to change location tracking files, and the activity.log
   file, but not others like trust.log.

2. Only allow modifications of data about a specific UUID. The UUID
   would be provided to the command (and could be determined based on a
   per-client ssh key or similar).

   The changes to the branch would be checked, so this needs centralized
   knowledge about the format of each file on the branch. I think this
   mostly exists already in Logs.hs.

Of these the second seems more likely to be useful, but the first would
be by far the easier to add. So, do both?
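
The first level could be sketched roughly as follows. This assumes location tracking files live at paths like `xxx/yyy/KEY.log` (two three-character hex directories) and that the list of changed paths has already been obtained from git; both the path pattern and the names `is_allowed_path` and `disallowed_paths` are assumptions for illustration, not anything git-annex provides.

```python
import re

# Assumption: location logs sit under two 3-character hex directories,
# e.g. "a1b/2c3/SHA256E-s10--abc.log". Adjust if the branch layout differs.
LOCATION_LOG = re.compile(r"[0-9a-f]{3}/[0-9a-f]{3}/[^/]+\.log")

# Top-level files a client is expected to change (from the example above).
ALLOWED_FILES = {"activity.log"}


def is_allowed_path(path):
    """Return True if a client may change this file on the branch."""
    return path in ALLOWED_FILES or LOCATION_LOG.fullmatch(path) is not None


def disallowed_paths(changed_paths):
    """Return the changed paths that should cause the push to be rejected."""
    return [p for p in changed_paths if not is_allowed_path(p)]
```

A push would be accepted only when `disallowed_paths` returns an empty list. Note that this is an allow list, so any file not explicitly permitted (trust.log, remote.log, and so on) is refused by default.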

This might be too limiting for some situations:

* If someone has two clients that talk to one another,
  then a push would include changes involving the UUIDs of both clients.
  The command could be given multiple UUIDs to allow, to accommodate
  these kinds of setups.

* A client might add a special remote somewhere, but this would need
  changes to remote.log, which the first level of checking would not allow.
  And, it would add another UUID, which the second level of checking would
  need to be configured to allow.
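
The second level, with more than one permitted UUID as suggested above, might be sketched like this for location logs only. It assumes each log line has the form `<timestamp> <status> <uuid>` with a status of `1`, `0` or `X`, and that the lines added by the push have already been extracted from the diff. The name `rejected_lines` is hypothetical, and other log files would need their own parsers.

```python
# Sketch only: check lines added to a location log against allowed UUIDs.
# Assumed line format: "<timestamp> <status> <uuid>".

VALID_STATUS = {"0", "1", "X"}


def rejected_lines(added_lines, allowed_uuids):
    """Return the added lines that should cause the push to be rejected.

    A line is rejected if it does not parse as a location log line, or if
    it records information about a UUID outside `allowed_uuids`.
    """
    rejected = []
    for line in added_lines:
        fields = line.split()
        if (
            len(fields) != 3
            or fields[1] not in VALID_STATUS
            or fields[2] not in allowed_uuids
        ):
            rejected.append(line)
    return rejected
```

Rejecting unparseable lines, rather than ignoring them, keeps a client from sneaking arbitrary data into a file the check nominally understands.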

Python implementation
---------------------

I started an implementation of this in Python. For technical reasons the git repo is not publicly available, but here's a [dump](http://paste.debian.net/232563/) of the code. I went through what seems to be a rather convoluted process with libgit there because I wanted to have some proper unit tests, and generating git commands by hand in a shell script is rather painful.

Also, it currently adopts a "blocking" approach, i.e. it blocks known problems, but maybe it should be based on an "allow" approach, that is: only allow certain things to go through. So far it only forbids removals and changes to trust.log. A bunch of stuff is still missing, like parameters (to allow changing the list of protected files) and checking the log tracking info. Feedback welcome. --[[anarcat]]
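
For readers who cannot see the dump, the "blocking" approach described above can be illustrated with a small sketch. This is not the code from the dump: it works on the text output of `git diff --name-status <old> <new>` instead of going through libgit, it reads "removals" as file deletions (an assumption), and the name `blocked_changes` is made up for the example.

```python
# Sketch only: a deny-list check over `git diff --name-status` output,
# where each line is "<status>\t<path>" (renames carry two paths).

PROTECTED = {"trust.log"}


def blocked_changes(name_status_output, protected=PROTECTED):
    """Return a list of human-readable reasons to reject the push."""
    problems = []
    for line in name_status_output.splitlines():
        status, *paths = line.split("\t")
        if not paths:
            continue  # blank or unrecognised line
        hits = [p for p in paths if p in protected]
        if status.startswith("D"):
            problems.append(f"removal of {paths[0]}")
        elif hits:
            problems.append(f"change to protected file {hits[0]}")
    return problems
```

Passing a different `protected` set corresponds to the missing parameter for changing the list of protected files. Everything not explicitly listed is still let through, which is the weakness of a blocking approach compared to an allow list.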