Merge branch 'git-lfs'

Joey Hess 2019-08-05 13:44:04 -04:00
commit 3e0770e800
GPG key ID: DB12DB0FF05F8F38
19 changed files with 1217 additions and 108 deletions


@ -1546,71 +1546,71 @@ Here are all the supported configuration settings.
For example, to use the wipe command, set it to `wipe -f %file`.
* `remote.<name>.rsyncurl`
* `remote.<name>.annex-rsyncurl`
Used by rsync special remotes, this configures
the location of the rsync repository to use. Normally this is automatically
set up by `git annex initremote`, but you can change it if needed.
* `remote.<name>.buprepo`
* `remote.<name>.annex-buprepo`
Used by bup special remotes, this configures
the location of the bup repository to use. Normally this is automatically
set up by `git annex initremote`, but you can change it if needed.
* `remote.<name>.ddarrepo`
* `remote.<name>.annex-ddarrepo`
Used by ddar special remotes, this configures
the location of the ddar repository to use. Normally this is automatically
set up by `git annex initremote`, but you can change it if needed.
* `remote.<name>.directory`
* `remote.<name>.annex-directory`
Used by directory special remotes, this configures
the location of the directory where annexed files are stored for this
remote. Normally this is automatically set up by `git annex initremote`,
but you can change it if needed.
* `remote.<name>.adb`
* `remote.<name>.annex-adb`
Used to identify remotes on Android devices accessed via adb.
Normally this is automatically set up by `git annex initremote`.
* `remote.<name>.androiddirectory`
* `remote.<name>.annex-androiddirectory`
Used by adb special remotes, this is the directory on the Android
device where files are stored for this remote. Normally this is
automatically set up by `git annex initremote`, but you can change
it if needed.
* `remote.<name>.androidserial`
* `remote.<name>.annex-androidserial`
Used by adb special remotes, this is the serial number of the Android
device used by the remote. Normally this is automatically set up by
`git annex initremote`, but you can change it if needed, eg when
upgrading to a new Android device.
* `remote.<name>.s3`
* `remote.<name>.annex-s3`
Used to identify Amazon S3 special remotes.
Normally this is automatically set up by `git annex initremote`.
* `remote.<name>.glacier`
* `remote.<name>.annex-glacier`
Used to identify Amazon Glacier special remotes.
Normally this is automatically set up by `git annex initremote`.
* `remote.<name>.webdav`
* `remote.<name>.annex-webdav`
Used to identify webdav special remotes.
Normally this is automatically set up by `git annex initremote`.
* `remote.<name>.tahoe`
* `remote.<name>.annex-tahoe`
Used to identify tahoe special remotes.
Points to the configuration directory for tahoe.
* `remote.<name>.gcrypt`
* `remote.<name>.annex-gcrypt`
Used to identify gcrypt special remotes.
Normally this is automatically set up by `git annex initremote`.
@ -1619,7 +1619,14 @@ Here are all the supported configuration settings.
If the gcrypt remote is accessible over ssh and has git-annex-shell
available to manage it, it's set to "shell".
* `remote.<name>.hooktype`, `remote.<name>.externaltype`
* `remote.<name>.annex-git-lfs`
Used to identify git-lfs special remotes.
Normally this is automatically set up by `git annex initremote`.
It is set to "true" if this is a git-lfs remote.
* `remote.<name>.annex-hooktype`, `remote.<name>.annex-externaltype`
Used by hook special remotes and external special remotes to record
the type of the remote.
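As a quick check, the `annex-git-lfs` setting described above can be read back with plain `git config`. This is a minimal sketch, assuming a remote named `lfs` (a placeholder); the helper `is_git_lfs_remote` is hypothetical and not part of git-annex.

```shell
# Hypothetical helper: succeeds if remote.<name>.annex-git-lfs is "true"
# in the git config of the current repository.
is_git_lfs_remote() {
    [ "$(git config --get "remote.$1.annex-git-lfs" 2>/dev/null)" = "true" ]
}

# Usage, assuming a remote named "lfs" (placeholder name):
if is_git_lfs_remote lfs; then
    echo "lfs is a git-lfs special remote"
fi
```

Since `git annex initremote` normally sets this for you, reading it is mainly useful when debugging a remote that git-annex does not treat as expected.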


@ -15,6 +15,7 @@ the git history is not stored in them.
* [[ddar]]
* [[directory]]
* [[gcrypt]] (encrypted git repositories!)
* [[git-lfs]]
* [[hook]]
* [[rclone]]
* [[rsync]]


@ -4,6 +4,12 @@ remote allows git-annex to also store its files in such repositories.
Naturally, git-annex encrypts the files it stores too, so everything
stored on the remote is encrypted.
This special remote needs the server hosting the remote repository
to either have git-annex-shell or rsync accessible via ssh. git-annex
uses those to store its content in the remote. If the remote repository
is instead hosted on a server using git-lfs, you can use the [[git-lfs]]
special remote instead of this one; it also supports using gcrypt.
See [[tips/fully_encrypted_git_repositories_with_gcrypt]] for some examples
of using gcrypt.
@ -35,11 +41,12 @@ shell access, and `rsync` must be installed. Those are the minimum
requirements, but it's also recommended to install git-annex on the remote
server, so that [[git-annex-shell]] can be used.
While you can use git-remote-gcrypt with servers like github, git-annex
can't store files on them. In such a case, you can just use
git-remote-gcrypt directly.
If you can't run `rsync` or `git-annex-shell` on the remote server,
you can't use this special remote. Other options are the [[git-lfs]]
special remote, which can also be combined with gcrypt, or
using git-remote-gcrypt to encrypt a remote that git-annex cannot use.
If you use encryption=hybrid, you can add more gpg keys that can access
If you use encryption=hybrid, you can later add more gpg keys that can access
the files git-annex stored in the gcrypt repository. However, due to the
way git-remote-gcrypt encrypts the git repository, you will need to somehow
force it to re-push everything again, so that the encrypted repository can


@ -0,0 +1,101 @@
git-annex has a special remote that lets it store content in git-lfs
repositories.
See [[tips/storing_data_in_git-lfs]] for some examples of how to use this.
## configuration
These parameters can be passed to `git annex initremote` to configure
the git-lfs special remote:
* `url` - Required. The url to the git-lfs repository to use.
Can be either a ssh url (scp-style is also accepted) or a http url.
But currently, a http url accesses the git-lfs repository without
authentication. To authenticate, you will need to use a ssh url.
This parameter needs to be specified in the initial `git annex
initremote` but also each time you `git annex enableremote`
an existing git-lfs special remote. It's fine to use different urls
at different times as long as they point to the same git-lfs repository.
* `encryption` - One of "none", "hybrid", "shared", or "pubkey".
Required. See [[encryption]]. Also see the encryption notes below.
* `keyid` - Specifies the gpg key to use for encrypting both the files
git-annex stores in the repository and, when using gcrypt, the git
repository itself. May be repeated when
multiple participants should have access to the repository.
## efficiency note
Since git-lfs uses SHA256 checksums, git-annex needs to keep track of the
SHA256 of content stored in it, in order to be able to retrieve that
content. When a git-annex key uses a [[backend|backends]]
of SHA256 or SHA256E, that's easy. But, if a git-annex key uses some
other backend, git-annex has to additionally store the SHA256 checksum
into the git-annex branch when storing content in git-lfs. That adds a
small bit of size overhead to using this remote.
When encrypting data sent to the git-lfs remote, git-annex always has to
store its SHA256 checksum in the git-annex branch.
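To illustrate why SHA256 and SHA256E keys are the easy case: git-annex key names have the form `BACKEND-sSIZE--HASH` (with an extension appended for the "E" backends), so the checksum git-lfs needs is already part of the key. The sketch below is only an illustration in shell, not git-annex's actual implementation; the helper name `sha256_from_key` is hypothetical.

```shell
# Hypothetical helper: print the SHA256 embedded in a git-annex key name,
# or fail if the key uses a backend other than SHA256 or SHA256E.
sha256_from_key() {
    case "$1" in
        SHA256-*--*|SHA256E-*--*)
            rest=${1#*--}                 # drop the "BACKEND-sSIZE--" prefix
            printf '%s\n' "${rest%%.*}"   # drop any file extension
            ;;
        *)
            return 1
            ;;
    esac
}

# The checksum can be read straight out of a SHA256E key:
sha256_from_key "SHA256E-s5--2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824.txt"

# A key from another backend carries no SHA256, so it has to be recorded separately:
sha256_from_key "MD5-s5--5d41402abc4b2a76b9719d911017c592" \
    || echo "no SHA256 in this key"
```

For keys of the second kind, the extra checksum recorded in the git-annex branch is the size overhead mentioned above.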
## encryption notes
To encrypt a git-lfs repository, there are two separate things that
have to be encrypted: the data git-annex stores there, and the content
of the git repository itself. After all, a git-lfs remote is a git remote
and git push doesn't encrypt data by default.
To encrypt your git pushes, you can use
[git-remote-gcrypt](https://spwhitton.name/tech/code/git-remote-gcrypt/)
and prefix the repository url with "gcrypt::".
To make git-annex encrypt the data it stores, you can use the encryption=
configuration.
An example of combining the two:
git annex initremote lfstest type=git-lfs url=gcrypt::git@github.com:username/somerepo.git encryption=shared
In that example, the git-annex shared encryption key is stored in
git, but that's ok because git push will encrypt it, along with all
the other git data, using your gpg key. You could instead use
"encryption=shared keyid=" to make git-annex and gcrypt both encrypt
to a specified gpg key.
git-annex will detect if one part of the repository is encrypted,
but you forgot to encrypt the other part, and will refuse to set up
such an insecure half-encrypted repository.
If you use encryption=hybrid, you can later add more gpg keys that can access
the files git-annex stored in the git-lfs repository. However, due to the
way git-remote-gcrypt encrypts the git repository, you will need to somehow
force it to re-push everything again, so that the encrypted repository can
be decrypted by the added keys. Probably this can be done by setting
`GCRYPT_FULL_REPACK` and doing a forced push of branches.
git-annex will set `remote.<name>.gcrypt-publish-participants` when setting
up a repository that uses gcrypt. This is done to avoid unnecessary gpg
passphrase prompts, but it does publish the gpg keyids that can decrypt the
repository. Unset it if you need to obscure that.
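Unsetting the gcrypt participants setting can be done with plain `git config`. A minimal sketch, assuming a remote named `lfstest` as in the example earlier on this page; the helper name `obscure_gcrypt_participants` is hypothetical.

```shell
# Hypothetical helper: remove remote.<name>.gcrypt-publish-participants
# from the git config of the current repository.
# "git config --unset" exits non-zero when the setting is already absent,
# so that case is deliberately ignored here.
obscure_gcrypt_participants() {
    git config --unset "remote.$1.gcrypt-publish-participants" || true
}
```

Running `obscure_gcrypt_participants lfstest` inside the repository removes the setting for that remote. Expect more gpg passphrase prompts afterwards, since avoiding those is the reason git-annex sets it.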
## limitations
The git-lfs protocol does not support deleting content, so git-annex
**cannot delete anything** from a git-lfs special remote.
The git-lfs protocol does not support resuming uploads, and so an
interrupted upload will have to restart from the beginning. Interrupted
downloads will resume.
git-lfs has a concept of git ref based access control, so a user may only
be able to send content associated with a particular git ref. git-annex
does not currently provide any git ref, so it won't work with a git-lfs
server that enforces such access control.
git-annex only supports the "basic" git-lfs transfer adapter, but that's
the one used by most git-lfs servers.
The git-lfs protocol is designed around batching of transfers, but
git-annex doesn't do batching. This may cause it to fall afoul of
rate limiting of git-lfs servers when transferring a lot of files.
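To make the last few limitations concrete, here is a sketch of the kind of request body the git-lfs batch API accepts, restricted to what this page describes: a single object per request, only the "basic" transfer adapter, and no `ref` field. The helper name `lfs_batch_request` is hypothetical, and this illustrates the protocol rather than reproducing exactly what git-annex sends.

```shell
# Print a git-lfs batch API request body that lists a single object.
# $1 = operation ("download" or "upload")
# $2 = SHA256 of the object (its git-lfs oid)
# $3 = size of the object in bytes
lfs_batch_request() {
    printf '{"operation":"%s","transfers":["basic"],"objects":[{"oid":"%s","size":%s}]}\n' \
        "$1" "$2" "$3"
}

# Example: ask to download one 5 byte object.
lfs_batch_request download \
    2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 5
```

A git-lfs client POSTs a body like this to the `objects/batch` endpoint of the server's LFS url. With one object listed per request, transferring many files means many requests, which is where server-side rate limiting can come into play.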


@ -59,4 +59,9 @@ Walltime,
Caleb Allen,
TD,
Pedro Araújo,
Ryan Newton,
David W,
L N D,
EVAN HENSHAWPLATH,
James Read,
Luke Shumaker,


@ -1,8 +1,7 @@
[git-remote-gcrypt](https://spwhitton.name/tech/code/git-remote-gcrypt/)
adds support for encrypted remotes to git. The git-annex
[[gcrypt special remote|special_remotes/gcrypt]] allows git-annex to
also store its files in such repositories. Naturally, git-annex encrypts
the files it stores too, so everything stored on the remote is encrypted.
adds support for encrypted remotes to git. Combine this with git-annex
encrypting the files it stores in a remote, and you can fully encrypt
all the data stored on a remote.
Here are some ways you can use this awesome stuff..
@ -15,7 +14,12 @@ repositories.
## prerequisites
* Install [git-remote-gcrypt](https://spwhitton.name/tech/code/git-remote-gcrypt/)
* Install git-annex version 4.20130909 or newer.
* Set up a gpg key. You might consider generating a special purpose key
just for this use case, since you may end up wanting to put the key
on multiple machines that you would not trust with your main gpg key.
The examples below use "$mykey" where you should put your gpg keyid.
## encrypted backup drive
@ -24,18 +28,18 @@ both the full contents of your git repository, and all the files you
instruct git-annex to store on it, and everything will be encrypted so that
only you can see it.
First, you need to set up a gpg key. You might consider generating a
special purpose key just for this use case, since you may end up wanting to
put the key on multiple machines that you would not trust with your
main gpg key.
You need to tell git-annex the keyid of the key when setting up the
encrypted repository:
Here's how to set up the encrypted repository:
git init --bare /mnt/encryptedbackup
git annex initremote encryptedbackup type=gcrypt gitrepo=/mnt/encryptedbackup keyid=$mykey
git annex sync encryptedbackup
(Remember to replace "$mykey" with the keyid of your gpg key.)
This uses the [[gcrypt special remote|special_remotes/gcrypt]] to encrypt
pushes to the git remote, and git-annex will also encrypt the files it
stores there.
Now you can copy (or even move) files to the repository. After
sending files to it, you'll probably want to do a sync, which pushes
the git repository changes to it as well.
@ -62,23 +66,25 @@ the gpg key used to encrypt it, and then:
## encrypted git-annex repository on a ssh server
If you have a ssh server that has rsync installed, you can set up an
encrypted repository there. Works just like the encrypted drive except
without the cable.
If you have a ssh server that has git-annex or rsync installed on it, you
can set up an encrypted repository there. Works just like the encrypted
drive except without the cable.
First, on the server, run:
git init --bare encryptedrepo
(Also, install git-annex on the server if it's possible & easy to do so.
While this will work without git-annex being installed on the server, it
is recommended to have it installed.)
Now, in your existing git-annex repository, set up the encrypted remote:
git annex initremote encryptedrepo type=gcrypt gitrepo=ssh://my.server/home/me/encryptedrepo keyid=$mykey
git annex sync encryptedrepo
(Remember to replace "$mykey" with the keyid of your gpg key.)
This uses the [[gcrypt special remote|special_remotes/gcrypt]] to encrypt
pushes to the git remote, and git-annex will also encrypt the files it
stores there.
If you're going to be sharing this repository with others, be sure to also
include their keyids, by specifying keyid= repeatedly.
@ -97,11 +103,31 @@ used to encrypt it can check it out:
git annex enableremote encryptedrepo gitrepo=ssh://my.server/home/me/encryptedrepo
git annex get --from encryptedrepo
## private encrypted git remote on hosting site
## private encrypted git remote on a git-lfs hosting site
Some git repository hosting sites do not support git-annex, but do support
the similar git-lfs for storing large files alongside a git repository.
git-annex can use the git-lfs protocol to store files in such repositories,
and with gcrypt, everything stored in the remote can be encrypted.
First, make a new, empty git repository on the hosting site.
Get the ssh clone url for the repository, which might look
like "git@github.com:username/somerepo.git".
Then, in your git-annex repository, set up the encrypted remote:
git annex initremote lfstest type=git-lfs url=gcrypt::git@github.com:username/somerepo.git keyid=$mykey
(Remember to replace "$mykey" with the keyid of your gpg key.)
This uses the [[git-lfs special remote|special_remotes/git-lfs]], and the
`gcrypt::` prefix on the url makes pushes be encrypted with gcrypt.
## private encrypted git remote on a git hosting site
You can use gcrypt to store your git repository in encrypted form on any
hosting site that supports git. Only you can decrypt its contents.
Using it this way, git-annex does not store large files on the hosting site; it's
hosting site that supports git. Only you can decrypt its contents. Using it
this way, git-annex does not store large files on the hosting site; it's
only used to store your git repository itself.
git remote add encrypted gcrypt::ssh://hostingsite/myrepo.git
@ -115,7 +141,7 @@ url you used when setting it up:
git clone gcrypt::ssh://hostingsite/myrepo.git
## multiuser encrypted git remote on hosting site
## multiuser encrypted git remote on a git hosting site
Suppose two users want to share an encrypted git remote. Both of you
need to set up the remote, and configure gcrypt to encrypt it so that both


@ -0,0 +1,34 @@
git-annex can store data in [git-lfs](https://git-lfs.github.com/)
repositories, using the [[git-lfs special remote|special_remotes/git-lfs]].
You do not need the git-lfs program installed to use it, just a recent
enough version of git-annex.
Here's how to initialize a git-lfs special remote on GitHub:
git annex initremote lfs type=git-lfs encryption=none url=git@github.com:yourname/yourrepo.git
In this example, the remote will not be encrypted, so anyone who can access
it can see its contents. It is possible to encrypt everything stored in a
git-lfs remote; see [[fully_encrypted_git_repositories_with_gcrypt]].
Once the git-lfs remote is set up, git-annex can store and retrieve
content in the usual ways:
git annex copy * --to lfs
git annex get --from lfs
But, git-annex **cannot delete anything** from a git-lfs special remote,
because the protocol does not support deletion.
A git-lfs special remote also functions as a regular git remote. You can
use things like `git push` and `git pull` with it.
To enable an existing git-lfs remote in another clone of the repository,
you'll need to provide a url to it again. It's ok to provide a different
url as long as it points to the same git-lfs repository.
git annex enableremote lfs url=https://github.com/yourname/yourrepo.git
Note that http urls currently only allow read access to the git-lfs
repository.