git-annex/doc/backends.mdwn
Joey Hess fc61915230
use GIT keys for export of non-annexed files
This solves the problem that import of such files gets confused and
converts them back to annexed files.

The import code already used GIT keys internally when it determined a
file should not be annexed. So now when it sees a GIT key that export
used, it already does the right thing.

This also means that even older version of git-annex can import and will
do the right thing, once a fixed version has exported. Still, there may
be other complications around upgrades; still need to think it all
through.

Moved gitShaKey and keyGitSha from Key to Annex.Export since they're
only used for export/import.

Documented GIT keys in backends, since they do appear in the git-annex
branch now.

This commit was sponsored by Graham Spencer on Patreon.
2021-03-05 14:12:11 -04:00

114 lines
5.2 KiB
Markdown

When a file is annexed, a [[key|internals/key_format]] is generated from
its content and/or filesystem metadata. The file checked into git symlinks
to the key. This key can later be used to retrieve the file's content (its
value).
Multiple key-value backends are supported, and a single repository
can use different ones for different files.
## configuring which backend to use
The `annex.backend` git-config setting can be used to configure the
default backend to use when adding new files.
For finer control of what backend is used when adding different types of
files, the `.gitattributes` file can be used. The `annex.backend`
attribute can be set to the name of the backend to use for matching files.
For example, to use the SHA256E backend for sound files, which tend to be
smallish and might be modified or copied over time,
while using the WORM backend for everything else, you could set
in `.gitattributes`:
* annex.backend=WORM
*.mp3 annex.backend=SHA256E
*.ogg annex.backend=SHA256E
## recommended backends to use
* `SHA256E` -- The default backend for new files, combines a 256 bit SHA-2
hash of the file's content with the file's extension. This allows
verifying that the file content is right, and can avoid duplicates of
files with the same content. Its need to generate checksums
can make it slower for large files.
* `SHA256` -- SHA-2 hash that does not include the file extension in the
key, which can lead to better deduplication but can confuse some programs.
* `SHA512`, `SHA512E` -- Best SHA-2 hash, for the very paranoid.
* `SHA384`, `SHA384E`, `SHA224`, `SHA224E` -- SHA-2 hashes for
people who like unusual sizes.
* `SHA3_512`, `SHA3_512E`, `SHA3_384`, `SHA3_384E`, `SHA3_256`, `SHA3_256E`, `SHA3_224`, `SHA3_224E`
-- SHA-3 hashes, for bleeding edge fun.
* `SKEIN512`, `SKEIN512E`, `SKEIN256`, `SKEIN256E`
-- [Skein hash](http://en.wikipedia.org/wiki/Skein_hash),
a well-regarded SHA3 hash competition finalist.
* `BLAKE2B160`, `BLAKE2B224`, `BLAKE2B256`, `BLAKE2B384`, `BLAKE2B512`
`BLAKE2B160E`, `BLAKE2B224E`, `BLAKE2B256E`, `BLAKE2B384E`, `BLAKE2B512E`
-- Fast [Blake2 hash](https://blake2.net/) variants optimised for 64 bit
platforms.
* `BLAKE2S160`, `BLAKE2S224`, `BLAKE2S256`
`BLAKE2S160E`, `BLAKE2S224E`, `BLAKE2S256E`
-- Fast [Blake2 hash](https://blake2.net/) variants optimised for 32 bit
platforms.
* `BLAKE2BP512`, `BLAKE2BP512E`
-- Fast [Blake2 hash](https://blake2.net/) variants optimised for
4-way CPUs.
* `BLAKE2SP224`, `BLAKE2SP256`
`BLAKE2SP224E`, `BLAKE2SP256E`
-- Fast [Blake2 hash](https://blake2.net/) variants optimised for
8-way CPUs.
## non-cryptograpgically secure backends
The backends below do not guarantee cryptographically that the
content of an annexed file remains unchanged.
* `SHA1`, `SHA1E`, `MD5`, `MD5E` -- Smaller hashes than `SHA256`
for those who want a checksum but are not concerned about security.
* `WORM` ("Write Once, Read Many") -- This assumes that any file with
the same filename, size, and modification time has the same content.
This is the least expensive backend, recommended for really large
files or slow systems.
* `URL` -- This is a key that is generated from the url to a file.
It's generated when using eg, `git annex addurl --fast`, when the file
content is not available for hashing. The key may not contain the full
URL; for long URLs, part of the URL may be represented by a checksum.
The URL key may contain `&` characters; be sure to quote the key if
passing it to a shell script. The URL-backend key is distinct from URLs/URIs
that may be attached to a key (from any backend) indicating the key's location
on the web or in one of [[special_remotes]].
* `GIT` -- This is used internally by git-annex when exporting trees
containing files stored in git, rather than git-annex. It represents a
git sha. This is never used for git-annex links, but information about
keys of this type is stored in the git-annex branch.
## external backends
While most backends are built into git-annex, it also supports external
backends. These are programs with names like `git-annex-backend-XFOO`,
which can be provided by others. See [[design/external_backend_protocol]]
for details about how to write them.
Here's a list of external backends. Edit this page to add yours to the list.
* [[design/external_backend_protocol/git-annex-backend-XFOO]]
is a demo program implementing the protocol with a shell script.
Like with git-annex's builtin backends, you can add "E" to the end of the
name of an external backend, to get a version that includes the file
extension in the key.
## notes
If you want to be able to prove that you're working with the same file
contents that were checked into a repository earlier, you should avoid
using non-cryptographically-secure backends, and will need to use
signed git commits. See [[tips/using_signed_git_commits]] for details.
Retrieval of WORM and URL from many [[special_remotes]] is prohibited
for [[security_reasons|security/CVE-2018-10857_and_CVE-2018-10859]].
Note that the various 512 and 384 length hashes result in long paths,
which are known to not work on Windows. If interoperability on Windows is a
concern, avoid those.
See also: [[git-annex-examinekey]]