0f7143d226
Not yet implemented is recording hashes on download from web and verifying hashes. addurl --verifiable option added with -V short option because I expect a lot of people will want to use this. It seems likely that --verifiable will become the default eventually, and possibly rather soon. While old git-annex versions don't support VURL, that doesn't prevent using them with keys that use VURL. Of course, they won't verify the content on transfer, and fsck will warn that it doesn't know about VURL. So there's not much problem with starting to use VURL even when interoperating with old versions. Sponsored-by: Joshua Antonishen on Patreon
115 lines
5.3 KiB
Markdown
The "backend" in git-annex specifies how a key is generated from a file's
content and/or filesystem metadata. Most backends are different kinds of
hashes. A single repository can use different backends for different files.
The [[key|internals/key_format]] includes the backend that is used for that
key.

## configuring which backend to use

The `annex.backend` git-config setting can be used to configure the
default backend to use when adding new files.

For finer control of what backend is used when adding different types of
files, the `.gitattributes` file can be used. The `annex.backend`
attribute can be set to the name of the backend to use for matching files.

For example, to use the SHA256E backend for sound files, which tend to be
smallish and might be modified or copied over time,
while using the WORM backend for everything else, you could set
in `.gitattributes`:

    * annex.backend=WORM
    *.mp3 annex.backend=SHA256E
    *.ogg annex.backend=SHA256E

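As a rough illustration of how the lines in that example are applied, the following Python sketch models the matching: a later matching line overrides an earlier one, so `*.mp3` wins over `*`. It is a simplification and not how git itself evaluates attributes; the `RULES` table and the `backend_for` function are hypothetical names, and only patterns without a slash (matched against the file name) are handled.

```python
from fnmatch import fnmatchcase
from os.path import basename

# Rules in the order they appear in .gitattributes: (pattern, backend).
RULES = [
    ("*", "WORM"),
    ("*.mp3", "SHA256E"),
    ("*.ogg", "SHA256E"),
]

def backend_for(path, rules=RULES, default=None):
    """Return the backend of the last rule whose pattern matches the file name."""
    chosen = default
    for pattern, backend in rules:
        if fnmatchcase(basename(path), pattern):
            chosen = backend  # a later matching line overrides an earlier one
    return chosen

print(backend_for("music/song.mp3"))
print(backend_for("video/talk.mkv"))
```

With the rules above, sound files end up with SHA256E and everything else falls back to WORM.
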
## recommended backends to use

* `SHA256E` -- The default backend for new files, combines a 256 bit SHA-2
  hash of the file's content with the file's extension. This allows
  verifying that the file content is right, and can avoid duplicates of
  files with the same content. Its need to generate checksums
  can make it slower for large files.
* `SHA256` -- SHA-2 hash that does not include the file extension in the
  key, which can lead to better deduplication but can confuse some programs.
* `SHA512`, `SHA512E` -- The largest SHA-2 hashes, for the very paranoid.
* `SHA384`, `SHA384E`, `SHA224`, `SHA224E` -- SHA-2 hashes for
  people who like unusual sizes.
* `SHA3_512`, `SHA3_512E`, `SHA3_384`, `SHA3_384E`, `SHA3_256`, `SHA3_256E`, `SHA3_224`, `SHA3_224E`
  -- SHA-3 hashes, for bleeding edge fun.
* `SKEIN512`, `SKEIN512E`, `SKEIN256`, `SKEIN256E`
  -- [Skein hash](http://en.wikipedia.org/wiki/Skein_hash),
  a well-regarded SHA3 hash competition finalist.
* `BLAKE2B160`, `BLAKE2B224`, `BLAKE2B256`, `BLAKE2B384`, `BLAKE2B512`
  `BLAKE2B160E`, `BLAKE2B224E`, `BLAKE2B256E`, `BLAKE2B384E`, `BLAKE2B512E`
  -- Fast [Blake2 hash](https://blake2.net/) variants optimised for 64 bit
  platforms.
* `BLAKE2S160`, `BLAKE2S224`, `BLAKE2S256`
  `BLAKE2S160E`, `BLAKE2S224E`, `BLAKE2S256E`
  -- Fast [Blake2 hash](https://blake2.net/) variants optimised for 32 bit
  platforms.
* `BLAKE2BP512`, `BLAKE2BP512E`
  -- Fast [Blake2 hash](https://blake2.net/) variants optimised for
  4-way CPUs.
* `BLAKE2SP224`, `BLAKE2SP256`
  `BLAKE2SP224E`, `BLAKE2SP256E`
  -- Fast [Blake2 hash](https://blake2.net/) variants optimised for
  8-way CPUs.
* `VURL` -- This is like a `URL` (see below) but the content can
  be verified with a cryptographically secure checksum that is
  recorded in the git-annex branch. It's generated when using
  e.g. `git-annex addurl --fast --verifiable`.

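To make the difference between `SHA256E` and `SHA256` concrete, here is a minimal Python sketch that builds key-like strings. It assumes the `BACKEND-sSIZE--HASH` layout described in [[key|internals/key_format]] and takes the extension with `os.path.splitext`; git-annex's own extension handling is more involved, and `sha256_key` is a hypothetical helper, not part of git-annex.

```python
import hashlib
import os

def sha256_key(content: bytes, filename: str, with_extension: bool) -> str:
    """Build a SHA256 or SHA256E style key for the given content (illustrative only)."""
    digest = hashlib.sha256(content).hexdigest()
    backend = "SHA256E" if with_extension else "SHA256"
    # Assumption: the "E" variant simply appends the file's last extension.
    ext = os.path.splitext(filename)[1] if with_extension else ""
    return f"{backend}-s{len(content)}--{digest}{ext}"

data = b"hello\n"
print(sha256_key(data, "notes.txt", with_extension=True))
print(sha256_key(data, "copy.md", with_extension=True))
print(sha256_key(data, "notes.txt", with_extension=False))
print(sha256_key(data, "copy.md", with_extension=False))
```

The first two keys differ only in their extension, while the last two are identical, which is why `SHA256` can deduplicate files that `SHA256E` treats as distinct.
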
## non-cryptographically secure backends

The backends below do not guarantee cryptographically that the
content of an annexed file remains unchanged.

* `SHA1`, `SHA1E`, `MD5`, `MD5E` -- Smaller hashes than `SHA256`
  for those who want a checksum but are not concerned about security.
* `WORM` ("Write Once, Read Many") -- This assumes that any file with
  the same filename, size, and modification time has the same content.
  This is the least expensive backend, recommended for really large
  files or slow systems.
* `URL` -- This is a key that is generated from the URL of a file.
  It's generated when using e.g. `git annex addurl --fast`, when the file
  content is not available for hashing.
  The key may not contain the full URL; for long URLs, part of the URL may be
  represented by a checksum.
  The URL key may contain `&` characters; be sure to quote the key if
  passing it to a shell script. These types of keys are distinct from URLs/URIs
  that may be attached to a key (using any backend) indicating the key's location
  on the web or in one of the [[special_remotes]].
* `GIT` -- This is used internally by git-annex when exporting trees
  containing files stored in git, rather than git-annex. It represents a
  git sha. This is never used for git-annex links, but information about
  keys of this type is stored in the git-annex branch.

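Two points from the list above can be sketched in Python: why `WORM` cannot detect changed content, and how to quote a `URL` key safely for a shell. The `worm_key` helper is hypothetical and assumes the `WORM-sSIZE-mMTIME--NAME` layout described in [[key|internals/key_format]]; the URL key shown is made up for illustration.

```python
import os
import shlex

def worm_key(path: str, size: int, mtime: int) -> str:
    """Build a WORM style key from name, size and mtime only; content is never read."""
    return f"WORM-s{size}-m{mtime}--{os.path.basename(path)}"

# Two files that differ in content but share name, size and mtime
# are indistinguishable to this scheme.
key_a = worm_key("old/foo", 3, 1301674316)
key_b = worm_key("new/foo", 3, 1301674316)
print(key_a == key_b)

# A made-up URL key containing '&'; quote it before handing it to a shell.
url_key = "URL--http://example.com/get?id=1&fmt=ogg"
print("git annex examinekey " + shlex.quote(url_key))
```

Without the quoting, a shell would treat the `&` as a command separator and run only part of the intended command.
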
## external backends

While most backends are built into git-annex, it also supports external
backends. These are programs with names like `git-annex-backend-XFOO`,
which can be provided by others. See [[design/external_backend_protocol]]
for details about how to write them.

Here's a list of external backends. Edit this page to add yours to the list.

* [[design/external_backend_protocol/git-annex-backend-XFOO]]
  is a demo program implementing the protocol with a shell script.

As with git-annex's built-in backends, you can add "E" to the end of the
name of an external backend, to get a version that includes the file
extension in the key.

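To check whether an external backend program is installed, something like the following Python sketch may help. It only applies the naming convention from this section and assumes the program is looked up in `PATH`; `backend_program` and `find_backend_program` are hypothetical helpers, and the handling of the "E" suffix is not modelled.

```python
import shutil

PREFIX = "git-annex-backend-"

def backend_program(backend: str) -> str:
    """Name of the program expected to implement an external backend."""
    return PREFIX + backend

def find_backend_program(backend: str):
    """Return the full path of the program if it is in PATH, else None."""
    return shutil.which(backend_program(backend))

print(backend_program("XFOO"))
print(find_backend_program("XFOO"))
```
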
## notes

If you want to be able to prove that you're working with the same file
contents that were checked into a repository earlier, you should avoid
using non-cryptographically-secure backends, and will need to use
signed git commits. See [[tips/using_signed_git_commits]] for details.

Retrieval of WORM and URL from many [[special_remotes]] is prohibited
for [[security_reasons|security/CVE-2018-10857_and_CVE-2018-10859]].

Note that the various 512 and 384 bit hashes result in long paths,
which are known to not work on Windows. If interoperability on Windows is a
concern, avoid those.

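The effect of hash size on key length can be estimated with a short Python sketch. It assumes the key embeds the hexadecimal digest, so each extra bit of hash output adds to every path that contains the key; `hex_length` is a hypothetical helper.

```python
import hashlib

def hex_length(algorithm: str) -> int:
    """Number of hexadecimal characters in a digest of the given algorithm."""
    return len(hashlib.new(algorithm, b"").hexdigest())

for name in ("sha256", "sha384", "sha512", "sha3_512"):
    print(f"{name}: {hex_length(name)} hex characters")
```

A 512 bit digest is twice as long in hexadecimal as a 256 bit one, which is where the longer paths come from.
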