git-annex/doc/tips/largefiles.mdwn
Joey Hess 10138056dc
v6: avoid accidental conversion when annex.largefiles is not configured
v6: When annex.largefiles is not configured for a file, running git add or
git commit, or otherwise using git to stage a file will add it to the annex
if the file was in the annex before, and to git otherwise. This is to avoid
accidental conversion.

Note that git-annex add's behavior has not changed, for reasons explained
in the added comment.

Performance: No added overhead when annex.largefiles is configured.
When not configured, there is an added call to catObjectMetaData,
which involves a round trip through git cat-file --batch.
However, the earlier catKeyFile primes the cache for it.

This commit was supported by the NSF-funded DataLad project.
2018-08-27 14:51:10 -04:00

141 lines
5 KiB
Markdown

[[!meta title="annex.largefiles: configuring mixed content repositories"]]
Normally commands like `git annex add` always add files to the annex.
And when using the v6 repository mode, even `git add` and `git commit -a`
will add files to the annex.
Let's suppose you're developing a video game, written in C. You have
source code, and some large game assets. You want to ensure the source
code is stored in git -- that's what git's for! And you want to store
the game assets in the git annex -- to avod bloating your git repos with
possibly enormous files, but still version control them.
The annex.largefiles configuration is useful for such mixed content
repositories. It's checked by `git annex add`, by `git add` and `git commit -a`
(in v6 repositories), by `git annex import` and the assistant. It's
also used by `git annex addurl` and `git annex importfeed` when downloading
files. When a file does not match annex.largefiles, these commands will
add its content to git instead of to the annex.
This saves you the bother of keeping things straight when adding files.
## examples
For example, let's make only files larger than 100 kb be added to the annex,
and never `*.c` and `*.h` source code files.
Write this to the `.gitattributes` file:
* annex.largefiles=(largerthan=100kb)
*.c annex.largefiles=nothing
*.h annex.largefiles=nothing
Or, set the git configuration instead:
git config annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)'
Both of these settings do the same thing. Setting it in the `.gitattributes`
file makes any checkout of the repository share that configuration, so is often
a good choice. Setting the annex.largefiles git configuration lets different
checkouts behave differently. The git configuration overrides the
`.gitattributes` configuration.
## syntax
The value of annex.largefiles is similar to a
[[preferred content expression|git-annex-preferred-content]].
The following terms can be used in annex.largefiles:
* `include=glob` / `exclude=glob`
Specify files to include or exclude.
The glob can contain `*` and `?` to match arbitrary characters.
* `smallerthan=size` / `largerthan=size`
Matches only files smaller than, or larger than the specified size.
The size can be specified with any commonly used units, for example,
"0.5 gb" or "100 KiloBytes"
* `mimetype=glob`
Looks up the MIME type of a file, and checks if the glob matches it.
For example, "mimetype=text/*" will match many varieties of text files,
including "text/plain", but also "text/x-shellscript", "text/x-makefile",
etc.
The MIME types are the same that are displayed by running `file --mime-type`
This is only available to use when git-annex was built with the
MagicMime build flag.
* `anything`
Matches any file.
* `nothing`
Matches no files. (Same as "not anything")
* `not expression`
Inverts what the expression matches.
* `and` / `or` / `( expression )`
These can be used to build up more complicated expressions.
The way the `.gitattributes` example above works is, `*.c` and `*.h` files
have the annex.largefiles attribute set to "nothing",
and so those files are never treated as large files. All other files use
the other value, which checks the file size.
Note that, since git attribute values cannot contain whitespace,
it's useful to instead parenthesize the terms of the annex.largefiles
attribute. This trick allows for more complicated expressions.
For example, this is the same as the git config shown earlier, shoehorned
into a git attribute:
* annex.largefiles=(largerthan=100kb)and(not((include=*.c)or(include=*.h)))
## temporarily override
If you've set up an annex.largefiles configuration but want to force a file to
be stored in the annex, you can temporarily override the configuration like
this:
git annex add -c annex.largefiles=anything smallfile
## converting git to annexed
When you have a file that is currently stored in git, and you want to
convert that to be stored in the annex, here's how to accomplish that:
git rm --cached file
git annex add -c annex.largefiles=anything file
git commit file
This first removes the file from git's index cache, and then adds it back
using git-annex. You can modify the file before the `git-annex add` step,
perhaps replacing it with new larger content that necessitates git-annex.
## converting annexed to git
When you have a file that is currently stored in the annex, and you want to
convert that to be stored in git, here's how to accomplish that:
git annex unlock file
git -c annex.largefiles=nothing add file
git commit -n file
You can modify the file after unlocking it and before adding it to
git. And this is probably a good idea if it was really a big file,
so that you can replace its content with something smaller.
Notice the `-n` switch when committing the file. This bypasses the
[[git-annex-precommit]] hook. In this situation, that hook sees an unlocked
file and wants to add it back to the annex, so you have to bypass the hook
to get the file committed to git.