2016-02-02 20:50:58 +00:00
[[!meta title="annex.largefiles: configuring mixed content repositories"]]

Normally commands like `git annex add` always add files to the annex,
while `git add` adds files to git.

Let's suppose you're developing a video game, written in C. You have
source code, and some large game assets. You want to ensure the source
code is stored in git -- that's what git's for! And you want to store
the game assets in the git annex -- to avoid bloating your git repos with
possibly enormous files, but still version control them.

You could take care to use `git annex add` after changes to the assets,
but it would be easy to slip up and `git commit -a` (which runs `git add`),
checking your large assets into git. Configuring annex.largefiles
saves you the bother of keeping things straight when adding files.
Once you've told git-annex what files are large, both `git annex add`
and `git add`/`git commit -a` will add the large files to the annex and the
small files to git.

Other commands that use the annex.largefiles configuration include
`git annex import`, `git annex addurl`, `git annex importfeed`, and
the assistant.

## examples

For example, let's make only files larger than 100 kb be added to the annex,
and never `*.c` and `*.h` source code files.

git-annex config annex.largefiles
annex.largefiles can be configured by git-annex config, to more easily set
a default that will also be used by clones, without needing to shoehorn the
expression into the gitattributes file. The git config and gitattributes
override that.
Whenever something is added to git-annex config, we have to consider what
happens if a user puts a purposefully bad value in there. Or, if a new
git-annex adds some new value that an old git-annex can't parse.
In this case, a global annex.largefiles that can't be parsed currently
makes an error be thrown. That might not be ideal, but the gitattribute
behaves the same, and is almost equally repo-global.
Performance notes:
git-annex add and addurl construct a matcher once
and use it for every file, so the added time penalty for reading the global
config log is minor. If the gitattributes annex.largefiles were deprecated,
git-annex add would get around 2% faster (excluding hashing), because
looking that up for each file is not fast. So this new way of setting
it is progress toward speeding up add.
git-annex smudge does need to load the log every time. As well as checking
the git attribute. Not ideal. Setting annex.gitaddtoannex=false avoids
both overheads.

    git config annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)'

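
To make the logic of that expression concrete, here is a rough Python model of it. This is only a sketch, not how git-annex evaluates expressions internally. The function name `is_large` is made up for illustration, `100kb` is assumed to mean 100,000 bytes, and `include=` globs are assumed to be matched against the whole relative path, with `*` also matching `/`.

```python
from fnmatch import fnmatchcase

# Assumption: "100kb" is read as 100,000 bytes.
LIMIT_BYTES = 100 * 1000


def is_large(path: str, size: int) -> bool:
    """Model of: largerthan=100kb and not (include=*.c or include=*.h)"""
    # Assumption: the glob is tested against the whole relative path.
    is_source = fnmatchcase(path, "*.c") or fnmatchcase(path, "*.h")
    return size > LIMIT_BYTES and not is_source


print(is_large("assets/intro.ogv", 50_000_000))  # True: big, not source code
print(is_large("src/main.c", 500_000))           # False: source code, whatever its size
print(is_large("README", 2_000))                 # False: too small
```

Read this way, a file goes to the annex only when both conditions hold: it is over the size limit, and it is not a `*.c` or `*.h` file.
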

That is a local configuration, so it will only apply to your clone of the
repository. To set a default that will apply to all clones, unless
overridden, do this instead:

    git annex config --set annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)'

There's one other way to configure the same thing: you can put this in
the `.gitattributes` file:

    * annex.largefiles=largerthan=100kb
    *.c annex.largefiles=nothing
    *.h annex.largefiles=nothing

The syntax in .gitattributes is a bit different, because the .gitattributes
matches files itself, and the values of attributes cannot contain spaces.
So using .gitattributes for this is not recommended (but it does work for
older versions of git-annex, where the `git annex config` setting does
not). Any .gitattributes setting overrides the `git annex config` setting,
but will be overridden by the `git config` setting.

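
The order in which the three settings override each other can be pictured as a first-match lookup. The following Python sketch only restates the precedence described above. The function name `effective_largefiles` is hypothetical, and `None` stands for "not set".

```python
from typing import Optional


def effective_largefiles(
    git_config: Optional[str],
    gitattribute: Optional[str],
    annex_config: Optional[str],
) -> Optional[str]:
    """Return the setting that wins: git config, then .gitattributes,
    then git annex config. None means nothing is configured."""
    for value in (git_config, gitattribute, annex_config):
        if value is not None:
            return value
    return None


# A .gitattributes value beats the repository-wide git annex config default.
print(effective_largefiles(None, "largerthan=100kb", "anything"))
# A local git config value beats both.
print(effective_largefiles("nothing", "largerthan=100kb", "anything"))
```
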

Another example. If you wanted `git add` to put all files in the annex
in your local repository:

    git config annex.largefiles anything

Or in all clones:

    git annex config --set annex.largefiles anything

## syntax

See [[git-annex-matching-expression]] for details about the syntax.

## gitattributes format

Here's that example `.gitattributes` again:

    * annex.largefiles=largerthan=100kb
    *.c annex.largefiles=nothing
    *.h annex.largefiles=nothing

The way that works is, `*.c` and `*.h` files have the annex.largefiles
attribute set to "nothing", and so those files are never treated as large
files. All other files use the other value, which checks the file size.

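
As a rough illustration of how those three lines resolve, the sketch below mimics git's rule that, for a given attribute, the last matching line in `.gitattributes` wins. It is a simplification, not git's real pattern matcher: patterns are compared against the file's basename only (which is how git treats patterns without a slash), and `largefiles_attribute` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# The example .gitattributes, as (pattern, annex.largefiles value) pairs.
RULES = [
    ("*", "largerthan=100kb"),
    ("*.c", "nothing"),
    ("*.h", "nothing"),
]


def largefiles_attribute(path, rules=RULES):
    """Return the annex.largefiles value that applies to path, or None."""
    value = None
    for pattern, expression in rules:
        # Simplification: slash-free patterns are tested on the basename.
        if fnmatchcase(basename(path), pattern):
            value = expression  # a later matching line overrides earlier ones
    return value


print(largefiles_attribute("src/main.c"))         # nothing
print(largefiles_attribute("assets/level1.dat"))  # largerthan=100kb
```

So `src/main.c` matches both `*` and `*.c`, and the later `*.c` line decides. A file like `assets/level1.dat` only matches `*`, so the size check applies to it.
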

Since git attribute values cannot contain whitespace, when you need
a more complicated annex.largefiles expression, you can instead
parenthesize the terms of the annex.largefiles attribute.

For example, this is the same as the git config shown earlier, shoehorned
into a single git attribute:

    * annex.largefiles=(largerthan=100kb)and(not((include=*.c)or(include=*.h)))

It's generally a better idea to use `git annex config` instead.

## temporarily override

If you've set up an annex.largefiles configuration but want to force a file to
be stored in the annex, you can temporarily override the configuration like
this:

    git annex add --force-large smallfile

## converting git to annexed

When you have a file that is currently stored in git, and you want to
convert that to be stored in the annex, here's how to accomplish that:

    git rm --cached file
    git annex add --force-large file
    git commit file

This first removes the file from git's index cache, and then adds it back
using git-annex. You can modify the file before the `git-annex add` step,
perhaps replacing it with new larger content that necessitates git-annex.

## converting annexed to git

When you have a file that is currently stored in the annex, and you want to
convert that to be stored in git, here's how to accomplish that:

    git annex unlock file
    git rm --cached file
    git annex add --force-small file
    git commit file

You can modify the file after unlocking it and before adding it to
git. And this is probably a good idea if it was really a big file,
so that you can replace its content with something smaller.