documentation for making git add only annex when configured by annex.largefiles

Code change should be trivial, but not yet implemented. This
significantly complicated the task of documenting how git-annex works.

I'm not sure how useful the annex.gitaddtoannex configuration is after
this change; seems that if a user has an annex.largefiles they will want
it applied consistently. But the last thing I want to hear is more
complaining from users about git add doing something they don't want it
to.

There's a pretty high risk users who got used to the git add behavior
and don't have annex.largefiles configured will miss the NEWS and
complain bitterly about their suddenly bloated repositories. Oh well.

Removed outdated comments about the old behavior to avoid confusion.
I don't know if I've found all the places that griping spread to.
Joey Hess 2019-10-24 13:50:44 -04:00
parent 64d4a35523
commit 31a5b58b2c
14 changed files with 111 additions and 138 deletions


@ -1,5 +1,14 @@
git-annex (7.20191018) UNRELEASED; urgency=medium
* Changed git add/git commit -a default behavior back to what it was
before v7; they add file contents to git, not to the annex.
(However, if a file was annexed before, they will still add it to
the annex, to avoid footgun.)
Configuring annex.largefiles overrides this; once git-annex has
been told which files are large git add/git commit -a will honor that.
* Added annex.gitaddtoannex configuration. Setting it to false prevents
git add from adding files to the annex even when annex.largefiles
is configured. (Unless the file was annexed before.)
* init: Fix a failure when used in a submodule on a crippled filesystem.
* enable-tor: Deal with pkexec changing to root's home directory
when running a command.
@ -8,9 +17,6 @@ git-annex (7.20191018) UNRELEASED; urgency=medium
* Made git add smarter about renamed annexed files. It can tell when an
annexed file was renamed, and will add it to the annex, and not to git,
unless annex.largefiles tells it to do otherwise.
* Added annex.gitaddtoannex configuration. Setting it to false prevents
git add from usually adding files to the annex.
(Unless the file was annexed before, or a renamed annexed file is detected.)
-- Joey Hess <id@joeyh.name> Mon, 21 Oct 2019 11:01:06 -0400

NEWS

@ -1,3 +1,12 @@
git-annex (7.20191024) UNRELEASED; urgency=medium
When annex.largefiles is not configured, `git add` and `git commit -a`
add files to git, not to the annex. If you have gotten used to `git add`
adding all files to the annex, you can get that behavior back by running:
git config annex.largefiles anything
-- Joey Hess <id@joeyh.name> Thu, 24 Oct 2019 13:46:52 -0400
git-annex (7.20190912) upstream; urgency=medium
This version of git-annex uses repository version 7 for all repositories.


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="check the 'git add' changes before upgrading"
date="2019-10-22T00:27:50Z"
content="""
I haven't had a data loss, but I wish I had waited to upgrade. There are [[some issues|forum/lets_discuss_git_add_behavior]] with v7 semantics. If you prefer v5 semantics, you can keep using an older git-annex version with your repos. However, once you upgrade the repos (which recent git-annex versions do automatically without asking), there's no going back.
"""]]


@ -1,11 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2019-10-22T03:25:10Z"
content="""
I understand that you may have a strong dislike of that, Ilya, but I
think it's unwarranted and unhelpful to drag discussion of it into
a question like this.
(I'd say more, but this is not the place.)
"""]]


@ -17,6 +17,13 @@ of being symlinks, and lets `git add` store files in the annex.
When adding a file with `git add`, the annex.largefiles config is
consulted to decide if a given file should be added to git as-is,
or if its content is large enough to need to use git-annex.
The annex.gitaddtoannex setting overrides that; setting it to false
prevents `git add` from adding files to the annex.
However, if git-annex can tell that a file was annexed before,
it will still be added to the annex even when those configs would normally
prevent it. Two examples of this are adding a modified version of an
annexed file, and moving an annexed file to a new filename and adding that.
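The rules above can be summarised as a small decision function. This is only a sketch of the documented behavior, not git-annex's implementation: the function name `git_add_destination` and its arguments are made up for illustration, and it leaves out the renamed-file case where annex.largefiles may say otherwise.

```shell
# Hypothetical model of where `git add` puts a file, per the text above.
# Arguments:
#   $1  yes/no      the file was annexed before (modified or renamed)
#   $2  yes/no      annex.largefiles is configured
#   $3  yes/no      the file matches the annex.largefiles expression
#   $4  true/false  value of annex.gitaddtoannex
git_add_destination() {
    was_annexed=$1
    largefiles_set=$2
    matches=$3
    gitaddtoannex=$4

    if [ "$was_annexed" = yes ]; then
        # Previously annexed files stay annexed, whatever the configs say.
        echo annex
    elif [ "$gitaddtoannex" = false ]; then
        # annex.gitaddtoannex=false stops git add from honoring largefiles.
        echo git
    elif [ "$largefiles_set" = yes ] && [ "$matches" = yes ]; then
        # A configured annex.largefiles that matches sends the file to the annex.
        echo annex
    else
        # Default: file contents go to git.
        echo git
    fi
}
```

For example, `git_add_destination no yes yes false` prints `git`, since annex.gitaddtoannex overrides a matching annex.largefiles for a file that was never annexed.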
The git configuration to use this command as a filter driver is as follows.
This is normally set up for you by git-annex init, so you should


@ -891,25 +891,29 @@ Like other git commands, git-annex is configured via `.git/config`.
* `annex.largefiles`
Used to configure which files are large enough to be added to the annex.
Default: All files.
It is an expression that matches the large files, eg
"include=*.mp3 or largerthan=500kb"
Overrides any annex.largefiles attributes in `.gitattributes` files.
See <https://git-annex.branchable.com/tips/largefiles> for details.
This configures the behavior of both git-annex and git when adding
files to the repository. By default, `git annex add` adds all files
to the annex, and `git add` adds files to git (unless they were added
to the annex previously). When annex.largefiles is configured, both
`git annex add` and `git add` will add matching large files to the
annex, and the other files to git.
Other git-annex commands also honor annex.largefiles, including
`git annex import`, `git annex addurl`, `git annex importfeed`
and the assistant.
See <https://git-annex.branchable.com/tips/largefiles> for syntax
documentation and more.
* `annex.gitaddtoannex`
This controls the behavior of `git add`. If you want `git add` to
add files to the annex (either all files, or the files matched
by your annex.largefiles configuration), set it to true.
To make `git add` add files to git but not to the annex, set it to false.
Note that `git add` will still add files to the annex in a couple of
situations. When an annexed file has been modified, it makes sense to add
the new version to the annex too. When an annexed file has been renamed
to a new name, it should remain annexed.
The default is currently true.
Setting this to false will prevent `git add` from honoring the
annex.largefiles configuration.
* `annex.addsmallfiles`
@ -1678,9 +1682,10 @@ but the SHA256E backend for ogg files:
* annex.backend=WORM
*.ogg annex.backend=SHA256E
There is a annex.largefiles attribute; which is used to configure which
There is an annex.largefiles attribute, which is used to configure which
files are large enough to be added to the annex.
See <https://git-annex.branchable.com/tips/largefiles> for details.
See the documentation above of the annex.largefiles git config
and <https://git-annex.branchable.com/tips/largefiles> for details.
The numcopies setting can also be configured on a per-file-type basis via
the `annex.numcopies` attribute in `.gitattributes` files. This overrides


@ -1,8 +1,7 @@
[[!meta title="annex.largefiles: configuring mixed content repositories"]]
Normally commands like `git annex add` always add files to the annex.
And when using the v7 repository mode, even `git add` and `git commit -a`
will add files to the annex.
Normally commands like `git annex add` always add files to the annex,
while `git add` adds files to git.
Let's suppose you're developing a video game, written in C. You have
source code, and some large game assets. You want to ensure the source
@ -10,14 +9,17 @@ code is stored in git -- that's what git's for! And you want to store
the game assets in the git annex -- to avoid bloating your git repos with
possibly enormous files, but still version control them.
The annex.largefiles configuration is useful for such mixed content
repositories. It's checked by `git annex add`, by `git add` and `git commit -a`
(in v7 repositories), by `git annex import` and the assistant. It's
also used by `git annex addurl` and `git annex importfeed` when downloading
files. When a file does not match annex.largefiles, these commands will
add its content to git instead of to the annex.
You could take care to use `git annex add` after changes to the assets,
but it would be easy to slip up and `git commit -a` (which runs `git add`),
checking your large assets into git. Configuring annex.largefiles
saves you the bother of keeping things straight when adding files.
Once you've told git-annex what files are large, both `git annex add`
and `git add`/`git commit -a` will add the large files to the annex and the
small files to git.
This saves you the bother of keeping things straight when adding files.
Other commands that use the annex.largefiles configuration include
`git annex import`, `git annex addurl`, `git annex importfeed`, and
the assistant.
## examples
@ -34,11 +36,17 @@ Or, set the git configuration instead:
git config annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)'
Both of these settings do the same thing. Setting it in the `.gitattributes`
file makes any checkout of the repository share that configuration, so is often
a good choice. Setting the annex.largefiles git configuration lets different
checkouts behave differently. The git configuration overrides the
`.gitattributes` configuration.
Both of these settings do the same thing. Setting it in the
`.gitattributes` file makes any checkout of the repository share that
configuration, so is often a good choice. Setting the annex.largefiles git
configuration lets different checkouts behave differently. The git
configuration overrides the `.gitattributes` configuration.
Or, perhaps you just want all files to be added to the annex, no matter
what. Just write "* annex.largefiles=anything" to the `.gitattributes`
file, or run:
git config annex.largefiles anything
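To see which of the two settings a checkout ends up with, you can ask git for both values. The following is a rough sketch that only uses plain git commands in a throwaway repository; the path `assets/level1.dat` is a made-up example, and git-annex itself is not run here.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Setting shared by every checkout of the repository.
echo '* annex.largefiles=anything' > .gitattributes
# Setting local to this checkout; per the text above, this one overrides.
git config annex.largefiles 'largerthan=100kb'

# Value from this checkout's git configuration.
config_value=$(git config --get annex.largefiles)
# Value that .gitattributes assigns to a (hypothetical) path.
attr_value=$(git check-attr annex.largefiles -- assets/level1.dat)

echo "git config:     $config_value"
echo ".gitattributes: $attr_value"
```

If `git config --get annex.largefiles` prints nothing, only the `.gitattributes` value applies.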
## syntax


@ -1,7 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 12"""
date="2019-09-16T18:43:02Z"
content="""
[[forum/lets_discuss_git_add_behavior]]
"""]]


@ -1,10 +0,0 @@
[[!comment format=mdwn
username="hoxu"
avatar="http://cdn.libravatar.org/avatar/95e33a0073f6c06477b3a202f0301dde"
subject="v6 & manual annexation"
date="2017-06-29T07:25:31Z"
content="""
With v6, is there any way to retain old usage of `git add` and `git annex add` to manually choose which files are kept under plain git and which annexed?
I'm aware of the `-c annex.largefiles=foo` parameter, but that's pretty cumbersome.
"""]]


@ -14,41 +14,45 @@ They are stored in the git repository differently, and they appear as
regular files in the working tree, instead of the symbolic links used for
locked files.
## adding unlocked files
## using unlocked files
Instead of using `git annex add`, use `git add`, and the file will be
stored in git-annex, but left unlocked.
You can unlock any annexed file:
[[!template id=note text="""
Want `git add` to add some file contents to the annex, but store the contents of
smaller files in git itself? Configure annex.largefiles to match the former.
See [[largefiles]].
"""]]
# git annex unlock my_cool_big_file
# cp ~/my_cool_big_file .
# git add my_cool_big_file
# git commit -m "added my_cool_big_file to the annex"
[master (root-commit) 92f2725] added my_cool_big_file to the annex
1 file changed, 1 insertion(+)
create mode 100644 my_cool_big_file
# git annex find
my_cool_big_file
That changes what's stored in git between a git-annex symlink
(locked) and a git-annex pointer file (unlocked). You can commit
the change, if you want that file to be unlocked in other clones of the
repository. To lock the file again, use `git annex lock`.
You can make whatever modifications you want to unlocked files, and commit
your changes.
The nice thing about an unlocked file is that you can modify it
in place -- it's a regular file. And you can commit your changes.
# echo more stuff >> my_cool_big_file
# git mv my_cool_big_file my_cool_bigger_file
# git commit -a -m "some changes"
[master 196c0e2] some changes
2 files changed, 1 insertion(+), 1 deletion(-)
delete mode 100644 my_cool_big_file
create mode 100644 my_cool_bigger_file
1 files changed, 1 insertion(+), 1 deletion(-)
Under the hood, this uses git's [[todo/smudge]] filter interface, and
git-annex converts between the content of the big file and a pointer file,
which is what gets committed to git. All the regular git-annex commands
(get, drop, etc) can be used on unlocked files too.
Notice that `git commit -a` added the new content of the file to the annex,
and only committed a change to the pointer. That happened because git-annex
knows this was an annexed file before. Git leaves the file unlocked, so
you can continue to make modifications to it.
By default, using git to add a file that has not been annexed before will
still add its contents to git, not to the annex. If you tell git-annex what
files are large, it will arrange for the large files to be added to the
annex, and the small ones to be added to git. This is done by configuring
annex.largefiles. See [[largefiles]] for full documentation of that.
All the regular git-annex commands (find, get, drop, etc) can be used on
unlocked files as well as locked files. When you drop the content of
an unlocked file, it will be replaced by a pointer file, which
looks like "/annex/objects/...". So if you open a file and see
that, you'll need to use `git annex get`.
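A quick way to check for that from a script might look like the sketch below. The helper name `looks_like_pointer` is made up, and the check assumes only what is said above, that a pointer file starts with "/annex/objects/"; it is not how git-annex itself recognises pointer files.

```shell
# Hypothetical helper: succeeds if the file's first line starts with
# "/annex/objects/", which is how a dropped unlocked file looks.
looks_like_pointer() {
    head -n 1 "$1" | grep -q '^/annex/objects/'
}
```

It could be combined with `git annex get`, for example `looks_like_pointer my_cool_big_file && git annex get my_cool_big_file`.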
Under the hood, unlocked files use git's [[todo/smudge]] filter interface,
and git-annex converts between the content of the big file and a pointer
file, which is what gets committed to git.
[[!template id=note text="""
By default, git-annex commands will add files in locked mode,
@ -57,14 +61,7 @@ mode is used. To make them always use unlocked mode, run:
`git config annex.addunlocked true`
"""]]
## mixing locked and unlocked files
A repository can contain both locked and unlocked files. You can switch
a file back and forth using the `git annex lock` and `git annex unlock`
commands. This changes what's stored in git between a git-annex symlink
(locked) and a git-annex pointer file (unlocked). To add a file to
the repository in locked mode, use `git annex add`; to add a file in
unlocked mode, use `git add`.
## adjusted branches
If you want to mostly keep files locked, but be able to locally switch
to having them all unlocked, you can do so using `git annex adjust
@ -73,6 +70,15 @@ useful when using filesystems like FAT, and OS's like Windows that don't
support symlinks. Indeed, `git-annex init` detects such filesystems and
automatically sets up a repository to use all unlocked files.
## finding unlocked files
While it's easy to see when a file is a git-annex symlink, unlocked files
look the same as files stored in git. To see what files are unlocked or
locked, many git-annex commands support `--unlocked` and `--locked`
options.
git annex find --unlocked
## imperfections
Unlocked files mostly work very well, but there are a


@ -1,15 +0,0 @@
[[!comment format=mdwn
username="ginquistador@86f226616ead98d2733e249429918f241f928064"
nickname="ginquistador"
avatar="http://cdn.libravatar.org/avatar/f0ef7d68c0ff5d4948a9b0d282987195"
subject="Disappointed with `git add`"
date="2019-09-03T07:30:28Z"
content="""
I first have to say, I have been following and using git annex for ages (5+ years at least), and is my trusted source for all my data. However, for the first time in all these years, I'm seeing a decision that I do not agree with or understand.
Specifically, using `git add .` to add a file to git annex as the default pattern just seems a fundamentally wrong design to me (at least for my usage pattern). I want to be able to use git normally, and have git-annex only get involved when I explicitly request it to, and not for all files. AFAIK, git-lfs does do it right. I understand [annex.largefiles: configuring mixed content repositories](http://git-annex.branchable.com/tips/largefiles/) can be configured to get the behavior I want. However, the default behavior should add it to vanilla git, and any other desired behavior can be obtained by the user via annex attributes, or extra command line flags to `git annex add`
Knowing Joey, I assume there's a strong rationale as always, and would love to hear it, but I would still like to STRONGLY REQUEST changing the default behavior.
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 16"""
date="2019-09-16T17:36:06Z"
content="""
@ginquistador it may or may not have been the best decision, but this tip
is not a good place to discuss it. A bug would be a good place.
"""]]


@ -1,7 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 17"""
date="2019-09-16T18:44:33Z"
content="""
[[forum/lets_discuss_git_add_behavior]]
"""]]


@ -73,11 +73,9 @@ all locked files in the local repository.
The behavior of some commands changes in an upgraded repository:
* `git add` will add files to the annex, rather than adding them directly
to the git repository. To cause some files to be added directly
to git, you can configure `annex.largefiles`. For example:
`git config annex.largefiles "largerthan=100kb and not (include=*.c or include=*.h)"`
* When `annex.largefiles` is configured, `git add` will add matching
files to the annex, rather than adding them directly to the git
repository.
* `git annex unlock` and `git annex lock` change how the pointer to
the annexed content is stored in git. If you commit the change,