remove this thread, which has outlived its usefulness

Based on the last three comments on this thread it was going to keep
collecting complaints from people who glanced at the thread, did not
notice it was for an old, solved issue, and decided to follow up.

Also, the tone of this thread, while very constructive in some places,
is very very outraged in others. This apparently leads people to feel
that randomly saying they don't trust me is a reasonable thing to post
at the end of a long thread they have not bothered to read all of, by
their own admission.

So this thread seems to have become a source of confusion for users, and
a source of pain and disincentive to work on git-annex for me. Yes, I'm
taking the rest of today to go do something actually fun obviously.
Joey Hess 2020-05-21 15:03:29 -04:00
parent 67e9efa03a
commit 6702bc961b
GPG key ID: DB12DB0FF05F8F38
39 changed files with 0 additions and 768 deletions


@ -1,6 +0,0 @@
With v7, the default behavior of `git add` in a git-annex repo changed
to adding it as an annexed, unlocked file.
It may be a little late to talk about this since v7 has become the default
already, but I was asked to explain my thoughts on it, and I'm also
interested in hearing your thoughts. --[[Joey]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="separate annex.git-add.largefiles and annex.git-annex-add.largefiles settings"
date="2019-10-10T18:33:50Z"
content="""
[[separate `annex.git-add.largefiles` and `annex.git-annex-add.largefiles` settings|todo/separate_annex.largefiles.git-add_and_annex.largefiles.git-annex-add_settings]] would let you configure `git add` to only add to git while still letting `git annex add` decide what gets annexed.
"""]]


@ -1,141 +0,0 @@
[[!comment format=mdwn
username="amindfv@6f75355c5dad3450ed73d1f01715be90dfdd6cd6"
nickname="amindfv"
avatar="http://cdn.libravatar.org/avatar/9cdda587f634ea9a85b34b25be421676"
subject="comment 11"
date="2019-10-12T21:12:51Z"
content="""
First off, I love git annex and I truly appreciate all the hard work that's gone into it, so I hope you'll take my frustration as constructive when I say:
Making \"git add\" mean \"git annex add\" is a **terrible** default and it should be reverted **ASAP**.
## v7 is an entirely different tool than v5
    mkdir foo && cd foo
    git init && git annex init
    echo 'one' > a.txt
    git add a.txt
    git commit -m '+'
    echo 'two' >> a.txt
    git diff
I don't get a diff!! What??! Except for after \"git annex init\", git-annex has kept completely quiet (not warning me about any of this), and yet it's hijacked the whole repo.
\"git show\" is also broken, \"git add -p\", \"git log -p\", etc etc.
As others have mentioned, \"git clone\" may cause people to lose their data as well.
In other words, \"git annex\" is no longer a couple of additional commands to use within git - it's something closer to a replacement of git. It feels like a takeover of my git repos.
## It forces a workflow on users
One of the beautiful things about git-annex is it adds a few simple concepts to git, and allows us to use those new primitives in any way we like. This has allowed users to invent lots of different workflows that meet their needs.
I've seen lots of different types of repo configurations and workflows, but for discussion here I think we only need to talk about two:
1. \"Big bunch-o-binaries\" (BBOB): user wants to keep their photo collection/scientific data/etc. in a big repo and they've got some way (like the assistant) to sync it. This could be considered to be a \"dropbox replacement\". In a BBOB nobody ever wants to look at a diff between versions of a file, do line-by-line merges, or use most of the other features of git.
2. All uses that aren't that one! But let's be specific and describe a couple:
* I've got a repo (dozens, actually) which is just code, but somewhere along the line I had a large data file I wanted to add and didn't want to slow git down. I \"git annex init && git annex add\"ed and was done with it. Back to writing code.
* I've got a repo which is a true mixture of large binaries and small text files. E.g. for a video editing project I've got raw video files as well as various configuration scripts, notes, the .kdenlive file, etc.
This change puts a giant wedge between use case #1 and #2.
As an example, the org-mode people suggest using git-annex: <https://orgmode.org/worg/org-tutorials/index.html> . I can't imagine they'd be happy to accidentally lose the ability to get diffs between versions of their .org files.
## We don't know how big of a breaking change this is...
...and it may be very large.
How many people use \"git add\" to mean \"git add\" in at least one of their annex repos? We don't know! Compatibility with a big breaking change like this wasn't asked in the last user survey (which I happily took part in): <https://git-annex-survey.branchable.com/polls/2018/>
But we can look at what _was_ asked to try and estimate the extent of the breakage:
As a proxy, we could examine how many people use the assistant (more likely to simply have a big-bunch-o-binaries), vs. number of people who use it on the command line (more likely to be carefully managing their repos, including which files are in \"vanilla\" git vs annex). The numbers in the survey indicate there are 14 people using it on the command line for every 1 person using the assistant (85% to 6%).
We can also look at how many people were using any v7 features: in the most recent survey 75% of people say they're not using any v7 features, and another 7% say they don't know, which I read as not following this discussion very closely. This suggests to me most people (at least 82%) were happy using git annex basically as it was.
That idea (\"we're basically happy how it is already\") is borne out by other questions: 83% of users rate themselves between \"happy\" and \"one of my favorite applications of all time\" (FWIW, I fall into the \"one of my favorite applications of all time\" camp!)
In addition, the \"blocking problems,\" \"focus,\" and \"roadmap\" sections of the survey don't provide compelling evidence that changing fundamentally how git-annex interacts with git is something anyone's clamoring for.
85% of people mostly use `git annex` from the command line. How many of those people have used \"git add\" in an annex repo at least once, and (now incorrectly) believe they know what it's going to do to their repo?
## More broken workflows
Even repos which mainly *are* BBOB (big-bunch-o-binaries) may have a README file or other files like them. I note that most (all?) repos here have text files that are in \"plain git\": <https://git-annex.branchable.com/publicrepos/>
Now when I \"apt upgrade\" and add a new source file to one of my code repos, it's going to be `git-annex-add`ed? Does that mean I can't (easily, sensibly) push it to a code hosting platform (GitLab, Github)?
Forcing people to this behavior by default reifies one workflow (BBOB) over all others, and basically replaces one tool (basically a few added primitives for git) with another that's also called git annex (basically a replacement for content tracking in git - and isn't that basically a replacement for git?).
I realize every user's going to have slightly different preferences, but I truly think this is the rare case where simply saying \"if you don't like it, set these flags in your repos' configs\" is not nearly good enough.
I realize it would be a pain to roll this change back, but the benefits still far outweigh the costs. It's going to be a much bigger pain point for all users who are suddenly confused and have put their history in a broken state, for all the tutorials that are now giving users inaccurate information, etc.
## Worse experience for new/inexperienced users
Anecdotally, I've turned 5-10 people on to the beauty of git annex, and in every case the reason I told them about it was they mentioned to me they needed some way to store a large data file in their existing code repository.
Now when I tell people about git annex, I have to also tell them to be super careful to set up a set of configs in order for it not to fundamentally redefine the meaning of their repo?
The common new-user experience for me has been:
> Friend (from across the room): \"Dang, this file's too big for git\"
> Me: \"Have you tried git annex?\"
> [...talking about the benefits, installation, etc...]
> Friend: \"Ok now how do I use it?\"
> Me (still across the room): \"Just 'git annex init' in the repo and then 'git annex add' the file\"
It seems almost comical that I'd memorize a line so I could instead say:
> Me: \"Just 'git annex init' in the repo and then 'git annex add' the file. But first be SUPER careful to type 'git config annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)' and don't forget to keep that list of file extensions up to date\"
\"git annex init\" should be set-it-and-forget-it. I shouldn't have to worry about what parts of my git repo it's messing with because I'm not being vigilant enough.
## Should have been discussed and then announced far more widely
I go to the git annex homepage every couple of weeks, which I imagine is on the high end for a user of a command line tool. Even I was caught completely by surprise with this change, and only saw it when I \"apt-upgrade\"d.
## (Subjective:) Worse even if it didn't break (most?) users' expectations
IMO, even if this were a new tool with no existing users or workflows or repos, this would not be the best default and instead should be behind a flag. I know I'd have been less enthusiastic to start using it if it were nudging me to basically use it as a replacement for Dropbox, instead of an unobtrusive set of additional tools for git.
I'd also be less enthusiastic to know that if I weren't vigilant I'd get totally wrong behavior (e.g. I say 'git config annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)' but then I \"git add\" a .hsc file and it \"git-annex-add\"s it \"behind my back\")
## In summary:
Issues as I see it are (and there may be more):
* It breaks users' workflows. Potentially a huge number of them. This alone should be a big reason to be careful when making a change like this.
I - one single user - have already spent hours assessing how this will affect me and my repos. It requires me to be very careful and I'm far from sure I won't slip up somewhere. Hopefully if/when I do, I'll notice the mistake.
I had other things planned for my Saturday, but some of them involved using \"git annex\" and so now I have to halt everything to make sure I'm not screwing anything up now that I \"apt upgrade\"d and got a new version of git-annex this morning.
* Uncomfortable new-user experience
* (Subjective:) not a good default even if it didn't break expectations
* Should have been announced and discussed MUCH more widely and extensively
## A response to a few issues already raised:
> \"Suppose you have an unlocked file in your repo, and you rename it (not using git move), and then git add it. Oops, now you've added to git a large file that you wanted to be annexed. \"
Why not simply provide a configurable warning about attempting to \"git add\" a file above a certain size? A \"did you mean 'git annex add'?\" type warning would be helpful for everyone. It'd catch all mistakes, not just ones caused by mv.
> \"But mv foo bar; git add bar is normally identical to git mv foo bar. Why should using git-annex break that identity?\"
This to me feels like killing a mosquito with a sledgehammer. This change breaks myriad other identities, including a very simple one:
- \"git add\"ing a new file should behave the same whether or not we've at some point typed \"git annex init\"
The \"mv ; add\" identity has never been an issue for me in years of using git annex. By contrast, this change has already eaten up half of my Saturday. (Admittedly some of it writing this up. And again I should mention: I still hugely appreciate all the work on git-annex!)
## What to do about it?
I'm 100% behind Dwk's 4-point plan.
My one clarification is potentially to take \"only if largefiles is set\" to mean \"only if largefiles is set to a value other than 'nothing'.\" Not sure about this one.
"""]]
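
The "did you mean 'git annex add'?" warning suggested in the comment above could be approximated today with an ordinary git hook. The following is only a sketch under stated assumptions: `warn_large_staged` and its 100 KiB default are hypothetical names and values chosen for illustration, not git-annex features, and the function relies only on plain `git diff --cached` and `git cat-file -s`.

```shell
#!/bin/sh
# Sketch of a size warning that could be called from a pre-commit hook.
# warn_large_staged is a hypothetical helper, not part of git or git-annex.

# Print a warning for every newly staged file whose blob exceeds the limit.
# Limitation: paths containing newlines or quoted by core.quotePath are not handled.
warn_large_staged() {
    limit=${1:-102400}   # threshold in bytes; default is 100 KiB
    git diff --cached --name-only --diff-filter=A |
    while IFS= read -r path; do
        # Size of the staged blob, read from the index rather than the work tree.
        size=$(git cat-file -s ":$path") || continue
        if [ "$size" -gt "$limit" ]; then
            echo "warning: $path is $size bytes; did you mean 'git annex add'?"
        fi
    done
}
```

Calling `warn_large_staged` from `.git/hooks/pre-commit` would only print a message; it does not block the commit or move anything into the annex, which keeps `git add` itself unchanged.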


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="experience with v7 semantics"
date="2019-10-12T21:39:03Z"
content="""
To add to the anecdotal reports of user experience, I find myself periodically running `git annex lock` just to make sure nothing got inadvertently unlocked, or added as unlocked. The main benefit of using `git-annex` to version data analysis results, besides avoiding git's choking on large files or breaking github size limits, is the certainty that the result files are exactly as originally output, and haven't been accidentally changed (e.g. by re-running the analysis with different parameters but same output file name). With locked files, I have that guarantee -- once added and committed, the files won't change unless explicitly unlocked. With unlocked files, there's a chance of changing the file and then not noticing the change and committing it. The git log will of course reflect the change, but I might miss it, unless I inspect the log. So it's important to have a foolproof way to prevent files from being added as unlocked, and that's hard to do with the current `git-annex` version. I can set `annex.largefiles=nothing` at the repo root, but then `git annex add` won't annex anything, either. One solution is [[todo/separate_annex.largefiles.git-add_and_annex.largefiles.git-annex-add_settings]]; there may be others.
"""]]


@ -1,9 +0,0 @@
[[!comment format=mdwn
username="Lukey"
avatar="http://cdn.libravatar.org/avatar/c7c08e2efd29c692cc017c4a4ca3406b"
subject="v5 mode"
date="2019-10-12T21:48:56Z"
content="""
Maybe something like \"git annex v5mode true\" is needed so we get v5 semantics in v7 mode.
"""]]


@ -1,65 +0,0 @@
[[!comment format=mdwn
username="amindfv@6f75355c5dad3450ed73d1f01715be90dfdd6cd6"
nickname="amindfv"
avatar="http://cdn.libravatar.org/avatar/9cdda587f634ea9a85b34b25be421676"
subject="comment 14"
date="2019-10-14T00:20:25Z"
content="""
I'm again trying to do my best at expressing what I perceive as the seriousness of this change without being too critical of the dev who made it all possible. Sincere apologies in advance if I haven't hit the right balance. Git Annex really is one of my favorite tools! :)
I missed the fact that (if I understand correctly) regardless of what flags you set, \"git add\" really is just \"git annex add\" with the newest version. Setting e.g. 'largefile = nothing' just means you can't add anything to git annex at all!
(Side note: if I missed that after hours of research, how much confusion can we expect for the average user without that much time to spend?)
After realizing this, I created a couple of aliases to - I thought - get back control by being super explicit. Aliases for

    git annex add -c annex.largefiles=anything

and

    git annex add -c annex.largefiles=nothing

But then, without thinking, I ran a couple of scripts, one of which calls \"git annex add\" and the other of which calls \"git add\" in an \"annex-init\"ed repo. Now I've got to fix another mess.
I can't assume there's any part of my digital life involving git that this doesn't impact!
So my only option, it seems, to wrest back control of my repos is to define extensive rules:
    largerthan=100kb and not (include=*.c or include=*.h or include=*.hs or include=*.lhs or include=*.hsc or include=*.pl or include=*.py or include= ...etc etc etc)
What happens when I forget a file extension? If I remember to include '*.hs' but forget '*.lhs'? And then I push code to a team repo?
And even still: I've gone through my git annex repos and found that I have greater than 100 files each of the following categories:
* Files managed by \"vanilla\" git which are greater than 100kb in size, which have no file extension
* Files managed by git annex which are smaller than 100kb, which have no file extension
(I stopped searching when I passed the 100 threshold on all these, but I could get more complete data if it's useful)
Closer to what I want is the `\"mimeencoding=binary\"` criterion, but:
* It's just a more accurate rule, not a completely accurate rule, and I don't want to fight my tools (or not notice till it's too late) when it incorrectly guesses. I don't want it guessing at all!
* As the manual notes, \"This is only available to use when git-annex was built with the MagicMime build flag.\"
I've decided the only safe thing to do is to downgrade to an earlier git-annex version until something's sorted out.
It's not an exaggeration to say that this change redefines what git annex is. Previously if someone asked \"what's git annex?\" my answer has been:
\"Oh yeah, it's super cool! It allows you to add large files to git without keeping their actual contents in git, which means [... etc]\"
Now I'd have to phrase it differently:
\"It redefines 'git add' to try to correctly guess which files should be tracked in git vs. a separate store, which has benefit A, B, and C. It's cool but it makes me nervous.\"
I worry \"git add\" will suffer from the problem of the old Microsoft Word, where it's decided what you're writing is a resume and you have to fight the f***ing thing to convince it it's not a resume and to stop \"fixing\" your work.
@Lukey writes:
> Maybe something like \"git annex v5mode true\" is needed so we get v5 semantics in v7 mode.
I _really_ don't think this is good enough. \"git add\" has a meaning, and \"everyone knows\" what it means. This change unnecessarily breaks all users' understanding of what \"git add\" does. It may cause them to lose data, etc.
I'd be fine with a flag to do the inverse (explicitly set \"git add\" to mean \"git annex add\").
"""]]
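
The extension-list rule discussed in the comment above is easy to get wrong by hand, so one option is to generate it from a single list. This is a minimal sketch: `largefiles_rule` is a hypothetical helper name, and it only assembles the same `largerthan=... and not (include=...)` expression quoted in the thread. It does not address the comment's deeper objection that any such list can be incomplete.

```shell
# Build an annex.largefiles expression from a size threshold and a list of
# file extensions that should stay in plain git.
# largefiles_rule is a hypothetical helper, not a git-annex command.
largefiles_rule() {
    size=$1; shift
    excludes=""
    for ext in "$@"; do
        # Join the per-extension terms with " or ".
        excludes="${excludes:+$excludes or }include=*.$ext"
    done
    if [ -n "$excludes" ]; then
        echo "largerthan=$size and not ($excludes)"
    else
        echo "largerthan=$size"
    fi
}

# Usage (run inside a repository):
#   git config annex.largefiles "$(largefiles_rule 100kb c h hs lhs hsc)"
```

Keeping the extensions in one place at least makes a forgotten `lhs` next to `hs` easier to spot in review than it is inside a long hand-written expression.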


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="getting v5 semantics"
date="2019-10-14T03:42:05Z"
content="""
The best way to restore v5 semantics right now seems to be: (1) set `annex.largefiles=nothing` to prevent `git add` from annexing; (2) make an alias to use instead of `git annex add`, which temporarily sets `annex.largefiles` to whatever it normally would be and then calls `git-annex-add`. But old scripts that call `git-annex-add` would need to be changed to call the alias.
"""]]
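
The two-step workaround described in the comment above might look roughly like this. It is a sketch, not a tested recipe: `annex_add` and `ANNEX_RULE` are hypothetical names, and the default rule `anything` is an assumption standing in for "whatever `annex.largefiles` would normally be".

```shell
# Step 1: make `git add` never annex in this repository (run once inside it):
#   git config annex.largefiles nothing

# Step 2: a wrapper to use in place of `git annex add`.
# ANNEX_RULE and annex_add are hypothetical names chosen for this sketch.
ANNEX_RULE=${ANNEX_RULE:-anything}

annex_add() {
    # Override annex.largefiles for this one invocation only, leaving the
    # repository-wide setting at "nothing".
    git -c "annex.largefiles=$ANNEX_RULE" annex add "$@"
}
```

As the comment notes, existing scripts that call `git annex add` directly would bypass the wrapper and would need to be changed to call it.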


@ -1,12 +0,0 @@
[[!comment format=mdwn
username="strmd"
avatar="http://cdn.libravatar.org/avatar/035707b9756129bbdea6b36a7f7b38d3"
subject="A new user chiming in"
date="2019-10-14T18:23:59Z"
content="""
I, too, am bewildered by how this major, breaking change was implemented and communicated so casually. Having basically been drawn in by the default v5 workflow and just barely finished migrating all of my archives/data sets and many of my current projects, I was initially convinced I must have not understood something right. But here we are, and at least I'm glad I'm not alone.
To a certain extent, I get that git-annex users are expected to be pretty savvy, but the auto-upgrade caught me completely by surprise. I'm on a Mac and ran `brew upgrade` as part of everyday maintenance, made some changes in my repo, synced it to a remote, and to my complete surprise the repo was now behaving the exact opposite way from the day before, i.e. `git add` annexed the file.
I am super happy about git-annex and your work on it Joey, but this has been a scary experience for me. And I'd very much like the default v5 workflow back.
"""]]


@ -1,12 +0,0 @@
[[!comment format=mdwn
username="spwhitton"
avatar="http://cdn.libravatar.org/avatar/9c3f08f80e67733fd506c353239569eb"
subject="comment 17"
date="2019-10-15T18:03:34Z"
content="""
I find amindfv's argument convincing: it is not good for new users of git-annex to feel that their git repo has been hijacked. And, it is not good to break people's scripts.
On the other hand, Joey's arguments in favour of the change make sense.
So, having an easy way to turn the new git add behaviour on, but leaving it off by default, seems best.
"""]]


@ -1,16 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="can git-annex-pre-commit annex files?"
date="2019-10-15T19:15:41Z"
content="""
Re: [[concerns|forum/lets_discuss_git_add_behavior/#comment-cb55e3813bed92ceb6d84092841903e3]] about `git add` not annexing files:
>Oops, now you've added to git a large file that you wanted to be annexed
>you would surely hope that the annexed ones stay annexed and don't get committed directly to git
Could [[`git-annex-pre-commit`|git-annex-pre-commit]] annex the file if it matches `annex.largefiles`? (Since it already changes the index, to fix up the symlinks into annex). Could it be configured to annex the file as locked?
"""]]


@ -1,12 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="different handling of dotfiles"
date="2019-10-21T02:51:16Z"
content="""
Related: [[bugs/dotfiles_handled_differently]]
For most files, whether they get annexed is controlled by `annex.largefiles`. But dotfiles are configured to never be annexed regardless of `annex.largefiles`. This special-casing (in `.git/info/attributes`) is unexpected and confusing. It is probably a consequence of making `git add` annex files by default, but it's better to change that default than to have the special case. Also, `git-annex-add` seems to ignore the dotfiles, as in the bug report above.
I thought of reverting to an earlier version of git-annex until these and other issues can get worked out, but realized I can't, since the repos got irreversibly auto-upgraded to v7...
"""]]


@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2019-09-16T18:12:07Z"
content="""
My main uncertainty about this personally has been: Why should `git add`
add the file unlocked, and `git annex add` add the file locked? If `git add`
adds the file to the annex, it must add it unlocked (due to git's
interface). But `git-annex add` could behave the same as `git add` does.
(And it can if you set annex.addunlocked.)
The only half way good reason seems to be, sometimes we want the file added
unlocked and sometimes locked, and this provides commands available by default
with no configuration, and no added switches, for both use cases.
"""]]


@ -1,10 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 20"""
date="2019-10-22T17:14:24Z"
content="""
Can we please not use language like "hijacked" and "man in the middle
attack" about this.
At least, not if you want me to engage constructively with this thread.
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="leej"
avatar="http://cdn.libravatar.org/avatar/eb1c6bd57680f694fb4658388e6de4ed"
subject="+1 to the comments above"
date="2019-10-22T07:21:31Z"
content="""
Git Annex has been and is a terrific tool. And, I would, very respectfully, like to +1 the comments above. I agree with spwhitton that v7 `git add` behavior would benefit from being off by default, with a way to turn it on as required. Both from a migration safety/friction-reduction standpoint, but also from the least invasive integration with Git principle that amindfv and others have so well articulated.
"""]]


@ -1,13 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""re: can git-annex-pre-commit annex files?"""
date="2019-10-22T17:45:43Z"
content="""
Ilya, by the time the pre-commit hook runs, `git add` would have already
written the large file into the object file, so stuff like `git gc` would
pay the price of it even if it were kept out of a commit.
In other words, that has the same problems that v5 unlocked files had when
git add or git commit was run on them. I've seen plenty of users bitten by
that with v5. Fixing that problem was a (minor) motivation for v7.
"""]]
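
The point made in the reply above, that `git add` has already written the file into git's object store before any pre-commit hook runs, can be observed with plain git. The sketch below uses a throwaway repository; `loose_objects_after_unstage` is a hypothetical helper name, and the 1 MB file of zeros is just a stand-in for a large file.

```shell
# Count the loose objects left behind after a file is staged and then unstaged.
# loose_objects_after_unstage is a hypothetical helper for this demonstration.
loose_objects_after_unstage() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        head -c 1000000 /dev/zero > big.bin
        git add big.bin                 # writes the blob into .git/objects
        git rm -q --cached big.bin      # removes it from the index only
        # First field of "N objects, M kilobytes" is the loose object count.
        git count-objects | cut -d' ' -f1
    )
    rm -rf "$repo"
}
```

The function prints `1`: the blob is still in the object store even though nothing references it, and it stays there until git's garbage collection prunes it. A hook that moves the file into the annex at commit time would therefore not undo the cost already paid by `git add`.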


@ -1,10 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="temporary manual downgrade of repos to v5 is possible"
date="2019-10-22T16:40:32Z"
content="""
Related: until the v7 issues are resolved, one option is to manually [downgrade](https://git-annex.branchable.com/forum/Exactly_what_does_a_v5_to_v7_upgrade_entail__63__/#comment-e1a252807a2200a7c737696abd08d38b) the repos to v5 (turns out it only involves changing a few git config and gitattributes settings) and use a [[pre-7.20190912|news/version_7.20190819]] git-annex version.
Just to reiterate, I much appreciate @joeyh's work on v7 and [[think|forum/lets_discuss_git_add_behavior/#comment-ffd69d3a710beeac091ef4173b122cdc]] it's an important advance. The v5 downgrade is just a temp option to keep old workflows working, until they can be adapted to v7, which won't be hard to do once the issues discussed above get addressed.
"""]]


@ -1,16 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""re: comment 4"""
date="2019-10-22T17:49:43Z"
content="""
> maybe, git-annex could keep track of local unlocked files by inode, not just by path name?
That's an interesting idea. If it could be made to work well, I think it
would address my concerns from comment 2 while freeing `git add` to
otherwise behave however it might be desired to behave by the user.
I've expanded on the idea in
[[todo/inode_based_clean_filter_for_less_surprising_git_add]]
Thanks!
"""]]


@ -1,12 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 23"""
date="2019-10-22T18:16:15Z"
content="""
Several commenters seem to be under the misapprehension that `git add` of a
modified file that is stored in git before will annex the new version. It
does not. That case is already handled, by git-annex noticing if the old
file was annexed, and if not, letting git add it to git as usual
(unless annex.largefiles is configured, in which case it uses that
configuration).
"""]]


@ -1,12 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="can git add only annex already-annexed files?"
date="2019-10-22T20:01:51Z"
content="""
Both [[problem cases|forum/lets_discuss_git_add_behavior/#comment-cb55e3813bed92ceb6d84092841903e3]] involve `git add`'ing *already annexed* files. So if the new `git add` behavior could be limited to already-annexed files, these problem cases would be addressed, without creating the problems discussed above. Since already \"git-annex [[abuses|todo/git_smudge_clean_interface_suboptiomal]] the fact that git provides the clean filter with the work tree filename, and reads and cleans the file itself\", the work tree filename is known. Question is how to know, when `git add` calls [[git-annex-clean]], which files are already-annexed.
\"Suppose you have a mixture of unlocked files and files that are added directly to git, and you've modified several of them. Now, if you run git commit -a, you would surely hope that the annexed ones stay annexed and don't get committed directly to git. Well, git add . ; git commit is normally equivilant, so it should behave the same. It follows that git add does need to add some files to the annex.\" -- for the unlocked files, the version in the index would be the pointer file, so git-annex would know what they are.
\"Suppose you have an unlocked file in your repo, and you rename it (not using git move), and then git add it.\" -- catching this requires keeping track of inodes of unlocked files. But since already \"git-annex [[installs|todo/git_smudge_clean_interface_suboptiomal]] post-checkout, post-merge, and pre-commit hooks, which update the working tree files to make content from git-annex available\", the hooks could do this, maybe with a Bloom filter? You'd only consult the Bloom filter if the git index entry isn't there _and_ file matches `annex.largefiles`. Or maybe the inode info in the git index could be used.
"""]]
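
The observation in the comment above, that for an unlocked file the version in the index is the pointer file, suggests a simple check. The sketch below assumes that a git-annex pointer file begins with `/annex/objects/` followed by the key; that format and the helper name `is_annex_pointer` are assumptions of this example rather than something stated in the thread.

```shell
# Succeed if the index entry for a path looks like a git-annex pointer file.
# Assumption: pointer files start with "/annex/objects/" followed by the key.
# is_annex_pointer is a hypothetical helper, not a git-annex command.
is_annex_pointer() {
    # Read only the start of the staged blob; pointer files are small.
    first=$(git cat-file -p ":$1" 2>/dev/null | head -c 512 | head -n 1)
    case $first in
        /annex/objects/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (run inside a repository):
#   if is_annex_pointer data.bin; then echo "data.bin is already annexed"; fi
```

This covers only files that already have an index entry. The renamed-file case from the comment has no index entry under the new name, which is why the comment turns to inode tracking for it.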


@ -1,18 +0,0 @@
[[!comment format=mdwn
username="amindfv@6f75355c5dad3450ed73d1f01715be90dfdd6cd6"
nickname="amindfv"
avatar="http://cdn.libravatar.org/avatar/9cdda587f634ea9a85b34b25be421676"
subject="comment 27"
date="2019-10-23T04:24:01Z"
content="""
Hi Joey --
As one of the people having used the word \"hijacked\" previously, my apologies.
I was trying to express frustration, surprise, and a feeling of loss of control. But I do see how my phrasing could sound too adversarial and potentially like it's ascribing something like malice where there's clearly none.
For what it's worth, that's not at all the meaning I intended to convey (as I said, I very much appreciate your ongoing work!), but message received: I'm happy to change how I phrase things like that if it'll make for more constructive conversation.
Thanks!
"""]]


@ -1,18 +0,0 @@
[[!comment format=mdwn
username="amindfv@6f75355c5dad3450ed73d1f01715be90dfdd6cd6"
nickname="amindfv"
avatar="http://cdn.libravatar.org/avatar/9cdda587f634ea9a85b34b25be421676"
subject="comment 28"
date="2019-10-23T04:42:36Z"
content="""
> Several commenters seem to be under the misapprehension that git add of a modified file that is stored in git before will annex the new version. It does not.
For what it's worth, this wasn't a misapprehension that I had in my posts above. It's possible I conflated or wasn't clear enough about other concerns in a way that created that impression (or maybe you simply weren't talking about my posts!), but my concerns are about:
- Loss of control of what's added to git vs. git-annex
- Violating the Principle of Least Astonishment
- Too much \"magic\" requiring me to work against, instead of with, my tools
- Breaking my and others' existing git-annex workflows
All of which are relevant to adding new files to the repo.
"""]]


@ -1,8 +0,0 @@
[[!comment format=mdwn
username="spwhitton"
avatar="http://cdn.libravatar.org/avatar/9c3f08f80e67733fd506c353239569eb"
subject="comment 29"
date="2019-10-23T15:25:30Z"
content="""
Thank you for writing up the inode based clean filter idea, Joey. And apologies from me too for 'hijacked'.
"""]]


@ -1,45 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 2"""
date="2019-09-16T18:20:00Z"
content="""
But I think more people will care about the question of why
`git add` adds files to the annex, and not to git.
I had several reasons to make that the default behavior.
* Suppose you have an unlocked file in your repo, and you rename
it (not using `git mv`), and then `git add` it. Oops, now you've
added to git a large file that you wanted to be annexed. But
`mv foo bar; git add bar` is normally identical to `git mv foo bar`.
Why should using git-annex break that identity?
* Suppose you have a mixture of unlocked files and files that are added
directly to git, and you've modified several of them.
Now, if you run `git commit -a`, you would surely hope that the annexed
ones stay annexed and don't get committed directly to git.
Well, `git add . ; git commit` is normally equivalent, so it should
behave the same. It follows that `git add` does need to add some files to
the annex.
* In general, keeping track of which files are supposed to be in the annex
and which in git is very failure prone. The best way seems to be for
git-annex to somehow be able to distinguish between them. And that's what
annex.largefiles lets it do. And it needs to default to adding files
to the annex, otherwise the above two cases can cause problems.
* If `git add` does something the user doesn't want, it's
far preferable for the mistake to be adding the file content to the annex,
vs adding the file content to git. Recovery from the former is a simple
process (see [[tips/largefiles]] which has conversion recipes), while
recovery from the latter can be arbitrarily complicated,
including needing to fix problems in clones on other people's computers.
I think that's most of my thinking, about why `git add` behaves
the way it does by default, although I may be forgetting something.
Counterpoint: git-lfs instead recommends that the user always start a repo by
running `git lfs track` on the extensions that `git add` should store in git-lfs.
The git-lfs default is thus the reverse of the annex.largefiles default,
and they do seem to still do ok.
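As a minimal sketch of the annex.largefiles approach described above: the expression below is only an illustration (the 100kb threshold and the C source exclusions are assumptions, not a recommendation from this thread). It uses a scratch repository and plain `git config`, so it can be run without touching real data.

```shell
# Scratch repository so nothing real is modified.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Illustrative expression: annex anything over 100kb, except C sources.
git config annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)'

# Read the setting back to confirm what git-annex would see.
largefiles=$(git config --get annex.largefiles)
echo "$largefiles"
```

The setting only takes effect in a repository where git-annex has been initialized; the sketch merely records it in the local git config.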
"""]]

View file

@ -1,39 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""re: A new user chiming in"""
date="2019-10-23T15:58:55Z"
content="""
@strmd I don't see how this change could have been much better
communicated.
git-annex ships with a NEWS file which is where this kind of
thing is documented. You should read it when upgrading.
The same documentation is included in the release announcement on this
website.
(Package managers should really provide a way to see news items when
upgrading software, but as far as I know only Debian's apt is able to do
that.)
This change was under development since 2015, and was discussed fairly
extensively in the [[devblog]] over the years. I don't expect many people
to follow that, but people who want to proactively influence git-annex
development can and do follow that.
The 2018 git-annex user survey intentionally opened with a mention
of v7 repositories and linked to documentation that included the `git add`
behavior. I was trying to get the git-annex community to try it out
and provide feedback. Some did. Nobody complained about git add.
Throughout the past year or so, I saw evidence of increasing numbers of
users using v7 features. None of them ever complained about git add
behavior change.
Before pulling the trigger on v7, I solicited user feedback in the devblog
and elsewhere, and I did get some useful feedback that led to the
annex.autoupgraderepository config.
Anyway, if the git-annex community evolves in a direction where I can't
make large changes, including breaking changes when necessary,
git-annex will need a new maintainer.
"""]]

View file

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 31"""
date="2019-10-23T19:20:25Z"
content="""
This will get the behavior you seek, once you have upgraded to current
master:
git config annex.gitaddtoannex false
It could be made to default to false, TBD. If someone wants to make a patch
to all the documentation that currently talks about using git add and git
commit -a to annex files, to document the mooted new behavior, that would
be helpful.
"""]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 32"""
date="2019-10-24T18:17:05Z"
content="""
Further, current master now only makes git add add to annex when
annex.largefiles has been configured.
"""]]

View file

@ -1,19 +0,0 @@
[[!comment format=mdwn
username="strmd"
avatar="http://cdn.libravatar.org/avatar/035707b9756129bbdea6b36a7f7b38d3"
subject="comment 33"
date="2019-10-29T20:01:01Z"
content="""
@joey,
Thanks for your reply. I certainly appreciate you giving your perspective and rationale, and you bring up several good points. I never intended for my comment to be hostile or harsh, and I'm sorry if it struck you that way.
I do believe you acted in the best of intentions, and going back through the devblog and documentation one can follow along and see how this change came about. So the bit about it being casually communicated, as I wrote, is in one sense absolutely false.
And yet, to me, a somewhat recent user (who still finds aspects of git-annex and its inner workings hard to grasp at times), it appeared to happen without warning and came pretty much out of nowhere. As you note, there are several reasons for that, and I'm obviously responsible for making sure I understand the tools I use.
As you have touched upon on your personal blog, there are probably some lessons to learn here. I hope we can turn it into something constructive. And mostly, I hope we as a community can avoid putting you in this regrettable position again.
Finally, the solution you have come up with since my post, seems very, very good and should alleviate most, if not all, concerns.
"""]]

View file

@ -1,12 +0,0 @@
[[!comment format=mdwn
username="Dwk"
avatar="http://cdn.libravatar.org/avatar/65fade4f1582ef3f00e9ad6ae27dae56"
subject="comment 34"
date="2019-10-29T21:04:42Z"
content="""
@joey
First, thanks a lot for this solution, which seems to strike a good balance between various use-cases.
Second, I apologize for concluding my comment (which was intended as constructive…) on a too-strongly-worded disclaimer about how I'd felt when upgrading. I thought it would give you grounds for discarding my opinion (if you felt it wasn't well-founded) as that of someone unthinkingly annoyed at seeing their workflow changed. I certainly didn't intend to push you around in any way.
"""]]

View file

@ -1,9 +0,0 @@
[[!comment format=mdwn
username="http://templeofcrom.duckdns.org/"
nickname="Karl"
avatar="http://cdn.libravatar.org/avatar/336975995d2c8652aa98284987d5987e90e1b4d137da415af18a8e04c29edbc3"
subject="A warning in the docs about earlier v7 revisions would be nice"
date="2019-11-08T04:59:43Z"
content="""
I started using git-annex for the first time today and ran head-first into this bug, so I'm glad to see a course correction here. I just wish the update had been done a month earlier, as the version packaged with Fedora 30 has the 'git add' override behavior which cost me a few hours in figuring out how to get files out of the annex and into a git object. The git-annex-add wiki page really could use a couple warnings to say that 'git add' may be overridden and that in earlier v7 revisions you are basically required to use annex.largefiles with a strict filter in order to make normal use of 'git add'.
"""]]

View file

@ -1,14 +0,0 @@
[[!comment format=mdwn
username="javkalas@30f67400375ff022347b37236b63786ecf79cd82"
nickname="javkalas"
avatar="http://cdn.libravatar.org/avatar/2343ce9a9d022cb24c5896fb00f1e12f"
subject="comment 36"
date="2020-04-03T19:09:52Z"
content="""
I think this is the kind of change that bites people who don't follow development the hardest. I have a repo that for 6 years has had exactly one file tracked with git-annex. I don't want to track anything else. Then today, a file was auto-added to git-annex (tiny js file, no idea why) and when I pulled on the server my web page went down :) Anyway, point being, if you don't want many files tracked by git annex, you probably don't read much about git annex either, and won't notice such a change until it bites you.
I did

    $ git config annex.largefiles '(largerthan=9999999999kb)'
"""]]

View file

@ -1,33 +0,0 @@
[[!comment format=mdwn
username="codelix"
avatar="http://cdn.libravatar.org/avatar/667ff4d0387694f28236639bab0faf2c"
subject="SO frustrating"
date="2020-05-21T17:55:09Z"
content="""
I am a big big fan of Joey and a big big fan of git annex, and have been using it for 7+ years. I absolutely love the reasoning that Joey does and how he identifies the best way to solve any problem.
But this is the first change that I consider to be a major mistake. It's essentially had me rethinking whether I can trust git annex anymore, and I am tempted to continue using older versions, which come with their own problems. It essentially all comes down to \"sane defaults\". Joey's reasoning is absolutely bang on, but optimizing for a very specific use case and silently doing things behind the scenes does not make sense.
For instance, git lfs does not add all files to lfs silently, but this essentially makes git annex do that in a sense.
My understanding is that this has been changed to some extent in recent versions, but I'm not 100% sure what the state is. In any case, I propose something like this
* Sane default -> git add ALWAYS adds files to git, git annex add ALWAYS adds files to git annex, no exceptions
* Permit configuring which files get added to annex by git add, or to git by git annex add.
Honestly, it's becoming super confusing how all the different options like largefiles interact with each other. For this purpose, I suggest having a new namespace/version of configs that cleans this up, maybe something like:
~~~
cat .gitaddtoannex
*.ogg
*.mp3
> 100MB
cat .gitannexaddtogit
.*
< 3MB
~~~
I feel this will simplify things and make it super clear what will happen. Having the same behavior for `git mv` and `mv` can be handled as suggested already. Please let me know your thoughts on this, Joey.
Greatly appreciate all the work you are doing and hope we can continue to keep git annex the rock solid option it is. I think simplifying some of these configs will also help make it more accessible to less techie folks as well.
"""]]

View file

@ -1,20 +0,0 @@
[[!comment format=mdwn
username="codelix"
avatar="http://cdn.libravatar.org/avatar/667ff4d0387694f28236639bab0faf2c"
subject="comment 38"
date="2020-05-21T18:00:57Z"
content="""
Actually, even better it could be something like this
~~~
cat .gitannexinclude
+ *.ogg
+ *.mp3
+ > 100MB
- .*
- < 3MB
~~~
Any statement lower down overrides a statement higher up. Any file that does not match any of these patterns is automatically added to git. This will let us deprecate options like `largefiles`, which are a source of a lot of confusion, at least for me.
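A rough sketch of how that last-match-wins evaluation could work, assuming the hypothetical rule format above. Nothing here exists in git-annex: the `rules` variable and the `decide` function are made up for illustration, and the size rules (`> 100MB`, `< 3MB`) are left out to keep it short.

```shell
# Hypothetical rules, one per line: '+' means annex, '-' means git.
rules='+ *.ogg
+ *.mp3
- .*'

# Print 'annex' or 'git' for a file name. Later rules override earlier
# ones, and a name that matches no rule goes to git.
decide() {
    name=$1
    result=git
    while read -r sign pattern; do
        case $name in
            $pattern)
                if [ "$sign" = "+" ]; then result=annex; else result=git; fi
                ;;
        esac
    done <<EOF
$rules
EOF
    echo "$result"
}

decide song.ogg
decide .hidden.mp3
decide notes.txt
```

Here `.hidden.mp3` matches both `*.mp3` and `.*`, and the later `- .*` rule wins, so it goes to git.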
"""]]

View file

@ -1,8 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 3"
date="2019-09-19T20:39:21Z"
content="""
\"it's far preferable for the mistake to be adding the file content to the annex, vs adding the file content to git\" -- not always. If you add content to the annex, you have to remember to separately sync it; just pushing the branch won't save it. I've lost contents this way, when working with a temporary repo clone.
"""]]

View file

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 4"
date="2019-09-20T15:43:39Z"
content="""
Another reason it's not always \"preferable for the mistake to be adding the file content to the annex, vs adding the file content to git\", is that adding it as *unlocked* annexed files can cause [[slowness|bugs/git_status_extremely_slow_with_v7]], can cause files to be unexpectedly missing from new clones until they're explicitly [[gotten|git-annex-get]], etc. The tradeoffs depend on the use case.
\"Suppose you have an unlocked file in your repo, and you rename it\" -- maybe, git-annex could keep track of local unlocked files by inode, not just by path name?
\"The best way seems to be for git-annex to somehow be able to distinguish between them. And that's what annex.largefiles lets it do\" -- can there be an `annex.largefiles_git_add` variable that, if defined, is used by `git add` but not by [[`git annex add`|git-annex-add]]? Then one can set `annex.largefiles_git_add=nothing` to get the former default behavior, while still controlling `git-annex-add` behavior as before.
"""]]

View file

@ -1,21 +0,0 @@
[[!comment format=mdwn
username="Dwk"
avatar="http://cdn.libravatar.org/avatar/65fade4f1582ef3f00e9ad6ae27dae56"
subject="Perhaps a good behaviour but only if largefiles is set"
date="2019-10-05T02:34:42Z"
content="""
This is indeed a sane default for people who want to annex every file. It is also a nice behaviour as soon as largefiles is set (it simplifies one's workflows and avoids errors).
However, it makes little sense as a default for people who use git-annex to manage some large files inside a normal git repo. They are basically forced to configure largefiles, since out-of-the-box git-annex now essentially breaks git: as Ilya points out, it breaks a very standard git workflow: you add a file, you push, you pull in another clone, and you then expect to have the contents of the file. (Worse, it does it in a silent way: since git-add adds the file unlocked, there is no straightforward way of noticing that the file has, in fact, been annexed and therefore that git won't be able to sync it.)
At bottom, the problem is to accommodate two very different groups of users.
It would require more thought, but I would favor a solution like the following:
1. modify git-add's behaviour *only if* largefiles is set;
2. explain carefully in the doc that largefiles will alter git-add's behaviour (I believe git-annex should modify the underlying git behaviour as little as possible and not without due warning);
3. warn in the doc that without a largefiles setting, some unfortunate errors (those you mention in your comments) become likely, so as to make the advantages of a largefiles setting clear;
4. perhaps, add a question when doing git-annex-init: are you planning to use git-annex to manage all your files? If yes, set `largefiles=anything` and warn that git-add will now add things to the annex; if no, do not set largefiles and thus keep the current default until the user decides otherwise.
Disclaimer: my judgment may be clouded by the fact that I was unpleasantly taken by surprise by the change (and lost a few hours of work to this, until I got access to the internet and figured out the issue): upon upgrading, I felt like git-annex had done some kind of man-in-the-middle attack on my normal git…
"""]]

View file

@ -1,36 +0,0 @@
[[!comment format=mdwn
username="CandyAngel"
avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
subject="comment 6"
date="2019-10-07T08:30:52Z"
content="""
If you want to add the file to git, use `git add`.
If you want to add the file to git-annex, use `git annex add`.
Simples!
There isn't any other behaviour which is a more obvious default.
> Suppose you have an unlocked file in your repo, and you rename it (not using git mv), and then git add it. Oops, now you've added to git a large file that you wanted to be annexed
If you wanted it to be annexed, you should have `git annex add`'d it! git-annex doesn't (and can't) know that the user wanted something different from the totally valid command they issued.
> you would surely hope that the annexed ones stay annexed and don't get committed directly to git
If the modified file changes its match state from largefiles (e.g. crossing a filesize threshold), it would still change state between annexed/non-annexed, wouldn't it?
> keeping track of which files are supposed to be in the annex and which in git is very failure prone
> And it needs to default to adding files to the annex, otherwise the above two cases can cause problems.
Not only is it failure prone, the only thing that knows which is wanted is.. the user. The decision to usurp git and the user creates the first 2 problem cases. If you go with the expectation that the user will issue the correct commands for what they want to happen (fair, considering only the user knows), the first two cases are obviously not problems.
> If git add does something the user doesn't want
Why would it? It just adds files to git, right?
> Recovery [..] from [adding file to git] can be arbitrarily complicated, including needing to fix problems in clones on other people's computers.
And this can still totally happen if largefiles is not set correctly for what the user wants.
Sure, you can set up git-annex to do magic to make your workflow easier or more seamless. Key words there being \"*set up*\". It shouldn't be doing such magic by default.
"""]]

View file

@ -1,10 +0,0 @@
[[!comment format=mdwn
username="Ilya_Shlyakhter"
avatar="http://cdn.libravatar.org/avatar/1647044369aa7747829c38b9dcc84df0"
subject="comment 7"
date="2019-10-07T18:20:21Z"
content="""
Related: [[todo/addunlocked_config_setting]].
Re: this thread, I also think preserving `git add` default behavior (adding to git) is better. I'm not sure it should *always* add to git. The whole point of v7 was, as I understand it, to make it possible to use normal git workflow (`git add`; `git commit`; [make changes]; `git add`; `git commit`) with large files without thinking about it. Existing scripts that just call `git add` and are unaware of `git-annex-add` would still work. So it makes sense to let `git add` add to annex *when explicitly configured*. In my use case, I'd like to configure it so that any files it adds to the annex are added as locked by default.
"""]]

View file

@ -1,15 +0,0 @@
[[!comment format=mdwn
username="joey"
subject="""comment 8"""
date="2019-10-08T18:03:05Z"
content="""
CandyAngel, you missed this part of my comment:
> But `mv foo bar; git add bar` is normally identical to `git mv foo bar`.
> Why should using git-annex break that identity?
With locked files, that identity still holds; you can mv a symlink and git
add it, and you again have an annexed file. So every git and git-annex
repository has always behaved that way. There are innumerable workflows and
documentation that depend on that, in big or small ways.
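A small sketch of that identity in plain git, without git-annex involved. The file names and the scratch repository are assumptions for illustration; the sketch only compares the staged content of the new names.

```shell
# Scratch repository so nothing real is modified.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Two files with identical content, both staged.
echo 'some content' > foo
echo 'some content' > baz
git add foo baz

# Rename one with plain mv followed by git add ...
mv foo bar
git add bar
# ... and the other with git mv.
git mv baz qux

# Both new names are staged with the same blob.
blob_mv=$(git rev-parse :bar)
blob_gitmv=$(git rev-parse :qux)
echo "$blob_mv $blob_gitmv"
```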
"""]]

View file

@ -1,12 +0,0 @@
[[!comment format=mdwn
username="CandyAngel"
avatar="http://cdn.libravatar.org/avatar/15c0aade8bec5bf004f939dd73cf9ed8"
subject="comment 9"
date="2019-10-09T07:53:43Z"
content="""
I'm not sure which part of my post you are responding to..
Using `git add` with symlinks makes sense because you are adding the symlink, not the file, to git.
But we are talking about actual files (because unlocked or non-annexed), right? Where `git add` would add it to git instead of the annex.. which makes sense because the command for adding a *file* to git-annex is `git annex add`, not `git add`.
"""]]