Added a comment

This commit is contained in:
amindfv@6f75355c5dad3450ed73d1f01715be90dfdd6cd6 2019-10-12 21:12:52 +00:00 committed by admin
parent ddcc57bd9a
commit c85165ece5

View file

@ -0,0 +1,141 @@
[[!comment format=mdwn
username="amindfv@6f75355c5dad3450ed73d1f01715be90dfdd6cd6"
nickname="amindfv"
avatar="http://cdn.libravatar.org/avatar/9cdda587f634ea9a85b34b25be421676"
subject="comment 11"
date="2019-10-12T21:12:51Z"
content="""
First off, I love git annex and I truly appreciate all the hard work that's gone into it, so I hope you'll take my frustration as constructive when I say:
Making \"git add\" mean \"git annex add\" is a **terrible** default and it should be reverted **ASAP**.
## v7 is an entirely different tool than v5
mkdir foo && cd foo
git init && git annex init
echo 'one' > a.txt
git add a.txt
git commit -m '+'
echo 'two' >> a.txt
git diff
I don't get a diff!! What??! Except for after \"git annex init\", git-annex has kept completely quiet (not warning me about any of this), and yet it's hijacked the whole repo.
\"git show\" is also broken, \"git add -p\", \"git log -p\", etc etc.
As others have mentioned, \"git clone\" may cause people to lose their data as well.
In other words, \"git annex\" is no longer a couple of additional commands to use within git - it's something closer to a replacement of git. It feels like a takeover of my git repos.
## It forces a workflow on users
One of the beautiful things about git-annex is it adds a few simple concepts to git, and allows us to use those new primitives in any way we like. This has allowed users to invent lots of different workflows that meet their needs.
I've seen lots of different types of repo configurations and workflows, but for discussion here I think we only need to talk about two:
1. \"Big bunch-o-binaries\" (BBOB): user wants to keep their photo collection/scientific data/etc. in a big repo and they've got some way (like the assistant) to sync it. This could be considered to be a \"dropbox replacement\". In a BBOB nobody ever wants to look at a diff between versions of a file, do line-by-line merges, or use most of the other features of git.
2. All uses that aren't that one! But let's be specific and describe a couple:
* I've got a repo (dozens, actually) which is just code, but somewhere along the line I had a large data file I wanted to add and didn't want to slow git down. I \"git annex init && git annex add\"ed and was done with it. Back to writing code.
* I've got a repo which is a true mixture of large binaries and small text files. E.g. for a video editing project I've got raw video files as well as various configuration scripts, notes, the .kdenlive file, etc.
This change puts a giant wedge between use case #1 and #2.
As an example, the org-mode people suggest using git-annex: <https://orgmode.org/worg/org-tutorials/index.html> . I can't imagine they'd be happy to accidentally lose the ability to get diffs between versions of their .org files.
## We don't know how big of a breaking change this is...
...and it may be very large.
How many people use \"git add\" to mean \"git add\" in at least one of their annex repos? We don't know! Compatibility with a big breaking change like this wasn't asked in the last user survey (which I happily took part in): <https://git-annex-survey.branchable.com/polls/2018/>
But we can look at what _was_ asked to try and estimate the extent of the breakage:
As a proxy, we could examine how many people use the assistant (more likely to simply have a big-bunch-o-binaries), vs. number of people who use it on the command line (more likely to be carefully managing their repos, including which files are in \"vanilla\" git vs annex). The numbers in the survey indicate there are 14 people using it on the command line for every 1 person using the assistant (85% to 6%).
We can also look at how many people were using any v7 features: in the most recent survey 75% of people say they're not using any v7 features, and another 7% say they don't know, which I read as not following this discussion very closely. This suggests to me most people (at least 82%) were happy using git annex basically as it was.
That idea (\"we're basically happy how it is already\") is borne out by other questions: 83% of users rate themselves between \"happy\" and \"one of my favorite applications of all time\" (FWIW, I fall into the \"one of my favorie applications of all time\" camp!)
In addition, the \"blocking problems,\" \"focus,\" and \"roadmap\" sections of the survey don't provide compelling evidence that changing fundamentally how git-annex interacts with git is something anyone's clamoring for.
85% of people mostly use `git annex` from the command line. How many of those people have used \"git add\" in an annex repo at least once, and (now incorrectly) believe they know what it's going to do to their repo?
## More broken workflows
Even repos which mainly *are* BBOB (big-bunch-o-binaries) may have a README file or other files like them. I note that most (all?) repos here have text files that are in \"plain git\": <https://git-annex.branchable.com/publicrepos/>
Now when I \"apt upgrade\" and add a new source file to one of my code repos, it's going to be `git-annex-add`ed? Does that mean I can't (easily, sensibly) push it to a code hosting platform (GitLab, Github)?
Forcing people to this behavior by default reifies one workflow (BBOB) over all others, and basically replaces one tool (basically a few added primitives for git) with another that's also called git annex (basically a replacement for content tracking in git - and isn't that basically a replacement for git?).
I realize every user's going to have slightly different preferences, but I truly think this is the rare case where simply saying \"if you don't like it, set these flags in your repos' configs\" is not nearly good enough.
I realize it would be a pain to roll this change back, but the benefits still far outweigh the costs. It's going to be a much bigger pain point for all users who are suddenly confused and have put their history in a broken state, for all the tutorials that are now giving users inaccurate information, etc.
## Worse experience for new/inexperienced users
Anecdotally, I've turned 5-10 people on to the beauty of git annex, and in every case the reason I told them about it was they mentioned to me they needed some way to store a large data file in their existing code repository.
Now when I tell people about git annex, I have to also tell them to be super careful to set up a set of configs in orer for it not to fundamentally redefine the meaning of their repo?
The common new-user experience for me has been:
> Friend (from across the room): \"Dang, this file's too big for git\"
> Me: \"Have you tried git annex?\"
> [...talking about the benefits, installation, etc...]
> Friend: \"Ok now how do I use it?\"
> Me (still across the room): \"Just 'git annex init' in the repo and then 'git annex add' the file\"
It seems almost comical that I'd memorize a line so I could instead say:
> Me: \"Just 'git annex init' in the repo and then 'git annex add' the file. But first be SUPER careful to type 'git config annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)' and don't forget to keep that list of file extensions up to date\"
\"git annex init\" should be set-it-and-forget-it. I shouldn't have to worry about what parts of my git repo it's messing with because I'm not being vigilant enough.
## Should have been discussed and then announced far more widely
I go to the git annex homepage every couple of weeks, which I imagine is on the high end for a user of a command line tool. Even I was caught completely by surprise with this change, and only saw it when I \"apt-upgrade\"d.
## (Subjective:) Worse even if it didn't break (most?) users' expectations
IMO, even if this were a new tool with no existing users or workflows or repos, this would not be the best default and instead should be behind a flag. I know I'd have been less enthusiastic to start using it if it were nudging me to basically use it as a replacement for Dropbox, instead of an unobtrusive set of additional tools for git.
I'd also be less enthusiastic to know that if I weren't vigilant I'd get totally wrong behavior (e.g. I say 'git config annex.largefiles 'largerthan=100kb and not (include=*.c or include=*.h)' but then I \"git add\" a .hsc file and it \"git-annex-add\"s it \"behind my back\")
## In summary:
Issues as I see it are (and there may be more):
* It breaks users' workflows. Potentially a huge number of them. This alone should be a big reason to be careful when making a change like this.
I - one single user - have already spent hours assessing how this will affect me and my repos. It requires me to be very careful and I'm far from sure I won't slip up somewhere. Hopefully if/when I do, I'll notice the mistake.
I had other things planned for my Saturday, but some of them involved using \"git annex\" and so now I have to halt everything to make sure I'm not screwing anything up now that I \"apt upgrade\"d and got a new version of git-annex this morning.
* Uncomfortable new-user experience
* (Subjective:) not a good default even if didn't break expectations
* Should have been announced and discussed MUCH more widely and extensively
## A response to a few issues already raised:
> \"Suppose you have an unlocked file in your repo, and you rename it (not using git move), and then git add it. Oops, now you've added to git a large file that you wanted to be annexed. \"
Why not simply provide a configurable warning about attempting to \"git add\" a file above a certain size? A \"did you mean 'git annex add'?\" type warning would be helpful for everyone. It'd catch all mistakes, not just ones caused by mv.
> \"But mv foo bar; git add bar is normally identical to git mv foo bar. Why should using git-annex break that identity?\"
This to me feels like killing a mosquito with a sledgehammer. This change breaks myriad other identities, including a very simple one:
- \"git add\"ing a new file should behave the same whether or not we've at some point typed \"git annex init\"
The \"mv ; add\" identity has never been an issue for me in years of using git annex. By contrast, this change has already eaten up half of my Saturday. (Admittedly some of it writing this up. And again I should mention: I still hugely appreciate all the work on git-annex!)
## What to do about it?
I'm 100% behind Dwk's 4-point plan.
My one clarification is potentially to take \"only if largefiles is set\" to mean \"only if largefiles is set to a value other than 'nothing'.\" Not sure about this one.
"""]]