Merge branch 'v7'

This commit is contained in:
Joey Hess 2018-10-26 13:52:09 -04:00
commit 6fd37fb016
No known key found for this signature in database
GPG key ID: DB12DB0FF05F8F38
37 changed files with 528 additions and 280 deletions

View file

@ -77,3 +77,5 @@ file_%subdir%
### Have you had any luck using git-annex before? (Sometimes we get tired of reading bug reports all day and a lil' positive end note does wonders)
Yep! I already use it to move files between my laptop's HDD and SSD, and to copy files between my many SD cards. I was trying this to see if I could not have to scroll as far on my 3D printer's menu.
> [[done]] see comments --[[Joey]]

View file

@ -0,0 +1,17 @@
[[!comment format=mdwn
username="joey"
subject="""comment 4"""
date="2018-10-26T16:53:47Z"
content="""
`git annex adjust --hide-missing` is now available to do what you want
re hiding missing files.
`git annex view` doesn't currently unlock files in a v6 repo, so it's not
usable on a crippled filesystem. That's why the cat in the transcript above
shows the symlink content which git writes to a regular file when in a
crippled filesystem.
I would like to eventually unify adjust with view, so `git annex adjust
--unlock` can be used with a view, which would support that.
See [[todo/unify_adjust_with_view]].
"""]]

View file

@ -0,0 +1,11 @@
[[!comment format=mdwn
username="joey"
subject="""comment 1"""
date="2018-10-26T17:04:09Z"
content="""
Have you ever seen this again or have any more information about how to
reproduce it?
This seems similar to the problem fixed by [[!commit a13c0ce66c6dd5d8cf5b09ee2fc5a58f43db4b14]]
but the version you were using already had that commit in it.
"""]]

View file

@ -377,3 +377,5 @@ total 12
lil' positive end note mode on:
git-annex is the only thing to which I trust my archive of most valuable documents and memories!
> [[done]]; see comments --[[Joey]]

View file

@ -5,6 +5,7 @@
subject="comment 4"
date="2018-10-18T23:34:26Z"
content="""
I am stupid talking about executable files hardlinking. I think I just chmod-ed already hardlinking files, that's how I got it. No surprise.
I am ok with this quirk (executable files are not thinned), but just curious: what exactly influenced such design decision?

View file

@ -0,0 +1,24 @@
[[!comment format=mdwn
username="joey"
subject="""comment 5"""
date="2018-10-26T17:17:35Z"
content="""
[[!commit b7c8bf5274a64389ac87d6ce0388b8708c261971]] is where that was
implemented. Interestingly, its commit message does say that the annex
object file is made executable when using annex.thin.
And indeed, git add of an executable file with annex.thin set does
make the object executable and hard link to it.
But that commit contains this line that avoids hard linking:
| maybe False isExecutable destmode = copy =<< getstat
Which is what I based my earlier comment on. But without that line,
AFAIK it will behave the way you want, with the annex object and
executable worktree file being hard linked. The code also removes the
execute bit if the annex object file later ends up getting hard linked
instead to a non-executable file.
So, based on this analysis, I'm going to remove that line. And improve the
annex.thin docs slightly, and I think that's sufficient to close this bug.
"""]]
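
The hard-link behaviour discussed in the comment above can be seen with plain POSIX tools. This is only an illustrative sketch, not git-annex code: the file names `object` and `worktree` are made up, standing in for an annex object and an unlocked worktree file.

```sh
#!/bin/sh
# Illustration only: plain POSIX hard links, not git-annex itself.
set -eu

demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT

# "object" stands in for the annex object, "worktree" for the unlocked file
# (both names are hypothetical).
printf 'version 1\n' > "$demo/object"
ln "$demo/object" "$demo/worktree"

# The execute bit lives in the shared inode, so setting it through one
# name sets it for the other name as well.
chmod +x "$demo/worktree"
if [ -x "$demo/object" ]; then
    object_is_executable=yes
else
    object_is_executable=no
fi

# Writing in place through one name changes the content seen through both,
# so the old version is gone unless another copy exists elsewhere.
printf 'version 2\n' > "$demo/worktree"
object_content=$(cat "$demo/object")

echo "object executable: $object_is_executable"
echo "object content: $object_content"
```

Because both names share the same permission bits, a hard-linked annex object and worktree file are either both executable or both not. The in-place overwrite also shows why annex.thin can lose the old version of a file.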

View file

@ -11,9 +11,9 @@ understand how to update its working tree.
## deprecated
Direct mode is deprecated! Instead, git-annex v6 repositories can simply
Direct mode is deprecated! Instead, git-annex v7 repositories can simply
have files that are unlocked and thus can be directly accessed and
modified. See [[upgrades]] for details about the transition to v6
modified. See [[upgrades]] for details about the transition to v7
repositories.
## enabling (and disabling) direct mode

View file

@ -6,11 +6,13 @@ git-annex smudge - git filter driver for git-annex
git annex smudge [--clean] file
git annex smudge --update
# DESCRIPTION
This command lets git-annex be used as a git filter driver, which allows
annexed files in the git repository to be unlocked at all times, instead
of being symlinks.
annexed files in the git repository to be unlocked, instead
of being symlinks, and lets `git add` store files in the annex.
When adding a file with `git add`, the annex.largefiles config is
consulted to decide if a given file should be added to git as-is,
@ -32,6 +34,16 @@ contents:
* filter=annex
.* !filter
The smudge filter does not provide git with the content of annexed files,
because that would be slow and triggers memory leaks in git. Instead,
it records which worktree files need to be updated, and
`git annex smudge --update` later updates the work tree to contain
the content. That is run by several git hooks, including post-checkout
and post-merge. However, a few git commands, notably `git stash` and
`git cherry-pick`, do not run any hooks, so after using those commands
you can manually run `git annex smudge --update` to update the working
tree.
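
As a rough sketch of that manual step, a small shell wrapper can run a git command and then the update. The function name `git_then_update` is made up for this example and is not part of git-annex. Only `git annex smudge --update` comes from the text above.

```sh
#!/bin/sh
# Hypothetical helper (the name is an assumption for this example): run a
# git command that does not trigger any hooks, then let git-annex populate
# the unlocked files in the work tree.
git_then_update() {
    # Stop here, passing on git's exit status, if the git command fails.
    git "$@" || return
    git annex smudge --update
}

# Usage, inside a git-annex repository:
#   git_then_update stash pop
#   git_then_update cherry-pick <commit>
```

Commands that do run the post-checkout or post-merge hooks should not need this wrapper.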
# SEE ALSO
[[git-annex]](1)

View file

@ -1024,18 +1024,16 @@ Here are all the supported configuration settings.
* `annex.thin`
Set this to `true` to make unlocked files be a hard link to their content
in the annex, rather than a second copy. (Only when supported by the file
system, and only in repository version 6.) This can save considerable
in the annex, rather than a second copy. This can save considerable
disk space, but when a modification is made to a file, you will lose the
local (and possibly only) copy of the old version. So, enable with care.
After setting (or unsetting) this, you should run `git annex fix` to
fix up the annexed files in the work tree to be hard links (or copies).
Note that `annex.thin` is not honored when git updates an annexed file
in the working tree. So when `git checkout` or `git merge` updates the
working tree, a second copy of annexed files will result. You can run
`git-annex fix` to fix up the hard links after running such git commands.
Note that this has no effect when the filesystem does not support hard links.
And when multiple files in the work tree have the same content, only
one of them gets hard linked to the annex.
* `annex.delayadd`

View file

@ -8,10 +8,10 @@ but it needs some different workflows of using git-annex.
## getting started
To get started, your repository needs to be upgraded to v6, since the
To get started, your repository needs to be upgraded to v7, since the
feature does not work in v5 repositories.
git annex upgrade --version=6
git annex upgrade --version=7
The [[git-annex adjust|git-annex-adjust]] command sets up an adjusted form
of a git branch, in this case we'll ask it to hide missing files.
@ -124,7 +124,7 @@ I set up the repository like this:
git clone server:/path/to/podcasts
cd podcasts
git annex upgrade --version=6
git annex upgrade --version=7
git annex adjust --hide-missing
git annex group here client
git annex wanted here standard

View file

@ -15,7 +15,7 @@ by running `git annex unlock`.
# git annex unlock some_file
# echo "new content" > some_file
Back before git-annex version 6, and its v6 repository mode, unlocking a file
Back before git-annex version 7, and its v7 repository mode, unlocking a file
like this was a transient thing. You'd modify it and then `git annex add` the
modified version to the annex, and finally `git commit`. The new version of
the file was then back to being locked.
@ -29,31 +29,28 @@ to edit files repeatedly, without manually having to unlock them every time.
The [[direct_mode]] made all files be unlocked all the time, but it
had many problems of its own.
## enter v6 mode
## enter v7 mode
/!\ This is a new feature; see its [[todo_list|todo/smudge]]
for known issues.
This led to the v6 repository mode, which makes unlocked files remain
This led to the v7 repository mode, which makes unlocked files remain
unlocked after they're committed, so you can keep changing them and
committing the changes whenever you'd like. It also lets you use more
normal git commands (or even interfaces on top of git) for handling
annexed files.
To get a repository into v6 mode, you can [[upgrade|upgrades]] it.
To get a repository into v7 mode, you can [[upgrade|upgrades]] it.
This will eventually happen automatically, but for now it's a manual process
(be sure to read [[upgrades]] before doing this):
# git annex upgrade
Or, you can init a new repository in v6 mode.
Or, you can init a new repository in v7 mode.
# git init
# git annex init --version=6
# git annex init --version=7
## using it
Using a v6 repository is easy! Simply use regular git commands to add
Using a v7 repository is easy! Simply use regular git commands to add
and commit files. In a git-annex repository, git will use git-annex
to store the file contents, and the files will be left unlocked.
@ -97,7 +94,7 @@ mode is used. To make them always use unlocked mode, run:
## mixing locked and unlocked files
A v6 repository can contain both locked and unlocked files. You can switch
A v7 repository can contain both locked and unlocked files. You can switch
a file back and forth using the `git annex lock` and `git annex unlock`
commands. This changes what's stored in git between a git-annex symlink
(locked) and a git-annex pointer file (unlocked). To add a file to
@ -108,28 +105,34 @@ If you want to mostly keep files locked, but be able to locally switch
to having them all unlocked, you can do so using `git annex adjust
--unlock`. See [[git-annex-adjust]] for details. This is particularly
useful when using filesystems like FAT, and OS's like Windows that don't
support symlinks.
support symlinks. Indeed, `git-annex init` detects such filesystems and
automatically sets up a repository to use all unlocked files.
## index gotchas
## imperfections
When git-annex gets or drops the content of an unlocked file, it updates
the file in git's worktree accordingly. That makes `git status` show
the file as modified, even though there are no changes to commit.
So git-annex then updates the index file to reflect the change to the
worktree, and prevent the file from appearing to be modified.
Unlocked files in v7 repositories mostly work very well, but there are a
few imperfections which you should be aware of when using them.
This means that when git-annex is running a command that gets or drops the
content of an unlocked file, the index will sometimes be locked. This might
prevent you from `git commit` at the same time. Or, if you have a git
commit in progress, or are running multiple git-annex processes, git-annex
may complain that the index is locked.
1. `git stash`, `git cherry-pick` and `git reset --hard` don't update
the working tree with the content of unlocked files. The files
will contain pointers, the same as if the content was not in the
repository. So after running these commands, you will need to manually
run `git annex smudge --update`.
Also, interrupting git-annex (eg with ctrl-c) before it can update the
index will leave `git status` showing modifications.
2. When git-annex is running a command that gets or drops the content
of an unlocked file, git's index will briefly be locked, which might
prevent you from running a `git commit` at the same time.
To manually update the index when git-annex was not able to, you can run:
3. Conversely, if you have a git commit in progress, running git-annex may
complain that the index is locked, though this will not prevent it from
working.
git update-index -q --refresh $file
4. When an operation such as a checkout or merge needs to update a large
number of unlocked files, it can become slow. So can `git add` of
a large number of files (`git annex add` is faster).
(The technical reasons behind these imperfections are explained in
detail in [[todo/git_smudge_clean_interface_suboptiomal]].)
## using less disk space
@ -168,15 +171,6 @@ match the new setting:
git annex fix
Unfortunately, git's smudge interface does not let git-annex honor
the annex.thin configuration when git is checking out a file.
So, using `git checkout` to check out a different branch, or even
`git merge` can result in some non-thin files making their way into the
working tree, and using more disk space. A warning will be printed out in
this situation. You can always run `git annex fix` to re-thin such files.
## annex.thin tradeoffs
[[!template id=note text="""
When a [[direct_mode]] repository is upgraded, annex.thin is automatically
set, because direct mode made the same single-copy tradeoff.

View file

@ -0,0 +1,21 @@
[[!comment format=mdwn
username="joey"
subject="""comment 3"""
date="2018-10-26T16:21:28Z"
content="""
While `git add` would be a lot slower when using this interface to add
large files, it would make `git checkout` and other commands that update
the work tree a lot faster.
Since the smudge filter is not providing git with the file content any more,
using filterdriver would avoid git running many git-annex smudge processes,
greatly speeding up large checkouts.
Unfortunately, `git annex smudge --update` ends up running the smudge filter
on all files that the clean filter earlier acted on, so even if filterdriver were
used to speed up the clean filter, there would still be one process spawned per
file for the smudge filter.
So some interface improvement is needed before git-annex can usefully use
this.
"""]]

View file

@ -1,82 +1,13 @@
git-annex should use smudge/clean filters. v6 mode
git-annex should use smudge/clean filters. v7 mode
### problems keeping v6 experimental
## warts
* Checking out a different branch causes git to smudge all changed files,
and write their content. This does not honor annex.thin. A warning
message is printed in this case.
This is particularly wasteful when checking out an adjusted unlocked
branch, which causes 2x the space to be used.
"git annex proxy" could be used to handle this.
Make it run the git command with smudge filter set to not output content
but only pointers, and then at the end populate the pointer files, hard linking
when appropriate. (As an optimization, the smudge filter could also be
made to use the long-running filter interface when run this way.)
git-annex adjust and git-annex sync could both use that internally
when checking out the adjusted branch, and merging a branch into HEAD.
Or: Make the smudge filter never provide the actual file content, but the
pointer. Install post-checkout and post-merge hooks that populate
the worktree files that were checked out. Of course, they will also
need to update the index.
Problem: post-merge hook is not run when there's a merge conflict.
Git does not actually run the smudge filter in this case;
the conflicting file becomes a text file containing a merge conflict
between the two annex pointers. When the user resolves the conflict
and git add's the result, git runs the smudge filter.
So, if the smudge filter then provides the pointer, the file would not be
populated. The post-commit hook would then need to populate the file,
once the merge got committed.
Problem: No hook seems to be run for git stash / git stash apply
or for git reset --hard or git cherry-pick.
Fatal or can we live with needing to run a
git-annex command to populate the files after those commands?
> implemented on the `delaysmudge` branch now
(My enhanced smudge/clean patch set also fixed this problem, in a much
nicer way...)
* Optionally: Use the filterdriver interface during checkout. Unfortunately that
interface is slower for cleaning during git add (see
[[todo/Long_Running_Filter_Process]]), but since the smudge filter is not
providing git with the file content any more, using filterdriver would
avoid git running many git-annex smudge processes, greatly speeding up large
checkouts. git add could be left slow, with git-annex add being the fast path,
until the filterdriver interface is improved. Or, make "git annex proxy"
use the filterdriver interface for checkout.
* When git runs the smudge filter, it buffers all its output in ram before
writing it to a file. So, checking out a branch with large v6 unlocked files
can cause git to use a lot of memory.
This needs to be fixed in git, but my proposed interface in
<http://thread.gmane.org/gmane.comp.version-control.git/294425> would
avoid the problem for git checkout, since it would use the new interface
and not the smudge filter.
Last verified with git 2.18 in 2018.
Note that the long-running filter process interface has the same problem.
The annex.thin idea above could work around this problem.
> implemented on the `delaysmudge` branch now
## other warts
* There are several v6 bugs that are edge cases and
* There are several bugs that are edge cases and
need more info or analysis. None of these seem like blockers
to keep v6 experimental or to replacing direct mode with v6.
to keep v7 experimental or to replacing direct mode with v7.
- <http://git-annex.branchable.com/bugs/assistant_crashes_in_TransferScanner/>
- <http://git-annex.branchable.com/bugs/v6_appears_to_not_thin/>
- <http://git-annex.branchable.com/bugs/Metadata_views_in_v6_repo_upgraded_from_direct_mode_act_strangely/>
- <http://git-annex.branchable.com/bugs/git-annex-sync_sometimes_fails_in_submodule_in_V6_adjusted_branch/>
### long term todos
@ -86,14 +17,14 @@ git-annex should use smudge/clean filters. v6 mode
multiple files, and so should be faster.
See [[todo/Long_Running_Filter_Process]] .. it's not currently actually a
win but might be a good way to improve git to work better with v6.
win but might be a good way to improve git to work better with v7.
* Eventually (but not yet), make v6 the default for new repositories.
* Eventually (but not yet), make v7 the default for new repositories.
Note that the assistant forces repos into direct mode; that will need to
be changed then, and it should enable annex.thin instead.
* Later still, remove support for direct mode, and enable automatic
v5 to v6 upgrades.
v5 to v7 upgrades.
### historical notes
@ -395,7 +326,7 @@ just look at the repo content in the first place..
#### Upgrading
annex.version changes to 6
annex.version changes to 7
git config for filter.annex.smudge and filter.annex.clean is set up.

View file

@ -0,0 +1,7 @@
`git annex adjust` and `git annex view` (et al) both derive a branch from
the main branch and enter it. They have different capabilities. It would be
useful to be able to compose them. For example, to enter a view based on
metadata that also has all files unlocked.
There's also probably a fair amount of overlap in their implementations.
--[[Joey]]

View file

@ -46,11 +46,18 @@ the upgrade would need to be run in a copy of the repository.
The upgrade events, so far:
## v5 -> v6 (git-annex version 6.x)
## v6 -> v7 (git-annex version 7.x)
The upgrade from v5 to v6 is handled manually for now.
The upgrade from v5 to v7 is handled manually for now.
Run `git-annex upgrade` to perform the upgrade.
v6 repositories are automatically upgraded to v7.
The only difference between v6 and v7 is that some additional git hooks
were added in v7.
## v5 -> v6 (git-annex version 6.x)
A v6 git-annex repository can have some files locked while other files are
unlocked, and all git and git-annex commands can be used on both locked and
unlocked files. (Although for locked files to be accessible, the filesystem