Merge branch 'smudge'

Joey Hess 2015-12-24 19:23:18 -04:00
commit 72e717e14c
76 changed files with 2392 additions and 894 deletions

@ -9,6 +9,13 @@ understand how to update its working tree.
[[!toc]]
## deprecated
Direct mode is deprecated! Instead, git-annex v6 repositories can simply
have files that are unlocked and thus can be directly accessed and
modified. See [[upgrades]] for details about the transition to v6
repositories.
## enabling (and disabling) direct mode
Normally, git-annex repositories start off in indirect mode. With some

@ -11,12 +11,18 @@ git annex add `[path ...]`
Adds files in the path to the annex. If no path is specified, adds
files from the current directory and below.
Normally, files that are already checked into git, or that git has been
configured to ignore will be silently skipped.
Files that are already checked into git and are unmodified, or that
git has been configured to ignore will be silently skipped.
If annex.largefiles is configured, and does not match a file that is being
added, `git annex add` will behave the same as `git add` and add the
non-large file directly to the git repository, instead of to the annex.
If annex.largefiles is configured, and does not match a file, `git annex
add` will behave the same as `git add` and add the non-large file directly
to the git repository, instead of to the annex.
Large files are added to the annex in locked form, which prevents further
modification of their content unless unlocked by [[git-annex-unlock]](1).
(This is not the case however when a repository is in direct mode.)
To add a file to the annex in unlocked form, `git add` can be used instead
(that only works when the repository has annex.version 6 or higher).
This command can also be used to add symbolic links, both symlinks to
annexed content, and other symlinks.

@ -17,12 +17,18 @@ Note that git commands that operate on the work tree will refuse to
run in direct mode repositories. Use `git annex proxy` to safely run such
commands.
Note that the direct mode/indirect mode distinction is removed in v6
git-annex repositories. In such a repository, you can
use [[git-annex-unlock]](1) to make a file's content be directly present.
# SEE ALSO
[[git-annex]](1)
[[git-annex-indirect]](1)
[[git-annex-unlock]](1)
# AUTHOR
Joey Hess <id@joeyh.name>

@ -11,9 +11,8 @@ git annex indirect
Switches a repository back from direct mode to the default, indirect
mode.
Some systems cannot support git-annex in indirect mode, because they
do not support symbolic links. Repositories on such systems instead
default to using direct mode.
Note that the direct mode/indirect mode distinction is removed in v6
git-annex repositories.
# SEE ALSO

@ -24,6 +24,13 @@ mark it as dead (see [[git-annex-dead]](1)).
This command is entirely safe, although usually pointless, to run inside an
already initialized git-annex repository.
# OPTIONS
* `--version=N`
Force the repository to be initialized using a different annex.version
than the current default.
# SEE ALSO
[[git-annex]](1)

@ -9,7 +9,7 @@ git annex lock `[path ...]`
# DESCRIPTION
Use this to undo an unlock command if you don't want to modify
the files, or have made modifications you want to discard.
the files any longer, or have made modifications you want to discard.
# OPTIONS

@ -12,10 +12,14 @@ This is meant to be called from git's pre-commit hook. `git annex init`
automatically creates a pre-commit hook using this.
Fixes up symlinks that are staged as part of a commit, to ensure they
point to annexed content. Also handles injecting changes to unlocked
files into the annex. When in a view, updates metadata to reflect changes
point to annexed content.
When in a view, updates metadata to reflect changes
made to files in the view.
When in a repository that has not been upgraded to annex.version 6,
also handles injecting changes to unlocked files into the annex.
# SEE ALSO
[[git-annex]](1)

43
doc/git-annex-smudge.mdwn Normal file
@ -0,0 +1,43 @@
# NAME
git-annex smudge - git filter driver for git-annex
# SYNOPSIS
git annex smudge [--clean] file
# DESCRIPTION
This command lets git-annex be used as a git filter driver, which allows
annexed files in the git repository to be unlocked at all times, instead
of being symlinks.
When adding a file with `git add`, the annex.largefiles config is
consulted to decide if a given file should be added to git as-is,
or if its content is large enough to need git-annex.
The git configuration to use this command as a filter driver is as follows.
This is normally set up for you by git-annex init, so you should
not need to configure it manually.
[filter "annex"]
smudge = git-annex smudge %f
clean = git-annex smudge --clean %f
To make git use that filter driver, it needs to be configured in
the .gitattributes file or in `.git/info/attributes`. The latter
is normally configured when a repository is initialized, with the following
contents:
* filter=annex
.* !filter
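For reference, the same setup can be sketched by hand. This is a minimal sketch, not what `git annex init` literally runs: it creates a scratch repository (the temporary path is purely illustrative), registers the two filter commands shown above, and writes the attributes to `.git/info/attributes`, which is where git reads per-clone attributes.

```shell
# Sketch: set up the annex filter driver by hand in a scratch repository.
# (git-annex init normally does this; the scratch path is illustrative.)
repo=$(mktemp -d)
git init -q "$repo"

# Register the smudge and clean commands under the filter name "annex".
git -C "$repo" config filter.annex.smudge 'git-annex smudge %f'
git -C "$repo" config filter.annex.clean 'git-annex smudge --clean %f'

# Apply the filter to every file except dotfiles, for this clone only.
mkdir -p "$repo/.git/info"
printf '%s\n' '* filter=annex' '.* !filter' > "$repo/.git/info/attributes"
```

Putting the attributes in `.git/info/attributes` keeps them local to the clone; putting them in `.gitattributes` would share them with everyone who checks out the repository.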
# SEE ALSO
[[git-annex]](1)
# AUTHOR
Joey Hess <id@joeyh.name>
Warning: Automatically converted into a man page by mdwn2man. Edit with care.

@ -11,8 +11,16 @@ git annex unlock `[path ...]`
Normally, the content of annexed files is protected from being changed.
Unlocking an annexed file allows it to be modified. This replaces the
symlink for each specified file with a copy of the file's content.
You can then modify it and `git annex add` (or `git commit`) to inject
it back into the annex.
You can then modify it and `git annex add` (or `git commit`) to save your
changes.
In repositories with annex.version 5 or earlier, unlocking a file is local
to the repository, and is temporary. With version 6, unlocking a file
changes how it is stored in the git repository (from a symlink to a pointer
file), so you can commit it like any other change. Also in version 6, you
can use `git add` to add a file to the annex in unlocked form. This allows
workflows where a file starts out unlocked, is modified as necessary, and
is locked once it reaches its final version.
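The version 6 workflow described above might look roughly like the following. This is only a sketch: `unlocked_workflow` is a hypothetical helper name, the commit messages and the appended data are placeholders, and it assumes the repository already has annex.version 6 and that the file qualifies for the annex.

```shell
# Hypothetical helper: walk one file through the v6 unlocked workflow.
# Usage: unlocked_workflow REPO FILE   (REPO is assumed to be a v6 repository)
unlocked_workflow() (
    cd "$1" || exit 1
    git add "$2" &&                        # v6: adds FILE in unlocked form
    git commit -q -m "add $2 unlocked" &&
    printf 'more data\n' >> "$2" &&        # modify the file in place
    git add "$2" &&                        # save the modified content
    git commit -q -m "update $2" &&
    git annex lock "$2" &&                 # final version reached: lock it
    git commit -q -m "lock $2"
)
```

The body runs in a subshell, so the caller's working directory is left alone, and the chain stops at the first command that fails.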
# OPTIONS

@ -626,6 +626,14 @@ subdirectories).
See [[git-annex-diffdriver]](1) for details.
* `smudge`
This command lets git-annex be used as a git filter driver, allowing
annexed files in the git repository to be unlocked at all times, instead
of being symlinks.
See [[git-annex-smudge]](1) for details.
* `remotedaemon`
Detects when network remotes have received git pushes and fetches from them.

@ -158,7 +158,8 @@ Using git-annex on a crippled filesystem that does not support symlinks.
Data:
* An annex pointer file has as its first line the git-annex key
that it's standing in for. Subsequent lines of the file might
that it's standing in for (prefixed with "annex/objects/", similar to
an annex symlink target). Subsequent lines of the file might
be a message saying that the file's content is not currently available.
An annex pointer file is checked into the git repository the same way
that an annex symlink is checked in.
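As an illustration of that layout, a pointer file could be read along these lines. This is a sketch under the assumptions stated in the text: the first line is the key prefixed with "annex/objects/", and later lines are ignored. The function name `pointer_key` is made up for the example, and the exact prefix used by a released git-annex may differ.

```shell
# Hypothetical helper: print the key named by an annex pointer file.
# Returns non-zero if the file does not look like a pointer file.
pointer_key() {
    # Only the first line matters; later lines may be a human-readable message.
    IFS= read -r first < "$1" || [ -n "$first" ] || return 1
    case $first in
        annex/objects/?*) printf '%s\n' "${first#annex/objects/}" ;;
        *) return 1 ;;
    esac
}
```

For example, a file whose first line is `annex/objects/` followed by a key yields that key, while an ordinary text file makes the function return failure.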
@ -177,8 +178,8 @@ Configuration:
the annex. Other files are passed through the smudge/clean as-is and
have their contents stored in git.
* annex.direct is repurposed to configure how the assistant adds files.
When set to true, they're added unlocked.
* annex.direct is repurposed to configure how git-annex adds files.
When set to false, it adds symlinks and when true it adds pointer files.
git-annex clean:
@ -232,15 +233,11 @@ git annex lock/unlock:
transition repositories to using pointers, and a cleaner unlock/lock
for repos using symlinks.
unlock will stage a pointer file, and will copy the content of the object
out of .git/annex/objects to the work tree file. (Might want a --hardlink
switch.)
unlock will stage a pointer file, and will link the content of the object
from .git/annex/objects to the work tree file.
lock will replace the current work tree file with the symlink, and stage it.
Note that multiple work tree files could point to the same object.
So, if the link count is > 1, replace the annex object with a copy of
itself to break such a hard link. Always finish by locking down the
permissions of the annex object.
lock will replace the current work tree file with the symlink, and stage it,
and lock down the permissions of the annex object.
#### file map
@ -248,7 +245,8 @@ The file map needs to map from `Key -> [File]`. `File -> Key`
seems useful to have, but in practice is not worthwhile.
Drop and get operations need to know what files in the work tree use a
given key in order to update the work tree.
given key in order to update the work tree. And, we don't want to
overwrite a work tree file if it's been modified when dropping or getting.
git-annex commands that look at annex symlinks to get keys to act on will
need to fall back to either consulting the file map, or looking at the staged
@ -275,13 +273,14 @@ In particular:
* Is the smudge filter called at any other time? Seems unlikely but then
there could be situations with a detached work tree or such.
* Does git call any useful hooks when removing a file from the work tree,
or converting it to not be annexed?
or converting it to not be annexed, or for `git mv` of an annexed file?
No!
From this analysis, any file map generated by the smudge/clean filters
is necessarily potentially inaccurate. It may list deleted files.
It may or may not reflect current unstaged changes from the work tree.
It follows that any use of the file map needs to verify the info from it,
and throw out bad cached info (updating the map to match reality).
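A rough sketch of that verification step, covering only the simplest case: the cached paths are checked against the work tree, and a path is kept only if it still exists and is still a pointer file for the key. `verify_cached_files` is a hypothetical name, the "annex/objects/" prefix is taken from the pointer file description above, and a file whose content is present would need a different check (for example comparing against the staged pointer) that is not shown here.

```shell
# Hypothetical helper: verify_cached_files KEY FILE...
# Print only the cached work tree paths that still exist and whose first
# line still names KEY; stale entries are silently dropped.
verify_cached_files() {
    key=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] || continue    # deleted since the map was built
        IFS= read -r first < "$f" || [ -n "$first" ] || continue
        [ "$first" = "annex/objects/$key" ] && printf '%s\n' "$f"
    done
    return 0
}
```

A caller would then rewrite the map entry for the key with whatever paths survive, so the cache converges back toward the real state of the work tree.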
@ -306,17 +305,71 @@ just look at the repo content in the first place..
annex.version changes to 6
Upgrade should be handled automatically.
git config for filter.annex.smudge and filter.annex.clean is set up.
On upgrade, update .gitattributes with a stock configuration, unless
it already mentions "filter=annex".
.gitattributes is updated with a stock configuration,
unless it already mentions "filter=annex".
Upgrading a direct mode repo needs to switch it out of bare mode, and
needs to run `git annex unlock` on all files (or reach the same result).
So will need to stage changes to all annexed files.
When a repo has some clones indirect and some direct, the upgraded repo
will have all files unlocked, necessarily in all clones.
will have all files unlocked, necessarily in all clones. This happens
automatically, because when the direct repos are upgraded that causes the
files to be unlocked, while the indirect upgrades don't touch the files.
#### implementation todo list
* Still a few test suite failures for v6 with locked files.
* Test suite should make a pass for v6 with unlocked files.
* Reconcile staged changes into the associated files database, whenever
the database is queried. This is needed to handle eg:
git add largefile
git mv largefile othername
git annex move othername --to foo
# fails to drop content from associated file othername,
# because it doesn't know it has that name
# git commit clears up this mess
* Interaction with shared clones. Should avoid hard linking from/to an
object in a shared clone if either repository has the object unlocked.
(And should avoid unlocking an object if it's hard linked to a shared clone,
but that's already accomplished because it avoids unlocking an object if
it's hard linked at all)
* Make automatic merge conflict resolution work for pointer files.
- Should probably automatically handle merge conflicts between annex
symlinks and pointer files too. Maybe by always resulting in a pointer
file, since the symlinks don't work everywhere.
* Crippled filesystem should cause all files to be transparently unlocked.
Note that this presents problems when dealing with merge conflicts and
when pushing changes committed in such a repo. Ideally, should avoid
committing implicit unlocks, or should prevent such commits leaking out
in pushes.
* Dropping a smudged file causes git status (and git annex status)
to show it as modified, because the timestamp has changed.
Getting a smudged file can also cause this.
Upgrading a direct mode repo also leaves files in this state.
User can use `git add` to clear it up, but better to avoid this,
by updating stat info in the index.
(May need to use libgit2 to do this, cannot find
any plumbing except git-update-index, which is very inefficient for
smudged files.)
* Audit code for all uses of isDirect. These places almost always need
adjusting to support v6, if they haven't already.
* Optimisation: See if the database schema can be improved to speed things
up. Are there enough indexes? getAssociatedKey in particular does a
reverse lookup and might benefit from an index.
* Optimisation: Reads from the Keys database avoid doing anything if the
database doesn't exist. This makes v5 repos, or v6 with all locked files
faster. However, if a v6 repo unlocks and then re-locks a file, its
database will exist, and so this optimisation will no longer apply.
Could try to detect when the database is empty, and remove it or avoid reads.
* Eventually (but not yet), make v6 the default for new repositories.
Note that the assistant forces repos into direct mode; that will need to
be changed then.
* Later still, remove support for direct mode, and enable automatic
v5 to v6 upgrades.
----

@ -43,6 +43,46 @@ conflicts first before upgrading git-annex.
The upgrade events, so far:
## v5 -> v6 (git-annex version 6.x)
The upgrade from v5 to v6 is handled manually. Run `git-annex upgrade`
to perform the upgrade.
Warning: All places that a direct mode repository is cloned to should be
running git-annex version 6.x before you upgrade the repository.
This is necessary because the contents of the repository are changed
in the upgrade, and the old version of git-annex won't be able to
access files after the repo is upgraded.
This upgrade does away with the direct mode/indirect mode distinction.
A v6 git-annex repository can have some files locked and other files
unlocked, and all git and git-annex commands can be used on both locked and
unlocked files. (Although for locked files to work, the filesystem
must support symbolic links.)
The behavior of some commands changes in an upgraded repository:
* `git add` will add files to the annex, in unlocked mode, rather than
adding them directly to the git repository. To cause some files to be
added directly to git, you can configure `annex.largefiles`. For
example:
git config annex.largefiles "largerthan=100kb and not (include=*.c or include=*.h)"
* `git annex unlock` and `git annex lock` change how the pointer to
the annexed content is stored in git.
If a repository is only used in indirect mode, you can use git-annex
v5 and v6 in different clones of the same indirect mode repository without
problems.
On upgrade, all files in a direct mode repository will be converted to
unlocked files. The upgrade will stage changes to all annexed files in
the git repository, which you can then commit.
If a repository has some clones using direct mode and some using indirect
mode, all the files will end up unlocked in all clones after the upgrade.
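Put together, upgrading one clone might look like this. It is a sketch only: `upgrade_clone` is a hypothetical helper, the commit message is a placeholder, and it assumes every other clone of a direct mode repository is already running git-annex 6.x, as the warning above requires.

```shell
# Hypothetical helper: upgrade one clone to v6 and commit the result.
# Usage: upgrade_clone REPO
upgrade_clone() (
    cd "$1" || exit 1
    git annex upgrade || exit 1
    # In a former direct mode repository the upgrade stages changes to the
    # annexed files; show them, then commit only if something is staged.
    git status --short
    git diff --cached --quiet || git commit -q -m "upgrade to annex.version 6"
)
```

In a repository that was only ever used in indirect mode, nothing is staged and the commit step is skipped.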
## v4 -> v5 (git-annex version 5.x)
The upgrade from v4 to v5 is handled