annex.thin
Decided it's too scary to make v6 unlocked files have 1 copy by default, but that should be available to those who need it. This is consistent with git-annex not dropping unused content without --force, etc.

* Added annex.thin setting, which makes unlocked files in v6 repositories be hard linked to their content, instead of a copy. This saves disk space but means any modification of an unlocked file will lose the local (and possibly only) copy of the old version.
* Enable annex.thin by default on upgrade from direct mode to v6, since direct mode made the same tradeoff.
* fix: Adjusts unlocked files as configured by annex.thin.
parent bb6719678e
commit 121f5d5b0c
17 changed files with 259 additions and 146 deletions
|
@ -1,6 +1,6 @@
|
|||
# NAME
|
||||
|
||||
git-annex fix - fix up symlinks to point to annexed content
|
||||
git-annex fix - fix up links to annexed content
|
||||
|
||||
# SYNOPSIS
|
||||
|
||||
|
@ -11,8 +11,11 @@ git annex fix `[path ...]`
|
|||
Fixes up symlinks that have become broken to again point to annexed
|
||||
content.
|
||||
|
||||
This is useful to run if you have been moving the symlinks around,
|
||||
but is done automatically when committing a change with git too.
|
||||
This is useful to run manually when you have been moving the symlinks
|
||||
around, but is done automatically when committing a change with git too.
|
||||
|
||||
Also, adjusts unlocked files to be copies or hard links as
|
||||
configured by annex.thin.
|
||||
|
||||
# OPTIONS
|
||||
|
||||
|
|
|
@ -10,7 +10,7 @@ git annex unlock `[path ...]`
|
|||
|
||||
Normally, the content of annexed files is protected from being changed.
|
||||
Unlocking an annexed file allows it to be modified. This replaces the
|
||||
symlink for each specified file with a copy of the file's content.
|
||||
symlink for each specified file with the file's content.
|
||||
You can then modify it and `git annex add` (or `git commit`) to save your
|
||||
changes.
|
||||
|
||||
|
@ -22,6 +22,14 @@ can use `git add` to add a file to the annex in unlocked form. This allows
|
|||
workflows where a file starts out unlocked, is modified as necessary, and
|
||||
is locked once it reaches its final version.
|
||||
|
||||
Normally, unlocking a file requires a copy to be made of its content,
|
||||
so that its original content is preserved, while the copy can be modified.
|
||||
To use less space, annex.thin can be set to true; this makes a hard link
|
||||
to the content instead of a copy. (Only when supported by the file
|
||||
system, and only in repository version 6.) While this can save considerable
|
||||
disk space, any modification made to a file will cause the old version of the
|
||||
file to be lost from the local repository. So, enable annex.thin with care.
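
The difference between a copy and a hard link can be seen with ordinary files, without git-annex involved. This is only a sketch of the underlying filesystem behaviour; the names `object`, `copy` and `thin` are made up for illustration and are not the paths git-annex uses.

```shell
# Plain-file illustration of copy vs hard link (no git-annex needed).
demo_dir=$(mktemp -d)
printf 'version 1\n' > "$demo_dir/object"

cp "$demo_dir/object" "$demo_dir/copy"   # like the default: a second copy
ln "$demo_dir/object" "$demo_dir/thin"   # like annex.thin: a hard link

# Modify the hard-linked file in place.
printf 'version 2\n' >> "$demo_dir/thin"

copy_content=$(cat "$demo_dir/copy")      # the copy still holds the old version
object_content=$(cat "$demo_dir/object")  # the original changed along with the link

echo "copy:   $copy_content"
echo "object: $object_content"
rm -rf "$demo_dir"
```

The separate copy keeps the old content, while the hard-linked name and the original always show the same, modified content.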
|
||||
|
||||
# OPTIONS
|
||||
|
||||
* file matching options
|
||||
|
|
|
@ -904,6 +904,14 @@ Here are all the supported configuration settings.
|
|||
will automatically set annex.hardlink and mark the repository as
|
||||
untrusted.
|
||||
|
||||
* `annex.thin`
|
||||
|
||||
Set this to `true` to make unlocked files be a hard link to their content
|
||||
in the annex, rather than a second copy. (Only when supported by the file
|
||||
system, and only in repository version 6.) This can save considerable
|
||||
disk space, but any modification made to a file will lose the local (and
|
||||
possibly only) copy of the old version. So, enable with care.
|
||||
|
||||
* `annex.delayadd`
|
||||
|
||||
Makes the watch and assistant commands delay for the specified number of
|
||||
|
|
|
@ -49,9 +49,11 @@ Or, you can init a new repository in v6 mode.
|
|||
# git init
|
||||
# git annex init --version=6
|
||||
|
||||
## using it
|
||||
|
||||
Using a v6 repository is easy! Just use regular git commands to add
|
||||
and commit files. Under the hood, git will use git-annex to store the file
|
||||
contents.
|
||||
and commit files. git will use git-annex to store the file contents,
|
||||
and the files will be left unlocked.
|
||||
|
||||
[[!template id=note text="""
|
||||
Want `git add` to add some file contents to the annex, but store the contents of
|
||||
|
@ -70,8 +72,8 @@ smaller files in git itself? Configure annex.largefiles to match the former.
|
|||
# git annex find
|
||||
my_cool_big_file
|
||||
|
||||
You can make whatever changes you like to committed files, and commit your
|
||||
changes.
|
||||
You can make whatever modifications you want to unlocked files, and commit
|
||||
your changes.
|
||||
|
||||
# echo more stuff >> my_cool_big_file
|
||||
# git mv my_cool_big_file my_cool_bigger_file
|
||||
|
@ -81,47 +83,62 @@ changes.
|
|||
delete mode 100644 my_cool_big_file
|
||||
create mode 100644 my_cool_bigger_file
|
||||
|
||||
Under the hood, this uses git's [[todo/smudge]] filter interface,
|
||||
and git-annex converts between the content of the big file and a pointer file,
|
||||
Under the hood, this uses git's [[todo/smudge]] filter interface, and
|
||||
git-annex converts between the content of the big file and a pointer file,
|
||||
which is what gets committed to git.
|
||||
|
||||
A v6 repository can have both locked and unlocked files. You can switch
|
||||
A v6 repository can contain both locked and unlocked files. You can switch
|
||||
a file back and forth using the `git annex lock` and `git annex unlock`
|
||||
commands. This changes what's stored in git between a git-annex symlink
|
||||
(locked) and a git-annex pointer file (unlocked).
|
||||
(locked) and a git-annex pointer file (unlocked). To add a file to
|
||||
the repository in locked mode, use `git annex add`; to add a file in
|
||||
unlocked mode, use `git add`.
|
||||
|
||||
## danger will robinson
|
||||
## using less disk space
|
||||
|
||||
Unlocked files are handy, but they have one significant disadvantage
|
||||
compared with locked files: They use more disk space.
|
||||
While only one copy of a locked file has to be stored, normally,
|
||||
two copies of an unlocked file are stored on disk. One copy is in
|
||||
the git work tree, where you can use and modify it,
|
||||
and the other is stashed away in `.git/annex/objects` (see [[internals]]).
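
One way to tell whether a file is a separate copy or shares its storage with another name is its hard link count. The following is a minimal sketch using plain files in a temporary directory; the `link_count` helper and the file names are hypothetical, and whether the counts match what you see in a particular git-annex repository is something to check for yourself.

```shell
# Print the hard link count of a file (second field of `ls -ld` output).
link_count() {
    ls -ld -- "$1" | awk '{print $2}'
}

demo_dir=$(mktemp -d)
printf 'some content\n' > "$demo_dir/object"

# A copy is an independent file with its own storage.
cp "$demo_dir/object" "$demo_dir/copied"
copy_links=$(link_count "$demo_dir/copied")

# A hard link is a second name for the same storage.
ln "$demo_dir/object" "$demo_dir/linked"
thin_links=$(link_count "$demo_dir/linked")

echo "copy has $copy_links link(s), hard link has $thin_links link(s)"
rm -rf "$demo_dir"
```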
|
||||
|
||||
The reason for that second copy is to preserve the old version of the file,
|
||||
if you modify the unlocked file in the work tree. Being able to access
|
||||
old versions of files is an important part of git after all.
|
||||
|
||||
That's a good safe default. But there are ways to use git-annex that
|
||||
make the second copy not be worth keeping:
|
||||
|
||||
[[!template id=note text="""
|
||||
Double the disk space is used on systems like Windows that don't support
|
||||
hard links.
|
||||
When a [[direct_mode]] repository is upgraded, annex.thin is automatically
|
||||
set, because direct mode made the same single-copy tradeoff.
|
||||
"""]]
|
||||
|
||||
In contrast with locked files, which are quite safe, using unlocked files is a
|
||||
little bit dangerous. git-annex tries to avoid storing a duplicate copy of an
|
||||
unlocked file in your local repository, in order to not use double the disk
|
||||
space. But this means that an unlocked file can be the only copy of that
|
||||
version of the file's content. Modify it, and oops, you lost the old version!
|
||||
* When you're using git-annex to sync the current version of files across
|
||||
devices, and don't care much about previous versions.
|
||||
* When you have set up a backup repository, and use git-annex to copy
|
||||
your files to the backup.
|
||||
|
||||
In fact, that happened in the examples above, and you probably didn't notice
|
||||
until now.
|
||||
In situations like these, you may want to avoid the overhead of the second
|
||||
local copy of unlocked files. There's a config setting for that.
|
||||
|
||||
# git checkout HEAD^
|
||||
HEAD is now at 92f2725 added my_cool_big_file to the annex
|
||||
# cat my_cool_big_file
|
||||
/annex/objects/SHA256E-s30--e7aaf46f227886c10c98f8f76cae681afd0521438c78f958fc27114674b391a4
|
||||
git config annex.thin true
|
||||
|
||||
Woah, what's all that?! Well, it's the pointer file that gets checked into
|
||||
git. You'd see the same thing if you had used `git annex drop` to drop
|
||||
the content of the file from your repository.
|
||||
After changing annex.thin, you'll want to fix up the work tree to
|
||||
match the new setting:
|
||||
|
||||
In the example above, the content wasn't explicitly dropped, but it was
|
||||
modified while it was unlocked... and so the old version of the content
|
||||
was lost.
|
||||
git annex fix
|
||||
|
||||
If this is worrying -- and it should be -- you'll want to keep files locked
|
||||
most of the time, or set up a remote and have git-annex copy the content of
|
||||
files to the remote as a backup.
|
||||
Note that setting annex.thin only has any effect on systems that support
|
||||
hard links. That is, not Windows, and not FAT filesystems.
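
If you are not sure whether a filesystem supports hard links, a quick probe along these lines can tell you. This is a sketch, not something git-annex provides: the `supports_hard_links` function, the `CHECK_DIR` variable and the probe file names are all made up for this example.

```shell
# Succeeds if a hard link can be created inside directory $1.
supports_hard_links() {
    probe_dir=$(mktemp -d "$1/hardlink-probe.XXXXXX") || return 1
    : > "$probe_dir/a"
    if ln "$probe_dir/a" "$probe_dir/b" 2>/dev/null; then
        result=0
    else
        result=1
    fi
    rm -rf "$probe_dir"
    return "$result"
}

# Probe the current directory unless CHECK_DIR says otherwise.
check_dir=${CHECK_DIR:-.}
if supports_hard_links "$check_dir"; then
    hard_links=yes
else
    hard_links=no
fi
echo "hard links supported in $check_dir: $hard_links"
```

Run it from inside the repository's work tree, or set `CHECK_DIR` to the directory you want to test.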
|
||||
|
||||
By the way, don't worry about deleting an unlocked file. That *won't* lose
|
||||
its content.
|
||||
## tradeoffs
|
||||
|
||||
Setting annex.thin can save a lot of disk space, but it's a tradeoff
|
||||
between disk usage and safety.
|
||||
|
||||
Keeping files locked is safer and also avoids using unnecessary
|
||||
disk space, but trades off easy modification of files.
|
||||
|
||||
Pick the tradeoff that's right for you.
|
||||
|
|
|
@ -13,10 +13,11 @@ git-annex should use smudge/clean filters.
|
|||
# because it doesn't know it has that name
|
||||
# git commit clears up this mess
|
||||
* Interaction with shared clones. Should avoid hard linking from/to a
|
||||
object in a shared clone if either repository has the object unlocked.
|
||||
(And should avoid unlocking an object if it's hard linked to a shared clone,
|
||||
but that's already accomplished because it avoids unlocking an object if
|
||||
it's hard linked at all)
|
||||
object in a shared clone if either repository has the object unlocked
|
||||
with a hard link in place.
|
||||
(And should avoid unlocking an object with a hard link if it's hard
|
||||
linked to a shared clone, but that's already accomplished because it
|
||||
avoids unlocking an object if it's hard linked at all)
|
||||
* Make automatic merge conflict resolution work for pointer files.
|
||||
- Should probably automatically handle merge conflicts between annex
|
||||
symlinks and pointer files too. Maybe by always resulting in a pointer
|
||||
|
@ -46,7 +47,7 @@ git-annex should use smudge/clean filters.
|
|||
|
||||
* Eventually (but not yet), make v6 the default for new repositories.
|
||||
Note that the assistant forces repos into direct mode; that will need to
|
||||
be changed then.
|
||||
be changed then, and it should enable annex.thin.
|
||||
* Later still, remove support for direct mode, and enable automatic
|
||||
v5 to v6 upgrades.
|
||||
|
||||
|
@ -158,7 +159,7 @@ cannot directly write to the file or git gets unhappy.
|
|||
.. Are very important, otherwise a repo can't scale past the size of the
|
||||
smallest client's disk!
|
||||
|
||||
It would be nice if the smudge filter could hard link or symlink a work
|
||||
It would be nice if the smudge filter could hard link a work
|
||||
tree file to the annex object.
|
||||
|
||||
But currently, the smudge filter can't modify the work tree file on its own
|
||||
|
@ -184,7 +185,9 @@ smudged file in the work tree when renaming it. It instead deletes the old
|
|||
file and asks the smudge filter to smudge the new filename.
|
||||
|
||||
So, copies need to be maintained in .git/annex/objects, though it's ok
|
||||
to use hard links to the work tree files.
|
||||
to use hard links to the work tree files. (Although somewhat unsafe
|
||||
since modification of the file will lose the old version. The annex.thin
|
||||
setting can enable this.)
|
||||
|
||||
Even if hard links are used, smudge needs to output the content of an
|
||||
annexed file, which will result in duplication when merging in renames of
|
||||
|
@ -241,21 +244,16 @@ git-annex clean:
|
|||
|
||||
Generate annex key from filename and content from stdin.
|
||||
|
||||
Hard link .git/annex/objects to the file, if it doesn't already exist.
|
||||
(On platforms not supporting hardlinks, copy the file to
|
||||
.git/annex/objects.)
|
||||
Hard link (annex.thin) or copy .git/annex/objects to the file,
|
||||
if it doesn't already exist.
|
||||
|
||||
This is done to prevent losing the only copy of a file when eg
|
||||
doing a git checkout of a different branch, or merging a commit that
|
||||
renames or deletes a file. But, no attempt is made to
|
||||
renames or deletes a file. But, with annex.thin no attempt is made to
|
||||
protect the object from being modified. If a user wants to
|
||||
protect object contents from modification, they should use
|
||||
`git annex add`, not `git add`, or they can `git annex lock` after adding,.
|
||||
|
||||
There could be a configuration knob to cause a copy to be made to
|
||||
.git/annex/objects -- useful for those crippled filesystems. It might
|
||||
also drop that copy once the object gets uploaded to another repo ...
|
||||
But that gets complicated quickly.
|
||||
`git annex add`, not `git add`, or they can `git annex lock` after adding,
|
||||
or not enable annex.thin.
|
||||
|
||||
Update file map.
|
||||
|
||||
|
|
|
@ -72,10 +72,6 @@ The behavior of some commands changes in an upgraded repository:
|
|||
* `git annex unlock` and `git annex lock` change how the pointer to
|
||||
the annexed content is stored in git.
|
||||
|
||||
If a repository is only used in indirect mode, you can use git-annex
|
||||
v5 and v6 in different clones of the same indirect mode repository without
|
||||
problems.
|
||||
|
||||
On upgrade, all files in a direct mode repository will be converted to
|
||||
unlocked files. The upgrade will stage changes to all annexed files in
|
||||
the git repository, which you can then commit.
|
||||
|
|