notes on merge

Joey Hess 2015-11-23 18:10:50 -04:00
parent fe55caa2ae
commit cf0130894e


@ -101,53 +101,45 @@ The smudge script can also be provided a filename with %f, but it
cannot directly write to the file or git gets unhappy. cannot directly write to the file or git gets unhappy.
> Still the case in 2015. Means an unnecessary read and pipe of the file > Still the case in 2015. Means an unnecessary read and pipe of the file
> even if the content is already locally available on disk. --[[Joey]] > even if the content is already locally available on disk. --[[Joey]]
### partial checkouts ### partial checkouts
It's important that git-annex supports partial checkouts of the content of .. Are very important, otherwise a repo can't scale past the size of the
a repository. This allows repositories to be checked out when there's not smallest client's disk!
available disk space for all files in the repository.
The way git-lfs uses smudge/clean filters, which is similar to that It would be nice if the smudge filter could hard link or symlink a work
described above, does not support partial checkouts; it always tries to tree file to the annex object.
download the contents of all files. Indeed, git-lfs seems to keep 2 copies
of newly added files; one in the work tree and one in .git/lfs/objects/,
at least before it sends the latter to the server. This lack of control
over which data is checked out and duplication of the data limits the
usefulness of git-lfs on truly large amounts of data.
To support partial checkouts, `git annex get` and `git annex drop` need to But currently, the smudge filter can't modify the work tree file on its own
be able to be used. -- git always modifies the file after getting the output of the smudge
filter, and will stumble over any modifications that the smudge filter
makes. And, it's important that the smudge filter never fail as that will
leave the repo in a bad state.
To avoid data duplication when adding a new object, the clean filter could Seems the best that can be done is for the smudge filter to copy from the
hard link from the work tree file to the annex object. Although the annex object when the object is present. When it's not present, the smudge
user could change the work tree file w/o breaking the hard link and this filter should provide a pointer to its content.
would corrupt the annexed object. Could remove write permissions to avoid
that (mostly), but that would lose some of the benefits of smudge/clean as
the user wouldn't be able to modify annexed files.
> This may be one of those things where different tradeoffs meet different
> users' needs and so a repo could be switched between the two modes as
> needed.
The smudge filter can't modify the work tree file on its own -- git always The clean filter should detect when it's operating on that pointer file.
modifies the file after getting the output of the smudge filter, and will
stumble over any modifications that the smudge filter makes. And, it's
important that the smudge filter never fail as that will leave the repo in
a bad state.
So, to support partial checkouts and avoid data duplication, the smudge
filter should provide some dummy content, probably including the key of the
file. (The clean filter should detect when it's operating on that dummy
content, and provide the same key as it would if the file content was
present.)
To get the real content, use `git annex get`. (A `post-checkout` hook could
run that on all files if the user wants that behavior, or a config setting
could make the smudge filter automatically get file's contents.)
I've a demo implementation of this technique in the scripts below. I've a demo implementation of this technique in the scripts below.
### deduplication
.. Is nice; needing 2 copies of every annexed file is annoying.
Unfortunately, when using smudge/clean, `git merge` does not preserve a
smudged file in the work tree when renaming it. It instead deletes the old
file and asks the smudge filter to smudge the new filename.
So, copies need to be maintained in .git/annex/objects, though it's ok
to use hard links to the work tree files.
Even if hard links are used, smudge needs to output the content of an
annexed file, which will result in duplication when merging in renames of
files.
### design ### design
Goal: Get rid of current direct mode, using smudge/clean filters instead to Goal: Get rid of current direct mode, using smudge/clean filters instead to
@ -203,7 +195,8 @@ git-annex clean:
.git/annex/objects.) .git/annex/objects.)
This is done to prevent losing the only copy of a file when eg This is done to prevent losing the only copy of a file when eg
doing a git checkout of a different branch. But, no attempt is made to doing a git checkout of a different branch, or merging a commit that
renames or deletes a file. But, no attempt is made to
protect the object from being modified. If a user wants to protect the object from being modified. If a user wants to
protect object contents from modification, they should use protect object contents from modification, they should use
`git annex add`, not `git add`, or they can `git annex lock` after adding. `git annex add`, not `git add`, or they can `git annex lock` after adding.
@ -224,7 +217,8 @@ git-annex smudge:
Updates file2key map. Updates file2key map.
Outputs the same pointer file content to stdout. When an object is present in the annex, outputs its content to stdout.
Otherwise, outputs the file pointer content.
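
A rough sketch of the smudge/clean behavior described in this hunk, written in Python rather than taken from git-annex itself. The pointer format (`annex-pointer:KEY`), the flat object directory, the `SHA256--` key naming, and all function names are assumptions made up for illustration. The file2key map update is left out.

```python
import hashlib
from pathlib import Path

# Hypothetical pointer format; not necessarily what git-annex writes.
POINTER_PREFIX = b"annex-pointer:"


def make_pointer(key: str) -> bytes:
    """Build the pointer file content for a key."""
    return POINTER_PREFIX + key.encode() + b"\n"


def parse_pointer(data: bytes):
    """Return the key if data looks like a pointer file, else None."""
    if not (data.startswith(POINTER_PREFIX) and data.endswith(b"\n")):
        return None
    if data.count(b"\n") != 1:
        return None
    try:
        key = data[len(POINTER_PREFIX):-1].decode()
    except UnicodeDecodeError:
        return None
    # Reject empty keys and anything that could escape the object directory.
    if not key or "/" in key:
        return None
    return key


def smudge(data: bytes, objects: Path) -> bytes:
    """Output the object's content when present, else the pointer itself.

    Never raises on a missing object, since a failing smudge filter
    leaves the repo in a bad state.
    """
    key = parse_pointer(data)
    if key is None:
        return data  # not a pointer; pass through unchanged
    try:
        return (objects / key).read_bytes()
    except OSError:
        return data  # object not present: keep the pointer


def clean(data: bytes, objects: Path) -> bytes:
    """Turn file content into a pointer, keeping a copy of the object.

    When run on a pointer file, returns it unchanged so the same key
    is staged as if the content had been present.
    """
    if parse_pointer(data) is not None:
        return data
    key = "SHA256--" + hashlib.sha256(data).hexdigest()
    objects.mkdir(parents=True, exist_ok=True)
    (objects / key).write_bytes(data)
    return make_pointer(key)
```

In this sketch, `clean` followed by `smudge` round-trips the content while the object is present, and `smudge` falls back to the pointer once the object has been dropped. That fallback is what lets partial checkouts work.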
git annex direct/indirect: git annex direct/indirect: