every idea that came to me in my sleep. there were rather a lot of them
parent 5e8dee6cb0
commit aa06e913e5
1 changed file with 101 additions and 22 deletions
@@ -4,11 +4,12 @@

 Attach an arbitrary set of metadata to a key.

-Metadata can be tags, but it can also be fields with values (ie, date=xxx,
-conference=yyy).
-
 Store in git-annex branch, next to location log files.

+Metadata can be tags, but it can also be fields with values (ie, date=xxx,
+conference=yyy). Fields can have multiple values, for example
+multiple authors.
+
 Storage needs to support union merging, including removing tags, and
 changing values.

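The union-merge requirement can be met if metadata is never edited in place. Below is a minimal Python sketch. It assumes a hypothetical log of timestamped set/unset entries, which is not git-annex's actual file format. Merging two branches is then plain set union. Replaying the log with last-writer-wins per field/value pair recovers the current metadata, including removals and changed values.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Sketch only: the entry layout and function names are hypothetical.
# Entry: (timestamp, field, value, present); present=False records a removal,
# so removals survive a union merge just like additions do.
Entry = Tuple[float, str, str, bool]


def union_merge(*logs: Iterable[Entry]) -> Set[Entry]:
    """Merging logs from different branches is plain set union of their entries."""
    merged: Set[Entry] = set()
    for log in logs:
        merged.update(log)
    return merged


def current_metadata(log: Iterable[Entry]) -> Dict[str, FrozenSet[str]]:
    """Replay the log: the newest entry for each (field, value) decides if it is set."""
    latest: Dict[Tuple[str, str], Tuple[float, bool]] = {}
    for timestamp, field, value, present in log:
        candidate = (timestamp, present)
        # Equal timestamps are broken deterministically (present=True wins).
        if (field, value) not in latest or candidate > latest[(field, value)]:
            latest[(field, value)] = candidate
    fields: Dict[str, Set[str]] = {}
    for (field, value), (_, present) in latest.items():
        if present:
            fields.setdefault(field, set()).add(value)
    return {field: frozenset(values) for field, values in fields.items()}
```

Under this scheme, changing `year=2013` to `year=2014` is recorded as an unset of the old value plus a set of the new one. Last-writer-wins is only as reliable as the clocks of the repositories that wrote the entries.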
@@ -20,6 +21,7 @@ when adding it.
 Could also automatically attach permissions.

 A git hook could be run by git annex add to gather more metadata.
+For example, by examining MP3 metadata.

 Also auto adds metadata when adding files to filter branches. See below.

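A hook of the kind suggested above could pull tags out of MP3 files. This Python sketch reads only the fixed-size ID3v1 trailer, which is the last 128 bytes of the file. The command-line interface (print `field=value` lines for each path given) is a hypothetical protocol between git-annex and the hook, not an existing one. ID3v2 tags, which most modern files use, would need a real tag library.

```python
import os
import sys
from typing import Dict

# Sketch of a hypothetical metadata hook; only the ID3v1 layout below is standard.
# ID3v1 is a fixed 128-byte trailer: "TAG", then title (30 bytes),
# artist (30), album (30), year (4), comment (30) and genre (1).
_FIELDS = (("title", 3, 33), ("artist", 33, 63), ("album", 63, 93), ("year", 93, 97))


def id3v1_metadata(tail: bytes) -> Dict[str, str]:
    """Parse the last 128 bytes of an MP3; return {} if there is no ID3v1 tag."""
    if len(tail) < 128 or tail[-128:-125] != b"TAG":
        return {}
    trailer = tail[-128:]
    result = {}
    for field, start, end in _FIELDS:
        # Values are padded with NULs or spaces; ID3v1 text is conventionally Latin-1.
        value = trailer[start:end].split(b"\x00", 1)[0].decode("latin-1").strip()
        if value:
            result[field] = value
    return result


def read_tail(path: str) -> bytes:
    """Read at most the final 128 bytes without loading the whole file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        f.seek(max(0, f.tell() - 128))
        return f.read()


if __name__ == "__main__":
    # Hypothetical hook protocol: one "field=value" line per value found.
    for path in sys.argv[1:]:
        for field, value in id3v1_metadata(read_tail(path)).items():
            print(f"{field}={value}")
```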
@@ -28,40 +30,62 @@ Also auto adds metadata when adding files to filter branches. See below.
 From the ctime, some additional
 metadata is derived, at least year=yyyy and probably also month, etc.

-Should be a general mechanism for this.
+This is probably not stored anywhere. It's computed on demand by a pure
+function from the other metadata.
+
+From the set of tags a file has, a "tag" field is derived, which has the
+value of each tag. See example below.
+
+Should be a general mechanism for this. (It probably generalizes to
+sql queries if we want to go that far.)

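Computing derived metadata on demand can be a single pure function. This is a minimal Python sketch with three assumptions: metadata is a mapping from field name to a set of values, ctime is interpreted in UTC, and an explicitly stored `year` or `month` takes precedence over the derived one.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Set

# Sketch only: derive_metadata and the UTC/precedence choices are assumptions.
Metadata = Dict[str, Set[str]]


def derive_metadata(stored: Metadata, tags: Iterable[str], ctime: float) -> Metadata:
    """Return stored metadata plus derived fields, without modifying `stored`."""
    derived = {field: set(values) for field, values in stored.items()}
    when = datetime.fromtimestamp(ctime, tz=timezone.utc)
    # Derived date fields only fill in what was not set explicitly.
    derived.setdefault("year", {f"{when.year:04d}"})
    derived.setdefault("month", {f"{when.month:02d}"})
    # The "tag" field has one value per tag the file carries.
    tagset = set(tags)
    if tagset:
        derived["tag"] = derived.get("tag", set()) | tagset
    return derived
```

Because nothing derived is stored, changing these rules later would only change what filters see, not the contents of the git-annex branch.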
 # filtered branches

 `git annex filter year=2014 talk` should create a new branch
-filtered/talk/year=2014 containing only files tagged with that, and
+filtered/year=2014/talk containing only files tagged with that, and
 have git check it out. In this example, all files appear in top level
 directory of repo; no subdirs.

 `git annex fadd haskell` switches to branch
-filtered/haskell/talk/year=2014 with only the haskell talks.
+filtered/year=2014/talk/haskell with only the haskell talks.

 `git annex fadd year=2013 year=2012` switches to branch
-filtered/haskell/talk/year=2012,2013,2014. This has subdirectories 2012,
+filtered/year=2012,2013,2014/talk/haskell. This has subdirectories 2012,
 2013 and 2014 with the matching talks.

-`git annex frm haskell` switches to
-filtered/talk/year=2012,2013,2014, which has all available talks in it.
+Patterns can be used in both the values of fields, and in matching tags.
+So, `year=20*` could be used to match years, and `foo/*` matches any
+tag in the foo namespace. Or even `*` to match *all* tags.

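Python's `fnmatch.fnmatchcase` provides glob matching of this kind, so a sketch of the matching step can lean on it. Two things here are assumptions: the representation (a set of tags plus a field-to-values mapping) and the rule that every term of a filter must match. One behaviour of `fnmatch` matters here: `*` also matches `/`, so `foo/*` would also match a nested tag such as `foo/bar/baz`.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, Set

# Sketch only: term_matches/file_matches are hypothetical names.


def term_matches(term: str, tags: Set[str], fields: Dict[str, Set[str]]) -> bool:
    """Match one filter term, e.g. "talk", "foo/*", "year=20*" or "year=2013,2014"."""
    if "=" in term:
        field, pattern = term.split("=", 1)
        alternatives = pattern.split(",")
        return any(fnmatchcase(value, alt)
                   for value in fields.get(field, set())
                   for alt in alternatives)
    # A bare term is a tag pattern; "*" therefore matches any file with a tag.
    return any(fnmatchcase(tag, term) for tag in tags)


def file_matches(terms: Iterable[str], tags: Set[str], fields: Dict[str, Set[str]]) -> bool:
    """A file is shown only if every term of the filter matches it."""
    return all(term_matches(term, tags, fields) for term in terms)
```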
-`git annex filteradd conference=fosdem conference=icfp` switches to branch
-filtered/conference=fosdem,icfp/talk/year=2012,2013,2014. Now we need
-to either nest the subdirectories, or make fosdem-2014, icfp-2013, etc.
-May need an option to choose this. Note that user may prefer to have year
-first or conference first, so may need an option for that as well.
+`git annex frm haskell` switches to
+filtered/year=2012,2013,2014/talk, which has all available talks in it.
+
+`git annex fadd conference=fosdem conference=icfp` switches to branch
+filtered/year=2012,2013,2014/talk/conference=fosdem,icfp. Now there
+are nested subdirectories. They follow the format of the branch,
+so 2013/icfp, 2014/fosdem, etc.
+
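The branch names in the `filter`/`fadd`/`frm` examples follow a simple rule. Filter terms keep the order in which they were added. New values for a field are merged into that field's existing term and kept sorted. `frm` drops a tag, or drops values from a field. The Python sketch below implements that rule as inferred from the examples. `filter_cmd`, `fadd`, `frm` and `branch_name` are hypothetical models of the proposed commands, not git-annex code.

```python
from typing import List, Tuple, Union

# Sketch only: a filter term is either a tag ("talk") or a field with its
# values ("year", ("2013", "2014")). All names here are hypothetical.
Term = Union[str, Tuple[str, Tuple[str, ...]]]


def _parse(arg: str) -> Term:
    if "=" in arg:
        field, values = arg.split("=", 1)
        return (field, tuple(values.split(",")))
    return arg


def fadd(current: List[Term], args: List[str]) -> List[Term]:
    result = list(current)
    for term in map(_parse, args):
        if isinstance(term, str):
            if term not in result:
                result.append(term)
            continue
        field, values = term
        for i, existing in enumerate(result):
            if isinstance(existing, tuple) and existing[0] == field:
                result[i] = (field, tuple(sorted(set(existing[1]) | set(values))))
                break
        else:  # a field not yet in the filter goes at the end
            result.append((field, tuple(sorted(set(values)))))
    return result


def frm(current: List[Term], args: List[str]) -> List[Term]:
    result = list(current)
    for term in map(_parse, args):
        if isinstance(term, str):
            result = [t for t in result if t != term]
            continue
        field, values = term
        updated: List[Term] = []
        for existing in result:
            if isinstance(existing, tuple) and existing[0] == field:
                remaining = tuple(v for v in existing[1] if v not in values)
                if remaining:  # drop the whole term once no values are left
                    updated.append((field, remaining))
            else:
                updated.append(existing)
        result = updated
    return result


def filter_cmd(args: List[str]) -> List[Term]:
    return fadd([], args)


def branch_name(current: List[Term]) -> str:
    parts = [t if isinstance(t, str) else f"{t[0]}={','.join(t[1])}" for t in current]
    return "filtered/" + "/".join(parts)
```

In this model, insertion order is what yields the nested `2013/icfp` layout: `year` was added before `conference`, so its directory level comes first.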
+`git annex filter tag=haskell,debian` uses the "tag" field that is
+automatically derived from the set of tags. So this yields a branch
+with haskell and debian subdirectories, containing the files tagged with
+either.
+
+To see all tags, `git annex filter tag=*` !
+
+Files not matching the filter can be included by using
+`git annex filter --unmatched=other`. That puts all such files into
+the subdirectory other.
+
+Sometimes you want to see files that do not match a tag, while still
+getting subdirectories for

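The following sketch shows how a filter could expand into the files of the work tree. It makes three assumptions. Values are matched exactly (globs are left out for brevity). A field filtered on more than one value becomes a directory level, in branch order. `unmatched` models the proposed `--unmatched=other` option. A file with several matching values appears once per value, which is the duplication that worries the direct mode discussion.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Sketch only: layout() and its parameters are hypothetical.
Metadata = Dict[str, Set[str]]


def layout(files: Dict[str, Metadata],
           filt: Sequence[Tuple[str, Sequence[str]]],
           unmatched: Optional[str] = None) -> List[str]:
    """Return the work-tree paths a filter branch would contain."""
    paths = []
    for name, meta in files.items():
        levels = []
        matched = True
        for field, wanted in filt:
            hits = sorted(meta.get(field, set()) & set(wanted))
            if not hits:
                matched = False
                break
            if len(wanted) > 1:
                # A multi-valued filter field becomes one directory level.
                levels.append(hits)
        if matched:
            # With no levels, product() yields one empty tuple: top level.
            for dirs in product(*levels):
                paths.append("/".join([*dirs, name]))
        elif unmatched is not None:
            paths.append(f"{unmatched}/{name}")
    return sorted(paths)
```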
 Note that old filter branches can be deleted when switching to a new one.
-There is no need to retain them. Unless the user has committed non
-git-annexed files to them, In which case, urk.
+There is no need to retain them. Unless the user has committed non-annexed
+files to them, in which case, urk. The only reason to use specially named
+filtered branches is because it makes it self-documenting how the repository
+is currently filtered.

-These command should probably refuse to do anything if run from within a
-subdir of the work tree that would get deleted by checking out the new
-filtered branch.
-
-# operations while on filter branch
+## operations while on filtered branch

 * If files are removed and git commit called, git-annex should remove the
 relevant metadata from the files. **possibly** It's not clear that
@@ -69,6 +93,8 @@ filtered branch.
 branch (especially if it's derived metadata like the year).
 Also, this is not usable in direct mode because deleting the
 file.. actually deletes it.
+* If a file is moved into a new subdirectory while in a filter branch,
+a tag is added with the subdir name. This allows on the fly tagging.
 * `git annex sync` should avoid pushing out the filter branch, but
 it should check if there are changes to the metadata pulled in, and update
 the branch to reflect them.
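The two commit-time rules for a filter branch can be sketched as a comparison of the tree before and after the commit. This Python sketch works on hypothetical key-to-path mappings. It removes only the filter's own tags from deleted files, since derived metadata such as the year cannot be removed. It also ignores moves into subdirectories that already stand for filter values; how those should be treated is an assumption left open here.

```python
import posixpath
from typing import Dict, List, Set, Tuple

# Sketch only: reconcile() and the (operation, key, tag) actions are hypothetical.
Action = Tuple[str, str, str]


def reconcile(before: Dict[str, str], after: Dict[str, str],
              filter_tags: Set[str], value_dirs: Set[str]) -> List[Action]:
    """Compare key -> path maps from before and after a commit on a filter branch."""
    actions: List[Action] = []
    for key, old_path in before.items():
        new_path = after.get(key)
        if new_path is None:
            # Deleted: drop the tags that made the file appear in this branch.
            actions.extend(("untag", key, tag) for tag in sorted(filter_tags))
            continue
        new_dir = posixpath.dirname(new_path)
        if new_dir != posixpath.dirname(old_path):
            name = posixpath.basename(new_dir)
            # Moved into a new subdirectory: tag it with the directory's name.
            if name and name not in value_dirs:
                actions.append(("tag", key, name))
    return actions
```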
@@ -85,6 +111,11 @@ same tree of files filter would. The user can then commit that if desired.
 Or, they could run additional commands like `git annex fadd` to refine the
 tree of files in the subdir.

+Metadata can be used for configuring numcopies. One way would be a
+numcopies=n value attached to a file. But perhaps better would be to make
+the numcopies.log allow configuring numcopies based on which files have
+other metadata.
+
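A metadata-driven numcopies lookup could combine both ideas. In this Python sketch the rule format (a mapping from field to glob, plus a count) is hypothetical and is not anything `numcopies.log` actually contains. Lookup order is: an explicit `numcopies=n` value on the file, then the first matching rule, then a default.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Set, Tuple

# Sketch only: the Rule format and numcopies_for() are hypothetical.
Metadata = Dict[str, Set[str]]
# A rule matches when, for every field, some value of the file matches the glob.
Rule = Tuple[Dict[str, str], int]


def numcopies_for(meta: Metadata, rules: List[Rule], default: int) -> int:
    """Per-file numcopies=n wins; otherwise the first matching rule; otherwise default."""
    explicit = [int(v) for v in meta.get("numcopies", set()) if v.isdigit()]
    if explicit:
        return max(explicit)  # assumption: be conservative if several values exist
    for conditions, n in rules:
        if all(any(fnmatchcase(v, pattern) for v in meta.get(field, set()))
               for field, pattern in conditions.items()):
            return n
    return default
```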
 Other programs could query git-annex for the metadata of files in the work
 tree, and do whatever they want with it.

@@ -97,11 +128,59 @@ want to see.
 * Could use filename metadata for the key, recorded by git-annex add (which
 may not correspond to filenames being used in regular git branches like
 master for the key).
-* Couod use the .map files to get a filename, but this is somewhat
+* Could use the .map files to get a filename, but this is somewhat
 arbitrary (.map can contain multiple filenames), and is only
 currently supported in direct mode.

+Note that any of these filenames can in theory conflict. May need to use
+`.variant-*` like sync does on conflict to allow 2 files with the same name in
+the same filtered branch.
+
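Conflicting filenames could be disambiguated deterministically, so that the same file gets the same name on every checkout. This Python sketch uses the first 8 hex digits of a SHA-256 of the key as the suffix. That scheme is an assumption, not necessarily what `git annex sync` does, and a collision between two suffixes is not handled.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Sketch only: variant_names() and its suffix scheme are hypothetical.


def variant_names(entries: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """entries are (filename, key) pairs; returns them with conflicts renamed."""
    keys_by_name: Dict[str, Set[str]] = defaultdict(set)
    for name, key in entries:
        keys_by_name[name].add(key)
    renamed = []
    for name, key in entries:
        if len(keys_by_name[name]) > 1:
            # Deterministic suffix from the key, so names are stable across checkouts.
            suffix = hashlib.sha256(key.encode()).hexdigest()[:8]
            name = f"{name}.variant-{suffix}"
        renamed.append((name, key))
    return renamed
```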
 # efficient metadata lookup

 Looking up metadata for filtering so far requires traversing all keys in
 the git-annex branch. This is slow. A fast cache is needed.
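One possible shape for the cache is an inverted index from field/value pairs to keys, updated incrementally as metadata changes. This is a minimal in-memory Python sketch; the class and method names are hypothetical. Two things are left out: persisting the index to disk, and detecting when the git-annex branch has moved past the commit that was indexed.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

# Sketch only: MetadataIndex is a hypothetical in-memory cache.
Metadata = Dict[str, Set[str]]


class MetadataIndex:
    """Inverted index: (field, value) -> keys carrying that value."""

    def __init__(self) -> None:
        self._keys: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._indexed: Dict[str, Metadata] = {}

    def update(self, key: str, metadata: Metadata) -> None:
        """Replace whatever was indexed for `key` with its new metadata."""
        for field, values in self._indexed.get(key, {}).items():
            for value in values:
                self._keys[(field, value)].discard(key)
        self._indexed[key] = {field: set(values) for field, values in metadata.items()}
        for field, values in metadata.items():
            for value in values:
                self._keys[(field, value)].add(key)

    def lookup(self, field: str, value: str) -> Set[str]:
        """Keys whose metadata has field=value, without scanning every key."""
        return set(self._keys.get((field, value), set()))
```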
+
+# direct mode issues
+
+Checking out a filter branch can result in any number of copies of a file
+appearing in different directories. No problem in indirect mode, but
+in direct mode these are real, expensive copies.
+
+But, it's worth supporting direct mode!
+
+So, possible approaches:
+
+* Before checking out a filter branch, calculate how much space will
+be used by duplicates and refuse if not enough is free.
+* Only check out one file, and omit the copies. Keep track of which
+files were omitted, and make sure that when committing on the branch,
+that metadata is not removed. Has the downside that files can seem
+to randomly move around in the tree as their metadata changes.
+* Disallow filter branch checkouts that have duplicate files.
+Note that duplicate files can only occur when filtering on the content
+of values, not tags. And values can be used in some simple cases w/o
+duplicate files. This would cripple it some, but perhaps not too badly?
+
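The first approach needs only the list of paths the checkout would create and the size of each key. This Python sketch assumes one copy of each key's content is already in the work tree, so every further path costs the full size again. `shutil.disk_usage` reports the free space on the filesystem that holds the repository.

```python
import shutil
from collections import Counter
from typing import Dict, List, Tuple

# Sketch only: duplicate_bytes()/enough_space() are hypothetical helpers.


def duplicate_bytes(checkout: List[Tuple[str, str]], sizes: Dict[str, int]) -> int:
    """Extra bytes needed: each key costs its size once per path beyond the first."""
    copies = Counter(key for _path, key in checkout)
    return sum((count - 1) * sizes[key] for key, count in copies.items() if count > 1)


def enough_space(checkout: List[Tuple[str, str]], sizes: Dict[str, int], repo: str = ".") -> bool:
    """Refuse the checkout (return False) if the duplicates would not fit."""
    return duplicate_bytes(checkout, sizes) <= shutil.disk_usage(repo).free
```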
+# gotchas
+
+* Checking out a filter branch can remove the current subdir. May be worth
+detecting when this happens and leaving behind an empty directory so the
+user can navigate back up.
+
+* Git has a complex set of rules for what is legal in a ref name.
+Filter branch names will need to filter out any illegal stuff.
+
+* Filesystems that are not case-sensitive (including case-preserving OS X)
+will cause problems if filter branches try to use different cases for
+2 directories representing the value of some metadata. But, users
+probably want at least case-preserving metadata values.
+
+Solution might be to compare metadata case-insensitively, and
+pick one representation consistently, so that if, for example, an author
+field uses mixed case, that form is used in the filter branch.
+
+Alternatively, it could escape `A` to `_A` when such a filesystem
+is detected and avoid collisions that way (double `_` to escape it).
+This latter option is ugly, but so are non-POSIX filesystems.. and it
+also solves any similar issues with case-colliding filenames.
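Filter terms such as `year=20*` or `foo/*` are not valid in ref names as they stand. The Python sketch below sanitizes each term separately, following the rules documented for `git check-ref-format`:

- no control characters, space, `~`, `^`, `:`, `?`, `*`, `[` or `\`;
- no `..` or `@{`;
- no component starting with `.` or ending in `.lock`, and no trailing `.`.

Three choices here are assumptions: replacing with `_`, also replacing `/` so that each term stays one path component, and also fixing a `.` at the end of any component. Two different terms can end up with the same sanitized name.

```python
import re
from typing import Iterable

# Sketch only: sanitize_component()/filter_ref_name() are hypothetical names.
# Characters git refuses anywhere in a ref name, plus "/" so that one filter
# term stays one path component (that last part is a choice of this sketch).
_FORBIDDEN = re.compile(r"[\x00-\x20\x7f~^:?*\[\\/]")


def sanitize_component(term: str) -> str:
    name = _FORBIDDEN.sub("_", term)
    name = name.replace("@{", "@_")
    while ".." in name:
        name = name.replace("..", "._")
    if name.startswith("."):
        name = "_" + name[1:]
    if name.endswith(".lock"):
        name += "_"
    if name.endswith("."):
        name += "_"
    return name or "_"


def filter_ref_name(terms: Iterable[str]) -> str:
    return "filtered/" + "/".join(sanitize_component(t) for t in terms)
```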
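The escaping alternative for case-insensitive filesystems is small enough to sketch exactly as described. Each uppercase letter gets a `_` prefix and each literal `_` is doubled. Names that differ only in case therefore still differ after case folding, and the mapping can be reversed. This Python sketch relies on `str.isupper`, which may not agree with every filesystem's case-folding rules for non-ASCII names.

```python
# Sketch only: escape_case()/unescape_case() are hypothetical names.


def escape_case(name: str) -> str:
    """Prefix uppercase letters with "_" and double literal "_"."""
    out = []
    for ch in name:
        if ch == "_":
            out.append("__")
        elif ch.isupper():
            out.append("_" + ch)
        else:
            out.append(ch)
    return "".join(out)


def unescape_case(name: str) -> str:
    """Invert escape_case: "_" always starts a two-character escape."""
    out = []
    i = 0
    while i < len(name):
        if name[i] == "_" and i + 1 < len(name):
            out.append(name[i + 1])
            i += 2
        else:
            out.append(name[i])
            i += 1
    return "".join(out)
```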