interesting new design just gelled.. almost
parent
40cec65ace
commit
5e8dee6cb0
1 changed files with 107 additions and 0 deletions
doc/design/metadata.mdwn
Normal file
[[!toc]]

# metadata

Attach an arbitrary set of metadata to a key.

Metadata can be tags, but it can also be fields with values (e.g., date=xxx,
conference=yyy).

Store in the git-annex branch, next to the location log files.

Storage needs to support union merging, including removing tags and
changing values.
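
One way to make removals and value changes survive a union merge is an append-only log, where each line records a timestamped set or unset and the newest entry per (field, value) wins. The sketch below is only an illustration under that assumption; the line format `timestamp field +value` / `timestamp field -value`, the use of a `tag` field for tags, and the function names are all hypothetical, not a settled design.

```python
from collections import defaultdict

def union_merge(log_a, log_b):
    """Union merge: keep every distinct line from both sides (sorted for stability)."""
    return sorted(set(log_a) | set(log_b))

def current_metadata(log_lines):
    """Replay a log. For each (field, value), the entry with the newest timestamp wins.

    Assumed line format: "<timestamp> <field> +<value>" or "<timestamp> <field> -<value>".
    Field names may not contain spaces; values may.
    """
    latest = {}  # (field, value) -> (timestamp, is_set)
    for line in log_lines:
        stamp, field, change = line.split(" ", 2)
        entry = (float(stamp), change[0] == "+")
        pair = (field, change[1:])
        # On equal timestamps, a set beats an unset (True > False), so the
        # outcome does not depend on line order.
        if pair not in latest or entry > latest[pair]:
            latest[pair] = entry
    result = defaultdict(set)
    for (field, value), (_, is_set) in latest.items():
        if is_set:
            result[field].add(value)
    return dict(result)

# Two clones edit the same key independently, then merge.
ours = ["100 tag +talk", "100 year +2013"]
theirs = ["100 tag +talk", "150 tag +haskell", "300 tag -haskell",
          "200 year -2013", "200 year +2014"]
merged = union_merge(ours, theirs)
print(current_metadata(merged))
```

Changing a value is expressed here as an unset of the old value plus a set of the new one, so no line ever needs to be rewritten or deleted during a merge.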

## automatically added metadata

`git annex add` should automatically attach the current mtime of a file
when adding it.

Could also automatically attach permissions.

A git hook could be run by `git annex add` to gather more metadata.

Metadata is also added automatically when adding files to filter branches.
See below.

## derived metadata

From the ctime, some additional
metadata is derived, at least year=yyyy and probably also month, etc.

There should be a general mechanism for this.
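
A general mechanism could be as simple as a table mapping each derived field to a function of the stored timestamp. The following is a rough sketch under that assumption; the `DERIVERS` table, the `derive_from_timestamp` name, the zero-padded formats, and the choice of UTC are all hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical registry: derived field name -> function of a datetime.
# Adding a new derived field means adding one entry here.
DERIVERS = {
    "year": lambda dt: f"{dt.year:04d}",
    "month": lambda dt: f"{dt.month:02d}",
    "day": lambda dt: f"{dt.day:02d}",
}

def derive_from_timestamp(timestamp):
    """Compute all derived fields from a POSIX timestamp (interpreted as UTC)."""
    dt = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return {field: derive(dt) for field, derive in DERIVERS.items()}

stamp = datetime(2014, 2, 15, tzinfo=timezone.utc).timestamp()
print(derive_from_timestamp(stamp))
```

Because derived fields are recomputed from the stored timestamp, they would not need to be written to the log themselves.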

# filtered branches

`git annex filter year=2014 talk` should create a new branch
filtered/talk/year=2014 containing only files that have the tag talk and
the field year=2014, and have git check it out. In this example, all files
appear in the top-level directory of the repo; no subdirs.

`git annex fadd haskell` switches to branch
filtered/haskell/talk/year=2014 with only the haskell talks.

`git annex fadd year=2013 year=2012` switches to branch
filtered/haskell/talk/year=2012,2013,2014. This has subdirectories 2012,
2013 and 2014 with the matching talks.

`git annex frm haskell` switches to
filtered/talk/year=2012,2013,2014, which has all available talks in it.

`git annex fadd conference=fosdem conference=icfp` switches to branch
filtered/conference=fosdem,icfp/talk/year=2012,2013,2014. Now we need
to either nest the subdirectories, or make fosdem-2014, icfp-2013, etc.
May need an option to choose this. Note that the user may prefer to have year
first or conference first, so may need an option for that as well.

Note that old filter branches can be deleted when switching to a new one.
There is no need to retain them, unless the user has committed non
git-annexed files to them, in which case, urk.

These commands should probably refuse to do anything if run from within a
subdir of the work tree that would get deleted by checking out the new
filtered branch.
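
The branch names in the examples above look like the filter terms sorted by tag or field name, with the values of each field sorted and joined by commas. A small sketch of that reading follows; the function names are hypothetical, and the sorting rule is inferred from the examples rather than specified anywhere.

```python
def parse_terms(terms):
    """Split terms like 'talk' or 'year=2014' into tags and per-field value sets."""
    tags, fields = set(), {}
    for term in terms:
        if "=" in term:
            field, value = term.split("=", 1)
            fields.setdefault(field, set()).add(value)
        else:
            tags.add(term)
    return tags, fields

def branch_name(terms):
    """Build a filtered/... branch name from the accumulated filter terms."""
    tags, fields = parse_terms(terms)
    # Pair each component with its tag or field name so sorting is by name only.
    components = [(tag, tag) for tag in tags]
    components += [(field, f"{field}={','.join(sorted(values))}")
                   for field, values in fields.items()]
    return "filtered/" + "/".join(text for _, text in sorted(components))

terms = ["year=2014", "talk"]
print(branch_name(terms))             # after `git annex filter year=2014 talk`
terms += ["haskell"]
print(branch_name(terms))             # after `git annex fadd haskell`
terms += ["year=2013", "year=2012"]
print(branch_name(terms))             # after `git annex fadd year=2013 year=2012`
```

Deriving the name purely from the set of terms means the same filter always maps to the same branch, regardless of the order in which terms were added.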

# operations while on filter branch

* If files are removed and git commit called, git-annex should **possibly**
  remove the relevant metadata from the files. It's not clear that
  removing a file should nuke all the metadata used to filter it into the
  branch (especially if it's derived metadata like the year).
  Also, this is not usable in direct mode because deleting the
  file... actually deletes it.
* `git annex sync` should avoid pushing out the filter branch, but
  it should check if there are changes to the metadata pulled in, and update
  the branch to reflect them.
* If `git annex add` adds a file, it gets all the metadata of the filter
  branch it's added to. If it's in a relevant directory (like fosdem-2014),
  it gets that metadata automatically recorded as well.

# other uses for metadata

Uses are not limited to filter branches.

`git annex checkoutmeta year=2014 talk` in a subdir of master could create the
same tree of files that `git annex filter` would. The user can then commit
that if desired.
Or, they could run additional commands like `git annex fadd` to refine the
tree of files in the subdir.

Other programs could query git-annex for the metadata of files in the work
tree, and do whatever they want with it.

# filenames

The hard part of this is actually getting a useful filename to put in the
filter branch, since git-annex only has a key, which the user will not
want to see.

* Could use filename metadata for the key, recorded by `git annex add` (which
  may not correspond to filenames being used for the key in regular git
  branches like master).
* Could use the .map files to get a filename, but this is somewhat
  arbitrary (.map can contain multiple filenames), and is only
  currently supported in direct mode.

# efficient metadata lookup

Looking up metadata for filtering so far requires traversing all keys in
the git-annex branch. This is slow. A fast cache is needed.
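
One possible shape for such a cache is an inverted index from (field, value) to the set of keys carrying it, so a filter becomes a few set operations instead of a scan over every key. This is only a sketch: the in-memory dict, the `build_index` and `lookup` names, the `tag` field for tags, and the key names are hypothetical, and how the cache would be persisted or invalidated is left open.

```python
from collections import defaultdict

def build_index(metadata_by_key):
    """Invert {key: {field: {values}}} into {(field, value): {keys}}."""
    index = defaultdict(set)
    for key, fields in metadata_by_key.items():
        for field, values in fields.items():
            for value in values:
                index[(field, value)].add(key)
    return index

def lookup(index, terms):
    """Find keys matching every term.

    terms is a list of (field, accepted_values) pairs: a key matches a term
    if it has any of the accepted values, and must match all terms.
    """
    result = None
    for field, accepted in terms:
        matching = set().union(*(index.get((field, v), set()) for v in accepted))
        result = matching if result is None else result & matching
    return result if result is not None else set()

# Hypothetical keys and metadata, for illustration only.
metadata = {
    "key1": {"tag": {"talk", "haskell"}, "year": {"2014"}},
    "key2": {"tag": {"talk"}, "year": {"2013"}},
    "key3": {"tag": {"talk"}, "year": {"2011"}},
}
index = build_index(metadata)
print(lookup(index, [("tag", {"talk"}), ("year", {"2013", "2014"})]))
```

Each tag is passed as its own term, so asking for both talk and haskell requires both, while several values of one field (as in year=2012,2013,2014) act as alternatives.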