views: add automatically constructed file location metadata

When constructing views, metadata is now available about the location of
each file in the view's reference branch. This allows incorporating parts
of the directory hierarchy into a view.

For example `git annex view tag=* podcasts/=*` makes a view in the form
tag/showname.
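
As a rough sketch of how a path turns into location fields: the code below
mirrors the shape of getDirMetaData from this commit, but uses plain strings
rather than git-annex's MetaField and MetaValue types. The names
locationFields and fileLocationFields, and the example file name, are made up
for illustration.

```haskell
import Data.List (inits, tails)
import Data.Maybe (fromMaybe, listToMaybe)
import System.FilePath.Posix
  (addTrailingPathSeparator, dropFileName, joinPath, splitDirectories)

-- Hypothetical stand-in for getDirMetaData: pairs each directory prefix
-- (as the field name) with the next path component (as the value).
-- Note that the full directory itself also gets a field, with an
-- empty value.
locationFields :: FilePath -> [(String, String)]
locationFields dir = zip fields values
  where
    dirs = splitDirectories dir
    -- "/", "foo/", "foo/bar/", ...
    fields = map (addTrailingPathSeparator . joinPath) (inits dirs)
    -- "foo", "bar", "baz", and finally ""
    values = map (fromMaybe "" . listToMaybe) (tails dirs)

-- Same thing for a file in the work tree: only its directory counts.
fileLocationFields :: FilePath -> [(String, String)]
fileLocationFields = locationFields . dropFileName
```

With these definitions, `lookup "podcasts/" (fileLocationFields
"podcasts/Escape_Pod/ep1.mp3")` yields `Just "Escape_Pod"`, which is the
value that `podcasts/=*` matches to produce the showname level of the view.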

Performance impact: I benchmarked `git annex view tag=*` in the conference
proceedings repo. It took 6.459s before this change, and 6.544s after.

FWIW, I considered making the syntax for this be podcasts/*, which might
be easier for the user to learn. However, I think it's not as good:

* The user has to then juggle two different syntaxes, and podcasts/* will
  be expanded by the shell so they also need to quote it, while podcasts/=*
  is unlikely to be expanded by the shell.
* It would allow for things like podcasts/*/* and *.mp3 which do not
  map well into views.
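
To illustrate the first point: assuming a parameter is split at its first
`=`, a location field goes through the same path as any other field=glob
parameter. This is only a sketch. splitFieldGlob is a hypothetical name, and
the real parseViewParam (which also has to deal with bare tags) is not
reproduced here.

```haskell
-- Hypothetical helper: split "field=glob" at the first '='.
-- Returns Nothing when there is no '=' at all.
splitFieldGlob :: String -> Maybe (String, String)
splitFieldGlob s = case break (== '=') s of
  (field, '=' : glob) -> Just (field, glob)
  _ -> Nothing
```

Under that assumption, `podcasts/=*` splits into the field `podcasts/` and
the glob `*`, just as `tag=*` splits into `tag` and `*`, whereas `podcasts/*`
has no `=` and would need a second syntax.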

This commit was sponsored by Aurélien Pinceaux.
Joey Hess 2014-02-22 16:09:00 -04:00
parent 73a5245502
commit 079b35a1a8
8 changed files with 89 additions and 25 deletions

View file

@ -34,6 +34,7 @@ import Config
import CmdLine.Action
import qualified Data.Set as S
import qualified Data.Map as M
import "mtl" Control.Monad.Writer
{- Each visible ViewFilter in a view results in another level of
@ -233,11 +234,32 @@ prop_view_roundtrips f metadata visible = null f || viewTooLarge view ||
visiblefields = sort (map viewField $ filter viewVisible (viewComponents view))
hasfields fv = sort (map fst (fromMetaData (fromView view fv))) == visiblefields
{- A directory foo/bar/baz/ is turned into metadata fields
- /=foo, foo/=bar, foo/bar/=baz.
-
- Note that this may generate MetaFields that legalField rejects.
- This is necessary to have a 1:1 mapping between directory names and
- fields. So this MetaData cannot safely be serialized. -}
getDirMetaData :: FilePath -> MetaData
getDirMetaData d = MetaData $ M.fromList $ zip fields values
where
dirs = splitDirectories d
fields = map (MetaField . addTrailingPathSeparator . joinPath)
(inits dirs)
values = map (S.singleton . toMetaValue . fromMaybe "" . headMaybe)
(tails dirs)
getWorkTreeMetaData :: FilePath -> MetaData
getWorkTreeMetaData = getDirMetaData . dropFileName
getViewedFileMetaData :: FilePath -> MetaData
getViewedFileMetaData = getDirMetaData . dirFromViewedFile . takeFileName
{- Applies a view to the currently checked out branch, generating a new
- branch for the view.
-}
applyView :: View -> Annex Git.Branch
applyView view = applyView' viewedFileFromReference view
applyView view = applyView' viewedFileFromReference getWorkTreeMetaData view
{- Generates a new branch for a View, which must be a more narrow
- version of the View originally used to generate the currently
@ -245,7 +267,7 @@ applyView view = applyView' viewedFileFromReference view
- in view, not any others.
-}
narrowView :: View -> Annex Git.Branch
narrowView = applyView' viewedFileReuse
narrowView = applyView' viewedFileReuse getViewedFileMetaData
{- Go through each file in the currently checked out branch.
- If the file is not annexed, skip it, unless it's a dotfile in the top.
@ -255,8 +277,8 @@ narrowView = applyView' viewedFileReuse
- Currently only works in indirect mode. Must be run from top of
- repository.
-}
applyView' :: MkViewedFile -> View -> Annex Git.Branch
applyView' mkviewedfile view = do
applyView' :: MkViewedFile -> (FilePath -> MetaData) -> View -> Annex Git.Branch
applyView' mkviewedfile getfilemetadata view = do
top <- fromRepo Git.repoPath
(l, clean) <- inRepo $ Git.LsFiles.inRepo [top]
liftIO . nukeFile =<< fromRepo gitAnnexViewIndex
@ -273,7 +295,9 @@ applyView' mkviewedfile view = do
genviewedfiles = viewedFiles view mkviewedfile -- enables memoization
go uh hasher f (Just (k, _)) = do
metadata <- getCurrentMetaData k
forM_ (genviewedfiles f metadata) $ \fv -> do
let dirmetadata = getfilemetadata f
let metadata' = unionMetaData dirmetadata metadata
forM_ (genviewedfiles f metadata') $ \fv -> do
stagesymlink uh hasher fv =<< inRepo (gitAnnexLink fv k)
go uh hasher f Nothing
| "." `isPrefixOf` f = do

View file

@ -48,6 +48,8 @@ viewedFileFromReference f = concat
escape :: String -> String
escape = replace "%" "\\%" . replace "\\" "\\\\"
{- For use when operating already within a view, so whatever filepath
- is present in the work tree is already a ViewedFile. -}
viewedFileReuse :: MkViewedFile
viewedFileReuse = takeFileName

View file

@ -10,11 +10,11 @@ module Command.VAdd where
import Common.Annex
import Command
import Annex.View
import Command.View (paramView, parseViewParam, checkoutViewBranch)
import Command.View (parseViewParam, checkoutViewBranch)
def :: [Command]
def = [notBareRepo $ notDirect $
command "vadd" paramView seek SectionMetaData "add subdirs to current view"]
def = [notBareRepo $ notDirect $ command "vadd" (paramRepeating "FIELD=GLOB")
seek SectionMetaData "add subdirs to current view"]
seek :: CommandSeek
seek = withWords start

View file

@ -43,7 +43,7 @@ perform view = do
next $ checkoutViewBranch view applyView
paramView :: String
paramView = paramPair (paramRepeating "FIELD=VALUE") (paramRepeating "TAG")
paramView = paramPair (paramRepeating "TAG") (paramRepeating "FIELD=VALUE")
parseViewParam :: String -> (MetaField, String)
parseViewParam s = case separate (== '=') s of

debian/changelog
View file

@ -4,6 +4,11 @@ git-annex (5.20140222) UNRELEASED; urgency=medium
including rsync.net.
* --metadata field=value can now use globs to match, and matches
case insensitively, the same as git annex view field=value does.
* When constructing views, metadata is available about the location of the
file in the view's reference branch. Allows incorporating parts of the
directory hierarchy in a view.
For example `git annex view tag=* podcasts/=*` makes a view in the form
tag/showname.
-- Joey Hess <joeyh@debian.org> Fri, 21 Feb 2014 13:03:04 -0400

View file

@ -70,7 +70,7 @@ metadata is derived, at least year=yyyy and probably also month, etc.
### directory hierarchy metadata
TODO From the original filename used in the master branch, when
From the original filename used in the master branch, when
constructing a view, generate fields. For example foo/bar/baz.mp3
would get /=foo, foo/=bar, foo/bar/=baz, and .=mp3.
@ -82,11 +82,12 @@ This allows using whatever directory hierarchy exists to inform the view,
without locking the view into using it.
Complication: When refining a view, it only looks at the filenames in
the view, so it would need to map from
the view, so it has to map from
those filenames to derive the same metadata, unless there is persistent
storage. Luckily, the filenames used in the views currently include the
subdirs (although not quite in a parseable format, would need some small
changes).
subdirs.
**done**!
# other uses for metadata

View file

@ -715,20 +715,29 @@ subdirectories).
git annex metadata annexscreencast.ogv -t video -t screencast -s author+=Alice
* `view [field=value ...] [tag ...]`
* `view [tag ...] [field=value ...] [location/=value]`
Uses metadata to build a view branch of the files in the current branch,
and checks out the view branch. Only files in the current branch whose
metadata matches all the specified field values and tags will be
shown in the view.
Once within a view, you can make additional directories, and
copy or move files into them. When you commit, the metadata will
be updated to correspond to your changes.
Multiple values for a metadata field can be specified, either by using
a glob (`field="*"`) or by listing each wanted value. The resulting view
will put files in subdirectories according to the value of their fields.
Once within a view, you can make additional directories, and
copy or move files into them. When you commit, the metadata will
be updated to correspond to your changes.
There are fields corresponding to the path to the file. So a file
"foo/bar/baz/file" has fields "/=foo", "foo/=bar", and "foo/bar/=baz".
These location fields can be used the same as other metadata to construct
the view.
For example, `/=podcasts` will only include files from the podcasts
directory in the view, while `podcasts/=*` will preserve the
subdirectories of the podcasts directory in the view.
* `vpop [N]`
@ -737,12 +746,12 @@ subdirectories).
The optional number tells how many views to pop.
* `vfilter [field=value ...] [tag ...]`
* `vfilter [tag ...] [field=value ...] [location/=value]`
Filters the current view to only the files that have the
specified values and tags.
specified field values, tags, and locations.
* `vadd [field=glob ...]`
* `vadd [field=glob ...] [location/=glob]`
Changes the current view, adding an additional level of directories
to categorize the files.

View file

@ -24,8 +24,8 @@ metadata:
# git annex metadata --tag done videos/old
# git annex metadata --tag new videos/lotsofcats.ogv
# git annex metadata --tag sound podcasts
# git annex metadata --tag done podcasts/old
# git annex metadata --tag new podcasts/recent
# git annex metadata --tag done podcasts/*/old
# git annex metadata --tag new podcasts/*/recent
So, you had a bunch of different kinds of files sorted into a directory
structure. But that didn't really reflect how you approach the files.
@ -81,9 +81,11 @@ all the way out of all views, you'll be back on the regular git branch you
originally started from. You can also use `git checkout` to switch between
views and other branches.
Beyond simple tags, you can add whatever kinds of metadata you like, and
use that metadata in more elaborate views. For example, let's add a year
field.
## fields
Beyond simple tags and directories, you can add whatever kinds of metadata
you like, and use that metadata in more elaborate views. For example, let's
add a year field.
# git checkout master
# git annex metadata --set year=2014 work/2014
@ -118,4 +120,25 @@ Oh, did you want it the other way around? Easy!
|-- 2014
`-- 2013
## location fields
Let's switch to a view containing only new podcasts. And since the
podcasts are organized into one subdirectory per show, let's
include those subdirectories in the view.
# git checkout master
# git annex view tag=new podcasts/=*
# tree -d
This_Developers_Life
Escape_Pod
GitMinutes
The_Haskell_Cast
StarShipSofa
That's an example of using part of the directory layout of the original
branch to inform the view. Every file gets fields automatically set up
corresponding to the directory it's in. So a file "foo/bar/baz/file" has
fields "/=foo", "foo/=bar", and "foo/bar/=baz". These location fields
can be used the same as other metadata to construct the view.
This has probably only scratched the surface of what you can do with views.