Commit graph

236 commits

Author SHA1 Message Date
Joey Hess
014794f4ed improve a bit 2014-07-24 17:18:14 -04:00
Joey Hess
e2c44bf656 implement chunk logs
Slightly tricky as they are not normal UUIDBased logs, but are instead maps
from (uuid, chunksize) to chunkcount.

This commit was sponsored by Frank Thomas.
2014-07-24 16:23:36 -04:00
Joey Hess
d0c1a22e7c import metadata from feeds
When annex.genmetadata is set, metadata from the feed is added to files
that are imported from it.

Reused the same feedtitle and itemtitle, feedauthor, itemauthor, etc names
that are used in --template.

Also added title and author, which are the item title/author if available,
falling back to the feed title/author. These are more likely to be common
metadata fields.

(There is a small bit of dupication here, but once git gets
around to packing the object, it will compress it away.)

The itempubdate field is not included in the metadata as a string; instead
it is used to generate year and month fields, same as is done when adding
files with annex.genmetadata set.

This commit was sponsored by Amitai Schlair, who cooincidentially
is responsible for ikiwiki generating nice feed metadata!
2014-07-03 14:15:00 -04:00
Joey Hess
c00b459f3a unused: Avoid checking view branches for unused files.
This avoids a potential slowdown when using lots of views.

I think that it makes sense for unused to ignore (local) view branches,
since these are by definition supposed to be views of an existing branch,
so looking at the branch should be sufficient (and if the view is out of
date and has files that have since been deleted from the branch, the user's
intent is not to preserve those from unused reaping).
2014-06-04 14:03:41 -04:00
Joey Hess
9eaabf0382 webapp: avoid overwriting remote configs when enabling it
Avoid stomping on existing group and preferred content settings
when enabling or combining with an already existing remote.

Two level fix. First, use defaultStandardGroup rather than
setStandardGroup, so if there is an existing configuration in the git-annex
branch, it's not overwritten.

To handle pre-existing ssh remotes (including gcrypt), a second level is
needed, because before syncing with the remote, it's configuration won't be
available locally. (And syncing could take a long time.) So, in this case,
keep track of whether the remote is being created or enabled, and only set
configs when creating it.

This commit was sponsored by Anders Lannerback.
2014-05-30 14:03:04 -04:00
Joey Hess
065248f3d2 Added required content configuration.
This includes checking when dropping files that any required content
configuration is satisfied. However, it does not yet include an active
check on the required content; the location log is trusted when checking
the required content expression.
2014-03-29 16:03:33 -04:00
Joey Hess
fe19e15040 reorg matcher types; no non-type code changes 2014-03-29 14:43:34 -04:00
Joey Hess
e426fac273 add desktop notifications
Motivation: Hook scripts for nautilus or other file managers
need to provide the user with feedback that a file is being downloaded.

This commit was sponsored by THM Schoemaker.
2014-03-22 14:12:19 -04:00
Joey Hess
ed30b81e2c Improve behavior when unable to parse a preferred content expression (thanks, ion).
Fall back to "present" as the preferred conent expression, which will
not result in any content movement.
2014-03-20 00:10:12 -04:00
Joey Hess
f64c2d6138 toplevel lastchanged field 2014-03-19 19:10:55 -04:00
Joey Hess
6848f09a12
better timestamp format 2014-03-18 19:01:50 -04:00
Joey Hess
caa97d1271 Each for each metadata field, there's now an automatically maintained "$field-lastchanged" that gives the timestamp of the last change to that field.
Note that this is a nearly entirely free feature. The data was already
stored in the metadata log in an easily accessible way, and already was
parsed to a time when parsing the log. The generation of the metadata
fields may even be done lazily, although probably not entirely (the map
has to be evaulated to when queried).
2014-03-18 18:55:43 -04:00
Joey Hess
6a4dd42328 finish wiring up groupwanted 2014-03-15 17:08:55 -04:00
Joey Hess
417aea25be vicfg: Allows editing preferred content expressions for groups.
This is stored in the git-annex branch, but not yet actually hooked up and
used.
2014-03-15 16:17:01 -04:00
Joey Hess
431d805a96 factored out a generic MapLog from uuid-based logs
UUIDBased is just a MapLog with a UUID for the field.
2014-03-15 13:45:25 -04:00
Joey Hess
b7eb1d834a Avoid encoding errors when using the unused log file. 2014-03-15 11:57:27 -04:00
Joey Hess
3551d40b05 "standard" can now be used as a first-class keyword in preferred content expressions.
For example "standard or (include=otherdir/*)" or even "not standard"

Note that the implementation avoids any potential for loops (if a
standard preferred content expression itself mentioned standard).

This commit was sponsored by Jochen Bartl.
2014-03-14 15:04:33 -04:00
Joey Hess
67f09bca6d fully fix fsck memory use by iterative fscking
Not very well tested, but I'm sure it doesn't eg, loop forever.
2014-03-12 15:18:43 -04:00
Joey Hess
9f27339e80 remove uninofrmative warning
dateUnusedLog is only used to show a timestamp in the webapp, so
not worth a warning
2014-03-12 12:42:51 -04:00
Joey Hess
c2e8c21ca6 view, vfilter: Add support for filtering tags and values out of a view, using !tag and field!=value.
Note that negated globs are not supported. Would have complicated the code
to add them, without changing the data type serialization in a
non-backwards-compatable way.

This commit was sponsored by Denver Gingerich.
2014-03-02 14:53:19 -04:00
Joey Hess
a1432bce2f Put non-object tmp files in .git/annex/misctmp, leaving .git/annex/tmp for only partially transferred objects.
This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO
of temp object files is too annoying (and if you don't want to keep
partially transferred objects across reboots).

.git/annex/misctmp must be on the same filesystem as the git work tree,
since files are moved to there in a way that will not work cross-device,
as well as symlinked into there.

I first wanted to put the tmp objects in .git/annex/objects/tmp, but
that would pose transition problems on upgrade when partially transferred
objects existed.

git annex info does not currently show the size of .git/annex/misctemp,
since it should stay small. It would also be ok to make something clean it
out, periodically.
2014-02-26 16:52:56 -04:00
Joey Hess
8d5158fa31 Preserve metadata when staging a new version of an annexed file.
Performance impact: When adding a large tree of new files, this needs
to do some git cat-file queries to check if any of the files already
existed and might need a metadata copy. I tried a benchmark in a copy
of my sound repository (so there was already a significant git tree
to check against.

Adding 10000 small files, with a cold cache:
  before: 1m48.539s
  after:  1m52.791s

So, impact is 0.0004 seconds per file added. Which seems acceptable, so did
not add some kind of configuration to enable/disable this.

This commit was sponsored by Lisa Feilen.
2014-02-24 14:41:33 -04:00
Joey Hess
7498c5dd96 annex.genmetadata can be set to make git-annex automatically set metadata (year and month) when adding files 2014-02-23 00:08:29 -04:00
Joey Hess
bdfc8e1f44 fix build with old version of Data.Set that lacks toDescList 2014-02-21 11:30:31 -04:00
Joey Hess
cfed7f6a5d remove special case for tags in view branch names
Just having "_" for tags=* turned out to be too hard to understand.

Note that this invalidaes all current views.
2014-02-19 17:38:45 -04:00
Joey Hess
c85a482136 improve view branch name when there are a list of values 2014-02-19 16:35:00 -04:00
Joey Hess
dd7b99c860 add tip about metadata driven views (and more flexible view filtering)
While writing this documentation, I realized that there needed to be a way
to stay in a view like tag=* while adding a filter like tag=work that
applies to the same field.

So, there are really two ways a view can be refined. It can have a new
"field=explicitvalue" filter added to it, which does not change the
"shape" of the view, but narrows the files it shows.
Or, it can have a new view added, which adds another level of
subdirectories.

So, added a vfilter command, which takes explicit values to add to the
filter, and rejects changes that would change the shape of the view.

And, made vadd only accept changes that change the shape of the view.

And, changed the View data type slightly; now components that can match
multiple metadata values can be visible, or not visible.

This commit was sponsored by Stelian Iancu.
2014-02-19 16:29:56 -04:00
Joey Hess
39ebfa1a2e pre-commit: Update metadata when committing changes to annexed files within a view.
So the user can now switch to a view and then move files around within it
to manage metadata. For example, moving a file into a new directory
when in the tags=* view adds a tag to it.

Implementation is fairly efficient. One diff-index, which is no more
expensive than the first stage of a git commit, followed by possibly
some cat-file --batch traffic to find the key (when deleting a file).
Very similar to what's done in direct mode when committing. And like
direct mode when updating the WC after a merge, it has to buffer the
diff-tree values in order to make 2 passes over them.

When not in a view, pre-commit now does one extra git symbolic-ref,
which is tiny overhead.

This commit was sponsored by Andrew Eskridge.
2014-02-19 14:17:58 -04:00
Joey Hess
02259d2a55 speed up currentView when not in a view
Avoid reading the view log when the branch is clearly not a view branch.
2014-02-19 12:52:47 -04:00
Joey Hess
4e0be2792b remove Read instance for Ref
Removed instance, got it all to build using fromRef. (With a few things
that really need to show something using a ref for debugging stubbed out.)

Then added back Read instance, and made Logs.View use it for serialization.
This changes the view log format.
2014-02-19 01:19:57 -04:00
Joey Hess
2bf338f443 fixed vpop 2014-02-18 21:09:25 -04:00
Joey Hess
67fd06af76 add git annex view command
(And a vpop command, which is still a bit buggy.)

Still need to do vadd and vrm, though this also adds their documentation.

Currently not very happy with the view log data serialization. I had to
lose the TDFA regexps temporarily, so I can have Read/Show instances of
View. I expect the view log format will change in some incompatable way
later, probably adding last known refs for the parent branch to View
or something like that.

Anyway, it basically works, although it's a bit slow looking up the
metadata. The actual git branch construction is about as fast as it can be
using the current git plumbing.

This commit was sponsored by Peter Hogg.
2014-02-18 18:22:20 -04:00
Joey Hess
a18eae9a0f
nice git ack space optimisation when setting the same metadata value for multiple files 2014-02-13 01:57:43 -04:00
Joey Hess
361aee0470
avoid churning in git to no benefit when optimising metadata log
I think this is now optimal.
2014-02-12 23:24:04 -04:00
Joey Hess
8076530284 improve simplifier 2014-02-12 22:50:41 -04:00
Joey Hess
a05ac13e92 fix metadata log simplifier and additional quickcheck tests 2014-02-12 22:27:55 -04:00
Joey Hess
9f7e76130e add metadata command to get/set metadata
Adds metadata log, and command.

Note that unsetting field values seems to currently be broken.
And in general this has had all of 2 minutes worth of testing.

This commit was sponsored by Julien Lefrique.
2014-02-12 21:30:33 -04:00
Joey Hess
c390e896d1 fix windows build (and make --stop work on windows, incidentially)
The Utility.PID will clean up other code soon.
2014-02-11 15:25:59 -04:00
Joey Hess
4f7e72b51a fix parsing of unused log; keys can contain spaces 2014-02-08 15:27:11 -04:00
Joey Hess
a44e01c29c --in can now refer to files that were located in a repository at some past date. For example, --in="here@{yesterday}" 2014-02-06 12:43:56 -04:00
Joey Hess
1572c460e8 avoid using openFile when withFile can be used
Potentially fixes some FD leak if an action on an opened file handle fails
for some reason. There have been some hard to reproduce reports of
git-annex leaking FDs, and this may solve them.
2014-02-03 10:19:06 -04:00
Joey Hess
32f1f68dc9 typo 2014-01-28 17:17:21 -04:00
Joey Hess
f0dfac4d96 fix build with old ghc that used old-time type 2014-01-28 17:14:43 -04:00
Joey Hess
eefda291c6 fix warning 2014-01-28 14:43:20 -04:00
Joey Hess
891c85cd88 use locking on Windows
This is all the easy cases, where there was already a separate lock file.
2014-01-28 14:42:03 -04:00
Joey Hess
3518c586cf fix transfers of key with no associated file
Several places assumed this would not happen, and when the AssociatedFile
was Nothing, did nothing.

As part of this, preferred content checks pass the Key around.

Note that checkMatcher is sometimes now called with Just Key and Just File.
It currently constructs a FileMatcher, ignoring the Key. However, if it
constructed a FileKeyMatcher, which contained both, then it might be
possible to speed up parts of Limit, which currently call the somewhat
expensive lookupFileKey to get the Key.

I have not made this optimisation yet, because I am not sure if the key is
always the same. Will need some significant checking to satisfy myself
that's the case..
2014-01-23 16:44:02 -04:00
Joey Hess
e0bd088f08 add webapp UI to manage unused files 2014-01-23 15:09:43 -04:00
Joey Hess
3da0064657 assistant unused file handling
Make sanity checker run git annex unused daily, and queue up transfers
of unused files to any remotes that will have them. The transfer retrying
code works for us here, so eg when a backup disk remote is plugged in,
any transfers to it are done. Once the unused files reach a remote,
they'll be removed locally as unwanted.

If the setup does not cause unused files to go to a remote, they'll pile
up, and the sanity checker detects this using some heuristics that are
pretty good -- 1000 unused files, or 10% of disk used by unused files,
or more disk wasted by unused files than is left free. Once it detects
this, it pops up an alert in the webapp, with a button to take action.

TODO: Webapp UI to configure this, and also the ability to launch an
immediate cleanup of all unused files.

This commit was sponsored by Simon Michael.
2014-01-22 22:53:18 -04:00
Joey Hess
4b55afe9e9 add "unused" preferred content expression
With a really nice optimisation that keeps it from having any overhead
in normal operation!

This commit was sponsored by Ulises Vitulli.
2014-01-22 16:35:32 -04:00
Joey Hess
ae3cd632bd add timestamps to unused log files
This will be used in expiring old unused objects. The timestamp is when it
was first noticed it was unused.

Backwards compatability: It supports reading old format unused log files.
The old version of git-annex will ignore lines in log files written by the
new version, so the worst interop problem would be git annex dropunused not
knowing some numbers that git-annex unused reported.
2014-01-22 15:33:02 -04:00