Slightly tricky as they are not normal UUIDBased logs, but are instead maps
from (uuid, chunksize) to chunkcount.
This commit was sponsored by Frank Thomas.
When annex.genmetadata is set, metadata from the feed is added to files
that are imported from it.
Reused the same feedtitle and itemtitle, feedauthor, itemauthor, etc names
that are used in --template.
Also added title and author, which are the item title/author if available,
falling back to the feed title/author. These are more likely to be common
metadata fields.
(There is a small bit of dupication here, but once git gets
around to packing the object, it will compress it away.)
The itempubdate field is not included in the metadata as a string; instead
it is used to generate year and month fields, same as is done when adding
files with annex.genmetadata set.
This commit was sponsored by Amitai Schlair, who cooincidentially
is responsible for ikiwiki generating nice feed metadata!
This avoids a potential slowdown when using lots of views.
I think that it makes sense for unused to ignore (local) view branches,
since these are by definition supposed to be views of an existing branch,
so looking at the branch should be sufficient (and if the view is out of
date and has files that have since been deleted from the branch, the user's
intent is not to preserve those from unused reaping).
Avoid stomping on existing group and preferred content settings
when enabling or combining with an already existing remote.
Two level fix. First, use defaultStandardGroup rather than
setStandardGroup, so if there is an existing configuration in the git-annex
branch, it's not overwritten.
To handle pre-existing ssh remotes (including gcrypt), a second level is
needed, because before syncing with the remote, it's configuration won't be
available locally. (And syncing could take a long time.) So, in this case,
keep track of whether the remote is being created or enabled, and only set
configs when creating it.
This commit was sponsored by Anders Lannerback.
This includes checking when dropping files that any required content
configuration is satisfied. However, it does not yet include an active
check on the required content; the location log is trusted when checking
the required content expression.
Motivation: Hook scripts for nautilus or other file managers
need to provide the user with feedback that a file is being downloaded.
This commit was sponsored by THM Schoemaker.
Note that this is a nearly entirely free feature. The data was already
stored in the metadata log in an easily accessible way, and already was
parsed to a time when parsing the log. The generation of the metadata
fields may even be done lazily, although probably not entirely (the map
has to be evaulated to when queried).
For example "standard or (include=otherdir/*)" or even "not standard"
Note that the implementation avoids any potential for loops (if a
standard preferred content expression itself mentioned standard).
This commit was sponsored by Jochen Bartl.
Note that negated globs are not supported. Would have complicated the code
to add them, without changing the data type serialization in a
non-backwards-compatable way.
This commit was sponsored by Denver Gingerich.
This allows eg, putting .git/annex/tmp on a ram disk, if the disk IO
of temp object files is too annoying (and if you don't want to keep
partially transferred objects across reboots).
.git/annex/misctmp must be on the same filesystem as the git work tree,
since files are moved to there in a way that will not work cross-device,
as well as symlinked into there.
I first wanted to put the tmp objects in .git/annex/objects/tmp, but
that would pose transition problems on upgrade when partially transferred
objects existed.
git annex info does not currently show the size of .git/annex/misctemp,
since it should stay small. It would also be ok to make something clean it
out, periodically.
Performance impact: When adding a large tree of new files, this needs
to do some git cat-file queries to check if any of the files already
existed and might need a metadata copy. I tried a benchmark in a copy
of my sound repository (so there was already a significant git tree
to check against.
Adding 10000 small files, with a cold cache:
before: 1m48.539s
after: 1m52.791s
So, impact is 0.0004 seconds per file added. Which seems acceptable, so did
not add some kind of configuration to enable/disable this.
This commit was sponsored by Lisa Feilen.
While writing this documentation, I realized that there needed to be a way
to stay in a view like tag=* while adding a filter like tag=work that
applies to the same field.
So, there are really two ways a view can be refined. It can have a new
"field=explicitvalue" filter added to it, which does not change the
"shape" of the view, but narrows the files it shows.
Or, it can have a new view added, which adds another level of
subdirectories.
So, added a vfilter command, which takes explicit values to add to the
filter, and rejects changes that would change the shape of the view.
And, made vadd only accept changes that change the shape of the view.
And, changed the View data type slightly; now components that can match
multiple metadata values can be visible, or not visible.
This commit was sponsored by Stelian Iancu.
So the user can now switch to a view and then move files around within it
to manage metadata. For example, moving a file into a new directory
when in the tags=* view adds a tag to it.
Implementation is fairly efficient. One diff-index, which is no more
expensive than the first stage of a git commit, followed by possibly
some cat-file --batch traffic to find the key (when deleting a file).
Very similar to what's done in direct mode when committing. And like
direct mode when updating the WC after a merge, it has to buffer the
diff-tree values in order to make 2 passes over them.
When not in a view, pre-commit now does one extra git symbolic-ref,
which is tiny overhead.
This commit was sponsored by Andrew Eskridge.
Removed instance, got it all to build using fromRef. (With a few things
that really need to show something using a ref for debugging stubbed out.)
Then added back Read instance, and made Logs.View use it for serialization.
This changes the view log format.
(And a vpop command, which is still a bit buggy.)
Still need to do vadd and vrm, though this also adds their documentation.
Currently not very happy with the view log data serialization. I had to
lose the TDFA regexps temporarily, so I can have Read/Show instances of
View. I expect the view log format will change in some incompatable way
later, probably adding last known refs for the parent branch to View
or something like that.
Anyway, it basically works, although it's a bit slow looking up the
metadata. The actual git branch construction is about as fast as it can be
using the current git plumbing.
This commit was sponsored by Peter Hogg.
Adds metadata log, and command.
Note that unsetting field values seems to currently be broken.
And in general this has had all of 2 minutes worth of testing.
This commit was sponsored by Julien Lefrique.
Potentially fixes some FD leak if an action on an opened file handle fails
for some reason. There have been some hard to reproduce reports of
git-annex leaking FDs, and this may solve them.
Several places assumed this would not happen, and when the AssociatedFile
was Nothing, did nothing.
As part of this, preferred content checks pass the Key around.
Note that checkMatcher is sometimes now called with Just Key and Just File.
It currently constructs a FileMatcher, ignoring the Key. However, if it
constructed a FileKeyMatcher, which contained both, then it might be
possible to speed up parts of Limit, which currently call the somewhat
expensive lookupFileKey to get the Key.
I have not made this optimisation yet, because I am not sure if the key is
always the same. Will need some significant checking to satisfy myself
that's the case..
Make sanity checker run git annex unused daily, and queue up transfers
of unused files to any remotes that will have them. The transfer retrying
code works for us here, so eg when a backup disk remote is plugged in,
any transfers to it are done. Once the unused files reach a remote,
they'll be removed locally as unwanted.
If the setup does not cause unused files to go to a remote, they'll pile
up, and the sanity checker detects this using some heuristics that are
pretty good -- 1000 unused files, or 10% of disk used by unused files,
or more disk wasted by unused files than is left free. Once it detects
this, it pops up an alert in the webapp, with a button to take action.
TODO: Webapp UI to configure this, and also the ability to launch an
immediate cleanup of all unused files.
This commit was sponsored by Simon Michael.
This will be used in expiring old unused objects. The timestamp is when it
was first noticed it was unused.
Backwards compatability: It supports reading old format unused log files.
The old version of git-annex will ignore lines in log files written by the
new version, so the worst interop problem would be git annex dropunused not
knowing some numbers that git-annex unused reported.