Based on my earlier benchmark, I have a rough cost model for how
expensive it is for git-annex smudge to be run on a file, vs
how expensive it is for a gigabyte of a file's content to be read and
piped through to filter-process.
So, using that cost model, it can decide if using filter-process will
be more or less expensive than running the smudge filter on the files to
be restaged.
It turned out to be *really* annoying to temporarily disable
filter-process. I did find a way, but urk, this is horrible. Notice
that, if it's interrupted with it disabled, it will remain disabled
until the next time restagePointerFile runs. Which could be some time
later. If the user runs `git add` or `git checkout` on a lot of small
files before that, they will see slower than expected performance.
(This commit also deletes where I wrote down the benchmark results
earlier.)
Sponsored-by: Noam Kremen on Patreon
This reverts commit afe327ac49.
Unfortunately, disabling it by setting it to "" does not work, git
then ignores filter.annex.smudge/clean, and does not pass files through
git-annex at all.
I don't think there is a way to temporarily disable this git config
from the git command line. Which seems like a bug in git.
So, it may be more expensive than anticipated to enable
filter.annex.process, since git checkout etc will pipe all annexed files
being checked out through it.
This means git will run git-annex smudge --clean once per file that is
restaged, which can be slow. But probably *not* as slow as git feeding
all the content of annexed files you've gotten through a pipe to
git-annex filter-process.
The only time this is probably not ideal is after a drop of a bunch of
files, when filter-process would be faster.
filter-process: New command that can make git add/checkout faster when
there are a lot of unlocked annexed files or non-annexed files, but that
also makes git add of large annexed files slower.
Use it by running: git
config filter.annex.process 'git-annex filter-process'
Fully tested and working, but I have not benchmarked it at all.
And, incremental hashing is not done when git add uses it, so extra work is
done in that case.
Sponsored-by: Mark Reidenbach on Patreon
This module is not used yet, but the plan is to use it for smudge/clean
filtering, at least as an option. In some circumstances, using this
interface may perform better than the interface git-annex is currently
using.
Sponsored-by: Brock Spratlen on Patreon
metadata --batch --json: Reject input whose "fields" does not consist of
arrays of strings. Such invalid input used to be silently ignored.
Used to be that parseJSON for a JSONActionItem ran parseJSON separately
for the itemAdded, and if that failed, did not propagate the error. That
allowed different items with differently named fields to be parsed.
But it was actually only used to parse "fields" for metadata, so that
flexability is not needed.
The fix is just to parse "fields" as-is. AddJSONActionItemFields is needed
only because of the wonky way Command.MetaData adds onto the started json
object.
Note that this line got a dummy type signature added,
just because the type checker needs it to be some type.
itemFields = Nothing :: Maybe Bool
Since it's Nothing, it doesn't really matter what type it is,
and the value gets turned into json and is then thrown away.
Sponsored-by: Kevin Mueller on Patreon
Turns out that CommandStart actions do not have their exceptions caught,
which is why the giveup was causing a crash. Mostly these actions
do not do very much work on their own, but it does seem possible there
are other commands whose CommandStart also throws an exception.
So, my first attempt at a fix was to catch those exceptions. But,
--json-error-messages then causes a difficulty, because in order to output
a json error message, an action needs to have been started; that sets up
the json object that the error message will be included in a field of.
While it would be possible to output an object with just an error field,
this would be json output of a format that the user has no reason to
expect, that happens only in an exceptional circumstance. That is something
I have always wanted to avoid with the json output; while git-annex man
pages don't document what the json looks like, the output has always
been made to be self-describing. Eg, it includes "error-messages":[]
even when there's no errors.
With that ruled out, it doesn't seem a good idea to catch CommandStart
exceptions and display the error to stderr when --json-error-messages
is set. And so I don't know if it makes sense to catch exceptions from that
at all. Maybe I'd have a different opinion if --json-error-messages did not
exist though.
So instead, output a blank line like other batch commands do.
This also leaves open the possibility of implementing support for matching
object with metadata --json, which would also want to output a blank line
when the input didn't match.
Sponsored-by: Dartmouth College's DANDI project