[[!toc ]]

## motivating examples

Preferred content expressions can be complicated to write and reason about.
A complex expression can involve lots of repositories that can get into
different states, and needs to be written to avoid unwanted behavior.

It would be very handy to provide some way to prove things about the behavior
of preferred content expressions, or a way to simulate the behavior of a
network of git-annex repositories with a given preferred content configuration.

The worst case of this is `not present`, where the file gets dropped and
transferred over and over again. The docs warn against using that one. But
they can't warn about every bad preferred content expression.

Mostly, git-annex manages to keep things stable that seem like they would
not be. Consider repo A, which is not in group foo, and repo B, which is. A
has preferred content "onlyingroup=foo". This will make A want a file that
is in B. And once it has it, it will not want to drop it. That's because
when dropping, it considers if it would be preferred content after the
drop. In this case it would, so it doesn't drop it.

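
The get and drop decisions described above can be sketched in a few lines
of Python. This is a simplified model, not git-annex's code: `wants_get`,
`wants_drop`, `only_in_group` and `not_present` are hypothetical names, and
the assumption is that a drop check evaluates the location set as if the
drop had already happened while `present` still reflects the actual state.

```python
def wants_get(expr, repo, locations):
    """True if `repo` lacks the file and its expression matches."""
    return repo not in locations and expr(present=False, locations=locations)


def wants_drop(expr, repo, locations):
    """True if `repo` has the file and would not want it once dropped.

    Assumption of this sketch: the location set is evaluated as if the
    drop had already happened, while `present` is still True.
    """
    if repo not in locations:
        return False
    return not expr(present=True, locations=locations - {repo})


def only_in_group(members):
    # Simplified stand-in for "onlyingroup=foo": every copy is in the group.
    return lambda present, locations: locations <= members


def not_present(present, locations):
    # Stand-in for "not present".
    return not present


foo = frozenset({"B"})  # B is in group foo, A is not
expr_a = only_in_group(foo)

print(wants_get(expr_a, "A", frozenset({"B"})))             # True: A gets it
print(wants_drop(expr_a, "A", frozenset({"A", "B"})))       # False: A keeps it
print(wants_get(not_present, "A", frozenset({"B"})))        # True: A gets it
print(wants_drop(not_present, "A", frozenset({"A", "B"})))  # True: A drops it
```

In this model "onlyingroup=foo" settles after one transfer, while
`not present` wants to get and then drop the same file indefinitely.
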
## balanced preferred content

When [[design/balanced_preferred_content]] is added, a whole new level of
complexity will exist in preferred content expressions, because now an
expression does not make a file be wanted by a single repository, but
shards the files among repositories in a group.

And up until this point preferred content expressions have behaved the same no
matter the sizes of the underlying repositories, but balanced preferred
content does take repository fullness into account, which further
complicates fully understanding the behavior.

|
Notice that `fullybalanced()` is not stable when used
on its own, and so `balanced()` adds an "or present" to stabilize it.
Negating that gives "not fullybalanced() and not present",
and so `not balanced()` includes `not present`, which is bad!

## proof

What could be proved about a preferred content expression?

No idea really. Would be interesting to consider what formal methods can
do here. Could a SAT solver be used somehow for example?

## static analysis

Clearly `not present` is a problematic preferred content expression. It
would be good if git-annex warned and/or refused to set such an expression
if it could detect it. Similarly `not groupwanted` could be detected as a
problem when the group's preferred content expression contains `present`.

> This is now detected and such an unstable expression never matches.
> --debug explains why too.
>
> Note that the detection will not be triggered by `"not (not present)"`,
> but it will be by `"include=* or (not present)"` even though that is always
> stable, because `"include=*"` always matches and so what it's ORed with
> doesn't matter. Probably no one will set something like that in real life
> though.
>
> It's problematic to make `git-annex wanted` warn about it. Consider
> if in one repository, groupwanted is set to "present". In another
> repository, which is disconnected, wanted is set to "not groupwanted".
> Both operations are ok, but upon merging the two repositories,
> the combined effect is that "not present" has been set.
>
> So while it could warn sometimes on setting "not present",
> it would sometimes not be able to. Better to not warn inconsistently.
> --[[Joey]]

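
The detection behavior described in the comment above is consistent with a
purely syntactic check that tracks negation parity while walking the
expression. The following Python sketch illustrates that idea only; the
tuple-based expression tree, `has_unstable_negation`, and the `groupwanted`
parameter are hypothetical and are not git-annex's actual data types or code.

```python
def has_unstable_negation(expr, negated=False, groupwanted=None):
    """Return True if `present` occurs under an odd number of `not`s.

    expr is a nested tuple: ("term", name), ("not", sub),
    ("and", sub, ...) or ("or", sub, ...).
    groupwanted, if given, is the expression tree that the term
    "groupwanted" expands to.
    """
    op = expr[0]
    if op == "term":
        if expr[1] == "groupwanted" and groupwanted is not None:
            # Expand once; pass None so a self-reference cannot recurse forever.
            return has_unstable_negation(groupwanted, negated, None)
        return negated and expr[1] == "present"
    if op == "not":
        return has_unstable_negation(expr[1], not negated, groupwanted)
    if op in ("and", "or"):
        return any(
            has_unstable_negation(sub, negated, groupwanted) for sub in expr[1:]
        )
    raise ValueError(f"unknown node: {op!r}")


present = ("term", "present")

print(has_unstable_negation(("not", present)))           # True
print(has_unstable_negation(("not", ("not", present))))  # False
# Flagged even though "include=*" makes the whole expression stable:
print(has_unstable_negation(("or", ("term", "include=*"), ("not", present))))  # True
# "not groupwanted" where the group's expression is "present":
print(has_unstable_negation(("not", ("term", "groupwanted")),
                            groupwanted=present))        # True
```

Because the `groupwanted` expansion happens when the check runs, a check of
this kind sees the merged configuration, which is the case that a warning
at `git-annex wanted` time would miss.
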
|
## simulation

Simulation seems fairly straightforward: just simulate the network of
git-annex repositories with random files with different sizes and
metadata. Or use the current files and metadata.
Be sure to enforce invariants like numcopies the same way git-annex does.

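
A very small starting point for such a simulation might look like the
sketch below. Everything in it is an assumption made for illustration:
`simulate` is a hypothetical function, expressions are plain Python
callables taking `(present, others)`, file sizes and metadata are left out,
and numcopies is modeled only as "do not drop unless enough other copies
remain".

```python
def simulate(exprs, initial, numcopies=1, max_rounds=10):
    """Run get/drop rounds until nothing changes or max_rounds is reached.

    exprs:   repo name -> function(present, others) -> bool, where
             `others` is the set of other repos holding the file
    initial: file name -> set of repos that hold it at the start
    Returns (stable, locations).
    """
    locations = {f: set(repos) for f, repos in initial.items()}
    for _ in range(max_rounds):
        changed = False
        for repo, expr in exprs.items():
            for f in locations:
                present = repo in locations[f]
                others = locations[f] - {repo}
                wanted = expr(present, others)
                if wanted and not present and others:
                    # Get: needs at least one other copy to transfer from.
                    locations[f].add(repo)
                    changed = True
                elif present and not wanted and len(others) >= numcopies:
                    # Drop: only when numcopies is still satisfied afterwards.
                    locations[f].discard(repo)
                    changed = True
        if not changed:
            return True, locations
    return False, locations


foo = {"B"}  # group foo contains only B

# A uses a stand-in for "onlyingroup=foo": settles after one transfer.
print(simulate({"A": lambda present, others: others <= foo}, {"f": {"B"}}))

# A uses a stand-in for "not present": never settles.
stable, _ = simulate({"A": lambda present, others: not present}, {"f": {"B"}})
print(stable)  # False
```
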
Since users can write preferred content expressions, this should be
targeted at being used by end users.

[[!tag projects/openneuro]]