76 lines
3.2 KiB
Text
76 lines
3.2 KiB
Text
|
Preferred content expressions can be complicated to write and reason about.
|
||
|
A complex expression can involve lots of repositories that can get into
|
||
|
different states, and needs to be written to avoid unwanted behavior.
|
||
|
|
||
|
It would be very handy to provide some way to prove things about behavior
|
||
|
of preferred content expressions, or a way to simulate the behavior of a
|
||
|
network of git-annex repositories with a given preferred content configuration
|
||
|
|
||
|
## motivating examples
|
||
|
|
||
|
For example, consider two reposities A and B. A is in group M and B is in
|
||
|
group N. A has preferred content `not inallgroup=N` and B has `not inallgroup=M`.
|
||
|
|
||
|
If A contains a file, then B will want to also get a copy. And things
|
||
|
stabilize there. But if the file is removed from A, then B also wants to
|
||
|
remove it. And once B has removed it, A wants a copy of it. And then B also
|
||
|
wants a copy of it. So the result is that the file got transferred twice,
|
||
|
to end up right back where we started.
|
||
|
|
||
|
The worst case of this is `not present`, where the file gets dropped and
|
||
|
transferred over and over again. The docs warn against using that one. But
|
||
|
they can't warn about every bad preferred content expression.
|
||
|
|
||
|
## balanced preferred content
|
||
|
|
||
|
When [[design/balanced_preferred_content]] is added, a whole new level of
|
||
|
complexity will exist in preferred content expressions, because now an
|
||
|
expression does not make a file be wanted by a single repository, but
|
||
|
shards the files amoung repositories in a group.
|
||
|
|
||
|
And up until this point preferred content expressions have behaved the same no
|
||
|
matter the sizes of the underlying repositories, but balanced preferred
|
||
|
content does take repository fullness into account, which further
|
||
|
complicates fully understanding the behavior.
|
||
|
|
||
|
Notice that `balanced()` (in the current design) is not stable when used
|
||
|
on its own, and has to be used as part of a larger expression to make it
|
||
|
stable, eg:
|
||
|
|
||
|
((balanced(backup) and not (copies=backup:1)) or present
|
||
|
|
||
|
So perhaps `balanced()` should include the other checks in it,
|
||
|
to avoid the user shooting themselves in the foot. On the other
|
||
|
hand, if `balanced()` implicitly contains `present`, then `not balanced()`
|
||
|
would include `not present`, which is bad!
|
||
|
|
||
|
(For that matter, what does `not balanced()` even do currently?)
|
||
|
|
||
|
## proof
|
||
|
|
||
|
What could be proved about a preferred content expression?
|
||
|
|
||
|
No idea really. Would be interesting to consider what formal methods can
|
||
|
do here. Could a SAT solver be used somehow for example?
|
||
|
|
||
|
## static analysis
|
||
|
|
||
|
Clearly `not present` is an problematic preferred content expression. It
|
||
|
would be good if git-annex warned and/or refused to set such an expression
|
||
|
if it could detect it. Similarly `not groupwanted` could be detected as a
|
||
|
problem when the group's preferred content expression contains `present`.
|
||
|
|
||
|
Is there is a more general purpose and not expensive way to detect such
|
||
|
problematic expressions, that can find problems such as the
|
||
|
`not inallgroup=N` example above?
|
||
|
|
||
|
## simulation
|
||
|
|
||
|
Simulation seems fairly straightforward, just simulate the network of
|
||
|
git-annex repositories with random files with different sizes and
|
||
|
metadata. Be sure to enforce invariants like numcopies the same as
|
||
|
git-annex does.
|
||
|
|
||
|
Since users can write preferred content expressions, this should be
|
||
|
targeted at being used by end users.
|