a612fe7299
Tagging with projects/openneuro as Christopher Markiewicz has oked them funding at least the initial design work on this.
77 lines
3.2 KiB
Markdown
77 lines
3.2 KiB
Markdown
Preferred content expressions can be complicated to write and reason about.
|
|
A complex expression can involve lots of repositories that can get into
|
|
different states, and needs to be written to avoid unwanted behavior.
|
|
|
|
It would be very handy to provide some way to prove things about behavior
|
|
of preferred content expressions, or a way to simulate the behavior of a
|
|
network of git-annex repositories with a given preferred content configuration
|
|
|
|
## motivating examples
|
|
|
|
For example, consider two reposities A and B. A is in group M and B is in
|
|
group N. A has preferred content `not inallgroup=N` and B has `not inallgroup=M`.
|
|
|
|
If A contains a file, then B will want to also get a copy. And things
|
|
stabilize there. But if the file is removed from A, then B also wants to
|
|
remove it. And once B has removed it, A wants a copy of it. And then B also
|
|
wants a copy of it. So the result is that the file got transferred twice,
|
|
to end up right back where we started.
|
|
|
|
The worst case of this is `not present`, where the file gets dropped and
|
|
transferred over and over again. The docs warn against using that one. But
|
|
they can't warn about every bad preferred content expression.
|
|
|
|
## balanced preferred content
|
|
|
|
When [[design/balanced_preferred_content]] is added, a whole new level of
|
|
complexity will exist in preferred content expressions, because now an
|
|
expression does not make a file be wanted by a single repository, but
|
|
shards the files amoung repositories in a group.
|
|
|
|
And up until this point preferred content expressions have behaved the same no
|
|
matter the sizes of the underlying repositories, but balanced preferred
|
|
content does take repository fullness into account, which further
|
|
complicates fully understanding the behavior.
|
|
|
|
Notice that `balanced()` (in the current design) is not stable when used
|
|
on its own, and has to be used as part of a larger expression to make it
|
|
stable, eg:
|
|
|
|
((balanced(backup) and not (copies=backup:1)) or present
|
|
|
|
So perhaps `balanced()` should include the other checks in it,
|
|
to avoid the user shooting themselves in the foot. On the other
|
|
hand, if `balanced()` implicitly contains `present`, then `not balanced()`
|
|
would include `not present`, which is bad!
|
|
|
|
(For that matter, what does `not balanced()` even do currently?)
|
|
|
|
## proof
|
|
|
|
What could be proved about a preferred content expression?
|
|
|
|
No idea really. Would be interesting to consider what formal methods can
|
|
do here. Could a SAT solver be used somehow for example?
|
|
|
|
## static analysis
|
|
|
|
Clearly `not present` is an problematic preferred content expression. It
|
|
would be good if git-annex warned and/or refused to set such an expression
|
|
if it could detect it. Similarly `not groupwanted` could be detected as a
|
|
problem when the group's preferred content expression contains `present`.
|
|
|
|
Is there is a more general purpose and not expensive way to detect such
|
|
problematic expressions, that can find problems such as the
|
|
`not inallgroup=N` example above?
|
|
|
|
## simulation
|
|
|
|
Simulation seems fairly straightforward, just simulate the network of
|
|
git-annex repositories with random files with different sizes and
|
|
metadata. Be sure to enforce invariants like numcopies the same as
|
|
git-annex does.
|
|
|
|
Since users can write preferred content expressions, this should be
|
|
targeted at being used by end users.
|
|
|
|
[[!tag projects/openneuro]]
|