[[!toc ]] ## motivating examples Preferred content expressions can be complicated to write and reason about. A complex expression can involve lots of repositories that can get into different states, and needs to be written to avoid unwanted behavior. It would be very handy to provide some way to prove things about behavior of preferred content expressions, or a way to simulate the behavior of a network of git-annex repositories with a given preferred content configuration For example, consider two reposities A and B. A is in group M and B is in group N. A has preferred content `not inallgroup=N` and B has `not inallgroup=M`. If A contains a file, then B will want to also get a copy. And things stabilize there. But if the file is removed from A, then B also wants to remove it. And once B has removed it, A wants a copy of it. And then B also wants a copy of it. So the result is that the file got transferred twice, to end up right back where we started. The worst case of this is `not present`, where the file gets dropped and transferred over and over again. The docs warn against using that one. But they can't warn about every bad preferred content expression. ## balanced preferred content When [[design/balanced_preferred_content]] is added, a whole new level of complexity will exist in preferred content expressions, because now an expression does not make a file be wanted by a single repository, but shards the files amoung repositories in a group. And up until this point preferred content expressions have behaved the same no matter the sizes of the underlying repositories, but balanced preferred content does take repository fullness into account, which further complicates fully understanding the behavior. Notice that `fullbalanced()` is not stable when used on its own, and so `balanced()` adds an "or present" to stabilize it. And so `not balanced()` includes `not present`, which is bad! ## proof What could be proved about a preferred content expression? No idea really. Would be interesting to consider what formal methods can do here. Could a SAT solver be used somehow for example? ## static analysis Clearly `not present` is an problematic preferred content expression. It would be good if git-annex warned and/or refused to set such an expression if it could detect it. Similarly `not groupwanted` could be detected as a problem when the group's preferred content expression contains `present`. Is there is a more general purpose and not expensive way to detect such problematic expressions, that can find problems such as the `not inallgroup=N` example above? ## simulation Simulation seems fairly straightforward, just simulate the network of git-annex repositories with random files with different sizes and metadata. Be sure to enforce invariants like numcopies the same as git-annex does. Since users can write preferred content expressions, this should be targeted at being used by end users. [[!tag projects/openneuro]]