2024-05-01 16:14:59 +00:00
|
|
|
[[!toc ]]
|
|
|
|
|
2024-05-01 16:18:14 +00:00
|
|
|
## motivating examples
|
|
|
|
|
2024-03-13 15:04:06 +00:00
|
|
|
Preferred content expressions can be complicated to write and reason about.
|
|
|
|
A complex expression can involve lots of repositories that can get into
|
|
|
|
different states, and needs to be written to avoid unwanted behavior.
|
|
|
|
|
|
|
|
It would be very handy to provide some way to prove things about behavior
|
|
|
|
of preferred content expressions, or a way to simulate the behavior of a
|
|
|
|
network of git-annex repositories with a given preferred content configuration
|
|
|
|
|
|
|
|
The worst case of this is `not present`, where the file gets dropped and
|
|
|
|
transferred over and over again. The docs warn against using that one. But
|
|
|
|
they can't warn about every bad preferred content expression.
|
|
|
|
|
2024-09-03 17:49:34 +00:00
|
|
|
Mostly, git-annex manages to keep things stable that seem like they would
|
|
|
|
not be. Consider repo A that is not in group foo, and B is in group foo. A
|
|
|
|
has preferred content "onlyingroup=foo". This will make A want a file that
|
|
|
|
is in B. And once it has it, it will not want to drop it. That's because
|
|
|
|
when dropping, it considers if it would be preferred content after the
|
|
|
|
drop. In this case it would, so it doesn't drop it.
|
|
|
|
|
2024-03-13 15:04:06 +00:00
|
|
|
## balanced preferred content
|
|
|
|
|
|
|
|
When [[design/balanced_preferred_content]] is added, a whole new level of
|
|
|
|
complexity will exist in preferred content expressions, because now an
|
|
|
|
expression does not make a file be wanted by a single repository, but
|
|
|
|
shards the files amoung repositories in a group.
|
|
|
|
|
|
|
|
And up until this point preferred content expressions have behaved the same no
|
|
|
|
matter the sizes of the underlying repositories, but balanced preferred
|
|
|
|
content does take repository fullness into account, which further
|
|
|
|
complicates fully understanding the behavior.
|
|
|
|
|
2024-09-03 17:49:34 +00:00
|
|
|
Notice that `fullybalanced()` is not stable when used
|
2024-09-03 15:52:54 +00:00
|
|
|
on its own, and so `balanced()` adds an "or present" to stabilize it.
|
|
|
|
And so `not balanced()` includes `not present`, which is bad!
|
2024-03-13 15:04:06 +00:00
|
|
|
|
|
|
|
## proof
|
|
|
|
|
|
|
|
What could be proved about a preferred content expression?
|
|
|
|
|
|
|
|
No idea really. Would be interesting to consider what formal methods can
|
|
|
|
do here. Could a SAT solver be used somehow for example?
|
|
|
|
|
|
|
|
## static analysis
|
|
|
|
|
|
|
|
Clearly `not present` is an problematic preferred content expression. It
|
|
|
|
would be good if git-annex warned and/or refused to set such an expression
|
|
|
|
if it could detect it. Similarly `not groupwanted` could be detected as a
|
|
|
|
problem when the group's preferred content expression contains `present`.
|
|
|
|
|
2024-09-03 17:49:34 +00:00
|
|
|
> This is now detected and such an unstable expression never matches.
|
|
|
|
> --debug explains why too.
|
|
|
|
>
|
|
|
|
> Note that the detection will not be trigged by `"not (not present)"`,
|
|
|
|
> but it will by `"include=* or (not present)"` even though that is always
|
|
|
|
> stable, because `"include=*"` always matches and so what it's ORed with
|
|
|
|
> doesn't matter. Probably noone will set something like that in real life
|
|
|
|
> though.
|
|
|
|
>
|
|
|
|
> It's problimatic to make `git-annex wanted` warn about it. Consider
|
|
|
|
> if in one repository, groupwanted is set to "present". In another
|
|
|
|
> repository, which is disconnected, wanted is set to "not groupwanted".
|
|
|
|
> Both operations are ok, but upon merging the two repositories,
|
|
|
|
> the combined effect is that "not present" has been set.
|
|
|
|
>
|
|
|
|
> So while it could warn sometimes on setting "not present",
|
|
|
|
> it would sometimes not be able to. Better to not warn inconsistently.
|
|
|
|
> --[[Joey]]
|
2024-03-13 15:04:06 +00:00
|
|
|
|
|
|
|
## simulation
|
|
|
|
|
|
|
|
Simulation seems fairly straightforward, just simulate the network of
|
|
|
|
git-annex repositories with random files with different sizes and
|
2024-09-03 17:49:34 +00:00
|
|
|
metadata. Or use the current files and metadata.
|
|
|
|
Be sure to enforce invariants like numcopies the same as git-annex does.
|
2024-03-13 15:04:06 +00:00
|
|
|
|
|
|
|
Since users can write preferred content expressions, this should be
|
|
|
|
targeted at being used by end users.
|
2024-05-01 15:04:20 +00:00
|
|
|
|
2024-09-03 18:23:48 +00:00
|
|
|
The sim could be run in a clone of a repository, and update location
|
|
|
|
logs as it runs. This would let the user query with `whereis` and
|
|
|
|
`maxsize` etc to see what happens.
|
|
|
|
|
|
|
|
Such a repository's location tracking would no longer match reality,
|
|
|
|
so it would need to be clearly marked as a simulation result, and be
|
|
|
|
prevented from merging back into another repository. This can be done by
|
|
|
|
adding a new Difference to the repository.
|
|
|
|
|
|
|
|
The sim would need a map of repositories with connections between them.
|
|
|
|
Perhaps `git-annex map` could be used?
|
|
|
|
|
|
|
|
For each step of the sim, it would pick a repository from the map
|
|
|
|
(excluding special remotes), and simulate an operation being run in that
|
|
|
|
repository, affecting it and its remotes.
|
|
|
|
|
|
|
|
Split brain needs to be simulated, so the operations in the sim should
|
|
|
|
include pushing and fetching the git-annex branch. The ref of each
|
|
|
|
git-annex branch of each repository would be stored, with
|
|
|
|
refs/heads/git-annex being set to the git-annex branch of the repository
|
|
|
|
it is currently simulating an operation in.
|
|
|
|
|
|
|
|
The other operations would include get, drop, copy, move, sync, all
|
|
|
|
with preferred content respected.
|
|
|
|
|
|
|
|
May want to also simulate adding files to a repository, which would be
|
|
|
|
generated (without any actual content) according to simulation parameters.
|
|
|
|
Also file moves and deletions. `git-annex fuzztest` has some prior art.
|
|
|
|
|
|
|
|
The location log history could be examined at the end of the simulation
|
|
|
|
to find problems like instability.
|
|
|
|
|
2024-05-01 15:04:20 +00:00
|
|
|
[[!tag projects/openneuro]]
|
2024-09-23 19:27:45 +00:00
|
|
|
|
|
|
|
> [[done]], see `git-annex sim` command. --[[Joey]]
|