Say we have 2 drives and want to fill them both evenly with files,
different files in each drive. Currently, preferred content cannot express
that entirely:
* Or, can let both repos take whatever files, perhaps at random, that the
  other repo is not known to contain, but then repos will race and both get
  the same file, or similarly if they are not communicating frequently.

Existing preferred content expressions, such as the one for the archive
group, have this problem.

So, let's add a new expression: `balanced(group)`

## implementation
This would work by taking the list of uuids of all repositories in the
group that have enough free space to store a key, and sorting them,
which yields a list of repositories numbered 0..M-1.

(To know if a repo has enough free space to store a key
will need [[todo/track_free_space_in_repos_via_git-annex_branch]]
to be implemented.)

To decide which repository wants key K, convert K to a number N in some
stable way and then `N mod M` yields the number of the repository that
wants it, while all the rest don't.

(Since git-annex keys can be pretty long and not all of them are random
hashes, let's md5sum the key and then use the md5 as a number.)
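The selection scheme above can be sketched in Python (a hypothetical illustration only; git-annex itself is written in Haskell, and the uuids and key used here are made up):

```python
import hashlib

def wanted_by(key, group_uuids):
    # Sort the uuids of the group members that have enough free space,
    # giving a stable numbering 0..M-1.
    members = sorted(group_uuids)
    # Convert key K to a number N in a stable way by md5summing it.
    n = int(hashlib.md5(key.encode()).hexdigest(), 16)
    # N mod M picks the one repository that wants this key.
    return members[n % len(members)]

uuids = ["aaaa-1111", "bbbb-2222"]  # hypothetical repo uuids
repo = wanted_by("SHA256E-s1048576--abc123.jpg", uuids)
```

Because the uuids are sorted first, the result does not depend on the order the group members are listed in, only on the group's membership.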
## stability

Note that this preferred content expression will not be stable. A change in
the members of the group will change which repository is selected. And
changes in how full repositories are will also change which repo is
selected.

Without stability, when another repo is added to the group, all data will
be rebalanced, with some moving to it. Which could be desirable in some
situations, but the problem is that it's likely that adding repo3 will make
repo1 and repo2 want to swap some files between them.

So, we'll want to add some precautions to avoid a lot of data moving around
in such a case:
    ((balanced(backup) and not (copies=backup:1)) or present)

So once a file lands on a backup drive, it stays there, even if more backup
drives change the balancing.
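A small simulation shows why this precaution matters. Using the md5-mod-M scheme sketched earlier (with made-up repo names), growing the group from 2 to 3 repos reassigns a sizable fraction of keys *between* the two original repos, not just onto the new one:

```python
import hashlib

def wanted_by(key, group_uuids):
    # Same hypothetical selection sketch as above: stable sort, md5, mod M.
    members = sorted(group_uuids)
    n = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return members[n % len(members)]

two = ["repo1", "repo2"]            # hypothetical group before
three = ["repo1", "repo2", "repo3"]  # hypothetical group after adding repo3
keys = [f"KEY--{i}" for i in range(1000)]

# Keys whose assignment changes, yet still lands on repo1 or repo2:
# these are pure swaps between the original members.
swapped = sum(1 for k in keys
              if wanted_by(k, two) != wanted_by(k, three)
              and wanted_by(k, three) in two)
```

Since the modulus changes from 2 to 3, roughly a third of keys end up swapped between repo1 and repo2, which is exactly the churn the `or present` guard prevents.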
## use case: 3 of 5

What if we have 5 backup repos and want each key to be stored in 3 of them?
There's a simple change that can support that:
`balanced(group:3)`

This works the same as before, but rather than just `N mod M`, take
`N+I mod M` where I is [0..2] to get the list of 3 repositories that want a
key.
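Extending the earlier hypothetical Python sketch to this multi-copy form (repo names made up):

```python
import hashlib

def wanted_by(key, group_uuids, copies=3):
    # As before, sort the group members for a stable 0..M-1 numbering.
    members = sorted(group_uuids)
    m = len(members)
    n = int(hashlib.md5(key.encode()).hexdigest(), 16)
    # N+I mod M for I in [0..copies-1] yields the repos that want the key.
    return [members[(n + i) % m] for i in range(copies)]

five = ["repo1", "repo2", "repo3", "repo4", "repo5"]  # hypothetical uuids
wants = wanted_by("SHA256E-s42--example", five)       # 3 repos that want this key
```

Since the offsets I are consecutive, the 3 selected repos are always distinct as long as the group has at least 3 members.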
However, once 3 of those 5 repos get full, new keys will only be able to be
stored on 2 of them. At that point one or more new repos will need to be
added to reach the goal of each key being stored in 3 of them. It would be
possible to rebalance the 3 full repos by moving some keys from them to the
other 2 repos, and eke out more storage before needing to add new
repositories. A separate rebalancing pass, that does not use preferred
content alone, could be implemented to handle this (see below).

## use case: geographically distinct datacenters
Of course this is not limited to backup drives. A more complicated example:
There are 4 geographically distributed datacenters, each of which has some
[...]
on some drive there.

This can be implemented by making a group for each datacenter, which all of
its drives are in, and using `balanced()` to pick the drive that holds the
copy of the file. The preferred content expression would be eg:
    ((balanced(datacenterA) and not (copies=datacenterA:1)) or present)

In such a situation, to avoid a `N^2` remote interconnect, there might be a
[...]
the place that `balanced()` picks for a group. Eg,
`balancedgroup=datacenterA` for 1 copy and `balancedgroup=group:datacenterA:2`
for N copies.
The [[design/passthrough_proxy]] idea is an alternate way to put a
repository in front of such a cluster, that does not need additional
extensions to preferred content.
## split brain situations

Of course, in the time after the git-annex branch was updated and before
it reaches the local repo, a repo can be full without us knowing about
it. Stores to it would fail, and perhaps be retried, until the updated
git-annex branch was synced.

In the worst case, a split brain situation can make the balanced preferred
content expression pick a different repository to hold two independent
stores of the same key. Eg, when one side thinks one repo is full,
and the other side thinks the other repo is full.

If `present` is used in the preferred content, both of them will then
want to contain it. (Is `present` really needed like shown in the examples
above?)

If it's not, one of them will drop it and the other will
usually maintain its copy. It would perhaps be possible for both of
them to drop it, leading to a re-upload cycle. This needs some research
to see if it's a real problem.
See [[todo/proving_preferred_content_behavior]].
## rebalancing

In both the 3 of 5 use case and a split brain situation, it's possible for
content to end up not optimally balanced between repositories. git-annex
can be made to operate in a mode where it does additional work to rebalance
repositories.

This can be an option like --rebalance, that changes how the preferred content
expression is evaluated. The user can choose where and when to run that.
Eg, it might be run on a node inside a cluster after adding more storage to
the cluster.

In several examples above, we have preferred content expressions in this
form:

    ((balanced(group:N) and not (copies=group:N)) or present)

In order to rebalance, that needs to be changed to:

    balanced(group:N)

What could be done is make `balanced()` usually expand to the former,
but when --rebalance is used, it only expands to the latter.

(Might make the fully balanced behavior available as `fullybalanced()` for
users who want it, then
`balanced() == ((fullybalanced(group:N) and not (copies=group:N)) or present)`
usually, and when --rebalance is used, `balanced() == fullybalanced(group:N)`.)
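That two-way expansion could be sketched as a simple rewrite (a hypothetical illustration of the idea, not git-annex's actual expression parser):

```python
def expand_balanced(group, n, rebalance=False):
    # Under --rebalance, balanced() expands to the bare fullybalanced form;
    # otherwise it adds the stability guard so settled content stays put.
    core = f"fullybalanced({group}:{n})"
    if rebalance:
        return core
    return f"(({core} and not (copies={group}:{n})) or present)"
```

The same stored expression thus behaves conservatively day to day, and only moves content around when the user explicitly opts in with --rebalance.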