This ADR documents the current implementation for Realm Picking in Kernel.
When an user starts the client and no realm is specified, then there are multiple criterias to pick realms, and there is no clear agreement on which of these criterias is more important.
There are different variables to take into consideration:
This is quite straightforward. The main issue is that those variables sometimes run against each other. For instance, what if the catalyst with the lowest latency is empty and there is a catalyst with a little more latency that has users? Or what happens if the catalyst that has the lowest latency is overloaded?
A simple solution would be to define a priority of these variables. But which is more important? The answer most of the time is it depends. For instance, if latency is too much (let's say, >3 seconds), it’d be the most important variable. But if you have two catalysts, one with 20 ms of latency and the other with 30 ms latency, maybe you can consider them equal and the amount of users would become the most important factor. Let’s say that there’s an event, and there are 3 catalysts that have at least 1000 people. In that case, given their latency can be considered equal, balancing the load would probably be the most important factor.
Taking into consideration the cases variables and cases listed above is how the algorithm works. Something that will try to “intelligently” decide which is the best realm, given all the conditions.
This algorithm considers the three variables listed above balancing between them in most of the cases.
The algorithm has some configurable parameters, in order to be able to tune it without deploying, and even making hot changes.
There are a couple of requirements that are nice to have for the algorithm, namely:
This algorithm also checks that the Catalysts are healthy to send traffic to check that they read the endpoint /about.
This algorithm is implemented as a Chain of Responsibility.
This means that it encapsulates the processing elements inside a "pipeline" abstraction; and have clients "launch and leave" their requests at the entrance to the pipeline. The pattern chains the receiving objects together, and then passes any request messages from object to object until it reaches an object capable of handling the message. The number and type of handler objects isn't known a priori, they can be configured dynamically. The chaining mechanism uses recursive composition to allow an unlimited number of handlers to be linked.
The request for the chain will be picking a realm from a list of candidates. Each “rule” of the algorithm can be a link of the chain. Rules are prioritized. If a rule can make a decision, then it does. If not, it delegates in the following rules.
Rules can share a context. This context could be used to avoid recalculating values if rules can reuse them. Rules have a name, which is used to log and report analytics on which rule was used to make a decision. The rules can be enabled or disabled through runtime configuration. Rules have parameters to tune their configuration.
Extending the algorithm should be as simple as adding a new rule and selecting the appropriate priority for it. In the same vein, changing or tuning an existing rule should be quite independent from other rules. Rules may be composed in the future, effectively turning the chain into a kind of decision tree.
Description: Is a Round Robin Mechanism to distribute all peers between all Catalysts.
Configuration:
export type LoadBalancingConfig = {
type: AlgorithmLinkTypes.LOAD_BALANCING
}
Description: Calculates the latency to all peer candidates, then filters out the ones with a latency greater than the threshold. If only one meets the condition, then it's returned. If not, then it delegates the decision to the next link from the filtered peers.
Pseudo Code:
export function largeLatencyLink() {
const sorted // all peers sorted by latency
const minElapsed = sorted[0]
picked = sorted.filter((it) => it.elapsed - minElapsed < largeLatencyThreshold)
if (picked.length === 1) {
context.selected = context.picked[0] // return a candidate only if one meets the threshold
}
return picked // delegate to next link only the filtered peers
}
Configuration:
export type LargeLatencyConfig = {
type: AlgorithmLinkTypes.LARGE_LATENCY
config?: { largeLatencyThreshold: number }
}
Description: Assigns to all peers a scored based on their amount of total users. Then
calculates the estimation of the size of the peer if assigning current user to it, and returns
the score deduced by latency, equivalent to users. This responds to the following formula:
m * (e ^ (x / d) - 1)
.
Pseudo Code:
export function allPeersScoreLink() {
const score = peers.forEach((peer) => calculateAllPeersScore(peer))
return selectFirstByScore(context, score, definitiveDecisionThreshold)
}
function linearUsersScore(usersCount: number) {
return baseScore + usersCount
}
export function calculateAllPeersScore(peer) {
if (peer.usersCount === 0) return 0 // Prefer realms that have users. Those will have at least baseScore
if (maxUsers) {
// Try to fill all realms until around the percentage provided
if (peer.usersCount >= fillTargetPercentage * max) {
// If this is the case, then it's "downward" phase of the score
// Calculate a segment joining the fillTargetPercentage% of users with baseScore at discourageFillTargetPercentage% maxUsers
// In that way, when reach discourageFillTargetPercentage% maxUsers, realms that have at least one user start to get prioritized
const segment = {
start: { x: fillTargetPercentage * max, y: linearUsersScore(fillTargetPercentage * max) },
end: { x: discourageFillTargetPercentage * max, y: baseScore }
}
const slope = (segment.end.y - segment.start.y) / (segment.end.x - segment.start.x)
// The score is the result of calculating the corresponding point of this segment at usersCount
return segment.start.y + slope * (count - segment.start.x)
}
}
return linearUsersScore(peer.usersCount)
}
Configuration:
export type AllPeersScoreConfig = {
type: AlgorithmLinkTypes.ALL_PEERS_SCORE
config?: {
baseScore?: number // Base score for any realm that has at least 1 user. Default: 40
fillTargetPercentage?: number // If the realm has maxUsers, the score will rise only until the target percentage of fullness represented by this value is reached
discourageFillTargetPercentage?: number // If the realm has maxUsers, the score will become baseScore when this percentage is reached
definitiveDecisionThreshold?: number // If the score difference between two candidates is greater than this value, this link makes a definitive decision. Otherwise, it is delegated to the next link
latencyDeductionsParameters?: LatencyDeductionsParameters
}
}
Description: Calculates the score acording the amount of users near the current parcel.
Pseudo Code:
export function closePeersScoreLink() {
const score = peers.forEach((peer) => closeUsersScore(peer))
return selectFirstByScore(context, score, definitiveDecisionThreshold)
}
export function closeUsersScore(peer) {
const parcels = usersParcels(peer)
if (parcels && parcels.length > 0) {
return baseScore + countParcelsCloseTo(currentParcel, parcels, closePeersDistance)
} else return 0
}
Configuration:
export type ClosePeersScoreConfig = {
type: AlgorithmLinkTypes.CLOSE_PEERS_SCORE
config?: {
closePeersDistance?: number // Distance in parcels to which a peer is considered close, so it can count for the score.
baseScore?: number
definitiveDecisionThreshold?: number // If the score difference between two candidates is greater than this value, the link makes a definitive decision. Otherwise, it delegates to the next link
latencyDeductionsParameters?: LatencyDeductionsConfig
}
}
export type LatencyDeductionsConfig = Partial<LatencyDeductionsParameters>
/**
* Score deduced by latency, equivalent to users. This responds to the following formula: m * (e ^ (x / d) - 1)
* Where m is the multiplier, e is Euler's number, x is the latency and d is the exponencialDivisor.
* See here for a visualization of the formula: https://www.desmos.com/calculator/zflj2ik6pl
* By default, these values are 60 for the multiplier, and 700 for the divisor, resulting, for example, in the following values:
*
* | latency | deduction |
* | ------- | --------- |
* | 500 | 62 |
* | 750 | 115 |
* | 1000 | 190 |
* | 1250 | 300 |
* | 1500 | 451 |
* | 1750 | 670 |
* | 2000 | 984 |
*
* If a maxDeduction is provided, then no more than that number of users will be deduced from the score.
*/
export type LatencyDeductionsParameters = {
multiplier: number
exponentialDivisor: number
maxDeduction: number
}
To Configure which algorithms to use, then a feature flag
explorer-pick_realm_algorithm_config
is used. Currently it has two
configurations:
[
{
"type": "ALL_PEERS_SCORE"
},
{
"type": "CLOSE_PEERS_SCORE"
}
]
[
{
"type": "LARGE_LATENCY"
},
{
"type": "LOAD_BALANCING"
},
{
"type": "CLOSE_PEERS_SCORE"
},
{
"type": "ALL_PEERS_SCORE"
}
]
function pickCandidate(candidates: Candidate[], userParcel: Parcel) {
// candidates = all DAO Peers
if (candidates.length === 0) throw new Error("Cannot pick candidates from an empty list")
for (const link of chain) {
// Default: chain = ["ALL_PEERS_SCORE", "CLOSE_PEERS_SCORE"]
context = link.pick(context) // Execute the current Link
if (context.selected) return context.selected // If a link picks a particular candidate, it is returned
}
return candidates[0] // If all the links have gone through, and there is not a clear candidate, then pick the first
}
The main benefit is having an algorithm that can be tuned. In the long run, it could take advantage of the resources available to enable a nice user experience for those users who don’t pick a particular realm.
Another option is also to leave the choice of the realm to the user, making them pick a realm from a list of options or suggestions when they haven’t selected one.