Given the number of new features we have, we need a way to measure the impact that each one has on the use of our products. A/B testing is what we use to decide between two or more versions of a new feature and/or to compare it with the current one. This ADR describes how we measure A/B tests in our products.
To set up an A/B test we use Feature Flags as the assignment tool and Segment as the measurement tool.
Using this standard, an A/B test is defined as a feature flag whose name ends with `_variant`, with each of the possible options defined as a variant and the activation defined as your application requires.
Please note that if you need to test a new variant against the current one, it is preferable to use a variant with two options (`enabled` and `disabled`) instead of relying on the activation alone; otherwise you won't be able to differentiate (and measure) whether your A/B test is active or the current version should simply be shown.
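As an illustration, here is a minimal sketch of how the three states differ on the client side; `getVariant` is a hypothetical helper standing in for whatever your feature flag client exposes:

```js
// Hypothetical helper: in a real application this would come from your feature
// flag client and return the variant for a flag, or undefined when the flag is inactive.
function getVariant(flagName) {
  return { name: "enabled" } // or { name: "disabled" }, or undefined
}

const variant = getVariant("test-feature_name_variant")

if (variant === undefined) {
  console.log("flag inactive: current version, user is NOT part of the test")
} else if (variant.name === "enabled") {
  console.log("test active: new version, user is measured")
} else {
  console.log("test active: current version, user is measured as the control group")
}
```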
Just create a new feature flag whose name ends with `_variant`.
You can define your activation as your project needs, but in most cases you will need one of these two options:

- `applicationHostname`
- `gradualRolloutSessionId`
`applicationHostname` allows you to keep your test active on any testing environment and to control when it is active on production.
If you also add `gradualRolloutSessionId`, you can roll out your test gradually.
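As an illustration only (the exact parameter names are an assumption and depend on your feature flag tool), an activation combining both strategies could look roughly like this:

```js
// Illustrative shape only; check your feature flag tool for the exact parameters.
const activationStrategies = [
  {
    name: "applicationHostname",
    // keep the test active on the listed testing environments
    parameters: { hostNames: "staging.example.com,qa.example.com" }
  },
  {
    name: "gradualRolloutSessionId",
    // roll the test out to a percentage of production sessions
    parameters: { percentage: "50", groupId: "test-feature_name_variant" }
  }
]
```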
In the `VARIANT` tab you can define as many cases as you need and control the percentage of users that will get each variant.
Please note that if you define an activation with a rollout and `enabled`/`disabled` variants, you will split your users as follows:

- assuming a total of `1,000` users
- activation rollout (50%): `500` users
  - `enabled` variant (50%): `250` users
  - `disabled` variant (50%): `250` users

Out of a total of `1,000` users, only half of them will count as testing users and half of those will see the `enabled` option; that means `750` users will see the old version but only `250` of them will be considered in the result of the A/B test. If your objective is to test `500` users with `enabled` and `500` users with `disabled`, remove the rollout configuration.
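The same split expressed as a quick calculation:

```js
const totalUsers = 1000

// 50% activation rollout: only these users enter the A/B test at all.
const usersInTest = totalUsers * 0.5 // 500

// The variants then split the test users again.
const enabledUsers = usersInTest * 0.5 // 250: see the new version
const disabledUsers = usersInTest * 0.5 // 250: control group, see the old version

// Users outside the rollout also see the old version, but are not measured.
const usersSeeingOldVersion = totalUsers - enabledUsers // 750
const usersMeasured = enabledUsers + disabledUsers // 500
```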
Once your A/B test is ready you should get something like this:
```js
{
  "flags": {
    "test-flag": true, // not an A/B test
    "test-feature_name_variant": true
  },
  "variants": {
    "test-feature_name_variant": {
      "name": "enabled" // or "disabled"
    }
  }
}
```
If instead your A/B test is inactive you should get something like this:
```js
{
  "flags": {
    "test-flag": true // not an A/B test
  },
  "variants": {}
}
```
Sometimes you need to include additional data to know how to handle each variant. This is useful because you can develop the variant implementation detached from its final value (which can be content that is not ready yet). You can then develop and deploy your A/B test, and the other teams in charge of producing the content will be able to test it without any further interaction with your code.
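For example (an illustrative shape only; the exact payload format depends on your feature flag tool), a variant could carry a JSON payload alongside its name:

```js
{
  "variants": {
    "test-feature_name_variant": {
      "name": "enabled",
      // Hypothetical payload: content read by the variant implementation at runtime,
      // so it can be produced and updated without touching the application code.
      "payload": {
        "type": "json",
        "value": "{ \"title\": \"New onboarding title\" }"
      }
    }
  }
}
```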
To measure whether any of the variants of an A/B test is significant we need to know three things:

- the total number of users in the test
- which variant each user gets
- the success metric (and its time frame)
To know the total number of users we send a `feature_flags` event each time a user gets a new set of feature flags, but to know which variant each user gets we need to format our feature flags as the following array:
```js
;[`${FEATURE_FLAG_NAME}`, `${FEATURE_FLAG_NAME}:${FEATURE_FLAG_VARIANT_NAME}`]
```
For example, the previous feature flags will be formatted as:
```js
[
  "test-flag", // not an A/B test
  "test-feature_name_variant",
  "test-feature_name_variant:enabled"
  // or "test-feature_name_variant:disabled"
]
```
And send it in the `featureFlags` prop; this way we can easily search by feature flag and/or variant.
```js
analytics.track(`feature_flags`, {
  featureFlags: [
    "test-flag", // not an A/B test
    "test-feature_name_variant",
    "test-feature_name_variant:enabled"
  ]
})
```
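A minimal sketch of how this could be automated, assuming the flags/variants response shape shown above and the same `analytics` client (the `formatFeatureFlags` helper is hypothetical):

```js
// Build the featureFlags array from a flags/variants response like the one above.
function formatFeatureFlags({ flags, variants }) {
  const featureFlags = []

  for (const [flagName, isActive] of Object.entries(flags)) {
    if (!isActive) continue

    featureFlags.push(flagName)

    const variant = variants[flagName]
    if (variant) {
      featureFlags.push(`${flagName}:${variant.name}`)
    }
  }

  return featureFlags
}

analytics.track(`feature_flags`, {
  featureFlags: formatFeatureFlags({
    flags: {
      "test-flag": true, // not an A/B test
      "test-feature_name_variant": true
    },
    variants: {
      "test-feature_name_variant": { name: "enabled" }
    }
  })
})
// featureFlags: ["test-flag", "test-feature_name_variant", "test-feature_name_variant:enabled"]
```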
Finally, you need to define your success metric (and time frame). This way the data team (or any other team) will be able to generate a dashboard to measure the success (or failure) of your test.