A/B testing is a crucial method for optimizing websites and applications, allowing businesses to compare two versions of a webpage or app to determine which performs better. Understanding the key metrics and terminology involved in A/B testing is essential for interpreting results accurately. In this article, we examine important A/B testing metrics and terminology, including p-value, confidence interval, one-sided and two-sided tests, z-score, observed power, variant, control group, incremental revenue, conversion rate, Bayesian calculation, and frequentist statistics.

Key A/B Testing Metrics and Terminology

1. Variant

A variant refers to one of the versions being tested in an A/B test. Typically, the existing version is called the control, and the new version is the variant.

Example: In an A/B test of a landing page, Version A (the current page) is the control, and Version B (the new design) is the variant.

2. Control Group

The control group is the group of users exposed to the original version (control) in an A/B test. It serves as a baseline to compare the performance of the variant.

Example: If 10,000 users visit a website, 5,000 might see the control page (control group), and 5,000 might see the variant page.

 

[Figure: incremental revenue compared with baseline revenue]
Source: https://getrecast.com/incrementality/

 

3. Incremental Revenue

Incremental revenue refers to the additional revenue generated as a result of changes made during an A/B test. It helps in assessing the financial impact of the test.

Example: If the variant page increases the average order value by $5 and 1,000 additional purchases are made, the incremental revenue is $5,000.

 

[Figure: conversion rate formula]

 

4. Conversion Rate

Conversion rate is the percentage of users who complete a desired action, such as making a purchase or signing up for a newsletter, out of the total number of visitors.

Example: If 100 out of 1,000 visitors make a purchase, the conversion rate is 10%.
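The formula is simple enough to express directly in code; the numbers below mirror the example above:

    # Conversion rate = conversions / visitors, expressed as a percentage.
    def conversion_rate(conversions: int, visitors: int) -> float:
        return 100 * conversions / visitors

    print(conversion_rate(100, 1_000))  # -> 10.0, matching the example above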

5. P-Value

The p-value is the probability of seeing a difference at least as large as the one observed, assuming there is actually no difference between the variations (the null hypothesis). A lower p-value (typically less than 0.05) indicates that the observed difference is statistically significant.

Example: Suppose an A/B test compares two versions of a landing page. Version A has a conversion rate of 5%, and Version B has a conversion rate of 7%. If the p-value is 0.03, a difference this large would arise only 3% of the time if the two versions truly performed the same, so the difference is considered statistically significant at the 0.05 level.
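To make this concrete, here is a minimal sketch of a two-proportion z-test in Python. The 1,300 visitors per version is an assumption (the example above does not state traffic volumes), chosen so the computed p-value lands near 0.03:

    # A minimal two-proportion z-test sketch; sample sizes are assumed.
    from scipy.stats import norm

    conv_a, n_a = 65, 1_300    # Version A: 5% conversion rate
    conv_b, n_b = 91, 1_300    # Version B: 7% conversion rate

    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                  # pooled rate under "no difference"
    se = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_b - p_a) / se
    p_value = 2 * norm.sf(abs(z))                             # two-sided p-value

    print(f"z = {z:.2f}, p-value = {p_value:.3f}")            # roughly z = 2.1, p = 0.03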

 

[Figure: confidence interval formula]

 

6. Confidence Interval

The confidence interval provides a range within which the true effect size is expected to lie, with a certain level of confidence (usually 95%). It helps assess the reliability of the test results.

Example: In the same A/B test, the 95% confidence interval for the difference in conversion rates might be [1%, 3%]. This means that we are 95% confident that the true difference in conversion rates lies between 1% and 3%.
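The interval for a difference in conversion rates can be computed from the same ingredients as the z-test. The sketch below reuses the assumed counts from the p-value example; the exact interval it prints is illustrative and will not match the [1%, 3%] quoted above:

    # 95% confidence interval for the difference in conversion rates (assumed counts).
    from scipy.stats import norm

    conv_a, n_a = 65, 1_300    # Version A: 5%
    conv_b, n_b = 91, 1_300    # Version B: 7%

    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # The unpooled standard error is conventional for the interval (pooling is for the test).
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z_crit = norm.ppf(0.975)                                  # about 1.96 for a 95% interval
    low, high = diff - z_crit * se, diff + z_crit * se

    print(f"difference = {diff:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")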

7. One-Sided and Two-Sided Tests

A one-sided test assesses the direction of the effect (e.g., whether Version B is better than Version A), while a two-sided test assesses whether there is any difference in either direction.

One-Sided Test Example: Tests if Version B's conversion rate is higher than Version A's.
Two-Sided Test Example: Tests if there is any difference between the conversion rates of Version A and Version B, regardless of direction.
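In practice the distinction shows up in the p-value: for the same z statistic, a one-sided test reports half the two-sided p-value when the effect is in the hypothesized direction. A small sketch, using the z value from the earlier hypothetical landing-page test:

    # The same z statistic yields different p-values depending on the alternative hypothesis.
    from scipy.stats import norm

    z = 2.147  # z statistic from the hypothetical landing-page test above

    p_two_sided = 2 * norm.sf(abs(z))   # any difference, in either direction
    p_one_sided = norm.sf(z)            # only "Version B is better than Version A"

    print(f"two-sided p = {p_two_sided:.3f}, one-sided p = {p_one_sided:.3f}")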

 


 

8. Z-Score

The z-score measures how many standard deviations an element is from the mean. In A/B testing, it is used to determine the significance of the observed difference between two variations. Common confidence levels and their z-score equivalents:

  • Confidence level 90%
    • Two-Sided Z-Score: 1.64
    • One-Sided Z-Score: 1.28
  • Confidence level 95%
    • Two-Sided Z-Score: 1.96
    • One-Sided Z-Score: 1.65
  • Confidence level 99%
    • Two-Sided Z-Score: 2.58
    • One-Sided Z-Score: 2.33

Example: If the z-score for the difference in conversion rates between Version A and Version B is 2.5, the observed difference is 2.5 standard errors away from zero (the value expected if there were no real difference), suggesting a statistically significant result.
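These critical values are not arbitrary; they come from the inverse CDF of the standard normal distribution. A short sketch that reproduces the list above (to rounding, since 1.645 rounds to either 1.64 or 1.65):

    # Critical z values for the confidence levels listed above.
    from scipy.stats import norm

    for conf in (0.90, 0.95, 0.99):
        two_sided = norm.ppf(1 - (1 - conf) / 2)   # split alpha across both tails
        one_sided = norm.ppf(conf)                 # all of alpha in one tail
        print(f"{conf:.0%}: two-sided z = {two_sided:.2f}, one-sided z = {one_sided:.2f}")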

9. Observed Power

Observed (post-hoc) power is the statistical power estimated after the test, treating the observed effect size as if it were the true effect: the probability that the test would correctly reject the null hypothesis if a difference of that size really existed. Higher observed power indicates a higher likelihood of detecting a true difference.

Example: In an A/B test with an observed power of 0.8 (80%), there is an 80% chance of detecting a true difference between the variations if one exists.
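Power depends on the assumed true effect size, the sample size, and the significance level. Below is a minimal sketch of a normal-approximation power calculation for a two-proportion test; the 5% vs. 7% rates and the 1,300 users per arm are assumptions carried over from the earlier examples, and with these inputs the estimate comes out well below 80%:

    # Normal-approximation power for a two-proportion z-test (assumed inputs).
    from scipy.stats import norm

    p_a, p_b, n = 0.05, 0.07, 1_300
    alpha = 0.05

    p_bar = (p_a + p_b) / 2
    se_null = (2 * p_bar * (1 - p_bar) / n) ** 0.5                      # SE if the rates were equal
    se_alt = (p_a * (1 - p_a) / n + p_b * (1 - p_b) / n) ** 0.5         # SE under the assumed true rates
    z_crit = norm.ppf(1 - alpha / 2)                                    # two-sided test at 5%

    power = norm.sf((z_crit * se_null - (p_b - p_a)) / se_alt)
    print(f"power = {power:.2f}")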

 

[Figure: Bayes' theorem]
Source: https://www.freecodecamp.org/news/bayes-rule-explained/

 

10. Bayesian Calculation

Bayesian calculation involves using Bayes' theorem to update the probability estimate for a hypothesis as additional evidence is acquired. In A/B testing, it provides a probabilistic framework to make decisions based on the data.

Example: Using Bayesian methods, you can determine the probability that one variant is better than the control given the observed data, rather than relying solely on traditional p-values.
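One common Bayesian approach for conversion rates is a Beta-Binomial model: each version's conversion rate gets a Beta posterior, and sampling from the two posteriors estimates the probability that the variant beats the control. A minimal sketch, reusing the assumed counts from the earlier examples:

    # Beta-Binomial posteriors and a Monte Carlo estimate of P(variant beats control).
    import numpy as np

    rng = np.random.default_rng(42)

    conv_a, n_a = 65, 1_300    # control: 5% conversion rate (assumed counts)
    conv_b, n_b = 91, 1_300    # variant: 7% conversion rate (assumed counts)

    # Uniform Beta(1, 1) prior updated with observed conversions and non-conversions.
    post_a = rng.beta(1 + conv_a, 1 + n_a - conv_a, size=100_000)
    post_b = rng.beta(1 + conv_b, 1 + n_b - conv_b, size=100_000)

    prob_b_better = (post_b > post_a).mean()
    print(f"P(variant beats control) = {prob_b_better:.3f}")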

 

[Figure: frequentist vs. Bayesian interpretations of probability]
Source: https://thepalindrome.org/p/is-probability-frequentist-or-bayesian

 

11. Frequentist Statistics

Frequentist statistics is a traditional approach to hypothesis testing that focuses on the long-run frequency or proportion of data. It treats the data from the test as the only evidence and does not incorporate prior knowledge in the form of prior probability distributions.

Example: In a frequentist approach to A/B testing, you would use p-values and confidence intervals to determine the significance of the test results, without incorporating prior probabilities.

Practical Examples

Example 1: Email Campaign A/B Test

A company wants to test two email subject lines to see which one results in higher open rates.

  • Subject Line A: 25% open rate
  • Subject Line B: 28% open rate
  • P-Value: 0.02 (indicating a significant difference)
  • Confidence Interval: [2%, 5%] (95% confidence that the true difference in open rates is between 2% and 5%)
  • Z-Score: 2.33 (suggesting a statistically significant difference)
  • Observed Power: 0.85 (85% chance of detecting a true difference)

Example 2: Website Landing Page A/B Test

An e-commerce website tests two landing page designs to determine which leads to more purchases (a short sketch after the list shows how figures like these can be computed).

  • Design A: 4% conversion rate
  • Design B: 5% conversion rate
  • P-Value: 0.045 (indicating a significant difference)
  • Confidence Interval: [0.5%, 1.5%] (95% confidence that the true difference in conversion rates is between 0.5% and 1.5%)
  • Z-Score: 2.01 (suggesting a statistically significant difference)
  • Observed Power: 0.78 (78% chance of detecting a true difference)
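To show how figures like these are assembled, here is a compact sketch that computes the z-score, two-sided p-value, and 95% confidence interval from raw counts. The 3,500 users per design is an assumption (the example does not state sample sizes), so the printed values, while close to Example 2's z-score and p-value, are purely illustrative:

    # Assemble z-score, two-sided p-value, and a 95% CI for the lift from raw counts.
    from scipy.stats import norm

    def summarize_ab_test(conv_a: int, n_a: int, conv_b: int, n_b: int, conf: float = 0.95):
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se_test = (p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b)) ** 0.5
        z = (p_b - p_a) / se_test
        p_value = 2 * norm.sf(abs(z))
        se_ci = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
        z_crit = norm.ppf(1 - (1 - conf) / 2)
        ci = (p_b - p_a - z_crit * se_ci, p_b - p_a + z_crit * se_ci)
        return z, p_value, ci

    # Example 2 rates (4% vs. 5%) with an assumed 3,500 users per design.
    z, p, ci = summarize_ab_test(140, 3_500, 175, 3_500)
    print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = [{ci[0]:.3%}, {ci[1]:.3%}]")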

A/B testing is a powerful tool for optimizing digital experiences, and understanding its key metrics and terminology is crucial for accurate interpretation. Switas knows how to conduct effective A/B tests, helping businesses make data-driven decisions to enhance their performance, and provides reliable, actionable insights that drive growth and success.