How do I measure the statistical significance of test results in an A/B test?
Solution
Even though measuring statistical significance in an A/B test is very important, there is no single, universally agreed rule for determining the significance threshold.
There are no hard and fast 'dos' here, but there is definitely a list of 'don'ts'.
What drives our needed sample size?
There are a few concerns that drive the sample size required for a meaningful A/B test:
1) We want to be reasonably sure that we don’t have a false positive—that there is no real difference, but we detect one anyway. Statisticians call this Type I error.
2) We want to be reasonably sure that we don’t miss a positive outcome (or get a false negative). This is called Type II error.
3) We want to know whether a variation is better, worse or the same as the original. Why do we care about the difference between 'worse' and 'the same'? I probably won’t switch from the original if the variation performs worse, but I might still switch even if it’s the same—for a design or aesthetic preference, for example.
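In standard notation (my addition, not from the quoted post), those first two concerns are usually capped at conventional levels: the Type I error rate at a 5% significance level, and the Type II error rate at 20%, i.e. 80% power.

```latex
% Conventional error-rate notation (added for reference, not from the quoted post)
P(\text{declare a difference} \mid \text{no real difference}) \le \alpha = 0.05 \quad \text{(Type I error)}
P(\text{miss a real difference} \mid \text{real difference}) \le \beta = 0.20 \quad \text{(Type II error)}
\text{power} = 1 - \beta = 0.80
```

These are the same alpha and power values plugged into the sample-size sketch further down.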
What not to do
There are a few “gotchas” that are worth watching out for when you start thinking about the statistical significance of A/B tests:
1) Don’t trust your A/B testing tool’s generic advice that “about 100 conversions are usually required for significance”. Your baseline conversion rate and desired sensitivity determine the required sample size, and A/B testing tools are biased toward telling you that you have significant results as quickly as possible.
2) Don’t continuously test for significance as your sample grows, or blindly keep the test running until you reach statistical significance. Evan Miller wrote a great explanation of why you shouldn’t do this, but briefly: every time you peek and check for significance, you give random noise another chance to cross your significance threshold, so your real false-positive rate ends up well above the 5% you think you’re running at (see the simulation sketch just after this list).
3) Don’t rely on a rule of thumb like "16 times your standard deviation squared divided by your sensitivity squared". The same goes for the charts you see on some websites that don’t make their assumptions clear. These are better than a rule of thumb like "100 conversions", but the math isn’t hard enough to be worth skipping, and working through it will give you an understanding of what drives the required sample size.
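To make the second "don’t" concrete, here is a small simulation of my own (not from the linked post). Both variations convert at exactly 10%, so any "significant" result is a false positive; peeking after every batch of visitors and stopping at the first significant reading produces far more false positives than a single test at a predetermined sample size.

```python
# Simulation sketch (my own, not from the linked post): why "peeking" inflates
# false positives. Both arms convert at exactly 10%, so every significant
# result is a false positive. We compare stopping at the first significant
# peek with testing once at the planned sample size.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
true_rate, alpha = 0.10, 0.05
z_crit = norm.ppf(1 - alpha / 2)           # two-sided critical value (~1.96)
batch, n_batches, n_sims = 500, 20, 2000   # peek after every 500 visitors per arm

def significant(a, b):
    """Two-proportion z-test with equal sample sizes in each arm."""
    n = len(a)
    pooled = (a.sum() + b.sum()) / (2 * n)
    se = np.sqrt(2 * pooled * (1 - pooled) / n)
    return se > 0 and abs(a.mean() - b.mean()) / se > z_crit

peeking_fp = fixed_fp = 0
for _ in range(n_sims):
    a = rng.binomial(1, true_rate, batch * n_batches)
    b = rng.binomial(1, true_rate, batch * n_batches)
    peeking_fp += any(significant(a[:i * batch], b[:i * batch])
                      for i in range(1, n_batches + 1))
    fixed_fp += significant(a, b)          # one test, at the full sample size

print(f"stop at the first significant peek: {peeking_fp / n_sims:.0%} false positives")
print(f"single test at the planned size:    {fixed_fp / n_sims:.0%} false positives")
```

In runs like this, the peeking procedure typically flags a false positive several times more often than the nominal 5% level, which is exactly the trap Evan Miller describes.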
How to calculate your needed sample size
Instead of continuously testing or relying on generic rules of thumb, you can calculate the needed sample size and statistical significance very easily. For simplicity, I’ve assumed you’re doing an A vs. B test (two variations), but the same approach can be scaled to tests with more variations.
1) Specify the outcome you’re trying to measure. We typically measure conversion to signup as the primary measure, but depending on what you’re testing, it might be button clicks, newsletter signups, etc. In almost every case, you’ll be measuring a proportion—e.g., the portion of landing page visitors who complete signup, or the portion of landing page visitors who sign up for a newsletter.
2) Decide how substantial a difference you’d like to detect – this is the sensitivity of the test. I generally target an A/B test that will have a statistically meaningful sample size that detects a 10% difference in conversion rate (e.g., to detect 11% vs. 10% conversion rate). This is a somewhat arbitrary decision you’ll have to make—testing a reasonably large difference helps ensure you don’t spend forever optimizing in a local minimum, but instead move on to test potentially bigger changes. Jesse Farmer has a great article on balancing speed vs. certainty in A/B testing.
3) Calculate the required sample size based on your baseline conversion rate and your desired sensitivity. Since we’re dealing with proportions, we want to perform a simple statistical analysis called a “power analysis for two independent proportions”. Let’s break this down:
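The full breakdown is in the linked post; as a rough sketch of the same kind of calculation (my own code, not the post’s exact walkthrough), the standard sample-size formula for comparing two independent proportions looks like this, assuming a two-sided test, equal traffic per variation, alpha = 0.05 and 80% power:

```python
# Sketch of a power analysis for two independent proportions (my own code).
# Assumptions: two-sided z-test, equal traffic to each variation,
# alpha = 0.05 (Type I error) and power = 0.80 (i.e. beta = 0.20).
from scipy.stats import norm

def visitors_per_variation(p1, p2, alpha=0.05, power=0.80):
    """Approximate visitors needed in each variation to detect p1 vs. p2."""
    z_alpha = norm.ppf(1 - alpha / 2)     # critical value for the Type I error rate
    z_beta = norm.ppf(power)              # critical value for the desired power
    p_bar = (p1 + p2) / 2                 # pooled proportion under the null hypothesis
    n = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
         + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2 / (p1 - p2) ** 2
    return n

# Detecting a 10% relative lift on a 10% baseline (10% vs. 11% conversion):
print(round(visitors_per_variation(0.10, 0.11)))   # roughly 14,750 per variation
```

With a 10% baseline and a 10% relative lift, that works out to roughly 14,750 visitors per variation (around 1,500 conversions each), which is a long way from the "100 conversions" rule of thumb mentioned above.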
This extract was taken from the following site:
https://signalvnoise.com/posts/3004-ab-testing-tech-note-determining-sample-size
Hope this helps to some extent.

