Beginner's Guide to Hypothesis Testing
What is hypothesis testing?
Assume that we have a coffee shop. Since we always look for ways to improve the business, we decided to offer free wifi to our customers. We believe customers spend more time in the shop since the introduction of this change. The data we collected also shows an increase in the mean time spent by customers. But wait, how can we make sure that this is not due to pure random chance?
This is where hypothesis tests are used. They help us make sure that we are not giving in to random effects. We state a null and an alternate hypothesis to test the effect. After collecting data on the time spent by customers before and after the change, we can use one of the available statistical methods to verify the effect. The most appropriate one for this problem is the Z-test. The P-value is also very useful: it is the probability of seeing a result at least as extreme as ours if the null hypothesis were true, that is, by pure random chance. The lower the P-value, the less plausible it is that chance alone produced the result. Whether we reject the null hypothesis depends on the significance level we set. Here we use the test statistic instead of the P-value. Let's solve our cafe problem using hypothesis testing.
Significance Level:
The significance level is the probability of rejecting the null hypothesis when it is actually true. In other words, it is the level of risk we are willing to take of seeing an effect that is not there. The most common significance level is 0.05, but 0.01 is prevalent in the pharma industry.
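To make the idea of "risk" concrete, here is a small simulation sketch. It assumes a made-up situation in which the null hypothesis really is true (the mean of 20 minutes and standard deviation of 5 minutes are invented for illustration) and counts how often a two-tailed Z-test would still reject it. The function name false_positive_rate and all its parameters are hypothetical.

```python
import random
from statistics import NormalDist, mean

def false_positive_rate(alpha=0.05, n=30, trials=2000, seed=0):
    """Draw many samples for which the null hypothesis is true and return
    the fraction of them that a two-tailed Z-test rejects anyway."""
    rng = random.Random(seed)
    mu, sigma = 20.0, 5.0  # invented true mean and known std dev (minutes)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (mean(sample) - mu) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = false_positive_rate()
print(f"Rejected a true null hypothesis in {rate:.1%} of simulated samples")
```

The printed fraction should land close to the chosen significance level, which is exactly what the definition says: with a level of 0.05 we accept being wrong in roughly 1 out of 20 cases where nothing has changed.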
Formulate the hypothesis:
H0 = The mean time spent by customers is unchanged (Null Hypothesis)
Ha = The mean time spent by customers has changed (Alternate Hypothesis)
Z-Score:
After formulating the hypotheses and getting our hands on the data, we can start calculating the Z-score. The formula for the Z-score is
z = (x-mean)/(standard deviation/sqrt(n))
x = The mean that we found from our data after introducing the change.
mean = Calculated mean before introducing the change.
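standard deviation = Population standard deviation, which the Z-test assumes is known. With a large sample, the standard deviation of the data collected after the change is often used as an estimate.
n = Number of observations in the sample.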
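The formula above translates almost directly into code. The sketch below uses invented numbers (a mean of 30 minutes before the change, 33 minutes after, a standard deviation of 12 minutes, and 100 customers); they are not real cafe data, and the function name z_statistic is just an illustrative choice.

```python
import math

def z_statistic(sample_mean, baseline_mean, std_dev, n):
    """z = (x - mean) / (standard deviation / sqrt(n))"""
    standard_error = std_dev / math.sqrt(n)
    return (sample_mean - baseline_mean) / standard_error

# Hypothetical numbers for illustration only
z = z_statistic(sample_mean=33.0, baseline_mean=30.0, std_dev=12.0, n=100)
print(f"z = {z:.2f}")  # prints "z = 2.50"
```

Note how the sample size enters through the standard error: with the same 3-minute difference but only 25 customers, the Z-score would drop to 1.25.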
After finding the Z-test statistic, we compare it with the critical Z-value for the significance level we have chosen, which is read from the standard normal (Z) distribution. For a significance level of 0.05, the critical value is 1.96 for a two-tailed test. If the absolute value of the Z-test statistic is less than 1.96, we fail to reject the null hypothesis, which means we do not have enough evidence that introducing wifi changed the time customers spend in the shop. If the absolute value is greater than 1.96, we reject the null hypothesis and the result is statistically significant.
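The decision rule can be sketched with Python's standard library, which provides the standard normal distribution through statistics.NormalDist. The function name two_tailed_z_decision and the example Z-score of 2.5 are assumptions made for illustration. The sketch also computes the matching two-tailed P-value to show that both routes lead to the same decision.

```python
from statistics import NormalDist

def two_tailed_z_decision(z, alpha=0.05):
    """Return the critical value, the two-tailed P-value, and whether
    the null hypothesis is rejected at significance level alpha."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject = abs(z) > z_crit
    return z_crit, p_value, reject

# Hypothetical Z-score, not computed from real data
z_crit, p_value, reject = two_tailed_z_decision(z=2.5)
print(f"critical value = {z_crit:.2f}, P-value = {p_value:.4f}, reject H0: {reject}")
```

Comparing the absolute value of the Z-score with the critical value is equivalent to comparing the P-value with the significance level, so either check can be used.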
Assumptions to Check:
There are four main assumptions we make in this test.
- The data are continuous.
- The data follow a normal probability distribution.
- The population standard deviation is known.
- The samples are randomly selected.
If our data violate any of these assumptions, we cannot use the Z-test.
If you would like to know more about hypothesis testing, I suggest taking a look at 'Design and Analysis of Experiments' by Douglas Montgomery.