Welcome to the second lesson on Mastering Hypothesis Testing with R! Our focus today is on the Mann-Whitney U test. Having worked with t-tests previously, we now turn to the Mann-Whitney U test, a valuable tool when data do not meet the t-test's normality assumption. In this lesson, we'll unpack the nuances of the Mann-Whitney U test by applying it to a realistic dataset using R's wilcox.test() function.
We'll begin with non-parametric tests. They're also known as distribution-free tests because they do not assume that the data follow a particular distribution, such as the normal distribution. We turn to them when our data are skewed, ordinal, or contain outliers. Ordinal data is a type of data in which the order of the values matters, but the size of the difference between them does not. For example, the order in which runners finish a race matters, but the exact time gap between each runner does not necessarily matter.
The Mann-Whitney U test is used to compare two independent groups when the dependent variable is either ordinal or continuous but does not follow a normal distribution. The test ranks the values from both groups together and then sums the ranks within each group. If the rank sums are similar relative to the group sizes, the two groups do not differ significantly.
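The ranking step can be sketched by hand in a few lines of R. This is a minimal illustration rather than how wilcox.test() is implemented internally; the two vectors group_a and group_b are made-up values chosen only to keep the arithmetic easy to follow.

```r
# Hypothetical samples (made up for illustration; no ties)
group_a <- c(3.1, 4.7, 5.0, 6.2)
group_b <- c(5.8, 7.4, 8.9, 9.3)

# Rank all values together, smallest value gets rank 1
combined <- c(group_a, group_b)
ranks <- rank(combined)

# Sum the ranks that belong to each group
idx_a <- seq_along(group_a)
rank_sum_a <- sum(ranks[idx_a])
rank_sum_b <- sum(ranks[-idx_a])

# Convert each rank sum to a U statistic: U = R - n(n + 1) / 2
n_a <- length(group_a)
n_b <- length(group_b)
u_a <- rank_sum_a - n_a * (n_a + 1) / 2
u_b <- rank_sum_b - n_b * (n_b + 1) / 2

print(c(rank_sum_a = rank_sum_a, rank_sum_b = rank_sum_b))
print(c(u_a = u_a, u_b = u_b))
```

The two U values always add up to n_a * n_b, so one of them fully determines the other. For two independent samples, the statistic that wilcox.test(group_a, group_b) reports as W equals u_a.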
The Mann-Whitney U test yields two values: the U-statistic and the p-value. The U-statistic is derived from a group's rank sum. Across all pairs of observations with one value from each group, it counts how often a value from one group exceeds a value from the other. If the groups are similar, U lies near half the number of such pairs (n1 × n2 / 2); the further U falls from that midpoint, in either direction, the greater the separation between the two groups. The p-value is interpreted as in the t-test: if it is less than 0.05, the difference is statistically significant at the 5% level, meaning a difference this large would be unlikely if both groups came from the same distribution.
To perform the U test, we use R's wilcox.test() function. This function takes two data samples as inputs and reports a test statistic (W) and a p-value. For two independent samples, W is the U-statistic of the first sample.
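A basic call might look like the sketch below. The vectors group1 and group2 are hypothetical values invented for this illustration, not data from the lesson; the statistic and p-value are read from the returned object's statistic and p.value fields.

```r
# Hypothetical samples (made up for illustration)
group1 <- c(1.1, 2.3, 3.8, 4.5, 5.2)
group2 <- c(6.4, 7.9, 8.1, 9.6, 10.3)

# Two-sided Mann-Whitney U test; exact = FALSE requests the
# normal approximation for the p-value
result <- wilcox.test(group1, group2, exact = FALSE)

# Extract the test statistic (W) and the p-value
w_stat <- unname(result$statistic)
p_val <- result$p.value

cat("W =", w_stat, "\n")
cat("p-value =", p_val, "\n")
```

Every value in group1 is smaller than every value in group2 here, so W is 0, the most extreme value it can take.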
If the p-value is less than 0.05, we reject the null hypothesis that the two groups come from the same distribution.
The exact = FALSE argument tells wilcox.test() not to compute an exact p-value. Exact computation can become computationally intensive for larger samples, and it is not available when the data contain ties. With exact = FALSE, the function instead approximates the p-value using a normal approximation to the distribution of the test statistic (with a continuity correction by default), which is more efficient for larger datasets. If exact is left unspecified, R computes an exact p-value only when each sample contains fewer than 50 values and there are no ties.
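To see the effect of this argument, the following sketch runs the test both ways on one small, tie-free sample. The data are hypothetical and chosen only so that the exact p-value is easy to verify by hand.

```r
# Hypothetical samples with no ties (made up for illustration)
group1 <- c(1.1, 2.3, 3.8, 4.5, 5.2)
group2 <- c(6.4, 7.9, 8.1, 9.6, 10.3)

# Exact p-value from the exact distribution of the statistic
exact_res <- wilcox.test(group1, group2, exact = TRUE)

# Approximate p-value from the normal approximation
approx_res <- wilcox.test(group1, group2, exact = FALSE)

# With complete separation, only 2 of the choose(10, 5) equally likely
# group assignments are this extreme (one in each direction)
by_hand <- 2 / choose(10, 5)

cat("exact p-value:      ", exact_res$p.value, "\n")
cat("approximate p-value:", approx_res$p.value, "\n")
cat("by hand:            ", by_hand, "\n")
```

With samples this small the two p-values differ noticeably; the approximation generally improves as the sample sizes grow.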
To illustrate the Mann-Whitney U test with a realistic example, let's assume that we have information about the time users from two regions spent interacting on a website. The goal is to determine whether there is a significant difference in user behaviour between the two regions.
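One way this scenario could look in code is sketched here. The interaction times are invented for illustration (assumed to be minutes per session), and the names region_a and region_b are hypothetical; substitute your own measurements.

```r
# Hypothetical interaction times in minutes (made up for illustration)
region_a <- c(12.5, 8.3, 13.0, 9.8, 11.2, 14.0, 7.6, 10.4)
region_b <- c(11.9, 9.1, 13.7, 8.8, 12.1, 10.9, 14.6, 7.9)

# Two-sided Mann-Whitney U test with the normal approximation
result <- wilcox.test(region_a, region_b, exact = FALSE)

cat("W =", unname(result$statistic), "\n")
cat("p-value =", result$p.value, "\n")

# Compare the p-value with the 0.05 significance level
alpha <- 0.05
if (result$p.value < alpha) {
  cat("Reject the null hypothesis: the regions differ.\n")
} else {
  cat("Fail to reject the null hypothesis.\n")
}
```

The two samples overlap heavily, so W lands close to its midpoint of n1 × n2 / 2 = 32.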
Because the p-value is not below 0.05, we fail to reject the null hypothesis: the data do not provide evidence of a significant difference between the two regions.
Great job! You've now grasped the fundamentals of the Mann-Whitney U test and how to use R to perform it. You're equipped to work with datasets that don't follow a normal distribution. Ready for the practice session? It will help reinforce your understanding and provide hands-on experience. Remember, practice is essential for mastering new techniques. Enjoy your learning journey!
