Welcome to lesson two on Hypothesis Testing with Python! Today's focus is on the Mann-Whitney U test. We've previously discussed T-tests, but now we're shifting attention to the Mann-Whitney U test, used to compare two independent samples. This tool helps when data doesn't meet the normality assumption for T-tests. In this lesson, we explore the ins and outs of the Mann-Whitney U test, applying it to a real-life dataset using Scipy
.
Let's start with non-parametric tests. These are also called distribution-free tests; they work with data that doesn't fit a normal distribution. We use them when our data is skewed, ordinal, or has outliers. Here, ordinal means a type of data where the order of the data points matters, but not the difference between the data points. For example, the sequence in which participants finish matters in a race, but the exact time difference between each runner does not.
The Mann-Whitney U test compares two independent groups when the dependent variable is ordinal or continuous but doesn't follow a normal distribution. It ranks the values from the two groups and adds the ranks. Similar sums of ranks indicate the two groups are not significantly different.
The Mann-Whitney U test provides two values: the U-statistic
and the p-value
. The U-statistic
signifies the rank sum difference between two groups regarding their observed data values. In everyday language, the larger the U-statistic
, the more separation or difference between the two groups' data. The p_value
here means the same as in the T-test: If the p-value
is less than 0.05, we conclude that the difference is statistically significant and not due to randomness.
The mannwhitneyu()
function from Scipy
runs the U test, taking two data samples as input and returning a test statistic (U) and a p-value (p). Here's what the code looks like:
If the p-value is less than 0.05, we consider it evidence against the null hypothesis, leading to its rejection.
Let's try the Mann-Whitney U test on some actual data. Suppose we have information about time spent on a website by users from two regions. The mission is to determine if there's a significant difference between these user types.
The p-value is not below 0.05, it means there isn't a considerable difference.
Great job! You now understand the Mann-Whitney U test and how to implement it with Python's Scipy library
. You can now work with datasets that don't follow a normal distribution. Ready for the practice session? It will help cement what you've learned and give you hands-on experience. Remember, practice is vital for mastering these methods. Enjoy the learning journey!
