Hello, friend! Today's topic is Filtering Data. It's about focusing on the data that matters to us. We'll use pandas, a Python library, to help us with this.
The goal? Master data filtering in pandas. By the end, you'll be able to pick the necessary data from a big data set.
Filtering data in pandas is like finding your favorite outfit in a wardrobe. The easiest way to filter data is by the values in a column. Let's illustrate this using a DataFrame of students' details.
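Here is a minimal sketch of such a filter. The student names and ages are made up for illustration; only the grade_level column comes from the lesson.

```python
import pandas as pd

# Hypothetical student records; only grade_level is named in the lesson.
students = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana", "Eli"],
    "age": [12, 13, 12, 14, 12],
    "grade_level": [6, 7, 6, 8, 7],
})

# Keep only the rows where grade_level equals 7.
grade_seven = students[students["grade_level"] == 7]
print(grade_seven)
```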
The code above creates a DataFrame and selects only the rows where the grade_level is 7. Now, you have the data of the 7th-grade students. Note that it works much like NumPy's boolean selection. Let's recall how it works under the hood.
One of the magic tricks of pandas is Boolean masking. Boolean is a True or False data type. "Mask" means to hide. A Boolean mask keeps the rows where its value is True and hides the rows where it is False.
We can create a Boolean Series, a list of True or False values, in pandas and use it for filtering.
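A small sketch of the idea, assuming the same made-up students table as before (the names, ages, and the variable name mask are illustrative choices):

```python
import pandas as pd

# Hypothetical student records; only grade_level is named in the lesson.
students = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana", "Eli"],
    "age": [12, 13, 12, 14, 12],
    "grade_level": [6, 7, 6, 8, 7],
})

# Comparing a column to a value gives one True/False entry per row.
mask = students["grade_level"] == 7
print(mask)

# Passing the mask in square brackets keeps only the True rows.
seventh_graders = students[mask]
print(seventh_graders)
```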
This code creates a Boolean Series checking where the grade_level is 7. Then, it filters the data using this Series.
Note that only students with True in the Boolean Series were selected.
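Sometimes we need to filter data using multiple conditions. pandas lets us combine Boolean Series element by element with the operators And (&), Or (|), and Not (~). Python's own and, or, and not keywords don't work on a whole Series, and each condition needs its own parentheses because & and | bind more tightly than comparisons such as ==. Let's check them out.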
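The sketch below tries each operator on the same made-up students table. The age column and the thresholds are assumptions chosen only to make the results easy to follow.

```python
import pandas as pd

# Hypothetical student records; only grade_level is named in the lesson.
students = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana", "Eli"],
    "age": [12, 13, 12, 14, 12],
    "grade_level": [6, 7, 6, 8, 7],
})

# And: both conditions must be True for a row to be kept.
both = students[(students["grade_level"] == 7) & (students["age"] >= 13)]

# Or: at least one condition must be True.
either = students[(students["grade_level"] == 7) | (students["age"] >= 14)]

# Not: flips every True to False and every False to True.
not_seventh = students[~(students["grade_level"] == 7)]

print(both)
print(either)
print(not_seventh)
```

Leaving out the parentheses around a condition usually raises an error or gives a surprising result, so it is a good habit to always write them.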
The isin() method in pandas is another wonderful tool. It checks whether each value in a pandas Series appears in a list of values and returns a Boolean Series we can use as a mask.
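For example, here is a short sketch that picks the students in grade 6 or grade 8. The table is the same made-up one used for illustration, and the list of grades is an arbitrary choice.

```python
import pandas as pd

# Hypothetical student records; only grade_level is named in the lesson.
students = pd.DataFrame({
    "name": ["Alice", "Bob", "Charlie", "Dana", "Eli"],
    "age": [12, 13, 12, 14, 12],
    "grade_level": [6, 7, 6, 8, 7],
})

# True where grade_level is one of the listed values.
in_six_or_eight = students["grade_level"].isin([6, 8])

# Use the result as a mask, just like any other Boolean Series.
selected = students[in_six_or_eight]
print(selected)
```

This is the same as writing two == checks joined by |, but it stays short even when the list of values grows.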
Fantastic! Now you know advanced data filtering techniques.
This lesson covered basic to advanced data filtering, including Boolean masking and multiple conditions in filtering. Keep practicing these skills on different datasets. Remember, practice makes perfect. Stay tuned for the next lesson!
