Welcome, future data analyzers! Today, we're tackling Index Columns and Locating Elements in a Pandas DataFrame. We'll learn how to handle index columns, locate specific data, and strengthen our understanding of DataFrames. Ready, set, code!
In a Pandas DataFrame, an index is assigned to each row, much like the numbers on books in a library. When a DataFrame is created, Pandas establishes a default index. Let's refer to an example:
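A minimal sketch of such a DataFrame might look like the following. The column names and values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this illustration
df = pd.DataFrame({
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 33, 40],
    "City": ["New York", "Los Angeles", "Berlin", "London"],
})

# Printing shows the default index (0, 1, 2, 3) in the leftmost position
print(df)
```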
The numbers on the left of the printed DataFrame, starting at 0, are the default index.
Occasionally, we might need to establish a custom index. The Pandas set_index() function allows us to set a custom index. To reset the index to its default state, we use reset_index().
To better understand these functions, let's consider an example in which we create an index using unique IDs:
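One way this could look is sketched below, assuming a hypothetical ID column with made-up values; only set_index() and its inplace parameter come from the lesson itself.

```python
import pandas as pd

# Hypothetical sample data with a unique ID per row
df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 33, 40],
})

# Use the ID column as the index, modifying df directly
df.set_index("ID", inplace=True)
print(df)
```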
In this example, the ID column is displayed as the index. Let's reset the index to return to the original state:
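A short sketch of the reset, again using the same hypothetical ID, Name, and Age data:

```python
import pandas as pd

# Hypothetical sample data, indexed by ID
df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 33, 40],
})
df.set_index("ID", inplace=True)

# Restore the default 0..n-1 index; ID becomes a regular column again
df.reset_index(inplace=True)
print(df)
```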
By setting the inplace parameter to True, we ask pandas to reset the index in the original df DataFrame. Otherwise, pandas returns a new DataFrame with a reset index, leaving the original df untouched.
Let's consider a DataFrame with a custom index. If you want to select a specific row based on its index value (for example, ID = 102), you can pass that value to .loc:
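As a sketch, assuming the same hypothetical ID-indexed data as before, a single-label lookup could be written like this:

```python
import pandas as pd

# Hypothetical sample data, indexed by ID
df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 33, 40],
}).set_index("ID")

# Select the row whose index label is 102; a single label returns a Series
row = df.loc[102]
print(row)
```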
For multiple rows, simply use a list of IDs:
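For instance, with the hypothetical ID-indexed data, selecting two rows might look like this:

```python
import pandas as pd

# Hypothetical sample data, indexed by ID
df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 33, 40],
}).set_index("ID")

# A list of labels returns a DataFrame containing just those rows
subset = df.loc[[102, 104]]
print(subset)
```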
As you can see, the output of the .loc operation is a subset of the original DataFrame.
To select specific multiple columns for these rows, you can provide the column labels as well:
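A possible sketch, assuming hypothetical Name, Age, and City columns, passes the row labels first and the column labels second:

```python
import pandas as pd

# Hypothetical sample data, indexed by ID
df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 33, 40],
    "City": ["New York", "Los Angeles", "Berlin", "London"],
}).set_index("ID")

# Rows 102 and 104, restricted to the Name and Age columns
subset = df.loc[[102, 104], ["Name", "Age"]]
print(subset)
```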
You can also select all rows for specific columns by providing : in place of the index labels:
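With the same hypothetical data, that could be sketched as:

```python
import pandas as pd

# Hypothetical sample data, indexed by ID
df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 33, 40],
    "City": ["New York", "Los Angeles", "Berlin", "London"],
}).set_index("ID")

# The bare colon keeps every row; only the listed columns are returned
subset = df.loc[:, ["Name", "Age"]]
print(subset)
```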
The iloc indexer enables us to select elements in a DataFrame based on their integer positions. iloc works like loc, but it expects zero-based row positions rather than index labels. For example, we can select the 3rd row, which sits at position 2:
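A small sketch with the hypothetical ID-indexed data shows that the position, not the label, decides which row comes back:

```python
import pandas as pd

# Hypothetical sample data, indexed by ID
df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 33, 40],
}).set_index("ID")

# Positions are zero-based, so position 2 is the third row (label 103)
row = df.iloc[2]
print(row)
```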
You can also use slicing here:
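As a sketch on the same hypothetical data, a positional slice follows ordinary Python rules, so the end position is excluded:

```python
import pandas as pd

# Hypothetical sample data, indexed by ID
df = pd.DataFrame({
    "ID": [101, 102, 103, 104],
    "Name": ["John", "Anna", "Peter", "Linda"],
    "Age": [28, 24, 33, 40],
}).set_index("ID")

# Rows at positions 1 and 2; position 3 is not included
subset = df.iloc[1:3]
print(subset)
```

Note that label-based slicing with loc behaves differently: it includes both endpoints.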
That's it! We've covered the index column, how to set it, and how to locate data in a DataFrame. Exciting exercises are up next. Let's practice and strengthen the skills you've learned today. Let the fun begin!
