Introduction

Hello! Today we're diving into Indexing and Selecting Data in pandas, a crucial part of data manipulation and analysis. Indexing helps us locate data in specific rows, while selecting focuses on picking specific columns or cells.

We'll walk you through some hands-on examples of indexing and selecting data with pandas. Let's begin!

Understanding Indexing: Setting Index

In pandas, an index is more or less the address of your data. By default, pandas assigns integer labels to the rows, but we can set any column as the index. This effectively turns it into an identifier for the rows.

Here's a basic example using pandas DataFrame's set_index(), reset_index(), and rename() methods:
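A minimal sketch of set_index(), assuming a small hypothetical DataFrame (the names, ages, and cities are made up for illustration); reset_index() and rename() are shown in their own sections below. Note that set_index() returns a new DataFrame rather than changing the original.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
data = {
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Paris", "London"],
}
df = pd.DataFrame(data)

# By default, pandas labels the rows with integers 0, 1, 2
print(df.index.tolist())  # [0, 1, 2]

# Make the "Name" column the row index; a new DataFrame is returned
df_named = df.set_index("Name")
print(df_named)
```

After the call, "Name" is no longer a regular column: its values now label the rows.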

You access data through the index with the loc[] indexer for label-based indexing and the iloc[] indexer for integer-based indexing; we'll look at both later.

Many pandas DataFrame methods accept an inplace parameter. If inplace is set to True, the method modifies the target DataFrame directly and returns None. Otherwise (the default, inplace=False), the target DataFrame is left untouched, and the method returns a modified copy.
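The difference between the two modes can be seen in a short sketch, assuming a hypothetical two-row DataFrame and using set_index() as the example method:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# inplace=False (the default): df is unchanged, a modified copy is returned
result = df.set_index("Name")
print(df.columns.tolist())    # ['Name', 'Age']  (df still has the Name column)
print(result.index.tolist())  # ['Alice', 'Bob']

# inplace=True: df itself is modified and the method returns None
returned = df.set_index("Name", inplace=True)
print(returned)               # None
print(df.index.tolist())      # ['Alice', 'Bob']
```

A common mistake is writing df = df.set_index("Name", inplace=True), which replaces df with None.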

However, note that the inplace parameter is expected to be removed in pandas 3.0, so you will need to reassign the result instead:
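A minimal sketch of the reassignment style, assuming the same kind of hypothetical Name/Age DataFrame: instead of passing inplace=True, assign the returned DataFrame back to the same variable.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Instead of df.set_index("Name", inplace=True), reassign the returned DataFrame
df = df.set_index("Name")
print(df.index.tolist())  # ['Alice', 'Bob']
```

A side benefit of this style is that calls can be chained, for example df.reset_index().rename(columns={...}).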

Understanding Indexing: Resetting Index

To reset the index back to the default integer labels, use the reset_index() method:
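Here is a short sketch, assuming a hypothetical DataFrame indexed by "Name". By default, reset_index() turns the old index back into a regular column; the drop=True option discards it instead.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]}
).set_index("Name")

# reset_index() moves the current index back into a regular column
# and restores the default 0, 1, 2, ... integer labels
df_reset = df.reset_index()
print(df_reset.columns.tolist())    # ['Name', 'Age']
print(df_reset.index.tolist())      # [0, 1, 2]

# With drop=True, the old index is discarded instead of becoming a column
df_dropped = df.reset_index(drop=True)
print(df_dropped.columns.tolist())  # ['Age']
```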

Understanding Indexing: Renaming Index

Here, renaming the index simply means renaming the column it comes from, which is done with the rename() method:
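A minimal sketch, assuming a hypothetical Name/Age DataFrame where "Name" is renamed to "Full Name" before it becomes the index. The rename_axis() call at the end is an alternative not mentioned in the lesson; it renames the index of a DataFrame whose index is already set.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30]})

# Rename the column (old name -> new name), then use it as the index
df = df.rename(columns={"Name": "Full Name"}).set_index("Full Name")
print(df.index.name)   # Full Name

# Alternative (not from the lesson): rename an existing index with rename_axis()
df2 = df.rename_axis("Person")
print(df2.index.name)  # Person
```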

Here, we provide a dictionary where the key is the old name, and the value is the new name.

Selecting Data Using Labels and Location

pandas provides two indexers, loc[] and iloc[], for accessing data in a DataFrame with array-like syntax: loc[] selects by labels, and iloc[] selects by integer positions.

Let's understand this with an example:
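The sketch below assumes a hypothetical DataFrame indexed by "Name" and selects the same data twice: once with labels via loc[], and once with positions via iloc[].

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Paris", "London"],
    }
).set_index("Name")

# loc[]: select by labels (index labels and column names)
print(df.loc["Bob", "Age"])                           # 30
print(df.loc[["Alice", "Charlie"], "City"].tolist())  # ['New York', 'London']

# iloc[]: select by integer positions (row position, column position)
print(df.iloc[1, 0])                                  # 30 (row 1 is Bob, column 0 is Age)
print(df.iloc[[0, 2], 1].tolist())                    # ['New York', 'London']
```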

Note that we set the "Name" column as the index. With loc, we use labels (the names in the index and the column names) to select the required data. With iloc, we use integer positions for both rows and columns, much like indexing a 2D NumPy array.
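One practical difference shows up with slices: a loc[] slice includes its end label, while an iloc[] slice excludes its end position, as NumPy arrays and Python lists do. A short sketch, assuming the same hypothetical DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Alice", "Bob", "Charlie"],
        "Age": [25, 30, 35],
        "City": ["New York", "Paris", "London"],
    }
).set_index("Name")

# loc slices include the end label
print(df.loc["Alice":"Bob"].index.tolist())          # ['Alice', 'Bob']

# iloc slices exclude the end position
print(df.iloc[0:1].index.tolist())                   # ['Alice']

# Both indexers accept row and column slices together
print(df.loc["Bob":"Charlie", "Age":"City"].shape)   # (2, 2)
print(df.iloc[1:3, 0:2].shape)                       # (2, 2)
```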

Lesson Summary and Practice

Congrats on completing this lesson! You've learned how to index and select data in pandas, including the set_index(), reset_index(), and rename() methods and the loc[] and iloc[] indexers.

Next up are some practice exercises. These exercises will help solidify what you've learned in this lesson. It's crucial to practice when learning new programming skills.

In the next lesson, we will dive deeper into pandas and cover more useful features. Stay tuned!
