Before we can analyze data, we need to load it. Data often lives in files on your computer. Pandas makes it easy to read these files and turn them into the DataFrames we learned about in the first unit.
Engagement Message
What's the first step you need to take before you can start exploring a dataset?
The most common format for storing tabular data is a CSV file (Comma-Separated Values). It's a simple text file where each line is a row, and commas separate the values.
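To make the row-and-comma layout concrete, here is a minimal sketch in plain Python. The sample text and the column names are made up for illustration, and the naive split on commas is only an assumption that works for simple files (it would break on values that themselves contain commas inside quotes).

```python
# Hypothetical contents of a small CSV file (made-up sample data)
csv_text = "name,department,salary\nAda,Engineering,120000\nGrace,Research,115000\n"

# Each line of the file is one row
lines = csv_text.splitlines()

# Commas separate the values within a row.
# Note: this naive split does not handle quoted values that contain commas.
header = lines[0].split(",")
rows = [line.split(",") for line in lines[1:]]

print(header)
print(rows)
```

The first line usually holds the column names, and every following line holds one record.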
To read a CSV, we use the Pandas function read_csv().
Engagement Message
Let's see how it works, shall we?
We usually import Pandas with the nickname pd. The full command to load a CSV file named employees.csv into a DataFrame called df would be:
import pandas as pd
df = pd.read_csv('employees.csv')
That one call to read_csv() reads the file and returns its contents as a DataFrame!
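If you want to try this without an existing file, the sketch below first writes a tiny employees.csv to a temporary folder and then loads it. The sample rows and column names are invented for illustration; only pd.read_csv() comes from the lesson itself.

```python
import os
import tempfile

import pandas as pd

# Made-up sample data standing in for a real employees.csv
csv_text = "name,department,salary\nAda,Engineering,120000\nGrace,Research,115000\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "employees.csv")
    with open(path, "w", encoding="utf-8") as f:
        f.write(csv_text)

    # By default, the first line of the file becomes the column names
    df = pd.read_csv(path)

# Quick checks that the data loaded as expected
print(df.shape)             # (number of rows, number of columns)
print(df.columns.tolist())  # column names taken from the header line
print(df.head())            # first few rows
```

Checking the shape and the first few rows right after loading is a quick way to confirm the file was read the way you intended.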
Engagement Message
Similarly, how would you load a students.csv file?
Another common format is an Excel file (.xlsx). Pandas has you covered with a similar function: pd.read_excel().
df_excel = pd.read_excel('report.xlsx')
