Welcome to our session on creating new columns in Pandas. Today, we'll build on our data handling skills by learning how to add new columns to a DataFrame. This ability is crucial for data cleaning and manipulation, since it lets us derive new fields from our existing data.
By the end of this session, you'll be adept at adding new columns with static values, generating new columns through operations with existing columns, and creating new columns based on specific conditions.
Creating new columns is key for data analysis. Consider a DataFrame of prices and quantities of goods sold. We might want to get the total sales, which is price * quantity.
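A minimal sketch of this calculation, assuming a small made-up DataFrame (the product names, prices, and quantities are illustrative only):

```python
import pandas as pd

# Hypothetical sample data: names and numbers are made up for illustration.
df = pd.DataFrame({
    "Product": ["Apple", "Banana", "Cherry"],
    "Price": [3, 1, 4],
    "Quantity": [10, 25, 5],
})

# Multiplying two columns works element-wise, row by row.
# Assigning the result to a new column name adds that column.
df["Total"] = df["Price"] * df["Quantity"]

print(df)
```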
Here, we create a new "Total" column by assigning the product of the two existing columns to it. For DataFrames, this works much like adding a new key to a dictionary: it's that easy!
Adding a new column with a static value is quite simple. For example, we can add a Location column for a group of employees who all work in the same location.
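As a rough sketch, assuming a made-up employee table (the names, departments, and the "New York" value are hypothetical), assigning a single value fills every row of the new column:

```python
import pandas as pd

# Hypothetical employee data, made up for illustration.
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carol"],
    "Department": ["Sales", "IT", "HR"],
})

# Assigning a single (scalar) value repeats it for every row.
df["Location"] = "New York"

print(df)
```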
We can create new columns based on conditions from the values of the existing columns. For example, if we have a DataFrame of student scores, we can create a column that flags whether the student's score is above 40.
Here's how we can do this:
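A minimal sketch, assuming a small made-up DataFrame of students (the names and scores are hypothetical) and using NumPy's np.where:

```python
import numpy as np
import pandas as pd

# Hypothetical student scores, made up for illustration.
df = pd.DataFrame({
    "Student": ["Ann", "Ben", "Cleo"],
    "Score": [35, 40, 72],
})

# np.where(condition, value_if_true, value_if_false) is evaluated per row.
# A score of exactly 40 is not above 40, so it is marked "Fail".
df["Status"] = np.where(df["Score"] > 40, "Pass", "Fail")

print(df)
```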
The np.where function takes three arguments: a condition, a value to use where the condition is true, and a value to use where the condition is false. In this example, the condition is df["Score"] > 40. For each row where this condition is true, the new column "Status" will have the value "Pass"; otherwise, it will have the value "Fail".
So far, we've covered how to create new columns in a DataFrame with static values, through operations with existing columns, and based on conditions. The more you practice, the better your understanding will get. Looking forward to our exercise session!
