Welcome back! In the previous lesson, you learned how to prepare training data from user session logs. Now, we are ready to take the next step: training a machine learning model that can predict which tracks a user is likely to enjoy. This is called predicting track affinity.
Track affinity is a measure of how much a user might like a particular track. By predicting this, we can recommend songs that users are more likely to enjoy, making the music player smarter and more personalized. In this lesson, you will learn how to train a simple but effective model to make these predictions.
Before training our model, let’s quickly review what kind of data we’re working with.
The function `prepare_training_data()` returns:
- A feature matrix `X`, where each row is the combination of a user profile vector and a track embedding.
- A label vector `y`, where:
  - `1` means the user listened to (liked) the track.
  - `0` means the user did not listen to it (a negative sample).

Each feature vector may look like this:
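As a concrete illustration, suppose (hypothetically) the user profile and track embedding each have four dimensions; a single row of `X` is then just their concatenation:

```python
import numpy as np

# Hypothetical vectors -- the real dimensions depend on your data preparation.
user_profile = np.array([0.2, 0.8, 0.1, 0.5])     # the user's taste vector
track_embedding = np.array([0.3, 0.7, 0.4, 0.9])  # the track's content vector

# One row of the feature matrix X: user features followed by track features.
feature_vector = np.concatenate([user_profile, track_embedding])
print(feature_vector)  # [0.2 0.8 0.1 0.5 0.3 0.7 0.4 0.9]
```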
This structure is important: our model will try to learn a function like f(user, track) → probability of liking.
In this project, we use logistic regression because our goal is a binary classification:
- `1` → The user is likely to enjoy the track (positive sample).
- `0` → The user is unlikely to enjoy the track (negative sample).
Logistic regression is a simple yet effective model for this kind of task because:
- It predicts probabilities, not just yes/no outcomes. This is useful for ranking recommendations by likelihood.
- It’s interpretable — the learned weights show which features push predictions higher or lower.
- It’s fast to train and works well even with small datasets, making it ideal for an educational setting before moving on to more complex models.
- Its output passes through a sigmoid function, producing a value between 0 and 1 that maps naturally to the idea of “likelihood of liking” (see the short sketch below).
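Concretely, logistic regression computes a weighted sum of the input features and squashes it through the sigmoid. A tiny illustration (the scores here are made up):

```python
import numpy as np

def sigmoid(z):
    # Maps any real-valued score into the (0, 1) interval.
    return 1.0 / (1.0 + np.exp(-z))

# z is a weighted sum of features, e.g. z = w @ x + b
print(sigmoid(0.0))   # 0.5   -> completely uncertain
print(sigmoid(3.0))   # ~0.95 -> strong "will like"
print(sigmoid(-3.0))  # ~0.05 -> strong "won't like"
```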
While real-world recommendation systems may use deep learning or more complex models, logistic regression is a great starting point to:
- Understand the full pipeline from data preparation to prediction.
- Build intuition for how features influence recommendations.
- Avoid overfitting when data is limited.
Now, let’s train a model that can predict whether a user will like a track, using the logistic regression approach discussed above.
Here is the main function for training the model:
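What follows is a minimal sketch rather than the lesson’s exact code: it assumes scikit-learn is installed and that `prepare_training_data()` can be imported (the `data_prep` module name is hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from data_prep import prepare_training_data  # hypothetical module name

def train_model():
    # Features and labels from the previous lesson's helper.
    X, y = prepare_training_data()

    # Guard clauses: too few samples, or only one class present.
    if len(y) < 10:
        print("Not enough data to train a model.")
        return None
    if len(np.unique(y)) < 2:
        print("Need both positive and negative samples to train.")
        return None

    # Hold out 20% of the data so we can evaluate on unseen samples.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)
    return model, X_test, y_test
```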
Let’s break down what happens here:
- We get our features (`X`) and labels (`y`) using the data preparation function.
- If there isn’t enough data, or if all the labels are the same, we skip training and print a message. These checks prevent runtime errors and model training failures. For instance:
  - If you try to train with fewer than 10 samples, the model might overfit or fail to converge.
  - If the labels are all `1` or all `0`, the classifier can’t learn anything meaningful; it's like trying to teach it to distinguish cats from... just more cats.
- We split the data into training and test sets. This helps us check how well the model works on new data.
After training, it’s important to check if the model is actually working well. We use two main metrics:
- Accuracy: The percentage of correct predictions.
- ROC AUC: A score that tells us how well the model can distinguish between positive and negative samples. A score closer to 1.0 is better.
However, accuracy alone can be misleading — especially if your dataset is imbalanced (e.g., many more negative samples than positive). That’s why we also calculate ROC AUC, which evaluates how well the model separates the two classes regardless of their ratio. A score of 0.5 means the model is guessing; a score closer to 1.0 means strong separation.
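Continuing the sketch above, the evaluation step might look like this (assuming the `model`, `X_test`, and `y_test` returned by `train_model()`):

```python
from sklearn.metrics import accuracy_score, roc_auc_score

result = train_model()
if result is not None:
    model, X_test, y_test = result

    # Class predictions for accuracy; class-1 probabilities for ROC AUC.
    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)[:, 1]

    print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
    print(f"ROC AUC: {roc_auc_score(y_test, y_prob):.2f}")
```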
Here’s how we save the trained model so we can use it later:
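A minimal sketch of these helpers, assuming `joblib` (a common choice for persisting scikit-learn models; the filename here is illustrative):

```python
import joblib

def save_model(model, path="track_affinity_model.joblib"):
    # Write the fitted model to disk so it can be reused without retraining.
    joblib.dump(model, path)

def load_model(path="track_affinity_model.joblib"):
    # Read a previously saved model back into memory.
    return joblib.load(path)
```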
- `save_model` writes the trained model to a file, so you don’t have to retrain it every time, saving you compute time and ensuring consistent predictions. This becomes especially important once you deploy the system or start batch-generating recommendations.
- `load_model` lets you load the model back into your program when you need it.
Example Output:
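A successful run of the sketches above might print something like this (the numbers are purely illustrative; yours will depend on your data):

```
Accuracy: 0.84
ROC AUC: 0.91
```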
In this lesson, you learned how to train a logistic regression model to predict track affinity using prepared user and track data. You also saw how to evaluate the model’s performance and save it for future use.
Next, you will get a chance to practice these steps yourself. You’ll train your own model, check its accuracy, and save it — just like we did here. Good luck, and have fun experimenting with your own music recommendation model!
