Factorization Machines in C++

Introduction to Factorization Machines

Welcome to this lesson on factorization machines, an important model in the realm of recommendation systems. Factorization machines, or FM, excel at capturing interactions between variables, making them a powerful tool for both regression and classification tasks. For instance, they can predict a rating (regression) or calculate the likelihood of a recommendation (classification).

Review of Dataset Preparation

Before we delve into the implementation of a factorization machine, let's briefly revisit the dataset preparation process from the previous lesson. Even though we won't repeat the entire code here, it's crucial to remember the structure we've established. In the prior lesson, you learned how to load and prepare data using C++ data structures. We used vectors, arrays, and matrices to represent user-item interactions and auxiliary features. The dataset was constructed with one-hot encoded columns for users and items, as well as additional features such as user age and item category. These features were combined into a matrix, where each row represents a user-item interaction and each column represents a feature. This structured approach allows us to efficiently process and model the data for recommendation tasks. Recall the importance of these preparatory steps as we move forward.

Theory Behind

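A factorization machine predicts the target for a feature vector $\mathbf{x}$ with $n$ features as

$$\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle x_i x_j$$

where $w_0$ is the global bias, $w_i$ is the linear weight of feature $i$, and $\mathbf{v}_i$ is the latent vector of feature $i$.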

Latent Vectors

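Each feature is assigned its own latent vector. With two latent factors, the vector for `user1` is $[v_{u1,1}, v_{u1,2}]$, and the full set of latent vectors looks like this:

| Feature | Latent Vector |
| --- | --- |
| `user1` | $[v_{u1,1}, v_{u1,2}]$ |
| `user2` | $[v_{u2,1}, v_{u2,2}]$ |
| `user3` | $[v_{u3,1}, v_{u3,2}]$ |
| `item1` | $[v_{i1,1}, v_{i1,2}]$ |
| `item2` | $[v_{i2,1}, v_{i2,2}]$ |
| `item3` | $[v_{i3,1}, v_{i3,2}]$ |
| `uf1` | $[v_{uf1,1}, v_{uf1,2}]$ |
| `uf2` | $[v_{uf2,1}, v_{uf2,2}]$ |
| `if1` | $[v_{if1,1}, v_{if1,2}]$ |
| `if2` | $[v_{if2,1}, v_{if2,2}]$ |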
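To make the role of these vectors concrete, here is a small worked example. The numbers are made up purely for illustration. Suppose $\mathbf{v}_{u1} = [0.5, -0.2]$ and $\mathbf{v}_{i2} = [0.4, 0.1]$. The weight the model assigns to the interaction between `user1` and `item2` is their dot product:

$$\langle \mathbf{v}_{u1}, \mathbf{v}_{i2} \rangle = 0.5 \cdot 0.4 + (-0.2) \cdot 0.1 = 0.20 - 0.02 = 0.18$$

If both one-hot features are active in a row ($x_{u1} = x_{i2} = 1$), this pair contributes $0.18 \cdot 1 \cdot 1 = 0.18$ to the prediction. No separate weight is stored per pair, so the model can score a user-item pair that never appeared together in the training data.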

Implementing the Factorization Machine Model: Part 1

Let's move on to the implementation of the factorization machine model. We'll break this into parts to ensure clarity. First, let's define the constructor to initialize the required data.

```cpp
#include <Eigen/Dense>
#include <random>

using Eigen::MatrixXd;
using Eigen::VectorXd;

class SimpleFactorizationMachine {
public:
    SimpleFactorizationMachine(int n_factors, int n_features,
                               double learning_rate = 0.01,
                               int epochs = 100, double reg = 0.01)
        : n_factors_(n_factors), lr_(learning_rate), epochs_(epochs),
          reg_(reg), w0_(0.0) {
        // Linear weights start at zero.
        W_ = VectorXd::Zero(n_features);

        // Latent factors start as small random values (mean 0, std. dev. 0.1).
        std::default_random_engine generator;
        std::normal_distribution<double> distribution(0.0, 0.1);
        V_ = MatrixXd::NullaryExpr(n_features, n_factors,
                                   [&]() { return distribution(generator); });
    }

    // ...

private:
    int n_factors_;  // number of components in each latent vector
    double lr_;      // learning rate
    int epochs_;     // number of passes over the training data
    double reg_;     // L2 regularization strength
    double w0_;      // global bias
    VectorXd W_;     // linear coefficients, one per feature
    MatrixXd V_;     // latent factors, shape (n_features, n_factors)
};
```

We use the Eigen library, a high-performance C++ template library for linear algebra, to handle matrix and vector operations efficiently. `MatrixXd` and `VectorXd` are dynamic-size matrices and vectors of doubles, respectively.

In the constructor, we initialize several key parameters of the factorization machine:

- `n_factors_`: the number of components in each latent vector. It is the dimensionality of the latent space and controls how complex the learned interactions can be.
- `n_features`: the total number of features in the dataset. It is only needed to size `W_` and `V_`, so it is not stored as a member.

Together, `n_factors_` and `n_features` define the dimensions of the interaction matrix `V_`, which is of size (`n_features`, `n_factors`). Each row of this matrix is the latent vector of one feature, and each column holds one latent component across all features. `lr_`, `epochs_`, and `reg_` are hyperparameters governing the learning process. `w0_` is the global bias, `W_` stores the linear coefficients for the features, and `V_` contains the interaction factors, initialized with small random values.

Gradient Descent

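Gradient descent trains the model by repeatedly adjusting each parameter $\theta$ (here $w_0$, the entries of $\mathbf{w}$, and the entries of $\mathbf{V}$) in the direction that reduces the training loss.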
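Written out, the update can be sketched as follows. This assumes a squared-error loss with an L2 penalty, which is what the updates in the `fit` method of this lesson correspond to. The symbols $\eta$ (learning rate, `lr_`), $\lambda$ (regularization strength, `reg_`), and $\ell$ (per-example loss) are notation introduced here, not taken from the lesson.

$$\theta \leftarrow \theta - \eta \, \frac{\partial \ell}{\partial \theta}, \qquad \ell = \tfrac{1}{2}\,(\hat{y} - y)^2 + \tfrac{\lambda}{2}\left(\lVert \mathbf{w} \rVert^2 + \lVert \mathbf{V} \rVert_F^2\right)$$

By the chain rule, $\frac{\partial \ell}{\partial \theta} = (\hat{y} - y)\,\frac{\partial \hat{y}}{\partial \theta} + \lambda\,\theta$ for the regularized parameters, with

$$\frac{\partial \hat{y}}{\partial w_0} = 1, \qquad \frac{\partial \hat{y}}{\partial w_i} = x_i, \qquad \frac{\partial \hat{y}}{\partial v_{i,f}} = x_i \sum_{j=1}^{n} v_{j,f}\, x_j - v_{i,f}\, x_i^2$$

The bias $w_0$ is left out of the penalty, so its update uses only the error term $(\hat{y} - y)$.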

Implementing the Factorization Machine Model: Part 2

Next, we define the fit method that uses gradient descent to train the algorithm.

```cpp
void fit(const MatrixXd& X, const VectorXd& y) {
    for (int epoch = 0; epoch < epochs_; ++epoch) {
        for (int i = 0; i < X.rows(); ++i) {
            VectorXd x_i = X.row(i);

            // Forward pass: linear part plus pairwise interactions.
            double linear_terms = w0_ + x_i.dot(W_);
            double interaction_term = 0.0;
            for (int f = 0; f < n_factors_; ++f) {
                double dot_xv = x_i.dot(V_.col(f));
                double dot_x2_v2 = x_i.array().square().matrix().dot(
                    V_.col(f).array().square().matrix());
                interaction_term += 0.5 * (dot_xv * dot_xv - dot_x2_v2);
            }

            double prediction = linear_terms + interaction_term;
            double err = prediction - y(i);

            // Gradient step for the bias, linear weights, and latent factors.
            w0_ -= lr_ * err;
            W_ -= lr_ * (err * x_i + reg_ * W_);
            for (int f = 0; f < n_factors_; ++f) {
                VectorXd v_f = V_.col(f);  // copy, so the update reads the old values
                V_.col(f) -= lr_ * (err * (x_i * x_i.dot(v_f)
                                           - x_i.array().square().matrix().cwiseProduct(v_f))
                                    + reg_ * v_f);
            }
        }
    }
}
```

The method works as follows:

- Loop over epochs and instances: The outer loop runs `epochs_` passes over the data, and the inner loop visits every row of `X`. Because the parameters are updated after each individual instance, this is stochastic gradient descent.
- Calculate linear terms: The linear terms are the global bias `w0_` plus the dot product of the feature coefficients `W_` with the data instance `x_i`.
- Calculate interaction terms: For each latent factor `f`, we take half the difference between the squared dot product of `x_i` with column `f` of `V_` and the dot product of the squared vectors. Summed over all factors, this equals the pairwise sum in the model equation, but it costs time linear in the number of features instead of quadratic.
- Compute prediction and error: The prediction is the sum of the linear and interaction terms. The error `err` is the prediction minus the actual rating.
- Update global bias: `w0_` is moved against the error. The bias is not regularized.
- Update linear coefficients: `W_` is adjusted by the error-weighted features, plus an L2 regularization term `reg_ * W_` that shrinks the weights to reduce overfitting.
- Update interaction factors: Each column of `V_` is adjusted by the error-weighted gradient of the prediction, plus the regularization term `reg_ * v_f`.

The gradient of the prediction with respect to column `f` of `V_` has two parts. The term `x_i * x_i.dot(v_f)` scales each feature value by the dot product of the whole feature vector with the current latent column, so it carries the combined influence of all features. The term `x_i.array().square().matrix().cwiseProduct(v_f)` is the element-wise product of the squared feature vector with the latent column. Subtracting it removes each feature's interaction with itself, so only pairs of distinct features contribute to the gradient, matching the $j > i$ restriction in the model equation.

Implementing the Factorization Machine Model: Part 3

Finally, we define the predict method that will use the model's coefficients to make predictions.

```cpp
VectorXd predict(const MatrixXd& X) const {
    VectorXd y_pred(X.rows());
    for (int i = 0; i < X.rows(); ++i) {
        VectorXd x_i = X.row(i);
        double linear_terms = w0_ + x_i.dot(W_);
        double interaction_term = 0.0;
        for (int f = 0; f < n_factors_; ++f) {
            double dot_xv = x_i.dot(V_.col(f));
            double dot_x2_v2 = x_i.array().square().matrix().dot(
                V_.col(f).array().square().matrix());
            interaction_term += 0.5 * (dot_xv * dot_xv - dot_x2_v2);
        }
        y_pred(i) = linear_terms + interaction_term;
    }
    return y_pred;
}
```

The method works as follows:

- Initialize predictions array: Start by creating a vector to store one prediction per data instance.
- Calculate linear terms: For each instance, add the global bias to the dot product of the features with their coefficients.
- Calculate interaction terms: Iterate over all latent factors to compute the interaction term for the instance, exactly as in the forward pass of `fit`.
- Store predictions: For each instance, sum the linear and interaction terms and store the result.

The method is marked `const` because prediction does not modify the model.

Making Predictions and Evaluating Model Performance

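We evaluate the predictions on the held-out test set with the mean squared error (MSE):

$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of test instances. Lower values mean the predictions are closer to the actual ratings.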

Conclusion and Summary

In this lesson, we successfully implemented and evaluated a factorization machine model for recommendation systems. We've gone from initializing parameters, through training, to making predictions and evaluating performance. This concludes our exploration of factorization machines and marks the end of this course module. Congratulations on completing the course! The skills you've acquired here form a strong foundation for building and understanding recommendation systems. Continue exploring other models and refine your expertise in this dynamic field. Well done!


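The complete program below splits the dataset, trains the model, makes predictions on the test set, and reports the mean squared error:

```cpp
#include <iostream>
#include <vector>
#include <iomanip>
#include "data_extractor.hpp"
#include <Eigen/Dense>

// The SimpleFactorizationMachine class from Parts 1-3 must be defined
// before main(), either in this file or in a header of your own.

using Eigen::MatrixXd;
using Eigen::VectorXd;

int main() {
    // Extract dataset (structured bindings require C++17)
    auto [X, y] = extract_dataset();

    // Split dataset into train and test
    // NOTE: A real implementation would shuffle the data first for a random split.
    const Eigen::Index train_size = static_cast<Eigen::Index>(X.rows() * 0.8);
    MatrixXd X_train = X.topRows(train_size);
    VectorXd y_train = y.head(train_size);
    MatrixXd X_test = X.bottomRows(X.rows() - train_size);
    VectorXd y_test = y.tail(y.size() - train_size);

    // Define and train factorization machine model:
    // 3 latent factors, learning rate 0.01, 50 epochs
    SimpleFactorizationMachine fm_model(3, static_cast<int>(X_train.cols()), 0.01, 50);
    fm_model.fit(X_train, y_train);

    // Test the model
    VectorXd y_pred = fm_model.predict(X_test);

    // Evaluate model using Mean Squared Error
    double mse = (y_test - y_pred).array().square().mean();
    std::cout << "Mean Squared Error: " << std::fixed << std::setprecision(4)
              << mse << std::endl;

    return 0;
}
```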