Introduction: The Importance of Hyperparameter Tuning

Welcome to the first lesson of the "Hypertuning Classical Models" course. In this course, you will learn how to systematically improve the performance of classical machine learning models by tuning their hyperparameters. Hyperparameters are settings or configurations that you choose before training a model, such as the regularization strength in logistic regression or the number of trees in a random forest. Unlike model parameters, which are learned from the data, hyperparameters are set manually and can have a significant impact on how well your model performs.

Finding the right hyperparameters can be challenging, especially since there are often many possible combinations to try. This is where grid search comes in. Grid search is a systematic way to search through a set of hyperparameter values to find the combination that gives the best results. In this lesson, you will learn how to use grid search to tune hyperparameters for a logistic regression model using scikit-learn, a popular Python library for machine learning.

How Grid Search Works

Grid search is a brute-force approach to hyperparameter tuning. The idea is simple: you define a set of possible values for each hyperparameter you want to tune, and grid search tries every possible combination of these values. For each combination, the model is trained and evaluated, usually using cross-validation to get a reliable estimate of performance.

Cross-validation means splitting your training data into several parts (folds), training the model on all but one fold, and validating it on the held-out fold, repeating the process so that each fold serves as the validation set once. This is important because it helps ensure that the results are not just due to a lucky or unlucky split of the data. Without cross-validation, you might choose hyperparameters that work well on one particular split but do not generalize to new, unseen data. By averaging the results over several splits, cross-validation gives a more robust and reliable estimate of how well each hyperparameter setting is likely to perform in practice.
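
If you want to see this averaging in isolation, scikit-learn's cross_val_score runs the split-train-validate cycle for a single model configuration. The sketch below uses a small synthetic dataset purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# A small synthetic dataset, used here only for illustration
X, y = make_classification(n_samples=200, random_state=0)

# Evaluate one hyperparameter setting (C=1) with 5-fold cross-validation
scores = cross_val_score(LogisticRegression(C=1, max_iter=1000), X, y, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # the averaged estimate that grid search relies on
```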

For example, if you want to tune the regularization strength C in logistic regression, you might try values like 0.01, 0.1, 1, 10, and 100. Grid search will train and evaluate a model for each of these values and tell you which one works best.
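
To make the brute-force nature concrete, here is a minimal plain-Python sketch (not scikit-learn) of how a grid expands into every combination; the second hyperparameter, penalty, is added here purely for illustration:

```python
from itertools import product

# A grid over two hyperparameters (penalty is included only for illustration)
param_grid = {
    'C': [0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
}

# Grid search tries every combination: 5 values of C x 2 penalties = 10 candidates
for C, penalty in product(param_grid['C'], param_grid['penalty']):
    print(f"Train and evaluate a model with C={C}, penalty={penalty}")
```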

Setting Up Grid Search in scikit-learn

To use grid search in scikit-learn, you need to import a few libraries and set up your parameter grid and model. On CodeSignal, these libraries are already installed, but if you are working on your own device, you may need to install scikit-learn first. The main tool for grid search in scikit-learn is GridSearchCV, which stands for "Grid Search with Cross-Validation."
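
If you do need to install it on your own machine, the standard command is:

```
pip install scikit-learn
```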

You start by importing GridSearchCV and the model you want to tune. Then, you define a dictionary called param_grid that lists the hyperparameters you want to search and the values you want to try for each one. For logistic regression, a common hyperparameter to tune is C, which controls the strength of regularization. A smaller C means stronger regularization.

Since we saw in the previous course that C=1 was the best parameter, you might want to try values close to 1 for a more fine-grained search. For example (the exact values below are just one reasonable choice):

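```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Values of C close to 1 for a fine-grained search (illustrative choice)
param_grid = {'C': [0.5, 0.8, 1, 1.2, 1.5, 2]}

# max_iter raised to avoid convergence warnings on some datasets
model = LogisticRegression(max_iter=1000)

# Set up grid search with 5-fold cross-validation
search = GridSearchCV(model, param_grid, cv=5)
```
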
In this code, param_grid specifies values of C around 1 for better comparison, and cv=5 means that 5-fold cross-validation will be used to evaluate each combination.

Step-by-Step Example: Grid Search with Logistic Regression

Let’s walk through a complete example of using grid search to tune a logistic regression model. Suppose you have your training data in X_train and y_train. You want to find the best value for the regularization parameter C. Here is the full code (the specific C values in the grid are one reasonable choice):

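```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Define the grid of C values to try (a fine-grained choice around 1)
param_grid = {'C': [0.5, 0.8, 1, 1.2, 1.5, 2]}

# Create the logistic regression model to tune
model = LogisticRegression(max_iter=1000)

# Set up grid search with 5-fold cross-validation
search = GridSearchCV(model, param_grid, cv=5)

# Train and evaluate the model for each value of C
search.fit(X_train, y_train)

# Report the best hyperparameter and its average cross-validation score
print("Best parameters:", search.best_params_)
print("Best cross-validation score:", search.best_score_)
```
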
First, you import the necessary classes. Then, you define the parameter grid for C and create a logistic regression model. The GridSearchCV object is set up with the model, the parameter grid, and 5-fold cross-validation. When you call search.fit(X_train, y_train), grid search will train and evaluate the model for each value of C using cross-validation. After fitting, you can print the best parameters and the best cross-validation score found during the search.

The output might look like this:

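```
Best parameters: {'C': 1}
Best cross-validation score: 0.85
```
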
This means that the best value for C was 1, and the average cross-validation score for this setting was 0.85. You can now use these results to train your final model with the best hyperparameters.
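
Because GridSearchCV refits the model on the full training set with the best parameters by default (refit=True), the tuned model is immediately available; the X_test below is a hypothetical held-out test set:

```python
# The refitted model with the best C, ready to use (refit=True is the default)
best_model = search.best_estimator_

# Make predictions on new data (X_test is assumed to be a held-out test set)
predictions = best_model.predict(X_test)
```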

Summary and Next Steps

In this lesson, you learned what hyperparameters are and why tuning them is important for building better machine learning models. You saw how grid search can help you systematically try different hyperparameter values and find the best combination using cross-validation. You also walked through a practical example using scikit-learn’s GridSearchCV with logistic regression.

Now that you understand how grid search works and how to use it in your own projects, you are ready to practice these skills. In the next section, you will find hands-on exercises that will help you reinforce what you have learned and build confidence in using grid search for hyperparameter tuning. Good luck, and enjoy experimenting with your models!
