Welcome back! In the last lesson, you learned how to use grid search to tune hyperparameters for a machine learning model. As a quick reminder, grid search works by trying every possible combination of hyperparameter values that you specify. This is a great way to find the best settings when you have a small number of options, but it can become slow and inefficient if you have many hyperparameters or a wide range of values to test.
This is where random search comes in. Random search is another approach to hyperparameter tuning that can be much faster and more practical, especially when you have a large search space. Instead of checking every possible combination, random search picks random combinations of hyperparameters to try. This means you can cover a wide range of possibilities without having to test them all. In this lesson, you will learn how to use random search with scikit-learn’s RandomizedSearchCV to efficiently search for good hyperparameters, using a RandomForestClassifier as an example.
Random search is built into scikit-learn through the RandomizedSearchCV class. The main idea is simple: instead of exhaustively searching every combination like grid search, RandomizedSearchCV samples a fixed number of random combinations from the hyperparameter space you define. This makes it much more efficient when you have many parameters or a wide range of values.
One key difference from grid search is that, with random search, you can specify distributions or ranges for each hyperparameter rather than fixed lists. For example, instead of listing every possible number of trees in a random forest, you can tell random search to pick random values between 50 and 300. This flexibility allows you to explore more possibilities in less time.
Just like grid search, RandomizedSearchCV uses cross-validation to evaluate each combination, so you still get reliable results. The main advantage is that you can control how many combinations to try, making it easy to balance speed and thoroughness.
To use random search effectively, you need to define the range or distribution for each hyperparameter you want to tune. In scikit-learn, you can use tools from the scipy.stats module, such as randint, to specify these distributions. This is different from grid search, where you provide a fixed list of values.
For example, if you want to tune the number of trees (n_estimators) in a random forest, you might not know the exact values to try. Instead, you can use randint(50, 300) to let random search pick values from 50 up to 299 (with scipy's randint, the upper bound is exclusive). It’s important to note that randint does not generate all possible values in the range at once. Instead, for each iteration of the random search, it picks a single random value from the specified range. Over the course of the search, which is controlled by the n_iter parameter, random search samples one value from each distribution per iteration.
The same approach applies to other hyperparameters, like the maximum depth of each tree (max_depth). Using distributions instead of fixed lists allows random search to sample a wide variety of values, which can help you find better hyperparameters without having to try every single option.
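To get a feel for how these distributions behave, you can sample from randint directly, outside of any search. This sketch draws a handful of values the same way RandomizedSearchCV would, one per iteration (the specific numbers drawn depend on the random seed, so they are not shown here):

```python
from scipy.stats import randint

# randint(50, 300) defines integers from 50 up to (but not including) 300.
n_estimators_dist = randint(50, 300)

# Each call to rvs() draws random values from the distribution, just as
# RandomizedSearchCV does once per iteration; a fixed random_state makes
# the draws reproducible.
samples = n_estimators_dist.rvs(size=5, random_state=42)
print(samples)  # five integers, each between 50 and 299
```

This is all RandomizedSearchCV does under the hood: at each iteration it draws one value from every distribution in your parameter dictionary to form a candidate combination.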
Let’s walk through a practical example of using random search to tune a RandomForestClassifier. Suppose you have your training data in X_train and y_train. You want to find good values for the number of trees (n_estimators) and the maximum depth of the trees (max_depth). Here’s how you can set up and run a random search using scikit-learn:
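The sketch below follows the setup described in this lesson: distributions for n_estimators and max_depth built with randint, n_iter=10 random combinations, and 3-fold cross-validation. The exact max_depth range and the synthetic dataset (standing in for your own X_train and y_train) are illustrative choices, not prescribed values:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Stand-in training data so the example runs on its own; in practice,
# use your real X_train and y_train instead.
X_train, y_train = make_classification(n_samples=200, n_features=10, random_state=42)

# Distributions to sample from: randint(low, high) draws integers
# from low up to (but not including) high.
param_dist = {
    "n_estimators": randint(50, 300),
    "max_depth": randint(3, 20),  # illustrative range for tree depth
}

model = RandomForestClassifier(random_state=42)

# Try 10 random combinations, each evaluated with 3-fold cross-validation.
random_search = RandomizedSearchCV(
    model,
    param_distributions=param_dist,
    n_iter=10,
    cv=3,
    random_state=42,
)
random_search.fit(X_train, y_train)

print("Best parameters:", random_search.best_params_)
print("Best cross-validation score:", random_search.best_score_)
```

After fitting, best_params_ holds the sampled combination with the highest mean cross-validation score, and best_score_ holds that score.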
In this code, you first import the necessary classes. The param_dist dictionary defines the distributions for n_estimators and max_depth using randint. This tells random search to pick random values for these hyperparameters within the specified ranges. The RandomForestClassifier is the model you want to tune. The RandomizedSearchCV object is set up with the model, the parameter distributions, the number of random combinations to try (n_iter=10), and the number of cross-validation folds (cv=3).
In this lesson, you learned how random search can help you efficiently tune hyperparameters, especially when you have a large or complex search space. You saw how to use RandomizedSearchCV in scikit-learn, how to define parameter distributions with scipy.stats, and how to interpret the results. Random search gives you a flexible and practical way to find good hyperparameters without having to try every possible combination.
Next, you will get a chance to practice running and modifying random searches yourself. These exercises will help you build confidence in using random search to improve your own machine learning models. Good luck, and enjoy experimenting with different hyperparameter settings!
