Random search in Machine Learning

Machine learning (ML) has become one of the hottest fields of study and one of the most in-demand skills in the job market. Every day we see growing demand for people who can build ML models to user requirements. Building a machine learning model typically involves seven steps:

  1. Collect Data
  2. Prepare the Data
  3. Choose model
  4. Train the model
  5. Evaluation
  6. Hyperparameter Tuning
  7. Inference or Prediction

All seven steps are important in building a machine learning model, and one should be careful at every step to avoid undesirable results. One such important step is hyperparameter tuning. Hyperparameters are the configuration settings of an algorithm that are chosen before training rather than learned from the data, for example the regularization strength C and the penalty type in logistic regression, and they play an important role in how accurately the model predicts the final outcome. To find good hyperparameters for our data and algorithm, there are several techniques such as Manual Search, Grid Search, Random Search, Bayesian Search, etc. In this article, we will discuss Random Search.

Overview

  • What is Random Search?
  • How to do Random Search?
  • Pros and Cons of Random Search
  • Conclusion

What is Random Search?

Random Search is a hyperparameter tuning technique in which random combinations of hyperparameter values are tried to find the best configuration for the model. Instead of evaluating every possible combination, a model is trained for each randomly sampled combination. The model that gives the highest accuracy is selected as the best model, and its hyperparameter values are chosen as the best ones.


Essentially, the search space of hyperparameters is defined either as a discrete grid or as a distribution for each hyperparameter, and a fixed number of value combinations is sampled at random from it. Each randomly selected set of values is applied, and the model's performance is evaluated for that set. The best-performing set of values is chosen as the best hyperparameters for that particular model. This is how the Random Search algorithm finds the best hyperparameters for our model; the short sketch below illustrates the idea in plain Python, and after that we will see how to do a Random Search using the scikit-learn library.
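As a minimal sketch of the idea (not the scikit-learn implementation), assuming a hypothetical train_and_score(params) function that trains a model with the given hyperparameters and returns its validation accuracy, a bare-bones random search could look like this:

import random

def random_search(param_space, train_and_score, n_iter=20, seed=0):
    # Try n_iter random hyperparameter combinations and keep the best one.
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        # Sample one value for every hyperparameter in the search space.
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = train_and_score(params)  # e.g. cross-validated accuracy
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Example search space for a hypothetical model:
# param_space = {"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]}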

How to do Random Search?

Scikit-learn is one of the most widely used libraries for building machine learning models. It provides several ML algorithms as well as hyperparameter tuning utilities such as Random Search and Grid Search. We can use it to implement the Random Search algorithm and find the best hyperparameters.


The following code sample shows the implementation of Random Search on a logistic regression classifier.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import uniform

# Load the iris dataset and define the base classifier.
iris = load_iris()
logistic = LogisticRegression(solver='saga', tol=1e-2, max_iter=200,
                              random_state=0)

# Search space: C is sampled from a continuous uniform distribution on [0, 4],
# and the penalty is sampled from a discrete list of choices.
distributions = dict(C=uniform(loc=0, scale=4),
                     penalty=['l2', 'l1'])

# Run the randomized search with cross-validation and fit it on the data.
clf = RandomizedSearchCV(logistic, distributions, random_state=0)
search = clf.fit(iris.data, iris.target)

# Best hyperparameters found by the search.
search.best_params_
# {'C': 2..., 'penalty': 'l1'}

Explanation of the above code

First, we import the iris dataset from scikit-learn's datasets module. Then we import the logistic regression algorithm from scikit-learn's linear_model module, and the Random Search algorithm (RandomizedSearchCV) from its model_selection module. Next, we load the dataset and define our classification model in the variable logistic. We then define the search space in the variable distributions: a continuous uniform distribution for C and a list of choices for penalty. We create an instance of the random search algorithm in the variable clf, fit it on the data to run the search, and store the result in the variable search. Finally, we can retrieve the best set of hyperparameters with search.best_params_.
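As a small follow-up (a sketch reusing the fitted search object from the code above), the fitted RandomizedSearchCV also exposes the best cross-validation score and, because it refits the best model on the full data by default, can be used directly for prediction:

# Mean cross-validated accuracy of the best hyperparameter combination.
print(search.best_score_)

# With the default refit=True, the best model is retrained on the whole
# dataset and the search object can be used like a fitted classifier.
print(search.predict(iris.data[:5]))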

Thus, as explained above, we can implement the Random Search algorithm in a simple way using the scikit-learn library.


Pros and Cons of Random Search

So far we have seen what the Random Search algorithm is and how to implement it. Now, let's look at the pros and cons of this algorithm.

The main disadvantages of this algorithm are that its results have high variance, since they depend on which combinations happen to be sampled, and that it may offer little benefit over an exhaustive search when there are only a few hyperparameters to tune.

On the other hand, the algorithm works really well with a large number of search parameters, gives good results with relatively few iterations, and helps avoid overfitting to the validation set, since the number of sampled combinations is controlled explicitly, as shown below.
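As an illustration of this iteration budget (a sketch reusing the logistic and distributions objects defined earlier; the exact scores will vary), RandomizedSearchCV lets you fix the number of sampled combinations via n_iter, regardless of how large the search space is:

# Sample only 5 combinations from the (effectively infinite) search space.
small_budget = RandomizedSearchCV(logistic, distributions, n_iter=5,
                                  random_state=0)
small_budget.fit(iris.data, iris.target)
print(small_budget.best_params_, small_budget.best_score_)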

Conclusion

Random Search is an important hyperparameter tuning algorithm that gets good results efficiently and in less time. It works well with a large number of search parameters, and I think it should be the first algorithm you try before testing other tuning methods on your dataset.
