Topic Overview

Welcome back! In this lesson, we'll look at how to train a machine learning model once data cleaning and preprocessing are complete. Imagine teaching a pet to perform a trick. At first it is clumsy, but after several lessons it begins to perform the trick correctly. Our machine learning model is the pet, and the trick is predicting outcomes from data. Our aim in this lesson is to build your skills in using Python and the Scikit-learn library to train a machine learning model on the Titanic dataset.

Introduction to Model Training

Model training, as the name suggests, is the process of training our machine learning model on a subset of the available data (the training dataset) so it can start recognizing patterns and making predictions.

Just like a student studies a portion of the syllabus (the training dataset) and then gets tested on a smaller, unseen portion (the testing dataset), our model has a similar experience. The model learns from the training dataset, and then we assess its performance using the testing dataset.

Preventing overfitting (a model fitting the training data so closely, noise included, that it performs poorly on unseen data) is important, much like ensuring that a student understands the concepts being taught and can apply them instead of simply memorizing the course material. The next sections will show how we can use the train_test_split function from Scikit-learn to split our dataset.
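To make the memorization analogy concrete, here is a minimal sketch of overfitting. It assumes synthetic data generated with NumPy (not the Titanic dataset) in which the labels are pure coin flips, so there is no real pattern to learn. The choice of an unrestricted decision tree is also just an illustration; any sufficiently flexible model could show the same effect.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 300 rows, 5 random features,
# and labels that are coin flips, so the features carry no information.
X = rng.normal(size=(300, 5))
y = rng.integers(0, 2, size=300)

# Keep the first 200 rows for training and hold out the last 100 as unseen data.
X_train, y_train = X[:200], y[:200]
X_test, y_test = X[200:], y[200:]

# A decision tree with no depth limit can memorize every training row.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

train_accuracy = model.score(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print(f"Training accuracy: {train_accuracy:.2f}")
print(f"Testing accuracy:  {test_accuracy:.2f}")
```

The tree scores perfectly on the rows it has already seen, while its accuracy on the held-out rows should land near chance. That gap between training and testing performance is the signature of a model that memorized rather than learned.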

Setting Up Training and Testing Datasets

Preparing the training and testing datasets involves splitting our data into two sections. The bigger section (usually 70%-80%) becomes our training data for the model to learn from, while the smaller section (the remaining 20%-30%) serves as our testing data to evaluate the model's performance.

Think of it as having a big apple pie (the full dataset). You eat most of it (the training set), but you save a slice for later (the testing set).

Here's an example of how to split our full dataset using Python and Scikit-learn:
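A minimal sketch of such a split follows. It assumes a tiny hand-made DataFrame standing in for the cleaned Titanic data; the column names (pclass, age, fare, survived) and all values are illustrative, not taken from the real dataset. The 80/20 ratio and the random_state value are likewise just example choices.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the cleaned Titanic data (values are made up).
titanic = pd.DataFrame({
    "pclass":   [1, 3, 2, 3, 1, 2, 3, 1, 3, 2],
    "age":      [38.0, 22.0, 27.0, 35.0, 54.0, 14.0, 4.0, 58.0, 20.0, 39.0],
    "fare":     [71.3, 7.3, 13.0, 8.1, 51.9, 30.1, 16.7, 26.6, 8.1, 13.0],
    "survived": [1, 0, 1, 0, 1, 0, 0, 1, 0, 1],
})

# Separate the features (X) from the target we want to predict (y).
X = titanic.drop(columns="survived")
y = titanic["survived"]

# Hold out 20% of the rows for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print("Training rows:", len(X_train))
print("Testing rows: ", len(X_test))
```

With 10 rows and test_size=0.2, 8 rows go to the training set and 2 to the testing set. Because train_test_split shuffles the rows by default, fixing random_state keeps the split the same from run to run.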
