In previous lessons, you learned how to handle datasets stored in compressed formats and manage large numerical datasets efficiently. Building on that foundation, today's lesson will teach you how to read and process data in batches from multiple CSV files using C++. This matters because working with data in smaller chunks, or batches, can keep memory usage manageable when a dataset is too large to handle comfortably in one piece.
Our focus in this lesson will be on a practical scenario where a dataset containing car information is spread across multiple files. You will learn to read, process, and analyze this data to extract meaningful insights, such as determining the car with the lowest price.
In this lesson, we will work with a set of CSV files containing car data. Here's what a typical record might look like:
- Model: Ford Mustang
- Transmission: Automatic
- Year: 2020
- Price: 25000.00
- Distance Traveled (km): 50000
- Color: Red
The dataset is split across multiple files to allow batch processing, so understanding the structure of each record is important before you read and process the files.
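To make the record layout concrete, here is a minimal sketch that parses one line into a struct holding all six fields. The `CarRecord` struct and the `parse_record` function are hypothetical names introduced for illustration. The sketch assumes the columns appear in the order listed above and that no field contains an embedded comma or quotes.

```cpp
#include <sstream>
#include <string>

// Hypothetical struct holding every column of one record.
struct CarRecord {
    std::string model;
    std::string transmission;
    int year = 0;
    double price = 0.0;
    double distance_km = 0.0;
    std::string color;
};

// Parse one comma-separated line into a CarRecord.
// Assumes the column order Model, Transmission, Year, Price, Distance, Color.
CarRecord parse_record(const std::string& line) {
    std::stringstream ss(line);
    std::string year_str, price_str, distance_str;
    CarRecord record;

    std::getline(ss, record.model, ',');
    std::getline(ss, record.transmission, ',');
    std::getline(ss, year_str, ',');
    std::getline(ss, price_str, ',');
    std::getline(ss, distance_str, ',');
    std::getline(ss, record.color);  // last field runs to the end of the line

    // std::stoi and std::stod throw if the text is not numeric
    record.year = std::stoi(year_str);
    record.price = std::stod(price_str);
    record.distance_km = std::stod(distance_str);
    return record;
}
```

Calling `parse_record("Ford Mustang,Automatic,2020,25000.00,50000,Red")` would fill the struct with the example record shown above.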
Now, let's delve into reading these CSV files in batches using C++ constructs. We'll build our solution step-by-step.
First, we need to specify the filenames for our CSV files and prepare a data structure to hold the combined data.
```cpp
#include <fstream>
#include <vector>
#include <string>

// Structure to represent a car
struct Car {
    std::string model;
    double price;
};

// Filenames to read
std::vector<std::string> filenames = {"data_part1.csv", "data_part2.csv", "data_part3.csv"};

// List to store all car data
std::vector<Car> car_data;
```
Here, we declare a vector `filenames` to hold the names of the CSV files and a vector `car_data` of a custom struct `Car` to store the car data read from the files.
Now, we'll loop through each filename, read the data using file streams, and append it to our `car_data` structure.
```cpp
#include <sstream>
#include <iostream>

for (const auto& filename : filenames) {
    std::ifstream csv_file(filename);
    if (!csv_file) {
        // Report and skip files that cannot be opened
        std::cerr << "Could not open " << filename << "\n";
        continue;
    }
    std::string line;

    // Skip the header line
    std::getline(csv_file, line);

    // Read rows with car data
    while (std::getline(csv_file, line)) {
        std::stringstream ss(line);
        std::string model, transmission, year_str, price_str;

        // Read the model, then skip past transmission and year to reach the price
        std::getline(ss, model, ',');
        std::getline(ss, transmission, ',');
        std::getline(ss, year_str, ',');
        std::getline(ss, price_str, ',');

        // Convert price string to double (std::stod throws if the text is not numeric)
        double price = std::stod(price_str);

        // Add car to the vector
        car_data.push_back({model, price});
    }
}
```
In this code:
- We open each file using `std::ifstream` and use a loop to read lines.
- We skip the header with `std::getline`.
- For each row, we use `std::stringstream` to split the line into components using `','` as a delimiter.
- We convert the price from a string to a double and append the data to `car_data`.
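The steps above assume every row has a numeric price. As a hedged sketch of one way to tolerate bad rows, the hypothetical helper `append_cars` below reads from any `std::istream` (so it also works with an in-memory `std::istringstream`), skips rows whose price cannot be converted, and reports how many rows it skipped. The helper name and the skip-and-count policy are assumptions made for illustration, not part of the lesson's code.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

struct Car {
    std::string model;
    double price;
};

// Hypothetical helper: append every well-formed row of one CSV stream to `out`.
// Returns the number of rows skipped because the price was missing or invalid.
std::size_t append_cars(std::istream& input, std::vector<Car>& out) {
    std::string line;
    std::getline(input, line);  // skip the header line

    std::size_t skipped = 0;
    while (std::getline(input, line)) {
        std::stringstream ss(line);
        std::string model, transmission, year_str, price_str;
        std::getline(ss, model, ',');
        std::getline(ss, transmission, ',');
        std::getline(ss, year_str, ',');
        std::getline(ss, price_str, ',');

        try {
            out.push_back({model, std::stod(price_str)});
        } catch (const std::invalid_argument&) {
            ++skipped;  // price column empty or not numeric
        } catch (const std::out_of_range&) {
            ++skipped;  // price does not fit in a double
        }
    }
    return skipped;
}
```

With this helper, the loop over `filenames` could open each file with `std::ifstream` and pass the stream to `append_cars`, accumulating all rows into one vector.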
With all data combined in `car_data`, the next step is identifying the car with the lowest price in C++.
```cpp
// Find the car with the lowest price.
// Calling front() on an empty vector is undefined behavior, so check first.
if (!car_data.empty()) {
    Car lowest_cost_car = car_data.front();
    for (const auto& car : car_data) {
        if (car.price < lowest_cost_car.price) {
            lowest_cost_car = car;
        }
    }

    // Display the car with the lowest price
    std::cout << "Model: " << lowest_cost_car.model << "\n";
    std::cout << "Price: $" << lowest_cost_car.price << "\n";
} else {
    std::cout << "No car data was loaded.\n";
}
```
Here:
- We initialize `lowest_cost_car` with the first car in `car_data`.
- A loop evaluates each car to find the one with the minimum price.
- Finally, we print the model and price of the car with the lowest price.
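For comparison with the manual loop, the same search can be expressed with `std::min_element` from `<algorithm>`. This is only an alternative sketch: the function name `find_cheapest` is hypothetical, and returning `nullptr` for an empty vector is one possible design choice among several.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Car {
    std::string model;
    double price;
};

// Return a pointer to the cheapest car, or nullptr when the vector is empty.
const Car* find_cheapest(const std::vector<Car>& cars) {
    auto it = std::min_element(
        cars.begin(), cars.end(),
        [](const Car& a, const Car& b) { return a.price < b.price; });
    // For an empty range, std::min_element returns the end iterator
    return it == cars.end() ? nullptr : &*it;
}
```

The returned pointer refers to an element inside the vector, so it is only valid while the vector is alive and unmodified.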
In scenarios where loading all data into memory isn't feasible, we can process each row as it is read from the stream and keep only the running minimum. Memory use then stays roughly constant no matter how many rows the files contain.
```cpp
#include <limits>

Car lowest_cost_car{};
// Start at +infinity so that any real price compares lower
double lowest_price = std::numeric_limits<double>::infinity();

for (const auto& filename : filenames) {
    std::ifstream csv_file(filename);
    std::string line;

    // Skip the header line
    std::getline(csv_file, line);

    while (std::getline(csv_file, line)) {
        std::stringstream ss(line);
        std::string model, transmission, year_str, price_str;

        std::getline(ss, model, ',');
        std::getline(ss, transmission, ',');
        std::getline(ss, year_str, ',');
        std::getline(ss, price_str, ',');

        double price = std::stod(price_str);

        // Keep only the cheapest car seen so far
        if (price < lowest_price) {
            lowest_price = price;
            lowest_cost_car.model = model;
            lowest_cost_car.price = price;
        }
    }
}

// Output the car with the lowest price
std::cout << "Model: " << lowest_cost_car.model << "\n";
std::cout << "Price: $" << lowest_cost_car.price << "\n";
```
In this implementation:
- We manage two variables: `lowest_cost_car` to store the details of the lowest-priced car, and `lowest_price`, initialized to a large value.
- Data is streamed per file, allowing us to update the result only when a lower price is found.
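A sentinel starting price is one way to track the running minimum. Another, sketched below, is to hold the result in a `std::optional<Car>` that stays empty until the first row is seen, which also makes the "no rows at all" case explicit. This assumes a C++17 compiler; the helper name `update_cheapest` is hypothetical, and it takes any `std::istream` so it can be tried without real files.

```cpp
#include <istream>
#include <optional>
#include <sstream>
#include <string>

struct Car {
    std::string model;
    double price;
};

// Hypothetical helper: scan one CSV stream and update `best` in place.
// `best` stays empty until the first row is read, so no sentinel price is needed.
void update_cheapest(std::istream& input, std::optional<Car>& best) {
    std::string line;
    std::getline(input, line);  // skip the header line

    while (std::getline(input, line)) {
        std::stringstream ss(line);
        std::string model, transmission, year_str, price_str;
        std::getline(ss, model, ',');
        std::getline(ss, transmission, ',');
        std::getline(ss, year_str, ',');
        std::getline(ss, price_str, ',');

        double price = std::stod(price_str);  // assumes a numeric price column
        if (!best || price < best->price) {
            best = Car{model, price};
        }
    }
}
```

Calling `update_cheapest` once per file with the same `std::optional<Car>` carries the running minimum across files; after the last file, `best.has_value()` tells you whether any row was read.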
In this lesson, you have learned how to:
- Read data in batches from multiple CSV files using C++ file handling with `<fstream>`.
- Process the data efficiently with string and data type conversions using `std::stringstream` and `std::stod`.
- Identify insights, such as the car with the lowest price, using loops instead of built-in functions.
These techniques prepare you to handle similar datasets efficiently using C++. Practice these skills with exercises designed to reinforce your understanding, focusing on efficient data handling techniques.