In previous lessons, you learned how to handle datasets stored in compressed formats and manage large datasets efficiently using JavaScript. Building on that foundation, today's lesson will teach you how to read and process data in batches from multiple CSV files. This matters because processing data in smaller chunks, or batches, keeps memory usage manageable and lets you work with records without first loading an entire large dataset at once.
Our focus in this lesson will be on a practical scenario where a dataset containing car information is spread across multiple files. You will learn to read, process, and analyze this data to extract meaningful insights, such as determining the car with the lowest price.
In this lesson, we'll work with a set of CSV files containing car data. Here's what a typical record might look like:
- Model: Ford Mustang
- Transmission: Automatic
- Year: 2020
- Price: 25000.00
- Distance Traveled (km): 50000
- Color: Red
These files are divided into multiple parts to allow batch processing, and understanding their structure is crucial as you learn to read and process them efficiently.
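Because a CSV parser such as `csv-parser` returns every field as a string by default, it can help to convert a raw row into a typed record before doing arithmetic or comparisons. The sketch below assumes lowercase column headers (`model`, `transmission`, `year`, `price`, `distance_traveled_km`, `color`). These header names and the `toCarRecord` helper are hypothetical, so adjust them to match the actual headers in your files.

```javascript
// Hypothetical helper: converts a raw CSV row (all strings) into a typed record.
// Assumed header names; the lesson's code reads row.model and row.price,
// so lowercase headers are assumed here.
function toCarRecord(row) {
  return {
    model: row.model,
    transmission: row.transmission,
    year: parseInt(row.year, 10),
    price: parseFloat(row.price),
    distanceKm: parseInt(row.distance_traveled_km, 10),
    color: row.color,
  };
}

// A raw row as a CSV parser would produce it: every value is a string
const raw = {
  model: 'Ford Mustang',
  transmission: 'Automatic',
  year: '2020',
  price: '25000.00',
  distance_traveled_km: '50000',
  color: 'Red',
};

const car = toCarRecord(raw);
console.log(typeof raw.price, typeof car.price); // string number
console.log(car.price + 1000); // 26000
```

String comparison is lexicographic, so `'9000' < '25000'` evaluates to `false`; converting numeric fields first avoids that trap.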
Let's delve into reading these CSV files in batches using the `fs` module and the `csv-parser` library in JavaScript. We'll build our solution step by step.
First, we need to specify the filenames for our CSV files and prepare a data structure to hold the combined data.
```javascript
const fs = require('fs');
const csv = require('csv-parser');

// Filenames to read
const filenames = ['data_part1.csv', 'data_part2.csv', 'data_part3.csv'];

// List to store all car data
let carData = [];
```
Here, we initialize a list of filenames and create an empty array, `carData`, to store all the car data read from the files.
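If the number of parts is not fixed, you could build the filenames array from the directory contents instead of hard-coding it. This sketch assumes the files follow the `data_partN.csv` naming pattern, and `findPartFiles` is a hypothetical helper. Sorting by the numeric part keeps `data_part10.csv` after `data_part2.csv`, which a plain string sort would not do.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: collects every data_partN.csv file in a directory,
// sorted by N numerically so data_part10.csv comes after data_part2.csv.
function findPartFiles(dir) {
  const pattern = /^data_part(\d+)\.csv$/;
  return fs.readdirSync(dir)
    .map((name) => ({ name, match: name.match(pattern) }))
    .filter(({ match }) => match !== null) // ignore files that don't fit the pattern
    .sort((a, b) => Number(a.match[1]) - Number(b.match[1]))
    .map(({ name }) => path.join(dir, name));
}
```

For the current directory, `const filenames = findPartFiles('.');` yields the same kind of array as the hard-coded list.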
Next, we'll loop through each filename, read the data, and append it to our `carData` array.
```javascript
let filesRead = 0;
filenames.forEach((filename) => {
  fs.createReadStream(filename)
    .pipe(csv())
    .on('data', (row) => {
      // Convert price from string to float for comparison
      row.price = parseFloat(row.price);
      carData.push(row);
    })
    .on('end', () => {
      filesRead += 1;
      if (filesRead === filenames.length) {
        // Process the combined data here
      }
    })
    .on('error', (error) => {
      console.error('Error reading file:', error);
    });
});
```
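In this snippet:
- We use `fs.createReadStream()` to open each CSV file as a readable stream.
- The `csv-parser` library parses each row and emits it as a JavaScript object keyed by the column headers.
- We convert `row.price` from a string to a floating-point number for numerical comparison and store the entire row in `carData`.
- The `end` event fires when a file has been fully read. Because the streams run concurrently and can finish in any order, the `filesRead` counter tells us when every file is done.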
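The counter works, but a promise-based version can make the "all files are done" step easier to follow. The sketch below swaps in Node's built-in `readline` module with a deliberately naive comma split as a stand-in for `csv-parser`, so it assumes a header row and no quoted fields containing commas. `readCsvFile` and `readAllFiles` are hypothetical names; for real-world CSV you would keep a proper parser such as `csv-parser`.

```javascript
const fs = require('fs');
const readline = require('readline');

// Minimal stand-in for csv-parser (assumption: a header row, comma
// separators, and no quoted fields). Resolves with one object per data row.
async function readCsvFile(filename) {
  const rl = readline.createInterface({
    input: fs.createReadStream(filename),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  const rows = [];
  let headers = null;
  for await (const line of rl) {
    if (line.trim() === '') continue; // skip blank lines
    const values = line.split(',').map((v) => v.trim());
    if (headers === null) {
      headers = values; // first non-blank line is the header row
      continue;
    }
    const row = {};
    headers.forEach((header, i) => {
      row[header] = values[i] ?? '';
    });
    rows.push(row);
  }
  return rows;
}

// Reads every file concurrently; Promise.all resolves once all of them have
// been read, which replaces the manual filesRead counter.
async function readAllFiles(filenames) {
  const batches = await Promise.all(filenames.map(readCsvFile));
  return batches.flat(); // results are combined in the same order as filenames
}
```

Inside an async function, `const carData = await readAllFiles(filenames);` gives the combined rows. Prices are still strings at that point and need `parseFloat` as before.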
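After all the data has been read into `carData`, the next step is identifying the car with the lowest price. Because the files are read asynchronously, this code must run only after every file has finished, for example inside the `end` handler where the "Process the combined data here" comment sits.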
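```javascript
// Run this only after all files have been read (e.g. inside the 'end' handler);
// reduce() without an initial value throws a TypeError on an empty array
if (carData.length > 0) {
  const lowestCostCar = carData.reduce((minCar, car) => {
    return car.price < minCar.price ? car : minCar;
  });

  console.log(`Model: ${lowestCostCar.model}`);
  console.log(`Price: $${lowestCostCar.price.toFixed(2)}`);
}
```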
Here:
- We use JavaScript’s array method `reduce()` to find the car with the lowest price in `carData`.
- It iterates through each object in the array, comparing the `price` field and keeping whichever car is cheaper.
- We then print the model and price of the cheapest car, providing a clear output.
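One caveat: called without an initial value, `reduce()` throws a `TypeError` on an empty array. Also, if the very first row's price failed to parse (`NaN`), every later comparison against it is `false` and that row is returned. A sketch that handles both cases, using a hypothetical `findLowestPriceCar` helper and made-up sample rows, might look like this:

```javascript
// Hypothetical helper: returns the car with the lowest numeric price,
// or null if the array is empty or no price is a finite number.
function findLowestPriceCar(cars) {
  // null as the initial accumulator means an empty array returns null
  // instead of throwing a TypeError
  return cars.reduce((minCar, car) => {
    // Skip NaN (failed parseFloat), undefined, and Infinity prices
    if (!Number.isFinite(car.price)) return minCar;
    return minCar === null || car.price < minCar.price ? car : minCar;
  }, null);
}

// Made-up sample rows; the first price failed to parse
const sample = [
  { model: 'Toyota Corolla', price: NaN },
  { model: 'Ford Mustang', price: 25000 },
  { model: 'Honda Civic', price: 18500.5 },
];

console.log(findLowestPriceCar(sample).model); // Honda Civic
console.log(findLowestPriceCar([]));           // null
```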
Instead of keeping every record in memory, we can evaluate each record as it is streamed in and retain only the cheapest one found so far. This is beneficial for systems with limited memory or when working with extremely large datasets.
```javascript
let lowestCostCar = null;
let lowestPrice = Infinity;
let filesProcessed = 0; // counts files whose 'end' event has fired

filenames.forEach((filename) => {
  fs.createReadStream(filename)
    .pipe(csv())
    .on('data', (row) => {
      const price = parseFloat(row.price);
      if (price < lowestPrice) {
        lowestPrice = price;
        lowestCostCar = row;
      }
    })
    .on('end', () => {
      filesProcessed += 1;
      // Report only after the last file has finished
      if (filesProcessed === filenames.length && lowestCostCar) {
        console.log(`Model: ${lowestCostCar.model}`);
        console.log(`Price: $${lowestPrice.toFixed(2)}`);
      }
    })
    .on('error', (error) => {
      console.error('Error reading file:', error);
    });
});
```
In this implementation:
- We maintain two variables: `lowestCostCar`, which stores the record of the cheapest car found so far, and `lowestPrice`, initialized to `Infinity` so that the first valid price always replaces it.
- We parse and compare the `price` field of each record as it arrives. If a record's price is lower, we update `lowestPrice` and store the record in `lowestCostCar`.
- This approach uses a constant amount of memory regardless of how many rows the files contain, because only the current cheapest record is retained.
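The same running-minimum idea can be packaged so it does not depend on how rows arrive, which makes it easy to reuse across files and to test without touching the filesystem. The sketch below is a hypothetical `createLowestPriceTracker` helper, and the sample batches are made-up rows standing in for three files.

```javascript
// Hypothetical helper: keeps only the cheapest record seen so far, so memory
// use does not grow with the number of rows processed.
function createLowestPriceTracker() {
  let lowestPrice = Infinity;
  let lowestCostCar = null;

  return {
    // Call once per parsed row (e.g. from a 'data' event handler)
    update(row) {
      const price = parseFloat(row.price);
      // parseFloat returns NaN for missing or malformed values;
      // skip anything that is not a finite number
      if (!Number.isFinite(price)) return;
      if (price < lowestPrice) {
        lowestPrice = price;
        lowestCostCar = row;
      }
    },
    // Returns the cheapest row and its numeric price, or null if none was valid
    result() {
      return lowestCostCar === null ? null : { car: lowestCostCar, price: lowestPrice };
    },
  };
}

// Made-up rows grouped into three "batches", standing in for three files
const tracker = createLowestPriceTracker();
const batches = [
  [{ model: 'Ford Mustang', price: '25000.00' }, { model: 'Honda Civic', price: '18500.50' }],
  [{ model: 'Toyota Corolla', price: 'n/a' }],
  [{ model: 'Kia Rio', price: '15999.99' }],
];
batches.forEach((batch) => batch.forEach((row) => tracker.update(row)));

const best = tracker.result();
console.log(`Model: ${best.car.model}`);         // Model: Kia Rio
console.log(`Price: $${best.price.toFixed(2)}`); // Price: $15999.99
```

In the streaming code, you would call `tracker.update(row)` from each `'data'` handler and read `tracker.result()` once the last `'end'` event fires.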
In this lesson, you learned how to:
- Read data in batches from multiple CSV files using JavaScript’s `fs` module and `csv-parser`.
- Process that data efficiently and convert data types when necessary.
- Identify specific insights, such as the car with the lowest price, utilizing JavaScript methods like `reduce()`.
Now, you're ready to apply these skills with practice exercises designed to reinforce your understanding. These exercises will challenge you to read and analyze data from similar datasets efficiently. Continuous practice is key to mastering these data handling techniques.