Lesson 3
Reading Data in Batches with JavaScript
Introduction to Reading Data in Batches

In previous lessons, you learned how to handle datasets stored in compressed formats and how to manage large datasets efficiently using JavaScript. Building on that foundation, today's lesson will teach you how to read and process data in batches from multiple CSV files. This matters because working with data in smaller chunks, or batches, keeps memory usage under control and makes your code more efficient when dealing with large datasets.

Our focus in this lesson will be on a practical scenario where a dataset containing car information is spread across multiple files. You will learn to read, process, and analyze this data to extract meaningful insights, such as determining the car with the lowest price.

Understanding CSV Data Structure

In this lesson, we'll work with a set of CSV files containing car data. Here's what a typical record might look like:

  • Model: Ford Mustang
  • Transmission: Automatic
  • Year: 2020
  • Price: 25000.00
  • Distance Traveled (km): 50000
  • Color: Red

The dataset is split across several files so it can be processed in batches, and understanding this structure is essential before you can read and process the data efficiently.
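
When these files are parsed later in the lesson with the csv-parser library, each record arrives as a plain JavaScript object keyed by the column headers, with every value still a string. As a sketch (the exact header names are an assumption), the record above would look like this:

JavaScript
// A parsed row as produced by csv-parser (header names are assumed)
const exampleRow = {
  model: 'Ford Mustang',
  transmission: 'Automatic',
  year: '2020',
  price: '25000.00',              // still a string; we convert it later with parseFloat
  distance_traveled_km: '50000',  // column name is a guess
  color: 'Red'
};
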

Implementing Batch Reading of CSV Files

Let's delve into reading these CSV files in batches using the fs module and csv-parser library in JavaScript. We'll build our solution step by step.

First, we need to specify the filenames for our CSV files and prepare a data structure to hold the combined data.

JavaScript
const fs = require('fs');
const csv = require('csv-parser');

// Filenames to read
const filenames = ['data_part1.csv', 'data_part2.csv', 'data_part3.csv'];

// List to store all car data
let carData = [];

Here, we initialize a list of filenames and create an empty array carData to store all the car data read from the files.

Read Data from Each File

Next, we'll loop through each filename, read the data, and append it to our carData array.

JavaScript
let filesRead = 0;
filenames.forEach((filename) => {
  fs.createReadStream(filename)
    .pipe(csv())
    .on('data', (row) => {
      // Convert price from string to float for comparison
      row.price = parseFloat(row.price);
      carData.push(row);
    })
    .on('end', () => {
      filesRead += 1;
      if (filesRead === filenames.length) {
        // Process the combined data here
      }
    })
    .on('error', (error) => {
      console.error('Error reading file:', error);
    });
});

In this snippet:

  • We employ fs.createReadStream() to open each CSV file.
  • The csv-parser library reads each line, converting it from CSV directly to a JavaScript object.
  • We parse row.price from a string to a float for numerical comparison and store the entire row in carData.
  • The end event handler increments a counter so we know when every file has been processed; a Promise-based alternative is sketched below.
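
If you prefer Promises over a manual counter, the same batch read can be wrapped so that all files are awaited together. This is only a sketch built on the filenames array and csv-parser setup from above; readFilePart is a hypothetical helper name.

JavaScript
// Sketch: wrap each file read in a Promise and wait for all of them
function readFilePart(filename) {
  return new Promise((resolve, reject) => {
    const rows = [];
    fs.createReadStream(filename)
      .pipe(csv())
      .on('data', (row) => {
        row.price = parseFloat(row.price); // same string-to-number conversion
        rows.push(row);
      })
      .on('end', () => resolve(rows))
      .on('error', reject);
  });
}

Promise.all(filenames.map(readFilePart))
  .then((parts) => {
    const carData = parts.flat(); // combine all batches into one array
    // Process the combined data here
  })
  .catch((error) => console.error('Error reading file:', error));
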
Finding the Car with the Lowest Price

After reading all the data into carData, the next step is identifying the car with the lowest price.

JavaScript
const lowestCostCar = carData.reduce((minCar, car) => {
  return car.price < minCar.price ? car : minCar;
});

console.log(`Model: ${lowestCostCar.model}`);
console.log(`Price: $${lowestCostCar.price.toFixed(2)}`);

Here:

  • We use JavaScript’s reduce() function to find the car with the lowest price in carData.
  • This method iterates through each object in the array, comparing the price field.
  • We then print the model and price of the cheapest car; a defensive variant that handles an empty dataset is sketched below.
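
One caveat: reduce() without an initial value throws a TypeError on an empty array. If there is any chance the files contained no rows, a small guard avoids the crash. A minimal sketch:

JavaScript
// Guard against an empty dataset before calling reduce()
if (carData.length === 0) {
  console.log('No car data was read from the files.');
} else {
  const lowestCostCar = carData.reduce((minCar, car) =>
    car.price < minCar.price ? car : minCar
  );
  console.log(`Model: ${lowestCostCar.model}`);
  console.log(`Price: $${lowestCostCar.price.toFixed(2)}`);
}
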
Streaming Approach: Finding the Car with the Lowest Price Without Loading All Data into Memory

Instead of loading all data into memory, a streaming approach can process each record as it's read. This is beneficial for systems with limited memory or when working with extremely large datasets.

JavaScript
let lowestCostCar = null;
let lowestPrice = Infinity;
let filesProcessed = 0; // tracks how many files have finished streaming

filenames.forEach((filename) => {
  fs.createReadStream(filename)
    .pipe(csv())
    .on('data', (row) => {
      const price = parseFloat(row.price);
      if (price < lowestPrice) {
        lowestPrice = price;
        lowestCostCar = row;
      }
    })
    .on('end', () => {
      filesProcessed += 1;
      if (filesProcessed === filenames.length && lowestCostCar) {
        console.log(`Model: ${lowestCostCar.model}`);
        console.log(`Price: $${lowestPrice.toFixed(2)}`);
      }
    })
    .on('error', (error) => {
      console.error('Error reading file:', error);
    });
});

In this implementation:

  • We maintain lowestCostCar to hold the cheapest record seen so far, lowestPrice initialized to Infinity, and a filesProcessed counter to detect when every file has finished.
  • For each record, we parse the price field and compare it against lowestPrice. If it is lower, we update lowestPrice and store the record in lowestCostCar.
  • This approach keeps memory usage low because we never retain more than the current best record; the same single-pass pattern extends to other aggregates, as sketched below.
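
As a sketch with hypothetical variable names (totalPrice, carCount), here is how the same single-pass pattern could compute a running average price without ever storing the rows:

JavaScript
// Sketch: collect another aggregate (average price) in a single pass
let totalPrice = 0;
let carCount = 0;
let partsDone = 0;

filenames.forEach((filename) => {
  fs.createReadStream(filename)
    .pipe(csv())
    .on('data', (row) => {
      totalPrice += parseFloat(row.price);
      carCount += 1;
    })
    .on('end', () => {
      partsDone += 1;
      if (partsDone === filenames.length && carCount > 0) {
        console.log(`Average price: $${(totalPrice / carCount).toFixed(2)}`);
      }
    })
    .on('error', (error) => console.error('Error reading file:', error));
});
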
Summary and Practice Preparation

In this lesson, you learned how to:

  • Read data in batches from multiple CSV files using JavaScript’s fs module and csv-parser.
  • Process that data efficiently and convert data types when necessary.
  • Identify specific insights, such as the car with the lowest price, utilizing JavaScript methods like reduce().

Now, you're ready to apply these skills with practice exercises designed to reinforce your understanding. These exercises will challenge you to read and analyze data from similar datasets efficiently. Continuous practice is key to mastering these data handling techniques.

Enjoy this lesson? Now it's time to practice with Cosmo!
Practice is how you turn knowledge into actual skills.