Lesson 3
Introduction to Reading Data in Batches with TypeScript

In previous lessons, you learned how to handle datasets stored in compressed formats and manage large datasets efficiently using TypeScript. Building on that foundation, this lesson shows you how to read and process data in batches from multiple CSV files. Processing data in smaller chunks, or batches, keeps memory usage under control and lets your program start working on records before the entire dataset has been read, which matters when datasets are large.

Our focus in this lesson will be on a practical scenario where a dataset containing car information is spread across multiple files. You will learn to read, process, and analyze this data to extract meaningful insights, such as determining the car with the lowest price.

Understanding CSV Data Structure

In this lesson, we'll work with a set of CSV files containing car data. Here's what a typical record might look like:

  • Model: Ford Mustang
  • Transmission: Automatic
  • Year: 2020
  • Price: 25000.00
  • Distance Traveled (km): 50000
  • Color: Red

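The data is split across several files so that each file can be read and processed as a separate batch. The code in this lesson assumes each file starts with a header row. csv-parser uses the header names as the keys of each parsed row, and the code reads the `model` and `price` keys, so the headers must match those names exactly (keys are case-sensitive).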
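Every field csv-parser produces is a string, so numeric columns have to be converted before you can compare them. Here is a minimal sketch of a row converter. It assumes lowercase `model` and `price` headers, matching the code later in this lesson, and `parseCarRow` is a hypothetical helper name. It also rejects rows whose price does not parse:

```typescript
interface Car {
  model: string;
  price: number;
}

// csv-parser yields every field as a string, keyed by its header name.
// Hypothetical helper: assumes lowercase "model" and "price" headers.
function parseCarRow(row: Record<string, string>): Car | null {
  const price = parseFloat(row.price);
  // parseFloat returns NaN for empty or non-numeric text; reject such rows
  if (!row.model || Number.isNaN(price)) {
    return null;
  }
  return { model: row.model, price };
}

console.log(parseCarRow({ model: 'Ford Mustang', price: '25000.00' }));
// { model: 'Ford Mustang', price: 25000 }
console.log(parseCarRow({ model: 'Ford Mustang', price: '' })); // null
```

Note that `parseFloat` is lenient: it reads leading numeric text and ignores the rest, so `'123abc'` becomes `123`. If you need stricter validation, check the raw string before converting it.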

Implementing Batch Reading of CSV Files

Let's read these CSV files in batches using Node.js's fs module and the csv-parser library, building the solution step by step.

First, we need to specify the filenames for our CSV files and prepare a data structure to hold the combined data.

TypeScript
```typescript
import fs from 'fs';
import csvParser from 'csv-parser';

// Filenames to read
const filenames: string[] = ['data_part1.csv', 'data_part2.csv', 'data_part3.csv'];

// Array to store all car data
const carData: { model: string; price: number }[] = [];
```

Here, we define an array of filenames and create a typed array carData to store all the car data read from the files.

Read Data from Each File

Next, we'll loop through each filename, read the data, and append it to our carData array.

TypeScript
```typescript
let filesRead: number = 0;

filenames.forEach((filename) => {
  const fileStream = fs.createReadStream(filename);
  // pipe() does not forward errors, so listen on the file stream as well
  fileStream.on('error', (error) => {
    console.error(`Error reading ${filename}:`, error);
  });

  fileStream
    .pipe(csvParser())
    .on('data', (row: { model: string; price: string }) => {
      // Convert price from string to number for comparison
      const car = {
        model: row.model,
        price: parseFloat(row.price),
      };
      carData.push(car);
    })
    .on('end', () => {
      filesRead += 1;
      if (filesRead === filenames.length) {
        // Process the combined data here
      }
    })
    .on('error', (error) => {
      console.error(`Error parsing ${filename}:`, error);
    });
});
```

In this snippet:

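  • We use fs.createReadStream() to open each CSV file as a stream, so the file is read in chunks instead of being loaded all at once.
  • csv-parser turns each line into a plain JavaScript object whose keys come from the file's header row.
  • We convert row.price from a string to a number with parseFloat so prices can be compared numerically. Then we push an object containing only the model and price into carData.
  • Because forEach starts every stream immediately, the files are read concurrently and their rows may be interleaved in carData. The end handler increments filesRead as each file finishes. Once filesRead equals filenames.length, every file has been read and the combined data is ready. If a file fails to open or parse, its end event typically does not fire, so that final step never runs.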
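The counter works, but a Promise-based structure makes the "all files have finished" condition explicit and lets errors reach the caller. The sketch below is an alternative to the lesson's counter approach, not a replacement the lesson prescribes. `collectRows` and `readAllCars` are hypothetical names, and each source is assumed to be an object-mode stream such as `fs.createReadStream(filename).pipe(csvParser())`:

```typescript
import { Readable } from 'stream';

interface RawRow {
  model: string;
  price: string;
}

interface Car {
  model: string;
  price: number;
}

// Resolve with every row a parsed stream emits; reject on the first error.
function collectRows(stream: Readable): Promise<RawRow[]> {
  return new Promise((resolve, reject) => {
    const rows: RawRow[] = [];
    stream
      .on('data', (row: RawRow) => rows.push(row))
      .on('end', () => resolve(rows))
      .on('error', reject);
  });
}

// Read all sources concurrently; Promise.all keeps results in the order of `sources`.
async function readAllCars(sources: Readable[]): Promise<Car[]> {
  const perSource = await Promise.all(sources.map((s) => collectRows(s)));
  const rows = ([] as RawRow[]).concat(...perSource);
  return rows.map((row) => ({ model: row.model, price: parseFloat(row.price) }));
}
```

With the lesson's files, you could call it as `readAllCars(filenames.map((f) => fs.createReadStream(f).pipe(csvParser())))`. Because `pipe()` does not forward errors from the file stream, a missing file would still need its own error listener, or you could connect the streams with `stream.pipeline` instead.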

Processing the Combined Data: Finding the Lowest-Cost Car Using `reduce`

After combining the data into the carData array, we can find the car with the lowest price using the reduce method:

TypeScript
```typescript
// This block belongs inside the 'end' handler above,
// where "Process the combined data here" appears
if (filesRead === filenames.length) {
  const lowestCostCar = carData.reduce<{ model: string; price: number } | null>((lowest, car) => {
    if (!lowest || car.price < lowest.price) {
      return car;
    }
    return lowest;
  }, null);

  if (lowestCostCar) {
    console.log(`Model: ${lowestCostCar.model}`);
    console.log(`Price: $${lowestCostCar.price.toFixed(2)}`);
  }
}
```

The reduce method walks through the carData array once. Its initial value is null, so the first car always becomes the current lowest. For each later car, reduce compares that car's price with the price of the current lowest, and if the new price is lower, that car takes its place. When reduce finishes, lowestCostCar holds the cheapest car (the first one encountered if several share the lowest price), or null if carData is empty. That is why we check it before logging. This is a single O(n) pass with no sorting required.
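One weakness of the snippet above is that a price that failed to parse (NaN) can become the "lowest" if it appears first, because every later comparison with NaN is false. Here is a minimal sketch of the same reduce pattern with a NaN guard added. The guard is an addition not in the lesson's code, `findLowestCostCar` is a hypothetical name, and the sample rows other than the Ford Mustang are made up:

```typescript
interface Car {
  model: string;
  price: number;
}

// Same reduce pattern as above, plus one addition: rows whose price
// did not parse (NaN) are skipped so they can never become the minimum.
function findLowestCostCar(cars: Car[]): Car | null {
  return cars.reduce<Car | null>((lowest, car) => {
    if (Number.isNaN(car.price)) return lowest;
    return lowest === null || car.price < lowest.price ? car : lowest;
  }, null);
}

// Made-up rows for illustration
const sample: Car[] = [
  { model: 'Ford Mustang', price: 25000 },
  { model: 'Sample B', price: NaN },
  { model: 'Sample C', price: 18000 },
];

console.log(findLowestCostCar(sample)?.model); // "Sample C"
console.log(findLowestCostCar([])); // null
```

Trace: Ford Mustang replaces the initial null, Sample B is skipped, and Sample C (18000 < 25000) becomes the result.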

Streaming Approach: Finding the Car with the Lowest Price Without Loading All Data into Memory

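Instead of loading all data into memory, a streaming approach processes each record as it is read and keeps only what it needs. Because only the current cheapest record is retained, memory use stays constant no matter how many rows the files contain. This makes the approach well suited to memory-constrained systems and very large datasets.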

TypeScript
```typescript
let lowestCostCar: { model: string; price: number } | null = null;
let lowestPrice: number = Infinity;
// Standalone version of the reading loop, so it declares its own counter
let filesRead: number = 0;

filenames.forEach((filename) => {
  const fileStream = fs.createReadStream(filename);
  // pipe() does not forward errors, so listen on the file stream as well
  fileStream.on('error', (error) => {
    console.error(`Error reading ${filename}:`, error);
  });

  fileStream
    .pipe(csvParser())
    .on('data', (row: { model: string; price: string }) => {
      const price = parseFloat(row.price);
      // Keep only the cheapest row seen so far; every other row is discarded
      if (price < lowestPrice) {
        lowestPrice = price;
        lowestCostCar = { model: row.model, price };
      }
    })
    .on('end', () => {
      filesRead += 1;
      if (filesRead === filenames.length && lowestCostCar) {
        console.log(`Model: ${lowestCostCar.model}`);
        console.log(`Price: $${lowestPrice.toFixed(2)}`);
      }
    })
    .on('error', (error) => {
      console.error(`Error parsing ${filename}:`, error);
    });
});
```

In this implementation:

  • We maintain two variables. lowestCostCar holds the model and price of the cheapest car seen so far. lowestPrice is initialized to Infinity, so the first valid price is always lower.
  • For each record, we parse the price field and compare it with lowestPrice. If it is lower, we update lowestPrice and replace lowestCostCar. A price that fails to parse becomes NaN, and because any comparison with NaN is false, such rows are skipped.
  • Each row is discarded once it has been compared, so memory use does not grow with the size of the dataset.
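The streaming logic can also be wrapped so that callers receive a Promise for the result instead of checking a counter. Here is a minimal sketch, assuming object-mode sources like those csv-parser produces. `lowestPriceFromStreams` is a hypothetical name. Unlike the snippet above, it skips unparseable prices with an explicit check and rejects if any source emits an error:

```typescript
import { Readable } from 'stream';

interface Car {
  model: string;
  price: number;
}

// Resolve with the cheapest car across all sources, or null if no row had a valid price.
// Only the running minimum is kept; every other row is dropped after comparison.
function lowestPriceFromStreams(sources: Readable[]): Promise<Car | null> {
  let lowest: Car | null = null;

  const finished = sources.map(
    (stream) =>
      new Promise<void>((resolve, reject) => {
        stream
          .on('data', (row: { model: string; price: string }) => {
            const price = parseFloat(row.price);
            if (Number.isNaN(price)) return; // skip rows whose price does not parse
            if (lowest === null || price < lowest.price) {
              lowest = { model: row.model, price };
            }
          })
          .on('end', () => resolve())
          .on('error', reject);
      })
  );

  // The result is read only after every source has ended
  return Promise.all(finished).then(() => lowest);
}
```

As with the lesson's code, each source could be `fs.createReadStream(filename).pipe(csvParser())`. The error-forwarding caveat about `pipe()` still applies.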

Summary and Practice Preparation

In this lesson, you learned how to:

  • Read data in batches from multiple CSV files using Node.js's fs streams and csv-parser.
  • Convert fields such as price from strings to numbers so they can be compared.
  • Find specific insights, such as the car with the lowest price, either by collecting all rows and applying reduce or by streaming the files and keeping only a running minimum.

Now, you're ready to apply these skills with practice exercises designed to reinforce your understanding. These exercises will challenge you to read and analyze data from similar datasets efficiently. Continuous practice is key to mastering these data handling techniques.
