In previous lessons, you learned how to handle datasets stored in compressed formats and how to manage large datasets efficiently in TypeScript. Building on that foundation, today's lesson teaches you how to read and process data in batches from multiple CSV files. This matters because working with data in smaller chunks, or batches, keeps your code efficient and responsive when dealing with large datasets.
Our focus in this lesson will be on a practical scenario where a dataset containing car information is spread across multiple files. You will learn to read, process, and analyze this data to extract meaningful insights, such as determining the car with the lowest price.
In this lesson, we'll work with a set of CSV files containing car data. Here's what a typical record might look like:
- Model: Ford Mustang
- Transmission: Automatic
- Year: 2020
- Price: 25000.00
- Distance Traveled (km): 50000
- Color: Red
The dataset is split across multiple files to allow batch processing, and understanding its structure is essential as you learn to read and process it efficiently.
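For reference, a file such as `data_part1.csv` might look like the sample below. This exact layout is an assumption for illustration only, but note that the code in this lesson expects lowercase column headers such as `model` and `price`:

```csv
model,transmission,year,price,distance_traveled,color
Ford Mustang,Automatic,2020,25000.00,50000,Red
Honda Civic,Manual,2019,18000.00,60000,Blue
```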
Let's delve into reading these CSV files in batches using TypeScript's `import` syntax and the `csv-parser` library, which you can install from npm. We'll build our solution step by step.
First, we need to specify the filenames for our CSV files and prepare a data structure to hold the combined data.
```typescript
import fs from 'fs';
import csvParser from 'csv-parser';

// Filenames to read
const filenames: string[] = ['data_part1.csv', 'data_part2.csv', 'data_part3.csv'];

// List to store all car data
let carData: { model: string; price: number }[] = [];
```
Here, we define an array of filenames and create a typed array `carData` to store all the car data read from the files.
Next, we'll loop through each filename, read its data, and append it to our `carData` array.
```typescript
let filesRead: number = 0;

filenames.forEach((filename) => {
  fs.createReadStream(filename)
    .pipe(csvParser())
    .on('data', (row: { model: string; price: string }) => {
      // Convert price from string to float for comparison
      const car = {
        model: row.model,
        price: parseFloat(row.price)
      };
      carData.push(car);
    })
    .on('end', () => {
      filesRead += 1;
      if (filesRead === filenames.length) {
        // Process the combined data here
      }
    })
    .on('error', (error) => {
      console.error('Error reading file:', error);
    });
});
```
In this snippet:

- We employ `fs.createReadStream()` to open each CSV file.
- The `csv-parser` library reads each line, converting it from CSV directly into a TypeScript object.
- We parse `row.price` from a string to a float for numerical comparison and store the entire row in `carData`.
- The `end` handler increments the `filesRead` counter and checks whether all files have been processed before working with the combined data (a Promise-based alternative is sketched just after this list).
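If you prefer not to track a counter by hand, one common alternative, shown here only as a sketch rather than part of the lesson's solution, is to wrap each file read in a Promise and wait for all of them with `Promise.all`:

```typescript
// Sketch only: wraps each stream in a Promise so Promise.all can signal
// when every file has finished. Assumes the same fs, csvParser, filenames,
// and carData definitions from the snippets above.
const readFile = (filename: string): Promise<void> =>
  new Promise((resolve, reject) => {
    fs.createReadStream(filename)
      .pipe(csvParser())
      .on('data', (row: { model: string; price: string }) => {
        carData.push({ model: row.model, price: parseFloat(row.price) });
      })
      .on('end', () => resolve())
      .on('error', reject);
  });

Promise.all(filenames.map(readFile))
  .then(() => {
    // All files have been read; process the combined data here
  })
  .catch((error) => console.error('Error reading file:', error));
```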
After combining the data into the `carData` array, we can find the car with the lowest price using the `reduce` method:
```typescript
if (filesRead === filenames.length) {
  const lowestCostCar = carData.reduce<{ model: string; price: number } | null>((lowest, car) => {
    if (!lowest || car.price < lowest.price) {
      return car;
    }
    return lowest;
  }, null);

  if (lowestCostCar) {
    console.log(`Model: ${lowestCostCar.model}`);
    console.log(`Price: $${lowestCostCar.price.toFixed(2)}`);
  }
}
```
The `reduce` method traverses the `carData` array. It starts with an initial value of `null`, so the first car it encounters automatically becomes the initial lowest. For each subsequent car, it checks whether the current car's price is lower than the stored `lowest` price; if so, it updates `lowest`. The result, `lowestCostCar`, is the car with the lowest price, which we then log. This approach is efficient and clear for accumulating a specific result from an array.
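To see the `null` seed in action on its own, here is a small, self-contained example using hypothetical hard-coded values (the data is made up purely for illustration):

```typescript
// Hypothetical sample data, just to illustrate how the null seed behaves
const sampleCars: { model: string; price: number }[] = [
  { model: 'Ford Mustang', price: 25000 },
  { model: 'Honda Civic', price: 18000 },
  { model: 'Toyota Camry', price: 21000 },
];

const cheapest = sampleCars.reduce<{ model: string; price: number } | null>(
  (lowest, car) => (!lowest || car.price < lowest.price ? car : lowest),
  null
);

console.log(cheapest); // { model: 'Honda Civic', price: 18000 }
```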
Instead of loading all data into memory, a streaming approach can process each record as it's read. This is beneficial for systems with limited memory or when working with extremely large datasets.
```typescript
let lowestCostCar: { model: string; price: number } | null = null;
let lowestPrice: number = Infinity;
let filesRead: number = 0; // completion counter, as in the previous snippet

filenames.forEach((filename) => {
  fs.createReadStream(filename)
    .pipe(csvParser())
    .on('data', (row: { model: string; price: string }) => {
      const price = parseFloat(row.price);
      if (price < lowestPrice) {
        lowestPrice = price;
        lowestCostCar = {
          model: row.model,
          price: price
        };
      }
    })
    .on('end', () => {
      filesRead += 1;
      if (filesRead === filenames.length && lowestCostCar) {
        console.log(`Model: ${lowestCostCar.model}`);
        console.log(`Price: $${lowestPrice.toFixed(2)}`);
      }
    })
    .on('error', (error) => {
      console.error('Error reading file:', error);
    });
});
```
In this implementation:

- We maintain two tracking variables: `lowestCostCar` to store the data of the car with the lowest price, and `lowestPrice`, initialized to `Infinity`.
- We parse and compare the `price` field for each record in each file. If a record's price is lower, we update `lowestPrice` and store the record in `lowestCostCar`.
- This approach reduces memory usage since we do not retain unnecessary data.
In this lesson, you learned how to:

- Read data in batches from multiple CSV files using TypeScript's `fs` module and `csv-parser`.
- Process that data efficiently and convert data types when necessary.
- Extract specific insights, such as the car with the lowest price, using type annotations and robust data handling techniques.
Now, you're ready to apply these skills with practice exercises designed to reinforce your understanding. These exercises will challenge you to read and analyze data from similar datasets efficiently. Continuous practice is key to mastering these data handling techniques.