Lesson 4
Reading and Processing CSV Data in Batches with Java
Introduction to Reading Data in Batches with Java

In previous lessons, you learned how to efficiently handle datasets stored in compressed formats using Java. Building on that foundation, today's lesson will teach you how to read and process data in batches from multiple CSV files using Java. This matters because splitting a large dataset into smaller chunks, or batches, lets you read and process one manageable piece at a time instead of handling a single huge file.

Our focus in this lesson will be on a practical scenario where a dataset containing car information is spread across multiple files. You will learn to read, process, and analyze this data to extract meaningful insights, such as determining the car with the lowest price.

Understanding CSV Data Structure

In this lesson, we will work with a set of CSV files containing car data. Here's an example of the CSV format:

csv
model,price,transmission,year,distance_traveled_km,color
Chevrolet Silverado,43725.23,Manual,2013,55504,Silver
BMW X5,5643.78,Semi-Automatic,2014,11902,Red
Honda Accord,42850.79,Manual,2010,102223,Black
BMW Series 3,53359.81,Automatic,2009,237231,Gray
...

A typical record might look like this:

  • Model: Chevrolet Silverado
  • Price: 43725.23
  • Transmission: Manual
  • Year: 2013
  • Distance Traveled (km): 55504
  • Color: Silver

These files are divided into multiple parts to allow batch processing, and understanding their structure is crucial as you learn to read and process them efficiently.
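
To make the structure concrete, here is a minimal plain-Java sketch that pairs each header name with the value in the same column of one record. The class and method names (CsvRowSketch, toFields) are hypothetical and not part of the lesson's code, and the sketch assumes simple comma-separated values with no quoted fields, which holds for the sample rows above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CsvRowSketch {

    // Pairs each header name with the value in the same column of one record.
    // Assumption: plain comma-separated values with no quoted fields.
    static Map<String, String> toFields(String headerLine, String recordLine) {
        String[] names = headerLine.split(",");
        String[] values = recordLine.split(",", -1); // -1 keeps trailing empty columns
        if (names.length != values.length) {
            throw new IllegalArgumentException(
                    "Expected " + names.length + " columns but found " + values.length);
        }
        Map<String, String> fields = new LinkedHashMap<>(); // preserves column order
        for (int i = 0; i < names.length; i++) {
            fields.put(names[i], values[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        String header = "model,price,transmission,year,distance_traveled_km,color";
        String record = "Chevrolet Silverado,43725.23,Manual,2013,55504,Silver";
        // Print each column name next to its value
        toFields(header, record)
                .forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Every value starts out as text; numeric columns such as price and year still have to be converted to double or int before they can be compared.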

Setting Up for CSV File Batch Reading

To effectively read and process CSV files in batches, we need to set up our environment by defining the necessary classes and data structures. We will also specify the filenames for our CSV data files.

First, we'll define a Car class to map each row of the CSV file into a Java object. This class includes fields that correspond to the columns in the CSV file, with field names matching the header names so that Jackson can bind each column by name. The class also needs a no-argument (default) constructor, because Jackson first creates an empty Car instance and then populates its fields during data binding. The constructor is declared explicitly here so that it remains available if other constructors are added later.

Java
// Class to represent a car
public class Car {
    public String model;
    public double price;
    public String transmission;
    public int year;
    public int distance_traveled_km;
    public String color;

    // Default constructor needed for Jackson
    public Car() {}
}

Next, we declare an array filenames that will hold the names of the CSV files we want to read. Additionally, we create an ArrayList<Car> named carData to store all the car data read from these files, enabling us to process and analyze this data collectively.

Java
// Filenames to read
String[] filenames = {"data_part1.csv", "data_part2.csv", "data_part3.csv"};

// List to store all car data
ArrayList<Car> carData = new ArrayList<>();

By setting up these structures, we prepare to efficiently read and store data from multiple CSV files, allowing us to handle large datasets by processing information in manageable batches.

Reading Data from Each File

Now we'll configure a CsvMapper and a CsvSchema from Jackson's CSV module, then loop through each filename, map the rows of that file to Car objects, and append them to our carData list.

Java
import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;
import java.io.File;

// Configure CsvMapper and schema using header
CsvMapper csvMapper = new CsvMapper();
CsvSchema csvSchema = CsvSchema.emptySchema().withHeader(); // use first row as header

// Iterate over each filename in the filenames array
// (readValues throws IOException, so the enclosing method must declare or handle it)
for (String filename : filenames) {
    // Create a File object for the current CSV file
    File csvFile = new File(filename);

    // Create a MappingIterator to iterate over the car records
    MappingIterator<Car> carIterator = csvMapper.readerFor(Car.class)
            .with(csvSchema)
            .readValues(csvFile);

    // Add each car record to the ArrayList
    while (carIterator.hasNext()) {
        Car car = carIterator.next();
        carData.add(car);
    }
}

In this code:

  • We utilize CsvMapper and CsvSchema to map CSV records to Java objects.
  • We read each CSV file and generate a MappingIterator to iterate through car records, leveraging the Car class to map each CSV row into a Java object.
  • Each car record is added to our carData ArrayList.
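
The loop above collects every row into carData before any analysis happens. If only a single summary value is needed, one possible variation is to process each file as its own batch and keep just a running result, so no more than one line is held in memory at a time. The sketch below illustrates that idea with plain JDK file reading rather than Jackson, so it runs without extra dependencies. The names BatchMinSketch, Cheapest, and findCheapest are hypothetical, and the sketch assumes the column order from the sample header (model first, price second) and no quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

final class BatchMinSketch {

    // Holds only the two columns this sketch needs (hypothetical helper type).
    static final class Cheapest {
        final String model;
        final double price;

        Cheapest(String model, double price) {
            this.model = model;
            this.price = price;
        }
    }

    // Reads the files one at a time and keeps only the cheapest row seen so far.
    // Returns null when no file contains a data row.
    static Cheapest findCheapest(List<Path> files) throws IOException {
        Cheapest best = null;
        for (Path file : files) {
            try (BufferedReader reader = Files.newBufferedReader(file)) {
                if (reader.readLine() == null) {
                    continue; // empty file: no header, no rows
                }
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.trim().isEmpty()) {
                        continue; // skip blank lines
                    }
                    // Assumption: model is column 0 and price is column 1
                    String[] columns = line.split(",");
                    double price = Double.parseDouble(columns[1]);
                    if (best == null || price < best.price) {
                        best = new Cheapest(columns[0], price);
                    }
                }
            }
        }
        return best;
    }
}
```

Whether this trade-off is worthwhile depends on the task: collecting everything into a list, as the lesson does, is simpler when several different analyses will run over the same data.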

Finding the Car with the Lowest Price

With all data combined in carData, the next step is identifying the car with the lowest price in Java.

Java
// Check if car data is not empty before finding the lowest cost car
if (!carData.isEmpty()) {
    // Initialize with the first car in the list
    Car lowestCostCar = carData.get(0);

    // Iterate through the car list to find the car with the lowest price
    for (Car car : carData) {
        if (car.price < lowestCostCar.price) {
            lowestCostCar = car;
        }
    }

    // Output the model and price of the lowest cost car
    System.out.println("Model: " + lowestCostCar.model);
    System.out.println("Price: $" + lowestCostCar.price);
} else {
    // If no valid car data is available, output an appropriate message
    System.out.println("No valid car data available.");
}

Here:

  • We initialize lowestCostCar with the first car in carData.
  • A loop evaluates each car to find the one with the minimum price.
  • Finally, we print the model and price of the car with the lowest price.
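
The same search can also be written with the Java Stream API, which folds the empty-list check into an Optional result. This is an alternative sketch, not the lesson's method: LowestPriceStreamSketch and cheapest are hypothetical names, and the nested Car class is a trimmed-down stand-in for the lesson's Car that keeps only the two fields used here.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class LowestPriceStreamSketch {

    // Trimmed-down stand-in for the lesson's Car class (assumption for this sketch).
    static final class Car {
        final String model;
        final double price;

        Car(String model, double price) {
            this.model = model;
            this.price = price;
        }
    }

    // Returns the car with the smallest price, or an empty Optional for an empty list.
    static Optional<Car> cheapest(List<Car> cars) {
        return cars.stream().min(Comparator.comparingDouble((Car car) -> car.price));
    }

    public static void main(String[] args) {
        List<Car> cars = Arrays.asList(
                new Car("Chevrolet Silverado", 43725.23),
                new Car("BMW X5", 5643.78),
                new Car("Honda Accord", 42850.79));

        Optional<Car> result = cheapest(cars);
        if (result.isPresent()) {
            System.out.println("Model: " + result.get().model);
            System.out.println("Price: $" + result.get().price);
        } else {
            System.out.println("No valid car data available.");
        }
    }
}
```

The explicit loop makes each comparison visible, which suits a first walkthrough. The stream version is shorter and makes the "no data" case part of the return type.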

Summary and Practice Preparation

In this lesson, you have learned how to:

  • Read data in batches from multiple CSV files using Java and Jackson's CSV module.
  • Process the data efficiently by defining schemas and mapping CSV records to Java objects.
  • Identify insights, such as the car with the lowest price, using loops to evaluate data elements.

These techniques prepare you to handle similar multi-file datasets using Java. Practice these skills with exercises designed to reinforce your understanding of reading, combining, and analyzing CSV data with Jackson's CSV module.
