In previous lessons, you learned how to handle datasets stored in compressed formats and manage large numerical datasets efficiently. Building on that foundation, today's lesson will teach you how to read and process data in batches from multiple CSV files using Go. This matters because working with data in smaller chunks, or batches, can keep memory use manageable and makes a large dataset easier to process piece by piece.
Our focus in this lesson will be on a practical scenario where a dataset containing car information is spread across multiple files. You will learn to read, process, and analyze this data to extract meaningful insights, such as determining the car with the lowest price.
In this lesson, we will work with a set of CSV files containing car data. Here's what a typical CSV file might look like:
```csv
transmission,price,color,year,model,distance_traveled_km
Automatic,60383.80,Silver,2013,Ford Focus,10437
Manual,82471.28,White,2011,Toyota Corolla,221662
Automatic,52266.72,Black,2012,BMW Series 5,30296
...
```
Each line represents a car record with the following attributes:
- Transmission: Type of transmission (e.g., Automatic, Manual)
- Price: The price of the car
- Color: The color of the car
- Year: The manufacturing year of the car
- Model: The model of the car
- Distance Traveled (km): Kilometers the car has traveled
These files are divided into multiple parts to allow batch processing, and understanding their structure is crucial as you learn to read and process them efficiently.
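To make the column layout concrete, here is a minimal sketch that maps one row onto a struct with all six attributes. The `CarRecord` type and the `parseCarRecord` helper are hypothetical names introduced for illustration; the lesson's own code keeps only the model and price.

```go
package main

import (
	"fmt"
	"strconv"
)

// CarRecord is a hypothetical struct mirroring the six CSV columns.
type CarRecord struct {
	Transmission string
	Price        float64
	Color        string
	Year         int
	Model        string
	DistanceKm   int
}

// parseCarRecord converts one CSV row (already split into fields) into a CarRecord.
func parseCarRecord(fields []string) (CarRecord, error) {
	if len(fields) != 6 {
		return CarRecord{}, fmt.Errorf("expected 6 fields, got %d", len(fields))
	}
	price, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return CarRecord{}, fmt.Errorf("invalid price %q: %w", fields[1], err)
	}
	year, err := strconv.Atoi(fields[3])
	if err != nil {
		return CarRecord{}, fmt.Errorf("invalid year %q: %w", fields[3], err)
	}
	distance, err := strconv.Atoi(fields[5])
	if err != nil {
		return CarRecord{}, fmt.Errorf("invalid distance %q: %w", fields[5], err)
	}
	return CarRecord{
		Transmission: fields[0],
		Price:        price,
		Color:        fields[2],
		Year:         year,
		Model:        fields[4],
		DistanceKm:   distance,
	}, nil
}

func main() {
	// Sample row taken from the CSV excerpt above.
	row := []string{"Automatic", "60383.80", "Silver", "2013", "Ford Focus", "10437"}
	rec, err := parseCarRecord(row)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	fmt.Printf("%s (%d): $%.2f, %d km\n", rec.Model, rec.Year, rec.Price, rec.DistanceKm)
}
```

Returning an error instead of silently skipping the row lets the caller decide whether a malformed record should be logged, counted, or ignored.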
Now, let's delve into reading these CSV files in batches using Go constructs. We'll build our solution step-by-step.
First, we need to specify the filenames for our CSV files and prepare a data structure to hold the combined data.
```go
package main

import (
	"encoding/csv"
	"fmt"
	"os"
	"strconv"
)

// Struct to represent a car
type Car struct {
	Model string
	Price float64
}

func main() {
	// Filenames to read
	filenames := []string{"data_part1.csv", "data_part2.csv", "data_part3.csv"}

	// Slice to store all car data
	var carData []Car
```
Here, we declare a slice `filenames` to hold the names of the CSV files and a slice of `Car` structs to store the car data read from the files.
Now, we'll loop through each filename, read the data using `os.Open` and `csv.NewReader`, and append it to our `carData` slice.
```go
	for _, filename := range filenames {
		file, err := os.Open(filename)
		if err != nil {
			fmt.Println("Error opening file:", filename, err)
			continue
		}

		reader := csv.NewReader(file)
		records, err := reader.ReadAll()
		// Close right away: a defer inside the loop would not run until main returns
		file.Close()
		if err != nil {
			fmt.Println("Error reading file:", filename, err)
			continue
		}

		// Process each record (skip header)
		for i, record := range records {
			if i == 0 {
				continue // Skip header
			}
			if len(record) < 5 {
				continue // Skip records too short to hold price (index 1) and model (index 4)
			}

			price, err := strconv.ParseFloat(record[1], 64)
			if err != nil {
				continue // Skip records whose price is not a valid number
			}
			carData = append(carData, Car{
				Model: record[4],
				Price: price,
			})
		}
	}
```
In this code:
- We open each file using `os.Open` and create a CSV reader with `csv.NewReader`.
- We handle errors that might occur when opening or reading files.
- We use a loop to process each record, skipping the header line.
- For each row, we parse the model from index 4 and the price from index 1 and append the data to `carData`.
- The `strconv.ParseFloat` function converts the price from a string to a `float64`. It takes two arguments: the string to convert and the bit size (64 here, for `float64`). If the conversion fails, it returns an error, which we handle by skipping the invalid record.
With all data combined in `carData`, the next step is identifying the car with the lowest price.
```go
	// Find the lowest cost car
	if len(carData) > 0 {
		lowestCostCar := carData[0]
		for _, car := range carData {
			if car.Price < lowestCostCar.Price {
				lowestCostCar = car
			}
		}
		fmt.Printf("Model: %s\n", lowestCostCar.Model)
		fmt.Printf("Price: $%.2f\n", lowestCostCar.Price)
	} else {
		fmt.Println("No valid car data available.")
	}
}
```
Here:
- We initialize `lowestCostCar` with the first car in `carData`.
- A loop evaluates each car to find the one with the minimum price.
- Finally, we print the model and price of the car with the lowest price.
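The same search can be packaged as a small function, which makes the empty-slice case explicit and easier to test. This is a sketch only: `lowestPriced` is a hypothetical helper name, and the sample prices are taken from the CSV excerpt earlier in the lesson.

```go
package main

import "fmt"

type Car struct {
	Model string
	Price float64
}

// lowestPriced (hypothetical helper) returns the cheapest car and true,
// or a zero-value Car and false when the slice is empty.
func lowestPriced(cars []Car) (Car, bool) {
	if len(cars) == 0 {
		return Car{}, false
	}
	lowest := cars[0]
	// Start from the second element; the first is already the candidate.
	for _, car := range cars[1:] {
		if car.Price < lowest.Price {
			lowest = car
		}
	}
	return lowest, true
}

func main() {
	cars := []Car{
		{Model: "Ford Focus", Price: 60383.80},
		{Model: "Toyota Corolla", Price: 82471.28},
		{Model: "BMW Series 5", Price: 52266.72},
	}
	if car, ok := lowestPriced(cars); ok {
		fmt.Printf("Model: %s\n", car.Model)
		fmt.Printf("Price: $%.2f\n", car.Price)
	} else {
		fmt.Println("No valid car data available.")
	}
}
```

With the three sample rows, this prints the BMW Series 5 at $52266.72. When two cars share the lowest price, the strict `<` comparison keeps whichever appears first.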
In this lesson, you have learned how to:
- Read data in batches from multiple CSV files using Go file handling with `os.Open` and `csv.NewReader`.
- Process the data efficiently with string conversions and error handling using `strconv.ParseFloat`.
- Identify insights, such as the car with the lowest price, using loops and conditionals to evaluate data elements.
These techniques prepare you to handle similar datasets efficiently using Go. Practice these skills with the exercises that follow, which are designed to reinforce your understanding of efficient data handling.