In this lesson, we will work with CSV files, a widely used format for storing and exchanging data. By the end of this lesson, you will be able to read data from CSV files, split each row into comma-separated columns, and turn the resulting strings into structured Scala data. This lesson builds on your earlier work parsing tab-separated files and applies the same ideas to CSV data.
CSV stands for Comma-Separated Values, a format that stores tabular data as plain text. Each line represents one row of data, and the columns within a row are separated by commas. Because the format is plain text, it is easy to create, store, and read with almost any tool.
Consider a CSV file named `data.csv`:

```text
Name,Age,Occupation
John,28,Engineer
Alice,34,Doctor
Bob,23,Artist
```
In this file:
- The first line contains the headers: `Name`, `Age`, and `Occupation`.
- Each subsequent line holds data for one individual, with values separated by commas.
Knowing this structure tells us how to parse the file: skip the first line, then split every remaining line at the commas.
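The header row can also tell you where each column lives, so you don't have to hard-code positions. The following is a minimal sketch of that idea, not part of the lesson's code: it works on an in-memory copy of `data.csv`, and the names `csvText` and `columnIndex` are illustrative.

```scala
// Sketch: build a column-name -> index map from the header row,
// so fields can be looked up by name instead of by a fixed position.
val csvText =
  """Name,Age,Occupation
    |John,28,Engineer
    |Alice,34,Doctor
    |Bob,23,Artist""".stripMargin

val rows = csvText.linesIterator.toList
val header = rows.head.split(",")

// zipWithIndex pairs each header name with its position: Name -> 0, Age -> 1, Occupation -> 2
val columnIndex: Map[String, Int] = header.zipWithIndex.toMap

// Look up Alice's occupation through the header instead of a hard-coded index
val alice = rows(2).split(",")
val aliceOccupation = alice(columnIndex("Occupation"))

println(columnIndex("Occupation")) // 2
println(aliceOccupation)           // Doctor
```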
To read all lines from the file and skip the header line, we can use the os-lib calls you already know:
```scala
// Define the file path
val filePath = os.pwd / "data.csv"

// Read all lines from the CSV file
val lines = os.read.lines(filePath)

// Use drop to skip the header and access the data lines
val dataLines = lines.drop(1)
```
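If you want to try the header-skipping step without creating `data.csv` on disk, here is a minimal sketch that does the same thing on an in-memory string using only the standard library. The blank-line filter is an extra safeguard added for illustration; the lesson's code does not include it. The sample text contains one deliberately blank line to show the filter at work.

```scala
// Sketch: "read, then drop the header" on an in-memory string instead of a file.
val csvText =
  """Name,Age,Occupation
    |John,28,Engineer
    |
    |Alice,34,Doctor
    |Bob,23,Artist""".stripMargin

// linesIterator splits on line terminators; toVector gives an indexed sequence
val lines = csvText.linesIterator.toVector

// drop(1) skips the header; the filter removes blank lines (an added safeguard)
val dataLines = lines.drop(1).filter(_.trim.nonEmpty)

dataLines.foreach(println)
```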
To parse each line of the CSV file, we use the string `split` method to extract the individual fields. Each line read from the file is divided into an array of strings, one per column:
```scala
// Parse each data line into columns
val data = dataLines.map { line =>
  // Split each line by commas
  line.split(",")
}
```
The `line.split(",")` call breaks each line at the commas and returns an array of strings, whose elements you can access by index (for example, `columns(0)` for the name).
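One detail of `split` is worth knowing before relying on fixed indices: with the default limit it drops trailing empty fields, so a row with a missing last value yields a shorter array and `columns(2)` would throw an `ArrayIndexOutOfBoundsException`. Passing a negative limit keeps them. The sketch below demonstrates this standard `String.split` behavior. Note that plain `split(",")` also does not understand quoted fields such as `"Smith, Jr."`, which full CSV files may contain.

```scala
// Sketch: how String.split treats empty fields.
val complete = "Bob,23,Artist".split(",")
val missingLast = "Bob,23,".split(",")        // trailing empty field is dropped
val keptLast = "Bob,23,".split(",", -1)       // a negative limit keeps it
val missingMiddle = "Bob,,Artist".split(",")  // interior empty fields are kept

println(complete.length)      // 3
println(missingLast.length)   // 2
println(keptLast.length)      // 3
println(missingMiddle.toList) // List(Bob, , Artist)
```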
To check that the CSV data was parsed correctly, print each row:
```scala
// Iterate over each row of parsed data
data.foreach { row =>
  // Join the columns with ", " and print the row
  println(row.mkString(", "))
}
```
The output should look like this:

```text
John, 28, Engineer
Alice, 34, Doctor
Bob, 23, Artist
```
Each data line appears as three separate fields (rejoined with `", "` for printing), which confirms that every line was split into an array of columns.
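Printing is a quick visual check. A stricter check is to confirm that every row has as many fields as the header before using it. The sketch below does this with `partition`; the second data row is a deliberately malformed sample (not from `data.csv`), and the names `expectedColumns`, `goodRows`, and `badRows` are illustrative.

```scala
// Sketch: separate rows with the expected number of fields from malformed ones.
val header = "Name,Age,Occupation".split(",")
val dataLines = Vector("John,28,Engineer", "Alice,34", "Bob,23,Artist") // "Alice,34" is deliberately malformed

val expectedColumns = header.length
// -1 keeps trailing empty fields; trim removes stray spaces around values
val parsed = dataLines.map(_.split(",", -1).map(_.trim))

// partition returns (rows matching the predicate, rows that do not)
val (goodRows, badRows) = parsed.partition(_.length == expectedColumns)

goodRows.foreach(row => println(row.mkString(", ")))
badRows.foreach(row => println(s"Skipping malformed row: ${row.mkString(",")}"))
```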
Next, let's map the parsed data into a collection of `Person` objects, using the case classes you already know:
```scala
case class Person(name: String, age: Int, occupation: String)
```
In this process:
- The `Person` case class encapsulates the data, mapping each column to a named, typed field (the age column is converted from `String` to `Int`).
- Each parsed line is transformed into a `Person` object holding structured data.
```scala
// Map each line to a Person object
val people = data.map { columns =>
  val name = columns(0)
  // toInt throws NumberFormatException if the field is not a valid integer
  val age = columns(1).toInt
  val occupation = columns(2)
  Person(name, age, occupation)
}

// Print each Person object
people.foreach { person =>
  println(s"Name: ${person.name}, Age: ${person.age}, Occupation: ${person.occupation}")
}
```
The output should look like this:

```text
Name: John, Age: 28, Occupation: Engineer
Name: Alice, Age: 34, Occupation: Doctor
Name: Bob, Age: 23, Occupation: Artist
```
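The mapping above assumes every row is well formed: a non-numeric age makes `toInt` throw, and a short row makes `columns(2)` fail. A minimal, more defensive variant returns an `Option` per row and keeps only the rows that parse. This is one possible approach, not the lesson's code; `parsePerson` is an illustrative helper, and the `"thirty-four"` row is a deliberately malformed sample.

```scala
// Sketch: parse a row into Option[Person] instead of throwing on bad input.
case class Person(name: String, age: Int, occupation: String)

def parsePerson(columns: Array[String]): Option[Person] =
  columns.map(_.trim) match {
    // Exactly three fields: convert the age, yielding None if it is not an integer
    case Array(name, age, occupation) =>
      age.toIntOption.map(a => Person(name, a, occupation))
    // Any other number of fields is treated as malformed
    case _ => None
  }

val rows = Vector("John,28,Engineer", "Alice,thirty-four,Doctor", "Bob,23,Artist")
// flatMap keeps the Some results and discards the Nones
val people = rows.map(_.split(",")).flatMap(parsePerson)

people.foreach(println)
```

Returning `Option` keeps the happy path unchanged while making failure explicit in the type; if you need to report *why* a row failed, `Either[String, Person]` is a natural next step.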
Here is the complete code, combining all the steps from reading the CSV file to mapping its rows into `Person` objects and printing the results:
```scala
// Requires the os-lib library (com.lihaoyi::os-lib) on the classpath
import os._

case class Person(name: String, age: Int, occupation: String)

@main def main() =
  // Define the file path
  val filePath = os.pwd / "data.csv"

  // Read all lines from the CSV file
  val lines = os.read.lines(filePath)

  // Use drop to skip the header and access the data lines
  val dataLines = lines.drop(1)

  // Parse each data line into columns
  val data = dataLines.map { line =>
    // Split each line by commas
    line.split(",")
  }

  // Map each line to a Person object
  val people = data.map { columns =>
    val name = columns(0)
    val age = columns(1).toInt
    val occupation = columns(2)
    Person(name, age, occupation)
  }

  // Print each Person object
  people.foreach { person =>
    println(s"Name: ${person.name}, Age: ${person.age}, Occupation: ${person.occupation}")
  }
```
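A common refinement is to separate file I/O from parsing: the parsing logic becomes a pure function over the CSV text, which you can test without any file. The sketch below restates the lesson's pipeline that way and then queries the result with ordinary collection methods. `parsePeople` is an illustrative name, and like the lesson's code it assumes well-formed rows. With os-lib, you could call it as `parsePeople(os.read(filePath))`.

```scala
// Sketch: the lesson's pipeline as a pure function from CSV text to Person objects.
case class Person(name: String, age: Int, occupation: String)

def parsePeople(csvText: String): List[Person] =
  csvText.linesIterator
    .drop(1)                                               // skip the header row
    .map(_.split(","))                                     // split each row into columns
    .map(cols => Person(cols(0), cols(1).toInt, cols(2)))  // assumes well-formed rows
    .toList

val people = parsePeople(
  """Name,Age,Occupation
    |John,28,Engineer
    |Alice,34,Doctor
    |Bob,23,Artist""".stripMargin
)

// Once the data is a List[Person], standard collection methods apply
val over25 = people.filter(_.age > 25).map(_.name)
val youngest = people.minBy(_.age)

println(over25)        // List(John, Alice)
println(youngest.name) // Bob
```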
In this lesson, we covered parsing CSV files in Scala: reading the file, splitting each line at the commas, and organizing the results into arrays and then case class instances. Along the way, you reused techniques you already know, such as `os-lib` for file operations and string manipulation, to handle CSV content efficiently.
As you move on to the practice exercises, check that your parsed data is correct, and try processing it further with Scala's functional collection methods. Keep up the good work, and keep exploring what Scala offers for data manipulation and processing.