Lesson 3
Parsing and Managing CSV Data in Scala
Introduction and Context Setting

In this lesson, we will work with CSV files, a widely used format for storing and exchanging data. By the end, you will be able to read data from a CSV file, identify its rows and comma-separated columns, and manipulate the resulting strings in Scala. The lesson builds on your earlier experience parsing tab-separated files and applies the same ideas to CSV data.

Understanding CSV Structure and Delimiter

CSV stands for Comma-Separated Values, a format that stores tabular data in plain text. Each line represents one data row, and commas separate the columns within that row, which keeps the format easy to write and easy to read from a program.

Consider a CSV file named data.csv:

Name,Age,Occupation
John,28,Engineer
Alice,34,Doctor
Bob,23,Artist

In this file:

  • The first line contains the headers: Name, Age, and Occupation.
  • Each subsequent line holds data for an individual, with values separated by commas.

This structure determines how we parse the file: one record per line, with the fields of each record split at the commas.
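
To make the relationship between the header and the data rows concrete, here is a minimal sketch that pairs each header with the value in the same position. It assumes Scala 3 and keeps the sample contents in an in-memory string instead of reading data.csv. The names sample, sampleLines, headers, and records are illustrative and not part of the lesson's code.

```scala
// Sample contents of data.csv held in memory (an assumption made for illustration).
val sample: String =
  """Name,Age,Occupation
    |John,28,Engineer
    |Alice,34,Doctor
    |Bob,23,Artist""".stripMargin

// The first line names the columns; every later line is one record.
val sampleLines: List[String] = sample.linesIterator.toList
val headers: Array[String] = sampleLines.head.split(",")

// Pair each header with the value at the same position in the row.
val records: List[Map[String, String]] =
  sampleLines.tail.map(line => headers.zip(line.split(",")).toMap)

@main def showRecords(): Unit =
  records.foreach(record => println(record("Name") + " is " + record("Age")))
```

Looking values up by header name is one possible design; the rest of this lesson accesses columns by index instead.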

Reading CSV Data

To read every line of the file and skip the header, we use os.read.lines from os-lib, which you already know, followed by drop(1):

Scala
// Define the file path
val filePath = os.pwd / "data.csv"

// Read all lines from the CSV file
val lines = os.read.lines(filePath)

// Use drop to skip the header and access the data lines
val dataLines = lines.drop(1)
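
Real files sometimes end with an empty line or contain no rows at all. The sketch below assumes Scala 3 and uses an in-memory sequence as a stand-in for the result of os.read.lines. It shows that drop(1) is safe on empty input and how blank lines could be filtered out. The helper name dataRows is hypothetical.

```scala
// Hypothetical helper: skip the header row and ignore blank lines.
def dataRows(lines: Seq[String]): Seq[String] =
  lines.drop(1).filter(_.trim.nonEmpty)

@main def showDataRows(): Unit =
  // Stand-in for the result of os.read.lines(filePath).
  val lines = Seq("Name,Age,Occupation", "John,28,Engineer", "", "Alice,34,Doctor")
  dataRows(lines).foreach(println)
```

Because drop(1) returns an empty sequence when there is nothing to drop, the helper needs no special case for an empty file.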

Parsing Each Line

To parse each line of the CSV file, we employ the string split method to extract individual data fields. Each line read from the file is divided into an array of strings, representing the row's columns:

Scala
// Parse each data line into columns
val data = dataLines.map { line =>
  // Split each line by commas
  line.split(",")
}

The line.split(",") operation breaks down each line at the commas, storing the result in an array that can be directly manipulated or accessed using indices.
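
One detail worth knowing: split(",") discards empty strings at the end of the result, so a row whose last field is empty yields fewer columns than expected. The sketch below assumes Scala 3 and shows one way to keep those fields by passing a negative limit, and to trim stray spaces around values. The helper name splitFields is hypothetical.

```scala
// Hypothetical helper: split a CSV line, keep empty trailing fields, trim whitespace.
def splitFields(line: String): Array[String] =
  line.split(",", -1).map(_.trim)

@main def showSplit(): Unit =
  // Plain split(",") drops the trailing empty field.
  println("Bob,23,".split(",").length)   // 2
  // A negative limit keeps it.
  println(splitFields("Bob,23,").length) // 3
  println(splitFields("John, 28 ,Engineer").mkString("|")) // John|28|Engineer
```

Neither version handles quoted fields that contain commas, such as "Smith, John". For data like that, a dedicated CSV parser is the safer choice.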

Verifying Parsed Output

To ensure the CSV data is correctly parsed, we display the structured data. Here's how you can format and verify the output:

Scala
// Iterate over each row of parsed data
data.foreach { row =>
  // Format and print each row
  println(row.mkString(", "))
}

The expected output should resemble the following, confirming accurate parsing:

Plain text
John, 28, Engineer
Alice, 34, Doctor
Bob, 23, Artist

This output verifies that each line from the CSV has been successfully parsed into structured arrays of data.
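
Printing works well for a small file. For larger inputs, a programmatic check can complement it. The following sketch assumes Scala 3 and separates rows that have the expected number of columns from rows that do not. The helper name partitionByWidth and the sample rows are assumptions made for illustration.

```scala
// Hypothetical helper: split rows into those with the expected column count and the rest.
def partitionByWidth(
    rows: Seq[Array[String]],
    expected: Int
): (Seq[Array[String]], Seq[Array[String]]) =
  rows.partition(_.length == expected)

@main def showWidthCheck(): Unit =
  // The second row is deliberately missing its Occupation column.
  val rows = Seq("John,28,Engineer", "Alice,34", "Bob,23,Artist").map(_.split(","))
  val (valid, invalid) = partitionByWidth(rows, expected = 3)
  println(s"valid rows: ${valid.length}, invalid rows: ${invalid.length}")
```

Running the check before accessing columns by index avoids an ArrayIndexOutOfBoundsException on short rows.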

Mapping to a List of Classes

Now let's map the parsed rows to a list of objects, building on what you already know. We'll use a Person case class to hold the data from each row:

Scala
case class Person(name: String, age: Int, occupation: String)

In this process:

  • We use the Person case class to encapsulate data, mapping each column to its corresponding field.
  • Each parsed line is transformed into a Person object, holding structured data.

Scala
// Map each line to a Person object
val people = data.map { columns =>
  val name = columns(0)
  val age = columns(1).toInt
  val occupation = columns(2)
  Person(name, age, occupation)
}

// Print each Person object
people.foreach { person =>
  println(s"Name: ${person.name}, Age: ${person.age}, Occupation: ${person.occupation}")
}

The expected output should resemble the following, confirming accurate mapping and printing:

Plain text
Name: John, Age: 28, Occupation: Engineer
Name: Alice, Age: 34, Occupation: Doctor
Name: Bob, Age: 23, Occupation: Artist
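
The mapping above assumes every row has three columns and a numeric age. If that is not the case, toInt throws a NumberFormatException. As a hedged alternative, the sketch below assumes Scala 3 and uses toIntOption with Either to report bad rows instead of failing. The helper name parsePerson and its error messages are hypothetical.

```scala
case class Person(name: String, age: Int, occupation: String)

// Hypothetical helper: return a Person, or a message describing why the row was rejected.
def parsePerson(columns: Array[String]): Either[String, Person] =
  columns match
    case Array(name, ageText, occupation) =>
      ageText.trim.toIntOption match
        case Some(age) => Right(Person(name, age, occupation))
        case None      => Left(s"Age is not a whole number: '$ageText'")
    case other =>
      Left(s"Expected 3 columns but found ${other.length}")

@main def showParsePerson(): Unit =
  println(parsePerson(Array("John", "28", "Engineer")))
  println(parsePerson(Array("Alice", "thirty", "Doctor")))
```

With this approach, valid rows arrive as Right values and rejected rows as Left values, so a caller can collect both rather than stopping at the first bad line.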

Complete Code Example

Here is what the complete code looks like, integrating every step from reading the CSV file to mapping its rows into a list of Person objects and printing the results:

Scala
import os._

case class Person(name: String, age: Int, occupation: String)

@main def main() =
  // Define the file path
  val filePath = os.pwd / "data.csv"

  // Read all lines from the CSV file
  val lines = os.read.lines(filePath)

  // Use drop to skip the header and access the data lines
  val dataLines = lines.drop(1)

  // Parse each data line into columns
  val data = dataLines.map { line =>
    // Split each line by commas
    line.split(",")
  }

  // Map each line to a Person object
  val people = data.map { columns =>
    val name = columns(0)
    val age = columns(1).toInt
    val occupation = columns(2)
    Person(name, age, occupation)
  }

  // Print each Person object
  people.foreach { person =>
    println(s"Name: ${person.name}, Age: ${person.age}, Occupation: ${person.occupation}")
  }

Summary and Preparing for Practice

In this lesson, we covered parsing CSV files using Scala, focusing on reading data with commas as delimiters and managing structured data using arrays and case classes. You've employed techniques you've already mastered, such as os-lib for file operations and string manipulation, to efficiently handle CSV content.

As you move forward to the practice exercises, verify the correctness of your parsed data and explore potential applications using Scala's functional programming paradigms for advanced data-handling techniques. Continue your excellent work, and keep exploring Scala's rich capabilities for data manipulation and processing.
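
As one example of the functional style mentioned above, the sketch below assumes Scala 3 and filters, sorts, and averages a list of Person values. The list is written out by hand to mirror the sample data, and the names over25 and averageAge are illustrative.

```scala
case class Person(name: String, age: Int, occupation: String)

// Hand-written stand-in for the list produced by parsing data.csv.
val people: List[Person] = List(
  Person("John", 28, "Engineer"),
  Person("Alice", 34, "Doctor"),
  Person("Bob", 23, "Artist")
)

// Names of everyone older than 25, youngest first.
val over25: List[String] = people.filter(_.age > 25).sortBy(_.age).map(_.name)

// Average age; guarded so an empty list does not divide by zero.
val averageAge: Option[Double] =
  if people.isEmpty then None
  else Some(people.map(_.age).sum.toDouble / people.length)

@main def showSummary(): Unit =
  println(over25.mkString(", "))
  println(averageAge)
```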
