Lesson 2
Parsing Files Line-by-Line in Python
Introduction and Context Setting

Welcome to this lesson, where we'll explore an essential technique in text data manipulation: reading files line-by-line. In many real-world applications, processing data one line at a time is crucial for effective data management, particularly when dealing with large files like logs or data streams. By the end of this lesson, you'll understand how to efficiently read and process file data line-by-line, leveraging Python's built-in functionality.

Recall: Opening Files

As a quick reminder from our previous lesson, let's revisit how opening files in Python works. File handling is done using the open() function. You specify the file path and the mode — here, we'll use 'r' for reading:

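```python
file_path = 'input.txt'
file = open(file_path, 'r')  # open the file for reading
# ... work with the file ...
file.close()  # without a with statement, you must close the file yourself
```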

In this example, file_path is a string that indicates the location of your file, and file is a file object created by open(). Remember that the with statement simplifies file handling: it closes the file automatically when the block ends, which frees resources and prevents mistakes such as forgetting to call close().
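The same reminder snippet, rewritten with the with statement, might look like the minimal sketch below. It assumes the working directory is writable, because it first creates a small input.txt so that it runs on its own. Checking file.closed inside and after the block shows that the file is closed automatically.

```python
file_path = 'input.txt'

# Create a small sample file so the example is self-contained
# (assumption: the working directory is writable).
with open(file_path, 'w') as out:
    out.write('Hello,\nworld\n!')

# Open the file for reading; it is closed automatically when the block exits.
with open(file_path, 'r') as file:
    content = file.read()
    print(file.closed)  # False: still inside the with block

print(file.closed)  # True: the with statement closed the file
```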

Reading Files Line-by-Line

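One way to read a file line-by-line in Python is the readlines() method. It reads the entire file at once and returns a list in which each element is one line of the file. Because the whole file is loaded into memory, this approach works best for small to medium-sized files.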

Let's break down how to use readlines():

```python
file_path = 'input.txt'
with open(file_path, 'r') as file:
    lines = file.readlines()
```

Here, lines is a list containing all lines from input.txt, each as a separate string. For example, consider these file contents:

```text
Hello,
world
!
```

After calling readlines() on this file, you get the following list: ["Hello,\n", "world\n", "!"]. Note the \n at the end of every string except the last. It is the newline character, and readlines() keeps it as part of each line. The last string has no \n because, in this example, the file does not end with a newline.
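To see this for yourself, the following sketch writes the three example lines to input.txt and prints what readlines() returns. It assumes there is no trailing newline after the final "!".

```python
file_path = 'input.txt'

# Recreate the example file; note there is no newline after the final '!'.
with open(file_path, 'w') as out:
    out.write('Hello,\nworld\n!')

with open(file_path, 'r') as file:
    lines = file.readlines()

print(lines)  # ['Hello,\n', 'world\n', '!']
```

If the file ended with a newline, the last element would be '!\n' instead of '!'.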

Iterating Over Lines and Cleaning Up Output

Once you have your file lines in a list, you can iterate over them using a for loop. During this process, it's often necessary to clean up the output. The strip() method is a handy tool to remove unwanted newline characters and extra spaces. Let's see this in action:

```python
file_path = 'input.txt'
with open(file_path, 'r') as file:
    lines = file.readlines()

for line in lines:
    print(line.strip())
```
  • Looping Over Lines: The for loop goes through each line in the lines list.
  • Using strip(): This method is applied to each line to remove any leading and trailing whitespace, including newline characters.

The output of this code will neatly display each line from input.txt without extra newlines:

```text
Hello,
world
!
```

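Keep in mind that strip() removes whitespace from both ends of a line, so it also discards any indentation you might want to keep. When only the line ending should go, rstrip('\n') is a narrower alternative. Here is a small comparison using a made-up, indented line:

```python
# A hypothetical line as readlines() might return it: indented, with a newline.
line = '   indented text  \n'

print(repr(line.strip()))       # 'indented text'
print(repr(line.rstrip('\n')))  # '   indented text  '
```
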
Reading Integers from a File and Finding Their Sum

Let's apply line-by-line reading to a slightly more practical task: reading integers from a file and calculating their sum.

Assume the numbers.txt file contains:

```text
10
20
30
40
```

The following code reads the integers from this file and calculates their sum. It is very similar to the previous example, except that it converts each line to an integer and adds it to a running total.

```python
file_path = 'numbers.txt'
total_sum = 0

with open(file_path, 'r') as file:
    lines = file.readlines()

for line in lines:
    number = int(line.strip())  # Convert each line to an integer
    total_sum += number  # Add the integer to total_sum

print("The sum of the numbers is:", total_sum)
```
  • Reading Lines: The file's lines containing numbers are read into the lines list.
  • Converting to Integers: Each line is stripped of whitespace and converted to an integer using int().
  • Calculating Sum: The converted integers are summed up in the total_sum variable.

After executing the code, the output will show the total sum of the numbers:

```text
The sum of the numbers is: 100
```

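Because readlines() holds every line in memory at once, a variant that iterates over the file object directly reads one line at a time, which is gentler on memory for large files. The sketch below also skips blank lines. That is an added assumption, since int('') would raise a ValueError. It recreates numbers.txt so it runs standalone.

```python
file_path = 'numbers.txt'

# Recreate the sample file from the lesson (one integer per line).
with open(file_path, 'w') as out:
    out.write('10\n20\n30\n40\n')

total_sum = 0
with open(file_path, 'r') as file:
    for line in file:  # the file object yields one line per iteration
        stripped = line.strip()
        if not stripped:  # skip empty lines instead of failing in int()
            continue
        total_sum += int(stripped)

print("The sum of the numbers is:", total_sum)  # The sum of the numbers is: 100
```
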
Summary and Practice Preparation

In this lesson, you learned how to read a text file line-by-line in Python, a fundamental technique for processing text data such as logs. You used the with statement to handle file I/O safely, read a file's lines with readlines(), and cleaned them up with strip().

Now, you're ready to dive into practice exercises where you can apply these concepts and strengthen your understanding. Continue to build on these skills as we explore further parsing techniques in future lessons. Keep up the great work!
