Database Integration and Persistence

Introduction: Why Store Code and Commits in a Database?

Welcome back! So far, you have learned how to scan a codebase, extract commit history, and analyze code using a command-line interface. Until now, all the information about code files and commits has been stored in memory — meaning it disappears as soon as your program stops running.

In real-world applications, especially for tools like a code review assistant, it is important to keep this information organized and persistent. Storing data in a database allows you to:

Save code and commit information for later use
Query and analyze data efficiently
Share data between different parts of your application

In this lesson, you will learn how to use a database to store code files and commit data using SQLAlchemy, a popular Python library for working with databases. This will make your code review assistant more powerful and reliable.

Setting Up Your Environment

Before we begin working with databases, you need to install the required dependencies. SQLAlchemy is not part of Python's standard library, so it needs to be installed separately.

Installing SQLAlchemy

If you're working on your own machine, you'll need to install SQLAlchemy. Here are the installation commands:

Using pip:

Using pip with a virtual environment (recommended):

Using conda:

Additional Dependencies

For this lesson, we'll also use SQLite as our database, which comes built-in with Python. If you want to use other databases like PostgreSQL or MySQL, you would need additional drivers:

Note: On CodeSignal, the required libraries are already installed, so you do not need to worry about installation here. However, it is good practice to know how to set up your environment on your own device.

Quick Recall: Data Classes and CLI Integration

Before we dive in, let's quickly remind ourselves of what you have already built:

You used Python data classes to represent code files and commits.
You built a CLI tool that scanned a project directory and extracted commit history, displaying useful statistics.

In those lessons, all data was kept in memory using Python objects. Now, we will take the next step and store this data in a database so it can be accessed and updated over time.

Defining SQLAlchemy Models for Code and Commits

To store data in a database, we need to define the structure of our tables. In SQLAlchemy, this is done by creating Python classes called models. Each model represents a table in the database.

Let's start by defining a model for a code file.

Explanation:

Base = declarative_base() sets up the base class for all our models.
class CodeFile(Base): defines a table called code_files.
Each attribute (like id, file_path, content) becomes a column in the table.
id is the primary key, which uniquely identifies each row.
file_path is marked as unique, so no two files can have the same path.

Now, let's define a model for a commit:

Explanation:

This class creates a commits table.

Setting Up and Initializing the Database

Now that we have our models, we need to set up the database connection and create the tables.

First, let's set up the connection:

Explanation:

DATABASE_URL tells SQLAlchemy where to find the database. If the environment variable is not set, it uses a local SQLite file called code_review.db.
create_engine() creates a connection to the database.
SessionLocal is a factory for creating sessions, which are used to interact with the database.

Next, let's create the tables:

Explanation:

Base.metadata.create_all() creates all tables defined by our models if they do not already exist.

Finally, we need a helper function to get database sessions:

Explanation:

get_session() creates and returns a new database session using the SessionLocal factory we defined earlier.
This session is used to interact with the database — adding, updating, and querying data.

Populating the Database with Repository Data

With the database ready, let's see how to add code files and commits to it. We will use our repository scanner and git history extractor from previous lessons, but now we will store the results in the database.

First, let's scan the repository and add code files:

Explanation:

We use a RepositoryScanner to get a list of code files.
For each file, we create a CodeFile object and add it to the session.
session.merge() adds the object to the database, updating it if it already exists.
session.commit() saves the changes.

Now, let's add commits:

Summary and Practice Preview

In this lesson, you learned how to:

Install and set up SQLAlchemy as a dependency for database operations
Define SQLAlchemy models to represent code files, commits, and their relationships
Set up and initialize a database connection
Scan a repository and extract commit history
Store code and commit data in a database for persistence

You are now ready to practice these skills by working with real code and seeing how data is stored and retrieved from the database. In the next exercises, you will get hands-on experience with database integration and persistence, making your code review assistant more robust and useful.

Previous Lesson

Join the 1M+ learners on CodeSignal

Be a part of our community of 1M+ users who develop and demonstrate their skills on CodeSignal