Welcome back! So far, you have learned how to scan a codebase, extract commit history, and analyze code using a command-line interface. Until now, all the information about code files and commits has been stored in memory — meaning it disappears as soon as your program stops running.
In real-world applications, especially for tools like a code review assistant, it is important to keep this information organized and persistent. Storing data in a database allows you to:
- Save code and commit information for later use
- Query and analyze data efficiently
- Share data between different parts of your application
In this lesson, you will learn how to use a database to store code files and commit data using SQLAlchemy, a popular Python library for working with databases. This will make your code review assistant more powerful and reliable.
Before we begin working with databases, you need to install the required dependencies. SQLAlchemy is not part of Python's standard library, so it needs to be installed separately.
If you're working on your own machine, you'll need to install SQLAlchemy. Here are the installation commands:
Using pip:
Using pip with a virtual environment (recommended):
Using conda:
For this lesson, we'll also use SQLite as our database, which comes built-in with Python. If you want to use other databases like PostgreSQL or MySQL, you would need additional drivers:
Note: On CodeSignal, the required libraries are already installed, so you do not need to worry about installation here. However, it is good practice to know how to set up your environment on your own device.
Before we dive in, let's quickly remind ourselves of what you have already built:
- You used Python data classes to represent code files and commits.
- You built a CLI tool that scanned a project directory and extracted commit history, displaying useful statistics.
In those lessons, all data was kept in memory using Python objects. Now, we will take the next step and store this data in a database so it can be accessed and updated over time.
To store data in a database, we need to define the structure of our tables. In SQLAlchemy, this is done by creating Python classes called models
. Each model represents a table in the database.
Let's start by defining a model for a code file.
Explanation:
Base = declarative_base()
sets up the base class for all our models.class CodeFile(Base):
defines a table calledcode_files
.- Each attribute (like
id
,file_path
,content
) becomes a column in the table. id
is the primary key, which uniquely identifies each row.file_path
is marked as unique, so no two files can have the same path.
Now, let's define a model for a commit:
Explanation:
- This class creates a
commits
table.
Now that we have our models, we need to set up the database connection and create the tables.
First, let's set up the connection:
Explanation:
DATABASE_URL
tells SQLAlchemy where to find the database. If the environment variable is not set, it uses a local SQLite file calledcode_review.db
.create_engine()
creates a connection to the database.SessionLocal
is a factory for creating sessions, which are used to interact with the database.
Next, let's create the tables:
Explanation:
Base.metadata.create_all()
creates all tables defined by our models if they do not already exist.
Finally, we need a helper function to get database sessions:
Explanation:
get_session()
creates and returns a new database session using theSessionLocal
factory we defined earlier.- This session is used to interact with the database — adding, updating, and querying data.
With the database ready, let's see how to add code files and commits to it. We will use our repository scanner and git history extractor from previous lessons, but now we will store the results in the database.
First, let's scan the repository and add code files:
Explanation:
- We use a
RepositoryScanner
to get a list of code files. - For each file, we create a
CodeFile
object and add it to the session. session.merge()
adds the object to the database, updating it if it already exists.session.commit()
saves the changes.
Now, let's add commits:
In this lesson, you learned how to:
- Install and set up SQLAlchemy as a dependency for database operations
- Define SQLAlchemy models to represent code files, commits, and their relationships
- Set up and initialize a database connection
- Scan a repository and extract commit history
- Store code and commit data in a database for persistence
You are now ready to practice these skills by working with real code and seeing how data is stored and retrieved from the database. In the next exercises, you will get hands-on experience with database integration and persistence, making your code review assistant more robust and useful.
