Application Level Data Protection

Introduction

Welcome to the final lesson of the "Cryptographic Failures" course! In our previous lessons, we explored the importance of cryptography in securing data and identified common vulnerabilities, such as weak algorithms and hardcoded secrets.

In this lesson, we'll focus on understanding the limitations of automatic database encryption and the importance of application-level encryption. Let's dive in! 🚀

The Encryption Reliance Problem

Many developers assume that enabling database encryption features, often called Transparent Data Encryption (TDE) or encryption-at-rest, automatically makes their sensitive data secure. While these features are crucial for protecting data on disk—for instance, if a physical hard drive is stolen—they offer no protection once the data is accessed by the application.

Think of it this way: database encryption locks the file cabinet, but the application has the key. When your application queries the database, the database engine automatically decrypts the data and returns it in plaintext. This means that any process or person who can legitimately access your application or database can view sensitive data in its unencrypted form. This automatic decryption creates a significant security risk, especially for highly sensitive information like credit card numbers, personal identification data, or healthcare records. An attacker who compromises the application layer can read this data as if it were never encrypted at all.

Let's examine how this vulnerability manifests in code and learn how to properly secure it using application-level encryption.

Vulnerable: Storing Card Information

Suppose we have an endpoint responsible for adding credit card information. It uses SQL queries for adding the payment information. For simplicity, we'll skip authentication in this example.

It's critical to note that in a real-world application, you should never store full credit card numbers (Primary Account Numbers, or PANs) on your servers. The complexity and legal requirements for handling this data are governed by the Payment Card Industry Data Security Standard (PCI-DSS). Instead, you should use a certified payment provider like Stripe or Braintree. These services use a process called tokenization, where they handle the sensitive data and provide you with a non-sensitive token to use for transactions.

For this lesson, however, we use credit card numbers as a clear and relatable example of highly sensitive data. The principles we'll discuss—hashing, data minimization, and application-level protection—apply equally to other sensitive information you might need to store, such as Social Security Numbers (SSNs), bank account details, or private health information.

Here's how a vulnerable endpoint might look without proper encryption, using FastAPI and SQLAlchemy:
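A minimal sketch of this flaw, written as a plain function over the standard library's sqlite3 module in place of a FastAPI route and SQLAlchemy so that it runs without extra packages. The table name payment_methods, the function add_card_vulnerable, and the column layout are assumptions made for illustration; only the card_hash column name comes from the lesson.

```python
import sqlite3

# In-memory database standing in for the application's real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment_methods ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, card_hash TEXT NOT NULL)"
)


def add_card_vulnerable(user_id: int, card_number: str) -> int:
    """Store a card for a user. VULNERABLE: keeps the full number as-is."""
    cur = conn.execute(
        "INSERT INTO payment_methods (user_id, card_hash) VALUES (?, ?)",
        # Despite the column name, nothing is hashed: the raw PAN is stored.
        (user_id, card_number),
    )
    conn.commit()
    return cur.lastrowid


# 4111111111111111 is a widely published test number, not a real card.
row_id = add_card_vulnerable(1, "4111111111111111")
stored = conn.execute(
    "SELECT card_hash FROM payment_methods WHERE id = ?", (row_id,)
).fetchone()[0]
print(stored)  # the full card number comes straight back out
```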

This implementation is highly problematic because it stores the full credit card number (Primary Account Number or PAN) in its raw, plaintext form. The column is named card_hash, but the code inserts the raw card number without hashing it. Even if the database encrypts its files on disk, the number is vulnerable during transmission to the database and is fully exposed to anyone or anything that can query the application, including developers, database administrators, or attackers who compromise the application.

Vulnerable: Retrieving Card Information

The vulnerability becomes even more apparent when an endpoint is created to retrieve the stored card information. This is a common requirement for features like a "manage my payment methods" page.
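A rough sketch of such a retrieval handler, again as a plain function over sqlite3 in place of a FastAPI route. The names list_cards_vulnerable and payment_methods are hypothetical, and the seeded rows use published test card numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment_methods ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, card_hash TEXT NOT NULL)"
)
# Seed two rows the way a vulnerable insert would: raw numbers (test values).
conn.executemany(
    "INSERT INTO payment_methods (user_id, card_hash) VALUES (?, ?)",
    [(1, "4111111111111111"), (2, "5555555555554444")],
)


def list_cards_vulnerable() -> list[dict]:
    """Return every stored card. VULNERABLE: full numbers, no per-user filter."""
    rows = conn.execute(
        "SELECT id, user_id, card_hash FROM payment_methods ORDER BY id"
    ).fetchall()
    return [{"id": r[0], "user_id": r[1], "card_number": r[2]} for r in rows]


cards = list_cards_vulnerable()
print(cards)
```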

This endpoint directly retrieves the raw credit card numbers from the database and sends them to the client. This creates a massive data leak. An attacker who gains access to this endpoint can exfiltrate sensitive payment information for all users. Furthermore, this sensitive data might get logged by proxies, cached in intermediate systems, or exposed in browser developer tools, dramatically increasing the risk of theft.

Exploiting the Vulnerability

An attacker who discovers this API can easily retrieve all the sensitive card information stored in the database with a simple request. They don't need to bypass database encryption or steal a hard drive; they just need to call the exposed endpoint.

Example Response:
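For illustration only, the sketch below builds the kind of JSON body such an endpoint could return. The field names and the two published test card numbers are assumptions, not captured output from a real API.

```python
import json

# Rows as a vulnerable handler would read them (published test numbers).
rows = [
    (1, 1, "4111111111111111"),
    (2, 2, "5555555555554444"),
]

# The handler serializes the rows unchanged, so the body carries full numbers.
response_body = json.dumps(
    [{"id": i, "user_id": u, "card_number": n} for i, u, n in rows],
    indent=2,
)
print(response_body)
```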

As you can see, the credit card numbers are exposed in plaintext in the API response. This vulnerability exists regardless of database-level encryption because the data is automatically decrypted when the application queries it. Let's look at how to properly secure this sensitive data.

Secure: Adding Card Information

The secure approach follows the principle of data minimization: never store sensitive data you don't absolutely need. Instead of storing the full card number, we will store a non-reversible hash of the number for verification and the last four digits for display purposes.

Here's how to properly hash and store sensitive data using FastAPI and Python's bcrypt library:
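A minimal sketch of the storage step, under two stated substitutions: a plain function over sqlite3 stands in for the FastAPI route, and PBKDF2-HMAC-SHA256 from hashlib stands in for bcrypt so the example runs on the standard library alone. The names add_card_secure, payment_methods, and last_four, and the iteration count, are assumptions.

```python
import hashlib
import os
import sqlite3

ITERATIONS = 100_000  # assumed value, kept low so the example runs quickly

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment_methods ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "card_hash TEXT NOT NULL, last_four TEXT NOT NULL)"
)


def hash_card_number(card_number: str) -> str:
    """Salted, slow hash (PBKDF2 here as a stdlib stand-in for bcrypt)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", card_number.encode(), salt, ITERATIONS)
    # Keep the salt next to the digest, as bcrypt does in its hash string.
    return f"{salt.hex()}${digest.hex()}"


def add_card_secure(user_id: int, card_number: str) -> int:
    """Persist only the hash and the last four digits, never the full number."""
    cur = conn.execute(
        "INSERT INTO payment_methods (user_id, card_hash, last_four) "
        "VALUES (?, ?, ?)",
        (user_id, hash_card_number(card_number), card_number[-4:]),
    )
    conn.commit()
    return cur.lastrowid


# Published test number, not a real card.
row_id = add_card_secure(1, "4111111111111111")
card_hash, last_four = conn.execute(
    "SELECT card_hash, last_four FROM payment_methods WHERE id = ?", (row_id,)
).fetchone()
print(last_four)
```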

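This secure implementation hashes the credit card number before storing it in the card_hash column. We only persist the hash and the last four digits. This way, an attacker who gains full access to the database cannot simply read the original card numbers; they would have to brute-force each hash individually. Keep in mind that card numbers are short and structured, and the stored last four digits narrow the search further, so hashing raises the cost of recovery rather than making it impossible. The full, sensitive number is held in memory only while the request is processed and is never written to the database.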

Implementing Card Number Hashing

The hash_card_number function uses bcrypt, a strong hashing algorithm. bcrypt is a password-hashing function, commonly grouped with Key Derivation Functions (KDFs), that is deliberately slow and computationally intensive, with a configurable work factor. This "slowness" is a critical security feature: it makes brute-force attacks, where an attacker tries to guess the input by hashing billions of possibilities, far more expensive.

A key feature of bcrypt is its automatic use of a salt. A salt is a random string that is added to the input (the card number) before hashing. This ensures that even if two users have the identical card number, their stored hashes will be completely different. This prevents attackers from using pre-computed hash lists, known as "rainbow tables," to crack the hashes. The salt is stored as part of the final hash string, so you don't need a separate database column for it.
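The salting behaviour described above can be sketched as follows. This is a stand-in rather than the lesson's own function: it uses PBKDF2-HMAC-SHA256 from the standard library instead of bcrypt, and the "salt$digest" storage format and iteration count are assumptions chosen to mirror how bcrypt embeds its salt.

```python
import hashlib
import os

ITERATIONS = 100_000  # assumed value, kept low so the example runs quickly


def hash_card_number(card_number: str) -> str:
    """Return 'salt$digest' (both hex), using a fresh random salt per call.

    With the bcrypt package, the corresponding call would be
    bcrypt.hashpw(card_number.encode(), bcrypt.gensalt()).
    """
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", card_number.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


# Hashing the same (test) number twice gives two different stored strings,
# because each call draws its own salt.
first = hash_card_number("4111111111111111")
second = hash_card_number("4111111111111111")
print(first != second)
```

Because every record has its own salt, a precomputed table built for one salt is useless against the next record.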

Secure: Retrieving Card Information

When displaying card information back to users, we must ensure the full card number is never sent. We only need to show enough information for the user to identify their card. The industry standard is to show only the last four digits.
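A small sketch of masked retrieval, using sqlite3 and a plain function in place of a FastAPI route. The function name list_cards_masked, the last_four column, the per-user filter, and the placeholder hash values are assumptions; the mask format follows the ****-****-****-1234 style used in this lesson.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment_methods ("
    "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, "
    "card_hash TEXT NOT NULL, last_four TEXT NOT NULL)"
)
# Rows as a secure insert would leave them; the hash values are placeholders.
conn.executemany(
    "INSERT INTO payment_methods (user_id, card_hash, last_four) VALUES (?, ?, ?)",
    [(1, "<hash>", "4561"), (1, "<hash>", "3789"), (2, "<hash>", "0005")],
)


def list_cards_masked(user_id: int) -> list[dict]:
    """Return one user's cards with only the last four digits visible."""
    rows = conn.execute(
        "SELECT id, last_four FROM payment_methods WHERE user_id = ? ORDER BY id",
        (user_id,),
    ).fetchall()
    # card_hash is never selected, so it cannot end up in the response.
    return [{"id": r[0], "card_number": f"****-****-****-{r[1]}"} for r in rows]


masked = list_cards_masked(1)
print(masked)
```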

This implementation ensures that only masked card numbers are ever exposed through the API. The last four digits are sufficient for users to identify their cards, as this is a standard practice in the payment card industry. For example, a user can easily recognize that ****-****-****-4561 is their Visa card, while ****-****-****-3789 is their Mastercard. This approach provides an excellent balance between security and usability.

Verifying Card Numbers

While you can't decrypt the hash to get the original number, you can verify if a newly provided card number matches the stored hash. This is useful for processes like confirming a card before making a payment.
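A sketch of such a verification function, with the same substitution as before: PBKDF2 from hashlib replaces bcrypt, and hmac.compare_digest provides the constant-time comparison. The name verify_card_number and the "salt$digest" format are assumptions.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed value; must match the one used when hashing


def hash_card_number(card_number: str) -> str:
    """Salted PBKDF2 hash stored as 'salt$digest' (stand-in for bcrypt)."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", card_number.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_card_number(card_number: str, stored_hash: str) -> bool:
    """Re-hash card_number with the stored salt and compare in constant time.

    With the bcrypt package this would be
    bcrypt.checkpw(card_number.encode(), stored_hash.encode()).
    """
    try:
        salt_hex, digest_hex = stored_hash.split("$")
        salt = bytes.fromhex(salt_hex)
        expected = bytes.fromhex(digest_hex)
        candidate = hashlib.pbkdf2_hmac(
            "sha256", card_number.encode(), salt, ITERATIONS
        )
        # compare_digest avoids leaking where the first mismatch occurs.
        return hmac.compare_digest(candidate, expected)
    except (ValueError, AttributeError):
        # Malformed or missing hash data simply fails verification.
        return False


stored_hash = hash_card_number("4111111111111111")  # published test number
print(verify_card_number("4111111111111111", stored_hash))
print(verify_card_number("5555555555554444", stored_hash))
```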

This function uses bcrypt.checkpw, which re-hashes the provided card_number with the salt extracted from the stored_hash and compares the result. This comparison is done in a way that is safe from timing attacks, where an attacker could measure the tiny differences in comparison time to guess the hash's contents. The try...except block ensures that malformed hash data or other errors don't leak information and simply result in a False verification.

You can use this function in an endpoint like this:
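One possible shape for that endpoint, sketched as a plain function over sqlite3 in place of a FastAPI route and with PBKDF2 standing in for bcrypt. The names verify_card_endpoint and payment_methods and the {"verified": ...} response shape are assumptions. An unknown card id and a wrong number produce the same response, so the caller learns nothing beyond the yes/no answer.

```python
import hashlib
import hmac
import os
import sqlite3

ITERATIONS = 100_000  # assumed value, kept low so the example runs quickly


def hash_card_number(card_number: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", card_number.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"


def verify_card_number(card_number: str, stored_hash: str) -> bool:
    try:
        salt_hex, digest_hex = stored_hash.split("$")
        candidate = hashlib.pbkdf2_hmac(
            "sha256", card_number.encode(), bytes.fromhex(salt_hex), ITERATIONS
        )
        return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
    except (ValueError, AttributeError):
        return False


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payment_methods ("
    "id INTEGER PRIMARY KEY, card_hash TEXT NOT NULL, last_four TEXT NOT NULL)"
)
# One stored card (published test number); only hash and last four are kept.
conn.execute(
    "INSERT INTO payment_methods (card_hash, last_four) VALUES (?, ?)",
    (hash_card_number("4111111111111111"), "1111"),
)


def verify_card_endpoint(card_id: int, card_number: str) -> dict:
    """Report whether card_number matches the stored hash for card_id."""
    row = conn.execute(
        "SELECT card_hash FROM payment_methods WHERE id = ?", (card_id,)
    ).fetchone()
    # The stored hash itself is never included in the response.
    matches = row is not None and verify_card_number(card_number, row[0])
    return {"verified": matches}


print(verify_card_endpoint(1, "4111111111111111"))
```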

This endpoint securely confirms whether the user-provided card number matches the one on file, without ever exposing the stored hash or other sensitive details. This pattern is essential for building secure verification workflows that protect user data at every step.

Conclusion and Next Steps

In this lesson, we explored why relying solely on database encryption isn't sufficient for protecting sensitive data. We learned that true security requires a defense-in-depth approach, with application-level controls being paramount.

We learned how to implement proper security measures for handling sensitive data by:

Applying the principle of data minimization: never storing sensitive data you don't absolutely need.
Using a strong, slow KDF like bcrypt to create non-reversible hashes.
Storing only a hash and non-sensitive identifiers (like the last four digits).
Never transmitting full sensitive data in API responses, using masking instead.
Securely verifying data using functions that protect against side-channel attacks like timing attacks.

As you move on to the practice exercises, you'll have the opportunity to implement these security methods yourself. Keep up the great work! 🌟
