Data Integrity Fundamentals

Introduction

Welcome to the very first lesson of the "Creating a Secure App following OWASP Top 10 Risks (Part 2)" course! In this lesson, we will explore the fundamentals of data integrity, a crucial aspect of secure data handling. Data integrity ensures that information remains accurate, consistent, and reliable throughout its lifecycle. By the end of this lesson, you'll understand the importance of data integrity and how to implement it in your applications. Let's dive in! 🚀

Understanding Data Integrity

As we discussed earlier, even if an attacker already has access to some endpoints or the network, it is still important to limit what they can do. Previously, we focused on securing specific kinds of data that attackers target, such as user credentials and JWTs. Now, let's take a more general approach and focus on the integrity of any data within the application. Data integrity is the assurance that data is accurate, consistent, and reliable. It is essential to the trustworthiness of information, especially in applications where data is frequently accessed and modified. There are three main types of data integrity:

Physical Integrity: Ensures that data is stored and retrieved without corruption due to hardware failures or environmental factors.
Logical Integrity: Maintains the correctness and validity of data within a database, ensuring that it adheres to predefined rules and constraints.
Referential Integrity: Ensures that relationships between data in different tables remain consistent, preventing orphaned records or invalid references.

Understanding these types helps us appreciate the various ways data integrity can be compromised and the importance of safeguarding it. In this lesson, we primarily focus on logical and referential integrity.

To better understand how logical and referential integrity apply in real-world Express applications, consider a case where multiple related entities (e.g., users and their posts) exist in a database. Logical integrity ensures that a user's email follows a proper format and is unique, while referential integrity ensures that every post references a valid user ID. If a user is deleted, cascading deletes or foreign key constraints are often used to maintain referential integrity. These constraints are typically enforced at the database level (e.g., PostgreSQL, MySQL).
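
The users-and-posts rules above can be sketched in plain JavaScript. This is only an illustration of what the constraints mean, not a replacement for database-level enforcement: the in-memory `users` and `posts` maps, the function names, and the simplified email pattern are all assumptions made for the example.

```javascript
// Hypothetical in-memory "tables" standing in for real database tables.
const users = new Map(); // userId -> { id, email }
const posts = new Map(); // postId -> { id, userId, title }

// Deliberately simple format check; real validation rules will differ.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function addUser(id, email) {
  // Logical integrity: the email must be well-formed and unique.
  if (!EMAIL_PATTERN.test(email)) throw new Error('Invalid email format');
  for (const user of users.values()) {
    if (user.email === email) throw new Error('Email already in use');
  }
  users.set(id, { id, email });
}

function addPost(id, userId, title) {
  // Referential integrity: a post must reference an existing user.
  if (!users.has(userId)) throw new Error('Unknown user ID');
  posts.set(id, { id, userId, title });
}

function deleteUser(id) {
  // Cascading delete: remove the user's posts so none are left orphaned.
  for (const [postId, post] of posts) {
    if (post.userId === id) posts.delete(postId);
  }
  users.delete(id);
}
```

In a relational database the same rules would normally be declared once as a unique constraint and a foreign key with a cascade rule, so that every code path is covered.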

The Vulnerable Code

Let's examine a scenario where data integrity can be compromised due to a lack of proper security measures. Consider a simple Express application that stores and retrieves snippets of code. Here's a basic implementation:
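
A minimal sketch of what such an application might look like is shown below. It makes several assumptions: an in-memory `Map` stands in for the database, and the route handlers are written as plain functions (`createSnippet`, `getSnippet`, both hypothetical names) that would be registered with Express as indicated in the comments.

```javascript
// Hypothetical in-memory store; a real application would use a database.
const snippets = new Map();
let nextId = 1;

// Handler for POST /api/snippets,
// e.g. app.post('/api/snippets', createSnippet) with express.json() enabled.
function createSnippet(req, res) {
  const { content } = req.body;
  const id = nextId++;
  // The content is stored as-is: nothing records what it looked like at
  // write time, so a later modification cannot be detected.
  snippets.set(id, { id, content });
  res.status(201).json({ id });
}

// Handler for GET /api/snippets/:id,
// e.g. app.get('/api/snippets/:id', getSnippet).
function getSnippet(req, res) {
  const snippet = snippets.get(Number(req.params.id));
  if (!snippet) return res.status(404).json({ error: 'Snippet not found' });
  // Whatever is in the store is returned, whether or not it was altered.
  res.json(snippet);
}
```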

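In this code, the application stores the snippet content directly without any integrity checks. Without verification, any alteration of the stored data, whether by an attacker with access to the database or by accidental corruption, goes undetected and is served to clients as if it were the original.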

Implementing Data Hashing

To protect data integrity, we can start by implementing data hashing. Hashing converts data of any size into a fixed-size value that acts as a fingerprint of the data: the same input always produces the same hash, and even a small change to the input produces a completely different one. Here's how you can implement data hashing in the POST /api/snippets endpoint:
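
A minimal sketch of such an endpoint might look like the following. As assumptions for the example, an in-memory `Map` replaces the database, the handler is a plain function that would be passed to `app.post('/api/snippets', ...)`, and the `sha256Hex` helper name is hypothetical.

```javascript
const crypto = require('crypto');

// Hypothetical in-memory store; a real application would use a database.
const snippets = new Map();
let nextId = 1;

// SHA-256 digest of a string, returned as a 64-character hex string.
function sha256Hex(content) {
  return crypto.createHash('sha256').update(content).digest('hex');
}

// Handler for POST /api/snippets.
function createSnippet(req, res) {
  const { content } = req.body;
  if (typeof content !== 'string') {
    return res.status(400).json({ error: 'content must be a string' });
  }
  const id = nextId++;
  // Store the hash next to the content so it can be re-checked on read.
  const hash = sha256Hex(content);
  snippets.set(id, { id, content, hash });
  res.status(201).json({ id, hash });
}
```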

In this code, we use the crypto module to create a SHA-256 hash of the snippet content. The createHash function initializes the hash object with the SHA-256 algorithm, update adds the data to be hashed, and digest('hex') finalizes the hash and returns it as a hexadecimal string. This hash is then stored alongside the original data, allowing us to verify its integrity later.

Retrieving and Verifying the Snippet

When retrieving the snippet, verify its integrity using the stored hash:
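
One way to sketch the retrieval side is shown below. The in-memory store, the seeded record, the handler name, and the choice of a 500 status for a failed check are all assumptions for the example.

```javascript
const crypto = require('crypto');

// SHA-256 digest of a string, returned as a hex string.
function sha256Hex(content) {
  return crypto.createHash('sha256').update(content).digest('hex');
}

// Hypothetical store, seeded with one record as the POST handler would save it.
const original = 'console.log("hi");';
const snippets = new Map([
  [1, { id: 1, content: original, hash: sha256Hex(original) }],
]);

// Handler for GET /api/snippets/:id.
function getSnippet(req, res) {
  const snippet = snippets.get(Number(req.params.id));
  if (!snippet) return res.status(404).json({ error: 'Snippet not found' });

  // Recompute the hash from the stored content and compare it with the
  // hash recorded at write time.
  if (sha256Hex(snippet.content) !== snippet.hash) {
    return res.status(500).json({ error: 'Data integrity check failed' });
  }
  res.json({ id: snippet.id, content: snippet.content });
}
```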

In this implementation, when a snippet is stored, both the content and its hash are saved in the database. When the snippet is retrieved, the application recalculates the hash of the content and compares it with the stored hash to ensure the data has not been altered. If the hashes match, the data is considered intact and is returned to the client. If they don't match, an error is returned, indicating a potential integrity issue.

Implementing HMAC Verification

HMAC (Hash-based Message Authentication Code) combines a cryptographic hash function with a secret key to provide both data integrity and authenticity. A plain hash only detects changes made by someone who cannot also update the stored hash: an attacker with write access to the database can modify the content and replace the hash with a freshly computed one. An HMAC cannot be recomputed without the secret key, so as long as the key is known only to the sender and receiver (here, the application itself), tampered data fails verification.

Here's how to modify the GET /api/snippets/:id endpoint to implement HMAC verification:
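
The sketch below shows one possible shape for this check. It assumes the secret is supplied through an environment variable named `HMAC_SECRET` (a hypothetical name), that each record was saved with an `hmac` field, and that an in-memory `Map` stands in for the database.

```javascript
const crypto = require('crypto');

// Assumption: the key comes from an environment variable. The fallback
// exists only so the sketch runs; never ship a hard-coded key.
const HMAC_SECRET = process.env.HMAC_SECRET || 'dev-only-secret';

// HMAC-SHA256 of the content, returned as a hex string.
function computeHmac(content) {
  return crypto.createHmac('sha256', HMAC_SECRET).update(content).digest('hex');
}

function isValidHmac(content, storedHmac) {
  const expected = Buffer.from(computeHmac(content), 'hex');
  const actual = Buffer.from(String(storedHmac), 'hex');
  // timingSafeEqual throws if the lengths differ, so check that first.
  return expected.length === actual.length &&
    crypto.timingSafeEqual(expected, actual);
}

// Hypothetical store, seeded as the POST handler would have saved it.
const original = 'let x = 1;';
const snippets = new Map([
  [1, { id: 1, content: original, hmac: computeHmac(original) }],
]);

// Handler for GET /api/snippets/:id.
function getSnippet(req, res) {
  const snippet = snippets.get(Number(req.params.id));
  if (!snippet) return res.status(404).json({ error: 'Snippet not found' });
  if (!isValidHmac(snippet.content, snippet.hmac)) {
    return res.status(500).json({ error: 'Data integrity check failed' });
  }
  res.json({ id: snippet.id, content: snippet.content });
}
```

Comparing with `crypto.timingSafeEqual` rather than `===` is a design choice that avoids leaking, through response timing, how many leading characters of a forged HMAC were correct.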

In this example, we verify the HMAC of the snippet data using a secret key. If the HMAC is valid, the data is sent to the client; otherwise, an error is returned. This ensures that the data has not been tampered with and that it originates from a trusted source.

Implementing Digital Signature Verification

A digital signature is a cryptographic technique that provides data integrity, authenticity, and non-repudiation. It involves using a pair of keys: a private key for signing the data and a public key for verifying the signature. Unlike hashing, which covers only integrity, and HMAC, which adds authenticity through a shared secret, digital signatures also provide non-repudiation: because only the signer holds the private key, the signer cannot credibly deny having signed the data.

Digital signatures are ideal in distributed systems where different services or clients interact. For instance, a client signs a request payload with its private key, and the server verifies it using the corresponding public key, ensuring the request truly came from the client and hasn’t been altered. This technique is common in financial APIs and cross-service authentication.

Here's how to modify the GET /api/admin/test endpoint to implement digital signature verification:
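
A minimal sketch of such a check follows. Several details are assumptions made for illustration: the key pair is an Ed25519 pair generated in-process so the example runs on its own, the action and its base64 signature arrive in hypothetical `x-admin-action` and `x-signature` headers, and `signAction` plays the role of the admin client.

```javascript
const crypto = require('crypto');

// Demo only: generate a key pair in-process. In practice the admin client
// would hold the private key and the server would load only the public key.
const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519');

// Client side (hypothetical): sign the action description.
function signAction(action) {
  // Ed25519 takes no separate digest algorithm, hence the null.
  return crypto.sign(null, Buffer.from(action), privateKey).toString('base64');
}

// Server side: check the signature against the public key.
function isValidSignature(action, signatureB64) {
  return crypto.verify(
    null,
    Buffer.from(action),
    publicKey,
    Buffer.from(signatureB64, 'base64')
  );
}

// Handler for GET /api/admin/test. Header names are hypothetical.
function adminTest(req, res) {
  const action = req.headers['x-admin-action'];
  const signature = req.headers['x-signature'];
  if (!action || !signature) {
    return res.status(400).json({ error: 'Missing action or signature' });
  }
  if (!isValidSignature(action, signature)) {
    return res.status(403).json({ error: 'Invalid signature' });
  }
  res.json({ message: 'Admin action verified', action });
}
```

Because verification needs only the public key, the server never has to store a secret capable of producing valid admin signatures.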

In this code, we verify the digital signature of an admin action using a public key. This ensures that the action was authorized by the holder of the corresponding private key, enhancing the security of admin operations. Digital signatures provide a higher level of security by ensuring that the data is not only intact and authentic but also that the signer cannot deny their involvement.

Conclusion and Next Steps

In this lesson, we explored the fundamentals of data integrity, including its importance, common threats, and methods to ensure it. We also implemented data integrity measures in an Express application, using techniques like hashing, HMAC verification, and digital signature verification. As you move on to the practice exercises, you'll have the opportunity to apply these concepts and reinforce your understanding. In the upcoming lessons, we'll continue to build on this foundation, exploring more advanced security topics. Keep up the great work, and let's continue to secure our applications! 🎉

