Introduction to Google Cloud Storage and Object Storage

Welcome back! In the previous lessons, you learned how to launch and secure VM instances on Google Cloud Platform (GCP), which are the foundation of compute in GCP. Now, it is time to explore another essential part of cloud infrastructure: storage. In this lesson, you will learn about Google Cloud Storage (GCS), GCP's highly scalable and durable object storage service, and how to use it to store, organize, and protect your data.

Google Cloud Storage is designed for storing and retrieving any amount of data at any time. Unlike traditional file systems, which organize data in folders and files on disks, GCS uses a flat structure based on buckets and objects. Each object in GCS is stored in a bucket and is identified by a unique object name. This approach makes GCS ideal for storing large amounts of unstructured data, such as backups, images, logs, and even static website content.

Before we dive into the code, let's clarify some key GCS concepts:

  • A bucket is a container for your data. Every object you store in GCS must be inside a bucket.
  • An object is the actual piece of data you store, such as a file or a block of text.
  • An object name is the unique identifier for an object within a bucket.

To interact with GCS from Python, you will use the google-cloud-storage library. On CodeSignal, this library is pre-installed, so you do not need to worry about setup here. However, in your own environment, you would install it using pip install google-cloud-storage.

By the end of this lesson, you will be able to create GCS buckets, upload and manage objects, enable versioning for data protection, and retrieve specific versions of your files — all using Python code.

Creating and Configuring GCS Buckets with Versioning

Let's start by creating a GCS bucket. Bucket names share a single global namespace, so the name you choose must be unique across all of Cloud Storage, not just within your project. A name may contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.), and it must start and end with a letter or number. It is good practice to include a timestamp or another unique identifier in your bucket name to avoid conflicts.
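The naming rules above can be checked before any API call is made. The following is a minimal sketch, not a complete validator: it assumes the common 3 to 63 character limit and ignores the additional restrictions GCS applies. The helper names is_plausible_bucket_name and unique_bucket_name are hypothetical.

```python
import re
import time

# Partial check only: 3-63 characters, allowed characters, and a letter or
# digit at both ends. GCS applies further rules that are not covered here.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic naming checks above."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


def unique_bucket_name(prefix: str) -> str:
    """Append the current Unix timestamp to `prefix` to reduce name clashes."""
    return f"{prefix}-{int(time.time())}"
```

A timestamp suffix lowers the chance of a collision but does not guarantee uniqueness, so bucket creation can still fail if someone else already owns the name.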

Here is a function that creates a new GCS bucket and enables object versioning on it:
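A minimal sketch of such a function, assuming the google-cloud-storage client library. The name create_bucket_with_versioning and the optional client parameter (which allows a stub client to be passed in for testing) are illustrative choices, not part of the library.

```python
def create_bucket_with_versioning(bucket_name, project_id, client=None):
    """Create a bucket, then turn on object versioning for it."""
    if client is None:
        # Imported lazily so a stub client can be supplied for testing.
        from google.cloud import storage
        client = storage.Client(project=project_id)

    bucket = client.create_bucket(bucket_name)  # fails if the name is taken
    bucket.versioning_enabled = True            # local change only
    bucket.patch()                              # send the change to GCS
    print(f"Created bucket {bucket.name} "
          f"(versioning enabled: {bucket.versioning_enabled})")
    return bucket
```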

When you call this function with a unique bucket name and your GCP project ID, it will create the bucket and immediately enable versioning. Versioning in GCS keeps multiple versions (called "generations") of an object in the same bucket, protecting you from accidental deletions or overwrites. This is especially important for compliance and data recovery.

For example, if you run:

You might see output like:

This confirms that your bucket was created and is ready for use with versioning enabled.

If you need to enable versioning on an existing bucket, you can use the following approach:
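One way this could look, sketched under the assumption that the bucket already exists and that your credentials allow updating it. The function name enable_versioning and the optional client parameter are hypothetical.

```python
def enable_versioning(bucket_name, project_id, client=None):
    """Turn on object versioning for an existing bucket."""
    if client is None:
        # Imported lazily so a stub client can be supplied for testing.
        from google.cloud import storage
        client = storage.Client(project=project_id)

    bucket = client.get_bucket(bucket_name)  # raises NotFound if it is missing
    bucket.versioning_enabled = True         # change the local bucket object
    bucket.patch()                           # persist the change in GCS
    return bucket
```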

This function retrieves an existing bucket, sets its versioning flag to enabled, and then saves the updated configuration back to GCS so that the change takes effect.

Uploading Files and Managing Objects

Now that you have a bucket, let's upload a file to it. In GCS, each object is stored with an object name, which is usually the file name or a path-like string. You can upload files from your local system or create objects directly in your code.

Here is a function that uploads a file to your GCS bucket:
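A minimal sketch of an upload helper, assuming the argument order bucket name, local file path, project ID. Using the file's base name as the object name is one possible choice, and the optional client parameter is a hypothetical addition for testing.

```python
import os


def upload_file(bucket_name, file_path, project_id, client=None):
    """Upload a local file, using its base name as the object name."""
    if client is None:
        # Imported lazily so a stub client can be supplied for testing.
        from google.cloud import storage
        client = storage.Client(project=project_id)

    bucket = client.bucket(bucket_name)       # local reference, no API call
    object_name = os.path.basename(file_path)
    blob = bucket.blob(object_name)
    blob.upload_from_filename(file_path)      # performs the upload
    print(f"Uploaded {file_path} as {object_name}")
    return blob
```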

Suppose you have a file called sample.txt with the following content:

When you call upload_file(bucket_name, 'sample.txt', project_id), the file is uploaded to your bucket with the object name sample.txt.

While GCS uses a flat namespace, you can simulate folder structures by including slashes (/) in object names. For example, an object named documents/report.txt can be used to represent a file within a "documents" folder. This naming convention is commonly used to organize objects within buckets and makes it easier to manage large numbers of objects.
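Because a "folder" is only a shared name prefix, listing its contents amounts to listing by prefix. The sketch below illustrates the idea; the function name list_folder and the optional client parameter are hypothetical.

```python
def list_folder(bucket_name, folder, project_id, client=None):
    """Return the names of objects whose names start with `folder` + '/'."""
    if client is None:
        # Imported lazily so a stub client can be supplied for testing.
        from google.cloud import storage
        client = storage.Client(project=project_id)

    # Normalize so that "documents" and "documents/" behave the same way.
    prefix = folder.rstrip("/") + "/"
    return [blob.name for blob in client.list_blobs(bucket_name, prefix=prefix)]
```

The trailing slash matters: without it, the prefix documents would also match an object named documents-old/report.txt.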

You can also create objects programmatically with custom content. For example, you might want to create several versions of the same object to see how versioning works. The following function does exactly that:
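A sketch of what such a function might look like. It assumes versioning is already enabled on the bucket; the name create_object_versions, the count parameter, and the generated text content are illustrative.

```python
def create_object_versions(bucket_name, object_name, project_id,
                           count=3, client=None):
    """Upload `count` different contents under one name; return generations."""
    if client is None:
        # Imported lazily so a stub client can be supplied for testing.
        from google.cloud import storage
        client = storage.Client(project=project_id)

    bucket = client.bucket(bucket_name)
    generations = []
    for i in range(1, count + 1):
        blob = bucket.blob(object_name)
        blob.upload_from_string(
            f"This is version {i} of {object_name}\n",
            content_type="text/plain",
        )
        # After an upload, the blob carries the generation GCS assigned.
        generations.append(blob.generation)
    return generations
```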

Understanding and Working with Object Versions

With versioning enabled, every time you upload an object under a name that already exists, GCS creates a new generation and keeps the previous one as a noncurrent version instead of discarding it. Each version is identified by a unique generation number. The most recent version is called the "live" version. When listing with versions=True, one way to identify the live version is to compute the maximum generation value for the object and compare each version's generation to that value. This works as long as the object has not been deleted: after a deletion, every remaining version is noncurrent, which you can detect from the deletion timestamp that GCS records on noncurrent versions (exposed as time_deleted in the Python client). The example below uses the max-generation approach.

Let's see how you can list all versions (generations) of an object:
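A minimal sketch of a version listing that marks the live version by taking the maximum generation. The function name list_object_versions, the returned dictionary keys, and the optional client parameter are hypothetical.

```python
def list_object_versions(bucket_name, object_name, project_id, client=None):
    """Return one dict per generation of `object_name`, oldest first."""
    if client is None:
        # Imported lazily so a stub client can be supplied for testing.
        from google.cloud import storage
        client = storage.Client(project=project_id)

    # A prefix can match other objects too, so keep exact name matches only.
    blobs = [
        b for b in client.list_blobs(bucket_name, prefix=object_name, versions=True)
        if b.name == object_name
    ]
    if not blobs:
        return []

    # Max-generation approach: assumes the object has not been deleted.
    live_generation = max(b.generation for b in blobs)
    return [
        {
            "generation": b.generation,
            "size": b.size,
            "updated": b.updated,
            "is_live": b.generation == live_generation,
        }
        for b in sorted(blobs, key=lambda b: b.generation)
    ]
```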

If you call this function after creating multiple versions, you might get output like:

And the details for each version might look like this:

Listing and Monitoring Bucket Contents

To keep track of what is stored in your bucket, you can list all objects and their metadata. This is useful for monitoring storage usage and understanding how your data is organized.

Here is a function that lists all objects in a bucket and calculates the total storage used (for the latest versions):
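One possible shape for this function, as a sketch. The name list_bucket_contents, the printed format, and the returned tuple are illustrative choices.

```python
def list_bucket_contents(bucket_name, project_id, client=None):
    """Print each live object and return (object_count, total_bytes)."""
    if client is None:
        # Imported lazily so a stub client can be supplied for testing.
        from google.cloud import storage
        client = storage.Client(project=project_id)

    count = 0
    total_bytes = 0
    # Without versions=True, only the live version of each object is listed.
    for blob in client.list_blobs(bucket_name):
        size = blob.size or 0
        print(f"{blob.name}: {size} bytes, updated {blob.updated}")
        count += 1
        total_bytes += size

    print(f"{count} objects, {total_bytes} bytes total")
    return count, total_bytes
```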

After uploading and versioning your objects, you can call this function to see what is in your bucket. The output might look like:

This tells you how many objects are present and the total size they occupy. Remember, this only lists the live version of each object, so the total does not include noncurrent versions, even though they still count toward your storage usage. To see all versions, you need to use the version listing function shown earlier.

Downloading Specific Object Versions

One of the most powerful features of GCS versioning is the ability to download any version (generation) of an object, not just the latest one. This is useful for restoring previous data or auditing changes.

Here is a function that downloads a specific version (generation) of an object:
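A minimal sketch of such a download helper, taking its arguments in the order bucket name, object name, generation, destination file name, project ID. The name download_object_version and the optional client parameter are hypothetical.

```python
def download_object_version(bucket_name, object_name, generation,
                            destination, project_id, client=None):
    """Download one generation of an object to a local file."""
    if client is None:
        # Imported lazily so a stub client can be supplied for testing.
        from google.cloud import storage
        client = storage.Client(project=project_id)

    bucket = client.bucket(bucket_name)
    # Pinning the generation selects that exact version, live or noncurrent.
    blob = bucket.blob(object_name, generation=generation)
    blob.download_to_filename(destination)
    return destination
```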

To use this function, you need the bucket name, the object name, the generation number you want to retrieve, the name you want to save the file as, and your project ID. For example, if you want to download the first version of sample.txt, you would call:

This will save the specified version as sample_v1.txt on your local system. If you do not specify a generation, GCS returns the live version by default.

In practice, this feature is invaluable for recovering from mistakes or for compliance, where you may need to prove what data existed at a certain point in time. Always handle errors gracefully, such as when a generation does not exist or the object has been deleted.

Handling Object Deletion with Versioning

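When versioning is enabled, deleting an object without specifying a generation does not remove its data from your bucket. Instead, the live version becomes noncurrent, and all versions are retained until they are explicitly deleted or removed by a lifecycle rule. The object no longer appears in ordinary listings, but each noncurrent version can still be retrieved by its generation number. This provides an additional layer of protection against accidental data loss.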

To permanently delete a specific version of an object, you can use the generation parameter:
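A sketch of a deletion helper that targets a single generation. The name delete_object_version and the optional client parameter are hypothetical.

```python
def delete_object_version(bucket_name, object_name, generation,
                          project_id, client=None):
    """Permanently delete one generation of an object."""
    if client is None:
        # Imported lazily so a stub client can be supplied for testing.
        from google.cloud import storage
        client = storage.Client(project=project_id)

    bucket = client.bucket(bucket_name)
    # With a generation given, only that version is removed, and for good.
    bucket.delete_blob(object_name, generation=generation)
    print(f"Deleted {object_name} generation {generation}")
```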

This function deletes a specific version of an object by specifying its generation number. This is useful when you need to clean up old versions to reduce storage costs or comply with data retention policies.

Summary and Practice Preparation

In this lesson, you learned the fundamentals of Google Cloud Storage and how it fits into the GCP ecosystem alongside Compute Engine. You now know how to create GCS buckets, enable versioning for data protection, upload and manage objects, retrieve specific versions of your files, and handle object deletion with versioning enabled. You also saw how to monitor your bucket's contents and storage usage.

GCS's versioning feature is a powerful tool for protecting your data from accidental loss or unwanted changes. Combined with Compute Engine, GCS allows you to build robust, scalable, and secure cloud solutions.

In the next set of exercises, you will get hands-on practice with these concepts. You will automate GCS bucket creation, upload files, manage object versions, and retrieve historical data — all using Python and the google-cloud-storage library. This will prepare you to build reliable cloud storage workflows and integrate GCS into your own projects.

Get ready to put your new skills to the test and deepen your understanding of GCP storage automation!
