Welcome to the first lesson of our course, "Creating Images with Gemini's Nano Banana and Python". In this course, you will explore AI-driven image generation using Google's Gemini API and its native image generation models, known as Nano Banana. Our journey begins with setting up the environment and generating a simple image. We'll work in Python throughout, which will also help us integrate and display our generated images. This foundational lesson sets the stage for more advanced topics in subsequent units.
A quick note on naming: Google's older standalone Imagen models have been deprecated, and image generation is now unified directly inside the Gemini models. Throughout this course we will use gemini-3.1-flash-image (also called Nano Banana 2), which is optimized for fast, high-volume generation. A higher-end sibling, gemini-3-pro-image (Nano Banana Pro), exists for professional asset production, and you can swap the model string at any point to experiment with it.
Before we dive into generating images, we need to set up our environment correctly. First, make sure you have access to the Gemini API by retrieving your API key, which authenticates your requests. Set this key as an environment variable named GOOGLE_API_KEY. On CodeSignal, many libraries come pre-installed, but it's good practice to know how to install them on your own device. For this lesson, you'll need the google-genai library for accessing the Gemini API and Pillow (imported in code as PIL) for image processing. You can install both with pip by running pip install google-genai pillow.
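Once the packages are installed, a quick check can confirm that both are importable before you go further. The sketch below is only one way to do this. The names REQUIRED and missing_packages are hypothetical helpers introduced for illustration, not part of either library.

```python
import importlib.util

# Hypothetical mapping: pip distribution name -> module name it provides.
REQUIRED = {"google-genai": "google.genai", "pillow": "PIL"}


def missing_packages(required=REQUIRED):
    """Return the pip names of required packages that cannot be imported."""
    missing = []
    for pip_name, module_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (for example "google") is absent.
            found = False
        if not found:
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All required packages are available.")
```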
With the environment set up, the next step is to configure the Gemini API client. This involves initializing the client with your API key. The API key is retrieved from the environment variable GOOGLE_API_KEY. If the key is not found, the script will raise an error, prompting you to set it before proceeding. Here's how you can initialize the client:
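A minimal sketch of this step is shown below. It assumes the google-genai package is installed. The helper names load_api_key and make_client are illustrative, and the choice of ValueError for a missing key is an assumption, since the lesson only says that an error is raised.

```python
import os


def load_api_key(env=None):
    """Return the Gemini API key from GOOGLE_API_KEY, or raise if it is unset."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY")
    if not api_key:
        # Assumption: ValueError is used here; any clear error would do.
        raise ValueError("GOOGLE_API_KEY is not set; set it before proceeding.")
    return api_key


def make_client():
    """Create a Gemini API client authenticated with the key from the environment."""
    # Imported lazily so the key check above can run without the SDK loaded.
    from google import genai  # provided by the google-genai package

    return genai.Client(api_key=load_api_key())
```

Calling client = make_client() at the top of your script gives you a single client object to reuse for every request in the lesson.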
This setup ensures that your application can securely communicate with the Gemini API.
Now, let's generate a simple image using the Gemini image model. We'll start by defining a prompt, which is a textual description of the image you want to create. In this example, the prompt is "A serene sunset over a mountain range." Unlike the older Imagen interface, Gemini's image models are multimodal: you call the same generate_content method you would use for text, and you ask for an image back by setting response_modalities to include "IMAGE". You can also specify an aspect ratio through an ImageConfig object. Here, we request one image with a 16:9 aspect ratio:
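The request described above might look like the following sketch. It assumes a client created with genai.Client and uses the model string from this lesson, so check the current model list if the call is rejected. The function name generate_image is a hypothetical wrapper added for illustration.

```python
MODEL_NAME = "gemini-3.1-flash-image"  # model string used in this lesson


def generate_image(client, prompt, aspect_ratio="16:9", model=MODEL_NAME):
    """Request a single image for `prompt` and return the raw API response."""
    # Imported lazily so this module loads even before the SDK is installed.
    from google.genai import types

    config = types.GenerateContentConfig(
        response_modalities=["IMAGE"],  # ask for an image back
        image_config=types.ImageConfig(aspect_ratio=aspect_ratio),
    )
    return client.models.generate_content(
        model=model,
        contents=[prompt],  # a list, so reference images can be added later
        config=config,
    )
```

With a client in hand, response = generate_image(client, "A serene sunset over a mountain range.") sends the request and returns the response to process in the next step.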
A few important differences from the older Imagen API are worth highlighting. There is no longer a number_of_images parameter: each generate_content call returns a single image, so to produce several images you simply call the model multiple times (you'll do exactly that in a later exercise). The contents parameter takes a list, which is what makes the model so flexible: later you can mix text and reference images in the same list. Finally, the aspect_ratio parameter lives inside ImageConfig and supports many options, such as 1:1 (square), 4:3 (fullscreen), 3:4 (portrait fullscreen), 16:9 (widescreen), and 9:16 (portrait), as well as wider ratios like 21:9. Each aspect ratio suits a different use, from social media posts to cinematic landscapes.
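Because each call yields one image, producing several is just a loop around the single-image request. The sketch below illustrates the idea with a hypothetical helper, generate_several, which takes any function that performs one request. It is an assumption made for illustration, not an SDK feature.

```python
def generate_several(generate_one, prompt, count):
    """Call a single-image generator `count` times and collect the responses."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # One API call per image: there is no batch parameter to rely on.
    return [generate_one(prompt) for _ in range(count)]


# Example wiring (assumes `client` is an initialized Gemini client):
# responses = generate_several(
#     lambda p: client.models.generate_content(
#         model="gemini-3.1-flash-image", contents=[p]
#     ),
#     "A serene sunset over a mountain range.",
#     3,
# )
```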
Once the image is generated, the next step is to process and display it. The response from a Gemini image model is a list of parts. Some parts may contain text (for example, a short description the model decides to include), and the part we care about contains the image as binary data inside inline_data. We loop over the parts, find the one carrying image bytes, and process it using the PIL library. Here's how you can handle the image data and save it to a file:
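One possible version of this step is sketched below. It assumes the parts are reached through response.candidates[0].content.parts and that Pillow is installed. The helper names find_image_bytes and save_image, and the output path in the usage note, are hypothetical.

```python
from io import BytesIO
from pathlib import Path


def find_image_bytes(response):
    """Return the raw bytes of the first image part, or None if there is none."""
    # Assumption: parts live under candidates[0].content.parts.
    for part in response.candidates[0].content.parts:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            return inline.data  # text-only parts are skipped
    return None


def save_image(image_bytes, output_path):
    """Decode image bytes with Pillow and save them to output_path."""
    from PIL import Image  # provided by the pillow package

    path = Path(output_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    image = Image.open(BytesIO(image_bytes))  # decode straight from memory
    image.save(path)  # output format is inferred from the file extension
    return path
```

A typical call would be image_bytes = find_image_bytes(response), followed by save_image(image_bytes, "output/sunset.png") when image_bytes is not None.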
This code snippet iterates over the response parts, opens the image carried by inline_data, and saves it to the specified directory. Pillow (imported as PIL) decodes the bytes and writes the file in the format implied by the file extension.
In this example, the BytesIO class plays a critical role in processing the image. The API returns the image data as raw bytes inside part.inline_data.data, and BytesIO wraps those bytes in an in-memory object that behaves like a file. By passing that object to Image.open(), Pillow can read and manipulate the image directly from memory. This avoids writing the raw data to a temporary file and reading it back, which saves a round trip to disk.
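To see the file-like behavior on its own, without any image library, the short sketch below wraps raw bytes in BytesIO and reads a header from them, much as a decoder does when it identifies a format. The helpers read_header and looks_like_png are illustrative names, and the lesson does not state which format the API returns, so PNG is used here only as an example.

```python
from io import BytesIO

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def read_header(raw_bytes, size=8):
    """Wrap raw bytes in a file-like object and read the first `size` bytes."""
    stream = BytesIO(raw_bytes)  # no file on disk is involved
    return stream.read(size)  # the same .read() call a real file offers


def looks_like_png(raw_bytes):
    """Return True if the bytes start with the PNG file signature."""
    return read_header(raw_bytes) == PNG_SIGNATURE
```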
In this lesson, you learned how to set up your environment, configure the Gemini API client, and generate a simple image using a prompt. We also explored how to process and save the generated image from the response parts. This foundational knowledge will be crucial as you progress through the course. As you move on to the practice exercises, I encourage you to experiment with different prompts and configurations to see the diverse range of images you can create. This hands-on practice will solidify your understanding and prepare you for more advanced topics in the upcoming lessons.

