Introduction: The Role of Prompt Data in the Game

Welcome back! In the previous lesson, you learned about the LLM Prediction Game — how it works, how players interact with it, and how their guesses are scored. Now, we are ready to start building the data that powers the game.

In this lesson, you will learn how to generate the prompt data that the game uses to ask questions. This data is essential because it provides the daily questions and the structure that the game relies on. By the end of this lesson, you will know how to create a set of prompts, each with a question, some settings, and a few random elements to keep the game interesting.

Key Python Concepts for This Lesson

Before we dive in, let’s quickly remind ourselves of a few Python basics that will be useful in this lesson:

  • Lists: A way to store multiple items in a single variable.
  • Random Selection: The random module helps us pick random items from a list.
  • Saving Files: We use the open() function and the json module to write data to a file.

If you need a refresher on any of these, feel free to review the official Python documentation. Otherwise, let’s move forward!

Building a Prompt Entry: What Goes Into Each Question

Each prompt entry in our game is a small package of information. Let’s break down what each entry contains:

  • llm: The name of the language model we are using (for example, "gpt-4o").
  • system_prompt: Instructions for the model (for example, "You are a helpful assistant. Your answers must only contain 10 words.").
  • user_question: The question that will be shown to the player.
  • breakpoints: A list of numbers that will be used later in the game logic.

Let’s look at a simple example of what one entry might look like in Python:
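The question text and breakpoint numbers used here are just placeholders; any values with the same structure will work.

```python
# One prompt entry; the question and breakpoint values are illustrative
entry = {
    "llm": "gpt-4o",
    "system_prompt": "You are a helpful assistant. Your answers must only contain 10 words.",
    "user_question": "Tell me something interesting about cats.",
    "breakpoints": [3, 5, 7],
}
```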

This dictionary holds all the information for one prompt. In the next steps, we’ll see how to generate many of these entries automatically.

Generating User Questions and Breakpoints

To make the game interesting, we want a variety of questions and some randomness in the breakpoints. Let’s build this step by step.

Step 1: Creating a List of Topics

First, we need a list of topics or nouns that our questions will be about. Here’s a small example:
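The specific nouns don’t matter; these three are only for illustration.

```python
# A few example topics; swap in whatever nouns you like
nouns = ["cats", "pizza", "space travel"]
```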

Step 2: Making Questions from Topics

We can use these nouns to create questions by combining them with a template. For example:
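One way to do this is with an f-string template inside a list comprehension; the template wording here is an assumption, not a fixed requirement.

```python
# Turn each noun into a full question using a simple template
questions = [f"Tell me something interesting about {noun}." for noun in nouns]
print(questions)
```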

Output:
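```
['Tell me something interesting about cats.', 'Tell me something interesting about pizza.', 'Tell me something interesting about space travel.']
```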

This gives us a list of questions ready to use.

Step 3: Randomly Selecting Breakpoints

We want each prompt to have a set of breakpoints. Let’s create a list of possible breakpoint options and pick one at random for each entry.
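Here is a minimal sketch, assuming each option is itself a small list of numbers (the values are placeholders). random.choice() picks one option; which one you see depends on the seeded random stream.

```python
import random

random.seed(0)  # fixed seed so every run selects the same option

# Possible breakpoint sets to choose from (illustrative values)
breakpoint_options = [
    [2, 4, 6],
    [3, 5, 7],
    [4, 6, 8],
]

breakpoints = random.choice(breakpoint_options)
print(breakpoints)
```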

Output (example):
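```
[3, 5, 7]
```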

Because we set the seed to 0, you will get the same selection every time you run this. If you want a different set of breakpoints on each run, simply omit the call to random.seed().

Step 4: Assembling the Full Entry

Now, let’s put it all together for one entry:
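This sketch reuses the questions and breakpoint_options lists from the previous steps and draws one question and one breakpoint set at random, so the exact values printed depend on those draws.

```python
# Build one complete prompt entry from the pieces above
entry = {
    "llm": "gpt-4o",
    "system_prompt": "You are a helpful assistant. Your answers must only contain 10 words.",
    "user_question": random.choice(questions),
    "breakpoints": random.choice(breakpoint_options),
}
print(entry)
```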

Output (example):
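```
{'llm': 'gpt-4o', 'system_prompt': 'You are a helpful assistant. Your answers must only contain 10 words.', 'user_question': 'Tell me something interesting about pizza.', 'breakpoints': [2, 4, 6]}
```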

Saving the Prompt Data to a JSON File

Once we have many entries, we want to save them so the game can use them later. We use the json module for this.

Let’s say we want to create 3 entries and save them:
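Here is one self-contained way to do it, putting everything from this lesson together (the list contents are the same illustrative values used above).

```python
import json
import random

random.seed(0)  # reproducible breakpoint selections

nouns = ["cats", "pizza", "space travel"]
questions = [f"Tell me something interesting about {noun}." for noun in nouns]
breakpoint_options = [[2, 4, 6], [3, 5, 7], [4, 6, 8]]

# Build one entry per question
entries = []
for question in questions:
    entries.append({
        "llm": "gpt-4o",
        "system_prompt": "You are a helpful assistant. Your answers must only contain 10 words.",
        "user_question": question,
        "breakpoints": random.choice(breakpoint_options),
    })

# Write the entries to data.json in a human-readable format
with open("data.json", "w") as f:
    json.dump(entries, f, indent=2)
```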

This code creates a list of entries and writes them to a file called data.json. The indent=2 argument pretty-prints the JSON so the file is easy to read.

Note: On CodeSignal, the json module and file writing are ready to use, so you don’t need to install anything extra. If you run this on your own computer, make sure you have permission to write files in your working directory.

Summary and What’s Next

In this lesson, you learned how to:

  • Build the structure for a prompt entry.
  • Generate a list of user questions from a set of topics.
  • Randomly select breakpoints for each entry.
  • Assemble and save all the prompt data to a JSON file.

This prompt data is the foundation for the LLM Prediction Game. In the next set of exercises, you’ll get hands-on practice generating and saving prompt data yourself. This will prepare you for the next steps in building the game. Good luck!
