Few-Shot Learning Optimizers Overview

Welcome to our lesson on Automatic Few-Shot Learning with DSPy! In the previous lesson, we introduced the concept of optimization in DSPy and explored the three main categories of optimizers: Few-Shot Learning, Instruction Optimization, and Finetuning. Now, we'll dive deeper into the first category: Few-Shot Learning optimizers.

As you may recall, few-shot learning is a technique where we provide the language model with examples of the task before asking it to solve a new instance. This approach helps the model understand what we're asking for and improves its performance. While you could manually select and include examples in your prompts, DSPy's few-shot optimizers automate this process, finding the most effective examples to include.

In this lesson, we'll explore four different few-shot optimizers:

  1. LabeledFewShot: The simplest approach, which randomly selects examples from your training data.
  2. BootstrapFewShot: A more advanced approach that generates new examples using your program itself.
  3. BootstrapFewShotWithRandomSearch: Extends BootstrapFewShot by exploring multiple sets of examples to find the best combination.
  4. KNNFewShot: A retrieval-based approach that selects examples most similar to the current input.

Each optimizer has its strengths and is suited for different scenarios. If you have very few examples (around 10), BootstrapFewShot is a good starting point. With more data (50+ examples), BootstrapFewShotWithRandomSearch can yield better results. KNNFewShot is particularly useful when the relevance of examples varies significantly depending on the input.

Let's explore each of these optimizers in detail, with practical examples to help you understand how to implement them in your own projects.

LabeledFewShot: Basic Example Selection

The simplest few-shot optimizer in DSPy is LabeledFewShot. This optimizer takes examples from your training data and includes them in the prompt sent to the language model. It's straightforward but effective, especially when you have high-quality labeled examples.

Here's how to implement LabeledFewShot:

In this example, we create a LabeledFewShot optimizer that will include 8 examples in each prompt. The k parameter controls the number of examples, and you can adjust it based on your needs and the context window size of your language model.

When you call compile(), the optimizer randomly selects k examples from your training set and incorporates them into the prompts of your DSPy program. The result is a new program (your_dspy_program_compiled) that includes these examples in its prompts. Since LabeledFewShot selects examples randomly, the examples used may differ between runs unless you fix the random seed. If reproducibility is important—for example, when comparing results or debugging—you may want to set a random seed before compiling the program to ensure consistency across runs.

Here's what happens behind the scenes:

  1. The optimizer selects k random examples from your training set.
  2. For each module in your program, it formats these examples according to the module's signature.
  3. It prepends these formatted examples to the prompt template of each module.
  4. It returns a new program with the updated prompts.

The main advantage of LabeledFewShot is its simplicity. It doesn't require a metric function and doesn't perform any complex optimization. However, this simplicity also means it doesn't adapt the examples to the specific input or try to find the most effective examples. It's a good baseline approach, especially when you have a small but high-quality training set.

BootstrapFewShot: Self-Generated Examples

While LabeledFewShot simply uses examples from your training data, BootstrapFewShot goes a step further by generating additional examples using your program itself. This is particularly useful when you have limited labeled data or when you want to create more diverse examples.

Here's how to implement BootstrapFewShot:

In this example, we create a BootstrapFewShot optimizer with several parameters:

  • metric: A function that evaluates the quality of a generated demonstration, typically by comparing the program's prediction against the gold example; only demonstrations that pass are kept.
  • max_bootstrapped_demos: The maximum number of examples to generate (4 in this case).
  • max_labeled_demos: The maximum number of examples to use from the training set (16 in this case).
  • max_rounds: The number of rounds of bootstrapping to perform.
  • max_errors: The maximum number of errors allowed before stopping the bootstrapping process.

The bootstrapping process works as follows:

  1. The optimizer selects examples from your training set (up to max_labeled_demos).
  2. It uses your program (or a specified "teacher" program) to generate complete demonstrations for these examples.
  3. It evaluates the generated demonstrations using your metric function.
  4. It includes only the successful demonstrations (those that pass the metric) in the compiled program.

You can also use a different language model for the teacher by specifying it in the teacher_settings:

This is particularly useful when you have access to a more powerful model (like GPT-4) that can generate high-quality examples but want to optimize a program that will run on a smaller, more efficient model.

The main advantage of BootstrapFewShot over LabeledFewShot is that it can generate new, high-quality examples beyond what's in your training set. This can lead to better performance, especially when your training data is limited.

BootstrapFewShotWithRandomSearch: Finding Optimal Example Sets

Building on BootstrapFewShot, the BootstrapFewShotWithRandomSearch optimizer adds another layer of optimization by exploring multiple sets of examples to find the best combination. This is particularly useful when you have a larger training set and want to find the most effective subset of examples.

Here's how to implement BootstrapFewShotWithRandomSearch:

In this example, we configure the optimizer with several parameters:

  • max_bootstrapped_demos: The maximum number of examples to generate (4 in this case).
  • max_labeled_demos: The maximum number of examples to use from the training set (4 in this case).
  • num_candidate_programs: The number of random programs to evaluate (10 in this case).
  • num_threads: The number of threads to use for parallel evaluation (4 in this case).

The random search process works as follows:

  1. The optimizer creates multiple candidate programs, each with a different set of examples.
  2. These candidates include the uncompiled program, a program optimized with LabeledFewShot, a program optimized with BootstrapFewShot using unshuffled examples, and num_candidate_programs programs optimized with BootstrapFewShot using randomized example sets.
  3. It evaluates all these candidates on your validation set using your metric function.
  4. It returns the candidate program that performs best according to your metric.

The parallelization through num_threads can significantly speed up the optimization process, especially when evaluating many candidate programs.

The main advantage of BootstrapFewShotWithRandomSearch over BootstrapFewShot is that it explores a larger space of possible example combinations, increasing the chances of finding a particularly effective set. However, this comes at the cost of increased computation time, as it needs to evaluate multiple candidate programs.

KNNFewShot: Context-Aware Example Selection

The final few-shot optimizer we'll explore is KNNFewShot, which takes a different approach by selecting examples based on their similarity to the current input. This is particularly useful when the relevance of examples varies significantly depending on the input.

Here's how to implement KNNFewShot:

In this example, we first create an Embedder using a pre-trained SentenceTransformer model. This embedder converts text into vector representations that capture semantic meaning. Then, we create a KNNFewShot optimizer with several parameters:

  • k: The number of nearest neighbors (examples) to include in each prompt (3 in this case).
  • trainset: Your training set of examples.
  • vectorizer: The embedder that converts text to vectors for similarity comparison.

The KNN (k-nearest neighbors) process works as follows:

  1. When your program receives a new input, the optimizer converts it to a vector using the embedder.
  2. It compares this vector to the vectors of all examples in your training set.
  3. It selects the k examples that are most similar to the current input.
  4. It includes these examples in the prompt sent to the language model.
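The selection step above can be sketched in plain Python with cosine similarity. This is an illustrative stand-alone version of the idea, not DSPy's internal code:

```python
import numpy as np

def knn_select(input_vec, example_vecs, k=3):
    """Return indices of the k training examples most similar to the input."""
    A = np.asarray(example_vecs, dtype=float)
    q = np.asarray(input_vec, dtype=float)
    # Cosine similarity between the input vector and every example vector.
    sims = (A @ q) / (np.linalg.norm(A, axis=1) * np.linalg.norm(q) + 1e-12)
    # Indices sorted by descending similarity; keep the top k.
    return np.argsort(-sims)[:k].tolist()

# Toy 2-D vectors: the input points almost exactly along example 0.
examples = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(knn_select([1.0, 0.05], examples, k=2))  # → [0, 2]
```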

The main advantage of KNNFewShot over the other optimizers is that it dynamically selects examples based on their relevance to the current input. This can lead to better performance, especially when different types of inputs benefit from different types of examples. However, it requires an additional step of computing embeddings, which adds some computational overhead.

Summary and Practice Preview

In this lesson, we've explored four different few-shot optimizers in DSPy:

  1. LabeledFewShot: The simplest approach, which randomly selects examples from your training data.
  2. BootstrapFewShot: A more advanced approach that generates new examples using your program itself.
  3. BootstrapFewShotWithRandomSearch: Extends BootstrapFewShot by exploring multiple sets of examples to find the best combination.
  4. KNNFewShot: A retrieval-based approach that selects examples most similar to the current input.

When deciding which optimizer to use, consider the following guidelines:

  • If you have very few examples (around 10), start with BootstrapFewShot.
  • If you have more data (50+ examples), try BootstrapFewShotWithRandomSearch.
  • If different inputs benefit from different types of examples, consider KNNFewShot.
  • If you're just getting started and want a simple baseline, LabeledFewShot is a good choice.

In the practice exercises that follow, you'll get hands-on experience with these optimizers. You'll implement each one, observe how they affect your program's performance, and develop an intuition for when to use each approach.

In the next lesson, we'll explore another category of optimizers: Automatic Instruction Optimization. These optimizers focus on improving the natural language instructions in your prompts, complementing the few-shot learning techniques we've covered here.

Remember, optimization is an iterative process. After applying these few-shot optimizers, you might want to try different configurations, combine them with other optimization techniques, or revisit your program design to further improve performance. The tools and techniques you've learned in this lesson provide a solid foundation for this iterative improvement process.
