Introduction: Why Context Windows Matter

In previous lessons, you learned how large language models (LLMs) generate text one token at a time and how different model versions affect your results. Now, let's focus on a key concept that shapes what you can do with LLMs: the context window.

A context window is the maximum amount of information (measured in tokens) a model can consider simultaneously. This includes your input (the prompt) and the model's output (the response). If you try to give the model more information than fits in its context window, some of it will be ignored or cut off.
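To get a feel for token counts before sending a prompt, you can estimate them with a rough heuristic. The sketch below is an approximation, not a real tokenizer: exact counts come from the model's own tokenizer (for example, OpenAI's tiktoken library), but about four characters per token is a common rule of thumb for English text.

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    # A real tokenizer (e.g., tiktoken for OpenAI models) gives exact counts.
    return max(1, len(text) // 4)

prompt = "Summarize the following meeting notes in three bullet points."
print(estimate_tokens(prompt))
```

An estimate like this is enough to tell whether a prompt is comfortably inside the window or likely to bump against it.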

Understanding context windows is important because it helps you design prompts that fit within these limits, ensuring the model can "see" everything it needs to give you a good answer.

Historical Evolution of Context Limits

Context windows have changed a lot as LLMs have improved. Early models could only handle short prompts and responses, while newer models can work with much more information at once.

Here's a simple table showing how context window sizes have grown over time:

Model Name     Release Year   Context Window Size (tokens)
GPT-2          2019           1,024
GPT-3          2020           2,048
GPT-3.5        2022           4,096
GPT-4 (8k)     2023           8,192
GPT-4 (32k)    2023           32,768
Claude 2       2023           100,000

As you can see, newer models can handle much larger context windows. This means you can give them longer prompts or get longer responses, but there is always a limit.

How Context Windows Affect Input and Output

Let's look at how the context window shapes what you can do with an LLM.

The context window is shared between your input and the model's output. For example, if a model has a 4,096-token context window and your prompt uses 2,000 tokens, at most 2,096 tokens remain for the response. If your prompt is too long, the model may truncate your input or run out of room before finishing its response.
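This shared budget is simple arithmetic. Here is a minimal sketch of the calculation (the function name is illustrative):

```python
def max_output_tokens(context_window: int, prompt_tokens: int) -> int:
    # The window is shared: whatever the prompt uses is
    # unavailable for the model's response.
    return max(0, context_window - prompt_tokens)

print(max_output_tokens(4096, 2000))  # 2096 tokens left for the response
print(max_output_tokens(4096, 5000))  # 0 -- the prompt alone overflows the window
```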

Strategies: Make Inputs Shorter and More Relevant

When you hit the context window limit, you can use several strategies to get the results you need. Let's go through them step by step, with examples.

Instead of pasting everything, focus on the most important parts.

Suppose you have a long email thread but only need a summary of the last conversation.

Prompt:

Summarize the main points of the two most recent emails below. Ignore any earlier messages in the thread.

[paste the two most recent emails here]

Explanation:
Including only the relevant emails saves space and ensures the model focuses on what matters.
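If your source is structured, say, as a list of messages, trimming it down can be automated. The sketch below is illustrative; the function name and message format are assumptions, not part of any particular API:

```python
def keep_recent_messages(thread: list[str], n: int = 2) -> str:
    # Drop older messages: only the last n are likely relevant,
    # and everything we drop frees up context-window space.
    return "\n\n".join(thread[-n:])

emails = [
    "Email 1: project kickoff notes",
    "Email 2: schedule change",
    "Email 3: final agenda for tomorrow",
]
print(keep_recent_messages(emails))  # only emails 2 and 3 are sent to the model
```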

Strategies: Request Shorter or Partial Outputs

If you can't fit everything, ask the model for a recommendation or a plan, not the full result.

Suppose you want to improve a lengthy document, but it's too big for the context window.

Prompt:

Below is an excerpt from a long report. Based on this excerpt, list the five most important improvements I should make to the document. Do not rewrite the text.

[paste excerpt here]

Explanation:
Instead of asking for a complete rewrite, you ask for suggestions. This fits within the context window and still gives you helpful feedback.

Strategies: Break into Parts

Break the task into smaller steps. Suppose you have a book chapter to summarize.

Prompt 1:

Summarize the first half of this chapter in a few sentences.

[paste first half of the chapter]

Prompt 2:

Summarize the second half of this chapter in a few sentences.

[paste second half of the chapter]

Prompt 3:

Combine these two summaries into a single summary of the whole chapter.

[paste both summaries]

Explanation:
By summarizing in smaller chunks and then combining the results, you work around the context window limit.
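The chunk-and-combine pattern above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the summarize function below is a stand-in for a real LLM call (it just keeps the first sentence, so the sketch runs without any API access), and the chunk size is arbitrary.

```python
def summarize(text: str) -> str:
    # Placeholder for a real LLM call; here we keep only the first
    # sentence so the sketch runs without API access.
    first = text.split(".")[0].strip()
    return first + "." if first else ""

def summarize_in_chunks(text: str, chunk_size: int = 200) -> str:
    # Steps 1-2: split the text into window-sized chunks
    # and summarize each chunk independently.
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    partials = [summarize(chunk) for chunk in chunks]
    # Step 3: combine the partial summaries and condense once more.
    return summarize(" ".join(partials))
```

Each individual call stays within the window; only the much shorter partial summaries are combined in the final step.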

Of course, there are other strategies, including advanced iterative approaches. We will explore them in later courses in this path.

Summary and What's Next

In this lesson, you learned what context windows are, how they have changed over time, and how they affect the size of your inputs and outputs. You also saw practical strategies for working within these limits, including making your input shorter, asking for partial outputs, and using iterative summarization.

Next, you'll get a chance to practice these strategies yourself. You'll work with different prompt sizes and see how to get the best results from LLMs, even when you have a lot of information to handle.
