In previous lessons, you learned how large language models (LLMs) generate text one token at a time and how different model versions affect your results. Now, let's focus on a key concept that shapes what you can do with LLMs: the context window.
A context window is the maximum amount of information, measured in tokens, that a model can consider at once. This includes both your input (the prompt) and the model's output (the response). If you try to give the model more information than fits in its context window, some of it will be ignored or cut off.
Understanding context windows is important because it helps you design prompts that fit within these limits, ensuring the model can "see" everything it needs to give you a good answer.
Context windows have changed a lot as LLMs have improved. Early models could only handle short prompts and responses, while newer models can work with much more information at once.
Here's a simple table showing how context window sizes have grown over time:
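| Model | Release Year | Context Window (tokens) |
|-------|--------------|--------------------------|
| GPT-2 | 2019 | 1,024 |
| GPT-3 | 2020 | 2,048 |
| GPT-3.5 Turbo | 2022 | 4,096 |
| GPT-4 | 2023 | 8,192 to 32,768 |
| GPT-4 Turbo | 2023 | 128,000 |

(Exact sizes vary by model variant; these are representative figures.)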
As you can see, newer models can handle much larger context windows. This means you can give them longer prompts or get longer responses, but there is always a limit.
Let's look at how the context window shapes what you can do with an LLM.
The context window is shared between your input and the model's output. For example, if a model has a 4,096-token context window and your prompt is 2,000 tokens, it can only generate up to 2,096 tokens in its response. If your prompt is too long, the model has less room to respond, and anything beyond the limit gets truncated.
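To make this arithmetic concrete, here is a minimal Python sketch that counts prompt tokens and works out how much room is left for a response. It assumes the `tiktoken` package and uses a 4,096-token limit to match the example above; the right encoding depends on the model you're using.

```python
# A minimal sketch: input and output share one token budget.
# Assumes the `tiktoken` package; the limit matches the example above.
import tiktoken

CONTEXT_WINDOW = 4096  # total tokens the model can consider at once

encoding = tiktoken.get_encoding("cl100k_base")

prompt = "Summarize the following report: ..."  # imagine a long prompt here
prompt_tokens = len(encoding.encode(prompt))

# Tokens used by the prompt are no longer available for the response.
max_response_tokens = CONTEXT_WINDOW - prompt_tokens
print(f"Prompt uses {prompt_tokens} tokens; "
      f"up to {max_response_tokens} remain for the response.")
```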
When you hit the context window limit, you can use several strategies to get the results you want. Let's go through them step by step, with examples.
The first strategy is to make your input shorter: instead of pasting everything, focus on the most important parts.
Suppose you have a long email thread but only need a summary of the last conversation.
Prompt:
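```
Summarize the key points and any action items from this email exchange:

[Paste only the two most recent emails here, not the entire thread]
```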
Explanation:
Including only the relevant emails saves space and ensures the model focuses on what matters.
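If you're building the prompt in code, the same idea is easy to automate: select only the relevant pieces before sending anything to the model. This is an illustrative sketch; the email list is a placeholder.

```python
# A sketch of the first strategy: trim the input before building the prompt.
# The emails below stand in for a real thread.
emails = [
    "Email 1: Kickoff notes and introductions ...",
    "Email 2: Budget discussion ...",
    "Email 3: Final decision on the launch date ...",
]

# Keep only the most recent exchange instead of the whole thread.
recent = emails[-2:]

prompt = (
    "Summarize the key points and any action items from this email exchange:\n\n"
    + "\n\n".join(recent)
)
print(prompt)
```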
The second strategy is to ask for a partial output: if you can't fit everything, ask the model for a recommendation or a plan, not the full result.
Suppose you want to improve a lengthy document, but it's too big for the context window.
Prompt:
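```
Here is the first section of a long document I want to improve:

[Paste the first section here]

Don't rewrite it. Instead, list your top suggestions for improving the style
and clarity, so I can apply them to the rest of the document myself.
```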
Explanation:
Instead of asking for a complete rewrite, you ask for suggestions. This fits within the context window and still gives you helpful feedback.
The third strategy is iterative summarization: break the task into smaller steps. Suppose you have a book chapter to summarize.
Prompt 1:
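```
Summarize the first half of this chapter:

[Paste the first half of the chapter here]
```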
Prompt 2:
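```
Summarize the second half of this chapter:

[Paste the second half of the chapter here]
```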
Prompt 3:
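```
Combine these two summaries into one cohesive summary of the full chapter:

[Paste both summaries here]
```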
Explanation:
By summarizing in smaller chunks and then combining the results, you work around the context window limit.
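If you're working programmatically, this pattern is straightforward to automate. Below is a sketch assuming the official `openai` Python package; the model name and chunk size are illustrative assumptions, not recommendations.

```python
# A sketch of iterative summarization: summarize chunks, then combine.
# Assumes the official `openai` package and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Send a single prompt and return the model's text response."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat model would work here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def split_into_chunks(text: str, chunk_size: int = 8000) -> list[str]:
    """Naive character-based chunking; production code might split on tokens."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def summarize_long_text(text: str) -> str:
    # Step 1: summarize each chunk, so no single prompt exceeds the limit.
    partial_summaries = [
        ask(f"Summarize this part of a book chapter:\n\n{chunk}")
        for chunk in split_into_chunks(text)
    ]
    # Step 2: combine the partial summaries into one final summary.
    combined = "\n\n".join(partial_summaries)
    return ask(f"Combine these summaries into one cohesive summary:\n\n{combined}")
```

Because each call sees only one chunk (or the much shorter summaries), every prompt stays comfortably inside the context window.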
Of course, there are other strategies, including more advanced iterative approaches. We'll explore them in the following courses of this path.
In this lesson, you learned what context windows are, how they have changed over time, and how they affect the size of your inputs and outputs. You also saw practical strategies for working within these limits, including making your input shorter, asking for partial outputs, and using iterative summarization.
Next, you'll get a chance to practice these strategies yourself. You'll work with different prompt sizes and see how to get the best results from LLMs, even when you have a lot of information to handle.
