In the world of Large Language Models (LLMs), understanding context limits is crucial. Whether you're working with GPT-3.5, GPT-4, Claude 3.5 Sonnet, or LLaMA, every model has a specific limit on how much text it can consider at one time when generating responses. This limit shapes how you design prompts, and understanding it can significantly improve your interactions with LLMs. This lesson clarifies what context limits are, how they have been evolving, and practical methods for navigating these limitations.
A context limit refers to the maximum amount of text an LLM can consider when generating a response. As of the last update, GPT-3.5 has a context window of approximately 4096 tokens. For scale, this lesson is roughly 500 words and 650 tokens.
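
Because the limit is measured in tokens rather than words, it's worth counting tokens programmatically before sending a long prompt. The sketch below assumes OpenAI's tiktoken library (installed with `pip install tiktoken`) and the cl100k_base encoding used by GPT-3.5-era models; other models use different tokenizers and limits, so treat the numbers as illustrative.

```python
# A minimal sketch, assuming the tiktoken library is installed
# (pip install tiktoken). It checks whether a prompt fits within
# GPT-3.5's roughly 4096-token context window before sending it.
import tiktoken

CONTEXT_LIMIT = 4096  # GPT-3.5's approximate context window, in tokens

# cl100k_base is the token encoding used by GPT-3.5 and GPT-4 models
encoding = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Return the number of tokens the model would see for this text."""
    return len(encoding.encode(text))

prompt = "In the world of Large Language Models, understanding context limits is crucial."
n = count_tokens(prompt)
print(f"{n} tokens; fits in context: {n <= CONTEXT_LIMIT}")
```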
It's important to realize that a token isn't always a whole word: it can be a word, part of a word, or a punctuation mark. This means the actual text a model can consider may be shorter than you initially anticipated, though as a rough rule of thumb it's fine to think of tokens as words; in English, one token works out to about three-quarters of a word on average, which is why 650 tokens here correspond to roughly 500 words.
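
If you want to see the word/token distinction for yourself, you can decode each token of a string individually. This uses the same assumed tiktoken setup as above; the exact splits depend on the tokenizer, but short common words are usually a single token while longer or unusual words break into several pieces.

```python
# Demonstrates that a token can be a whole word, part of a word, or
# punctuation. Exact splits vary by tokenizer; cl100k_base is used here.
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

for text in ["cat", "tokenization", "don't", "!"]:
    ids = encoding.encode(text)
    pieces = [encoding.decode([i]) for i in ids]
    print(f"{text!r} -> {len(ids)} token(s): {pieces}")
```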
