Historically and even today, poor memory has been a barrier to the usefulness of text-generating AI. As a recent piece in The Atlantic aptly puts it, even advanced generative text AI like ChatGPT has the memory of a goldfish. Each time the model generates a response, it only considers a very limited amount of text, which prevents it from, say, summarizing a book or reviewing a large coding project.
But Anthropic is trying to change that.
Today, the AI research startup announced that it has expanded the context window for Claude — its flagship text-generating AI model, still in preview — from 9,000 tokens to 100,000 tokens. The context window is the text the model considers before generating additional text, while tokens are chunks of raw text (for example, the word “fantastic” might be split into the tokens “fan”, “tas” and “tic”).
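To make the “goldfish memory” problem concrete, here is a minimal, purely illustrative sketch — the integers stand in for tokens, and this is not a real tokenizer or model — of how a fixed context window discards older history before each generation step:

```python
# Toy illustration: a model with a fixed context window can only
# "see" the most recent tokens; everything earlier is forgotten.
def truncate_to_context(tokens, window=9_000):
    """Keep only the most recent `window` tokens visible to the model."""
    return tokens[-window:]

history = list(range(100_000))  # pretend each integer is one token
visible = truncate_to_context(history, window=9_000)

print(len(visible))  # 9000 tokens survive
print(visible[0])    # 91000 — everything before this is dropped
```

Under this framing, raising the window from 9,000 to 100,000 tokens means the entire 100,000-token history above would remain visible.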
Why does this matter? Well, as mentioned earlier, models with small context windows tend to “forget” the content of even very recent conversations, causing them to veer off topic. After a few thousand words or so, they also forget their initial instructions, instead extrapolating their behavior from the most recent information in their context window rather than from the original request.
Given the benefits of large context windows, it’s not surprising that figuring out ways to extend them has become a major focus of AI labs like OpenAI, which has devoted an entire team to the problem. OpenAI’s GPT-4 held the previous crown for context window size, weighing in at 32,000 tokens at the high end – but the improved Claude API blows past that.
With a larger “memory”, Claude should be able to converse relatively coherently for hours – several days, even – rather than minutes. And, perhaps more importantly, it should be less likely to go off the rails.
In a blog post, Anthropic touts the other benefits of Claude’s larger context window, including the model’s ability to process and analyze hundreds of pages of material. In addition to reading long texts, the upgraded Claude can help retrieve information from multiple documents or even a book, Anthropic says, by answering questions that require “synthesis of knowledge” in many parts of the text.
Anthropic lists a number of possible use cases:
- Processing, summarizing and explaining documents such as financial statements or research papers
- Analyzing risks and opportunities for a company based on its annual reports
- Assessing the pros and cons of a piece of legislation
- Identifying risks, themes and forms of argument across legal documents
- Reading through hundreds of pages of developer documentation and answering technical questions
- Rapid prototyping by contextualizing an entire codebase and intelligently building upon or modifying it
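Several of these use cases amount to fitting as much source material as possible into a single prompt. As a rough sketch — the four-characters-per-token ratio and the prompt layout are illustrative assumptions, not Claude’s actual tokenizer or API — a caller might budget documents against the 100,000-token window like this:

```python
# Illustrative only: pack documents into one prompt until a rough
# token budget for the 100K context window is exhausted.
CONTEXT_WINDOW = 100_000
RESERVED_FOR_ANSWER = 2_000  # leave room for the model's reply

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English text.
    return len(text) // 4

def build_prompt(docs: list[str], question: str) -> str:
    budget = CONTEXT_WINDOW - RESERVED_FOR_ANSWER - estimate_tokens(question)
    included, used = [], 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            break  # remaining documents don't fit this window
        included.append(doc)
        used += cost
    return "\n\n".join(included) + "\n\nQuestion: " + question

docs = ["alpha " * 5_000, "beta " * 5_000]  # ~7,500 estimated tokens each
prompt = build_prompt(docs, "What themes recur across these documents?")
```

With a 9,000-token window, only the first document above would fit; at 100,000 tokens, both fit comfortably, which is what enables the multi-document “synthesis of knowledge” questions Anthropic describes.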
“The average person can read 100,000 tokens of text in about five hours, and then they may need significantly more time to digest, remember and analyze that information,” Anthropic continues. “Claude can now do this in less than a minute. For example, we loaded the entire text of The Great Gatsby into Claude… and modified one line to say that Mr. Carraway was ‘a software engineer working on machine learning tools at Anthropic.’ When we asked the model to spot what was different, it responded with the correct answer within 22 seconds.”
Now, longer context windows don’t solve the other memory-related challenges around large language models. Claude, like most models in its class, can’t retain information from one session to the next. And unlike the human brain, it treats every piece of information in its window as equally important, which makes it a not especially reliable narrator. Some experts believe that solving these problems will require entirely new model architectures.
For now, however, Anthropic seems to be leading the way.