Mind Readings: Why Large Context Generative AI Models Matter

AI, Artificial Intelligence, Generative AI

In today’s episode, you’ll delve into the groundbreaking world of Google’s Gemini 1.5 and its massive 2-million-token context window. Discover how this expanded “short-term memory” for AI models is revolutionizing what’s possible with generative AI. You’ll gain a clear understanding of what this technological leap means for you and your business, unlocking a world of possibilities for data analysis, content creation, and more. Tune in to grasp the implications of this game-changing advancement in AI and explore how you can leverage it to your advantage.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

In today’s episode, let’s talk about Google’s most recent release of the Gemini model. The most recent incarnation is Gemini 1.5, with a 2-million-token context window. This version of Gemini, the large language model that competes with ChatGPT, has the same basic operations as the previous version, with one really big change: the size of that context window. That is huge in multiple senses of the word.

For the non-nerds, the non-technical folks, AI models, particularly language models like ChatGPT, Google Gemini, and Anthropic Claude, have two kinds of memory: latent space and context windows. Think of these as long-term and short-term memory. In the keynotes that I deliver, I refer to them as “the library and the librarian.” There’s a library, which holds a huge store of knowledge in a relatively static form, like books. And then there’s a librarian.

You talk to the librarian, and the librarian goes into the library, goes into the stacks, and finds the books you want. The difference with language models is that, instead of entire books, the librarian goes and gets one word at a time once you ask for something.

So, our ability to get good results out of language models is contingent on asking the librarian very specifically what we want — maybe giving the librarian a lot of detailed instructions. If you tell the librarian, “Go get me a book,” they will, and it won’t be what you want. If you tell the librarian, “I want a copy of AI for Marketers, Third Edition, by Christopher Penn,” you’re probably going to get a better result.

In the early days of generative AI, the librarian would forget what you were asking for fairly frequently, and you’d have to remind the librarian a lot what you were talking about. Their short-term memory was really short. When ChatGPT first came out, it had a context window of, I think, 8,000 tokens (it may have been 4,000), or about 5,000 words. The way these models work, they don’t even work on full words; they work on pieces of words, called tokens.

So, if you’re talking about a blog post, say an 800-word or 1,000-word blog post, and you go through a couple of revisions, by the time you get to revision four, the model, the librarian, has started to forget the beginning of the conversation. You have to remind it, “Oh, we’re talking about writing a blog post about this.”
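The forgetting described above can be sketched in a few lines of Python. This is an illustration only, not how any particular product manages its memory: it assumes roughly 0.75 words per token, it assumes the oldest turns are simply dropped once the window is full, and the names `estimate_tokens` and `fit_to_window` are hypothetical.

```python
import math


def estimate_tokens(text: str, words_per_token: float = 0.75) -> int:
    """Rough token estimate from a word count (assumed ratio, not a real tokenizer)."""
    return math.ceil(len(text.split()) / words_per_token)


def fit_to_window(turns: list[str], window_tokens: int) -> list[str]:
    """Keep the most recent turns that fit in the window; older turns fall out."""
    kept: list[str] = []
    used = 0
    # Walk backwards from the newest turn, stopping once the budget is spent.
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if used + cost > window_tokens:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


# A short brief followed by four 1,000-word drafts of a blog post.
brief = "We are writing a blog post about large context windows."
drafts = ["word " * 1000 for _ in range(4)]
conversation = [brief] + drafts

small = fit_to_window(conversation, window_tokens=4_000)
large = fit_to_window(conversation, window_tokens=2_000_000)
print(len(small), brief in small)  # the brief no longer fits in the small window
print(len(large), brief in large)  # everything fits in the large window
```

With the small window, only the latest drafts survive and the original brief is gone, which is why you had to keep reminding the librarian what the conversation was about.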

When you use today’s modern models like the paid version of ChatGPT, the paid version of Gemini, etc., their memories are much longer. You can talk about that same blog post, and it won’t forget what you’re talking about for quite some time.

Now, this most recent reveal from Google means that Gemini can hold a conversation in memory that is about 1.5 million words long. To give you some context, the book series A Song of Ice and Fire by George R.R. Martin, which is the basis for Game of Thrones, is about a million and a half words, and that could now fit in Gemini’s short-term memory. This book by Ann Handley is about 75,000 words; Gemini can now hold about 20 of these in its memory. That is a crazy amount of short-term memory.
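The arithmetic behind those figures can be written out as a small sketch. The ratio of three words for every four tokens is an assumption implied by the numbers above (2 million tokens, about 1.5 million words); real tokenizers vary by language and content. The function names are hypothetical.

```python
BOOK_WORDS = 75_000  # approximate length of the book mentioned above


def words_in_window(tokens: int) -> int:
    """Approximate words that fit in a window, assuming 3 words per 4 tokens."""
    return tokens * 3 // 4


def books_in_window(tokens: int, book_words: int = BOOK_WORDS) -> int:
    """Whole books of a given length that fit in the window."""
    return words_in_window(tokens) // book_words


print(words_in_window(2_000_000))  # 1500000
print(books_in_window(2_000_000))  # 20
```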

Can you imagine having a conversation that was 20 books long in a few minutes? No. That’s beyond what human beings are capable of. That’s why machines scale so much better.

What does this mean? What do we do with this information?

Well, it opens up ever larger possibilities for how people use generative AI. For example, let’s say you issued an RFP, and you get all these RFP responses back. Every agency that responded put together an 83-page RFP response, and the first 10 pages are, “Oh, we have won these awards, we work with these brands, and we’ve got a lake in front of our building, blah, blah, blah.” Not a lot of value there. And you get 10 responses, 20 responses, 30 responses.

Well, now, instead of having to manually comb through all these responses, you can have generative AI build a scoring rubric for the requirements that you outlined in the RFP. Then you load all 20 or 30 RFP responses into the model and say, “Score them all.” Because of the size of memory it has available, it can do that, and do it capably. It will save you an enormous amount of time. You can say, “I want to know about these five things. Go through all 20 responses and score the responses on these five things.”
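As a rough sketch of that workflow, the Python below assembles one large scoring prompt from a list of criteria and a set of responses, then checks whether it would plausibly fit in a 2-million-token window. Everything here is an assumption for illustration: the function names, the 1-to-5 scale, the prompt wording, and the 0.75 words-per-token estimate. Sending the prompt to a model is left out because that depends on the tool or API you use.

```python
import math


def build_scoring_prompt(criteria: list[str], responses: dict[str, str]) -> str:
    """Combine the rubric criteria and every RFP response into one prompt."""
    lines = [
        "Score each RFP response from 1 to 5 on every criterion below.",
        "",
        "Criteria:",
    ]
    for number, criterion in enumerate(criteria, start=1):
        lines.append(f"{number}. {criterion}")
    for agency, text in responses.items():
        lines.append("")
        lines.append(f"=== Response from {agency} ===")
        lines.append(text)
    return "\n".join(lines)


def fits_in_window(prompt: str, window_tokens: int = 2_000_000,
                   words_per_token: float = 0.75) -> bool:
    """Estimate tokens from the word count (assumed ratio) and compare to the window."""
    estimated_tokens = math.ceil(len(prompt.split()) / words_per_token)
    return estimated_tokens <= window_tokens


# Placeholder inputs; in practice these would be the full response documents.
criteria = ["Pricing", "Timeline", "Relevant experience"]
responses = {
    "Agency A": "Full text of the first response goes here.",
    "Agency B": "Full text of the second response goes here.",
}
prompt = build_scoring_prompt(criteria, responses)
print(fits_in_window(prompt))
```

Keeping each response under a clearly labeled delimiter is one simple way to help the model attribute scores to the right agency.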

If you’re analyzing data, you can have crazy amounts of data in memory. I was at an event recently, speaking at the Australian Food and Grocery Council’s annual conference, and I took the entire federal budget for the country of Australia, stuffed it into Gemini, and said, “Okay, what are the implications for this association, based on what was in the budget and what the association does?” Right on stage, people could see, “Wow, this is exactly how this budget is going to improve my business or possibly harm my business.”

If you’re writing a book, you can store the entire thing, plus multiple drafts, plus reference materials, in memory as you write. You can get advice along the way. “How do I rewrite this? How do I outline this? Rearrange this part for me.” The models can keep track of stuff. As long as it’s structured well, they can keep track and recall and move things around.

One of my favorite use cases is to take a transcript, like the closed captions that are coming along with this video, and have a model outline what I talked about, and then say, “Okay, well, rearrange the pieces of this outline in a more logical fashion, that maybe has a better narrative flow.” And it will do that. Then I say, “Okay, now give me back my words.” So, not writing something, just rearranging my words according to this outline. Because these models have such sophisticated, large memories now, they can do something like that.

And Google’s stated roadmap is to get to a functional 10-million-token context window. At that size, we’re talking about 10 copies of Game of Thrones, 17 copies of Lord of the Rings, 100 of Ann’s book. It is even more mind-boggling just how much knowledge these things will be able to hold onto.

Think about an example: if you wanted to understand what long COVID was really about, you might download 50, 60, 100 clinical papers, peer-reviewed research in detail, and load all of that in. That might be a couple million words. But then, because the model can keep all of that in memory, and you can ask questions across different papers, you might be able to get really good conclusions grounded in reality. The model is much less likely to hallucinate if you’re asking questions about the data you’ve already given it.

In fact, there are tools like NotebookLM that do exactly that: they won’t write anything, but they will answer questions based on the data you give them.

So, start thinking about the data that you have now, how big it is, and what you could do with it if you had a model with access to a large short-term memory, which you do. That’s the key takeaway: what would you do with the data you have that you’re not doing today because there’s just too much of it, and you can’t keep it in mind, but a machine can?

That’s going to do it for this episode. Thanks for tuning in! Talk to you next time. If you enjoyed this video, please hit the like button. Subscribe to my channel if you haven’t already. And if you want to know when new videos are available, hit the bell button to be notified as soon as new content is live.