Mind Readings: Retrieval Augmented Generation vs. Fine Tuning in Generative AI

In today’s episode, you’ll gain valuable insights into two powerful techniques for enhancing generative AI models: retrieval augmented generation and fine-tuning. I’ll use a simple yet effective analogy of a librarian and a library to illustrate the key differences between these approaches and when to apply each one. Whether you’re dealing with limited data or aiming to tailor AI responses for specific business needs, you’ll walk away with a clear understanding of how to optimize your AI applications for superior performance.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Christopher Penn: In today’s episode, let’s talk about two technical techniques that people use to improve the performance of generative AI systems like ChatGPT.

That said, these techniques are geared more toward people running models comparable to GPT-4 locally on their own servers, or using the more technical API versions of these tools.

Why would you do this? Why would this be of interest to you? Well, you may want to have a model with very specific, custom information, or you may be running a model in a controlled environment where security and data privacy are really important, and you just can’t use the public tools — national security, protected health care information, etc.

There are two general ways to change how a model behaves. One is called “retrieval augmented generation,” where you connect a database of your data to a model. The other is called “fine-tuning,” where you essentially give a model lots and lots of specific examples of what you want it to do, and retrain it — re-weight it. There’s a variety of techniques within that school.

When you hear people talking about these terms, they are talking about changing behaviors, typically for a specific application, like a chatbot on your website. You wouldn’t really do this for the consumer version of any of these tools. You wouldn’t do it, for example, to change how you use ChatGPT on a day-to-day basis; there’s really no point in doing that.

What are these things? Here’s a way to think about it. Think of a language model (again, like the ones that power ChatGPT or Anthropic’s Claude, etc.) as a library, a huge library. A certain percentage of the model consists of what are called, in technical terms, “retrieval heads.” Think of these as the librarians at the front desk in the library.

When you give them a prompt, they go back into the library, and they try and pull all the stuff necessary to answer your prompt — except instead of returning entire books, they return, like, one word at a time. They’re going through all these books trying to find the related words and bring back all the words to you.
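
To make the one-word-at-a-time idea concrete, here is a toy sketch in Python. The `NEXT_WORD` table and the `generate` function are invented for illustration and are not how any particular product works; a real language model scores a very large vocabulary of tokens at every step instead of reading from a hand-written table.

```python
# Toy next-word table: for each word, the candidate next words and a made-up score.
# All words and scores here are hypothetical.
NEXT_WORD = {
    "the": {"library": 0.6, "librarian": 0.4},
    "library": {"has": 0.7, "is": 0.3},
    "has": {"books": 0.9, "dvds": 0.1},
}


def generate(start, max_words=5):
    """Build a reply one word at a time, always taking the highest-scoring next word."""
    words = [start]
    while len(words) < max_words:
        options = NEXT_WORD.get(words[-1])
        if not options:
            break  # nothing in the "library" follows this word, so stop
        words.append(max(options, key=options.get))
    return " ".join(words)


print(generate("the"))  # prints: the library has books
```

Each pass through the loop looks only at what has been produced so far and adds a single word, which is the sense in which the librarian brings back one word at a time.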

Sometimes they get it wrong. It’s called “hallucinating”: they try to do what you tell them to do, but the information may not be in the library, or they may not know how to do it.

For example, suppose a really small library — a little, like, neighborhood library. In this fictional example, you go to the library, and you say, “Hey, I’d like a copy of The Joy of Cooking,” and the librarian goes back into those very small stacks, looks around, and says, “Don’t have that. What’s the closest thing I can find?” Pulls a book off the shelf, comes back, and says, “Here’s The Joy of Sex. That’s what you wanted, right?”

You’re like, “Oh, no, that is—” I mean, linguistically, yes, it’s close in title, but this is completely the wrong answer. Hallucination, typically, is something that is statistically correct — or statistically close — but factually wrong.

That’s a sort of humorous example. So, in situations like that, you may want to say, like, “Yeah, we need more books in the library.”
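
The librarian's mistake can be sketched in a few lines of Python. This is only an illustration of "closest match wins," under the assumption that closeness is measured by shared words in the title; the shelf contents (apart from the two titles in the story) and the function names are made up, and real models measure closeness in far more elaborate ways.

```python
import re


def words(title):
    """Lowercase a title and split it into a set of words."""
    return set(re.findall(r"[a-z]+", title.lower()))


def similarity(a, b):
    """Share of words two titles have in common (Jaccard overlap)."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb)


def closest_title(request, shelf):
    """Return whatever is closest on the shelf, even if it is the wrong book."""
    return max(shelf, key=lambda title: similarity(request, title))


# A very small, hypothetical neighborhood library.
SHELF = ["The Joy of Sex", "Gardening for Beginners", "A Field Guide to Birds"]

print(closest_title("The Joy of Cooking", SHELF))  # prints: The Joy of Sex
```

The match shares three of five distinct words with the request, so it scores well, and it is still the wrong book. Nothing in the lookup checks whether the requested book is actually on the shelf.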

So you would use a technique like retrieval augmented generation and connect a database of your data. One of the rules of thumb with AI is: the more data you bring to the party, the less it’s going to hallucinate — the less it’s going to make things up.

So in retrieval augmented generation, it’s like saying, “The library doesn’t have books about our company, about Trust Insights. Let’s connect the model to a database of all the stuff that Trust Insights has ever done: all our newsletters, our blog posts, our YouTube videos, our live streams.”

Suddenly, the next time you go back to the library and ask the librarian, “Hey, what do you got for stuff on Trust Insights?”, the librarian looks around and goes, “Well, there’s not much in the main library, but wow, there’s this new wing you just built filled with stuff about Trust Insights. I can go and get that stuff from that section of the library. And here’s the answer.” And you, as the prompter, are much happier.
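
As a rough sketch of the "new wing" idea, the Python below looks up relevant documents first and then places them in the prompt ahead of the question. Everything here is an assumption for illustration: the document texts are placeholders, retrieval is a simple word-overlap count (production systems typically use a search index or vector database), and no real model is called.

```python
import re


def tokenize(text):
    """Lowercase text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Return up to top_k documents that share the most words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(doc)), doc) for doc in documents]
    scored = [(score, doc) for score, doc in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question, documents):
    """Put the retrieved documents in front of the question for the model to use."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return f"Use only this context to answer.\n{context}\n\nQuestion: {question}"


# Hypothetical "new wing" of the library; the texts are placeholders.
NEW_WING = [
    "Trust Insights newsletter, placeholder issue text",
    "Trust Insights blog post, placeholder article text",
    "Neighborhood library opening hours, placeholder text",
]

print(build_prompt("What do you have on Trust Insights?", NEW_WING))
```

The model itself is unchanged in this setup. Only the material handed to it alongside the question is different, which is why this approach suits a "not enough books" problem.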

Fine-tuning, on the other hand, is for when the librarian doesn’t know how to do something. This is a silly example, but you go to the library, and you say, “Hey, I want some books, and I want some DVDs,” and the librarian’s like, “What’s a DVD? Never heard of it. We got books, but I don’t know what a DVD is.”

You’re like, “Okay, so let me show you some examples. This is a DVD.” You pull one out and stuff like that. “It’s filed like a book, but it’s a little, round disc, and it’s got a cover, and it’s got a barcode.”

You essentially teach the librarian, “Here’s what a DVD is, here’s how to find a DVD, here’s where they are in the library,” and so on and so forth. The librarian becomes smarter; the librarian now knows how to find books and how to find DVDs.

That’s an example of fine-tuning. You’re giving lots of examples to the model to say, “I want you to learn how to do this specific task really well.”

Now, when you fine-tune, depending on the method you use, you have the potential of causing some interesting chain reactions inside the model. It’d be like teaching the librarian how to find DVDs, and then they forget what a book is. Like, “Whoops, that wasn’t supposed to happen.”

So that can happen. It’s important to understand the difference between these two techniques because a lot of people mix up what each one is good at. As a result, they pursue the wrong strategy when it comes to customizing AI within their company, at an organizational level, at an enterprise level.

Again, this does not apply to your personal usage of ChatGPT, or Gemini, or Claude. This applies to companies building applications based on AI models. In the big, public stuff, you have the ability to fine-tune, AKA teach its librarian how to do something different, but to connect your own data is a little bit more work.

So although there are systems like Google’s Vertex that allow you to do both, you need to know the difference because you need to know what kind of problem you’re trying to solve. Is generative AI in your company’s application failing because it just doesn’t have enough books in the library? Then you want retrieval augmented generation.

Is it failing because it doesn’t know how to answer customers’ requests that are specific to your business? If that’s the case, that’s a behavioral issue, and you need to use fine-tuning. You need to gather up 1,000, 2,000, 3,000 examples of a customer request and the appropriate answer, and you tune the model to say, “Here’s how to answer these types of customer requests.”
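
One possible way to collect those request-and-answer pairs is sketched below in Python, writing one JSON object per line (JSONL). The field names `prompt` and `completion`, the sample requests, and the placeholder answers are all assumptions; each fine-tuning service defines its own required format, so check the documentation of the one you use.

```python
import json

# Hypothetical examples; a real set would hold thousands of reviewed pairs.
examples = [
    {"prompt": "Where is my order?",
     "completion": "PLACEHOLDER: approved answer for order-status requests."},
    {"prompt": "How do I reset my password?",
     "completion": "PLACEHOLDER: approved answer for password resets."},
]


def to_jsonl(records):
    """Serialize one JSON object per line."""
    return "\n".join(json.dumps(record) for record in records)


def from_jsonl(text):
    """Read the lines back into a list of dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


training_text = to_jsonl(examples)
print(training_text)
```

Keeping each pair on its own line makes it easy to count examples, spot-check individual answers, and remove bad ones before any tuning run.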

So knowing the difference between the two helps you set a better strategy for how you want to customize your use of AI in enterprise applications. The analogy of a librarian who either just can’t find the books in the back or doesn’t know what a DVD is really helps make these two very technical concepts tangible.

That’s going to do it for today’s show. Thanks for tuning in! Talk to you on the next one. If you enjoyed this video, please hit the like button. Subscribe to my channel if you haven’t already. If you want to know when new videos are available, hit the bell button to be notified as soon as new content is live.