You Ask, I Answer: RAG vs Fine Tuning in Generative AI?


AI, Artificial Intelligence, Generative AI


In today’s episode, you’ll dive into the fascinating world of generative AI and learn about two powerful techniques: RAG and fine-tuning. You’ll gain a clear understanding of how these techniques differ and when to use each one to enhance the capabilities of AI models. I’ll provide easy-to-understand analogies to help you grasp the concepts and guide your AI strategy. Tune in to discover how to unlock the full potential of generative AI for your specific needs!


Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Christopher Penn: In today’s episode, Dale asks, “When should we use RAG? And when should we use fine tunes?”

Okay, this is a very technical generative AI question. And to tackle that, we should probably do some level setting for people who aren’t Dale. First, what do these words even mean?

RAG stands for Retrieval Augmented Generation. It is a way to add new data to an AI model and even tell the model to look at that data first when it’s building its responses. There are tools, for example, like NotebookLM or AnythingLLM in query mode, where you can say, “Here’s a bunch of my documents, here’s all my newsletter issues, I’m going to prompt you, and you have to look at my stuff first, or maybe you have to look at only my stuff and nothing else.” That’s Retrieval Augmented Generation.

Fine-tuning is a way to give a model a huge number of questions and answers and say, “You’re going to learn how to do this specific task. I’m going to train you to do this specific task.” Maybe it is teaching a model how to spell, and so you would give it questions like, “How do you spell the word strawberry?” or “How many Rs are in strawberry?” along with the answers, and that would recondition the model to behave differently.

Here’s the easiest way to think about this conceptually, one that will help you understand these two strategies. Imagine an AI model, like the one that powers ChatGPT, as a library. When you prompt the tool, you’re talking to the librarian.

For Retrieval Augmented Generation, we’re assuming the librarian’s really good at their job. They go into the stacks and find books. But we know that the library doesn’t have the books that we want information from. So we build a new wing on the library, and we add the new books. And now the librarian, whose job hasn’t changed, knows, “Oh, let’s check out the new wing first, and provide answers from that.”

That’s Retrieval Augmented Generation. It’s very useful for when the model does what you want but doesn’t have the right information to accomplish its task.
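To make the "new wing" idea concrete, here is a minimal sketch of the retrieval step, assuming a tiny in-memory document store and simple word overlap as the ranking method. The document names, their text, and the `retrieve` and `build_prompt` helpers are all hypothetical and made up for illustration; real tools typically rank documents with embeddings rather than shared words. The model itself is not changed at all. Only the prompt it receives is.

```python
import re

# Hypothetical stand-ins for "my newsletter issues"; not taken from the episode.
DOCUMENTS = {
    "issue-01": "Retrieval augmented generation gives a model new documents to consult.",
    "issue-02": "Fine-tuning changes how a model behaves on a specific task.",
    "issue-03": "A low-rank adapter works like a plugin for a model.",
}


def tokenize(text):
    """Lowercase the text and return the set of words it contains."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question.

    Assumption: word overlap is a stand-in for the embedding search
    that production systems usually use.
    """
    question_words = tokenize(question)
    ranked = sorted(
        documents.items(),
        key=lambda item: len(question_words & tokenize(item[1])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question, documents, top_k=1):
    """Place the retrieved text ahead of the question (check the new wing first)."""
    hits = retrieve(question, documents, top_k)
    context = "\n".join(f"[{name}] {text}" for name, text in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("What does fine-tuning change about a model?", DOCUMENTS))
```

The resulting prompt would then be sent to whatever model you already use; the librarian's job is the same, there are just new books to pull from.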

Fine-tuning is like taking the librarian and saying, “Hey, we need you to teach a kids’ class on gardening.” And the librarian’s like, “That is not at all what I’m trained to do. I go and get books. I don’t teach kids gardening or anything.” And you would say, “Okay, let’s teach you how to teach kids. Let’s have you go through some education classes, let’s do student gardening classes and things.” And after a very long period of time, the librarian’s like, “Okay, I can now also teach kids how to garden.”

We’ve changed the librarian’s behavior. They still can go and get books, but now they can teach kids’ classes on gardening as well. That’s what fine-tuning does.
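As a rough illustration of what "a huge number of questions and answers" looks like in practice, here is a sketch that generates spelling examples and writes them out as JSON Lines, one example per line. The `prompt` and `completion` field names and the helper functions are assumptions for illustration; each fine-tuning service or library defines its own required format, so check the documentation for the one you use. A real training set would contain far more examples than this.

```python
import json


def count_letter(word, letter):
    """Count how often a letter appears in a word, ignoring case."""
    return word.lower().count(letter.lower())


def make_examples(words, letter="r"):
    """Build question/answer pairs that teach a spelling task.

    The field names "prompt" and "completion" are hypothetical;
    the exact schema depends on the fine-tuning tool.
    """
    examples = []
    for word in words:
        examples.append({
            "prompt": f"How do you spell the word {word}?",
            "completion": "-".join(word.upper()),
        })
        examples.append({
            "prompt": f"How many {letter.upper()}s are in {word}?",
            "completion": str(count_letter(word, letter)),
        })
    return examples


def to_jsonl(examples):
    """Serialize examples as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(example) for example in examples)


if __name__ == "__main__":
    print(to_jsonl(make_examples(["strawberry", "garden"])))
```

The point of the sketch is the shape of the data: unlike retrieval, these pairs are used to retrain the model so that its behavior changes, rather than being looked up at answer time.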

And there’s a few different strategies for that. Fine-tuning is useful when you want to change the behavior of the model to help it learn how to do something it doesn’t know how to do. Retrieval Augmented Generation is good for when the model is fine at what it does, but it just does not know where the data is.

Knowing the difference between these two major strategies will help inform which of these two paths you need to pursue if a model is just not doing what you want. If the model can do the task, but not the way you want because it doesn’t have the data, you use Retrieval Augmented Generation; you give it the data. One of the things I say in all of my keynotes on generative AI is: the more data you bring to the party, the better a model is going to generate good results, because it doesn’t have to guess at the data anymore; you’ve given it knowledge. If you’re just not happy with how the model does things, then fine-tuning is in order.

And again, you can fine-tune the entire model, or you can do this thing called building an adapter, a low-rank adapter, that is like a plugin to a model that increases its capabilities. If you’ve ever used a tool like Adobe Premiere, and you bought a plugin and installed it, and so Adobe Premiere can now do this new thing, that’s essentially what an adapter is for a generative AI model.
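For readers who want to see why an adapter behaves like a plugin, here is a small sketch under the assumption that "low-rank adapter" refers to the common LoRA formulation, where a frozen weight matrix W is used together with a small trained update B × A. The episode does not go into this math, so the formulation, the function names, and the example sizes are all illustrative assumptions. The code uses plain Python lists so it runs without extra libraries.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def adapted_weights(base, b_factor, a_factor, scale=1.0):
    """Return base + scale * (B x A) without modifying the base weights.

    Assumption: this mirrors the usual LoRA idea, where the base model
    stays frozen and only the two small factors are trained.
    """
    update = matmul(b_factor, a_factor)
    return [
        [w + scale * u for w, u in zip(base_row, update_row)]
        for base_row, update_row in zip(base, update)
    ]


def trainable_parameters(rows, cols, rank):
    """Compare parameter counts: full fine-tuning versus a rank-r adapter."""
    return {"full": rows * cols, "adapter": rank * (rows + cols)}


if __name__ == "__main__":
    base = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # frozen "model" weights
    b_factor = [[1], [0], [2]]                 # 3 x 1 factor (rank 1)
    a_factor = [[0.5, 0, 1]]                   # 1 x 3 factor (rank 1)
    print(adapted_weights(base, b_factor, a_factor))
    print(trainable_parameters(4096, 4096, rank=8))
```

Because the base weights are never overwritten, the adapter can be added or removed much like installing or uninstalling a plugin, and the parameter comparison shows why training one is far cheaper than retraining the whole layer.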

So, it’s a good question. It’s a complicated question, but it is an important one for how you choose your strategies for changing the results you get out of generative AI.

Thanks for the question. Talk to you on the next one. If you enjoyed this video, please hit the like button, subscribe to my channel if you haven’t already. And if you want to know when new videos are available, hit the bell button to be notified as soon as new content is live.

I’ll see you next time.
