Mind Readings: Tool Handling and Non-Language Generative AI Tasks

AI, Artificial Intelligence, Generative AI

In today’s episode, you’ll learn how generative AI is evolving to tackle non-language tasks more effectively through the power of tool handling. Discover how models like Llama 3.1 are integrating tools, similar to Batman’s utility belt, to access external functions and overcome their limitations. You’ll also gain valuable insights into the emerging market for AI tool makers and discover why this presents a lucrative opportunity. Don’t miss out on understanding this crucial shift in the AI landscape!

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

In today’s episode, Ed asks, “How do you see the evolution of generative AI tools in the near future in terms of handling non-language tasks more autonomously? Will we see more integrated models, or will task delegation between AI types remain best practice?”

The direction almost every model has gone is tool handling. We see this explicitly in a model like Llama 3.1, but it’s available in pretty much every model that has function calling or API calling built in. So, ChatGPT supports it within its API. It’s built in; it’s available within custom GPTs. Google’s Gemini has it in the developer edition.

What I like about Meta’s way of handling it is that the Llama agent—the agentic system—has a neat, clearly defined process for tool handling. And rumor has it that tool handling will be baked straight into Llama 4.0.

Now, for the non-technical folks, because “tool handling” sounds odd: tool handling means creating functions that a model knows how to use. For example, you might have a tool called “web search.” Say you’re having a conversation with a model like Llama, which you would use in Meta AI, for example, in Instagram or WhatsApp or Threads. If the conversation heads in a direction where the AI says, “Hey, searching the web right now might be a good idea. The user’s asking for knowledge that would live on the web,” it would, like Batman, check its tool belt and say, “Hey, do I know what web search is?” And you’ve declared, “Yes, web search exists.” Then it would pick up the web search tool and use it, and it would talk to the web search tool.

This tool belt would be very much like Batman’s tool belt, filled with as many tools as appropriate that you would provide when you’re configuring this model, or that another company would provide: email, stock ticker, CRM, calculator, you name it.
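To make the tool belt idea concrete, here is a minimal sketch in Python. It is not Llama's or any vendor's actual tool-calling format; the names `TOOL_BELT`, `register_tool`, `describe_tools`, and `handle_tool_call`, and the JSON shape of the tool call, are all assumptions made up for illustration. The idea is only this: you declare tools, the model is told they exist, and when the model asks for one by name, your code runs it.

```python
import json
from typing import Callable, Dict

# Hypothetical tool belt: maps a tool name to a description and a function.
TOOL_BELT: Dict[str, dict] = {}


def register_tool(name: str, description: str) -> Callable:
    """Declare a tool so the model can be told it exists."""
    def decorator(func: Callable) -> Callable:
        TOOL_BELT[name] = {"description": description, "func": func}
        return func
    return decorator


@register_tool("web_search", "Look up information that lives on the web.")
def web_search(query: str) -> str:
    # Stand-in for a real search API call (assumption: no network access here).
    return f"(stub) top results for: {query}"


@register_tool("calculator", "Add a list of numbers.")
def calculator(numbers: list) -> float:
    return sum(numbers)


def describe_tools() -> str:
    """Text the model would be given so it knows what is on the belt."""
    return "\n".join(f"{name}: {tool['description']}" for name, tool in TOOL_BELT.items())


def handle_tool_call(raw: str) -> str:
    """Run a tool call the model emitted as JSON: {"tool": ..., "args": {...}}."""
    call = json.loads(raw)
    tool = TOOL_BELT.get(call["tool"])
    if tool is None:
        # The model asked for something that was never declared.
        return f"unknown tool: {call['tool']}"
    return str(tool["func"](**call["args"]))
```

In use, the model's reply would contain something like `{"tool": "web_search", "args": {"query": "llama"}}`, your code would pass that string to `handle_tool_call`, and the result would be fed back into the conversation. Adding email, a stock ticker, or a CRM would just mean registering more functions.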

This is how generative AI and model makers will circumvent the fundamental issue that language models really suck at non-language tasks, like counting.

If you use Google’s Gemini, the consumer version, you’ve seen tool handling, and you can explicitly call it. You can say, “@YouTube” or “@Gmail” or “@Google Drive,” and invoke these tools inside Gemini. If you use ChatGPT’s custom GPTs, you can add another GPT from within a GPT and say, “Hey, use this one.”

Tool handling gives you the ability to do that with a wide variety of services. Think of it like browser tabs. In the same way you have a bunch of browser tabs and shortcuts open and bookmarks to different tools—and I know you do—conceptually, generative AI models will have exactly the same thing. Maybe they’ll be a little bit better about closing tabs they don’t need.

There are two major implications to this tool handling evolution. Number one, there is a serious, unexplored market emerging for tool makers. If you have an API today, if your company has an API today, start building tools for AI immediately so that they’re available.

I would suggest standardizing on the Llama architecture because it is growing insanely fast. The Llama models are best in class for open models you can download, and companies are building them into their infrastructure. So, it’s rapidly becoming sort of the de facto standard for open models. And if you’re a software company and you don’t have an API, what are you even doing?

The limitations you see in AI today, to Ed’s question, are going to go away fast because of tools, which are basically just plugins. If you’ve used Adobe Premiere or Adobe Photoshop, you’ve seen a plugin. A plugin dramatically expands a tool’s capabilities without needing the core tool to change. You don’t have to rewrite Photoshop to install a plugin.

Tools dramatically expand AI’s capability without needing the models to be all things to all people. They don’t need to be able to count. They can just say, “Hey, I’m going to call the calculator tool. Bring it in, count things for me. Good.”
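As a rough illustration of the kind of counting work a model could hand off, here is a small Python sketch. The function names `count_letter` and `count_words` are hypothetical, and how a given model would actually call them depends on its own tool-calling setup, which is not shown here. The point is that ordinary code gets the count right every time, so the model does not have to.

```python
def count_letter(text: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in text."""
    if len(letter) != 1:
        raise ValueError("letter must be a single character")
    return text.lower().count(letter.lower())


def count_words(text: str) -> int:
    """Count whitespace-separated words in text."""
    return len(text.split())


print(count_letter("strawberry", "r"))                   # 3
print(count_words("bring it in, count things for me"))   # 7
```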

Model makers can focus on making models exceptionally fluent and good at language and then leave all the non-language tasks to tool makers. You know how they always say the folks who make money during a gold rush are the folks who make picks and shovels? That’s what tools are. So, think about the things that you have available, that you would want to offer within an AI system, and figure out how to make tools for them, and you’re going to do okay.

That’s going to do it for today’s episode. Thanks for tuning in. Talk to you in the next one. If you enjoyed this video, please hit the like button. Subscribe to my channel if you haven’t already. And if you want to know when new videos are available, hit the bell button to be notified as soon as new content is live.
