Almost Timely News: You Can’t Reasonably Measure AI Brand Strength (2024-08-18) :: View in Browser
Register for my upcoming free webinar, Generative AI for Hospitality!
Content Authenticity Statement
90% of this week’s newsletter was generated by me, the human. You’ll see some AI outputs in the opening section. Learn why this kind of disclosure is a good idea and might be required for anyone doing business in any capacity with the EU in the near future.
Watch This Newsletter On YouTube 📺
Click here for the video 📺 version of this newsletter on YouTube »
Click here for an MP3 audio 🎧 only version »
What’s On My Mind: You Can’t Reasonably Measure AI Brand Strength
Be very cautious of AI search results tools.
Not because it’s a bad idea to think about how your brand is positioned within language models like GPT-4o (the model that powers ChatGPT today), Claude, or Gemini, but because it’s a nearly impossible task to make that determination.
If You Donât Feel Like Getting Into The Weeds
Today’s newsletter is going to be very technical, so here’s the TL;DR:
Using “AI optimization” tools to determine your “AI SEO” is a bit of a fool’s errand because AI models do not work the way search engines do. If you want to influence AI models, the best thing you can do is publish very high volumes of very high quality content – in other words, traditional SEO. The main difference is that instead of links as the gold standard of measurement, it’s appearances around the public web.
That’s it. Go home. See you next week.
…now, if you’d like to stick around for the detailed explanation, grab your libation of choice, find a comfortable position, and let’s dig into the weeds.
Part 1: How AI Models Work
To learn why trying to measure our findability in AI models is a bit of a fool’s errand, we first need to know how these things work. As a reminder, a model is the engine of today’s generative AI tools. Services like ChatGPT use models like GPT-4o under the hood.
When these models are built by deep-pocketed technology companies, they’re trained on massive amounts of text data – 7-10 petabytes of text, mostly extracted from the public web. To give you a rough sense of how much text that is, all of William Shakespeare’s plays fit inside roughly 5.4 megabytes. 7 petabytes is about 1.3 billion copies of Shakespeare’s works. If each volume is about 3 inches tall, 7 petabytes is a stack of Shakespeare’s books that circles the equator about two and a half times.
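If you want to sanity-check that scale claim yourself, here’s a quick back-of-the-envelope calculation in Python. The 5.4 megabyte and 3-inch figures come from above; the equator length is the standard 24,901 miles:

```python
# Back-of-the-envelope check on the Shakespeare comparison above
shakespeare_mb = 5.4   # complete works of Shakespeare, in megabytes
corpus_pb = 7          # low end of the training corpus, in petabytes

copies = corpus_pb * 1e9 / shakespeare_mb    # 1 PB = 1e9 MB
stack_miles = copies * 3 / 63360             # 3-inch volumes, 63,360 inches/mile
equator_miles = 24901

print(f"{copies:,.0f} copies of Shakespeare")                         # ~1.3 billion
print(f"{stack_miles / equator_miles:.1f} trips around the equator")  # ~2.5
```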
That’s how much text you need to train today’s models. It’s also why only a few companies can make state-of-the-art foundation models – it costs millions of dollars in time and hardware just to digest that amount of text.
The first part of the process is breaking down words into tokens, little three- and four-letter fragments of words.
Then, every token is turned into a number.
Then, every token’s position is compared to every other token’s position in a massive statistical database. That’s what is under the hood in every large language model: a big pile of statistics. The way I recommend people think about this conceptually is to think about word clouds. In any given piece of text, what’s the word cloud for EVERY WORD in that text? That’s essentially what’s going on, conceptually. (Mathematically, that’s not true, but no one wants to do linear algebra in a newsletter.)
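To make the first two steps concrete, here’s a minimal sketch using OpenAI’s open-source tiktoken tokenizer. The encoding name is real; the sample sentence is arbitrary:

```python
# pip install tiktoken
import tiktoken

# The tokenizer used by GPT-4-class OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

text = "Starbucks sells coffee and tea."
token_ids = enc.encode(text)
print(token_ids)  # a list of integers, one per token

# Map each number back to the text fragment it represents
for tid in token_ids:
    print(tid, repr(enc.decode([tid])))
```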
When we ask a model something using a prompt, the model breaks down our question into tokens, then goes into its library and fetches all the word clouds it can find that match those tokens.
For example, suppose you looked at all the word clouds on the Internet around the term Starbucks. You would see words like coffee and tea. You would also see thousands of other words. That’s what is inside a large language model. When we give a model a prompt, what comes back are the most related words (actually tokens, but words is fine for this conceptually) that the model maker’s software has seen all over the Internet.
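You can approximate this word-cloud intuition yourself with a small open-source embedding model. This is a sketch, not what’s literally inside GPT-4o or Gemini, but it shows how terms that appear in similar contexts end up numerically close to each other:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# A small open-source embedding model standing in for a big LLM's internals
model = SentenceTransformer("all-MiniLM-L6-v2")

words = ["Starbucks", "coffee", "tea", "spreadsheet"]
vectors = model.encode(words)

# Cosine similarity: higher roughly means "seen in similar contexts"
for word, vector in zip(words[1:], vectors[1:]):
    print(word, round(float(util.cos_sim(vectors[0], vector)), 3))
```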
This is why attempting to measure a brand’s presence in a language model is a bit of a fool’s errand. It’s not like regular SEO, where you can see how web pages relate to each other through links. Instead, every part of every word is related to every part of every other word, phrase, sentence, paragraph, and document.
Part 2: How People Behave
The technical obstacles to measurement are tough enough to deal with. However, there’s an additional wrinkle. In old school SEO, we pay attention to things like words. You’ll still see things like keyword lists floating around, and keyword planning tools. Those are fine when you’re measuring links and checking the text of words in links as a way to determine what a page is authoritative about.
All that goes out the window in generative AI, because that’s not how people talk to AI. People don’t type in things like:
“Trust Insights consulting” or “management consulting firm”
They ask real questions and have real conversations: “Give me a list of recommended management consulting firms that specialize in healthcare in the Boston area,” and “Who is Trust Insights and what do they do?” and “What’s a good recipe for guacamole for people who are allergic to citrus?”
You might think, “Oh, then we can just ask common questions and measure how often we come up in responses, right?”
Except, going back to part 1, every word gets tokenized when we prompt a model. The more words, the more tokens, and the more specific the response – and the harder it is to predict what people are going to type into an AI model, and the more likely it is that the model will return different results.
Here’s an example. Suppose I type in:
“Who is Trust Insights and what do they do?”
I’ll get a response from Google’s Gemini model that looks like this:
Now, suppose I type:
“What does Trust Insights do and who are they?”
Before we look at the results, it should be clear that these are basically the same question. We should get basically the same answer, right?
Nope.
These are wildly different answers. Both are factually correct, but you could not easily predict what the answers would be. If you’re trying to measure even whether or not your brand is in an AI model, you can see how complicated this could get.
Why does this happen? Remember how these models work: they take into account the order in which words appear when they’re first trained.
The order of words matters. In traditional search engines, word order matters very little because they use a much different language technology. In language models, word order and even seemingly unimportant words can trigger different token activations and lead to very different results.
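You can see the lowest layer of this sensitivity yourself: before the model does any reasoning at all, the two phrasings become entirely different token sequences. A small sketch, again with tiktoken:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

a = "Who is Trust Insights and what do they do?"
b = "What does Trust Insights do and who are they?"

print(enc.encode(a))
print(enc.encode(b))
# Same words, different order -> different token sequences,
# and therefore different starting conditions for the model
```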
Imagine how problematic this becomes when informational queries like “Recommend some top consulting firms in the Boston area” and “Recommend some Boston area top consulting firms” yield totally different results. Good luck measuring your brand presence across every possible wording of every informational query.
Part 3: A Cornucopia of Models
Okay, so the conversational nature of generative AI means it’s very, very difficult to predict what an AI model will even say, much less measure our brand strength in it. Let me add an even bigger wrinkle.
There are many, many more models in generative AI than there are in traditional search. In traditional search, you mainly have to worry about ranking well in Google. In generative AI, you have a much, much wider field. Here are just a few:
- Google’s Gemini powers Gemini (the app) as well as Google’s AI Answers and every other place they try to stuff Gemini.
- GPT-4o powers ChatGPT and the many, many companies that have built their infrastructure on OpenAI’s platform, like HubSpot’s ChatSpot.
- Anthropic’s Claude models power a variety of writing apps as well as their own web app.
- Meta’s Llama family of models powers Meta AI and the AI search in all of Meta’s apps like Facebook, Instagram, WhatsApp, and Messenger.
- Mistral’s family of models powers apps and services around the world, but especially in the EU.
And there are many, many, many more. You can see and test the top 89 (yes, 89) in LMSYS’s Chatbot Arena:
Any company or tool saying they can measure your brand’s presence in AI had better be able to query at least these major models, not to mention the tens of thousands of smaller models that are in production, like IBM’s Granite in WatsonX.
In fact, IBM WatsonX is a great example of how challenging even answering the question of AI presence can be. People building enterprise apps on that platform can choose from over a dozen different models. If you want to influence large enterprises using the WatsonX system, you’d need to influence each of those models.
If any company or piece of software is telling you they can measure your brand’s presence in an AI model, and it can return that result in less than a day, they’re lying. There’s no kind way to put it. They’re lying. You would need to query dozens, if not hundreds, of different ways of asking a model questions about your brand, tens of thousands of different ways to ask questions about your industry or niche, and ask those questions of at least half a dozen models, if not more. Chatbot Arena, one of the top sites for testing and measuring models, has 89 top models to test. Even fully automated, you’re not getting that answer in under a day, and certainly not for free. The compute costs alone would be in the hundreds of dollars just for one model.
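Some rough, back-of-the-envelope math shows why. Every number in this sketch is an illustrative assumption, not a measurement:

```python
# All numbers below are illustrative assumptions, not measurements.
phrasings = 100        # ways to word a single brand question
questions = 50         # distinct brand/industry questions to cover
runs = 3               # repeats per query to average out randomness
models = 89            # Chatbot Arena's current top models
cost_per_query = 0.01  # assumed blended API cost in USD

per_model = phrasings * questions * runs
total = per_model * models

print(f"{per_model:,} queries per model (~${per_model * cost_per_query:,.0f})")
print(f"{total:,} queries across all models (~${total * cost_per_query:,.0f})")
# 15,000 queries per model (~$150)
# 1,335,000 queries across all models (~$13,350)
```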
Part 4: Ground Truth
Most generative AI services that offer consumer interfaces have started building grounding into their software; that is, the software gets answers from the model and then essentially fact-checks them with website data, typically from search engines. Meta’s AI model does this. ChatGPT does this. Google’s Gemini does this.
Which means… while people may not necessarily use search engines the way they used to or as much as they used to, optimizing your content for search engines is still vitally important. Showing up in search engines means that when an AI model is unsure about its answer, if the model has the ability to call tools (like web searches), it’s going to ask the nearest search engine.
Those sites and pages that do well in legacy, traditional search will be the points of reference for AI models that are checking their own work, which is a polite way of telling you not to fire your SEO team just yet.
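Conceptually, the grounding pattern looks something like this. This is a toy, self-contained sketch: model_answer and web_search are stand-in functions, not any vendor’s real API:

```python
# A toy, self-contained sketch of the grounding pattern.
# model_answer and web_search are stand-ins; no real API is called.

def model_answer(prompt: str, context: str = "") -> tuple[str, float]:
    """Stand-in for a language model call; returns (answer, confidence)."""
    if context:
        return f"Answer to {prompt!r} grounded in: {context}", 0.95
    return f"Best statistical guess at {prompt!r}", 0.40

def web_search(query: str) -> str:
    """Stand-in for a search engine lookup of top-ranking pages."""
    return "snippets from pages that rank well in traditional search"

def grounded_answer(prompt: str) -> str:
    draft, confidence = model_answer(prompt)
    if confidence < 0.8:  # the model is unsure of its own answer
        # Fact-check against the web -- this is where SEO still pays off
        draft, _ = model_answer(prompt, context=web_search(prompt))
    return draft

print(grounded_answer("Who is Trust Insights?"))
```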
Part 5: So What Do We Do?
Let’s get to the So What? as Katie Robbert likes to say. Okay, we effectively cannot measure our brand presence in an AI model because the architecture of the model simply makes that infeasible.
But what we can do, what we know how to do, is create content on the Internet that’s available for models to learn from. Almost every AI company making models uses some or all of the Common Crawl archive of public web pages as the base data for its model training. Common Crawl is a massive, regularly updated archive of text – 6.2 petabytes as of the last crawl – and it’s something you can inspect yourself, if you have the disk space, to see what’s inside. You can download portions of it and see what parts of your website have been scraped. You can see what news outlets, blogs, podcasts, and so on are inside the archive – and by extension, inside the training data for language models.
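Common Crawl also runs a public index you can query without downloading anything. Here’s a sketch using Python’s requests library; the crawl ID below is just an example (the current list of crawls lives at index.commoncrawl.org/collinfo.json), and you’d swap in your own domain:

```python
# pip install requests
import requests

# Common Crawl's public CDX index: one JSON record per captured URL.
# The crawl ID is an example; see index.commoncrawl.org/collinfo.json
# for the current list of crawls.
CRAWL = "CC-MAIN-2024-33"
domain = "christopherspenn.com"  # swap in your own domain

resp = requests.get(
    f"https://index.commoncrawl.org/{CRAWL}-index",
    params={"url": f"{domain}/*", "output": "json", "limit": "20"},
    timeout=60,
)

for line in resp.text.strip().splitlines():
    print(line)  # each line is a JSON record for one captured page
```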
We can and should be creating high quality, high volume content everywhere we can. Make YouTube videos at scale – we know for sure companies like NVIDIA have scraped millions of hours of YouTube video for their model training. I am almost 100% confident OpenAI has scraped the same from YouTube, because their voice transcription model, Whisper, recognizes and tags my voice in audio.
We should be creating content in all the places we know scrapers go to gather AI training data. GitHub. StackExchange. Reddit. YouTube. Local news sites. Substack. Various social networks.
And finally – probably most important – we should be building brand like crazy. You should be building your personal brand by being in as many places as you feasibly can be. You should be publishing guest content wherever you can. You should be on every podcast that will have you. You should be joining livestreams and being on any YouTube channel you can.
We want models to train on our names and our topics from as much text as possible, which means we need to create that text in as many places on the Internet as we can. However, instead of chasing inbound links as we used to in SEO, now we’re just chasing placements, period. Believe it or not, this is where public relations firms and teams could make a huge difference for their companies. The best firms and teams excel at getting media placements, so their strategy just has to pivot from “we want top tier media” to “we’ll take any media we can get”.
You Can’t Reasonably Measure AI Brand Strength
The bottom line is this: yes, you can measure your brand’s placement in AI, but it’s a time-intensive, compute-intensive, very expensive process to do so if you do it well. (It’s free and easy if you want inaccurate, incorrect answers.)
However, you can absolutely measure the impact of AI on your business using the gold standard of measurement: asking people how they learned about you. There is no substitute for this; there never has been, and when done properly, it will always be a timeless, valuable source of marketing attribution.
If you’d still like to try measuring brand strength in AI and feel like lighting a lot of money on fire, my company Trust Insights will try to talk you out of it but will still do it if you absolutely insist.
How Was This Issue?
Rate this week’s newsletter issue with a single click. Your feedback over time helps me figure out what content to create for you.
Share With a Friend or Colleague
If you enjoy this newsletter and want to share it with a friend/colleague, please do. Send this URL to your friend/colleague:
https://www.christopherspenn.com/newsletter
For enrolled subscribers on Substack, there are referral rewards if you refer 100, 200, or 300 other readers. Visit the Leaderboard here.
Advertisement: Bring Me In To Speak At Your Event
Elevate your next conference or corporate retreat with a customized keynote on the practical applications of AI. I deliver fresh insights tailored to your audience’s industry and challenges, equipping your attendees with actionable resources and real-world knowledge to navigate the evolving AI landscape.
If you’d like to see more, here are: