2020 Rewind: AI and SEO Applications

Welcome to 2020 Rewind! In this series, we’re taking a look at the year that was and the wonderful shows and podcasts that had me on as a guest this past year. A fair amount of the time, I’d rather read than listen, but podcasts, live streams, webinars, and videos were how we made media this year. So let’s get the best of both worlds; 2020 Rewind will share the original episode and show it aired on, and provide you with a machine-generated transcript from the episode.

2020 Rewind: AI and SEO in 2020 with Kevin Indig and the Tech Bound Podcast

Summary: We talk through the new GPT-3 model and its likely impact on SEO, how transformers work, optimizing content in 2020, and what to look for in the year ahead.

Find the original episode here.

The state of AI and SEO w/ Christopher Penn

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.

Kevin Indig 0:00
Hey, and thanks for tuning in again.

This is a special Tech Bound episode that I shot with Christopher Penn.

In this conversation, we speak about anything artificial intelligence, the impact of AI on SEO, and, of course, GPT-3.

Christopher Penn is the co-founder and chief data scientist of Trust Insights.

He’s also the co-host of Marketing Over Coffee and a three-time IBM Analytics Champion.

I’d really appreciate a five-star rating wherever you listen to podcasts, or a subscription on YouTube and, of course, to the Tech Bound newsletter, so you don’t miss any of this content in the future.

Thank you very much.

And enjoy this episode with Christopher Penn.

3, 2, 1. Christopher Penn, thank you so much for coming on the show.

Christopher Penn 0:51
Thank you for having me.

Kevin Indig 0:53
It’s an absolute pleasure.

And I’m going to pick your brain on so many things.

But I wanted to start with GPT-3.

GPT-3 created this wave of fear.

It came crashing down on content marketers and SEOs when they saw what it could do a couple of weeks ago.

On the other end, many people were excited because it potentially takes away the weight of creating lots of boilerplate text.

So I was curious: what is your take on the potential impact of GPT-3 on SEO and content marketing?

Christopher Penn 1:25
So I think it’s probably important to ask: have you talked about GPT-3 yet on your show? Do listeners know what this thing even is?

Kevin Indig 1:34
Slightly, yes. I touched on it in a blog post.

But I think a quick explainer from you would be amazing.

Christopher Penn 1:40
Okay.

So there’s a group called OpenAI that creates, among other things, lots and lots of different AI models. An AI model is a fancy term for software, right? It’s a piece of software.

There’s this generative pre-trained transformer (GPT) family of models that this group has created: GPT-1, which was about two years ago, and GPT-2, which was last year’s and has been used very heavily for natural language processing and natural language generation, creating net new writing.

And then this year, appropriately, the model is now version three.

Version three is a departure from the previous versions in that, instead of having lots of parameters and guardrails to generate text, it takes a prompt. So, for example, you write in a quarter of a paragraph, tell it how much content to create, and it will essentially try to guess what the rest of the logical pieces of content should be.

And it does some really cool things.

One of which, the one I’m personally entranced by, is called neural style transfer, where it is trained on something like how Ernest Hemingway writes.

Then you feed it J.K. Rowling’s Harry Potter series and say, rewrite Harry Potter in the style of Ernest Hemingway.

And it will change the language structurally to do that.

Now, there’s some upsides and downsides with the way this new model works.

The big upside, obviously, is that it requires a lot less prompting to use the actual model once you’ve trained it.

That’s called priming.

And it can do all sorts of very different tasks.

It can write, for example, reasonably credible poetry.

It can do regular text, you know, marketing copy, direct marketing content, not always that exciting, novels, things like that.

It can also generate code. There are examples of it writing code from scratch given a prompt like, generate a window that has these four buttons, and it would write this in Swift, which was the language being used.

Now, that all sounds cool, and as you said, some people are very afraid other people are very optimistic.

Here’s the downside, and it’s not really a downside.

It’s just knowing the limitations.

Number one, this model is gigantic: it has 174 billion hyperparameters.

The best way I can explain hyperparameters and hyperparameter optimization is to think about an oven, right? You’re baking cookies, and you go to put the cookies in the oven. What are all the dials on the oven? There’s time, there’s temperature, there’s convection. Each of those parameters can be set at every degree from, like, 170, which is your oven’s keep-warm setting, to, like, 800, which is the self-clean setting.

When you do hyperparameter optimization, you’re essentially going to try and bake a cookie at every single possible setting.
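
The oven analogy maps onto what is usually called grid search. The following is a minimal Python sketch under stated assumptions: the scoring function `bake_score`, the dial values, and every name here are hypothetical stand-ins, not anything from OpenAI's training process. Strictly speaking, the large number quoted for GPT-3 counts learned weights (parameters), while oven dials correspond to hyperparameters chosen before training.

```python
from itertools import product


def bake_score(minutes, temperature_f, convection):
    """Hypothetical cookie quality: best near 11 minutes at 350 F, convection helps."""
    score = 100 - 2 * abs(minutes - 11) - 0.1 * abs(temperature_f - 350)
    return score + (5 if convection else 0)


def grid_search(grid, objective):
    """Try every combination of settings and return the best one with its score."""
    names = list(grid)
    best_settings, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        settings = dict(zip(names, values))
        score = objective(**settings)
        if score > best_score:  # ties keep the first combination found
            best_settings, best_score = settings, score
    return best_settings, best_score


# Hypothetical dial positions; a real search space is usually far larger.
oven_grid = {
    "minutes": [8, 10, 12, 14],
    "temperature_f": [325, 350, 375],
    "convection": [False, True],
}

best_settings, best_score = grid_search(oven_grid, bake_score)
print(best_settings, best_score)
```

Even this toy grid bakes 24 cookies (4 x 3 x 2). The number of combinations multiplies with every dial added, which is why exhaustive search gets expensive quickly.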

And so this model has been taking the English language in, and I believe it’s trained mostly on English, and has essentially tried to write 174 billion different ways, with these hyperparameters tuned in order to generate text.

That means that, from a computational perspective, it is extremely expensive. It requires big hardware, big iron, lots and lots of GPUs.

And the ability to use it in a production capacity is going to be constrained by those resources.

You’re not going to put this on your laptop and run it. Well, you can, but expect to wait a couple of years.

So that’s one downside.

The second downside of this model right now, at least according to the early adopters who have talked about it, is that it requires what’s called a lot of pre-priming: a lot of samples, a lot of very tuned text, in order to know what to do.

That’s, again, no surprise. The number one basic of machine learning is that you’ve got to have good data to tune a model on.

And the tuning process for this, apparently, is also very computationally expensive.

So is it something where a content marketer or an SEO professional needs to be like, oh my God, tomorrow I’m out of a job? No, not even close.

It requires a lot of expertise, and it requires a lot of hardware.

And it requires a very well-tuned data set to be able to generate the incredibly cool proofs of concept that have come out.

But again, it’s not something where you fire up a web browser and just say, okay, make me my next 1,000 blog posts.

We’re not there yet.

Kevin Indig 6:08
I read somewhere that the estimated cost of training that model is between $10 million and $12 million.

So an absolutely incredible effort is needed.

But which side of the coin are you on? Are you more intimidated by what’s already possible, or are you excited?

Christopher Penn 6:28
I’m very much on the excited side of things.

But I am also very skeptical of a lot of the hype that has come around with AI in the last two years.

And it’s not because the technology is not there. The technology is absolutely ready,

in many cases, for production.

Some of the more advanced, but not cutting-edge, models, like the T5 transformers and even GPT-2, can do some pretty cool stuff.

And they can generate state-of-the-art results on a lot of different tasks.

The challenge for a lot of AI, and for a lot of AI companies in marketing in particular, is: are they solving a problem that we actually have right now, or are these solutions in search of a problem? Some things 100% are a great solution to an existing problem using these natural language models, things like question answering with chatbots.

That’s a perfect application: very useful, very well tuned, and it can save companies a lot of time and money

while still providing a great user experience. The user really feels like they’re in a Turing test: am I talking to a human? Am I talking to a machine? I don’t know.

But the answers are pretty good.

So there’s that.

But on the flip side, there’s also a lot of stuff out there that really is just hype.

There was a piece in the Financial Times that’s now about a year and a half old.

The Financial Times did an investigation of 100 different companies that said they were AI software companies and found that 35% of them had none, zero, nothing at all. They had outsourced it to overseas workers in places like Bangladesh. Yes, they’re using human intelligence, which is still the state of the art.

But they weren’t living up to their claim.

So I am very much on the optimistic side, I write a lot of my own code, I build a lot of my own models and things for my work in marketing.

And once you get into it, you realize there are many more limitations than you would expect. You go to all the vendor websites, you’re on the virtual trade show floor now, and you see all these cool promises.

And then when you get into the coding, you’re like, oh, this is a lot of hard work.

Kevin Indig 8:39
Yeah, it’s just a very sophisticated spreadsheet in some cases.

But you also wrote a whole series on your blog called the AI-powered SEO process.

Can you elaborate on that and tell us what it looks like?

Christopher Penn 8:55
So the AI powered SEO process actually looks very much like the scientific method in a lot of places.

But it is essentially: what data do you have that you can train on? What models are you going to select? What outcomes are you after? And do you have the ability to generate the individual pieces using a couple of different techniques and tactics? A big part that I think is immediately useful to a lot of SEO folks is topic modeling.

Topic modeling is well beyond proven.

It is old hat for a lot of more mature machine learning folks.

But there are just so many good tools for doing topic modeling. You can say, okay, I’m going to do a search for, I don’t know, espresso shops near me, and you pull in the top content, or you use the SEO tool of your choice and pull in the top 100 pages on these things.

Then maybe you pull another set of, say, the next 900, and you do a split and ask: what do the top 100 pages have in common that is absent from the next 900?

You build your topic models, look at the intersections or the exclusions, and say, okay, what’s in common across these top pages?
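
The top-versus-rest split can be approximated without a full topic model. The sketch below is a deliberately simpler stand-in (a document-frequency comparison, not LDA or any specific topic modeling tool), and the sample pages, function names, and the `min_gap` threshold are all hypothetical.

```python
import re
from collections import Counter


def document_frequency(pages):
    """Share of pages in which each term appears at least once."""
    counts = Counter()
    for text in pages:
        counts.update(set(re.findall(r"[a-z']+", text.lower())))
    return {term: n / len(pages) for term, n in counts.items()}


def distinctive_terms(top_pages, other_pages, min_gap=0.5):
    """Terms that are much more common in the top pages than in the rest."""
    top = document_frequency(top_pages)
    other = document_frequency(other_pages)
    gaps = {term: share - other.get(term, 0.0) for term, share in top.items()}
    keep = [term for term, gap in gaps.items() if gap >= min_gap]
    return sorted(keep, key=lambda term: (-gaps[term], term))


# Hypothetical page text standing in for the top 100 and the next 900 results.
top_pages = [
    "espresso bar with cold brew and latte art",
    "latte and espresso menu with cold brew on tap",
]
other_pages = [
    "coffee shop with espresso",
    "espresso machine repair",
    "cheap coffee and espresso beans",
]

print(distinctive_terms(top_pages, other_pages, min_gap=0.75))
```

On this toy data the terms that separate the two groups are the ones every top page mentions and no lower page does, which is the gap the split is meant to surface.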

The other thing is that with tools like Facebook’s fastText, you can do what’s called vectorization, which is where you essentially turn words into numerical vectors and ask what semantically related things would be associated with them.

So I may have an espresso shop.

I may or may not mention the words cold brew, right?

But we know from how Google works with its own models that it is doing semantic association.

So you may end up ranking for something like latte,

even though you don’t have a page on your website about your lattes. It’s not there, right?

But Google understands from a semantic perspective that you’re an espresso shop, so you probably have lattes.

And so in a local search, you may come up for someone searching for a latte near me. Using these topic models, using these techniques, is a great way to start teasing that out

and creating content that logically should be there, based on the data that you’re being given. It’s not truly doing it, because Google’s models are much bigger,

but it is kind of like reverse engineering a little bit of it, just to understand what else should be in the content you’re creating.
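
To make the idea of semantic association concrete, here is a small Python sketch of comparing word vectors with cosine similarity. The three-dimensional vectors are invented for illustration (real fastText embeddings are learned from text and have hundreds of dimensions), and this does not use the fastText library itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(term, vectors, top_n=2):
    """Rank every other term by similarity to the given term."""
    scores = {
        other: cosine_similarity(vectors[term], vec)
        for other, vec in vectors.items()
        if other != term
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical hand-made embeddings, not output from a trained model.
vectors = {
    "espresso": [0.9, 0.8, 0.1],
    "latte": [0.8, 0.9, 0.2],
    "cold brew": [0.7, 0.6, 0.3],
    "mortgage": [0.1, 0.0, 0.9],
}

print(most_similar("espresso", vectors))
```

With trained vectors, the same ranking step is how a term like latte can surface as a neighbor of espresso even when a page never mentions it.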

So a big part of this process is doing an inventory: inventory what you have, inventory what’s in the top results, and try to figure out, again, what are the intersections? Where are the places where you’ve got a gap? Another one that I think is so overlooked is key opinion leader or influencer identification.

For good or ill, inbound links are still the gold standard of what predicts that a site is going to rank reasonably well.

And while it has been proven time and time again that there is zero correlation between social media sharing and search rank, there is a logical relationship between getting an influencer to write a blog post about you and getting that link.

Right.

So that’s a part that I feel like so many SEO folks, particularly folks who are still stuck in, like, 2015, are getting wrong.

They’re just spamming people: please link to us, I’ve got this great resource, please link to it.

As opposed to saying, okay, in this network of people who are experts on this topic, who are the network hubs? How do I approach them carefully and build a real relationship over time? And then can I get one piece of content placed with them somehow? Because I know if I do that, it will spread like fire to the entire set of first- and second-degree connections that this person has.

That’s a better model for doing this type of influencer outreach than spamming everybody that you possibly can, which I still get, like, 40 of a day.

Kevin Indig 12:42
Yeah, it’s sometimes stunning how many of these old, terrible habits stick around in an environment that develops so rapidly. And I totally agree with you. I think, as SEOs, we’re traditionally very bad at taking things to the next meta level.

Instead, we’re often sticking to, and trying to scale, these old, kind of terrible tactics.

But in your AI-powered SEO process series, you created a simple k-means cluster based on your blog articles with two Moz metrics that basically shows your most valuable content in a nutshell.

And I’m curious, how can SEOs, or basically beginners, get started leveraging very basic machine learning models for their work? What’s the entry point?

Christopher Penn 13:32
So on that particular example, using k-means clustering, I don’t do that anymore.

That technique is very old now.

And it’s not as good as using Markov chain models.

There’s this concept,

and I think it’s an important concept to understand.

There’s an archetypal story of a college that opened up its campus without any sidewalks and just let students wander randomly.

Then, a year later, it paved sidewalks where all the paths were worn.

And supposedly this campus, which has never been named, feels like a nice campus to wander. It feels very natural.

That concept is still a great concept.

And when you look at how people traverse your website, there are paths to conversion.

There are logical places that people go on your website, behaviourally, that lead to conversion.

So if someone’s on your site, they’re on your blog, and then they go to your services page, and then they go to your about page, and then they land on your contact page, right? That’s a path to conversion.

And one of the things that people don’t understand about attribution analysis is that the same thing you do to figure out which channels work, you should be doing with your content: which of your content works?

It is absolutely possible to model that today with the data that you have in your existing web analytics tool, particularly using Google Analytics.

When somebody completes a goal in Google Analytics, and you can restrict this to goals that had organic search as one of the drivers if you want to focus on SEO, inside the API there’s goal conversion location.

There’s previous page one, previous page two, previous page three.

So you can see the three steps before a goal completion and the goal completion itself. Using this machine learning technique called Markov chain modeling, you can absolutely understand which pages are the most important in that sequence to goal completion. That tells you these are the pages on your site that you must optimize. You must have them tuned not only for SEO, but also for conversion rate optimization. It may turn out this blog post that you wrote is just on fire. Great, optimize the heck out of it, make sure it ranks for every term you can possibly get it to rank for, but also put some budget towards promoting it, maybe even on the SEM side, because you need traffic to come to that page, because you know that it is the precursor to a conversion.
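
One common way to turn page paths into page importance is a first-order Markov chain with a "removal effect": estimate the chance of converting from the observed transitions, then see how much that chance drops when a page is taken out. The Python sketch below assumes that approach; the episode does not spell out the exact method, and the paths, state labels, and function names are hypothetical rather than Google Analytics API fields.

```python
from collections import defaultdict

START, CONVERSION, NULL = "(start)", "(conversion)", "(null)"


def transition_probabilities(paths):
    """Estimate page-to-page transition probabilities from observed paths."""
    counts = defaultdict(lambda: defaultdict(int))
    for pages, converted in paths:
        states = [START, *pages, CONVERSION if converted else NULL]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def conversion_probability(probs, removed=None, sweeps=200):
    """Chance of reaching conversion from the start; a removed page converts nothing."""
    value = defaultdict(float)
    value[CONVERSION] = 1.0
    for _ in range(sweeps):  # repeated sweeps converge to absorption probabilities
        for src, dsts in probs.items():
            if src == removed:
                continue  # the removed page stays at 0.0
            value[src] = sum(p * value[dst] for dst, p in dsts.items())
    return value[START]


def removal_effects(paths):
    """Share of conversions lost when each page is removed (assumes some paths convert)."""
    probs = transition_probabilities(paths)
    base = conversion_probability(probs)
    pages = {page for visited, _ in paths for page in visited}
    return {p: 1 - conversion_probability(probs, removed=p) / base for p in pages}


# Hypothetical paths: (pages in visit order, whether the visit ended in a goal).
example_paths = [
    (["blog", "services", "contact"], True),
    (["blog", "about"], False),
    (["home", "services", "contact"], True),
    (["home", "blog"], False),
]

effects = removal_effects(example_paths)
for page, effect in sorted(effects.items(), key=lambda item: -item[1]):
    print(f"{page}: {effect:.2f}")
```

Pages with the highest removal effect are the ones to tune first for both search and conversion rate, since conversions depend on them most.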

And so that’s not an easy starting point from a machine learning perspective.

But it is the easiest starting point from a results perspective to be able to demonstrate the value of SEO: hey, we’re going to find the pages that already convert, and we’re going to tune them up first.

Those are the priorities to take care of. If you want a place to start with machine learning, the simplest technique of all is linear regression.

It is technically machine learning.

But most people would agree that if you can do it in Excel, it’s probably not.

But it’s looking at the data that you have in your analytics software and trying to assess what things potentially lead to the outcome you care about.

So I would say, if you want to get a head start, look at it at a page level from your Google Analytics data.

You can do this in Data Studio, or you can do it from the API. I like to do it from the API because you can get more data out of it that way.

Take your pages, the organic searches per page, which is a metric in the API that is super valuable and that people miss, your sessions, and your goal completions.

Then do a multiple linear regression.

Is there a relationship between, say, organic searches to that page and conversions? If there isn’t, it means that your search strategy may be attracting searches, but from traffic that doesn’t convert. One of the things that SEO folks forget an awful lot is that we’re optimizing, we’re optimizing, we’re trying to get top ranking positions and all this stuff.

But are we getting a decent-quality audience? I look at my Search Console data,

and I’m like, hmm, there are three or four terms where I’m getting a lot of traffic.

But this is not what I’m about.

This is not what I want to be known for.

I might even just delete that post.

I don’t know if it’s worth having.

But that simple regression analysis is a great starting place to say, how do I start to understand my data as it relates to SEO? And it gives me some guidance about what I should be doing.
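
As a starting point, the relationship between organic searches and goal completions can be checked with ordinary least squares. This Python sketch simplifies to a single predictor (the multiple regression mentioned above would add sessions and other columns), and the page-level numbers are made up for illustration.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Ordinary least squares for one predictor: slope, intercept, and R squared."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical page-level data: one entry per page.
organic_searches = [120, 300, 80, 450, 200]
goal_completions = [3, 8, 2, 11, 5]

slope, intercept, r_squared = simple_linear_regression(organic_searches, goal_completions)
print(f"slope={slope:.4f} intercept={intercept:.2f} r_squared={r_squared:.3f}")
```

An R squared near zero would suggest that the searches a page attracts have little to do with whether it converts, which is the audience-quality warning sign described above.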

Kevin Indig 17:56
Right. And I think that, in some weird, twisted way, Google kind of weeds out the bad audience for us by using things like user behavior signals. In what capacity and to what extent they do that is still very debatable.

But I totally agree with you.

I was wondering: I know that you’re a master in R, and there’s a hype about Python that kicked off, I would say, six to 12 months ago in the SEO scene.

Because I know this question will pop up: what tools do you recommend folks use to get started with simple linear regressions and then expand from there?

Christopher Penn 18:35
So, okay, on the R vs. Python thing, I swear that, more than anything, it’s an age thing.

I’m old.

I’m in my 40s.

I was doing SEO when the search engine of choice was, you know, the Yahoo directory

and AltaVista. I remember AltaVista, right? And so I grew up learning languages like C and Java and C++.

So R’s syntax is much more familiar and comfortable to me.

I have a really hard time with Python syntax,

obviously with the stupid indenting thing. I’m like, why are we doing loops with indents? This is dumb.

But that’s me.

Of the two languages, Python has much more general use.

So for someone brand new who has never coded,

I think it’s probably a better choice.

But I would encourage people to try both and see which one just feels better to you.

Now, that said, do you need to program to do some of this stuff? No.

As you mentioned in the introduction, I’m an IBM Champion.

And one of the tools that IBM has is a fantastic tool called IBM Watson Studio.

Inside there is a drag-and-drop, click-based modeler where you take these little colored blocks and chain them together. You can drop in a CSV or an Excel spreadsheet, and you have an entire graphical interface to push the buttons and things, but you can do a lot of these analyses, regression modeling, XGBoost, gradient boosting, clustering, all these statistical and machine learning techniques, inside of a no-coding environment. There are limitations to it.

But as a beginner to intermediate user, you’re not going to hit those limitations for a long time. You’re going to be learning the tools.

And I think it’s a really great way to try and learn the thinking without getting hung up on the code.

What should I logically do? I should clean my data first.

Okay, I’ll use the data cleaning module.

Should I figure out what data is important? Then I should use the feature selection module. And then what should I do next? I should actually try and do a numerical analysis, so I can use the Auto Numeric block. You chain these little colored blocks together, and it spits out a result, and it’s like, okay, you were able to do that without coding.

And I think it’s a really, really good start.

And if you go over to Watson Studio, it’s sort of one of those free-to-play things where you get a certain number of hours each month. I think you’re capped at 50 hours a month for free before you have to start paying for it.

For a lot of the work that we’re doing in SEO, 50 hours is more than enough to do some of these analyses.

But more than anything, it’s just to get your brain trained: okay, this is how I should think about processing my data, for SEO purposes or anything else, using machine learning techniques, without necessarily having to sling code.

Kevin Indig 21:22
That’s fantastic advice.

Thank you for that.

One person from the audience also asked, do keywords still matter in an AI SEO world? And I really liked your answer, because you came back to a lot of these concepts that we touched on, like co-citation, entities, vectorization, you know, just the relationship between different entities.

I was wondering, can you go a bit deeper into that and elaborate?

Christopher Penn 21:49
I think if you understand the models that Google has publicly stated it uses, you can start to tease out what is important to how they think about text in particular.

One of the greatest misses I see in SEO is people not going to Google’s academic publications page and reading their publications.

There are, you know, hundreds of these things every year.

And they pretty clearly tell you the direction that they’re researching. Even if the research isn’t in the product yet, it gives you a sense: oh, this is what they’re thinking about.

Take, for example, when they announced last year that they were starting to use their BERT model, Bidirectional Encoder Representations from Transformers, for processing queries.

People’s first reaction was, oh, well, that doesn’t matter to SEO, because they’re just using it to understand the context of the query. Well, it’s a two-sided coin.

Yes, you use BERT to understand the context of the query.

But by definition, you should probably run the same thing on your corpus so that you can do pairwise matching, which is something that Google says they do.

So, okay, BERT does matter for understanding and taking apart entities, context, prepositions, etc.,

on both the query side and the result side.

So why would you not take your content and run it through any of these transformers to understand what they would see in your text? You should be analyzing your text for entity detection: are there other entities that logically should be in your content? At the end of the day, like you said earlier when we were talking about behaviors and stuff, Google is fundamentally capturing and replicating human behavior, right? So the old advice from 20 years ago is still valid: write for humans.

Write as if there was no Google,

so that people would say, wow, that was really good.

I want to refer this to my friends.

Because as Google’s natural language processing technologies evolve, and the way they’re doing their matching evolves, it’s looking more and more like the kinds of things you would recommend to a friend anyway, because, again, they’re copying our behaviors.

That means that even if you don’t have access to the state-of-the-art models, you can start to at least play with some of them.

One of the greatest gifts Google has given us is Google Colab, which, if you’re unfamiliar with it, is their machine learning laboratory. You can sign up for a free account, and you get a four-hour working session, and you can start a new one anytime.

But after four hours, it times out and shuts down to save resources.

And you can load it up with their hardware, like Tesla K80 GPUs and stuff.

And you can run code in this environment.

You can load up things like the T5 transformer, which is one of their big transformer models, load in your text, and say, do some analysis with this, do some testing with this.

One of the great techniques that the T5 transformer does is abstractive summarization.

So you put in, say, your blog post and tell the transformer:

read this, process it, and give me a three-sentence summary of what you think this piece of text is about.

It will spit that out.

Sometimes it comes out with word salad.

But sometimes it comes out with a really good summary.

Well, guess what? If the T5 transformer in Google’s environment, which is a Google-built transformer, spits this out as an abstractive summary of what it thinks your piece of text is about, what do you think that same transformer is doing for search results? It’s trying to understand what this piece of text is about and whether it matches these queries.

By the way, here’s a fun tip: if you’re doing meta descriptions, or even just social media posts, stick the text through an abstractive summarization tool and get a two- or three-sentence summary. Those short summaries are so good. They go off the rails once you get beyond, like, 1,500 characters, but for two or three sentences, they just nail it.

Kevin Indig 25:46
That feels like something you could build into a headless CMS and just enrich your CMS.

Christopher Penn 25:50
You could. It’s very cost-intensive, processing-time-wise.

A blog post will take about two and a half to three minutes to process, which is no big deal for one blog post.

But if you’ve got a bunch of users on a big CMS, you’re talking hours of compute time.

Kevin Indig 26:08
Right. You yourself mentioned an add-on for R that you use for natural language processing.

I was just curious, for the audience:

what is it, and to what extent do you use it?

Christopher Penn 26:18
So there’s a bunch, but the primary natural language one I use is called quanteda.

It is an open source package, just like R itself is open source.

It does a lot of things like basic term frequency and inverse document frequency scoring, which has been in use in SEO for five years now

and is still relevant.

But it also does things like cosine similarity, Euclidean distances, etc.
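
For readers who have not seen term frequency and inverse document frequency scoring written out, here is a minimal Python sketch. The package discussed above is an R library, so this is not its API; the tokenizer, the unsmoothed formula `tf * log(N / df)`, and the sample documents are assumptions chosen for brevity.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a document and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Score each term per document: frequent in this document, rare across documents."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical mini-corpus.
example_docs = ["espresso espresso latte", "latte mocha", "latte tea"]
scores = tf_idf(example_docs)
print(max(scores[0], key=scores[0].get))
```

A term that appears in every document, like latte here, scores zero because it does nothing to distinguish one document from another.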

One of the things that I’m playing with right now is this concept,

and this is an old concept, from, I want to say, the 60s or the 70s,

called stylometry.

Stylometry is a way of measuring someone’s writing style and then comparing it to other writing styles.

For example, Anne Rice has a very distinctive way of writing. Ernest Hemingway has a very distinctive way of writing. They each have their own ways of using words and phrases.

And one of the things I’ve run into trouble with in content curation for social media marketing is that you’ll find a lot of content to share that’s not quite aligned with your brand, right? It just seems off.

So I’m using these natural language tools, and trying to build some of this stuff right now, to say, okay, not only do I want to share stuff that has a high domain authority and lots of organic traffic and so forth, but is it also stylistically similar in tone to my own stuff? So that someone who’s reading my feed says, oh, that makes total sense why Chris would share that, because it sounds just like him.

Or it’s close topically, and from a language perspective, it sounds like him.

From an SEO perspective,

this is a fantastic tool, a fantastic concept, I would say, for things like vetting guest writers, right? If you’re trying to vet a pool of, say, 150 guest writers, have them all submit a sample, and it can be any sample, run those through a stylometry tool along with some of your own posts, and ask, okay, which writers sound like us? That way we have a minimum amount of editing to do in order to get something that sounds like a polished product. I used to run a guest blogging program for a huge tech company,

and with some of the submissions we got, it was like the person was face-rolling across the keyboard.

What happened here? These tools, and this one in particular, are really good at doing those individual techniques.

They are a lot like utensils in a kitchen, right? You know, different tools for everything.

It still needs you as the chef to understand what tools to use, when, and how.
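
A very rough version of the guest-writer check can be sketched by comparing how often each writer uses common function words, a classic stylometric signal. This Python example is an illustration under assumptions, not the tool described above: the ten-word list, the distance measure (sum of absolute differences), and the sample texts are all hypothetical, and real stylometry uses far more features and much longer samples.

```python
import re
from collections import Counter

# A tiny, hypothetical function-word list; real studies use hundreds of words.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it", "is", "but"]


def style_profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[word] / total for word in FUNCTION_WORDS]


def style_distance(text_a, text_b):
    """Sum of absolute differences between two style profiles (0.0 means identical)."""
    profile_a, profile_b = style_profile(text_a), style_profile(text_b)
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def rank_writers(house_sample, submissions):
    """Order writer names from closest to farthest from the house style."""
    return sorted(submissions, key=lambda name: style_distance(house_sample, submissions[name]))


# Hypothetical samples.
house_sample = "The shop is small, but the espresso is strong and the line is short."
submissions = {
    "writer_a": "The roast is dark, but the crema is thick and the cup is warm.",
    "writer_b": "Unlock synergy! Leverage caffeine-driven growth hacks today!",
}

print(rank_writers(house_sample, submissions))
```

Writers at the top of the ranking would need the least editing to sound like the house voice, at least by this one crude measure.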

Kevin Indig 28:46
And ultimately, we can probably even transfer someone’s writing into the style that we want to without, you know, having to analyze it in the first place.

Christopher Penn 28:54
Yes, and that’s where that neural style transfer that in GPT three has real potential Could I take a piece of content and rewrite it in my style? Now that has some very, very interesting and thorny implications from a legal perspective, because the language it creates is net new language.

If I take this model and say GPT three, ingest all my blog posts, and now rewrite Harry Potter in my voice, it’s going to sound very different.

It’s gonna be net new language, who owns that? Right? And it’s, it is a derivative work.

So I understand the copyright law would follow it would qualify as a derivative work, but could you prove it? I mean, obviously, the character still named Harry Potter you could.

But if you did, like a fine replace like el James did with 50 Shades of Grey, which was originally a twilight fanfiction, and they just did a fan you’ll find a place on the character names.

It’s no longer Twilight.

It’s now an independent work.

The characters all still have essentially the same characteristics as the Twilight characters.

So if I take something like Twilight and say, rewrite it in my style, whose work is that? Because I didn’t really write it; a machine did.

It understood my style, and it took a source material.

This, from an SEO perspective, presents a very, very interesting challenge.

Because if you have an industry leader, say, in tech, like Cisco, right, you can do an assessment of which are the best-linked blog posts on Cisco’s blog. And say you’re, well, Netgear, we’ll just use that for example; say you’re Netgear’s marketing team. What happens? You copy and paste Cisco’s top 50 blog posts, and you use a neural style transfer tool with your own stuff.

And now you have 50 new blog posts that are topically identical to Cisco’s but are unique, net new language.

From an SEO perspective, you’re probably going to do pretty well, because, as I said, they’re going to cover the same major points.

But who owns that? Whose copyright is that? And what is happening? Can it be proved in a court of law? The answer is probably not.

Kevin Indig 30:54
Yeah, it’s fascinating.

And it touches slightly on fake videos, like, you know, Obama saying things, that were created by machine learning.

But then at the same time, I think it comes a little bit full circle to the fear that I mentioned in the first question, which is that, say we know the elements of a good story, for example, right, or several different story arcs and how they work and how popular they are. You could theoretically just take something like the hero’s journey, which is one of the most classical story arcs that exists, and just inject any topic into it and just keep churning out these amazing stories, right.

And I think the underlying fear there is also to be redundant because the machine gets so much better.

And this might be future talk still, right? I don’t think we’re there.

And this is something we established, but just the sheer thought of having these structures that we know work well, which we could have analyzed with AI in the first place to validate that they work well.

And then using models to basically create our own from that, I think it paints a picture of a world that’s a little sinister, but also a little bit exciting.

Christopher Penn 32:00
I would say though, if you’ve ever intentionally or accidentally read a trashy romance novel, that is functionally exactly the same story in, you know, 100,000 different versions: person meets person, person falls in love with person, strange conflict, person, you know, resolves it with person, and off you go.

That hasn’t changed.

If you read, for example, the Warriors series by Erin Hunter, which is a kids’, like a young adult series, it’s exactly the same story over and over again. It’s a team of five writers; there actually is no Erin Hunter. It’s the same team of five writers basically just recycling the same plots over and over again with different cats.

So people just inherently find value and comfort in repetition and in stuff they already know.

I mean, there actually is a term for it, and I’m drawing a blank on what it is, but it is one of the reasons why we watch the same series we’ve watched on Netflix over and over again. Like, why are you still watching this? You know how it ends. People do it as a form of comfort, and certainly, as the beaten-to-death expression goes, in these unprecedented times, you know, anything that reduces anxiety is a good thing.

That said, one of the greater dangers that no one’s talking about, and that is a problem in the tech industry and in the SEO industry, is that you need to have a very strong ethics foundation in order to use AI responsibly.

That can be anything from the basics of, Hey, are we pulling from enough diverse content sources? To, who are we sharing? Do we have an implicit or an overt bias in who we share or who we link to? To, how are we calibrating our marketing results on a representative audience? Should our audience be representative of the general population? Like, if you’re a B2C marketer, the answer is probably yes.

And if your audience is not representative, you have to ask why. Is it in violation of the law? And even if it’s not, is it the most profitable possible outcome? A real simple example of this is the one I give all the time about My Little Pony.

So My Little Pony is a toy made by the Hasbro company.

And it is ostensibly targeted towards girls eight to 14 years old.

If all of your data and all your modeling are based on that assumption, you’re going to create models and content and all this stuff on it.

But, and there’s a Netflix special about this.

There’s an entire audience of men 26 to 40 who are rabidly in love with My Little Pony. They’re called bronies. There are conferences, conventions. And guess what, they have way more disposable income than an eight-year-old.

If you build your entire marketing strategy and your SEO strategy on this one bias you have of, you know, eight- to 14-year-old girls, you’ve missed a market opportunity, a lucrative market opportunity, and you have a real risk of not making as much as you could have, whether it’s for yourself, your company, whatever.

But even with things like SEO, we have to be aware, and we have to constantly question: are we biased? Are we baking biases into our assumptions? Are we baking bias into our data sources? When we build, you know, something as simple as a keyword list, what language are you using? You know, in linguistics there’s this phrase, English is the language of privilege; it is the buying language of rich people.

And guess what, the majority of the planet doesn’t speak it.

If you’re optimizing for your market, are you, by optimizing in English alone, intentionally ignoring potentially lucrative other markets? You know, if you don’t have an understanding of Portuguese, you could be missing all of Brazil. If you don’t have an understanding of Chinese, you’re missing 1.3 billion people.

And so we have to constantly ask ourselves, are we optimizing, are we doing SEO for assumptions that are no longer valid compared to the market we could have?

Kevin Indig 36:09
I love that point, for two reasons.

I’m going to try, Christopher. The first one is because when I worked at Atlassian, I actually met a brony, and I had no idea what was going on. A normal guy, I think he was a developer, and his laptop background was My Little Pony.

And I couldn’t connect the dots for the life of me.

So one day I asked him, what’s going on here?

And he was like, Yeah, I watch My Little Pony.

I was like, isn’t that a good show? And he was like, Yeah, well, you know, and he explained this whole concept of bronies.

And how huge it is, as you mentioned, you know, it’s a huge market, actually, it’s very, very potent. And the second reason why I love this is because I did a little bit of research.

And in one of your most recent newsletters, you actually wrote about questioning your assumptions.

And I’m going to read really quickly what you wrote.

You said, as busy marketing professionals, we don’t give ourselves enough time to study, research, investigate, and most importantly, challenge our assumptions.

When we fail to do this, we operate under our old knowledge.

And in a rapidly changing world, old knowledge is dangerous.

How do you, in your daily work, question your assumptions?

Christopher Penn 37:24
There’s two ways.

One is I have, you know, obviously, my own sets of checklists and things to ask myself: are these problems?

And actually, if you want to get a head start on that, there’s a great free book on Amazon called The Ethics of Data Science by Dr. Hilary Mason. I think it is mandatory reading for anybody who works with data in any way, shape, or form.

It’s totally free.

It’s not even Kindle Unlimited; it’s totally free.

Go buy it and read it. Well, go get it and read it.

And two, I do a lot of content creation. Writing my newsletter is how I stay up to date; it’s one of my, quote, secrets, right? Because in order to curate content and stuff and build these newsletters, I have to read. I have to constantly keep up to date, like, what’s going on with this thing? I’m looking at my social feed for next week.

And there’s stuff in there where I’m like, Huh, I don’t recall seeing that.

I don’t recall seeing that happening.

I must have missed the news on this particular thing.

And in doing that, it keeps me up to date, keeps me fresh and aware of what changes are happening.

And because the input sources for a lot of the tools I’ve built are more diverse than just marketing blogs, there’s a lot of other stuff that finds its way in here.

Like there’s a whole piece right now on measuring the temperature of meltwater as a proxy for assessing how quickly glaciers and polar ice caps are melting.

Like, okay, that’s cool.

Can I find data on that? You go explore that, you know, on a Saturday night or whatever, just go play around and go, Hmm, there might be something to this.

SEO professionals, all marketing professionals, need to be dedicating time every single week in their work towards reading and research, towards, you know, reading the top blogs in the field and reading, you know, the not-top blogs, doing some digging around, looking at following reliable people on Twitter and seeing what they share.

I think that’s one of the things that, again, people forget: when you follow somebody and they’re sharing stuff, you’re not following just the person.

You’re following their media diet. You’re following what’s important to that person.

If you follow, you know, Bill Slawski, and you follow Danny Sullivan, you follow, ah... What’s her name?

Kevin Indig 39:36
Aleyda Solis.

Christopher Penn 39:38
Yes, thank you.

You follow all these folks.

You see what they share, you start then reading their sources, and it helps you branch out. It’s kind of like how you find new music.

A friend says, Hey, listen to this song, and you check out the song.

You check out the band, like, Oh, I like this band, and you start to listen to all the music and stuff.

That’s how you stay fresh.

And it is more important than ever that SEO practitioners be doing this, because the underlying technologies that companies like Google are using are changing constantly.

They’re upgrading.

They’re doing new stuff.

And if you’re not following along, you’re operating on techniques that may be counterproductive.

Now, they worked five years ago, but they haven’t worked in three years. Like, why would you keep doing something that doesn’t work?

Kevin Indig 40:20
Yeah, those are fantastic experts.

And it’s funny that you mentioned forgetting and things that don’t work, because you also wrote about this concept of everything decays.

In your newsletter, you wrote: everything decays. In digital marketing, much of what we do every day decays a little.

You experience it on a daily basis: every email campaign that goes out has a few more non-working addresses, every social media account gains and loses followers, every piece of code and software grows a little more stale every day, if it’s not regularly maintained.

And then you wrote: the antidote to decay is not only maintenance but injection of new energy, new blood. An email list can be regularly maintained.

But if you’re not adding new subscribers, it will continue to shrink over time.

It becomes a pale shadow of itself.

The same is true of your social accounts, your CRM, your marketing automation software, everything. Explain to me what that means to you.

Christopher Penn 41:14
It means exactly what it said: if you’re not growing, you’re receding. There is no such thing as standing still in marketing, there really isn’t.

From an SEO perspective, you know this. You know that if you’re not getting new inbound links, and your old links are decaying, you’re gonna lose ranking, right? It’s as simple as that.
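The arithmetic behind "if you’re not growing, you’re receding" can be sketched in a few lines of Python. This is only a toy model under stated assumptions: a fixed monthly loss rate and a fixed number of new subscribers (or links) per month. The function name and all figures are hypothetical, not numbers from the episode.

```python
def project_list_size(start, monthly_churn, monthly_new, months):
    """Project size month by month: lose a share of the list, then add newcomers.

    start         -- starting number of subscribers (or inbound links)
    monthly_churn -- assumed fraction lost each month, e.g. 0.02 for 2%
    monthly_new   -- assumed number of new subscribers gained each month
    months        -- how many months to project
    """
    sizes = [start]
    for _ in range(months):
        current = sizes[-1]
        sizes.append(current * (1 - monthly_churn) + monthly_new)
    return sizes
```

With no new subscribers the list shrinks every month; it only holds steady when the monthly additions match the monthly losses, and grows only when they exceed them.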

What are you doing to keep growing? What are you doing to foster growth, and more importantly, also to the previous point, what are you doing now to set the foundation for future growth? That’s probably one of the greatest challenges people are not thinking about: what are you doing today that won’t pay dividends today, won’t pay dividends tomorrow, but may pay dividends in a year or two years or three years?

A lot of things like investing in yourself and building your machine learning capabilities and building your knowledge of how to do these things are things that will pay long term dividends, if you have the good sense to use them.

Just like, you know, building that relationship with that influencer.

It’s going to take you probably a year to get well known to an influencer. My friend Mitch Joel says this fantastically.

It’s not who you know, it’s who knows you.

Right? When somebody says, Hey, I need to talk about SEO, I’m gonna talk to Kevin, okay.

It’s who knows you. That relationship takes time to build, and it takes effort; it takes a willingness to actually want to talk to these people.

That’s the foundation for growth, and it has to be something that you have a plan for, that you invest in over the long term, which I recognize is a super challenging thing these days, because these days we’re all so focused on this quarter, this month, this week, just trying to get things done, stay afloat, keep the business running.

We’re in an environment now where forecasting anything beyond two weeks is impossible.

Like, you literally have no idea what’s going to happen. Oh, look, you know, the largest, strongest hurricane to hit the US mainland ever. Like, oh, that was this week.

Oh, by the way, California is still on fire.

Oh, by the way, we have brand new police murders going on, you know, in several of our cities. It’s like, you can’t forecast any of this stuff.

But you are in control of yourself, you are in control of your own progression, of what things you need to know.

So one of the things I tell people all the time is, go to any major marketing site, like Marketing Land or whatever, right? Just look at the categories in, like, their blogroll.

And ask yourself, do I know anything about this? If not, do I need to know anything about this? Why? And what are the things I think have the potential to grow in a couple of years? Should I be training myself on that now? And that gives you a blueprint, a professional development plan to invest in yourself, to say, okay, I’ve got to learn more about email marketing.

I know it’s a thing; email’s not going anywhere. Everyone says email’s dead, they’ve said the same for the last 15 years.

And yet here we are, still sending email every day.

What do I need to know in order to be able to make that a part of my professional development? I can’t emphasize that enough: you are in control of yourself, you are in control of your professional development. What plan are you going to build in the next few years for yourself to learn some of these techniques?

Kevin Indig 44:16
That’s exactly how this statement arrived on my end, between the lines. You can drive a Volvo, and you can soup that Volvo up.

But at some point you buy a Tesla, which is a completely different thing.

So you know, I was just curious, like, between optimizing and, let’s call it, innovation or new things.

Who do you see doing that extremely well? Who do you think invests enough, like some brands or people who invest enough in long-term growth while keeping the boat afloat?

Christopher Penn 44:49
That’s a good question.

I don’t have good answers for it, because I see across the board companies not investing enough in people.

I see people not investing enough in themselves.

There are some folks I see a lot in my Slack group, for example, who are asking great questions.

That, by the way, is the indicator of who’s got the potential for growth: the questions they ask.

People who are asking good questions, people who are asking consistently better questions, that shows you they’re on a path towards growth. There are a number of folks I can’t name, because I haven’t gotten their permission to name them.

But they’re in, like, our Analytics for Marketers Slack and, you know, other Slack instances.

But when I go to conferences, even virtually now, and I listen to the questions I get in the Q&A period, the questions aren’t different.

The questions aren’t better, the questions aren’t showing that people are growing. What’s happening is that there’s this bizarre turnstile or treadmill.

As soon as somebody gains some proficiency, they get promoted, they bring in a new person, and the new person is starting from ground zero; there’s no knowledge transfer.

And so the new person goes to conferences and says, you know, what should I be doing with my keyword lists? Like, that was fine 10 years ago.

But you know, this person is brand new, they’re 23 years old, they’re in their, you know, first or second job out of university. Like, okay, so here we go again.

And I don’t see and this is one of the things I think is most concerning, I don’t see any kinds of events or groups or anything for the intermediate to advanced practitioner.

So now it’s entirely possible that they exist and they’re secret for a reason.

I remember when I was doing affiliate marketing, one of the jokes was, you go to Affiliate Summit, and you’re seeing everything that worked great last year.

And absolutely no one in their right mind will tell you what’s working for them right now because they need to make their money now.

But there isn’t enough content out there for the advanced practitioner. I would say, of the blogs that I read, Simo Ahava’s blog on Google Tag Manager is probably one of the few that’s constantly like, Hey, this is advanced, deal with it.

But there’s not a ton else in the market.

Now, there’s a ton in the machine learning world, in the AI world, because a lot of it’s still academic.

And that’s where I definitely see a lot of advancement.

Kevin Indig 47:05
Simo Ahava’s blog, definitely recommendable, and I’ll have all of these things in the show notes.

All the people you mentioned, all the books you mentioned, and of course, tons of links to your blog, to your newsletter, to Marketing Over Coffee. And I want to wrap this up, but not before I ask you two more questions.

And the first one is, in or outside of work, SEO, AI, whatever.

What are you genuinely excited about right now?

Christopher Penn 47:32
Outside of work entirely, I mean,

Kevin Indig 47:34
um, you could pick inside work, outside work, whatever comes up.

Christopher Penn 47:39
So inside work, a lot of the work is in things like stylometry and natural language processing. I’m doing more and more with natural language processing.

I’m about to build my first recommendation engine based on stylometric stuff, to say, like, hey, these are the pieces that are stylistically similar, because I want to test it out to see how that compares to, again, Markov chain modeling.

So that’s pretty cool.

And it’s gonna be fun.

I just started playing with a pre-trained music separation AI model from Deezer. You give it an MP3 file, like, you know, Taylor Swift’s latest song, right? And it uses pre-trained models to split apart that file into the vocals, drums, lead instruments, and accompaniment, and it sounds good.

It sounds so good.

I was testing it out the other day.

And what it came up with to separate the vocals from the backing track is good enough that you could take the backing track and use it for karaoke, right? It’s good enough.

So that stuff is a lot of fun.

One of my sort of inside-outside things, it’s an interest.

It’s not something I’m excited about.

It’s the exact opposite.

I dread it. I write a daily email newsletter called Lunchtime Pandemic, which is about what I see, research, and curate about the pandemic.

If you go to lunchtimepandemic.substack.com, you can sign up for it. I was hoping to not be doing this newsletter anymore.

I was hoping to be saying, Oh, yeah, I’m retiring this newsletter.

We’ve got things under control, great.

Instead, we don’t.

But I take content from the New England Journal of Medicine, The Lancet, the Journal of the American Medical Association, Brief19, STAT News.

And I’m looking for articles that are prescriptive, or that have a clinical relevance.

I am not in any way shape or form a medical professional, but I read a lot.

And after six months of doing this, okay, I can tell this is credible, this is not.

That’s a preprint, that’s been peer reviewed.

And looking for things like okay, this is what’s likely to happen.

And just in general, like stuff like that, like we had this hurricane hit, a bunch of refineries have taken some damage and stuff and others are restarting, but it’s gonna have an impact on gas prices.

So yesterday, I shared with folks, like, hey, if you haven’t already, top off your tank.

You know, if you live in North America, top off your gas tank just in case, because there’s always the potential in a strong storm for shortages afterwards.

Kevin Indig 49:52
Amazing, and I can recommend everybody to check that out.

The last question I want to leave you with is, what are all the places on the internet that people can find and follow you?

Christopher Penn 50:01
So, the two places, to make it easy: for my company and work, TrustInsights.ai. Today’s blog post, which will be up, obviously, forever, is on pumpkin spice data.

So we took a look at 378 products that have pumpkin spice in the name of the product, their ingredients, their calories, etc.

It’s a fun data dive.

So TrustInsights.ai, and then for my personal stuff, just go to ChristopherSPenn.com.

Those are the two places you can find everywhere else from there, but those are the places to go.

Kevin Indig 50:28
Christopher, I need to sit down for 10 minutes and digest all the information you just gave me, because it was absolutely phenomenal.

I thank you very much for your time.

Everybody, go check out Christopher Penn’s stuff.

He’s all over; his stuff is really high quality, top notch.

And I appreciate you coming on the show.

Christopher Penn 50:45
Thanks for having me.

Kevin Indig 50:46
Thank you, sir.

Have a nice day.

You too.
