Mind Readings: Who Wins in a Generative AI Licensed Data War?

AI, Artificial Intelligence, Generative AI

In today’s episode, we dig into the potential consequences of a licensed data war in the AI industry. You’ll explore a thought-provoking scenario where court rulings could reshape the AI landscape, potentially favoring tech giants like Google and Meta. Discover why these companies might hold an edge due to their user agreements and the implications for AI development globally. Tune in to understand the potential ramifications of licensed data on the future of artificial intelligence!

Can’t see anything? Watch it on YouTube here.

Listen to the audio here:

Download the MP3 audio here.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Christopher Penn: In today’s episode, let’s talk about who wins in a licensed data war. What are we talking about? Some AI companies, like OpenAI, are criticized, and correctly so, especially by creative folks, for ingesting massive amounts of data from the public internet without permission, consent, or attribution. Their models then spit out imitations of the things they’ve learned.

Many folks correctly say this infringes, if not on their intellectual property rights, then on their moral rights as creators: their works were used without their permission and in a way that doesn’t benefit them. One possibility in the generative AI space is a licensed data war. What does this mean?

As court cases proceed over how AI models are trained and whether using copyrighted data violates intellectual property rights, there’s one scenario we can play out for how this might end.

Now, I want to be clear: this is a fictional scenario. I am not an attorney; I cannot give legal advice. This is purely amateur speculation on my part.

Suppose a high court somewhere rules, “Yes, using other people’s copyrighted works without permission violates intellectual property rights, and copyright owners whose content was used without consent are owed compensation.” What happens?

Immediately, there are a gazillion and a half class action lawsuits. OpenAI gets sued left, right, and center. Anthropic, same thing. If a precedent is set, they will lose those cases, and copyright owners are going to get, like, $4 from the settlements because the lawyers will take most of the money.

OpenAI and Anthropic are probably toast. They will owe more in class action judgments than they have, because everyone and their cousin will come after them, and they’re not profitable. They depend on investor money to keep operating at this point.

So, what does the AI landscape look like then? In the USA, your AI will come from two companies: Google and Meta. Why? Because both companies, like virtually every service provider, have a clause in the terms of service you agreed to stating that any content you submit to a service like Instagram, WhatsApp, Facebook, or YouTube is licensed to the company to use as it sees fit under a derivative works clause. That clause gives them a worldwide, non-exclusive license to make derivative works of your work.

If you don’t believe me, go to the terms of service on your favorite website and search for “derivative works”; you will see it. A derivative works license means they can use your data to train AI.

Those two companies basically win the AI wars because Meta has Facebook, Instagram, WhatsApp, Threads, Oculus, and more. They’ve got most of the world’s social media data with all their acquisitions. Google has Chrome, Search, Gmail, YouTube, Google Photos, Android—you name it. They’ve got most of the rest of the world’s data.

Because of those terms of service, we have licensed our stuff to them. If you’ve uploaded a piece of art to Google Photos, you have licensed it to Google—check the terms of service. If you post a picture of art on Instagram, you have licensed it to Meta—check the terms of service.

These two companies will have defensible positions in AI because they can say, “We trained our models on this data, which we are licensed to use and have obtained consent for.” Whether or not you knew you were consenting is a problem for us, the end users, not them. We clicked “Yes, I agree,” and that is legally binding.

Chinese models like Yi and WuDao will also win, because China generally does not abide by other countries’ intellectual property rights. This has been an ongoing struggle for 50 years, and in the digital space, the Chinese government pretty much thumbs its nose at everyone else’s intellectual property laws.

This is true of any model made under a sovereign government that chooses how to interpret its laws within its jurisdiction. The United Arab Emirates, through its government-backed Technology Innovation Institute, makes Falcon, for example. Model makers in other jurisdictions interpret the law as they see fit within those jurisdictions. There’s not much anyone else can do about it, because each is a sovereign nation, and its laws and culture may not agree that training a model violates intellectual property rights.

China will be the clear winner on this front, though.

So, is this good? Is this bad? It’s hard to say.

Generally speaking, and this is where we have to do some thinking as a civilization, a world with less choice is less optimal. If Google and Meta are your only choices for AI, that’s not good, because it means fewer choices.

However, a world that compensates creators fairly and allows creators to hold companies accountable is also optimal. The ability to sue OpenAI is a good thing because, yes, they used your work without compensation or consent.

How this plays out, we don’t know. We don’t know how the balance will be found, but what is clear is that some companies, through existing terms of service and agreements that we signed, have an advantage that others do not. That will position them as winners in the licensed data wars in AI.

That’s going to do it for this episode. Thanks for tuning in; talk to you soon! If you enjoyed this video, please hit the like button. Subscribe to my channel if you haven’t already, and if you want to know when new videos are available, hit the bell button to be notified as soon as new content is live.
