
Friday, July 26, 2024

Building a more sustainable AI ecosystem

Data walls, dialectics, drafting plans for the future
O'Reilly Next:Economy Newsletter

"Only a copyright-aware AI ecosystem can tear down the data wall." Generated with Adobe Firefly.

A data wall of their own making

Training a new generative AI model requires an enormous amount of data. But as The New York Times' Kevin Roose explains, "Gathering new data has gotten trickier" in recent months "as publishers and online platforms have taken steps to prevent their data from being harvested." Roose cites a new study from the Data Provenance Initiative that confirms an "emerging crisis in consent." Analyzing three popular data sets, C4, RefinedWeb, and Dolma, the research shows that in just the past year, "5 percent of all data, and 25 percent of data from the highest-quality sources, has been restricted" and "as much as 45 percent of the data in one set, C4, had been restricted by websites' terms of service" (per Roose). And AI companies are concerned, Roose reports:

Some A.I. executives I've spoken to worry about hitting the "data wall"—their term for the point at which all of the training data on the public internet has been exhausted, and the rest has been hidden behind paywalls, blocked by robots.txt or locked up in exclusive deals.

As I wrote in "How to Fix 'AI's Original Sin'" (which we shared last week), it doesn't have to be this way, if AI companies would only put the lessons we learned from the web and YouTube into practice. Web-crawling search engines like Google made a bargain: we'll read all your content and use it to build a search index, but that will be good for you because we'll help people find your content, and we'll help you monetize it. YouTube gave music companies a better alternative to demanding takedowns of user videos containing copyrighted music: let us monetize those videos for you and share the revenue. At O'Reilly, we ground all our AI derivatives in content from our authors, subject matter experts, and partner publishers, and tie it directly into our payment system, which allocates a share of our subscription revenue to our content providers based on usage.

I've had conversations with OpenAI and other AI companies since 2022 about the urgent need for an economic model by which they reward creators for participating in the AI ecosystem. But they've chosen instead to take without figuring out how to give back. The fact that more and more content is being closed off from use in AI training is a direct result of the content land grab. As the Chinese philosopher Lao Tzu once wrote, “Fail to honor people, they fail to honor you.”

It’s not too late to build a creator- and copyright-aware AI ecosystem that allows training on copyrighted material because it provides fair recompense for its use—not with one-time licensing fees (“selling your house for firewood”) but as part of a sustainable business partnership that allocates value to those who help create it. I made a few suggestions in my article referenced above, but a world of possibility awaits once entrepreneurs start seeing the possibilities in a copyright-aware AI ecosystem.

+ From AI Snake Oil: "AI Scaling Myths"

+ From The New York Times: "How Tech Giants Cut Corners to Harvest Data for A.I."

+ From Proof News and WIRED: "Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI."

+ From The Verge: "Biden's Top Tech Adviser Says AI Is a 'Today Problem'."

+ ICYMI: Ilan Strauss and I are leading the SSRC's AI Disclosures Project. You can check it out here. And please follow our newsletter and social media accounts if you're interested.

Planning for our AI future(s)

As Ethan Mollick pointed out this week on One Useful Thing, "Nobody knows the future of AI." Will we achieve AGI in five years? In 50? In some ways, Mollick argues, it doesn't matter. Whatever the eventual outcome, "AGI serves as a motivating goal for an entire industry," and that fact continues to drive both advances and disruptions. But not knowing the future doesn't mean we shouldn't prepare for what may come. As Mollick notes, "You don't need to know what happens next to realize that you should be planning for multiple contingencies." (I wholeheartedly agree.) It's a good reminder that considering possible futures can help us "steer towards" the one we want.

"Humanity's Hegelian Golden Braid"

Reid Hoffman, venture capitalist and cofounder of LinkedIn and Inflection AI, was recently awarded an honorary doctorate from the University of Perugia, and his speech, "Humanity's Hegelian Golden Braid," which ruminates on philosophy, semiotics, AI, and more, is quite good. You should definitely read it in full, but I'll draw your attention to his insightful concluding remarks:

Is there a future where the massive proliferation of robots ushers in a new era of human flourishing, not human marginalization? Where AI-driven research helps us safely harness the power of nuclear fusion in time to avert the worst consequences of climate change? It's only natural to peer into the dark unknown and ask what could possibly go wrong. It's equally necessary—and more essentially human—to do so and envision what could possibly go right.
It's not human vs. AI. It's human with AI.
It's not thesis vs. antithesis. It's synthesis.
And so, with Cato in mind, I will end with what I hope becomes a refrain:
Figura debet AI. AI auxilium nobis figurat.
We must shape AI. And, in turn, let AI help shape us.

—Tim O’Reilly and Peyton Joyce
