AI chatbots could devour all of the internet's written knowledge by 2026

An artist's illustration showing a robot and human hand touching a book emerging from an open laptop.

Artificial intelligence (AI) systems could devour all of the internet's free knowledge as soon as 2026, a new study has warned.

AI models such as GPT-4, which powers ChatGPT, or Claude 3 Opus rely on the many trillions of words shared online to get smarter, but new projections suggest they will exhaust the supply of publicly available data sometime between 2026 and 2032.

This means that, to build better models, tech companies will need to begin looking elsewhere for data. This could include producing synthetic data, turning to lower-quality sources or, more worryingly, tapping into private data on servers that store messages and emails. The researchers published their findings June 4 on the preprint server arXiv.

"If chatbots consume all of the available data, and there are no further advances in data efficiency, I would expect to see a relative stagnation in the field," study first author Pablo Villalobos, a researcher at the research institute Epoch AI, told Live Science. "Models [will] only improve slowly over time as new algorithmic insights are discovered and new data is naturally produced."

Training data fuels AI systems' growth, enabling them to pick out ever more complex patterns and embed them in their neural networks. For example, ChatGPT was trained on roughly 570 GB of text data, or about 300 billion words, taken from books, online articles, Wikipedia and other online sources.

Algorithms trained on insufficient or low-quality data produce sketchy outputs. Google's Gemini AI, which infamously recommended that people add glue to their pizzas or eat rocks, sourced some of its answers from Reddit posts and articles from the satirical website The Onion.

To estimate how much text is available online, the researchers used Google's web index, calculating that there were about 250 billion web pages, each containing an average of about 7,000 bytes of text. Then, they used follow-up analyses of internet protocol (IP) traffic (the flow of data across the web) and the activity of users online to project the growth of this available data stock.
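As a rough illustration of the arithmetic behind that estimate, the short Python sketch below multiplies the two figures quoted above. It is only a back-of-the-envelope check, not the researchers' actual method, and the function name `estimate_text_stock_bytes` is a hypothetical one chosen for this example.

```python
# Figures quoted in the article (estimates derived from Google's web index).
INDEXED_PAGES = 250_000_000_000   # about 250 billion web pages
BYTES_PER_PAGE = 7_000            # about 7,000 bytes of text per page


def estimate_text_stock_bytes(pages: int, bytes_per_page: int) -> int:
    """Hypothetical helper: total text volume in bytes as pages * bytes per page."""
    return pages * bytes_per_page


total_bytes = estimate_text_stock_bytes(INDEXED_PAGES, BYTES_PER_PAGE)
total_petabytes = total_bytes / 1e15  # decimal petabytes (10**15 bytes)

print(f"{total_bytes:,} bytes, or about {total_petabytes:.2f} PB of text")
```

Under these figures, the indexed web holds on the order of 1.75 petabytes of raw text. The product says nothing about how much of that text is high quality or duplicated.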

The results revealed that high-quality information, taken from reliable sources, would be exhausted before 2032 at the latest, and that low-quality language data would be used up between 2030 and 2050. Image data, meanwhile, would be completely consumed between 2030 and 2060.

Neural networks have been shown to predictably improve as their datasets increase, a phenomenon called the neural scaling law. It's therefore an open question whether companies can improve their models' efficiency to compensate for the lack of fresh data, or whether turning off the spigot will cause model improvements to plateau.

However, Villalobos said it seems unlikely that data scarcity will dramatically inhibit future AI model growth. That's because there are several approaches firms could use to work around the issue.

"Companies are increasingly trying to use private data to train models, for example Meta's upcoming policy change," he added, referring to Meta's announcement that it will use interactions with chatbots across its platforms to train its generative AI from June 26. "If they succeed in doing so, and if the usefulness of private data is comparable to that of public web data, then it's quite likely that leading AI companies will have more than enough data to last until the end of the decade. At that point, other bottlenecks such as power consumption, increasing training costs, and hardware availability might become more pressing than lack of data."

Another option is to use synthetic, artificially generated data to feed the hungry models — although this has only previously been used successfully in training systems in games, coding and math.

If companies instead attempt to harvest intellectual property or private information without permission, some experts foresee legal challenges ahead.

"Content creators have protested against the unauthorised use of their content to train AI models, with some suing companies such as Microsoft, OpenAI and Stability AI," Rita Matulionyte, an expert in technology and intellectual property law and associate professor at Macquarie University, Australia, wrote in The Conversation. "Being remunerated for their work may help restore some of the power imbalance that exists between creatives and AI companies."

The researchers note that data scarcity isn't the only challenge to the continued improvement of AI. A ChatGPT query consumes almost 10 times as much electricity as a traditional Google search, according to the International Energy Agency. This has pushed tech leaders to try to develop nuclear fusion startups to power their hungry data centers, although the nascent power generation method is still far from viable.
