Cycle News

The Fight Against AI Comes to a Foundational Data Set

By cycle | June 13, 2024 | 3 Mins Read


Danish media outlets have demanded that the nonprofit web archive Common Crawl remove copies of their articles from past datasets and stop crawling their websites immediately. This request was issued amid growing outrage over how artificial intelligence companies like OpenAI are using copyrighted materials.

Common Crawl plans to comply with the request, first issued on Monday. Executive director Rich Skrenta says the organization is “not equipped” to fight media companies and publishers in court.

The Danish Rights Alliance (DRA), an association representing copyright holders in Denmark, spearheaded the campaign. It made the request on behalf of four media outlets, including Berlingske Media and the daily newspaper Jyllands-Posten. The New York Times made a similar request of Common Crawl last year, prior to filing a lawsuit against OpenAI for using its work without permission. In its complaint, the New York Times highlighted how Common Crawl’s data was the most “highly weighted dataset” in GPT-3.

Thomas Heldrup, the DRA’s head of content protection and enforcement, says that this new effort was inspired by the Times. “Common Crawl is unique in the sense that we’re seeing so many big AI companies using their data,” Heldrup says. He sees its corpus as a threat to media companies attempting to negotiate with AI titans.

Although Common Crawl has been essential to the development of many text-based generative AI tools, it was not designed with AI in mind. Founded in 2007, the San Francisco-based organization was best known prior to the AI boom for its value as a research tool. “Common Crawl is caught up in this conflict about copyright and generative AI,” says Stefan Baack, a data analyst at the Mozilla Foundation who recently published a report on Common Crawl’s role in AI training. “For many years it was a small niche project that almost nobody knew about.”

Prior to 2023, Common Crawl did not receive a single request to redact data. Now, in addition to the requests from the New York Times and this group of Danish publishers, it is also fielding a growing number of requests that have not been made public.

In addition to this sharp rise in demands to redact data, Common Crawl’s web crawler, CCBot, is increasingly being blocked from gathering new data from publishers. According to the AI detection startup Originality AI, which often tracks the use of web crawlers, over 44 percent of the top global news and media sites block CCBot. Apart from BuzzFeed, which began blocking it in 2018, most of the prominent outlets it analyzed (including Reuters, The Washington Post, and the CBC) only spurned the crawler in the last year. “They’re being blocked more and more,” Baack says.

Common Crawl’s quick compliance with this kind of request is driven by the realities of keeping a small nonprofit afloat. Compliance does not equate to ideological agreement, though. Skrenta sees this push to remove archival materials from data repositories like Common Crawl as nothing short of an affront to the internet as we know it. “It’s an existential threat,” he says. “They’ll kill the open web.”



© 2025 cyclenews.blog