Shadow ‘Archive’ Says It Copied Virtually All of Spotify’s Music

“Anna’s Archive” claimed it scraped 86 million songs from Spotify—revealing some wild things about people’s favorite music.

By Jose Antonio Lanz


Anna's Archive, the shadow library best known for making pirated ebooks and academic papers searchable, announced this weekend what might be the largest music piracy operation in history: "We backed up Spotify."

The group claims it scraped 86 million audio files from Spotify, representing 99.6% of everything people actually listen to on the platform. Total size: just under 300 terabytes, distributed through bulk torrents.

Spotify isn't happy. A spokesperson told Billboard that "a third party scraped public metadata and used illicit tactics to circumvent DRM to access some of the platform's audio files." Note the careful wording there: "some" audio files. Anna's Archive says 86 million. Spotify isn't confirming the scale. The company also called the group "anti-copyright extremists" who had previously pirated content from YouTube.

So, aside from ripping off Spotify—and the recording artists, whose income is predominantly derived from royalty payments—what exactly did they get?

The numbers

Anna's Archive claims metadata for 99% of Spotify’s library of 256 million tracks, including audio files for the 86 million songs that actually matter—the ones people play. The metadata database alone contains 186 million unique ISRCs (International Standard Recording Codes). For comparison, MusicBrainz, the largest legal open music database, has about 5 million. Anna's Archive just built something 37 times bigger.

Popular tracks were preserved in their original OGG Vorbis format at 160 kilobits per second—no re-encoding, no quality loss. Less popular stuff got compressed to OGG Opus at 75 kbps to save space. The group used Spotify's own popularity metric to prioritize what to grab first, focusing on tracks with popularity scores above zero.
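For a rough sense of what those bitrates mean on disk, file size follows from bitrate multiplied by duration. The sketch below is a back-of-envelope estimate, not a figure published by Anna's Archive: the 3:30 track length and the helper name `track_size_mb` are assumptions for illustration, and container overhead is ignored.

```python
def track_size_mb(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate audio payload size in megabytes (1 MB = 10^6 bytes)."""
    bits = bitrate_kbps * 1000 * duration_s  # kilobits per second -> total bits
    return bits / 8 / 1_000_000              # bits -> bytes -> megabytes


# Assumption: a typical track runs 3 minutes 30 seconds.
DURATION_S = 210

vorbis_mb = track_size_mb(160, DURATION_S)  # popular tracks, OGG Vorbis
opus_mb = track_size_mb(75, DURATION_S)     # less popular tracks, OGG Opus

# Cross-check against the article's totals: just under 300 TB over 86 million files.
# Treat this as a ceiling, since the real total is "just under" 300 TB.
avg_mb = 300e12 / 86e6 / 1_000_000

print(f"160 kbps Vorbis: {vorbis_mb:.2f} MB per track")
print(f"75 kbps Opus:    {opus_mb:.2f} MB per track")
print(f"Average implied by 300 TB / 86M files: {avg_mb:.2f} MB")
```

Under these assumptions a 160 kbps track comes to about 4.2 MB and a 75 kbps track to about 2 MB, which brackets the roughly 3.5 MB average implied by the claimed totals.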

Over 70% of Spotify's 256 million tracks have a popularity score of exactly zero. Nobody listens to them. The top 10,000 songs span popularity scores of 70-100. Only about 210,000 songs—roughly 0.1% of the catalog—have a popularity score of 50 or higher. Those 0.1% account for the vast majority of all listening activity.
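The proportions in that paragraph can be checked with simple arithmetic. This is only a sanity check on the figures as reported; the helper name `share_percent` is made up for the example, and the 70% figure is treated as a floor because the article says "over 70%".

```python
def share_percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return part / whole * 100


CATALOG = 256_000_000        # tracks in Spotify's library, per the article
POPULAR = 210_000            # tracks with a popularity score of 50 or higher

# "Over 70%" have a popularity of zero, so this is a lower bound.
zero_popularity_min = CATALOG * 70 // 100

popular_share = share_percent(POPULAR, CATALOG)

print(f"At least {zero_popularity_min:,} tracks have zero popularity")
print(f"Tracks scoring 50+: {popular_share:.3f}% of the catalog")
```

The share works out to about 0.08%, which the article rounds to 0.1%, and the zero-popularity floor is roughly 179 million tracks.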

The top three songs on Spotify right now? Lady Gaga and Bruno Mars's "Die With A Smile" (3.07 billion streams), Billie Eilish's "BIRDS OF A FEATHER" (3.13 billion), and Bad Bunny's "DtMF" (1.12 billion). Those three tracks alone have more total plays than the bottom 20 million to 100 million songs combined.

In other words, Spotify is mostly a graveyard of songs nobody will ever hear. The group decided not to archive that graveyard (the rest of the full catalog): it would have required an additional 700 terabytes of storage for content representing roughly 0.4% of listening activity, the remainder once the 99.6% already captured is subtracted. Much of it is AI-generated slop anyway.

The weird stuff in the data

Anna's Archive published extensive analysis of what they found. Some of it is predictable. Some of it is strange.

Track durations cluster sharply at exactly 2:00, 3:00, and 4:00. The group says they don't know why. Album releases have grown exponentially since 2015, with over 10 million albums dated 2023 alone, likely driven by AI generation and automated uploads.

 

[Chart omitted. Source: Anna's Archive]

Electronic/Dance is the largest genre category by artist count (520,075), followed by Rock (370,179) and World/Traditional (202,529).

Also, believe it or not, opera, choral, and chamber music have the most artists per specific sub-genre.

[Chart omitted. Source: Anna's Archive]

The audio features data reveals that loudness correlates strongly with energy (no surprise), BPM clusters around 120 in a roughly normal distribution, and most tracks have low "speechiness" and "instrumentalness" scores, meaning most are sung music rather than spoken word or pure instrumentals. C major and G major are the most common keys. About 13.5% of all tracks on Spotify are tagged as explicit content.

Why do this?

Anna's Archive frames it as preservation, not piracy. "We saw a role for us here to build a music archive primarily aimed at preservation," the blog post reads. The group argues that existing music archiving efforts focus too heavily on popular artists and audiophile-quality formats (lossless FLAC), leaving obscure music vulnerable to vanishing if platforms change policies or shut down.

There's some truth to that. Spotify controls 256 million tracks and can remove content, change licensing terms, or disappear entirely. Decentralized torrent distribution creates redundancy that can't be shut down by any single entity. The data is already spread across thousands of torrent nodes worldwide.

But let's be real. This is also just piracy. Spotify pays artists somewhere between $0.003 and $0.005 per stream. According to Ditto Music's Spotify revenue calculator, 1 million streams would yield an artist $4,370 in royalties. Free distribution via torrents eliminates even that minimal compensation.
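The royalty figures above are easy to reproduce. The following is a minimal sketch of the arithmetic only, assuming a flat per-stream rate; real payouts depend on deals, territories, and rights splits, and the function name `payout_usd` is hypothetical.

```python
def payout_usd(streams: int, rate_per_stream: float) -> float:
    """Estimated royalty in US dollars at a flat per-stream rate."""
    return streams * rate_per_stream


STREAMS = 1_000_000

low = payout_usd(STREAMS, 0.003)   # bottom of the quoted per-stream range
high = payout_usd(STREAMS, 0.005)  # top of the quoted per-stream range

# Per-stream rate implied by the calculator's $4,370 figure for 1 million plays.
implied_rate = 4370 / STREAMS

print(f"1M streams pays between ${low:,.0f} and ${high:,.0f}")
print(f"Implied rate from the $4,370 estimate: ${implied_rate:.5f} per stream")
```

The calculator's estimate implies about $0.00437 per stream, which sits inside the quoted $0.003 to $0.005 range.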

Both things are true at once.

The legal meteor is coming

Anna's Archive already faces mounting legal pressure. Belgium issued blocking orders with fines up to €500,000 in July 2025. The UK secured High Court blocks in December 2024. Germany's major ISPs blocked the site's main domains in October 2025. According to its own transparency report, Google has removed 749 million Anna's Archive URLs from search results—that's 5% of all DMCA takedown requests the search engine has received since 2012.

The Internet Archive, a legitimate nonprofit, settled a lawsuit over its Great 78 Project, which digitizes obsolete 78rpm records, after record labels sought $621 million in damages. Anna's Archive just archived roughly 31,000 times more tracks, all current, all in-demand. The music industry's legal response will make the Internet Archive case look quaint.

On Hacker News, commenters debated whether the archive would actually be useful for consumers given Spotify's convenience. One pointed out that Anna's Archive already offers "enterprise-level" access to its book archives for tens of thousands of dollars—essentially selling bulk data access to AI companies for training.

For now, only metadata has been fully released. The audio files are rolling out gradually through bulk torrents, starting with the most popular tracks. Anna's Archive asked users to help seed the torrents and mentioned they might add individual file downloads if there's enough interest.

The lawsuits are probably coming. The only question is whether the archive survives them—and at this point, it probably doesn't matter. The data is already out there, distributed across thousands of nodes that can't be centrally shut down. That's the whole point of torrents.
