Generative AI is trained on the open web. That sounds abstract until you realise
"the open web" means your selfies, your mother's voice note, the sketch you
posted in eighth grade, the reel you filmed last Tuesday. Here are five ways that
plays out.
01 · Ordinary users
Your photos, their deepfakes.
Every public photo is a training sample. Face-swap apps, undressing tools, and
non-consensual deepfake porn all rely on models that learned from ordinary
selfies, most of which were never meant for that purpose. Once an image leaves
your phone, it can be reconstituted into something you never posed for.
e.g., a global surge in AI deepfakes driving new cases of child sexual violence.[1]
02 · Voters
Audio and video you can't trust.
Cloned political voices and AI-generated rally footage have already circulated
during election seasons. Detection tools lag behind generation tools by months,
sometimes years. The cost of a convincing fake has fallen from a full studio
to a laptop and ten minutes.
e.g., AI-generated deepfake videos and cloned voices of political leaders flagged as threats across Indian elections.[2]
03 · Artists
Style mimicry isn't flattery: it's replacement.
A diffusion model that has seen a few dozen of an illustrator's pieces can
generate new work in their style on demand, for free, in seconds. Commissions
dry up. Clients pick the $0 version. The artist's own search results get
buried under imitations. Class-action lawsuits are slow; by the time courts
rule, the career damage is done.
e.g., the Glaze user study: 1,156 artists surveyed, 88% of whom wanted a protection tool.[3]
04 · Creators
AI translation and the missing original.
Platforms now auto-dub creators into languages they don't speak, matching lip
movements and voice timbre. Nuance, slang, and regional identity get
flattened. More importantly: the AI-modified version often gets more reach
than the original, so the original effectively disappears.
e.g., YouTube's auto-dubbing feature translating creators' videos into other languages.[4][5]
05 · Everyone
You can't read what you can't find.
Most platforms bury AI-training clauses inside 40-page Terms of Service that
the average person will never read: just 9% of adults say they always read
privacy policies before agreeing.[7]
An opt-out from AI training is either missing, hidden three menus deep, or
available only to users in specific jurisdictions.[6]
Consent, in practice, is manufactured by exhaustion.
The pushback from publishers isn't aimed only at OpenAI or Anthropic. When a
newspaper updates its robots.txt to keep AI crawlers out, the same
rule frequently locks out the Internet Archive too: the public-memory service
that has preserved over a trillion web pages since 1996. By late 2024, a
large share of the world's top news sites were blocking at least
one major AI crawler;[8] many major
news sites have also moved to block the Wayback Machine.[9]
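The mechanism matters here. robots.txt has no concept of an "AI crawler"; it
only matches user-agent tokens, so a publisher who wants to block whatever AI
bot shows up next reaches for the wildcard rule, and the wildcard catches the
archivists too. A minimal sketch using only Python's standard library makes
the point; the rules and the URL below are illustrative, not taken from any
real site:

```python
# Sketch: a robots.txt written to block AI crawlers that also blocks
# the Internet Archive. Illustrative rules, not from any real site.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Named AI crawlers the publisher knows about.
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

# Keep ordinary search indexing working.
User-agent: Googlebot
Allow: /

# Catch-all meant for any AI crawler not named above, but it also
# matches ia_archiver, a crawler token long associated with the
# Internet Archive.
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://news.example.com/2024/some-story.html"
for bot in ("GPTBot", "CCBot", "Googlebot", "ia_archiver"):
    print(f"{bot:12} may fetch: {parser.can_fetch(bot, url)}")

# Prints True only for Googlebot: the wildcard aimed at unknown
# AI crawlers sweeps up the archive's crawler along with them.
```

Nothing in the protocol distinguishes a training crawler from a preservation
crawler; a publisher would have to know and allow-list every archival bot by
name to avoid the collateral damage.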
Journalism and the public record used to share one infrastructure. AI-training
anxiety is pulling them apart.
The bind is brutal: the same tools publishers reach for to keep AI models out
also shut out the librarians and researchers who preserve what was published,
and the ordinary readers who come back looking for it. The remedy we build for
one problem is quietly dismantling the
piece of the web most of us assumed would always be there.