(NEWSnet/AP) — Hidden inside the foundation of artificial intelligence image-generators are thousands of images of child sexual abuse, according to a report that urges companies to take action to address a flaw in the technology.
Those images have made it easier for AI systems to produce realistic and explicit imagery of fake children and to transform social media photos of fully clothed real teens into nudes, to the alarm of schools and public safety officials.
Until recently, anti-abuse researchers believed the only way some unchecked AI tools produced abusive imagery of children was by combining what they’ve learned from two separate buckets of online images: adult pornography and benign photos of kids.
But the Stanford Internet Observatory found more than 3,200 images of suspected child sexual abuse in the AI database LAION, an index of online images and captions that has been used to train leading AI image-makers such as Stable Diffusion. The watchdog group worked with the Canadian Centre for Child Protection and other anti-abuse charities to identify the illegal material and report the original photo links to law enforcement.
The response was immediate. The day before the release of the Stanford Internet Observatory’s report, LAION told The Associated Press it was temporarily removing its datasets.
LAION, which stands for the nonprofit Large-scale Artificial Intelligence Open Network, said in a statement that it “has a zero tolerance policy for illegal content and in an abundance of caution, we have taken down the LAION datasets to ensure they are safe before republishing them.”
While the images account for only a fraction of LAION’s index of 5.8 billion images, the Stanford group says they are likely influencing the ability of AI tools to generate harmful output and reinforcing prior abuse of real victims.
The problem has no simple solution, and traces back to many generative AI projects being “effectively rushed to market” and made widely accessible because the field is so competitive, said Stanford Internet Observatory’s chief technologist David Thiel, who authored the report.
“Taking an entire internet-wide scrape and making that dataset to train models is something that should have been confined to a research operation, if anything, and is not something that should have been open-sourced without a lot more rigorous attention,” Thiel said.
A prominent LAION user that helped to shape the dataset’s development is Stability AI, maker of the Stable Diffusion text-to-image models. New versions of Stable Diffusion have made it much more difficult to create harmful content, but an older version introduced in 2022 is still intertwined with other applications and tools, according to the Stanford report.
“We can’t take that back. That model is in the hands of many people on their local machines,” said Lloyd Richardson, director of information technology at the Canadian Centre for Child Protection.
Stability AI said it hosts only filtered versions of Stable Diffusion and that, since taking over the exclusive development of Stable Diffusion, it has taken “proactive steps to mitigate the risk of misuse.”
LAION was the brainchild of German researcher Christoph Schuhmann, who told AP earlier this year that part of the reason to make such a major visual database publicly accessible is to ensure the future of AI development isn’t controlled by a few powerful companies.
“It will be much safer and much more fair if we can democratize it so that the whole research community and the whole general public can benefit from it,” he said.
Rebecca Portnoff, director of data science at the anti-child sexual abuse organization Thorn, said her organization has conducted research that shows the prevalence of AI-generated images among abusers is small, but growing consistently.
Developers can mitigate harm by making sure the datasets they use to develop AI models are free of abuse materials. Portnoff said there also are opportunities to mitigate harmful use after models are in circulation.
Copyright 2023 NEWSnet and The Associated Press. All rights reserved.