LONDON:
A new set of laws governing the use of artificial intelligence (AI) in the European Union will force companies to be more transparent about the data used to train their systems, prying open one of the industry’s most closely guarded secrets.
In the 18 months since Microsoft-backed OpenAI unveiled ChatGPT to the public, there has been a surge of public engagement and investment in generative AI, a set of applications that can be used to rapidly produce text, images, and audio content.
But as the industry booms, questions have been raised over how AI companies obtain the data used to train their models, and whether feeding them bestselling books and Hollywood movies without their creators’ permission amounts to a breach of copyright.
The EU’s recently passed AI Act is being rolled out in phases over the next two years, giving regulators time to implement the new laws while businesses grapple with a new set of obligations. But how exactly some of these rules will work in practice is still unknown.
One of the more contentious sections of the Act states that organisations deploying general-purpose AI models, such as ChatGPT, will have to provide “detailed summaries” of the content used to train them. The newly established AI Office said it plans to release a template for organisations to follow in early 2025, following a consultation with stakeholders.
While the details have yet to be hammered out, AI companies are highly resistant to revealing what their models have been trained on, describing the information as a trade secret that would give competitors an unfair advantage were it made public.
“It would be a dream come true to see my competitors’ datasets, and likewise for them to see ours,” said Matthieu Riouf, CEO of AI-powered image-editing firm Photoroom. “It’s like cooking,” he added. “There’s a secret part of the recipe that the best chefs wouldn’t share, the ‘je ne sais quoi’ that makes it different.”
How granular these transparency reports end up being will have big implications for smaller AI startups and big tech companies like Google and Meta, which have put the technology at the centre of their future operations.