Why Watermarking Won’t Work

VentureBeat/Ideogram

In case you hadn’t noticed, the rapid advancement of AI technologies has ushered in a new wave of AI-generated content ranging from hyper-realistic images to compelling videos and texts. However, this proliferation has opened Pandora’s box, unleashing a torrent of potential misinformation and deception, challenging our ability to discern truth from fabrication.

The fear that we are becoming submerged in the synthetic is, of course, not unfounded. Since 2022, AI users have collectively created more than 15 billion images. To put this gargantuan number in perspective, it previously took humans 150 years of photography to produce the same number of pictures.

The staggering amount of AI-generated content is having ramifications we are only beginning to discover. Because of the sheer volume of generative AI imagery and content, historians will have to treat the post-2023 internet as something completely different from what came before, much as atomic bomb testing complicated radiocarbon dating. Already, many Google Image searches return gen AI results, and increasingly we see genuine evidence of war crimes in the Israel/Gaza conflict dismissed as AI-generated when it is not.

Embedding ‘signatures’ in AI content

For the uninitiated, deepfakes are essentially counterfeit content generated with machine learning (ML) algorithms. These algorithms create realistic footage by mimicking human expressions and voices. Last month’s preview of Sora, OpenAI’s text-to-video model, showed just how quickly synthetic footage is becoming indistinguishable from reality.

Quite rightly, and amid growing concern, tech giants have stepped into the fray with preemptive proposals to label AI-generated content in the hope of stemming the tide.

In early February, Meta announced a new initiative to label images created using its AI tools on platforms like Facebook, Instagram and Threads, incorporating visible markers, invisible watermarks and detailed metadata to signal their artificial origins. Close on its heels, Google and OpenAI unveiled similar measures, aiming to embed ‘signatures’ within the content generated by their AI systems. 

These efforts are supported by the Coalition for Content Provenance and Authenticity (C2PA), a group formed in 2021 by Adobe, Arm, BBC, Intel, Microsoft and Truepic. The coalition maintains an open technical standard that aims to trace the origins of digital files and distinguish genuine content from manipulated content.

These endeavors are an attempt to foster transparency and accountability in content creation, which is of course a force for good. But while these efforts are well-intentioned, are we trying to run before we can walk? Are they enough to truly safeguard against misuse of this evolving technology? Or is this a solution arriving before its time?

Who gets to decide what’s real?

I ask because, as soon as such tools are created, a problem emerges: can detection be universal without empowering those who control it to exploit it? If not, how can we prevent misuse of the system by those who run it? Once again, we find ourselves back at square one, asking who gets to decide what is real. This is the elephant in the room, and until the question is answered, I suspect I will not be the only one to notice it.

This year’s Edelman Trust Barometer offered significant insights into public trust in technology and innovation. The report shows widespread skepticism towards how institutions manage innovation: people globally are nearly twice as likely to believe innovation is poorly managed (39%) than well managed (22%), and a significant share worry that the rapid pace of technological change is not benefiting society at large.

The report also points to public skepticism about how business, NGOs and governments introduce and regulate new technologies, as well as concerns about the independence of science from politics and financial interests.

Technology repeatedly shows that as countermeasures become more advanced, so too do the problems they are meant to counter (and vice versa, ad infinitum). Even setting that arms race aside, reversing the wider public’s lack of trust in innovation is where we must begin if watermarking is to stick.

As we have seen, this is easier said than done. Last month, Google Gemini was lambasted after its shadow prompting (in which the model quietly rewrites a user’s prompt to fit a particular bias) produced absurd images. One Google employee took to X to say it was the “most embarrassed” they had ever been at a company, and the model’s reluctance to generate images of white people put it at the center of the culture war. Apologies ensued, but the damage was done.

Shouldn’t CTOs know what data models are using?

More recently, a video of OpenAI’s CTO Mira Murati being interviewed by The Wall Street Journal went viral. In the clip, she is asked what data was used to train Sora, and Murati responds with “publicly available data and licensed data.” When pressed on exactly which data had been used, she admits she isn’t actually sure.

Given the massive importance of training data quality, one would presume this is a core question a CTO would need to answer before committing resources to a video transformer. Her subsequent shutting down of the line of questioning (in an otherwise very friendly interview, I might add) also rings alarm bells. The only two reasonable conclusions from the clip are that she is either a lackluster CTO or a lying one.

There will of course be many more episodes like this as the technology is rolled out en masse, but if we are to reverse the trust deficit, we need to make sure that some standards are in place. Public education on what these tools are and why they are needed would be a good start. Consistent labeling, with measures that hold individuals and entities accountable when things go wrong, would be another welcome addition. And when things inevitably do go wrong, there must be open communication about why. Throughout, transparency across all processes is essential.

Without such measures, I fear that watermarking will serve as little more than a plaster, failing to address the underlying issues of misinformation and the erosion of trust in synthetic content. Instead of acting as a robust tool for verifying authenticity, it could become merely a token gesture, most likely circumvented by those intent on deceiving or simply ignored by those who assume it has already been circumvented.

As we will see (and in some places already are seeing), deepfake election interference will likely be the defining gen AI story of the year. With more than half of the world’s population heading to the polls and public trust in institutions still at a nadir, this is the problem we must solve before we can expect anything like content watermarking to swim rather than sink.

Elliot Leavy is founder of ACQUAINTED, Europe’s first generative AI consultancy.

DataDecisionMakers

Welcome to the VentureBeat community!

DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation.

If you want to read about cutting-edge ideas and up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers.

You might even consider contributing an article of your own!

Read More From DataDecisionMakers