Content Authenticity and the Web

Meeting minutes

DKA: I don't have slides prepared, beyond what was written in the proposal for the breakout
… the idea here is to think about this in terms of scoping a workshop that we might run next year on the topic of web authenticity
… as many may be aware, we have a problem on the web with misinformation, disinformation, intentional or unintentional
… increasingly sophisticated tools to create photorealistic imagery
… static images and video of people, doing or saying things they did not
… we also have a proliferation of tools that people are using on the devices they buy, that allow them to use sophisticated tools to edit and adapt their content
… it's difficult to know what is real, and what is not
… the high level topic, what role can web technologies, W3C, play in tackling this problem?
… at its heart, there is a social issue here
… like many things we focus on here, security, privacy, accessibility
… we can look at things from the platform level

<TallTed> I trust all have awareness of https://www.w3.org/community/credibility/, https://www.w3.org/groups/wg/vc/, and the general applicability of cryptographically signing documents

DKA: some of the remedy is at the social, legislative, regulatory level
… out of scope for us
… we need to be aware of the regulatory landscape as well
… that's my opening
… just for disclosure, my day job is with Samsung as a contractor, part of that work is in C2PA, content authenticity
… a particular kind of digital watermarking technology, I want to disclose my biases
… I don't want to bring my biases into this session, I want to focus more broadly on what we can do here
… makes sense?

<Zakim> gendler, you wanted to present publisher perspective

gendler: Speaking in my capacity as AC rep for NewsCorp, thanks Dan for putting this together
… I'd like to present one publisher's perspective on this problem, a general perspective, this doesn't represent all publishers
… Dan's comment on societal issues on trust and misinformation, incredibly important to think about
… there's any number of groups that would love to take on solving trust on the internet at a human level
… bigger than we can solve
… that also applies to solutions that may try to cover all types of content, all domains
… in my experience and conversations with others, tailored solutions to individual formats tend to be much more solvable and useful to publishers
… Dan mentioned C2PA, it focuses on questions of image provenance
… useful to publishers in many contexts
… it doesn't solve all issues with images
… similarly there are domain-level solutions that solve some problems for us, but not all
… I recommend we avoid trying to solve for one solution, there is not a single solution to this, we need to give space to multiple things
… maybe multiple workshops, more focused on specific areas

<Zakim> dsinger, you wanted to comment on a long-standing tension

dsinger: Personal opinion, one of the low-hanging fruits in authenticity is citing sources
… we can do this with URLs, linking to places
… bumps into modern problems
… deep-linking, it's complex
… and many sites aren't statically constructed, a stable URL may be hard to find
… or not exist
… just being able to cite something, before anything else, is hard

<gendler> +1 to awareness of past work

TallTed: It's important if this is going to be a workshop on this, people be familiar with Cred Web CG and ???
… this has been going on for some years now
… being able to say definitively "entity x emitted this thing"
… two perspectives on the importance of this

<shigeya> +1, and I forgot to mention OP is using VC standard in our previous session.

TallTed: the publishers' perspective
… and the consumers', more valid and more important

<tzviya8> W3C Credible Web Report https://www.w3.org/2018/10/credibility-tech/

TallTed: I need to be able to know who said this thing, so I can trust them in future
… when consuming things, there are varying levels of trustworthiness, being able to publish and affirm trustworthiness

<kenneth> I think the "???" above was Verifiable Credentials WG

TallTed: not trivial, it's a decade-long conversation

<gendler> +1 to tangle!

mt: This is a challenge in that the web as a whole has been in an epistemological crisis for a long time
… society has grappled with these questions forever
… people point to tools, social media, AI, each creates the same crisis
… this is not something that the w3c can do easily
… C2PA is maybe not suited for this purpose, but I'll get to that later
… we need to determine the problem to be tackled
… a single workshop will not solve authenticity
… a single workshop would need to look at a specific problem

<gendler> +1 to scope of workshops, thus my point about multiple workshops likely being the right approach

mt: scope seems not determined yet
… each person who has spoken has cited a slightly different problem

tzviya: I think a lot of excellent points have been made

<Zakim> tzviya, you wanted to respond to martin

tzviya: I agree with Martin, any change that comes will need to be incremental
… one thing Ted mentioned is looking at this from the eyes of the user, and that is important
… we have looked at things from the development perspective for a long time
… we can look at it like "content comes from x", some publications have resources like the Pinocchio Radar
… fact-checking, we assume publications do it
… we assume, doesn't mean it works!
… I provided a link to the report from the Credible Web CG, it is useful
… we won't eliminate bad actors, but we can look at ways to inform
… "this was produced by AI", "this article was peer-reviewed"
… the idea of having a workshop to get into a room to discuss the problems

DKA: One of the discussions on scope is the user needs
… in the TAG, we get a lot of submissions or requests to look at different specifications

DKA: we often push back with "what is the user need?"
… we need to really focus on the user need

DKA: That would be a starting point for me

bigbluehat: One of the key words is "authenticity", but half of what we have said is about "trust"
… authenticity we can provide, trust is subjective
… the web as constructed now is not authentic

<gendler> +1 bigbluehat

<cpn> +1

bigbluehat: we publish on other entities' platforms, it's a mess, things can disappear or move
… trying to quantify trust as a deliverable is hard, but we can't create trust without authenticity
… the internet is a conman's game

DKA: I'd like to put cpn on the spot, I live in the UK, one of the bright spots on the verification side is BBC Verify
… is there a BBC perspective to this?

cpn: About to repeat what I said in an earlier session
… there's a few aspects to this
… trust is a social problem

<TallTed> this also calls out for general use of DIDs, across publisher sites. reddit-style up/down-votes are also strongly relevant. crowd-sourced judgement of authenticity/credibility is imperfect, but consumers can at least choose which crowd they look to for such judgement.

cpn: one of the specific problems we have is impersonation
… if you go to our website, there's verification of who we are
… when we publish to social channels, other platforms
… in that context, there are creators that use our branding and purport to be us
… it looks like the BBC, but is it them?
… I can't speak to the end user experience, but as an org it's problematic
… to have some kind of affordance that allows users to be able to see it comes from where it claims to
… one of the key things where we are looking at a solution
… technology can help with this, traceability to original sources
… can be applied to all kinds of creators, orgs

<TallTed> re: "traceability to original sources" == PROVenance

mt: A little while ago there was a news story about an org encouraging people to put glue on their pizzas
… it was traced back to a reddit user
… when people ask for authenticity in these settings, it's a judgement call
… that person was probably being their authentic self, their alias let them do that
… and they made the world a beautiful place as a result
… when we talk about building systems in this, when we allow someone to make a decision about what is or is not authentic, it has consequences

DKA: Are you arguing in favour of anonymity?

mt: This is a problem that I see these systems creating
… in medicine, there are no side effects, only effects

hober: Most humans don't know the difference between the web and the internet
… my question, what is specifically about the web here?
… there are methods on the internet, but what is it about the web architecture that might address this?
… what comes to mind is the user agent, it's unclear to me what the appetite would be for the user agent?

<shigeya> +1 uniqueness of existence of user-agent

DKA: It's a question of scope
… I agree with that
… the origin of content, the web site, is something we've talked about in the TAG
… a whole finding about this

<gendler> +1 hober, but I will note workshops are happening in the IETF space as well on similar topics

DKA: the importance of the user agent to be able to surface the origin
… you know the article is coming from BBC because you can see the chain of trust
… the user agent surfaces that to the user

<TallTed> see PROV-O: The PROV Ontology — https://www.w3.org/TR/prov-o/

<hober> ack

DKA: I see it in that vein
… what else could we do to support authenticity with those kind of tools?
… what can the user agent do to provide the user help navigating the web?

<cpn> +1 dka

DKA: more importantly, the browser can signal to you when something is suspect
… are there similar mechanisms

Hadley: The only thing I was going to add is that there are already UAs that discriminate on domain
… whitelist or blacklist domains, surface with a warning or block
… there are other things along those lines to explore
… as the users' representative on the web

igarashi: Personal view, relating to the discussion of user agent, we think of more than just the UA consumption use case
… the content is not just used by the browser, there is scraping, web content authenticity in the broad scope, how to prove provenance
… is the content retrieved from the right place, this should be considered

<gendler> +1 to the issues that Igarashi is raising, but to hober's point, I think crawler issues are best handled from an Internet infrastructure level, not a web level, happy to hear arguments though

chaals: The UA has this central role for the user, the web is a web of content, I'm concerned about talking about what the UA will do
… most of us use one of a few UAs, and trust it with our lives, and I'm not sure if that is a pattern we should be reinforcing exclusively
… looking at how content itself can contain its own provenance
… came from thinking about what was said earlier, it's important to trace back the content to its source
… I want to be certain that the scope of what we do enables individuals and large entities the same abilities
… the chain of authenticity needs to be feasible for all

<cpn> +1 chaals

chaals: think of those two things, making sure we're not relying on the UA to mediate this, and not enabling only large entities to use this system

<Zakim> dsinger, you wanted to mention the Berlin AC talk, and what we believe to be true

DKA: +1

dsinger: We had a panel session at the AC meeting a long time ago, with publishers
… we mentally tried to distinguish between authenticity and truth
… but they are entwined
… much of what we think is true is based on what is told to us from people we trust
… built on trust and authenticity

TallTed: I think what I am hearing is a quest towards best practice

<igarashi> truth and trust of content should be handled differently from authenticity.

TallTed: applying things we have already built
… provenance ontology
… I wrote this, it was edited by this person, then broadcast here
… no one uses it, it's not easy to use
… it's not built into tools, but it ought to be
… it's a user focused question, and there are two users, readers and publishers
… most of us consume most of our information through the UA, and the UA can tell us if the TLS connection is solid
… the chain that matters is who originated what is presented to me
… and what happened along the way to get it from originator to me
… reading CNN vs the Onion, when the Onion tells me what is going on in the world it is a bad day
… but when I see it on CNN, I need to trust it, it needs to be visible to the consumer
… the ease of this is vital
… reddit lets you up and down vote, I build credibility, and people see that
… it's not easy to do
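
[Illustrative sketch, not presented in the session: one way to picture the "I wrote this, it was edited by this person, then broadcast here" chain is a list of records in which each record commits to the content and to the record before it. The record fields, function names, and actors below are hypothetical. This is not the PROV-O vocabulary or the C2PA manifest format, and plain hashes only detect tampering; a real system would also sign each record so that the actor can be verified.]

```python
import hashlib
import json


def _digest(record):
    """Hash every field of a record except its own record_hash."""
    body = {k: v for k, v in record.items() if k != "record_hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def add_step(chain, actor, action, content):
    """Return a new chain with one more provenance record appended."""
    record = {
        "actor": actor,      # hypothetical identifier, unauthenticated in this sketch
        "action": action,    # e.g. "wrote", "edited", "broadcast"
        "content_hash": hashlib.sha256(content.encode("utf-8")).hexdigest(),
        "prev_hash": chain[-1]["record_hash"] if chain else None,
    }
    record["record_hash"] = _digest(record)
    return chain + [record]


def verify(chain):
    """Check that every record is intact and links to the one before it."""
    prev = None
    for record in chain:
        if record["prev_hash"] != prev or record["record_hash"] != _digest(record):
            return False
        prev = record["record_hash"]
    return True


chain = []
chain = add_step(chain, "alice", "wrote", "Original text")
chain = add_step(chain, "bob", "edited", "Edited text")
chain = add_step(chain, "example.org", "broadcast", "Edited text")
```

[Calling verify(chain) on the untouched chain returns True; changing any field of an earlier record makes it return False, which is the "what happened along the way" property described above.]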

[room bursts with love for cat]

<Zakim> gendler, you wanted to comment on traceability and publisher norms

gendler: There's a couple of things here we touched on in terms of publisher norms
… things publishers have learned to do from years of publishing

<TallTed> "A Modest Proposal"....

gendler: it should be as easy for individuals and publishers to follow the chain, access the chain
… there's a perception publishers have special tools, we don't, they're people
… anything we build should be in service of people

<DKA> +1 to people first

gendler: tracing back to where things came from, publishers have an obligation to each other
… if you publish something on CNN, and we report, you say something like "[fact], as reported first by CNN"
… it applies not just to publishers, it applies to sources
… when we can cite them, we do, and it builds on the chain
… we can learn from these practices
… publishers aren't perfect, but these practices are informative to the problem

mnot: I missed the first bit so apologies if I repeat anyone
… there are a lot of policymakers making assertions like "if we had this we could solve problem"
… and the problem is misinformation
… we need to make it clear that technology won't solve all the problems
… temper expectations

<Zakim> cpn, you wanted to respond to ted

cpn: I wanted to speak to what Max and Ted mentioned, the chain of information
… BBC Verify was mentioned, what we've been doing is trialling C2PA
… when we publish a video clip in an article, we have a dropdown that appears that outlines the checks we've done
… showing user-generated content
… showing this information on the origin of the content, or changes we've made before publication
… the user research we've done shows it does increase user trust

cpn: What we've done increases the perception of trust
… we've applied C2PA to properly bind the checks to the asset

<TallTed> Breaking the reputation silos is also key -- my (in)credible/(un)trusted/(de)valued reddit votes might/should mean that my YouTube votes are granted greater (or lesser) weight, and thence on RandomNewsSite, etc. Back to the need for pseudonyms and/or DIDs and the like.

cpn: we haven't yet done a study of this outside of C2PA, seeing similar content elsewhere or outside the context of our site

annette_g: I wanted to highlight we're getting different solutions from different levels of the stack

<gendler> +100 annette

annette_g: it's a sign in my experience that this is something that needs to be tackled in depth
… I would urge us to avoid dismissing any part of the stack
… concern about user agents and reliance on them, I think it's important to be mindful, but UAs are fundamental to interaction with the web
… on a different level, there's a proposal for trust.txt, similar to robots.txt
… trusted domain
… media provenance, media file itself may have identifying information
… being aware that the solution is complex and no single solution
… like climate change

<Zakim> gendler, you wanted to talk about trust.txt

gendler: Want to start by +100 to annette_g
… a little bit of context on trust.txt
… it was presented to publishers in the US
… it does present information in a user legible way

<TallTed> *Really* addressing this will be a multi-year process, involving the browser vendors, and the OS vendors, and the large publishers, and the large hosts of small publishers, and the individual vanity website publisher, etc. This is not a problem space that's suited to (even partial) solution by a 2 year WG nor a 10 year CG.

gendler: publishers were skeptical because it was only a solution relating to reputational verification
… others like you agree with you that you're a good publisher
… there are useful things to learn
… but we need to be mindful of not just first factor effects
… there are knock-ons for business models all over

<Jake> +1

chaals: I said I don't want UAs to be the centre of this, but they are important
… to return to the framing of the assertion
… there's a couple of different things to look at
… establishing a positive chain of provenance
… in academic publishing, presenting your work to the teacher, there's a whole thing of "where did this come from really?"
… proving what you wrote is yours. There's a positive chain, and there's a thing of did it actually come from somewhere else
… a source not listed in the provenance chain
… should we try to address that?
… it's important to note where information really comes from

<igarashi> +1 to chaals

wendyreid: I'd like to point out that there are attempts on the user side to solve this problem...
… it would be wrong of us to not look at how places have tried to solve this in limited contexts like community notes/blue check verification...
… we can learn from those attempts, even when there are issues with those solutions, especially as a signal of what users will want and expect...
… how many people are actually checking a TLS certificate? My friends wouldn't understand what I was talking about if I asked them about that...
… so how is this solved in the real world with average users?

dsinger: Two comments, I hear some nihilism on this problem
… because this is a complex mix of social, education, technology
… let's not get stuck in that
… we could become a resource pointing to the good work
… one way to solve a problem is not trying to

<tzviya8> +1 dsinger

<igarashi> +1 to dsinger

dsinger: we can say it's an education problem
… we can point to where it's a social or education problem

<cpn> +1 dsinger A workshop report could do a good job of that

dsinger: there's tools trying to actively confuse people, we need to counter that

<kazho> +1 dsinger

dsinger: let's work out what we can do
… don't let the perfect be the enemy of good

<TallTed> "a wretched hive of scum and villainy..."

annette_g: Wanted to add, when verifying the trustworthiness of something
… what are we verifying the trustworthiness of
… e.g. "annette_g works at a science organization, she must be an expert on science"
… any time we try to verify the trustworthiness, we need to determine what for

DKA: To wrap up, we have a range of opinions
… I would like to think we have had a lot of input here that could turn into workshop scope

<bigbluehat> tech topic lane: Mastodon signs both transactions (HTTP Signatures) and content (Linked Data Signatures) https://docs.joinmastodon.org/spec/security/

DKA: I hope we can do that
… there are major challenges
… everyone has been really helpful in this
… TAG Ethical Web Principles has one principle, it should be possible for web users to verify the information they see
… I think there is something we can do, so I'm hopeful

Mirja: if there is a workshop proposal where will it be worked on?

DKA: Somewhere in a Slack channel in the W3C community Slack
… I need to defer to W3C colleagues on that

tzviya: then the workshop GH space

DKA: yes, then a call to action and a committee
… everyone here, I urge you to think if you'd help participate

mt: Any idea of scope?

DKA: We don't have it yet, we need to continue this asynchronously
… we'll get a pointer to where to have this conversation
… and iterate

– DRAFT –

25 September 2024
