W3C

– DRAFT –
Web and Generative AI risks

25 September 2024

Attendees

Present
cwilso, Igarashi, ivan
Regrets
-
Chair
Qing An
Scribe
angel

Meeting minutes

QingAn: now we can start the meeting
… let's start with a short intro
… This session will introduce the new features and new risks brought by generative AI compared with traditional AI. The session will also have an open discussion about the risks related to the Web and what W3C can do on Web standards to address those risks.
… this session is under W3C Code of conduct
… and it will not be recorded
… now I will start the intro slides
… Generative Artificial Intelligence (AI) is happening everywhere
… people are using it in their daily lives
… many orgs are evaluating the risks that Generative Artificial Intelligence (AI) might bring to people's lives and other aspects
… the Web is an important venue to deliver the AI service to people
… first I will go with the features of Generative Artificial Intelligence then the risks it might bring, and discuss with you about possible solutions
… we have been using the Conventional AI for many years
… there are many mature solutions to handle the risks brought by Conventional AI
… list 5 risks brought by Conventional AI, such as Classification task: to predict discrete labels, Regression task: to predict continuous values, etc

Qing AN's slides

Qing AN: in traditional AI, one task with one risk to handle
… but Generative Artificial Intelligence has a different goal, which is to generate new data instances that are statistically consistent with the training data
… another difference is the output: the output of Generative Artificial Intelligence is new and complete data instances, such as an image or a piece of text
… New features of Generative AI include new contents generated
… 2nd feature is the long context windows and self-attention mechanism, which enables different attention
… 3rd feature is the low production barrier
… Contents are easily generated via natural language conversation.
… 4th feature is Highly convincing generated contents
… and the 5th feature is greater randomness of generated contents
… based on these new features, there are new kinds of risks
… risk #1: Eased access to knowledge
… Easy access to knowledge might make it easier for malicious users to cause harms to society without specialized training (e.g., CBRN knowledge, malware)
… Risk #2: Generalization of input understanding and output generation
… Generative AI’s strong generalization ability might result in hallucination, which will cause further harms if users believe the generated false content due to the highly convincing output
… Risk #3: Use of generated contents
… The generated contents, when they contain faults or are misaligned with regulations and ethics, might mislead downstream applications into making incorrect decisions or even taking harmful actions.
… the outcome can be serious if the application is about health care
… Risk #4: Influence of generated contents
… Generative AI might generate unethical contents that pose harms to individuals and society
… on one hand, Generative AI might generate disrespectful, dangerous contents that promote self-harm or illegal activities
… on the other hand, it might generate biased output, which can cause unfair distribution of benefits from using the generative AI (e.g., the quality of generated contents may be better for some groups but worse for others, due to different input format or language)
… some research shows that if the model is trained on English content only, it might have certain biases and create unfair results
… Risk #5: Human reliance on generative AI
… Over-reliance on generative AI might cause humans to be manipulated, especially when humans have no detailed knowledge of how generative AI works
… Risk #6: Privacy
… Due to training data memorization, the generated contents might cause sensitive information leakage
… at the same time, users can benefit from a customized personal assistant by feeding personal information to generative AI
… risk #7: Copyright and Intellectual Property
… Due to training data memorization, the generated contents might cause copyright infringement.
… risk #8: Continuous improvement based on reinforcement learning
… Continuous learning based on the user feedback can be leveraged to mislead generative AI behaviors
… also, it can enable better alignment with human preference
… risk #9: New security attacks, mainly prompt-based attacks that expand the attack surface
… about what Web can do to address risks
… my personal thinking is that the first is detecting the malware (hacking, malware, and phishing) generated by Generative AI
… so no harm to users
… one scenario is a user interacting with a Generative AI application (typically chat) in a Web Browser: a malicious Generative AI application might generate malware in the chat box and induce the user to open it
… How to have Web Browser support detecting the malware generated by Generative AI in chat box?
… I believe we can do something in the Web
… the 2nd thing we can do is to increase the accountability
… by labelling the output from Generative AI, including images, videos, audios, text and code
… which can be done by using explicit watermarking
… on the Generative AI output
… one thing the Web can do, is to make the Web Browser recognize the watermarking formats
… to make it easier for users to find out if the outcome is AI generated
… also, to address the Transparency issue
… need to expose the info of AI models
… there are some regulations about this to require the AI models to expose the model info
… to enhance the AI transparency
… and Web is a good platform to deliver this kind of info
… so that users and developers have easier access to info like this
… which helps to increase public trust in Generative AI
… about Privacy, need secure access to privacy info in local generative AI
… need a secure area to store the user privacy info
… would it be beneficial to develop an API to enable secure access to the user privacy info?
… above are my thoughts
… I hope we can have good discussions about whether these are reasonable directions we should be pursuing in W3C
… comments are welcome

Igarashi san: one question about the standardization of generative AI
… do you know any other SDO working on this kind of data model standards?
… another question is about what kind of AI you are thinking about to protect privacy?
… JS API or API between the client and service?

Qing AN: CSA is working on this topic
… they have published some reports to define some data suite, which might be called model suite
… they are working on what kind of @@1 to include what kind of data models
… but not about the Web
… that's why I think W3C can work on it
… for the 2nd question, I agree it is not quite related to the browser, more on the AI side
… I think there is another breakout session on this topic today

Kenji san: from Google; have you looked into C2PA?

<kenji_baheux> https://c2pa.org/specifications/specifications/2.0/specs/C2PA_Specification.html

QingAI: I think it is one commonly used solution to achieve the watermarking
… W3C does not need to reinvent the wheel
… if we are using the explicit watermarking, it is not very obvious for the users to see it
… if making the explicit watermarking obvious, it might also affect the UX
… I think what the web can do is to expose what kind of watermarking it is using
… make it easier for the user to get this kind of info

<Igarashi> both the C2PA approach (digital cert.) and watermarking could be used, but there is no standard watermarking, as far as I know.

QingAN: on the zoom, anyone on the queue?

Chris Pryor: @@2

QingAn: if we want to take the next step, where should we start the work in W3C?
… I think the first step is to call for participants for this topic
… see if we can have some experts with common interests
… and discuss if there is any topics worth further efforts from W3C
… if there are, maybe we can set up a CG to further incubate this idea
… next I will talk to W3C team and see if we can find any proper way to move forward this topic
… and maybe set up a CG to have further discussion
… suggested summary: propose to discuss with W3C team and prepare for setting up a CG for this topic

<QingAN> This is another related session I mentioned: w3c/tpac2024-breakouts#44

Minutes manually created (not a transcript), formatted by scribe.perl version 229 (Thu Jul 25 08:38:54 2024 UTC).

Diagnostics

Succeeded: s/servive /service

Failed: s/water marking/watermarking/g

Succeeded 2 times: s/wather marking/watermarking/g

Succeeded: s/tlak/talk

Succeeded: s/disusion/discussion

Maybe present: QingAI, QingAn

All speakers: QingAI, QingAn

Active on IRC: angel, cwilso, Igarashi, ivan, kenji_baheux, QingAN, tpac-breakout-bot