Blocking bots – Manu
Blocking the bots is step one.
See, this is exactly why we need to poison these bots.
In their rush to cram in “AI” “features”, it seems to me that many companies don’t actually understand why people use their products.
Google is acting as though its greatest asset is its search engine. Same with Bing.
Mozilla Developer Network is acting as though its greatest asset is its documentation. Same with Stack Overflow.
But their greatest asset is actually trust.
If I use a search engine I need to be able to trust that the filtering is good. If I look up documentation I need to trust that the information is good. I don’t expect perfection, but I also don’t expect to have to constantly be thinking “was this generated by a large language model, and if so, how can I know it’s not hallucinating?”
“But”, the apologists will respond, “the results are mostly correct! The documentation is mostly true!”
Sure, but as Terence puts it:
The intern who files most things perfectly but has, more than once, tipped an entire cup of coffee into the filing cabinet is going to be remembered as “that klutzy intern we had to fire.”
Trust is a precious commodity. It takes a long time to build trust. It takes a short time to destroy it.
I am honestly astonished that so many companies don’t seem to realise what they’re destroying.
Good advice for documentation—always document steps in the order that they’ll be taken. Seems obvious, but it really matters at the sentence level.
I was chatting to Andy last week and he started ranting about the future of online documentation for web developers. “Write a blog post!” I said. So he did.
I think he’s right. We need a Wikimedia model for web docs. I’m not sure if MDN fits the bill anymore now that they’re deliberately spewing hallucinations back at web developers.
Harry popped ’round to the Clearleft studio yesterday. It’s always nice when a Clearleft alum comes to visit.
It wasn’t just a social call though. Harry wanted to run through the ideas he’s got for his UX London talk.
Wait. I buried the lede. Let me start again.
Harry Brignull is speaking at this year’s UX London!
Yes, the person who literally wrote the book on deceptive design patterns will be on the line-up. And judging from what I heard yesterday, it’s going to be a brilliant talk.
It was fascinating listening to Harry talk about the times he’s been brought in to investigate companies accused of deliberately employing deceptive design tactics. It involves a lot of research and detective work, trawling through internal communications hoping to find a smoking gun like a memo from the boss or an objection from a beleaguered designer.
I thought about this again today reading Nic Chan’s post, Have we forgotten how to build ethical things for the web?. It resonates with what Harry will be talking about at UX London. What can an individual ethical designer do when they’re embedded in a company that doesn’t prioritise user safety?
It’s like walking into a jet spray of bullshit, so much so that even those with good intentions get easily overwhelmed.
Though I try, my efforts rarely bear fruit, even with the most well-meaning of clients. And look, I get it, no one wants to be the tall poppy. It’s hard enough to squeeze money from the internet-stone these days. Why take a stance on a tiny issue when your users don’t even care? Your competitors certainly don’t. I usually end up quietly acquiescing to whatever bad decisions are made, praying no future discerning user will notice and think badly of me.
It’s pretty clear to me that we can’t rely on individual people to make a difference here.
Still, I take some encouragement from Harry’s detective work. If the very least that an ethical designer (or developer) does is to speak up, on the record, then that can end up counting for a lot when the enshittification hits the fan.
If you see something, say something. Actually, don’t just say it. Write it down. In official communication channels, like email.
I remember when Clearleft crossed an ethical line (for me) by working on a cryptobollocks project, I didn’t just voice my objections, I wrote them down in a memo. It wasn’t fun being the tall poppy, the squeaky wheel, the wet blanket. But I think it would’ve been worse (for me) if I did nothing.
Here’s the transcript of Paul’s excellent talk at this year’s UX London:
How designers can record decisions and cultivate a fun and inclusive culture within their team.
A few months back, I wrote about how Google is breaking its social contract with the web, harvesting our content not in order to send search traffic to relevant results, but to feed a large language model that will spew auto-completed sentences instead.
I still think Chris put it best:
I just think it’s fuckin’ rude.
When it comes to the crawlers that are ingesting our words to feed large language models, Neil Clarke describes the situation:
It should be strictly opt-in. No one should be required to provide their work for free to any person or organization. The online community is under no responsibility to help them create their products. Some will declare that I am “Anti-AI” for saying such things, but that would be a misrepresentation. I am not declaring that these systems should be torn down, simply that their developers aren’t entitled to our work. They can still build those systems with purchased or donated data.
Alas, the current situation is opt-out. The onus is on us to update our robots.txt file.
Neil handily provides the current list to add to your file. Pass it on:
User-agent: CCBot
Disallow: /
User-agent: ChatGPT-User
Disallow: /
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: Omgilibot
Disallow: /
User-agent: FacebookBot
Disallow: /
In theory you should be able to group those user agents together, but citation needed on whether that’s honoured everywhere:
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: GPTBot
User-agent: Google-Extended
User-agent: Omgilibot
User-agent: FacebookBot
Disallow: /
There’s a bigger issue with robots.txt though. It too is a social contract. And as we’ve seen, when it comes to large language models, social contracts are being ripped up by the companies looking to feed their beasts.
As Jim says:
I realized why I hadn’t yet added any rules to my robots.txt: I have zero faith in it.
That realisation was prompted in part by Manuel Moreale’s experiment with blocking crawlers:
So, what’s the takeaway here? I guess that the vast majority of crawlers don’t give a shit about your robots.txt.
Time to up the ante. Neil’s post offers an option if you’re running Apache. Either in .htaccess or in a .conf file, you can block user agents using mod_rewrite:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (CCBot|ChatGPT|GPTBot|Omgilibot|FacebookBot) [NC]
RewriteRule ^ - [F]
You’ll see that Google-Extended isn’t in that list. It isn’t a crawler. Rather it’s the permissions model that Google have implemented for using your site’s content to train large language models: unless you opt out via robots.txt, it’s assumed that you’re totally fine with your content being used to feed their stochastic parrots.
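If all you want to do is opt out of that training (and nothing else), the relevant pair of lines from Neil’s list above is:
User-agent: Google-Extended
Disallow: /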
Now that the horse has bolted—and ransacked the web—you can shut the barn door:
To disallow GPTBot to access your site you can add the GPTBot to your site’s robots.txt:
User-agent: GPTBot
Disallow: /
As part of this pointless push to cram “AI” into everything, an “AI explain” button appeared on MDN articles. This terrible idea actually got pushed to production (bypassing the usual deploy steps) where it lasted less than a day.
You can read the havoc it wreaked in the short term. We’ll find out how much long-term damage it has done to trust in Mozilla and MDN.
This may be the worst use of a large language model I’ve seen since synthetic users (if you click that link, no it’s not a joke: “user research without the users” is what they’re actually proposing).
If someone’s been driven to Google something you’ve written, they’re stuck. Being stuck is, to one degree or another, upsetting and annoying. So try not to make them feel worse by telling them how straightforward they should be finding it. It gets in the way of them learning what you want them to learn.
This is a great step-by-step guide to HTML by Estelle.
I remember Lara telling me a great quote from the Clarity conference a few years back: “A design system needs to be correct before it’s complete.” In other words, it’s better to have one realistic component that’s actually in production than to have a pattern library full of beautiful but unimplemented components. I feel like Robin is getting at much the same point here, but he frames it in terms of correctness and usefulness:
If we want to be correct, okay, let’s have components of everything and an enormous Figma library of stuff we need to maintain. But if we want to be useful to designers who want to get an understanding of the system, let’s be brief.
This old article from Chris is evergreen. There’s been some recent discussion of calling these words “downplayers”, which I kind of like. Whatever they are, try not to use them in documentation.
Always refreshing to see some long-term thinking applied to the web.
Trys ponders home repair projects and Postel’s Law.
As we build our pages, components, and business logic, establish where tolerance should be granted. Consider how flexible each entity should be, and on what axis. Determine which items need to be fixed and less tolerant. There will be areas where the data or presentation being accurate is more important than being flexible - document these decisions.
An interesting project that will research and document the language used across different design systems to name similar components.
What a lovely way to walk through the design system underpinning the Guardian website.
Bonus points for using the term “tweak points”!
Brian found this scanned copy of a NeXT manual on the Internet Archive. I feel a great fondness for this machine after our CERN project.
This is a great how-to from Darius Kazemi!
The main reason to run a small social network site is that you can create an online environment tailored to the needs of your community in a way that a big corporation like Facebook or Twitter never could. Yes, you can always start a Facebook Group for your community and moderate that how you like, but only within certain bounds set by Facebook. If you (or your community) run the whole site, then you are ultimately the boss of what goes on. It is harder work than letting Facebook or Twitter or Slack or Basecamp or whoever else take care of everything, but I believe it’s worth it.
There’s a lot of good advice for community management and the whole thing is a lesson in writing excellent documentation.