14 Apr 2024

planning for SCALE 2025

I missed Southern California Linux Expo this year. Normally I can think of a talk to do, but between work and [virus redacted] I didn’t have a lot of conference abstract writing time last fall. I need some new material anyway. The talks that tend to do well for me there are kind of a mix of tips for doing weird stuff.

I didn’t really have anything good to submit last fall, but this year I am building up a bunch of miscellaneous Linux stuff similar to what has worked for me at SCALE before. Because of the big Fediverse trend, the search quality crisis, the ends of third-party cookies and Twitter, and enshittification in general, it seems like there’s a lot more interest in redoing your blog—I know I have been doing it, so that’s what I’m going to see if I can come up with something on for next SCALE. But I’m not going to use a blog software package. I’m more comfortable with a mix of different stuff. This blog is now mainly done in Pandoc, auto-rebuilt by Make, and has a bunch of scripts in various languages, including shell, Perl, Python, and even a little bit of Lua now.

protip: use cowsay(1) to alert the user to errors in Makefile before restarting

I don’t really expect anybody to copy this blog, more outdo it by getting the audience to realize how much you can now do with the available tools. I’m not going to win any design prizes but with modern CSS I can make a reasonable responsive layout and dark/light modes. And yes you can make a valid RSS feed in GNU Make.

The feature I just did today is the similar posts in the left column. Remember that paper about how you can measure the similarity between two pieces of text by seeing how well they compress together? “Low-Resource” Text Classification: A Parameter-Free Classification Method with Compressors - ACL Anthology This is Python code for rating similarity of chunks of text. Check it out in the left column, you can now follow the links to similar blog posts.

import gzip

def z(s):
    return len(gzip.compress(bytes(s, 'utf-8')))

def simscore(t1, t2):
    "lower is better"
    if len(t1) == 0 or len(t2) == 0:
        return 1
    base = z(t1) + z(t2)
    minsize = min(z(' '.join([t1, t2])),
                  z(' '.join([t2, t1])),
                  base)
    return int(10000 * minsize/base)

Next I will probably try stuff like Fediverse-powered comments, some kind of search feature, LLM training set poisoning, some privacy and p2p features, and maybe something else. A lot of what I’m doing here will be possible to translate into other environments, and should be portable to people’s favorite blog software.