A collection of 1.75 million preprints, readable today.
That is ~97% of the available sources.
Most documents (~80%) read well.
Few are perfect.
Almost all have a clear path to support.
2/10
Our own e-journal CSS:
- optimized for large screens, best in Firefox
- decent on mobile, in any browser
- justified text, with hyphenation
- dark/light themes, with a toggle
- highlighted notes
- basic figure zoom
- 1000+ lines of misc CSS elbow grease
3/10
It has been 15+ years since we first converted arXiv
with latexml. Converting arXiv is now part of latexml's annual release cycle.
We're on a long road to Scholarly HTML5:
- semantic markup for sections, theorems ...
- MathML-native (soon also in Chrome!)
- linkable fragments
- rich metadata
4/10
Some small Easter eggs:
- inline citation preview
- "Feeling lucky?" explorer
- adjacency links
5/10
Real talk: "Can this really work? Isn't it just a gimmick?"
Mapping arXiv's sources to HTML5 is a large but finite problem with a brutal long tail.
Today, latexml encounters over 10,000 unknown packages and over 140,000 *distinct* unknown macros while converting arXiv.
6/10
My opinion: this is not an optional project.
There is a public need to rescue TeX/LaTeX content into machine-readable, accessible markup.
LaTeXML "the project" is a piece of digital infrastructure meant to help get us there.
Its continued backing by NIST confirms that.
7/10
Why build this again?
- Gain *rapid* turnaround in improving latexml and enriching arXiv's documents
- Not exactly new, just newly public: we've had internal variants of ar5iv for ~15 years
- Give back to the community, and ask for your help triaging the errors
8/10
What's the end goal?
Reintegrate with the one-and-only arXiv.org of course!
All of this work is free and open. The document quality is not good enough yet, but the moment it is, I would love to see the project fully transferred back to the main site.
9/10
You're welcome to reuse all pieces mentioned here in your own work, and to contribute back.
Spread the news! I'm looking forward to seeing some ar5iv links flying by!