A collection of 1.75 million preprints, readable today.
That is ~97% of the available sources.
Most documents (~80%) read mostly well.
Few are perfect.
Almost all have a clear path to support.
Our own e-journal CSS:
- optimized for large screens, best in Firefox
- decent on mobile, in any browser
- justified text, with hyphenation
- dark/light themes (toggle with 🌙/☀️)
- highlighted notes
- basic figure zoom
- 1000+ lines of misc CSS elbow grease
It has been 15+ years since we first converted arXiv
with latexml. That is now part of our annual release cycle.
We're on a long road to Scholarly HTML5:
- markup for sectioning, theorems ...
- MathML-native (soon also in Chrome!)
- linkable fragments
- rich metadata
Some small Easter eggs:
- inline citation preview
- "Feeling lucky?" explorer
- adjacency links
Real talk: "Can this really work? Isn't it just a gimmick?"
Mapping arXiv's sources over to HTML5 is a large, finite problem which has a brutal long tail.
Today, latexml encounters more than 10,000 unknown packages and more than 140,000 *distinct* unknown macros during the conversion.
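To make the "long tail" concrete, here is a minimal sketch of how such counts could be tallied from conversion logs. The `severity:category:object` line shape, the sample entries, and the helper names `tally` and `singleton_share` are all assumptions made for illustration; real logs carry more detail, and this is not the tooling behind the numbers above.

```python
from collections import Counter

# Hypothetical, simplified log lines in a "severity:category:object" shape.
# Every entry below is made up for illustration.
SAMPLE_LOG = """\
Error:undefined:\\mycmd
Error:undefined:\\mycmd
Error:undefined:\\vect
Warning:missing_file:fancypkg.sty
Error:undefined:\\mycmd
Warning:missing_file:otherpkg.sty
Error:undefined:\\norm
"""


def tally(log_text):
    """Count occurrences of each unknown macro and each missing package."""
    macros, packages = Counter(), Counter()
    for line in log_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed shape
        _severity, category, obj = parts
        if category == "undefined":
            macros[obj] += 1
        elif category == "missing_file":
            packages[obj] += 1
    return macros, packages


def singleton_share(counts):
    """Fraction of distinct names seen exactly once: a crude long-tail measure."""
    if not counts:
        return 0.0
    return sum(1 for n in counts.values() if n == 1) / len(counts)


macros, packages = tally(SAMPLE_LOG)
print(len(macros), "distinct unknown macros;", len(packages), "unknown packages")
print(f"{singleton_share(macros):.0%} of the distinct macros occur only once")
```

A high singleton share is what makes the tail brutal: supporting the few most frequent names helps many documents at once, while each remaining name fixes only a handful.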
My opinion: This is not an optional project.
There is a public need to rescue TeX/LaTeX content into machine-readable, accessible markup.
LaTeXML "the project" is a piece of digital infrastructure meant to help get us there.
Its continued backing by NIST confirms that.
Why build this again?
- Gain *rapid* turnaround in improving latexml and enriching arXiv's documents
- Not exactly new - newly public. We've had internal variants of ar5iv for ~15 years
- Give back to the community! And ask that you help us triage the errors
What's the end goal?
Reintegrate with the one-and-only arXiv.org, of course!
All of this work is free and open. The quality of the documents is not good enough yet, but the moment it is, I would love to see the project fully transferred back to the main site.
You're welcome to reuse all pieces mentioned here in your own work, and to contribute back.
Spread the news - I'm looking forward to seeing some ar5iv links flying by!