
Nix lexer improvements #2551

Merged: 19 commits, merged Nov 5, 2023

tarnacious (Contributor) commented Oct 29, 2023

Improvements/fixes to the Nix lexer.

Adds test cases (snippets) to establish a baseline for how the lexer previously worked. Subsequent commits add fixes and update or add test cases.

Fixes #1800 and some other issues I've noticed with the lexer.

I noticed these issues when I posted some Nix code blocks on my blog. The highlighting looked so broken that I added a modified lexer to fix it.

I've been testing this against all the Nix code in the nixpkgs repository, which has around 30k Nix files containing around 25 million lines of code and 100k lines of comments. I can't verify that the lexing is correct, but I can check that it completes and doesn't return error tokens. Before this pull request it returned 4316 error tokens in 1081 files, and I suspect it would have been worse if multi-line string literals had been closing correctly.
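
The error-token check described above can be reproduced with a short script. This is a hypothetical sketch, not the script used for this PR: it assumes a local nixpkgs checkout and relies on Pygments' `NixLexer.get_tokens`, which yields `(tokentype, value)` pairs. The helper names `count_error_tokens` and `read_nix_files` are made up for illustration.

```python
from collections import Counter
from pathlib import Path
from typing import Callable, Iterable, Iterator, Tuple


def count_error_tokens(
    sources: Iterable[Tuple[str, str]],
    lex: Callable[[str], Iterable[Tuple[object, str]]],
    is_error: Callable[[object], bool],
) -> Counter:
    """Map each source name to its number of error tokens.

    Sources that produce no error tokens are left out of the result.
    """
    counts: Counter = Counter()
    for name, text in sources:
        n = sum(1 for ttype, _value in lex(text) if is_error(ttype))
        if n:
            counts[name] = n
    return counts


def read_nix_files(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, contents) for every *.nix file under root (hypothetical helper)."""
    for path in sorted(Path(root).rglob("*.nix")):
        yield str(path), path.read_text(encoding="utf-8", errors="replace")


def main(root: str) -> None:
    # Pygments is imported here so count_error_tokens stays usable with any tokenizer.
    from pygments.lexers import NixLexer
    from pygments.token import Error

    lexer = NixLexer()
    counts = count_error_tokens(
        read_nix_files(root),
        lexer.get_tokens,
        lambda ttype: ttype in Error,  # `in` also matches subtypes of Token.Error
    )
    print(f"{sum(counts.values())} error tokens in {len(counts)} files")
    for name, n in counts.most_common(10):
        print(f"{n:6d}  {name}")
```

For example, `main("path/to/nixpkgs")` prints the total and then the ten files with the most error tokens. Passing the tokenizer and the error predicate in as arguments keeps the counting logic independent of Pygments, so it can be checked against a fake tokenizer.
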

These examples are from the Nix Reference Manual:

https://nixos.org/manual/nix/unstable/language/index.html

This lexer needs some work, but before changing anything, here is a baseline for how it currently works.

This wasn't working at all previously, and matching pairs of single quotes would break subsequent highlighting.

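
To illustrate why pairs of single quotes are tricky: in a Nix indented string (`'' ... ''`), `'''` is an escape for a literal `''`, `''$` escapes `$`, and `''\` escapes the following character, so the first `''` after the opening is not necessarily the end. The sketch below contrasts a naive pattern with an escape-aware one. It is an illustration only, not the rule used in the Pygments lexer, and it assumes the string contains no `${ ... }` interpolation that itself contains `''`.

```python
import re

# Naive pattern: the lazy .*? stops at the first '' it finds, even when that
# '' is really the start of an escape sequence such as ''' or ''$.
NAIVE = re.compile(r"''.*?''", re.DOTALL)

# Escape-aware pattern: inside the string, consume either an escape
# (''' for a literal '', ''$ for a literal $, ''\ plus any one character)
# or any single character that does not start a closing ''.
INDENTED_STRING = re.compile(r"''(?:'''|''\$|''\\.|(?!'').)*''", re.DOTALL)

src = "'' echo '''quoted''' ''${HOME} ''"
print(NAIVE.match(src).group())            # '' echo ''   (closes too early)
print(INDENTED_STRING.match(src).group())  # the whole of src
```

When the naive pattern closes the string early, everything after it is lexed as ordinary code, and the next `''` then opens a string that swallows the rest of the input, which matches the broken highlighting described above.
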
The previous regex matched the spaces and equals sign. This is now done using a lookahead.
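
A minimal sketch of the difference, assuming a simplified attribute-name rule (these patterns are illustrative and not copied from the lexer). The identifier character set is an assumption based on the Nix manual: a letter or underscore followed by letters, digits, `_`, `'` or `-`.

```python
import re

# Nix identifier: a letter or underscore, then letters, digits, _, ' or -.
IDENT = r"[a-zA-Z_][a-zA-Z0-9_'-]*"

# Old style: the match swallows the spaces and the = sign, so they end up in
# the same token as the attribute name (and "a == b" is mistaken for a binding).
OLD_ATTR = re.compile(IDENT + r"\s*=")

# Lookahead style: only the name is consumed; (?=...) checks that an = follows
# (but not ==) without including it in the match.
NEW_ATTR = re.compile(IDENT + r"(?=\s*=(?!=))")

print(OLD_ATTR.match('pname = "hello";').group())  # pname =
print(NEW_ATTR.match('pname = "hello";').group())  # pname
print(NEW_ATTR.match("a == b"))                    # None
```

Because the lookahead consumes nothing, the whitespace and the `=` are left for later rules to emit as their own tokens.
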

I'm not sure these cases should be Literal.String.Symbol (I certainly wouldn't want them highlighted as symbols), but I'm leaving this as-is for now.

This is checked before matching operators. I feel it should be a bit more aware of the context, but it didn't cause any issues in the tests.

This matching may be a bit loose, but it didn't cause any regressions in the test samples.

These cases were found to emit error tokens when lexing real-world code. The lexing code is not pretty, but it handles the cases we know of.

These cases were found by lexing real-world code and looking for error tokens. This also fixes some issues with literal paths.

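
For reference, Nix has several kinds of path literal: relative and absolute paths, home-relative paths and `<...>` search paths. The sketch below recognises whole tokens of each kind. The character set is an assumption modelled on the path grammar in Nix's own lexer, and the patterns are not the ones used in the Pygments lexer. Note that `a/b` is accepted: in Nix, a slash written without surrounding spaces forms a path rather than a division, so path matching interacts with the `/` operator.

```python
import re

# Characters allowed in a path component (assumption, modelled on Nix's lexer).
PATH_CHAR = r"[a-zA-Z0-9._\-+]"

# Relative or absolute path: ./foo, ../bar/baz.nix, /etc/nixos/, a/b
PATH = re.compile(rf"{PATH_CHAR}*(?:/{PATH_CHAR}+)+/?")
# Home-relative path: ~/.config/nixpkgs
HOME_PATH = re.compile(rf"~(?:/{PATH_CHAR}+)+/?")
# Search path: <nixpkgs>, <nixpkgs/lib>
SEARCH_PATH = re.compile(rf"<{PATH_CHAR}+(?:/{PATH_CHAR}+)*>")


def is_path_literal(text: str) -> bool:
    """True if the whole of text is one path literal (hypothetical helper)."""
    return any(p.fullmatch(text) for p in (PATH, HOME_PATH, SEARCH_PATH))


for candidate in ["./foo/bar.nix", "~/.config/nixpkgs", "<nixpkgs/lib>", "a / b"]:
    print(candidate, is_path_literal(candidate))
```
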
tarnacious marked this pull request as ready for review on October 29, 2023 23:05
Review comments on pygments/lexers/nix.py (resolved)
tarnacious (Contributor, Author) commented:

Hi @jeanas, thanks for reviewing the PR. I've implemented your suggestions and would be happy to hear any more feedback.

jeanas (Contributor) commented Nov 5, 2023

@tarnacious I'm going to merge it, just waiting for the CI to complete.

jeanas merged commit 1d92d12 into pygments:master on Nov 5, 2023
15 checks passed
jeanas (Contributor) commented Nov 5, 2023

Thank you!

domenkozar (Contributor) commented Nov 5, 2023

Awesome work! 🥳

Anteru added the A-lexing label (area: changes to individual lexers) on Nov 17, 2023
Anteru added this to the 2.17 milestone on Nov 17, 2023
Linked issue: Nix lexer breaks on .${var}
4 participants