DocBook is generated which includes unescaped characters such as < and " in the attribute value #4646

Closed
sorairolake opened this issue Jan 8, 2025 · 1 comment

sorairolake commented Jan 8, 2025

See jgm/pandoc#10503.

First, convert the following AsciiDoc, which uses the shorthand ID syntax:

.. [#s2a3]#*Term* . The term of this Public License is specified in
Section link:#s6a[6(a)] .#

This will generate the following DocBook:

<simpara><anchor xml:id="s2a3" xreflabel="Term . The term of this Public License is specified in
Section <link xl:href="#s6a">6(a)</link> ."/><emphasis role="strong">Term</emphasis> . The term of this Public License is specified in
Section <link xl:href="#s6a">6(a)</link> .</simpara>

I think this is invalid XML because it includes unescaped characters such as < and " in the attribute value, as shown below:

xreflabel="Term . The term of this Public License is specified in
Section <link xl:href="#s6a">6(a)</link> ."

I initially thought this was a bug in Pandoc, but according to jgm/pandoc#10503 (comment), it may be a bug in Asciidoctor's DocBook backend. Even if the inline anchor syntax has to be used instead of the shorthand ID syntax when the text includes links and cross references, I think Asciidoctor should at least output valid XML.
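
The well-formedness problem can be reproduced outside Asciidoctor. The following is a minimal Python sketch, not Asciidoctor's code: it wraps the reported fragment in a hypothetical `article` root that declares the `xl` prefix (an assumption, so that only the attribute value is under test) and compares the verbatim attribute with one escaped by the standard library's `quoteattr`. The names `REFTEXT`, `wrap`, and `is_well_formed` are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Reference text exactly as it ends up in the xreflabel attribute in the report.
REFTEXT = (
    "Term . The term of this Public License is specified in\n"
    'Section <link xl:href="#s6a">6(a)</link> .'
)


def wrap(fragment: str) -> str:
    # Hypothetical root element; it only exists to declare the xl prefix.
    return (
        '<article xmlns="http://docbook.org/ns/docbook" '
        'xmlns:xl="http://www.w3.org/1999/xlink">' + fragment + "</article>"
    )


def is_well_formed(fragment: str) -> bool:
    # ElementTree raises ParseError for XML that is not well-formed.
    try:
        ET.fromstring(wrap(fragment))
        return True
    except ET.ParseError:
        return False


# As reported: the reference text is placed in the attribute verbatim.
reported = '<simpara><anchor xml:id="s2a3" xreflabel="' + REFTEXT + '"/>text</simpara>'

# Escaped variant: quoteattr escapes &, <, > and picks a safe quote character.
escaped = '<simpara><anchor xml:id="s2a3" xreflabel=' + quoteattr(REFTEXT) + "/>text</simpara>"

print(is_well_formed(reported))  # False
print(is_well_formed(escaped))   # True
```

Escaping only makes the output parseable; the xreflabel would still carry the literal markup as text, which is a separate question from well-formedness.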

@sorairolake sorairolake changed the title DocBook is generated which includes unescaped < and " in the attribute value DocBook is generated which includes unescaped characters such as < and " in the attribute value Jan 8, 2025
@mojavelinux
Member

The issue here is that the anchor, as defined, includes formatted text in the xreflabel. This is not a supported use case for AsciiDoc.

The correct way to write this would be as follows:

[[s2a3,Term]]*Term*. The term of this Public License is specified in Section <<s6a,6(a)>>.

In other words, you can use the [[id,reftext]] syntax to define a valid xreflabel (i.e., the reftext) for the reference.

We may need to discuss in the AsciiDoc Language project what happens when the reftext includes formatted text. If the language project decides that the text needs to be normalized, then that is something Asciidoctor would have to implement. For now, it's not something that will be changed in Asciidoctor.
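
As an illustration of what "normalized" could mean here, the sketch below reduces converted reftext to plain text by dropping inline markup and collapsing whitespace. This is only one possible interpretation, assumed for the example; it is not a decision of the AsciiDoc Language project and not Asciidoctor behavior. The function `normalize_reftext` and the wrapper element `r` are hypothetical.

```python
import xml.etree.ElementTree as ET


def normalize_reftext(converted: str) -> str:
    # Hypothetical normalization: parse the converted inline markup,
    # keep only its text content, and collapse runs of whitespace.
    root = ET.fromstring(
        '<r xmlns:xl="http://www.w3.org/1999/xlink">' + converted + "</r>"
    )
    text = "".join(root.itertext())
    return " ".join(text.split())


converted = (
    "Term . The term of this Public License is specified in\n"
    'Section <link xl:href="#s6a">6(a)</link> .'
)

print(normalize_reftext(converted))
# Term . The term of this Public License is specified in Section 6(a) .
```

A plain-text value like this contains no `<` or `"`, so it could be used as an attribute value after ordinary escaping.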

@mojavelinux mojavelinux added this to the support milestone Jan 8, 2025