Distributed Marker Representation For Ambiguous Discourse Markers and Entangled Relations
{rudongyu,quln,zhaz}@amazon.com
[email protected]
[email protected]
Abstract
Discourse analysis is an important task because it models intrinsic semantic structures between sentences in a document. Discourse markers

arXiv:2306.10658v1 [cs.CL] 19 Jun 2023
• We investigate the ambiguity of discourse markers and entanglement among discourse relations to

1 Code is publicly available at: https://github.com/rudongyu/DistMarker

We elaborate on the probabilistic model in Sec. 3.1 and its implementation with neural networks in Sec. 3.2. We then describe the way we optimize the model (Sec. 3.3).
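The latent-sense factorization elaborated in Sec. 3.1 — predicting a marker by marginalizing over a small set of latent senses — can be illustrated with a minimal numerical sketch. All names and sizes below (`W_z`, `W_m`, random weights) are hypothetical stand-ins for illustration, not the paper's actual parameterization, which is given in Sec. 3.2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: K latent senses, M marker types, d-dim sentence-pair encoding.
K, M, d = 30, 174, 8

# Stand-ins for the trained components (random here, learned in the real model):
W_z = rng.normal(size=(K, d))   # scores latent senses from the pair encoding
W_m = rng.normal(size=(M, K))   # scores markers from each latent sense

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def marker_distribution(h_pair):
    """p(m | s1, s2) = sum_z p(z | s1, s2) * p(m | z)."""
    p_z = softmax(W_z @ h_pair)          # (K,) distribution over latent senses
    p_m_given_z = softmax(W_m, axis=0)   # (M, K): each column is p(m | z)
    return p_m_given_z @ p_z             # (M,) marginal distribution over markers

h = rng.normal(size=d)  # a stand-in encoding of a sentence pair (s1, s2)
p_m = marker_distribution(h)
```

The point of the bottleneck is that every marker probability is forced through the K latent senses, so ambiguous markers naturally spread their mass over several senses.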
Table 1: Experimental results of implicit discourse relation classification on PDTB2. Results with † are from Wu et al. (2022). DMR-large and DMR-base adopt roberta-large and roberta-base as SentEnc, respectively. F1 and ACC are the metrics for IDRR performance.

We note that although annotators are allowed to annotate multiple senses (relations), only 2.3% of the data have more than one relation. Therefore, whether DMR can capture more of the entanglement among relations is of interest as well (Sec. 4.5).

Relation               BMGF    LDSGM   DMR
Comp.Concession        0.      0.      0.
Comp.Contrast          59.75   63.52   63.16
Cont.Cause             59.60   64.36   62.65
Cont.Pragmatic Cause   0.      0.      0.
Expa.Alternative       60.0    63.46   55.17
Expa.Conjunction       60.17   57.91   58.54
Expa.Instantiation     67.96   72.60   72.16
Expa.List              0.      8.98    36.36
Expa.Restatement       53.83   58.06   59.19
Temp.Async             56.18   56.47   59.26
Temp.Sync              0.      0.      0.
Macro-F1               37.95   40.49   42.41

Table 2: Experimental results of implicit discourse relation recognition on PDTB2 second-level senses.

4.2 Baselines

We compare our DMR model with competitive baseline approaches to validate the effectiveness of DMR. For the IDRR task, we compare the DMR-based classifier with current SOTA methods, including BMGF (Liu et al., 2021), which combines representation, matching, and fusion; LDSGM (Wu et al., 2022), which considers the hierarchical dependency among labels; the prompt-based connective prediction method PCP (Zhou et al., 2022); and so on. For further analysis of DMR, we also include a vanilla sentence encoder without the latent bottleneck as an extra baseline, denoted BASE.

4.3 Implementation Details

Our DMR model is trained on 1.57 million examples with 174 types of markers in the Discovery dataset. We use a pretrained RoBERTa model (Liu et al., 2019) as SentEnc in DMR. We set the default latent dimension 𝐾 to 30. More details regarding the implementation of DMR can be found in Appendix A.

For the IDRR task, we strip the marker generation part from the DMR model and use the hidden state ℎ𝑧 as the pair representation. BASE uses the [CLS] token representation as the representation of input pairs. A linear classification layer is stacked on top of both models to predict relations.

4.4 Implicit Discourse Relation Recognition

We first validate the effectiveness of modeling latent senses on the challenging IDRR task.

Main Results. DMR demonstrates performance comparable to current SOTAs on IDRR, but with a simpler architecture. As shown in Table 1, DMR leads in accuracy by 2.7pt and is a close second in macro-F1. The results exhibit the strength of DMR in modeling the correlation between discourse markers and relations more directly. Despite the absence of supervision on discourse relations during DMR learning, the semantics of the latent senses distilled by EM optimization transferred successfully to the manually defined relations in IDRR.
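The classifier head described in Sec. 4.3 is just one linear layer over the pair representation. A minimal sketch with hypothetical dimensions (the real head is trained jointly on PDTB2; the random weights here only show the shape of the computation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: d-dim pair representation (h_z or [CLS]),
# 11 second-level PDTB2 relations.
d, n_relations = 8, 11

# One linear layer; weights are random stand-ins, not trained values.
W = rng.normal(size=(n_relations, d))
b = np.zeros(n_relations)

def classify(h_z):
    """Predict a relation from the pair representation via a linear layer."""
    logits = W @ h_z + b
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(probs.argmax()), probs

h_z = rng.normal(size=d)  # stand-in pair representation
pred, probs = classify(h_z)
```

The same head is used for both DMR (over ℎ𝑧) and BASE (over the [CLS] representation), so the comparison isolates the effect of the latent bottleneck.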
[Figure 3: Few-shot IDRR results on PDTB2. Line plot of performance (ACC and F1) against the number of training examples (0–500), with curves DMR-acc, DMR-f1, BASE-acc, and BASE-f1.]

# Training Examples     25      100     500     full (10K)
BASE     ACC            -       32.20   33.85   59.45
         F1             -       13.40   16.70   34.34
BASE𝑝    ACC            -       33.76   37.56   60.90
         F1             -       13.54   17.21   35.45
BASE𝑔    ACC            19.12   34.07   39.23   63.19
         F1             5.75    13.72   19.27   36.59
DMR      ACC            21.32   37.14   42.53   62.97
         F1             7.01    15.29   19.57   39.33

Table 3: Few-shot IDRR results on PDTB2.

Model      ACC@1   ACC@3   ACC@5   ACC@10
Discovery  24.26   40.94   49.56   61.81
DMR30      8.49    22.76   33.54   48.11
DMR174     22.43   40.92   50.18   63.21

Table 4: Experimental results of marker prediction on the Discovery test set. DMR30 and DMR174 indicate the models with latent dimension K equal to 30 and 174, respectively.

Marker        1st Cluster                       2nd Cluster
additionally  𝒛1: as a result, in turn,         𝒛20: for example, for instance,
              simultaneously                    specifically
amazingly     𝒛9: thankfully, fortunately,      𝒛21: oddly, strangely,
              luckily                           unfortunately
but           𝒛19: indeed, nonetheless,         𝒛24: anyway, and,
              nevertheless                      well

Table 5: Top 2 clusters of three randomly sampled markers. Each cluster corresponds to a latent 𝒛 coupled with its top 3 markers.
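The ACC@k metric reported for marker prediction in Table 4 can be computed as below; the toy scores and gold labels are invented purely for illustration:

```python
import numpy as np

def acc_at_k(scores, gold, k):
    """Fraction of examples whose gold marker is among the top-k scored markers.

    scores: (n_examples, n_markers) array of model scores
    gold:   (n_examples,) gold marker indices
    """
    topk = np.argsort(-scores, axis=1)[:, :k]          # top-k marker indices per row
    return float((topk == gold[:, None]).any(axis=1).mean())

# Tiny example: 2 examples over 4 markers.
scores = np.array([[0.1, 0.7, 0.2, 0.0],
                   [0.5, 0.1, 0.3, 0.1]])
gold = np.array([2, 0])
acc1 = acc_at_k(scores, gold, 1)  # gold is top-1 only for the second example -> 0.5
acc3 = acc_at_k(scores, gold, 3)  # both gold markers fall in the top 3 -> 1.0
```

With 174 marker types, ACC@1 is a harsh metric, which is why the table also reports the more forgiving ACC@3/5/10 cutoffs.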
We use the top-3 predictions of the 20 highest-entropy examples to demonstrate highly confusing discourse relations, as shown in Fig. 6. The accumulated joint probability of paired relations on these examples is computed as the heatmap weights.

[Figure 6: Heatmap over relation pairs (Concession, Contrast, Cause, Alternative, Conjunction, Instantiation, List, Restatement, Asynchronous, Synchrony); cell color encodes the accumulated joint probability, on a scale from 0.0 to 1.6.]
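The selection of confusing examples by prediction entropy, and one plausible reading of the accumulated joint probability used as heatmap weights, can be sketched as follows. The product-of-top-2 weighting is an assumption for illustration, not necessarily the paper's exact formula:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a probability vector."""
    return float(-(p * np.log(p + eps)).sum())

def confusion_weights(probs, top_n=20, n_relations=10):
    """Select the top_n highest-entropy predictions, then accumulate a weight
    for each example's two most probable relations into a symmetric matrix.

    probs: (n_examples, n_relations) predicted relation distributions
    """
    probs = np.asarray(probs, dtype=float)
    ent = np.array([entropy(p) for p in probs])
    chosen = np.argsort(-ent)[:top_n]           # most uncertain examples
    W = np.zeros((n_relations, n_relations))
    for i in chosen:
        r1, r2 = np.argsort(-probs[i])[:2]      # two most probable relations
        w = probs[i, r1] * probs[i, r2]         # assumed joint-probability weight
        W[r1, r2] += w
        W[r2, r1] += w
    return W
```

High-entropy predictions are exactly those where the model hedges between two or more relations, so accumulating their pairwise mass surfaces the most entangled relation pairs.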
Allen Nie, Erin Bennett, and Noah Goodman. 2019. DisSent: Learning sentence representations from explicit discourse relations. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4497–4510.

Alexander Panchenko, Eugen Ruppert, Stefano Faralli, Simone P. Ponzetto, and Chris Biemann. 2018. Building a web-scale dependency-parsed corpus from CommonCrawl. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan. European Language Resources Association (ELRA).

Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. 2017. Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics, 5:101–115.

Damien Sileo, Tim Van de Cruys, Camille Pradel, and Philippe Muller. 2019. Mining discourse markers for unsupervised sentence representation learning. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 3477–3486.

Caroline Sporleder and Alex Lascarides. 2008. Using automatically labelled examples to classify rhetorical relations: an assessment. Natural Language Engineering, 14(3):369–416.

Changxing Wu, Liuwen Cao, Yubin Ge, Yang Liu, Min Zhang, and Jinsong Su. 2022. A label dependence-aware sequence generation model for multi-level implicit discourse relation recognition. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 11486–11494.

Changxing Wu, Chaowen Hu, Ruochen Li, Hongyu Lin, and Jinsong Su. 2020. Hierarchical multi-task learning with CRF for implicit discourse relation recognition. Knowledge-Based Systems, 195:105637.

Wei Xiang, Zhenglin Wang, Lu Dai, and Bang Wang. 2022. ConnPrompt: Connective-cloze prompt learning for implicit discourse relation recognition. In Proceedings of the 29th International Conference on Computational Linguistics, pages 902–911, Gyeongju, Republic of Korea. International Committee on Computational Linguistics.

Frances Yung, Kaveri Anuranjana, Merel Scholman, and Vera Demberg. 2022. Label distributions help implicit discourse relation classification. In Proceedings of the 3rd Workshop on Computational Approaches to Discourse, pages 48–53, Gyeongju, Republic of Korea and Online. International Conference on Computational Linguistics.
Table 7: Statistics of Discovery Dataset

For analysis of the entanglement among relations, we conducted a human evaluation on randomly extracted examples from PDTB2. To better understand the entanglement among relations, we further filtered the 20 most confusing examples, using entropy as the metric. The entanglement is shown in Fig. 6 in Sec. 4.5. We list these examples in Table 9 for clarity.

Relations              Train   Valid   Test
Comp.Concession        180     15      17
Comp.Contrast          1566    166     128
Cont.Cause             3227    281     269
Cont.Pragmatic Cause   51      6       7
Expa.Alternative       146     10      9
Expa.Conjunction       2805    258     200
Expa.Instantiation     1061    106     118
Expa.List              330     9       12
Expa.Restatement       2376    260     211
Temp.Async             517     46      54
Temp.Sync              147     8       14
Total                  12406   1165    1039
A Implementation Details
Figure 7: t-SNE visualization of the latent 𝒛. We draw the t-SNE embeddings of each latent 𝑧 in 2-D space, using the well-trained 𝜓𝑤2 as the corresponding embedding vectors. While each 𝑧 groups markers with similar meanings, we can also observe that related senses are clustered together. For example, temporal connectives and senses are located in the top-left corner, with preceding (𝑧27), succeeding (𝑧25, 𝑧22, 𝑧16), and synchronous (𝑧15) ones separated. The existence of 𝒛 helps to construct a hierarchical view of semantics between sentences.
Figure 8: t-SNE visualization of discourse markers from BASE. We draw the t-SNE embeddings of each marker in 2-D space, using the averaged token representations of markers from the BASE PLM. Compared to the well-organized hierarchical view of latent senses in DMR, the markers are not well aligned with semantics in the representation space of BASE. This indicates the limitation of bridging markers and relations with a direct mapping.
(1) s1: Right away you notice the following things about a Philip Glass concert
    s2: It attracts people with funny hair
    Top-3: Instantiation (0.502), Restatement (0.449), List (0.014)

(2) s1: There is a recognizable musical style here, but not a particular performance style
    s2: The music is not especially pianistic
    Top-3: Restatement (0.603), Conjunction (0.279), Instantiation (0.048)

(3) s1: Numerous injuries were reported
    s2: Some buildings collapsed, gas and water lines ruptured and fires raged
    Top-3: Restatement (0.574), Instantiation (0.250), List (0.054)

(4) s1: this comparison ignores the intensely claustrophobic nature of Mr. Glass's music
    s2: Its supposedly austere minimalism overlays a bombast that makes one yearn for the astringency of neoclassical Stravinsky, the genuinely radical minimalism of Berg and Webern, and what in retrospect even seems like concision in Mahler
    Top-3: Cause (0.579), Restatement (0.319), Instantiation (0.061)

(5) s1: The issue exploded this year after a Federal Bureau of Investigation operation led to charges of widespread trading abuses at the Chicago Board of Trade and Chicago Mercantile Exchange
    s2: While not specifically mentioned in the FBI charges, dual trading became a focus of attempts to tighten industry regulations
    Top-3: Cause (0.504), Asynchronous (0.400), Conjunction (0.045)

(6) s1: A menu by phone could let you decide, 'I'm interested in just the beginning of story No. 1, and I want story No. 2 in depth
    s2: You'll start to see shows where viewers program the program
    Top-3: Cause (0.634), Conjunction (0.188), Asynchronous (0.116)

(7) s1: His hands sit farther apart on the keyboard. Seventh chords make you feel as though he may break into a (very slow) improvisatory riff
    s2: The chords modulate
    Top-3: Cause (0.604), Conjunction (0.266), Restatement (0.082)

(8) s1: His more is always less
    s2: Far from being minimalist, the music unabatingly torments us with apparent novelties not so cleverly disguised in the simplicities of 4/4 time, octave intervals, and ragtime or gospel chord progressions
    Top-3: Cause (0.456), Restatement (0.433), Instantiation (0.052)

(9) s1: It requires that "discharges of pollutants" into the "waters of the United States" be authorized by permits that reflect the effluent limitations developed under section 301
    s2: Whatever may be the problems with this system, it scarcely reflects "zero risk" or "zero discharge
    Top-3: Contrast (0.484), Cause (0.387), Concession (0.072)

(10) s1: The study, by the CFTC's division of economic analysis, shows that "a trade is a trade
     s2: Whether a trade is done on a dual or non-dual basis doesn't seem to have much economic impact
     Top-3: Restatement (0.560), Conjunction (0.302), Cause (0.095)

(11) s1: Currently in the middle of a four-week, 20-city tour as a solo pianist, Mr. Glass has left behind his synthesizers, equipment and collaborators in favor of going it alone
     s2: He sits down at the piano and plays
     Top-3: Restatement (0.357), Synchrony (0.188), Asynchronous (0.115)

(12) s1: For the nine months, Honeywell reported earnings of $212.1 million, or $4.92 a share, compared with earnings of $47.9 million, or $1.13 a share, a year earlier
     s2: Sales declined slightly to $5.17 billion
     Top-3: Conjunction (0.541), Contrast (0.319), Synchrony (0.109)

(13) s1: The Bush administration is seeking an understanding with Congress to ease restrictions on U.S. involvement in foreign coups that might result in the death of a country's leader
     s2: that while Bush wouldn't alter a long-standing ban on such involvement, "there's a clarification needed" on its interpretation
     Top-3: Restatement (0.465), Conjunction (0.403), Cause (0.094)

(14) s1: With "Planet News Mr. Glass gets going
     s2: His hands sit farther apart on the keyboard
     Top-3: Synchrony (0.503), Asynchronous (0.202), Cause (0.147)

(15) s1: The Clean Water Act contains no "legal standard" of zero discharge
     s2: It requires that "discharges of pollutants" into the "waters of the United States" be authorized by permits that reflect the effluent limitations developed under section 301
     Top-3: Alternative (0.395), Contrast (0.386), Restatement (0.096)

(16) s1: Libyan leader Gadhafi met with Egypt's President Mubarak, and the two officials pledged to respect each other's laws, security and stability
     s2: They stopped short of resuming diplomatic ties, severed in 1979
     Top-3: Contrast (0.379), Concession (0.373), Conjunction (0.129)

(17) s1: His hands sit farther apart on the keyboard. Seventh chords make you feel as though he may break into a (very slow) improvisatory riff. The chords modulate, but there is little filigree even though his fingers begin to wander over more of the keys
     s2: Contrasts predictably accumulate
     Top-3: Conjunction (0.445), Synchrony (0.303), List (0.181)

(18) s1: NBC has been able to charge premium rates for this ad time
     s2: but to be about 40% above regular daytime rates
     Top-3: Conjunction (0.409), Restatement (0.338), Contrast (0.224)

(19) s1: Mr. Glass looks and sounds more like a shaggy poet describing his work than a classical pianist playing a recital
     s2: The piano compositions are relentlessly tonal (therefore unthreatening), unvaryingly rhythmic (therefore soporific), and unflaggingly harmonious but unmelodic (therefore both pretty and unconventional
     Top-3: Cause (0.380), Instantiation (0.323), Restatement (0.241)

(20) s1: It attracts people with funny hair
     s2: Whoever constitute the local Left Bank come out in force, dressed in black
     Top-3: Cause (0.369), Asynchronous (0.331), Conjunction (0.260)
Table 9: High Entropy Examples of Model Inference on Implicit Discourse Relation Classification