Research on the structure of dialogue has been hampered for years because large dialogue corpora have not been available. This has impacted the dialogue research community's ability to develop better theories, as well as good off-the-shelf tools for dialogue processing. Happily, an increasing amount of information and opinion exchange occurs in natural dialogue in online forums, where people share their opinions about a vast range of topics. In particular, we are interested in rejection in dialogue, also called disagreement and denial, where the size of available dialogue corpora, for the first time, offers an opportunity to empirically test theoretical accounts of the expression and inference of rejection in dialogue. In this paper, we test whether topic-independent features motivated by theoretical predictions can be used to recognize rejection in online forums in a topic-independent way. Our results show that our theoretically motivated features achieve 66% accuracy, an absolute 6% improvement over a unigram baseline.
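A minimal sketch of the kind of comparison the abstract describes: a unigram baseline against hand-crafted, topic-independent cues for classifying a turn as rejection or acceptance. The specific features below (negation counts, hedges, turn-initial discourse markers) and the toy data are illustrative assumptions, not the paper's actual feature set or corpus:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Toy labeled turns: 1 = rejection of the prior turn, 0 = acceptance.
turns = [
    "No, that is simply not true.",
    "I disagree, the evidence says otherwise.",
    "Well, maybe, but I doubt it works that way.",
    "Yes, exactly, that matches my experience.",
    "Good point, I agree with that.",
    "Right, that is what the data shows.",
]
labels = np.array([1, 1, 1, 0, 0, 0])

# Hypothetical topic-independent cue lexicons (assumptions, for illustration).
NEGATIONS = {"no", "not", "never"}
HEDGES = {"maybe", "doubt", "perhaps", "hardly"}
MARKERS = {"well", "but", "actually", "no"}  # turn-initial discourse cues

def topic_independent_features(text):
    toks = text.lower().replace(",", " ").replace(".", " ").split()
    return [
        sum(t in NEGATIONS for t in toks),            # negation count
        sum(t in HEDGES for t in toks),               # hedge count
        1.0 if toks and toks[0] in MARKERS else 0.0,  # turn-initial marker
    ]

# Unigram baseline representation vs. topic-independent features.
X_uni = CountVectorizer().fit_transform(turns)
X_ti = np.array([topic_independent_features(t) for t in turns])

for name, X in [("unigram baseline", X_uni), ("topic-independent", X_ti)]:
    acc = cross_val_score(LogisticRegression(), X, labels, cv=3).mean()
    print(f"{name}: {acc:.2f} accuracy")
```

Because the cue lexicons contain no topic words, the second feature set should transfer across debate topics in a way the unigram baseline does not, which is the contrast the paper tests at scale.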
Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue
Effective models of social dialog must understand a broad range of rhetorical and figurative devices. Rhetorical questions (RQs) are a type of figurative language whose aim is to achieve a pragmatic goal, such as structuring an argument, being persuasive, emphasizing a point, or being ironic. While there are computational models for other forms of figurative language, rhetorical questions have received little attention to date. We expand a small dataset from previous work, presenting a corpus of 10,270 RQs from debate forums and Twitter that represent different discourse functions. We show that we can clearly distinguish between RQs and sincere questions (0.76 F1). We then show that RQs can be used both sarcastically and non-sarcastically, observing that non-sarcastic (other) uses of RQs are frequently argumentative in forums, and persuasive in tweets. We present experiments to distinguish between these uses of RQs using SVM and LSTM models that represent linguistic features and post-level context, achieving results as high as 0.76 F1 for SARCASTIC and 0.77 F1 for OTHER in forums, and 0.83 F1 for both SARCASTIC and OTHER in tweets. We supplement our quantitative experiments with an in-depth characterization of the linguistic variation in RQs.
Footnote 1: Subjects could provide multiple discourse functions for RQs, thus the frequencies do not add to 1.
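A hedged sketch of the first classification step the abstract reports, separating rhetorical from sincere questions with an SVM. The word n-gram features and toy examples below are stand-ins for the paper's richer linguistic feature set and corpus:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Toy questions: 1 = rhetorical, 0 = sincere (illustrative, not the corpus).
questions = [
    "Who in their right mind would believe that?",   # rhetorical
    "Do you really think insults win arguments?",    # rhetorical
    "Since when has that ever worked?",              # rhetorical
    "What time does the meeting start?",             # sincere
    "Could you post a link to the study?",           # sincere
    "How many participants were in the sample?",     # sincere
]
y = [1, 1, 1, 0, 0, 0]

# TF-IDF over word unigrams and bigrams, fed to a linear SVM.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(questions, y)

pred = clf.predict(["Who could possibly object to that?"])
print("rhetorical" if pred[0] == 1 else "sincere")
```

The sarcastic-vs-other distinction the paper then draws would use the same framing with different labels, plus post-level context features that this sketch omits.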
In order to build dialogue systems to tackle the ambitious task of holding social conversations, we argue that we need a data-driven approach that includes insight into human conversational "chit-chat", and which incorporates different natural language processing modules. Our strategy is to analyze and index large corpora of social media data, including Twitter conversations, online debates, dialogues between friends, and blog posts, and then to couple this data retrieval with modules that perform tasks such as sentiment and style analysis, topic modeling, and summarization. We aim for personal assistants that can learn more nuanced human language, and to grow from task-oriented agents to more personable social bots.
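A minimal sketch of the retrieve-and-rerank architecture this strategy implies: index a corpus of candidate turns, retrieve by similarity to the user's input, then rerank with a downstream module (here, a toy lexicon-based sentiment scorer). All data and module choices are illustrative assumptions, not the authors' system:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny indexed corpus of candidate response turns (illustrative).
candidates = [
    "I love hiking on weekends, the views are amazing.",
    "Traffic this morning was terrible, I was so annoyed.",
    "That new cafe downtown makes a great espresso.",
]

POSITIVE = {"love", "great", "amazing", "happy"}

def sentiment(text):
    # Toy lexicon scorer standing in for a real sentiment-analysis module.
    return sum(w.strip(".,") in POSITIVE for w in text.lower().split())

vec = TfidfVectorizer().fit(candidates)
index = vec.transform(candidates)

def respond(user_turn, k=2):
    sims = cosine_similarity(vec.transform([user_turn]), index)[0]
    top_k = sims.argsort()[::-1][:k]                            # retrieval
    return max(top_k, key=lambda i: sentiment(candidates[i]))  # rerank

print(candidates[respond("I am looking for a great espresso downtown")])
```

Swapping the reranking function for style analysis, topic modeling, or summarization modules gives the modular coupling the abstract argues for.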
When people converse about social or political topics, similar arguments are often paraphrased by different speakers, across many different conversations. Debate websites produce curated summaries of arguments on such topics; these summaries typically consist of lists of sentences that represent frequently paraphrased propositions, or labels capturing the essence of one particular aspect of an argument, e.g. Morality or Second Amendment. We call these frequently paraphrased propositions ARGUMENT FACETS. Like these curated sites, our goal is to induce and identify argument facets across multiple conversations, and produce summaries. However, we aim to do this automatically. We frame the problem as consisting of two steps: we first extract sentences that express an argument from raw social media dialogs, and then rank the extracted arguments in terms of their similarity to one another. Sets of similar arguments are used to represent argument facets. We show here that we can predict ARGUMENT FACET SIMILARITY with a correlation averaging 0.63 compared to a human topline averaging 0.68 over three debate topics, easily beating several reasonable baselines.
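A sketch of the second step and its evaluation: score pairs of extracted arguments for similarity, then correlate predictions with human AFS judgments. The cosine-over-TF-IDF scorer and toy gold scores below are illustrative assumptions, not the paper's trained model or annotations:

```python
from scipy.stats import pearsonr
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical argument pairs from a gun-control debate (illustrative).
pairs = [
    ("Guns deter crime because criminals fear armed victims.",
     "Criminals avoid victims who might be armed, so guns deter crime."),
    ("The second amendment protects an individual right.",
     "Gun ownership is a constitutionally protected right."),
    ("Background checks reduce gun deaths.",
     "Owning a gun is part of rural culture."),
]
gold = [4.5, 4.0, 1.5]  # hypothetical human AFS scores on a 1-5 scale

vec = TfidfVectorizer().fit([s for p in pairs for s in p])
pred = [cosine_similarity(vec.transform([a]), vec.transform([b]))[0, 0]
        for a, b in pairs]

r, _ = pearsonr(pred, gold)
print(f"correlation with human AFS judgments: {r:.2f}")
```

The reported 0.63 average correlation is against a human topline of 0.68 computed the same way, which bounds how much headroom the task leaves.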
More and more of the information available on the web is dialogic, and a significant portion of it takes place in online forum conversations about current social and political topics. We aim to develop tools to summarize what these conversations are about. What are the CENTRAL PROPOSITIONS associated with different stances on an issue; what are the abstract objects under discussion that are central to a speaker's argument? How can we recognize that two CENTRAL PROPOSITIONS realize the same FACET of the argument? We hypothesize that the CENTRAL PROPOSITIONS are exactly those arguments that people find most salient, and use human summarization as a probe for discovering them. We describe our corpus of human summaries of opinionated dialogs, then show how we can identify similar repeated arguments, and group them into FACETS across many discussions of a topic. We define a new task, ARGUMENT FACET SIMILARITY (AFS), and show that we can predict AFS with a .54 correlation score, versus an ngram system baseline of .39 and a semantic textual similarity system baseline of .45.
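A minimal sketch of an n-gram overlap baseline of the kind the AFS system is compared against: Jaccard similarity over word bigrams. This is a generic reconstruction for illustration, not the paper's exact baseline:

```python
def ngrams(text, n=2):
    # Set of word n-grams from a lowercased, whitespace-tokenized string.
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def ngram_similarity(a, b, n=2):
    # Jaccard overlap of the two n-gram sets; 0.0 when both are empty.
    ga, gb = ngrams(a, n), ngrams(b, n)
    return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

s1 = "the death penalty does not deter crime"
s2 = "capital punishment does not deter crime at all"
print(f"bigram Jaccard similarity: {ngram_similarity(s1, s2):.2f}")
```

Such a baseline misses paraphrases with little lexical overlap ("death penalty" vs. "capital punishment" above), which is the gap the semantic similarity baseline and the AFS model aim to close.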