Papers by Tonya Trapp Custis
Westlaw Edge, a new legal research platform from Westlaw, was launched in 2018. The three AI-enabled features that launched with Westlaw Edge are KeyCite Overruling Risk, Litigation Analytics, and WestSearch Plus. KeyCite Overruling Risk uses NLP and machine learning to warn users when a point of law in a case may have been implicitly undermined based on a prior decision, when that prior citation has no direct citation relationship to the at-risk case. Litigation Analytics allows users to access valuable metadata extracted from legal dockets about the legal actions carried out by parties, lawyers, law firms, and judges presiding over cases. WestSearch Plus is a non-factoid Question Answering system that provides legally correct, jurisdictionally relevant, and conversationally responsive answers to user-entered questions in the legal domain. Additional AI-enabled features are set to launch on the Westlaw Edge platform in 2019. It is available for demo at ICAIL 2019.
Lecture Notes in Computer Science, 2018
This paper proposes a new formulation of Gaussian process for constraints with piece-wise smooth conditions. Combining ideas from decision trees and Gaussian processes, it is shown that the new model can effectively identify the non-smooth regions and tackle the non-smoothness in piece-wise smooth constraint functions. A constrained Bayesian optimizer is then constructed to handle optimization problems with both noisy objective and constraint functions.
Knowledge Discovery and Data Mining, 2020
Finding relevant sources of law that discuss a specific legal issue and support a favorable decision is an onerous and time-consuming task for litigation attorneys. In this paper, we present Quick Check, a system that extracts the legal arguments from a user's brief and recommends highly relevant case law opinions. Using a combination of full-text search, citation network analysis, clickstream analysis, and a hierarchy of ranking models trained on a set of over 10K annotations, the system is able to effectively recommend cases that are similar in both legal issue and facts. Importantly, the system leverages a detailed legal taxonomy and an extensive body of editorial summaries of case law. We demonstrate how recommended cases from the system are surfaced through a user interface that enables a legal researcher to quickly determine the applicability of a case with respect to a given legal issue. CCS CONCEPTS • Information systems → Retrieval models and ranking; • Computing methodologies → Information extraction.
arXiv (Cornell University), Feb 29, 2016
Over the past few decades, the rate of publication retractions has increased dramatically in academia. In this study, we investigate retractions from a quantitative perspective, aiming to answer two fundamental questions. One, how do retractions influence the scholarly impact of retracted papers, authors, and institutions? Two, does this influence propagate to the wider academic community through scholarly associations? Specifically, we analyzed a set of retracted articles indexed in Thomson Reuters Web of Science (WoS), and ran multiple experiments to compare changes in scholarly impact against a control set of non-retracted articles, authors, and institutions. We further applied the Granger Causality test to investigate whether different scientific topics are dynamically affected by retracted papers occurring within those topics. Our results show two key findings: first, the scholarly impact of retracted papers and authors significantly decreases after retraction, and the most severe impact decrease correlates to retractions based on proven purposeful scientific misconduct; second, this retraction penalty does not seem to spread through the broader scholarly social graph, but instead has a limited and localized effect. Our findings may provide useful insights for scholars or science committees to evaluate the scholarly value of papers, authors, or institutions related to retractions.
Non-factoid question answering in the legal domain must provide legally correct, jurisdictionally relevant, and conversationally responsive answers to user-entered questions. We present work done on a QA system that is entirely based on IR and NLP, and does not rely on a structured knowledge base. Our system retrieves concise one-sentence answers for basic questions about the law. It is not restricted in scope to particular topics or jurisdictions. The corpus of potential answers contains approximately 22M documents classified to over 120K legal topics.
We apply language modeling keyword search augmented with Berger and Lafferty's (1999) translation model for query expansion to formulate three query expansion methods using word co-occurrence statistics from a large external corpus and user clickthrough data. We study the performance of these methods on a vertical domain (case law documents) using standard metrics and an evaluation framework designed specifically to
arXiv (Cornell University), Jul 9, 2020
Deep classifiers tend to associate a few discriminative input variables with their objective function, which in turn, may hurt their generalization capabilities. To address this, one can design systematic experiments and/or inspect the models via interpretability methods. In this paper, we investigate both of these strategies on deep models operating on point clouds. We propose PointMask, a model-agnostic interpretable information-bottleneck approach for attribution in point cloud models. PointMask encourages exploring the majority of variation factors in the input space while gradually converging to a general solution. More specifically, PointMask introduces a regularization term that minimizes the mutual information between the input and the latent features used to mask out irrelevant variables. We show that coupling a PointMask layer with an arbitrary model can discern the points in the input space which contribute the most to the prediction score, thereby leading to interpretability. Through designed bias experiments, we also show that thanks to its gradual masking feature, our proposed method is effective in handling data bias.
Proceedings of the AAAI Conference on Artificial Intelligence
Twitter has become an important online source for real-time news dissemination. In particular, official accounts of local government and media outlets have provided newsworthy and authoritative information, revealing local trends and breaking news. In this paper, we describe TipMaster, an automatically constructed knowledge base of Twitter accounts that are likely to report local news, from government agencies to local media outlets. First, we implement classifiers for detecting these accounts by integrating heterogeneous information from the accounts' textual metadata, profile images, and their tweet messages. Next, we demonstrate two use cases for TipMaster: 1) as a platform that monitors real-time social media messages for local breaking news, and 2) as an authoritative source for verifying nascent rumors. Experimental results show that our account classification algorithms achieve both high precision and recall (around 90%). The demonstrated case studies prove that our platform ...
arXiv (Cornell University), May 12, 2020
The latent variables learned by VAEs have seen considerable interest as an unsupervised way of extracting features, which can then be used for downstream tasks. There is a growing interest in the question of whether features learned on one environment will generalize across different environments. We demonstrate here that VAE latent variables often focus on some factors of variation at the expense of others; in this case we refer to the features as "imbalanced". Feature imbalance leads to poor generalization when the latent variables are used in an environment where the presence of features changes. Similarly, latent variables trained with imbalanced features induce the VAE to generate less diverse (i.e. biased towards dominant features) samples. To address this, we propose a regularization scheme for VAEs, which we show substantially addresses the feature imbalance problem. We also introduce a simple metric to measure the balance of features in generated images.
Journal of the Association for Information Science and Technology, 2017
Over the past few decades, the rate of publication retractions has increased dramatically in academia. In this study, we investigate retractions from a quantitative perspective, aiming to answer two fundamental questions. One, how do retractions influence the scholarly impact of retracted papers, authors, and institutions? Two, does this influence propagate to the wider academic community through scholarly associations? Specifically, we analyzed a set of retracted articles indexed in Thomson Reuters Web of Science (WoS), and ran multiple experiments to compare changes in scholarly impact against a control set of non-retracted articles, authors, and institutions. We further applied the Granger Causality test to investigate whether different scientific topics are dynamically affected by retracted papers occurring within those topics. Our results show two key findings: first, the scholarly impact of retracted papers and authors significantly decreases after retraction, and the most severe impact decrease correlates to retractions based on proven purposeful scientific misconduct; second, this retraction penalty does not seem to spread through the broader scholarly social graph, but instead has a limited and localized effect. Our findings may provide useful insights for scholars or science committees to evaluate the scholarly value of papers, authors, or institutions related to retractions.
Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2019
We present a non-factoid QA system that provides legally accurate, jurisdictionally relevant, and conversationally responsive answers to user-entered questions in the legal domain. This commercially available system is entirely based on NLP and IR, and does not rely on a structured knowledge base. WestSearch Plus aims to provide concise one-sentence answers for basic questions about the law. It is not restricted in scope to particular topics or jurisdictions. The corpus of potential answers contains approximately 22M documents classified to over 120K legal topics.
Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2007
The effectiveness of information retrieval (IR) systems is influenced by the degree of term overlap between user queries and relevant documents. Query-document term mismatch, whether partial or total, is a fact that must be dealt with by IR systems. Query Expansion (QE) is one method for dealing with term mismatch. IR systems implementing query expansion are typically evaluated by executing
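As a rough illustration of the term-mismatch problem described in this abstract, the following sketch measures query-document term overlap before and after a simple dictionary-based query expansion. It is not the method from the paper: the function names, the `RELATED_TERMS` table, and the sample texts are all hypothetical and chosen only to make the idea concrete.

```python
def tokenize(text):
    """Lowercase and split on whitespace; returns a set of terms."""
    return set(text.lower().split())


def term_overlap(query, document):
    """Fraction of distinct query terms that also occur in the document."""
    query_terms = tokenize(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokenize(document)) / len(query_terms)


def expand_query(query, related_terms):
    """Append related terms for each query term, keeping original order."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for related in related_terms.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return " ".join(expanded)


# Hypothetical expansion table; a real system would derive this from
# co-occurrence statistics, clickthrough data, or a thesaurus.
RELATED_TERMS = {
    "car": ["automobile", "vehicle"],
    "lawyer": ["attorney"],
}

document = "the attorney argued that the automobile was defective"
query = "car lawyer"

overlap_before = term_overlap(query, document)
expanded_query = expand_query(query, RELATED_TERMS)
overlap_after = term_overlap(expanded_query, document)

print(overlap_before, expanded_query, overlap_after)
```

In this toy case the original query shares no terms with a clearly relevant document (total mismatch), while the expanded query matches on "automobile" and "attorney".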
Proceedings of the 17th ACM Conference on Information and Knowledge Management - CIKM '08, 2008
We apply language modeling keyword search augmented with Berger and Lafferty's (1999) translation model for query expansion to formulate three query expansion methods using word co-occurrence statistics from a large external corpus and user clickthrough data. We study the performance of these methods on a vertical domain (case law documents) using standard metrics and an evaluation framework designed specifically to