
The 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022)

Taipei, Taiwan, November 21-22, 2022. The Association for Computational Linguistics and Chinese Language Processing

Legal Case Winning Party Prediction With Domain Specific Auxiliary Models

Sahan Jayasinghe, Lakith Rambukkanage, Ashan Silva, Nisansa de Silva, Amal Shehan Perera

Department of Computer Science & Engineering, University of Moratuwa, Sri Lanka
{sahanjayasinghe.17, rambukkanage.17, ashansilva.17, nisansaDds, shehan}@cse.mrt.ac.lk

Abstract

Sifting through hundreds of old case documents to obtain information pertinent to the case at hand has been a major part of the legal profession for centuries. However, with the expansion of court systems and the compounding nature of case law, this task has become more and more intractable under time and resource constraints. Thus automation by Natural Language Processing presents itself as a viable solution. In this paper, we discuss a novel approach for predicting the winning party of a current court case by training an analytical model on a corpus of prior court cases, which is then run on the prepared text of the current court case. This will allow legal professionals to efficiently and precisely prepare their cases to maximize the chance of victory. The model is built and experimented with using legal domain specific sub-models to provide more visibility into the final model, along with other variations. We show that our model with critical sentence annotation and a transformer encoder using RoBERTa based sentence embedding is able to obtain an accuracy of 75.75%, outperforming other models.

Keywords: Natural Language Processing, Legal Domain, Case Law, Transformer Encoders

1 Introduction

Natural Language Processing (NLP) is undergoing rapid development and has proven to be practically useful across many text rich domains. With the proper utilization of tools and technologies, effective methodologies can be derived to tackle various problems that are otherwise repetitive, cognitively demanding and time consuming. The legal domain is such a text rich domain with a growing need for task automation. Legal domain corpora consist of statutes, regulations, constitutions and case law documents, among many others, which have to be repeatedly and constantly sifted through by legal professionals to obtain information pertinent to their current case. This research is primarily carried out on case law documents: a model is trained on a corpus of existing case law documents so that a prediction of the winning party of a current case can be obtained.

1.1 Case Law

In the legal domain, when confronted with a new case where statutes, regulations and constitutions cannot be used to straightforwardly arrive at a decision, the courts refer to Case Law. Case Law is the practice of using the information and verdicts of previous cases as arguments for the case at hand, where the older cases bear some semblance, in one aspect or another, to the contemporary case (Cornell Law School, 2020a).

Since case law documents have a predictive, or rather a prescriptive, value in the domain itself, they are valuable resources for predictive tasks in both research and practical applications. As time goes on and more and more cases are closed, the cases available to refer to grow in abundance on a daily basis. For human legal professionals this is a negative, as it makes their task of remembering and referring to these cases increasingly hard. But from the perspective of deep learning models, this growth is a blessing rather than a hindrance: as more and more data is gathered, the reliability and accuracy of the models increase. In this study, we have used case law documents to train our models.


1.2 Legal Party

In all legal cases two main parties are present (Cornell Law School, 2020b). One party corresponds to the party filing the case, who is referred to as the petitioner or plaintiff. In criminal cases they may also be referred to as the prosecutor, which is a government entity. On the other hand, we have the party responding to the case, which is referred to as the defendant or respondent. In criminal cases, this party may also be referred to as the accused. These parties may consist of individuals, groups of people, or organizations. Also, there may be third parties in a case who are unaffected by the case decision. It is important to note that, in the case of an appeal, the party that appealed becomes the petitioner in the new case (Cornell Law School, 2020c). For the benefit of readability, for the rest of this paper we will refer to the two parties as petitioner and defendant.

1.3 NLP in the Legal Domain

Recently, many researchers have conducted legal domain specific research. Among these, research on legal domain specific embeddings (Sugathadasa et al., 2017, 2018; Jayawardana et al., 2017a), legal ontologies (Jayawardana et al., 2017a,c,b), sentiment analysis (Gamage et al., 2018; Ratnayaka et al., 2020), and discourse analysis (Ratnayaka et al., 2018, 2019b,a) can be observed. Also, granular objectives such as party identification (Samarawickrama et al., 2020; de Almeida et al., 2020; Samarawickrama et al., 2021), party based sentiment analysis (Rajapaksha et al., 2020; Mudalige et al., 2020; Rajapaksha et al., 2021), and critical sentence identification (Jayasinghe et al., 2021) have been explored. However, there is still the need and opportunity for these models to be used for higher level derivations that are more human readable or practically useful.

1.4 Winning Party Prediction

Legal professionals, among other preparations, go through case law documents in order to prepare for ongoing court cases. The use of case law documents during preparation and during the court case gives the intuition that these documents contain a prescriptive value and can be used as a data source for predicting court case decisions. Also, in United States courts, all the facts that are to be brought up in the case are known in advance by both parties. With this, legal professionals can prepare a document with the arguments they are going to use and the arguments their opposing party may use, which is similar to a case law document. If this document can be given a benchmark, that is, a prediction of whether the case can be won with the given arguments and facts, it would be a valuable insight for legal professionals. They can then revise their facts and arguments with inclusions, exclusions and introductions of new facts to increase their likelihood of winning the case. Dorf (1994) observes, pointing to Holmes (1920), that this practice of trying to predict the outcome of a court case at hand predates any attempt at automation.

In this research we discuss a novel approach to predict the winning party of a court case using case law documents from the United States Supreme Court. The past work that has been carried out is discussed in Section 2. The formulation of our methodology is discussed in Section 3, and the experiments carried out and the achieved results are discussed in Section 4.


In the research by Waltl et al. (2017), the authors conducted their work fundamentally on German tax law cases. The research is conducted on features extracted using mostly regular expressions and manual annotations. A Naive Bayes classifier has been chosen as the best performing machine learning model. They have achieved 0.57 precision, 0.58 F1 score and 0.60 recall for positive outcomes.

Research done by Aletras et al. (2016) on predicting the decisions of the European Court of Human Rights is identified, as per the authors, as the first systematic approach to predicting winning parties by using NLP. They have modeled the problem as a binary classification problem, using Support Vector Machines with N-grams and topics as features for the model.

Liu and Chen (2017) also propose a classification approach for identifying the winning party of a court case. The process consists of two phases. In the first phase, an Article Classification model extracts the top k articles that are cited in the case document. In the second phase, the Judgement Classification model tries to predict the judgement of the court case. They have considered domain specific aspects such as punishment and cited statutes, as well as features derived using NLP such as sentiment, as features for their model.

A tree based approach which uses new feature engineering techniques is proposed in the research conducted by Katz et al. (2014). The dataset used in this research consists of cases from the United States Supreme Court. The researchers have considered the impact of political biases on the decisions as well. They have used data ranging over multiple presidential terms to generalize the model further. Features already present in their chosen dataset have been used, and some have been introduced by the authors. With the 7700 cases used, they have succeeded in predicting case outcomes with 69.7% accuracy and individual judge votes with 70.9% accuracy.

Lage-Freitas et al. (2019) have proposed a machine learning approach to develop a system that predicts Brazilian court decisions. The researchers have suggested it be used as a supporting tool or a benchmark for legal professionals. Approaches to calculate both the decision class and the unanimity of decisions have been designed. They have achieved good accuracy for some of the many model variations.

3 Methodology

In this section, the approach used for dataset preparation and the methodology for deriving the architecture used in this research are discussed.

3.1 Dataset Preparation

As observed by Kreutzer et al. (2022), the quality of the datasets used often plays a vital role in research. This research was conducted on a dataset extracted from the case law website https://caselaw.findlaw.com/, ranging from the year 2000 to the year 2010 and belonging to the criminal category. The extracted cases were pre-processed by removing paragraphs at the beginning and the end. These paragraphs include the introductory paragraphs, where the background of the case is summarized, and the last paragraphs, where the decision is stated. Afterwards, several preprocessing steps were applied to the remaining paragraphs to remove citations and other notations, as they do not add any semantic meaning to the case. In our data pairs, these cleaned remaining paragraphs constitute the input. Since the decision of each case was found in the aforementioned removed paragraphs with a retrievable convention in almost all the cases, the decisions of the court cases were extracted automatically. In our data pairs, this extracted verdict constitutes the expected output.

The Stanza NLP library (Qi et al., 2020) was used to split a court case document into a list of sentences for the representation purposes discussed in Section 3.2. Since Stanza is a general purpose NLP library (not specifically trained on legal text), sentences could be wrongly divided at the periods within abbreviations (some of which are jargon specific to the legal domain) and at the periods within brackets. So, further pre-processing steps were taken to make the sentence splitting process accurate:

• Removed text within rounded brackets.


• Replaced abbreviations specific to the legal domain with their long forms, as in the following examples:

  – Fed.R.Crim.P. → Federal Rule of Criminal Procedure
  – Fed.R.Evid. → Federal Rule of Evidence

• Removed square brackets around letters or words (Ex: [T]he, Extend[ed], [petitioner]).

• Removed numbering from topic sentences (Ex: II., A., 3.).

A case document in the US Supreme Court is generally structured as follows:

• Background information of the case (represented jury, date of hearing)

• A description of the case scenario

  1. Involved parties and their members (petitioners and defendants)
  2. How the case was formed (cause for filing the case)
  3. Available evidence
  4. Lower court decision (where the case was initially called)

• Supreme Court hearing

  1. Charges against the petitioner (who was the defendant in the first hearing by the lower court)
  2. Opinions of the jury
  3. Arguments brought forward by each party

• Footnotes

After the case documents were labeled with the decision, the notion of winning was defined with respect to the petitioner party. Affirmation, dismissal or rejection of a case by the US Supreme Court results in the petitioner losing the case. Reversal of the lower court decision results in the petitioner winning the case.

3.2 Model Architecture

The approach taken to predict the winning party of a court case is discussed in this section. Each case document is represented as a sequence of sentences. The model takes the corresponding sentence vector sequences as input.

Dimensions containing additional information about a case sentence, such as the criticality of a sentence towards a party, can be annotated using the Critical Sentence Identification model derived in the work by Jayasinghe et al. (2021). Given a case sentence, their system outputs probabilities for four classes which define the criticality of the sentence within that court case:

1. Has a negative impact towards the petitioner in a case where the petitioner loses
2. Has a positive impact towards the petitioner in a case where the petitioner loses
3. Has a negative impact towards the petitioner in a case where the petitioner wins
4. Has a positive impact towards the petitioner in a case where the petitioner wins

A sentence is considered to be critical if it has a negative impact towards the petitioner party in a case where the petitioner loses. Also, a sentence which has a positive impact towards the petitioner party is considered critical in a case where the petitioner wins. Sentences predicted with a high probability for the other classes are considered to be non-critical.

The probabilities for the four criticality classes provided by the Critical Sentence Identification model are appended to the sentence vectors, thereby increasing their dimension. The impact of this addition is discussed in Section 4.

The sentence vector sequence representing a court case document is then passed on to the Document Encoder model, which is configured using either Recurrent Neural Network (RNN) or Transformer Encoder layers. The output of the Document Encoder model is used to obtain the petitioner party winning probability via the classifier component. This classifier component is configured using a linear neural network. The linear neural network ends with a


single-node layer which outputs the probability of the petitioner party winning the case. The overall workflow of the process is depicted in Fig. 1.

Figure 1: Winning Party Prediction Workflow

When the nature of a legal case is considered, it is often the case that the probability of the defendant party winning the case is equal to the probability of the petitioner party losing the case. There may be cases for which this is not necessarily true, but we have followed that convention in this research.

The internal architecture of the RNN based Winning Party Prediction model is displayed in Fig. 2, and that of the transformer encoder based model in Fig. 3.

Figure 2: RNN based Model

In the RNN based model architecture (Fig. 2), the Document Encoder consists of a single layer of either GRU (Chung et al., 2014) or LSTM (Hochreiter and Schmidhuber, 1997) units, where the final state vector is passed on to the classifier as the input. The classifier is built using a series of dense layers, gradually downsized to a single node which is trained to predict the probability of the petitioner party winning.

Figure 3: Transformer Encoder based Model

The Transformer Encoder based model architecture (Fig. 3) is built using the encoder component of the original Transformer implementation (Vaswani et al., 2017). The Document Encoder takes the sequence of sentence vectors as the input and adds the positional encoding to it. The positional encoding vector is calculated using the dimension of the input sentence vectors. Then the processed vector sequence is passed through a series of internal encoder layers. These encoder layers are duplicates of the same configuration and are built up of multi-head attention and position-wise feed forward layers. As per the definition of the Transformer Encoder by Vaswani et al. (2017), the multi-head attention layer performs scaled dot-product attention on the input sequence. A normalization layer is used after the multi-head attention layer and the position-wise feed forward network to normalize the output vector of each layer. Global average pooling is used to reduce the 3-D output tensor of the final encoder to a 2-D tensor, which is passed as input to the classifier.

4 Experiments and Results

Experiments are performed by varying the Document Encoder model configurations and the application of additional details to case sentences using the Critical Sentence Identification model (Jayasinghe et al., 2021). The Document Encoder is experimented with using different RNN configurations and Transformer Encoder configurations. To identify the number of layers best suited for the transformer encoder, experiments were run with 6, 3, 2, and 1 layers. As seen in Fig. 4, the best number of layers for the transformer encoder was found to be 1 in this case. RNN and Transformer Encoder components are used to encode the case documents. RNN models are experimented with in both GRU and LSTM variations. Pre-trained Sentence-BERT by Reimers and Gurevych (2019), based on BERT (Devlin et al., 2018), and DistilBERT by Sanh et al. (2019), a distilled version of RoBERTa-base (Liu et al., 2019), are used for sentence embedding. Model building, training and evaluation are done using Tensorflow v2.8.

The following configurations were used for the Transformer Encoder:

• Number of encoder layers = 1
• Number of attention heads = 8
• Vector dimension = 768

The classifier model, which predicts the probability of the petitioner winning, takes the output from the document encoder as the input and is configured using a sequence of dense layers starting from 128 nodes.

Figure 4: Number of Layers vs Validation Accuracy

Due to its suitability for handling datasets with imbalanced classes, Binary Focal Loss (Lin et al., 2017) is used to calculate the loss at each training step. At each training step, Focal Loss down-weights the loss for confidently classified examples of the dominant class and up-weights the loss for incorrectly classified examples of the minority class.

We summarize our findings in Table 1. It is curious to note that GRU with Sentence-BERT edges out the random baseline of 50% by only a narrow margin. This is a testament to the fact that the problem of Winning Party Prediction is non-trivial. The additional details provided by the critical sentence identification model (Jayasinghe et al., 2021) proved to be effective in predicting the winning party, as per the results depicted in Table 1. This improvement is more visible in the case of GRUs than in the case of Transformers. Nevertheless, even with transformers, the improvement is relatively significant. DistilBERT (Sanh et al., 2019) embeddings have clearly outperformed pure Sentence-BERT (Reimers and Gurevych, 2019) configurations. The best performing configuration, therefore, is to use transformer encoders with DistilBERT sentence embeddings and the critical sentence annotation.

5 Conclusion and Future Work

Legal domain corpora carry their own complexities due to the nature of the domain. Therefore, applying NLP in the legal domain requires domain specific approaches. In this study, we showed that our model with critical sentence annotation and a transformer encoder using RoBERTa based sentence embedding is able to


Model                 Sentence Embedding   Critical Sentence Annotation   Accuracy   Macro F1
GRU                   Sentence-BERT        N                              56.32      53.14
GRU                   DistilBERT           N                              65.71      57.14
GRU                   DistilBERT           Y                              73.05      63.27
LSTM                  DistilBERT           Y                              72.04      65.52
GRU - Bidirectional   DistilBERT           Y                              75.46      63.88
Transformer Encoder   Sentence-BERT        N                              69.26      60.85
Transformer Encoder   DistilBERT           N                              74.88      64.96
Transformer Encoder   DistilBERT           Y                              75.75      66.54

Table 1: Winning Party Prediction Metrics
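The Binary Focal Loss used during training (Lin et al., 2017) can be sketched as follows. This is a minimal single-example illustration, not the authors' training code; the gamma and alpha values are the common defaults from Lin et al. (2017) and are assumptions here.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one example, where p is the predicted probability of
    the positive class (petitioner wins) and y is the true label (0 or 1).

    The modulating factor (1 - p_t)**gamma shrinks the loss of examples that
    are already classified confidently, so training focuses on the hard,
    often minority-class, examples.
    """
    p_t = p if y == 1 else 1.0 - p              # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With gamma = 0 and alpha = 0.5 this reduces to a scaled binary cross-entropy; increasing gamma suppresses the contribution of well-classified examples.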

obtain an accuracy of 75.75%, outperforming other models. The need for domain-specific models can also be seen from the increase in accuracy when the critical sentence annotation is used. This system can be horizontally extended by adding more sub models to provide features to the final model. While the results obtained by DistilBERT (Sanh et al., 2019) sentence embeddings are impressive, extending the conclusions drawn by Sugathadasa et al. (2017) for word embeddings, it can be postulated that legal-domain specific sentence embeddings would potentially reap better results. Also, as future work, the impact of training models with supervised and unsupervised approaches should be experimented with, as the legal domain has a deficit of labeled data compared to its large corpora.

References

Nikolaos Aletras, Dimitrios Tsarapatsanis, Daniel Preoţiuc-Pietro, and Vasileios Lampos. 2016. Predicting judicial decisions of the European Court of Human Rights: a natural language processing perspective. PeerJ Computer Science, 2:e93.

Melonie de Almeida, Chamodi Samarawickrama, Nisansa de Silva, Gathika Ratnayaka, and Amal Shehan Perera. 2020. Legal Party Extraction from Legal Opinion Text with Sequence to Sequence Learning. In 2020 20th International Conference on Advances in ICT for Emerging Regions (ICTer), pages 143–148. IEEE.

Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.

Cornell Law School. 2020a. Case law. https://www.law.cornell.edu/wex/case_law. Accessed: 2022-08-18.

Cornell Law School. 2020b. Legal party. https://www.law.cornell.edu/wex/party. Accessed: 2022-08-18.

Cornell Law School. 2020c. Petitioner. https://www.law.cornell.edu/wex/petitioner. Accessed: 2022-08-18.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Michael C Dorf. 1994. Prediction and the rule of law. UCLA L. Rev., 42:651.

Viraj Gamage, Menuka Warushavithana, Nisansa de Silva, Amal Shehan Perera, Gathika Ratnayaka, and Thejan Rupasinghe. 2018. Fast Approach to Build an Automatic Sentiment Annotator for Legal Domain using Transfer Learning. In Proceedings of the 9th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 260–265.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Oliver Wendell Holmes. 1920. The path of the law. Collected Legal Papers, pages 167–173.

Sahan Jayasinghe, Lakith Rambukkanage, Ashan Silva, Nisansa de Silva, and Amal Shehan Perera. 2021. Critical sentence identification in legal cases using multi-class classification. In 2021 IEEE 16th International Conference on Industrial and Information Systems (ICIIS), pages 146–151. IEEE.

V. Jayawardana, D. Lakmal, Nisansa de Silva, A. S. Perera, K. Sugathadasa, B. Ayesha, and M. Perera. 2017a. Word Vector Embeddings and Domain Specific Semantic based Semi-Supervised Ontology Instance Population. International Journal on Advances in ICT for Emerging Regions, 10(1):1.

Vindula Jayawardana, Dimuthu Lakmal, Nisansa de Silva, Amal Shehan Perera, Keet Sugathadasa, and Buddhi Ayesha. 2017b. Deriving a Representative Vector for Ontology Classes with Instance Word Vector Embeddings. In 2017 Seventh International Conference on Innovative Computing Technology (INTECH), pages 79–84. IEEE.

Vindula Jayawardana, Dimuthu Lakmal, Nisansa de Silva, Amal Shehan Perera, Keet Sugathadasa, Buddhi Ayesha, and Madhavi Perera. 2017c. Semi-Supervised Instance Population of an Ontology using Word Vector Embedding. In Advances in ICT for Emerging Regions (ICTer), 2017 Seventeenth International Conference on, pages 1–7. IEEE.

Daniel Martin Katz, Michael J Bommarito II, and Josh Blackman. 2014. Predicting the behavior of the supreme court of the united states: A general approach. arXiv preprint arXiv:1407.6333.

Julia Kreutzer, Isaac Caswell, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, et al. 2022. Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets. Transactions of the Association for Computational Linguistics, 10:50–72.

André Lage-Freitas, Héctor Allende-Cid, Orivaldo Santana, and Lívia de Oliveira-Lage. 2019. Predicting brazilian court decisions. arXiv preprint arXiv:1905.10348.

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, pages 2980–2988.

Yihung Liu and Yen-Liang Chen. 2017. A two-phase sentiment analysis approach for judgement prediction. Journal of Information Science, 44.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.

Chanika Ruchini Mudalige, Dilini Karunarathna, Isanka Rajapaksha, Nisansa de Silva, Gathika Ratnayaka, Amal Shehan Perera, and Ramesh Pathirana. 2020. SigmaLaw-ABSA: Dataset for aspect-based sentiment analysis in legal opinion texts. arXiv preprint arXiv:2011.06326.

Peng Qi, Yuhao Zhang, Yuhui Zhang, Jason Bolton, and Christopher D Manning. 2020. Stanza: A python natural language processing toolkit for many human languages. arXiv preprint arXiv:2003.07082.

Isanka Rajapaksha, Chanika Ruchini Mudalige, Dilini Karunarathna, Nisansa de Silva, Gathika Rathnayaka, and Amal Shehan Perera. 2020. Rule-based approach for party-based sentiment analysis in legal opinion texts. arXiv preprint arXiv:2011.05675.

Isanka Rajapaksha, Chanika Ruchini Mudalige, Dilini Karunarathna, Nisansa de Silva, Amal Shehan Perera, and Gathika Ratnayaka. 2021. Sigmalaw PBSA - A Deep Learning Model for Aspect-Based Sentiment Analysis for the Legal Domain. In International Conference on Database and Expert Systems Applications, pages 125–137. Springer.

G. Ratnayaka, T. Rupasinghe, Nisansa de Silva, M. Warushavithana, V. Gamage, M. Perera, and A. S. Perera. 2019a. Classifying Sentences in Court Case Transcripts using Discourse and Argumentative Properties. ICTer, 12(1).

Gathika Ratnayaka, Thejan Rupasinghe, Nisansa de Silva, Viraj Gamage, Menuka Warushavithana, and Amal Shehan Perera. 2019b. Shift-of-Perspective Identification Within Legal Cases. In Proceedings of the 3rd Workshop on Automated Detection, Extraction and Analysis of Semantic Information in Legal Texts.

Gathika Ratnayaka, Thejan Rupasinghe, Nisansa de Silva, Menuka Warushavithana, Viraj Gamage, and Amal Shehan Perera. 2018. Identifying Relationships Among Sentences in Court Case Transcripts Using Discourse Relations. In 2018 18th International Conference on Advances in ICT for Emerging Regions (ICTer), pages 13–20. IEEE.

Gathika Ratnayaka, Nisansa de Silva, Amal Shehan Perera, and Ramesh Pathirana. 2020. Effective approach to develop a sentiment annotator for legal domain in a low resource setting. arXiv preprint arXiv:2011.00318.

Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.


Chamodi Samarawickrama, Melonie de Almeida,


Amal Shehan Perera, Nisansa de Silva, and
Gathika Ratnayaka. 2021. Identifying legal
party members from legal opinion texts using
natural language processing. Technical report,
EasyChair.
Chamodi Samarawickrama, Melonie de Almeida,
Nisansa de Silva, Gathika Ratnayaka, and
Amal Shehan Perera. 2020. Party Identification
of Legal Documents using Co-reference Resolu-
tion and Named Entity Recognition. In 2020
IEEE 15th International Conference on Indus-
trial and Information Systems (ICIIS), pages
494–499. IEEE.
Victor Sanh, Lysandre Debut, Julien Chaumond,
and Thomas Wolf. 2019. DistilBERT, a distilled
version of BERT: smaller, faster, cheaper and
lighter. ArXiv, abs/1910.01108.
Rafe Athar Shaikha, Tirath Prasad Sahua, and Veena Anand. 2020. Predicting Outcomes of Legal Cases based on Legal Factors using Classifiers. Procedia Computer Science, 167:2393–2402.
Keet Sugathadasa, Buddhi Ayesha, Nisansa
de Silva, Amal Shehan Perera, Vindula Jayawar-
dana, Dimuthu Lakmal, and Madhavi Perera.
2017. Synergistic Union of Word2Vec and Lex-
icon for Domain Specific Semantic Similarity.
IEEE International Conference on Industrial
and Information Systems (ICIIS), pages 1–6.
Keet Sugathadasa, Buddhi Ayesha, Nisansa
de Silva, Amal Shehan Perera, Vindula Jayawar-
dana, Dimuthu Lakmal, and Madhavi Perera.
2018. Legal Document Retrieval using Docu-
ment Vector Embeddings and Deep Learning.
In Science and Information Conference, pages
160–175. Springer.
Ashish Vaswani, Noam Shazeer, Niki Parmar,
Jakob Uszkoreit, Llion Jones, Aidan N Gomez,
Łukasz Kaiser, and Illia Polosukhin. 2017. At-
tention is all you need. Advances in neural in-
formation processing systems, 30.

Bernhard Waltl, Georg Bonczek, Elena Scepankova, Jörg Landthaler, and Florian Matthes. 2017. Predicting the outcome of appeal decisions in Germany's tax law. In Electronic Participation, pages 89–99, Cham. Springer International Publishing.
