Shruti Rijhwani CV
Shruti Rijhwani Google Scholar
[email protected]
Education
Carnegie Mellon University Pittsburgh, PA
Ph.D. in Language Technologies, School of Computer Science 2018 – May 2022 (expected)
– Advisor: Graham Neubig
– Research Focus: Natural Language Processing
∗ Deep learning for multilingual and low-resource information extraction and sequence-to-sequence tasks.
∗ Select publications: TACL 2021 [1], EMNLP 2020 [6], ACL 2020 [7], TACL 2020 [9], AAAI 2019 [14]
– Awards: Forbes 30 Under 30 in Science 2022 [link]; Bloomberg Ph.D. Fellowship
Work Experience
Bloomberg AI New York, NY
Research Intern (Mentors: Jing Wang, Daniel Preoțiuc-Pietro, Anju Kambadur) May – August 2020
– Project: Multilingual named entity recognition for noisy text outputs from OCR systems.
– Skills: Python, PyTorch
Awards and Honors
• Forbes 30 Under 30 in Science [link] 2022
• Bloomberg Ph.D. Fellowship 2018–2021
• Honorable Mention, Society of Fellows in Critical Bibliography Essay Prize (awarded for [11]) 2021
• Carnegie Mellon University Graduate Research Fellowship 2016–2018
• Best Presentation Award, Student Research Symposium at Carnegie Mellon University 2018
• Best Poster Award, Machine Learning Project Symposium at Carnegie Mellon University 2016
Invited Talks
• AI4Bharat Talk Series 2022
“No Language Left Behind: Unlocking Text Data for Under-Resourced Languages”
• Microsoft Research India NLP Talk Series 2022
“No Language Left Behind: Unlocking Text Data for Under-Resourced Languages”
• NeuralSpace Platform Launch Event 2022
“Cross-lingual Transfer for Low-Resource NLP”
• Featured Session, Grace Hopper Conference 2021
“Digitizing Endangered Language Texts: How NLP Can Help Language Revitalization”
• SIGTYP Lecture Series 2021
“Cross-Lingual Entity Linking for Low-Resource Languages” [video]
• George Mason NLP Research Group 2021
“OCR Post-Correction for Endangered Language Texts”
• Language Technologies Institute Colloquium at Carnegie Mellon University 2020
“Zero-shot Neural Transfer for Cross-lingual Entity Linking” [video]
• University of Utah Data Science Seminar 2020
“Entity Linking for Low-Resource Languages” [video]
• Microsoft Research India Podcast 2020
“Building a Career in Research Through the MSR India Research Fellow Program” [podcast]
• Natural Language, Dialog, and Speech Symposium at the New York Academy of Sciences 2019
“Temporally-Aware Named Entity Recognition”
Research Mentoring
Master's Students
• Adithya Pratapa
Project: Morphosyntactic Evaluation of Generated Texts; published at EMNLP 2021 [3].
• Rosaline Su
Project: Dependency Induction with Visual Perception; published at CoNLL 2021 [4].
• Shuyan Zhou
Project: Entity Linking for Low-Resource Languages; published at TACL 2020 [9] and DeepLo 2019 [16].
• Aman Madaan
Project: Practical Comparable Data Collection for Low-Resource Languages via Images; published at PML4DC [13].
• Yu-Hsiang Lin, Chian-Yu Chen, Jean Lee, Zirui Li, Yuyan Zhang
Project: Choosing Transfer Languages for Cross-Lingual Learning; published at ACL 2019 [15].
Undergraduate Students
• Lindia Tjuatja
Project: Transfer Learning for OCR Post-Correction; published at WiNLP 2021 [5].
Teaching and Academic Service
• Teaching Assistant at Carnegie Mellon University
Machine Translation and Sequence-to-Sequence Models (11-731) Fall 2019
Search Engines (08-710) Spring 2018
Data Mining (08-711) Spring 2018
• LTI Diversity, Equity, and Inclusion Committee at CMU, 2021–2022
• Organizer, Workshop on Computational Methods for Endangered Languages at ACL 2022
• Chair, Student Research Workshop at ACL 2020
• Organizer, CMU SCS Graduate Application Support Program, 2020
• Diversity and Inclusion Committee at NAACL 2019
• Mentoring
CMU Language Technologies Mentoring Program (for new graduate students; 2021), CMU Graduate Application
Support Mentor (2020, 2021), CMU AI Mentoring Program (for undergraduates; 2019, 2020, 2021)
• Reviewing
AAAI 2022, AAAI 2021, ARR 2021, EACL 2021, NAACL 2021, ACL 2021, AmericasNLP 2021, AAAI 2020, HAMLETS
2020, LREC 2020, EMNLP 2020, *SEM 2020, AACL SRW 2020, AfricaNLP 2020, TALLIP 2019, CALCS 2018
Publications
[1] “Lexically-Aware Semi-Supervised Learning for OCR Post-Correction” [pdf]
S. Rijhwani, D. Rosenblum, A. Anastasopoulos, G. Neubig
Transactions of the Association for Computational Linguistics (TACL), 2021.
[9] “Improving Candidate Generation for Low-resource Cross-lingual Entity Linking” [pdf]
S. Zhou, S. Rijhwani, J. Wieting, J. Carbonell, G. Neubig
Transactions of the Association for Computational Linguistics (TACL), 2020.
[10] “AlloVera: A Multilingual Allophone Database” [pdf]
D. R. Mortensen, X. Li, P. Littell, A. Michaud, S. Rijhwani, A. Anastasopoulos, et al.
Language Resources and Evaluation Conference (LREC), 2020.
[11] “Damaged Type and Areopagitica’s Clandestine Printers” [pdf] [press coverage]
C. N. Warren, P. Williams, S. Rijhwani, M. G’Sell
Milton Studies, 2020.
[12] “A Summary of the First Workshop on Language Technology for Language Documentation and Revitalization” [pdf]
G. Neubig, S. Rijhwani, A. Palmer, J. MacKenzie, H. Cruz, X. Li, M. Lee, et al.
First Joint SLTU and CCURL Workshop, 2020.
[13] “Practical Comparable Data Collection for Low-Resource Languages via Images” [pdf]
A. Madaan, S. Rijhwani, A. Anastasopoulos, Y. Yang, G. Neubig
Practical Machine Learning for Developing Countries Workshop (PML4DC), 2020.
[18] “Estimating Code-Switching on Twitter with a Novel Generalized Word-Level Language Detection Technique” [pdf]
S. Rijhwani, R. Sequiera, M. Choudhury, K. Bali, C. S. Maddila
Annual Meeting of the Association for Computational Linguistics (ACL), 2017.
[19] “Does the Geometry of Word Embeddings Help Document Classification? A Case Study on Persistent Homology-Based Representations” [pdf]
P. Michel∗, A. Ravichander∗, S. Rijhwani∗
Second Workshop on Representation Learning for NLP, 2017.
[20] “Code-Switching as a Social Act: The Case of Arabic Wikipedia Talk Pages” [pdf]
M. Yoder, S. Rijhwani, C. Rosé, L. Levin
Second Workshop on NLP and Computational Social Science, 2017.
[21] “Understanding Language Preference for Expression of Opinion and Sentiment: What do Hindi-English Speakers do on Twitter?” [pdf]
K. Rudra, S. Rijhwani, R. Begum, K. Bali, M. Choudhury, N. Ganguly
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
[22] “Experiments with Cross-lingual Systems for Synthesis of Code-Mixed Text” [pdf]
S. Sitaram, S. K. Rallabandi, S. Rijhwani, A. W. Black
Ninth ISCA Speech Synthesis Workshop (SSW), 2016.