Proposed PHD in Data Science
By:
Doctoral Program Committee, HDSI (AY 2019)
Gal Mishne, Virginia De Sa, Yian Ma, Jingbo Shang, Vineet Bafna, Lily Xu, Michael Holst,
Rayan Saab, Armin Schwartzman, George Sugihara, Dimitris Politis.
Contacts:
Academic:
Rajesh K. Gupta
Director, Halıcıoğlu Data Science Institute (HDSI)
(858) 822-4391
[email protected]
Administrative:
Yvonne Wollmann
Student Affairs Manager
Halıcıoğlu Data Science Institute (HDSI)
(858) 246-5427
[email protected]
Version History:
April 12, 2020: Version 1.1 submitted for preliminary review by HDSI faculty.
Oct 6, 2020: Version 2.0 submitted for administrative review.
Oct 12, 2020: Version 2.2 updated with inputs from HDSI faculty council.
Nov 30, 2020: Version 3.1 submitted to Graduate Council for review.
Jan 28, 2021: Version 4.0 revised and updated based on feedback from the Graduate Council.
Online Link: https://bit.ly/HDSI-PHD
A Proposal for a Program of Study in Data
Science leading to a degree in
Doctor of Philosophy in Data Science (PhD/DS)
Executive Summary
The Halıcıoğlu Data Science Institute proposes a doctoral degree program in “Data Science”
(PhD/DS) to serve the need for advanced graduate studies in the area of Data Science, a field in
which HDSI currently offers a well-received Bachelor of Science degree as a part of its academic
mission “to promote a unified campus-wide approach to research and teaching in Data Science.”
The proposed doctoral program will join similar degree programs coming up across the country
as the emerging field continues to define its core intellectual thrusts and its academic community.
The nascent field of Data Science spans mathematical models, computational methods and
analysis tools for navigating and understanding data in a broad range of application domains.
The scientific community in the area is accordingly drawn from many different existing
disciplines driven in the near term by the immediate demand and limited success of applying
data science methods and tools in application areas such as information technology,
communications, and financial markets. These early successes have led to a demand for data
scientists in a whole range of industries from drug discovery to healthcare management, from
manufacturing to enterprise business processes as well as government organizations with the
expectation to do “data-driven” tasks such as the ability to create mathematical models of data,
identify trends and patterns using suitable algorithms and present the results in an effective
manner. However, there is also a growing realization that scientific knowledge is not enough for
data scientists who must also demonstrate awareness of ethical responsibilities in their work and
outcomes.
The goal of the doctoral program is to teach students knowledge, skills and awareness
required to perform data-driven tasks, and using this shared background, lay the foundation for
research that expands the boundaries of knowledge in Data Science. To achieve these goals, the
graduate program is structured as a set of three key requirements related to coursework,
examinations and dissertation compliance. The course preparation consists of breadth and depth
requirements of 48 units taken for letter grade and 4 units of satisfactory completion of
professional preparation courses. After a required preliminary advisory assessment at the end of
the first year, the examination requirements consist of a research qualifying examination and a
dissertation defense examination. The dissertation compliance requirement is an approved thesis
document that specifically meets reproducibility requirements. The implementation plan is
designed to open the program for internal transfers in Fall 2021 with a formal announcement and
new admissions starting Fall 2022.
The nascent discipline of Data Science currently draws researchers from diverse fields that share
a quantitative intellectual tradition, from mathematics, computer science, engineering, physical
sciences, and quantitative social sciences. Naturally, the proposed program builds upon
intellectual traditions and training in disciplines that currently constitute the primary drivers of
knowledge advances in Data Science: computer science, mathematics, statistics, and electrical
engineering. While these are the disciplines from which the majority (but not all) of the current
HDSI faculty are drawn, we are keenly aware of the importance of various use cases that are
principally driving adoption of data science advances in practice including engineering,
medicine, governance, journalism, and archeological discovery. More importantly, beyond use
cases, scholarship in Data Science must also demonstrate awareness of ethical responsibilities for
the direct role data plays in our social, cultural and personal lives.
This places a significant academic challenge on the HDSI faculty to create a program with a
well-defined intellectual core that invites and cultivates diverse intellectual traditions. Such a
program cannot simply be a collection of diverse existing topics or multiple courses and degrees
in sciences and humanities stacked on an individual, or specialization of an existing program.
Instead, a streamlined and integrated approach to curriculum is needed that is accessible to
students drawn from different undergraduate degree backgrounds. The HDSI faculty have
addressed the challenge of program accessibility in the Master of Science (MS) program recently
approved by the Graduate Council of the Academic Senate. Building upon the MS/DS program,
the doctoral program is structured to cultivate both a generalist’s penchant for persistence in
results validated by proofs and robust experimentation, and a specialist’s view of practical
impact validated by real-world demonstrations, user studies and trials.1
1 David Epstein in “Range: Why generalists triumph in a specialized world” (Riverhead Books, 2019) on
the importance of broad thinking and diverse experiences.
The educational objectives of the proposed degree program include knowledge and skills
expected of all Data Science graduates, namely the ability to: (a) collect raw data from various sources and
convert this raw data into a curated form suitable for computational modeling and analysis (e.g.,
its use in designing experiments); (b) understand learning algorithms and how to appropriately
use them in a given domain by developing effective optimization methods; (c) interpret the
results of these algorithms and iteratively drill down into the data, perform analysis, visualize
results and carry out scientific enquiry appropriate for the targeted domains.2 These educational
goals are to be achieved through required courses structured into breadth, depth and elective
groups. A successful completion of the doctoral degree program will require a demonstrated
advance in the state-of-the-art in data science evidenced through traditional means of academic
research success: peer reviewed publications, software (tools) or system artifacts and evidence of
generalizability and reproducibility documented in a well-written and approved dissertation
document. These requirements are discussed in detail in Section 2.10.
A successful execution of the proposed program also induces imperatives and resource
commitments by the Institute that are discussed later in this document. These include effective
partnership with academic units, institutes and centers for maximum exposure of potential
domain experts to graduate student training including any rotation programs, necessary teaching
capacity by HDSI faculty for timely graduation of students, and essential advising and
counseling services for the students to appropriately guide them towards graduation and into
post-graduation careers.
2 These conclusions have been arrived at through discussions within the HDSI faculty council informed by
national debate on this subject organized in a series of meetings by the National Academy of Sciences,
Division of Engineering and Physical Sciences under the “Roundtable on Data Science Post-Secondary
Education”, 2016-17.
Data, collected or synthesized, is the primary means of such knowledge exploration and
integration. Historically, data analysis has been a domain of Statistics, a field that has reached
across multiple centuries. The growth of scientific enquiry, especially through the nineteenth-century
post-Napoleonic period of quantitative scientific discovery, relied upon calculus and
probability to understand measurement data. Statistical analysis spread widely to many areas of
human enquiry, in particular areas of the social sciences such as economics and clinical psychology.
These efforts contributed to significant growth in statistics.
Statistics departments are now common in most universities. At UC San Diego, while there is no
department of Statistics, statistics faculty are part of the Department of Mathematics and HDSI
on the General Campus, and also in the Division of Biostatistics in the School of Medicine. As
Statistics matured with strong foundational results and practical methods influencing many
application domains, more recent advances in computing hardware, software, engineering of
sensory devices, etc. enabled not only volumetric growth in data but also in computational means
to handle such data. Recent advances in algorithmic processing, notably machine learning, have
significantly advanced computational means for data processing. Early efforts in defining
computational means of handling large data sets and streams placed a new field of Data Science
at the intersection of statistics and computer science3 while others characterized it as a growth
area of Statistics with strong applications focus.4,5
While strong footprints of Computer Science, Mathematics and Statistics can be seen in its
origins, Data Science has emerged as a discipline in its own right to define either the core
problems of the sciences and society, or fundamental theories and underlying methods and tools
to solve these problems. Many of these problems concern reasoning, spanning intellect and
knowledge domains that are assisted by computing machines, thus referred to as machine
intelligence or artificial intelligence (AI). An independent and rich tradition in signal processing,
information theory, detection and estimation theory from Electrical Engineering has contributed
significantly to modern automation methods in AI. While AI has caught the imagination of
computer scientists and mathematicians since the early years of computing machines nearly half
a century ago, technological advances have only recently made it possible to realize answers to
some of the pressing questions such as:
3 David Blei, Padhraic Smyth, “Science and Data Science”, PNAS, August 2017.
4 Bin Yu, “Let us own Data Science”, IMS Presidential Address, October 2014.
5 David Donoho, “50 Years of Data Science”, Journal of Computational and Graphical Statistics, Dec 2017.
As we are beginning to provide answers to such questions, typically in the form of new software
and systems in various application domains such as improved automated diagnostics from
radiological images, we are beginning to face an entirely new set of sophisticated questions such
as:
● Artificial Sentience: What are the ethical, moral and business considerations when an
algorithm learns by observations and produces new products and services? Who are the
ultimate beneficiaries of these intellectual or material products: for instance in a
healthcare setting, is it the patient or the doctor being observed, the business creating
new services or the machine itself?
The list of such questions is vast and touches nearly every area of human
enquiry6. As an academic institution, fortunately our focus is limited to how knowledge advances
in the emerging domain will be achieved and how we will create a talent pool for the emerging
area. As mentioned earlier, the academic areas that have made the earliest advances in
methods, tools and systems to perform such data analysis are statistics and computer science,
especially in the context of understanding brain and cognition. Such talent is typically found in
the departments of electrical engineering and computer science, cognitive science, as well as
mathematics and statistics in the natural sciences.
New sensing, data-collection and computing devices have also brought together these domains,
enabling practitioners to relax assumptions about the nature of the process that generates the data
and use real-life datasets instead. Thus, the analysis methods can be directly interfaced with
real-life systems to actually capture and analyze real data (and sometimes in real time as well).
Without the axioms underlying data generation processes, the mathematics and statistics required
to arrive at robust answers analytically become exceedingly complex. It is also precisely in these
circumstances where computing steps in and provides us computational models and solutions
that can deliver practical answers. Yet neither mathematical analysis nor computational models
can provide generic answers as problem-solving methods (and tools) that individual
6 Recently, a number of attempts to define Grand Challenge problems in the Data Science area have
been made. Prominent among these are essays by Jeannette Wing, Bin Yu, and Xuming He & Xihong
Lin. https://hdsr.mitpress.mit.edu/pub/d9j96ne4/release/2
In this context, UC San Diego provides a rich tapestry of domain experts starting with perhaps
one of the most complex of application domains – the human brain and mind – and spread across
the triumvirate of general campus, health and marine sciences. Over the past four years the
Institute leadership has engaged deeply with a broad community of nearly 500 researchers across
the campus through many meetings in small group settings. These efforts yielded a core group of
founding faculty who came together and organized their ideas in data science. There are now
over two hundred faculty affiliated with HDSI drawn from all schools and divisions, and nearly
all departments who participate in various HDSI events, including its weekly Friday seminars
during the academic year. HDSI affiliates are organized into six research clusters shown below
and on the HDSI website under Research:
Each cluster consists of a number of interested groups where the researchers and practitioners
come together for joint research efforts in response to various research funding opportunities,
engagements with the industry, etc. HDSI provides personnel and material support to the entire
community for both proposal preparation as well as industry engagements. A complete list of 44
different research areas covered by the HDSI affiliates is available on the website.
Over the past two years, HDSI has recruited core faculty members, as well as drawn a number
of existing faculty through partial appointments, to build an active governing body, the Faculty
Council. As of this writing the Faculty Council consists of 11 full-time faculty members, 13
partially-appointed faculty members and 24 formally appointed faculty with no teaching
responsibilities at HDSI (i.e., the so-called 0% appointments). A complete list of faculty
members and their specializations is provided in Section 1.8.
Over the past year, the HDSI faculty council has worked to identify broader research
challenges that are central to Data Science as a field. A compilation of all these efforts reduced
the core areas of research to the following eight research themes, listed below, which form the
scope of the HDSI academic programs and continue to drive our faculty recruiting strategy.
3. Data Infrastructure: This theme covers the entire gamut of machines and systems that
enable us to curate, organize, visualize and navigate large datasets and
identify structure in such data; and the design, deployment and security of
systems and their software stack, including new programming
paradigms, languages and methods.
5. Digital Humanities: Digital humanities is an umbrella term that in the UCSD context spans the
social sciences, arts and humanities. The research topics include
privacy, public policy, ethics, computational social science,
computational linguistics, and philosophy of information.
6. Systems and Applications: A large and heterogeneous group of topics spanning algorithms and
demonstrable systems, their use in specific domains from medical
signal processing, economics and geospatial data systems to political and
economic systems, as well as applications in cyber-physical systems
and robotics.
In short, the demand for doctoral training in Data Science is high and continues to grow with the
need for future leaders in Data Science both in research and education. Further, the program not
only serves the HDSI mission of educating talent in the area of Data Science, but also serves as a
vehicle for continued engagement and proliferation of Data Science training across various
graduate programs through foundational, core and elective course offerings that engage domain
experts into the field of Data Science (See Section 1.5 on HDSI Strategy for Partnership with
Other Academic Units). The program provides an excellent means to create new educational
opportunities for students, especially for underserved and economically-disadvantaged student
populations who can benefit from graduate scholarships offered by HDSI as a part of its core
foundation-supported activities.
Proposal submitted for UCSD Graduate Council, reviewed by mid-December 2021: November 30, 2020
Program open for admissions (internal transfers only): Summer 2021 (July)
Program announced for new admissions: Early Fall 2021 (Application Deadline Jan 15, 2022)
How do we partner with other academic units? The more than 200 founding faculty in the HDSI
Faculty Affiliate program are a starting point for a deeper engagement with HDSI and its
governance. HDSI's relationships with other academic units are governed by jointly appointed
faculty members on the HDSI Faculty Council with specific roles in keeping campus units
apprised of HDSI plans and progress through its weekly meetings. HDSI programs are overseen
by standing committees of the HDSI Faculty Council. Before hiring any of our core full-time or
part-time HDSI faculty, it has been the role of the HDSI Faculty Council to define and develop
both the Data Science curriculum and research directions. The HDSI Faculty Council now
consists of faculty drawn from all across UC San Diego: the home departments of Council
members are in Engineering (e.g., Computer Science, Electrical Engineering), Physical
Sciences (e.g., Mathematics, Physics), Arts & Humanities (Philosophy, Visual Arts), Social
Sciences (e.g., Cognitive Science, Communication, Political Science), Medicine (e.g.,
Biostatistics and Bioinformatics, Radiology, Pediatrics), the Scripps Institution of
Oceanography, and the Supercomputer Center. With such a diverse background to draw upon,
the HDSI Faculty Council has managed to create a unifying vision for Data Science, and to steer
the Institute towards a future that is based on interdisciplinary collaboration with all units on
Campus.
Following senate regulations, the Faculty Council has developed a detailed set of Bylaws
[attached with the proposal] to facilitate the governance and growth of HDSI as an academic
unit. The HDSI Faculty Council remains open to new faculty interested in joining HDSI via a
well-defined review, advise and consent process. Using this governance structure, HDSI has
successfully conducted six joint searches in 2019, and four in 2020. The Faculty Council
currently consists of 48 faculty members:
● 11 faculty members with 100% appointments in HDSI (2 Full Professors, 1 Associate, and 8
Assistant Professors)
● 13 faculty members with joint appointments with another department (Communication,
Computer Science and Engineering, Neurobiology, Bioengineering, Mathematics,
Political Science, Biostatistics, Philosophy)
● 24 faculty members with current or proposed 0% appointments in HDSI. These are
among the original faculty council members who have guided recruiting. All of them
will eventually transition to 0% appointments with HDSI as the ongoing process
completes.
We note that in its proposed three-year hiring plan, HDSI has requested the largest number of
joint searches among all divisions on the general campus. Partners of HDSI can be found in
Program Engagements: The proposed doctoral program lists a number of new core courses that
are also part of our MS program and designed to be broadly accessible. Some of the proposed
courses will be taught by faculty in other departments, and thereby cross-listed with courses
outside HDSI. When and if relevant courses are available in other UCSD departments, students
will be encouraged to enroll in them. Notably, HDSI has partnered with the Computer Science
and Engineering Department to create an Online M.S. program that was recently approved by
the Graduate Council, and that will serve as an excellent on-ramp preparation for a few selected
students into the Ph.D. program. Indeed, HDSI currently offers scholarships to students that
cover the cost of attending online courses taught by HDSI-affiliated faculty.
Among the related programs that partially cover some of the topical areas of the PhD/DS
program are the doctoral programs in Computer Science by CSE, Electrical Engineering by ECE,
and Statistics by Mathematics. More precisely, there are specializations of these programs that feature
elective courses on machine learning, statistical learning and inference. The proposed program
is directly and entirely dedicated to Data Science and differs from existing programs in two
material ways: in the breadth of the student population it serves and in the scope of the
transdisciplinary training it provides as discussed below:
(a) The proposed degree is targeted to students drawn from a wide variety of backgrounds in
their undergraduate education in an effort to serve a diverse group of learners interested in Data
Science. This is in contrast to existing programs that either target a different population of
students or focus on subject areas specific to their domain. For instance, a doctoral degree in
Computer Science aims to admit students “with a strong academic background in computer
science and engineering and/or a related field.” Students in the PhD program must select four
courses from ten different breadth areas that include Artificial Intelligence and Robotics.
Similarly, the Machine Learning and Data Science (ML/DS) specialization in the ECE Ph.D.
program is one of 13 specializations, and one of the three “impacted” programs, that is, capacity
controlled areas along with Circuits and Robotics that are restricted to ECE students with a
required Bachelor’s (and optionally MS) degree in Engineering, Sciences or Mathematics.
Mentored by notable researchers in the areas of information and coding theory, statistical signal
processing, robotics and controls, the doctoral specialization provides deep insights into
intellectual underpinnings for data analytics and machine learning and its various application
domains. Students in the Ph.D. program are required to meet 48 units of course work structured
into three sets of courses that cover basic knowledge of programming, linear algebra, probability
and statistics, a set of required courses and another set of technical electives. The nature of these
courses as well as CSE courses, and their coordination with HDSI courses are discussed further
below. Among other programs, the department of mathematics offers Ph.D. degrees in
mathematics with specialization either in Computational Science (CSME) or Statistics.
(b) The courses offered by the HDSI graduate programs will be available to the students in
related programs, and in fact, will be taught by faculty jointly appointed with other departments
representing domain knowledge. For instance, a faculty member jointly appointed with HDSI
and Bioengineering is planning to offer a graduate course in Biomedical Data Analysis. Such a
course will constitute a core requirement in the Bioengineering graduate program as well as an
HDSI cross-listed elective course for students with background and interest in biology and
engineering. Thus, by serving as a catalyst for creation of new courses and student support,
HDSI seeks to enhance the overall capacity of UC San Diego in serving educational and training
needs of a growing population of students whose interests extend beyond the offerings of the
existing programs.
Impact on Existing Programs: We do not anticipate any adverse material impact on existing
PhD programs in CSE, ECE, Mathematics or Cognitive Science. Their respective specializations
in Artificial Intelligence, Machine Learning/Data Science and Statistics are part of much larger
PhD programs among a dozen or so other specializations and are heavily oversubscribed by
students in their respective departments with class enrollments routinely over 200 students.
On the contrary, we expect a positive impact from increased participation of various academic
units in Data Science related subject areas (such as Computational Biology, Computational
Chemistry, Computational Social Science) that are currently inaccessible to graduate students
from other departments despite their need and demand, an assessment supported by a letter from
the Dean of the Division of Social Sciences. The proposed PhD/DS program expands the pool of
applications by enabling students from diverse backgrounds such as Economics, Cognitive
There should also be a positive impact on certain specialization area courses offered by the
partner departments: some of the data science graduate students will increase enrollments in
these classes, most of which are cross-listed with DSC courses and/or taught by HDSI or
jointly-appointed HDSI faculty in the Math, CSE, ECE departments. We expect to see a spread
of data science graduate students across half a dozen or so available area specializations.
6. Contributions to Diversity
Our vision for how the proposed program will advance UC’s goals for diversity is informed by
the founding document “HDSI Strategy for Inclusive Excellence.” It is a living document
available online at https://bit.ly/HDSI-Diversity that will be updated with information available from
the Diversity Dashboard and our surveys.
The nascent nature of the HDSI organization provides us with additional flexibility to
incorporate Equity, Diversity and Inclusion (EDI) goals into the DNA of the new institution, that
is, to embed these goals in all our processes and actions ab initio. In particular, the three
core tenets of “access & success, climate and accountability” are vigorously pursued. The
Institute has taken three concrete steps towards EDI goals that will directly impact its programs
-- including the proposed graduate program -- in the coming years. First, HDSI faculty recruiting
is carefully planned with anti-bias training required from the entire faculty council before any
faculty search is initiated. Second, every faculty member hired into HDSI has been provided with
$30K in support of EDI goals. These funds are held by the Institute and released for specific and
approved activities that advance diversity goals. The faculty members are encouraged to pool
these resources and seek additional matching resources from the Institute to launch substantive
programmatic activities.
Beyond this “access and success” part, the third element of HDSI strategy directly addresses
climate and accountability. HDSI has proposed and eventually succeeded in using its endowment
resources to identify and recruit a full-time coordinator for broadening participation
(https://bit.ly/HDSI-BPC). While a permanent position is pending and yet to be created, working
with the administration we have been able to recruit Saura Naderi who has been tasked full-time
in developing measures, establishing metrics and directing the activities related to EDI goals7.
The personnel and EDI share-pool mentioned earlier empower HDSI faculty to put into action
concrete plans -- including seminars, enrichment activities and additional counseling -- that are
implemented, tracked and accounted for. With increased direct attention to BPC (broadening
participation in computing) plans in research proposals to agencies such as the NSF, HDSI’s
7 The website https://datascience.ucsd.edu/about/dei/ provides a starting point for engagement with HDSI
personnel who are dedicated to achieving EDI goals of the Institute.
Beyond the three elements of the HDSI EDI strategy mentioned earlier, the proposed doctoral
program will also provide an excellent vehicle for deploying our fellowship support to encourage
URM participation as well as making the program accessible for academically strong but
economically disadvantaged students to ensure the program provides an affordable pathway for a
broad and diverse student population. Among the programs we have already devised and
launched8 are scholarships for graduate students (a commitment of $600K for the current year,
and likely to rise in coming years), as well as access to learning outside the classroom through
mechanisms such as EdX micromasters programs where the Institute offers financial support to
all students interested in taking on-ramp classes at their own pace. This on-ramping is a critical
element of our strategy to leverage the existing micromasters program. It enables students from
different backgrounds, who may otherwise be rejected from the doctoral program, to demonstrate
that they can do well in the program at low or no cost through our scholarship to undergraduate
students across the campus.
Beyond strategic decisions and choices to appoint personnel and devote resources, we are cultivating
a climate for faculty to conceive of new ideas that directly contribute to inclusive excellence.
Among the measures that we seek to improve are the participation rates of women and
underrepresented minorities in our classes and degree programs, retention rates, progress towards
graduation and placement results. Institutional role models and mentoring are key means that are
already implemented in the HDSI foundations and shall remain a cornerstone of HDSI faculty
recruiting and leadership advancement.
8 Please see pages 60-61 of our annual report at http://bit.ly/HDSIfirstyear.
To meet the growing scope and demand for data science, a number of UC campuses are offering
or planning to offer undergraduate and graduate programs in data science while at the same time
building new academic departments and schools of data science to support the nascent field.
Prominent among these is UC Berkeley’s Division of Computing, Data Science and Society
(CDSS) consisting of the departments of Statistics, Electrical Engineering and Computer Science
(which is jointly part of Engineering and CDSS), and the School of Information. CDSS at UC
Berkeley provides specialization in the form of a “Designated Emphasis in Computational and
Data Science and Engineering” to existing Ph.D. programs through curriculum specialization in
the individual Ph.D. programs.
At UC Davis and UC Irvine, the departments of Statistics have taken leadership in data science
as a part of their existing degree programs in Statistics, especially due to organizational
structures of the underlying departments (such as Statistics being part of a School of Information
and Computer Science at UCI). While the current focus in these emerging academic units is on
bachelor’s and master’s programs, new doctoral programs and specializations are beginning to
appear.
To a first order, none of the specializations of existing Ph.D. degree programs on other campuses
prepare students for a research career in Data Science, an important objective of HDSI’s
proposed doctoral program. More importantly, no existing program is widely accessible to
students from diverse educational backgrounds who wish to pursue Data Science and its applications. Indeed, the
structure of the proposed graduate program, built on foundation and core courses, makes it
possible for HDSI to offer a single program that admits students with training as broad as
engineering, sciences, social sciences, business and humanities. Graduates earn a
degree in Data Science with a well-defined research specialization determined by the
application domain of their dissertation research, which makes it easier for them to pursue
career paths in targeted domains.
Nationally, New York University (NYU) has offered a Ph.D. in Data Science since 2017, with five
required courses in programming, probability and statistics, big data information and
representation and nine elective courses drawn from the data science areas of machine learning,
artificial intelligence and statistics. Columbia University offers a data science specialization within its
Computer Science, Electrical Engineering, Industrial Engineering & Operations Research and
Statistics doctoral programs. Other notable national programs include Yale University’s
Statistics and Data Science program, the PhD in Data Science and Operations offered by the
Marshall School of Business at the University of Southern California, and PhD in Statistics and
The Institute faculty, and members of its faculty council, are listed below with their annual
teaching workload in HDSI. The Graduate Admissions and Graduate Program committees are among the
standing committees of the HDSI Faculty Council. These committees are supported by a
full-time academic coordinator as well as an assistant director of training programs to ensure
program operation and academic advising of the graduate students.
In addition to the 29 faculty members listed above (15.5 FTE), the Institute is also planning to
fill one teaching faculty (LSOE) and one advancing faculty diversity (AFD) position in the
current recruiting season. It anticipates that an additional 1-2 new faculty members will join the Institute,
for a total faculty strength of 16-17 FTE including 3 FTE LPSOE and 8 U18 lecturers.
Together, these provide a capacity of 51-52 courses annually by the current ladder-rank faculty, in
addition to 5 cross-listed courses and teaching by U18 continuing lecturers, for a combined
total annual capacity of 58-74 courses. The current Data Science undergraduate program
accounts for 35 courses/sections per year. Conservatively, the Institute has the capacity to offer
6-10 graduate courses per quarter, which enables it to adequately serve the proposed doctoral
program.
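The capacity estimate can be checked with a short calculation. The sketch below is illustrative only: it assumes three regular quarters per academic year (summer excluded) and treats the undergraduate load of 35 courses/sections as fixed. The constant and function names are ours, not part of the proposal.

```python
# Figures quoted in the capacity paragraph.
TOTAL_CAPACITY = (58, 74)   # combined annual course capacity (low, high)
UNDERGRAD_LOAD = 35         # undergraduate courses/sections per year
QUARTERS_PER_YEAR = 3       # assumption: fall, winter and spring only


def graduate_capacity_per_quarter(total, undergrad=UNDERGRAD_LOAD,
                                  quarters=QUARTERS_PER_YEAR):
    """Courses left for graduate teaching per quarter after the undergraduate load."""
    return (total - undergrad) / quarters


low, high = (graduate_capacity_per_quarter(t) for t in TOTAL_CAPACITY)
print(f"{low:.1f} to {high:.1f} graduate courses per quarter")
```

Under these assumptions the residual capacity is roughly 7.7 to 13 courses per quarter, so the stated figure of 6-10 graduate courses per quarter sits below both ends of that range.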
Financial Support for Doctoral Students in the program will follow the current campus support
model consisting of direct support from the Graduate Division (formerly block grant), Graduate
Student Research (GSR) support from grants and contracts, and teaching assistant (TA)
employment funding. The Graduate Division support typically corresponds to one year of
non-employment-based support including a levelized tuition (for resident and non-resident
students). The total amount of this support is based on historical enrollment, is a function of
overall contract and grant activity, and is part of the annual budgeting process. The current TA
support provides for 8-10 TA FTEs who will be drawn from the graduate student pool in Data
Science.
To ease the transition, for the first five years after the program launch, the Institute will set
aside 20% of its annual graduate student funding liability in the foundation accounts as a
contingency measure to ensure support continuity in the face of any short-term GSR funding
shortfall as the Institute faculty ramp up extramural funding support through competitive grants.
Thus, we are confident that, using a combination of resources from the Graduate Division, GSR,
TA and our foundation accounts, we will be able to guarantee five years of funding to
every doctoral student in the program. In the steady state, we plan to deploy our fellowship
support to encourage URM participation as well as make the program accessible for
academically strong but economically disadvantaged students to ensure the program provides an
affordable pathway for a broad and diverse student population.
HDSI's existing faculty members have significant experience in building, launching and
directing graduate programs in the CSE, Bioengineering, Mathematics, Cognitive Science and
Biostatistics units. The academic coordinator position was specifically designed with a view
towards broadening participation, improving student learning experience and career placement
outcomes. The Institute is planning to be physically co-located in the same building with a
segment of the Teaching and Learning Commons (TLC) starting in 2021. This co-location will
present us with additional opportunities for engagement with UC San Diego’s expertise in
improving learning experience and outcomes for all our students.
2. Program
1. Undergraduate Preparation for Admission into the Doctoral Program
The HDSI faculty have spent significant time discussing and formulating a plan that enables
maximum participation of interested students in the envisioned graduate program. Arguably,
this is the chief distinction of the campus-wide Data Science program. We are also keenly aware
of our primary obligation to ensure successful and timely completion of the graduate degree
program given the significant level of individual and institutional investment in terms of time
and resources. Balancing these requirements has required us to structure the incoming stream of
students into essentially three broad categories:
1. students who come with preparation in computing and/or information sciences at a level
to master algorithmic programming and cloud computing skills;
2. students with preparation in mathematics and statistics at a level to master probability and
statistical methodology necessary for meaningful data analysis;
3. students who enter the program from other areas of science that rely upon collecting and
analyzing observational or experimental data in order to advance scientific
understanding. These are students with a degree in natural sciences such as physics,
We note that these are broad and overlapping categories. Even when students come prepared in
both advanced computing and mathematics/statistics, Data Science research problems challenge
them to apply these skills meaningfully in diverse applications to advance knowledge.
The graduate admissions process will use text analysis methods to automatically sort
admitted students into three pools and thus drive the subsequent advising process, including advance
communication to the students regarding their preparation options using online and other offerings
by UC San Diego and other organizations. HDSI student advising will recruit an advisor
dedicated to the graduate program and will develop pathways for newly admitted
students to take specific upper-level undergraduate courses from different areas, in order to
solidify their backgrounds where there is a perceived weakness.
2. Admission Requirements
A Ph.D. degree in Data Science is an advanced degree that prepares students for leadership in
data science research in academia, industry or civic organizations. To be successful in this
program, the students must have a background in quantitative analysis typically seen in degree
programs with substantial mathematical preparation and programming skills. Course work or
equivalent experience in programming, calculus, probability and statistics is required.
3. Foreign Language
A demonstrated proficiency in English is expected for international applicants. Foreign language
proficiency is not required for this degree.
[9] http://senatestage.ucsd.edu/Operating-Procedures/Senate-Manual/Regulations/715
● Annual review of the progress in the doctoral program by the graduate committee of
HDSI faculty council;
● Teaching requirements, including completion of a teacher training course (DSC 599) and a
minimum of one quarter of teaching experience at a half-time (50%) appointment as a
Teaching Assistant over the course of the degree program;
Time Limits: Assuming a student has no deficiencies and is enrolled full-time in the program,
the normative time is 3 years pre-candidacy and 2 years in candidacy. Extension of
total time from matriculation to degree beyond six years will require petition and approval from
the Graduate Division. HDSI has instituted several mechanisms and incentives to ensure
expeditious time-to-degree. These include a full-time graduate student advisor in HDSI
Graduate Affairs [11], preliminary assessment examination and advisory in the first year, and annual
[10] This exception is stipulated in view of a large number of formally appointed faculty on HDSI faculty
council (at 25% or 0%) drawn from different departments and divisions thus making it impossible for a
student to find an “outside” faculty member in some areas.
[11] Academic and career advising are among the highest profile investments by HDSI and stipulated
explicitly as a part of the founding gift agreement for the Institute. We plan to build a portal and services
5. Plan of Study
The program plan will follow Plan A consistent with Regulation 715 of the San Diego
Division of the Academic Senate.
Before admission to candidacy for the Ph.D. degree, the student must have passed a
preliminary assessment examination conducted by a committee constituted by the Graduate
Committee (GradCom) of the HDSI Faculty Council. This committee shall not include any
assigned or selected research advisor.
The doctoral dissertation committee, chaired by the academic research advisor, shall be
appointed by the Dean of Graduate Studies under the authority of the Graduate Council of the
Academic Senate. The committee members shall be chosen from at least two departments, and at
least two members shall represent academic specialities that differ from the student’s chosen
specialization. In all cases, the doctoral committee will include one tenured or emeritus UCSD
faculty member from outside HDSI. In exceptional conditions, the Graduate Division may be
petitioned to allow a faculty member with a home department outside of HDSI and an appointment
of 25% or less in HDSI to meet this requirement. Additional rules per Regulation 715 [12] on the
composition and conduct of the doctoral committee shall apply.
6. Unit Requirements
For the conferral of the Ph.D. degree in Data Science, 48 units (12 courses) must
be taken for a letter grade and 4 units of professional preparation must be taken for a
passing (satisfactory) grade. The professional preparation consists of 1 unit of faculty research
seminar, 2 units of TA/tutor training and 1 unit of a survival skills course. Out of the 12 courses, at
least 10 must be graduate-level courses; at most two can be upper-level undergraduate courses.
36 units (9 courses) must be completed within six quarters from the start of the degree program.
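The unit rules above can be expressed as a simple checklist. The following is a minimal sketch, not an official audit tool: the `Course` record, the course labels and the function name are hypothetical, and the six-quarter check simply counts every course completed by quarter six.

```python
from dataclasses import dataclass

REQUIRED_COURSES = 12        # 48 letter-graded units at 4 units per course
MAX_UNDERGRAD_COUNTED = 2    # at most two upper-level undergraduate courses
EARLY_COURSES = 9            # 36 units due within the first six quarters
EARLY_QUARTERS = 6
# Professional preparation units, taken for a satisfactory grade.
PROFESSIONAL_UNITS = {"research seminar": 1, "TA/tutor training": 2, "survival skills": 1}


@dataclass
class Course:
    code: str        # hypothetical label, not a real catalog entry
    graduate: bool   # False means upper-level undergraduate
    quarter: int     # quarter of completion, counted from program start


def unmet_requirements(courses, professional):
    """Return descriptions of unmet unit requirements (empty list if none)."""
    problems = []
    grad = sum(1 for c in courses if c.graduate)
    undergrad = len(courses) - grad
    # Undergraduate courses beyond the cap do not count toward the 12.
    countable = grad + min(undergrad, MAX_UNDERGRAD_COUNTED)
    if countable < REQUIRED_COURSES:
        problems.append(f"only {countable} of {REQUIRED_COURSES} countable courses")
    early = sum(1 for c in courses if c.quarter <= EARLY_QUARTERS)
    if early < EARLY_COURSES:
        problems.append(f"only {early} courses within {EARLY_QUARTERS} quarters")
    for name, units in PROFESSIONAL_UNITS.items():
        if professional.get(name, 0) < units:
            problems.append(f"missing professional preparation units: {name}")
    return problems


# Example plan: ten graduate courses over five quarters plus two undergraduate courses.
plan = [Course(f"GRAD-{i}", True, i // 2 + 1) for i in range(10)]
plan += [Course("UD-1", False, 1), Course("UD-2", False, 2)]
print(unmet_requirements(plan, dict(PROFESSIONAL_UNITS)))
```

For the example plan the function returns an empty list; dropping any one course, or omitting a professional preparation unit, produces a corresponding entry.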
Data Management & Data Systems, Data Security | DSC 10X | DSC 20X | Introductory & Mezzanine Courses
Data Science Theoretical Foundations (builds upon lower division DSC 4X series) | DSC 14X | DSC 24X |
Applied Machine Learning: Data Mining (incl. Graph mining, time-series mining), recommender systems, ML-based vision, Deep learning applications. Natural Language Processing | DSC 15X | DSC 25X | Multiple domain-specific ML applications.
Arts, Humanities, Society, Policy and Social Sciences | DSC 16X | DSC 26X |
3. Program Structure
Courses in the Data Science graduate program are structured into three groups: Group A,
Group B and Group C. Group A courses are introductory courses taught at the level of
undergraduate senior or mezzanine courses. Group B courses are core graduate-level courses with
prerequisites from Group A. Group C courses are advanced, specialized and free-standing
courses, often part of the required courses in the Data Science specialization of graduate
programs in other departments. In all three groups, required courses are indicated as such; they
cannot be substituted by other courses without exception approval from the graduate program
committee.
Group A: Preparatory Knowledge and Skill Areas [Credit for maximum of 3 courses]
We have identified five important knowledge and skill areas necessary for understanding (and
advancing) core data science knowledge. It is, therefore, important that all our entering students
Given the breadth of the applicant pool, it is understandable that among the incoming students
interested in the Data Science graduate program, there may be some lacking the basic
background at the undergraduate level in one or more of the above areas. This would prevent
them from taking the relevant graduate level courses. Accordingly, we have devised five
foundational knowledge area courses described in the catalogue copy and listed below. A student
can receive credit towards the Ph.D. degree for a maximum of three courses from the list of
courses below. We expect that students graduating from quantitative undergraduate backgrounds
would have taken a majority of these courses (or equivalent). Students with an undergraduate
major or minor in Data Science would have taken courses in all five
areas mentioned above, thus obviating the need for background preparation.
1. DSC 200: Data Science Programming [New], 4 units: Computing structures and
programming concepts such as object orientation, data structures such as queues, heaps,
lists, search trees and hash tables. Laboratory skills include Jupyter notebooks, RESTful
interfaces and various software development kits (SDKs). Instructors: Aaron Fraenkel,
Yoav Freund
2. DSC 202: Data Management for Data Science [New], 4 units: Principles of data
management, relational data model, relational algebra, SQL for data science, NoSQL Databases
3. DSC 210: Numerical Linear Algebra [New], 4 units: Linear algebraic systems, least
squares problems, orthogonalization methods, ill-conditioned problems, eigenvalue and
singular value decomposition, principal component analysis. Instructors: Rayan Saab,
Alex Cloninger, Gal Mishne
5. DSC 212: Probability and Statistics for Data Science [New], 4 units: Probability,
random variables, distributions, central limit theorem, maximum likelihood estimation,
method of moments, confidence intervals, hypothesis testing, Bayesian estimation,
introduction to simulation and the bootstrap. Instructors: Jelena Bradic, Dimitris Politis,
Armin Schwartzman
Group B: Core Knowledge and Skill Areas [Ph.D. students take at least 6 courses]
Building upon the foundation courses in Group A, the graduate program identifies several core
graduate courses. Four core courses are required for all Ph.D. students, including those with a
Bachelor’s degree in Data Science. The four required courses are:
1. DSC 240: Machine Learning [New], 4 units: A graduate level course in machine
learning algorithms: decision trees, principal component analysis, k-means, clustering,
logistic regression, random forests, boosting, neural networks, deep learning. Instructors:
Misha Belkin, Yian Ma, Jelena Bradic, Gal Mishne, Virginia de Sa
2. DSC 260: Data Ethics and Fairness [New], 4 units: Ethical considerations regarding
privacy and control of information. Principles of fairness, accountability, and
transparency. Use of metadata to inform algorithms. Algorithmic fairness. Policy
issues such as the Fair Information Practices Principles Act, and laws concerning the
“right to be forgotten.” Instructors: R. Stuart Geiger, David Danks
(*) Depending on academic preparation, a Ph.D. student can take an advanced course on Applied
Statistics, such as MATH 282B, instead of DSC 241. Similarly, instead of DSC 204A, a student
can take a course on Algorithms, such as CSE 202: Design and Analysis of Algorithms.
In addition, a doctoral student must select at least 2 out of the following 8 core courses [13].
5. DSC 203: Data Visualization and Scalable Visual Analytics [New], 4 units:
Commonly used algorithms and techniques in data visualization. Interactive reasoning
and exploratory analysis through visual interfaces. Application of data visualization in
various domains including science, engineering, and medicine. Scalable interactive
methods for exploring big data through visualization. Techniques to
evaluate the effectiveness and interpretability of analytical products for diverse users to obtain
insights in support of assessment, planning, and decision making. [Prerequisite: DSC
202] Instructors: Ilkay Altintas, Juergen Schulze
6. DSC 204B: Big Data Analytics & Applications: The goal of this course is to introduce
the student to the methods and methodologies of big data analytics. Methods covered
include: I/O bottleneck and the memory hierarchy, HDFS, Spark, XGBoost and
TensorFlow. Methodologies include: writing Jupyter notebooks that can be understood and
used by people of diverse backgrounds; replicability and statistical significance.
[Prerequisite: DSC 204B] Instructor: Yoav Freund.
9. DSC 244: Large-Scale Statistical Analysis [New], 4 units: Exploratory data analysis,
diagnostics, bootstrap, large-scale (multiple) hypothesis testing, false discovery rate,
empirical Bayes methods. [This class may be cross-listed with Mathematics and/or
Biostatistics.] Instructors: Armin Schwartzman, Jelena Bradic
10. DSC 245: Introduction to Causal Inference [New], 4 units: Causal versus predictive
inference, potential outcomes and randomized experiments (A/B testing), structural
causal models (interventions, counterfactuals, causal diagram, do-operator, d-separation),
identification of causal effect (back-door and front-door criterion, do-calculus),
estimation of causal effect (matching, propensity score, g-computation, doubly robust
estimation, regression discontinuity and instrumental variables, conditional effects),
structure learning (constraint and score-based algorithms), advanced topics (mediation
and path-specific effects, bounding causal effect, selection bias, external validity and
transportability, processing missing data, causal inference in networks). [Prerequisites:
DSC 212, 240] Instructor: Babak Salimi
11. DSC 250: Advanced Data Mining [New], 4 units: Graph mining and basic text analysis
(including keyphrase extraction and generation), set expansion and taxonomy
construction, graph representation learning, graph convolutional neural networks,
heterogeneous information networks, label propagation, and truth finding. [Prerequisite:
DSC 190A or CSE 158 or equivalent] Instructor: Jingbo Shang
12. DSC 261: Responsible Data Science [New], 4 units: Responsible data management,
algorithmic fairness (fairness definitions, impossibility results, causal fairness, building
fair ML models, fairness beyond classification), algorithmic transparency (interpretability
vs. explainability, auditing black-box algorithms, algorithmic recourse), privacy and data
protection, sampling bias, reproducibility. [Prerequisites: DSC 260, 240, 245] Instructor:
Babak Salimi
Thus, together with Group A and Group C courses, doctoral students are required to take a
minimum of 5 courses for letter-grade credit. On the other end, students can satisfy all letter
grade course requirements except (satisfactory completion of professional preparation) teaching,
survival skills and research seminar courses. These students are expected to enroll in individual
research (DSC 298) in a section offered by the faculty advisor to meet residency requirements
and maintain graduate student standing during the period of dissertation research.
Group C courses aim to provide either practical experiences in chosen specialization areas, or
advanced training for students preparing for doctoral programs. The courses include required
professional preparation courses: 2 unit TA/tutor training (DSC 599), 1 unit of academic survival
skills (DSC 295) and 1 unit faculty research seminar (DSC 293), all of which must be completed
with a Satisfactory (S) grade using the S/U option.
Courses in this group also serve as a means to directly engage faculty in departments across the
campus who are directly interested in Data Science related topics and instruction. Consequently,
we make important courses taught by HDSI affiliated faculty visible to the Data Science
graduate students. However, their availability is subject to schedule and enrollment constraints of
the individual departments. Based on written approval from participating departments, courses
available in a given domain in a given year will be announced at the beginning of the academic year
with a pre-registration deadline for capacity planning purposes.
DSC 293: Faculty Research Seminar: 1 unit (S/U): Weekly faculty research seminar.
Individual HDSI colloquia and distinguished lecturers may be included at the discretion of the
instructor. Instructor: HDSI Faculty.
DSC 294: Research Rotation: 4 units (S/U): Special topics research under the direction of an
HDSI faculty member. The research topics may include training in specific research
methodologies consisting of practical laboratory skills, computational skills or proof systems in a
research group/laboratory in which the student may pursue doctoral dissertation research.
Prerequisites: Data Science graduate student standing and consent of the instructor.
DSC 295: Academia Survival Skills: 1 unit (S/U): Basic skills necessary to succeed as a
researcher in Data Science including scripting, cloud computing skills, fellowship proposal
preparation, CV preparation, writing reviews, preparing posters etc.
DSC 231: Embedded Sensing and IOT Data Models and Methods: Sensory data and control
are mediated by devices near the edge of sensor networks, referred to as IOT (Internet of Things)
devices. Components of IOT platforms: signal processing, communications/networking, control,
real-time operating systems. Interfaces to cloud computing stack, publish-subscribe protocols
such as MQTT, embedded software/middleware components, metadata schema, metadata
normalization methods, applications in selected cyber-physical systems (CPS).
Instructor: Rajesh Gupta
DSC 251: Machine Learning in Control: Estimation of stability and uncertainty, optimal
control, and sequential decision making. Instructor: Yian Ma
DSC 252: Statistical Natural Language Processing, 4 units. A deep dive into the classical NLP
pipeline: tokenization, stemming, lemmatization, part-of-speech tagging, named entity
recognition, parsing, and machine translation. Finite-state transducer, context-free grammar,
[14] Academic advisors are appointed by the HDSI GradCom and are required to be different from
the student's research advisor (or chair of dissertation committee). The primary responsibility of
an academic advisor is to provide an assessment of student progress, and to be a spokesperson for
the student's welfare to the HDSI faculty.
DSC 253: Advanced Data-driven Text Mining, 4 units: Unsupervised, weakly supervised, and
distantly supervised methods for text mining problems, including information retrieval,
open-domain information extraction, text summarization (both extractive and generative), and
knowledge graph construction. Bootstrapping, comparative analysis, learning from seed words
and existing knowledge bases will be the key methodologies. Instructor: Jingbo Shang
DSC 254: Statistical Signal and Image Analysis. 4 units. A graduate level course on signal and
image analysis spanning three main themes. Statistical signal processing: random processes,
stochasticity, stationarity, Wiener filter, Kalman filter, matched filter; signal processing:
time-frequency representations, wavelets, signal processing with sparse representation
(dictionary learning); image processing: registration, image degradation and restoration (noise
models and denoising), image pyramids, random fields. Instructors: Gal Mishne, Armin
Schwartzman
DSC 213: Statistics on Manifolds. 4 units. This is a graduate topics course covering statistics
with manifold constraints. Topics include: Fréchet means and variances, principal geodesic
analysis, directional statistics, random fields on manifolds, statistical distances between
distributions, transport problems, and information geometry. Manifold constraints will be
considered on simplexes, spheres, Stiefel manifold, stratified manifolds, cone of positive definite
matrices, trees, compositional data, and other relevant manifolds. Instructors: Armin
Schwartzman, Alex Cloninger
CSE 234: Data Systems for Machine Learning. 4 units. Data management and systems issues
across the whole lifecycle of ML-based analytics in real-world applications, including: data
sourcing, preparation, and organization for ML; programming models and systems for scalable
ML training, feature engineering, and model selection; systems for ML inference, deployment,
and explanations; and governed ML platforms and feature stores. Instructor: Arun Kumar
DSC 261: Responsible Data Science, 4 units. Computational aspects of responsible data
science. Computational approaches for enforcing fairness in machine learning, interpretability,
explainability, privacy. Prerequisites: DSC 240, DSC 241. Instructor: Babak Salimi.
MATH 281A-B-C: Mathematical Statistics (4-4-4 units). Math 281A consists of statistical
models, sufficiency, efficiency, optimal estimation, least squares and maximum likelihood, large
sample theory. Math 281B continues and discusses hypothesis testing and confidence intervals,
one-sample and two-sample problems. Bayes theory, statistical decision theory, linear models
MATH 284: Survival Analysis. 4 units. Survival analysis is an important tool in many areas of
application, including biomedicine, economics, and engineering. It deals with the analysis of
time-to-event data with censoring. This course discusses the concepts and theories associated with
survival data and censoring, comparing survival distributions, proportional hazards regression,
nonparametric tests, competing risk models, and frailty models. The emphasis is on
semiparametric inference, and material is drawn from recent literature. Instructors: Lily Xu,
Jelena Bradic
MATH 285. Stochastic Processes (4 units). Elements of stochastic processes, Markov chains,
hidden Markov models, martingales, Brownian motion, Gaussian processes. Recommended
preparation: undergraduate probability theory. Instructor: Ruth Williams
MATH 287A. Time Series Analysis (4 units). Discussion of finite parameter schemes in the
Gaussian and non-Gaussian context. Estimation for finite parameter schemes. Linear vs.
nonlinear time series. Stationary processes and their spectral representation. Spectral estimation.
Students who have not taken MATH 282A may enroll with consent of the instructor. Instructor:
Dimitris Politis
MATH 287B: Multivariate Analysis. 4 units. Bivariate and more general multivariate normal
distribution. Study of tests based on Hotelling’s T². Principal components, canonical
correlations, and factor analysis will be discussed as well as some competing nonparametric
methods, such as cluster analysis. Students who have not taken MATH 282A may enroll with
consent of the instructor. Instructor: Ery Arias-Castro
MATH 287D: Statistical Learning Theory. 4 units. Topics include regression methods:
(penalized) linear regression and kernel smoothing; classification methods: logistic regression
and support vector machines; model selection; and mathematical tools and concepts useful for
theoretical results such as VC dimension, concentration of measure, and empirical processes.
Instructor: Jelena Bradic.
COGS 243: Statistical Inference and Data Analysis (4 units): This course provides a rigorous
treatment of hypothesis testing, statistical inference, model fitting, and exploratory data analysis
techniques used in the cognitive and neural sciences. Students will acquire an understanding of
mathematical foundations and hands-on experience in applying these methods using MATLAB.
Cognitive science PhD students must enroll for four units and will be required to do assignments
and a final project. All other students can enroll for two units and will be required to complete all
c. A student can receive credit towards the M.S. or Ph.D. degree for a maximum of two
courses (8 units) taken at the upper-division undergraduate level, subject to the approval
of the student’s faculty advisor. These two courses can be transferred (as discussed
above), or taken during the course of the graduate program. For example, a student can
make use of the equivalencies discussed in part (a) above.
8. Field Examinations
No field examinations are required.
9. Qualifying Examinations
As discussed in Section 2.4, successful completion of a Ph.D. program in Data Science requires
timely completion of the preliminary assessment, research qualifying, and final dissertation
defense examinations.
After a student successfully completes the preliminary assessment examination, at the next
annual review of the student (conducted in the Fall Quarter as a part of the Annual
Faculty Retreat), the GradCom of the HDSI Faculty Council assigns the academic advisor to
provide necessary updates to the GradCom and to help set up the doctoral dissertation
committee.
[15] This exception is stipulated in view of a large number of formally appointed faculty on HDSI
faculty council (at 25% or 0%) drawn from different departments and divisions thus making it
impossible for a student to find an “outside” faculty member in some areas.
Given the diversity of background training and intellectual persuasion of entering Ph.D. students,
a subset of the HDSI faculty council felt strongly that a rotation training program would be
essential to providing an informed match for each Ph.D. student. The interdisciplinary nature of
data science makes research rotation experience a desirable aspect of the Ph.D. program while
addressing principally different advising cultures in constituent areas of data science.
Accordingly, HDSI seeks a principled way to address this difference in academic advising
culture. One possibility is to require rotation of all candidates, but allow for exceptions through
individual review of candidates who demonstrate strong background work and inclination to
work with a specific faculty member. The other possibility is to identify a subset of admitted
students who would be good candidates for participation in the first year rotation program. While
the exact details will be worked out by the Graduate Committee, we plan to offer participation in
the rotation program to all students at the time of the admission.
A research rotation is a guided research experience lasting one quarter (10 weeks) obtained by
registering for DSC 294 with an instructor. Ph.D. students will participate in a minimum of two
research rotations during their first year, with a minimum of two different faculty members,
and as many as four rotations including the summer quarter. A student may rotate twice under the
same faculty member as long as they rotate with at least two faculty members. The goal is to
help students identify and develop their research interests and to expose them to new
methodological approaches or domain knowledge that may be outside the scope of their eventual
thesis. Research rotations must be completed before the start of the second year, with a signed
commitment form from a faculty advisor.
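The rotation rules in this paragraph reduce to two counts: the number of rotations and the number of distinct faculty members. The sketch below illustrates that reading only. The function and the faculty labels are hypothetical, and it does not cap how often a student may repeat with one faculty member beyond requiring two distinct advisors, since the detailed guidelines are left to the GradCom.

```python
MIN_ROTATIONS = 2
MAX_ROTATIONS = 4   # including the summer quarter
MIN_FACULTY = 2


def rotation_plan_ok(advisors):
    """advisors: one faculty name per one-quarter rotation (DSC 294)."""
    distinct = len(set(advisors))
    return MIN_ROTATIONS <= len(advisors) <= MAX_ROTATIONS and distinct >= MIN_FACULTY


# A repeat rotation is acceptable when at least two faculty members are involved.
print(rotation_plan_ok(["Faculty A", "Faculty A", "Faculty B"]))
# Two rotations with the same faculty member do not satisfy the rule.
print(rotation_plan_ok(["Faculty A", "Faculty A"]))
```

The first call prints True and the second prints False.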
The Graduate Committee (GradCom) will develop detailed guidelines on the selection process
and conduct of the Rotation Program that specifically address questions such as rotation
schedule, whether or not it is arranged by faculty advisor or by the student themselves,
orientation of students into the rotation program, guidelines on academic advising by the rotation
advisors and student evaluations in a rotation program, and exception conditions including
extension of the rotation experience for a student to four quarters and other special
circumstances.
Further, industry surveys have repeatedly shown a soaring need for data scientists. Chief and
credible among these are reports by McKinsey [16], IBM [17] and Bloomberg [18]. As mentioned, in a survey
of our students graduating from the Data Science major, roughly a third indicated
interest in a graduate degree program in Data Science, despite strong industry placement
opportunities for graduates trained in the key data science areas of AI and Machine
Learning. Since our undergraduate major in Data Science covers basic and advanced courses in
these areas, the graduate interest is primarily for a doctoral research degree program. In 2019,
nearly a half (1571) of over 3000 applicants to various graduate degree programs at UC San
Diego who indicated interest in Machine Learning and Artificial Intelligence related topics in
Electrical Engineering, Computer Science and Cognitive Science, directly indicated their interest
in Data Science programs at HDSI. HDSI offered scholarships to 10 of these students admitted
into degree programs in Computer Science, Math, Electrical Engineering or Cognitive Sciences.
Thus, the demand for graduate training in Data Science is high and continues to grow. Further,
the program not only serves the HDSI mission of educating talent in the area of Data Science,
but also serves as a vehicle for continued engagement and proliferation of Data Science training
across various graduate programs through new foundational, core and elective course offerings
that bring domain experts into the field of Data Science. The program provides an excellent
means to create new educational opportunities for students, especially for underserved and
economically disadvantaged student populations who can benefit from the graduate scholarships
offered by HDSI as a part of its core endowment-supported activities mentioned
earlier.
[16] https://mck.co/2W4LriY
[17] https://bit.ly/3dkiDIS
[18] https://bloom.bg/3chfute
While no survey data is available on doctoral demand, there is ample evidence
of the placement of graduate students in industry and civil organizations. As a case study, we
examined the entire class of Ph.D. students who graduated from the Ph.D. program at NYU. We
located 30 graduates of the program (with a 2017 starting year) on LinkedIn working as
postdoctoral scholars; as senior data scientists at companies such as Boston Consulting Group,
Walmart Laboratories, Facebook, and Uber; as research analysts in venture capital and investment
banking; and as software engineers at Hulu and FreeWheel.
A detailed market analysis in support of the opportunity in ‘Big Data’ and long-term trends in
this domain comes from a recent McKinsey Global Institute report that identified ‘Game
Changer’ opportunities for US growth [19]. The May 2011 McKinsey Global Institute report, “Big
Data: The next frontier for innovation, competition and productivity” [20], predicted the need for
over 500,000 data scientists by 2018. McKinsey projected a shortfall of 1.5 million additional
[19] http://www.mckinsey.com/insights/americas/us_game_changers
[20] http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation
In a separate ‘bottom-up’ study, the EMC Corporation, a publicly traded company with 60,000
employees, interviewed 497 data science and business intelligence professionals from around the
world. The results of their study on the need for Data Scientists pointed to some interesting
trends in the computing industry. About two-thirds of the individuals polled believe the demand
for data scientists will outpace supply in the next five years with nearly 30% coming from
professionals in disciplines other than computer science. The study also cited the lack of training
and resources as the biggest obstacles to data science in organizations. These observations
directly support the case for the need for rigorous scientific training for the professionals moving
into the data science field.
Data Scientists constitute a separate category of jobs that are currently posted along with IT and
business analytics positions. As of this writing, Indeed.com posts openings for 5,953 Data
Science jobs, and Glassdoor lists 21,166 Data Science jobs with salaries ranging from $76,000 to
$148,000 and averaging $118,700. This compares with an average salary of $76,500 for
software engineers, $70,700 for computer engineers and $110,200 for computer scientists.
Indeed, driven by the opportunities available, a whole industry has sprung up around Data Science
placements [21]. Data Science graduates will be well qualified for job titles such as data analyst and
business intelligence or predictive analysis professional. The students are likely to find
employment across many areas including internet companies, banking, insurance, investments,
engineering and healthcare. We will work with Career and Placement Services as well as the Alumni
Board to ensure mentoring and placement of graduates from the Data Science program.
[21] https://www.dataquest.io/blog/career-guide-find-data-science-jobs/
4. Ways in which the program will meet the needs of society
Data Scientists are highly sought after, which shows a societal need for individuals with this
professional competency. Furthermore, the Data Sciences are already having an impact on many
aspects of society, including e-commerce, financial industries, technology companies, health
care, and academia. There are few aspects of society that will not be affected by Data Science.
The proposed program directly serves the current and growing need for professionals in the area
and its applications.
6. Program Differentiation
Sections 1.5 and 1.6 cover in detail related programs at UC San Diego and in the UC system.
The growth in Data Science degree programs is following a middle-out process, starting with a
large number of MS programs (over 100), followed by emerging data science bachelor’s degree
programs such as those at UC Berkeley and UC Irvine, in addition to the BS in Data Science offered by
HDSI, now in its fifth year. New York University has offered a PhD program in Data Science
since 2017, and Data Science specializations within existing PhD degrees are offered by Columbia,
Michigan and many other schools. In contrast to emerging programs that are specializations of
Statistics or Engineering degrees, HDSI’s organization gives us the capability to design
integrated programs in Data Science for broader and deeper training through a large and
diversified set of electives. With the increasing participation of faculty and departments across
the campus in creating additional electives/specialization courses, we hope to extend the reach
and impact of Data Science as a discipline.
[22] https://datascience.ucsd.edu/about/faculty/hdsi-faculty-council/
In addition to the 48 faculty members listed above (15.5 FTE), the Institute is also planning to
fill one teaching faculty (LSOE) and one advancing faculty diversity (AFD) position in the
current recruiting season. It anticipates an additional 1-2 new faculty members joining the Institute,
for a total faculty strength of 16-17 FTE including 3 FTE LPSOE and 8 U18 lecturers.
Together, these provide a capacity of 51-52 courses annually by the current ladder-rank faculty, in
addition to 5 cross-listed courses as well as teaching by U18 continuing lecturers, for a combined
total annual capacity of 58-74 courses. The current Data Science undergraduate program
accounts for 35 courses/sections per year. Conservatively, the Institute has the capacity to offer
6-10 graduate courses per quarter, which enables it to adequately serve the proposed doctoral
program.
6. Resource Requirements
As mentioned earlier, no new or additional resource requirements are expected from the
campus in support of the proposed Ph.D. program. Instead, the Institute’s continuing and planned
expenses in graduate scholarships, faculty recruiting and cyber-infrastructure resources
(including personnel) will be key enablers for the successful operation of the proposed graduate
program. Starting Winter 2021, the Institute has been allocated space on two floors of the
38,000-square-foot Literature Building, which will provide ample space for housing the faculty, students,
and advising staff for the graduate and undergraduate programs. HDSI’s current undergraduate
program has over 1000 students in its majors and minors, making the undergraduate major
the 8th largest major. This provides a funding source for graduate Teaching Assistants, who
will be primarily drawn from the proposed doctoral program. Faculty fully or primarily
appointed in HDSI currently direct research projects worth $5.5M annually that are managed by
HDSI. The sponsored research is expected to grow as additional faculty join starting Fall 2021.
The combined research and teaching activities will be taken into consideration by the Graduate
Division in setting the graduate scholarship support, which is expected to cover one full year of
non-employment-based graduate student support for each entering graduate student. We
point out that this graduate support is made possible by additional teaching and research activities
(and associated revenues) of the HDSI faculty. HDSI faculty will continue to fund and supervise
students working on Data Science research projects who are drawn from Ph.D. programs in other
departments as well.
Overall, HDSI’s guarantee of financial support is rooted in four primary sources of funding for
graduate students: (a) Graduate Division support of graduate students based on campus policy on
the distribution of scholarship support to graduate students. This support was earlier known as “block grants”,
derived on the basis of campus policies for academic units based on their need and extramurally
funded research activities; (b) Teaching assistant support. Currently, TA support is at 8 TA FTE
per year and is expected to rise with the increase in undergraduate enrollment in our majors and minors
(from the current 700 students in the major and 5,000 students in classes annually to 1,000 majors
and 10,000 students in classes annually in three years).
Graduate students will be trained and, once determined to be qualified per university regulations,
will be offered TAships; (c) Extramurally funded research projects including training
grant(s). Extramural funding is likely to be the largest source of funding for our graduate
students, given the extensive growth and consistent availability of research funded by
organizations such as NSF, DARPA, DOE, ARL and others. Data Science areas are among the
most heavily funded areas of research by both public and private organizations (foundations).
Based on budget analysis provided as a part of 3-year FTE planning, we anticipate annual
research support of $200K/year per faculty member appointed in the Institute; (d) Endowment-supported
graduate student scholarships. We have currently budgeted $600K per year for this program. We
expect to grow this program with growing industry contributions and philanthropic support to
the Institute. Financial aid will be available to approximately one quarter of our best students in
the early years. As we scale the program, the ratio of financial support may drop, but to no less than
15% of the total student population. In addition, as outlined in our EDI strategy (Section 1.6), the
Institute will directly offer scholarships for URM students.
8. Governance
The program is offered by the Halıcıoğlu Data Science Institute, established as an academic unit
by the Academic Senate on June 6, 2018. The HDSI Faculty Council is the governing body for all
academic programs offered by the Institute. A copy of the Bylaws is attached in the Appendix.
Department Chairs
1. James McKernan, Chair, Department of Mathematics
2. Jonathan Cohen, Chair, Department of Philosophy
3. Brian Goldfarb, Chair, Department of Communication
4. Bill Lin, Chair, Department of Electrical and Computer Engineering
5. Kun Zhang, Chair, Department of Bioengineering
6. Sorin Lerner, Chair, Department of Computer Science and Engineering
7. Stefan Leutgeb, Neurobiology Section, Division of Biological Sciences
8. Thad Kousser, Chair, Department of Political Science.
External Reviewers:
1. Professor Alexander Aue, Chair, Department of Statistics, Co-Director, Center for Data
Science and Artificial Intelligence Research, UC Davis
2. Professor Larry Wasserman, Department of Statistics and Data Science, CMU
3. Professor George Michailidis, Founding Director, UF Informatics Institute, University of
Florida
4. Professor Sharad Mehrotra, Information and Computer Science, UC Irvine.
Faculty, Instructors
1. Jorge Cortes, MAE247
2. James Fowler, POLI 287
3. Trey Ideker, BNFO 286 / MED 283
4. Massimo Franceschetti, ECE 227
5. Vineet Bafna, CSE 283 / BENG 203, CSE 280A
6. Alex Cloninger, DSC 210, Math 170A, Math 277A
7. Arun Kumar, CSE 234, DSC 202, DSC 204
29 November 2020
TO: Graduate Council
FROM: Albert P. Pisano, Dean of Engineering
RE: Doctor of Philosophy Degree in Data Science (Ph.D./DS)
I am writing to express my strong support for the new, proposed Doctor of Philosophy
Degree in Data Science (Ph.D./DS), to be offered by the Halicioglu Data Science Institute
(HDSI). There already exist significant collaborations between HDSI and the Jacobs School of
Engineering, and this new Ph.D. degree will serve to strengthen and expand that collaboration. I
am pleased to report that there are six jointly appointed faculty between HDSI and the Jacobs
School on which this strong collaboration will be based: Benjamin Smarr, Bioengineering,
Jingbo Shang, CSE, Barna Saha, CSE, Yoav Freund, CSE, Arun Kumar, CSE, and Rajesh Gupta,
CSE. In my conversations with faculty colleagues I find there is broad support for the proposed
Ph.D. program. Indeed, because HDSI is a unit that has faculty who conduct research with Ph.D.
students, it seems appropriate that HDSI have the ability to offer the proposed degree. Further,
Engineering is willing to collaborate with HDSI in areas of common research interest, including
Algorithms, Artificial Intelligence, Machine Learning, Data Infrastructure and Systems, as well
as application areas of the research. There are many opportunities for broadening the course
offerings at UCSD, and the course offerings in Data Science will benefit the Ph.D. and MS
students in CSE and ECE. Similarly, a number of the graduate classes in CSE and ECE are sure
to be of interest to Data Science PhD students.
I am confident that HDSI and Engineering will move forward together in a mutually-
beneficial way, and I anticipate there will be high demand from students for this program, and
look forward to an exciting new crop of Ph.D. researchers.
Sincerely,
The proposal (and all of the HDSI work) recognizes that the application of data science to
applied problem solving requires partnership with domain experts. HDSI has worked to organize
a Data Science in Society cluster that fulfills this philosophy. It includes a number of faculty
members from GPS. As a result, the PhD program has a roster of pertinent researchers (and
teachers) available already in place whose interests will lead to constructive spillovers to the
teaching and research programs of GPS.
The cluster is more than aspirational. HDSI is already providing support for some of the large
research initiatives at GPS. Two examples are the big data program for the analysis of the
politics and economics of China that is housed at our 21st Century China Center and
the "Big Pixel" program employing satellite imagery analysis in our Center on Global
Transformation. These and other initiatives are leading to a new set of graduate courses on
marrying data science to policy analysis. One of our senior faculty members, Professor John
Ahlquist, has made a multi-year commitment to being the "Sherpa" for this undertaking. We
expect some of these courses will be available to HDSI PhD students.
Finally, it should be noted that GPS has no plan for a PhD program that would conflict with the
proposed HDSI offering.
November 30, 2020
In another letter to the Graduate Council, we offered strong support to HDSI’s plans for an M.S.
program in Data Science. I am pleased to also support HDSI’s plans for a PhD program in Data
Science. The proposed program shows there is strong coherence across their degree programs, at the
undergraduate and master’s levels as well. This proposal for a PhD is a natural extension of their
curricular planning to date.
We do not have a surplus of courses and programs about data science. Although areas such as
machine learning and artificial intelligence are also taught in Computer Science and Cognitive Science
(and Mathematics), the teaching emphasis and the cases under study across the divisions are different
enough as not to be redundant or overlapping. Further, because we have a number of faculty in the
Social Sciences (e.g. Voytek, Roberts, Geiger) who participate in the teaching programs of the HDSI,
their perspectives are regularly considered and incorporated into HDSI courses – thus maintaining
cooperative intellectual and research programs across divisions while teaching to the needs and
ambitions of the different PhD programs.
I agree with HDSI that the demand for high-level training in data science is such that having multiple
PhD programs is not a problem. In fact, the areas of machine learning and artificial intelligence will
benefit from being taught in different ways for different populations of students who will bring their
skills to a broad job market that has an acute need for skill in this area. The proposed program has
many interesting and thoughtful elements, and it is clear they have thought very carefully about
pedagogy at this level. I welcome their innovative teaching into our campus community.
Sincerely,
Carol Padden
Dean, Division of Social Sciences
The proposal frames the program’s core objectives very clearly around three main competency
areas, namely, to train students to (a) collect raw data for computational modeling and analysis;
(b) appropriately use algorithms in a specific domain by developing effective optimization
methods; and (c) interpret, analyze, and visualize the results of these algorithms to complete
relevant scientific inquiry.
The program is designed around multiple specialization tracks, in order to allow students from
diverse academic backgrounds to both develop shared core competencies and explore domain-
elective courses.
As noted in the proposal, the demand for PhD education in Data Science is growing across
multiple institutions. The structure of the HDSI proposed program is especially nimble and
innovative, as it will allow doctoral students to train across various graduate programs through
foundational, core and elective course offerings, in partnership with other Academic Units. This
feature makes the program especially competitive.
Not only will the PhD program in Data Science create timely transdisciplinary opportunities for
many students; it will also play a crucial role in serving underserved and economically-
disadvantaged student populations, thanks to the graduate scholarships offered by HDSI as a part
of the Institute’s foundation-supported initiatives.
The proposal for the PhD in Data Science is well-argued and meticulously presented. I believe
the PhD degree program in Data Sciences will provide a welcome addition to graduate studies at
UC San Diego. I look forward to seeing this program take off, and to further opportunities of
collaboration between the Division of Arts and Humanities and HDSI.
Sincerely,
On behalf of the Herbert Wertheim School of Public Health and Human Longevity Science
(HWSPH), I am pleased to offer support for your proposal for a program of graduate study in data
science leading to a Doctor of Philosophy in Data Science (PhD/DS).
Thank you for the opportunity to review and comment on your proposal. Strengths of this
proposal are that it addresses an emerging field of study for which there is a demand for training,
it is highly relevant to a wide range of industries, uses a campus-wide collaborative approach,
and I see it as complementary to the degrees we offer in the HWSPH. It was also great to have
faculty from the HWSPH’s Biostatistics and Bioinformatics group (Drs. Xu and Schwartzman)
included in the planning process.
I offer my best wishes as you create this important training program, and look forward to
supporting it when it is approved.
Sincerely,
Dec. 1, 2020
Dear Colleagues,
Several faculty members at Rady and I have had an opportunity to review HDSI’s proposed degree program for a
Doctor of Philosophy in Data Science. We are very supportive of the proposal and believe it will be well-
received by other units on campus. This proposal is a timely one and will help address an unmet and growing
need for graduate education in data science.
There is a lot to like in the details of HDSI’s proposal. First, a fundamental goal of the program is to “lay the
foundation for future researchers who can expand the boundaries of knowledge in Data Science itself”. This is an
important aspect that will help produce capable researchers who will become leaders in theory and practice of
data science and advance the emerging field.
We view the PhD in Data Science program as complementary to our degree programs and believe the experience
of graduate students focusing in the areas of data science and its applications will be positively impacted by its
existence. For instance, our students will benefit from some of the new graduate courses that are created
as part of this proposal. While our students typically take their breadth electives within Rady, some students seek
courses in departments across campus. These electives require approval by program directors who assess fit.
In closing, I am supportive of the Doctor of Philosophy in Data Science program presented in this proposal. It is
very well thought out and designed with aspects that make it unique within UC San Diego. I anticipate that the
program will be successful in achieving its goals.
Best regards,
BERKELEY • DAVIS • IRVINE • LOS ANGELES • MERCED • RIVERSIDE • SAN DIEGO • SAN FRANCISCO SANTA BARBARA • SANTA CRUZ
Yours sincerely,
JONATHAN COHEN
PROFESSOR AND CHAIR
DEPARTMENT OF PHILOSOPHY
9500 GILMAN DRIVE, DEPT. 0119
LA JOLLA, CALIFORNIA 92093–0119
C
(760) 814-1110
FAX: (858) 534-8566
[email protected]
http://aardvark.ucsd.edu
October 8, 2020
On behalf of the Department of Philosophy, I write to offer support for the proposal for a PhD program in
Data Science that would be housed within Halicioğlu Data Science Institute (HDSI).
Data science is clearly an important emerging field with connections to many areas of intellectual inquiry
spread across our University. The creation of a PhD program that would capitalize on these resources in a
way that benefits a new generation of scholars is an exciting prospect.
We hope and expect that the establishment of such a program will lead to further cooperation in research
and instruction between Philosophy and HDSI in the areas of causal discovery, machine learning, data ethics,
and more. We look forward to discussing ways in which we might contribute as the shape of the new program
becomes clearer.
We are confident that HDSI has the infrastructure and expertise to run the proposed PhD program. Our
Department does not expect that the program will negatively impact our research or pedagogical missions
at any level, and so endorse the proposal without reservations.
Please feel free to contact me for any additional questions.
Sincerely,
Subject: Support for proposed doctoral degree program in “Data Science” (PhD/DS)
Dear Rajesh,
I am writing to express support from the Department of Communication for the proposed
doctoral degree program in “Data Science” (PhD/DS) to be offered by the Halıcıoğlu Data
Science Institute (HDSI). As a key partner to HDSI in building Data Science, the Communication
Department views the proposed program as an important step in advancing interdisciplinary
cross-fertilization between the two units. Our department has two faculty affiliated with HDSI:
Kelly Gates and Lilly Irani, and one, Stuart Geiger, who holds a joint appointment across the two
units. Since joining the faculty this fall, Prof. Geiger has been working to establish a working
group on Critical Data Studies which promises to build a tighter fabric of connections between
HDSI and Communication. The creation of the proposed PhD promises to set up a platform to
expand the interactions among our faculty around research and graduate advising/mentorship.
The proposal lays out a plan for a program with clear standards for academic rigor. The
curriculum includes a well-considered set of requirements as well as options that balance the
establishment of shared scholarly concerns with cross-fertilization of contributing disciplines.
The proposal also articulates the impressive scope and strengths of faculty who would
participate in the program and establishes a reassuring picture of the adequacy of the facilities
that will be dedicated to research and teaching. Finally, the initial success of the undergraduate
program and the interest from graduate students in affiliated departments signals that HDSI can
anticipate a strong applicant pool, while the growth of the field bodes well for the placement
prospects for its graduates.
In summary, I confirm support of this proposal and look forward to the opportunities for
collaborations between faculty and students in our Department and HDSI.
Sincerely,
Dear Colleagues,
It is my pleasure to write this strong letter of support for the newly proposed Doctoral Degree in Data
Science (PhD/DS), to be offered by the Halıcıoğlu Data Science Institute (HDSI). The proposed
doctoral degree will serve the need for advanced graduate students in the area of Data Science.
Demand for data scientists is clearly exploding in both academia and industry, as data science is
being applied in all aspects of society. The proposed PhD/DS program in HDSI is very timely to
serve this need. The proposed program is very strong in both quality and academic rigor. Further,
HDSI is fully capable of administering this program given the size and expertise of the HDSI faculty
as well as the facilities and budgets available to HDSI. Also, the exploding demand for data scientists
in both academia and industry will ensure a strong applicant pool as well as exceptionally strong
placement prospects for the graduates of the proposed PhD/DS program. In addition, the proposed
HDSI PhD/DS program will facilitate closer engagements and collaborations between the faculty in
HDSI and ECE, as well as other departments across campus.
Overall, the proposed HDSI PhD/DS program will undoubtedly bring much greater visibility to UC
San Diego as the preeminent university for artificial intelligence, machine learning, and data science.
I look forward to cooperating with HDSI on this program as well as other initiatives.
Best regards,
I am writing on behalf of the Department of Bioengineering to enthusiastically endorse and support the
graduate program Doctor of Philosophy in Data Science.
Given the vast importance of Data Sciences in modern society, it is imperative that we train qualified
professionals who can join the workforce solving problems where big data is the paradigm. I have
reviewed your proposed program and the design of the curriculum is excellent and will be ideal for training
students.
The proposed PhD Program will benefit from the interactions with Bioengineering and our top-notch
Bioengineering graduate program. We anticipate that students in your PhD Program will have the option
to take a number of Bioengineering courses related to systems biology, genomics and imaging. I am also
excited that we have an outstanding joint hire in Ben Smarr, who will serve as a bridge between our
programs. I should also add that my colleague Dr. Shankar Subramaniam is launching a new course on
Biomedical Data Sciences in the academic year 2020-21, which would be valuable to the two recently
proposed MS Programs in BE and HDSI, as well as this PhD program.
Several other courses offered by Bioengineering including graduate courses in technologies that generate
vast data in biomedicine and complex modeling courses that transform data into knowledge would be
valuable for our Programs.
I look forward to working with you and helping make the PhD/DS graduate program a harbinger
of the future.
Best regards,
Kun Zhang
Professor and Chair
Department of Bioengineering
DATE: 11/29/2020
Dear Rajesh,
The department of Computer Science and Engineering fully supports the proposed PhD program
in Data Science. Since HDSI is a unit that has faculty who conduct research with PhD students, it
only makes sense that HDSI have its own PhD program. We are happy to work with HDSI on areas of
common interest, including Algorithms, Artificial Intelligence, Machine Learning, Data
Infrastructure and Systems, and application areas of common interest. There are many
opportunities for broadening the course offerings at UCSD, and the course offerings in Data
Science will benefit the PhD and MS students in CSE. Similarly, some of the graduate classes in
CSE, such as the CSE 202 course on Algorithms (but others too), will be of interest to Data
Science PhD students, and we are happy to make seats available to the Data Science PhD
students in those classes.
Sincerely,
Sorin Lerner
UNIVERSITY OF CALIFORNIA, SAN DIEGO UCSD
Nov 2, 2020
Rajesh K. Gupta
Director
Halıcıoğlu Data Science Institute
Dear Rajesh,
I write to express my highest enthusiasm for the proposed doctoral degree program in Data Sciences by
the Halıcıoğlu Data Science Institute (HDSI). As you know, there is a strong connection of Neurobiology
research with many of the core themes of HDSI, including artificial intelligence, machine learning, and
scientific discovery. To take advantage of these shared interests, Neurobiology and HDSI have hired several
faculty at the intersection between our fields over just the past three years, including Gal Mishne and Alex
Cloninger at HDSI and Marcus Benna and Yonatan Aljadeff in Neurobiology. In addition, our first joint
faculty member, Mikio Aoi, has just been hired and arrived on campus.
A PhD program in Data Sciences will fill a major gap in our current offerings of PhD programs, in that
your program will not be merely geared towards prospective students in mathematics, computer science
and engineering but will also attract those with an avid interest in one of the application sciences such as
chemistry and biology. From our perspective, computational neuroscience, in combination with big data
from neural recordings is a discipline that strongly benefits from the integration with data science, and
your PhD program will be unique in attracting students at this intersection between disciplines.
Conversely, many of the foundational ideas for machine learning have at least a loose analogue in circuit
mechanisms that are used by the brain, and there is an enormous potential in further applying findings
from rigorous experimental research to engineering applications. These are new frontiers that can be
effectively approached by the type of PhD applicant that only your program can attract, such as students
with a background in both the life sciences and in computer sciences. Importantly, there is also an
increasing number of prospective employers in both academia and industry who are in need of a
workforce who can lead projects in data analytics in fields that include molecular biology, biochemistry,
and neurobiology.
HDSI has already brought together an impressively interdisciplinary group of faculty who have the
expertise to train a new generation of data scientists. Including trainees at the doctoral level is particularly
valuable, because PhD students do not only contribute to the training of students at the undergraduate and
Masters level but are also invaluable for the research mission and the continued national and international
leadership of faculty at HDSI. The launch of your proposed PhD program is therefore well timed in that
all the necessary expertise across disciplines is now in place so that there will be a rewarding interaction
that will further strengthen the status of HDSI as one of the premier institutions of its kind. Based on the
faculty with appointments and joint appointments in HDSI, their expertise is well suited to teach the range
of classes and seminars that are proposed. Taken together, the PhD program in Data Sciences will not
only lead to a rigorous education of the students in the program but also fill a gap that is currently not
covered by more specialized PhD programs in the analytical and application disciplines. By admitting
students who can bridge gaps between these established programs, numerous PhD programs will be
substantially strengthened by the cohort of students that can go between these diverse fields.
Given that close collaborations between HDSI and our department have already been developing among
faculty, I anticipate that your PhD program will only further foster these interactions and become a pillar
for the type of interdisciplinary science that UC San Diego stands for. Such interdisciplinarity will benefit
the entire campus community and particularly students at all levels within your program as well as
beyond your program. I therefore most enthusiastically support the addition of a PhD program in Data
Sciences.
Sincerely,
As chair of the Department of Political Science, I am writing to voice my department’s strong support for
the Halıcıoğlu Data Science Institute’s Proposal for a Program of Study in Data Science Leading to a
Degree in Doctor of Philosophy in Data Science. We have reviewed this proposal and are excited about
its potential to be both a rigorous and successful program in its own right and to serve as a central force
uniting other campus strengths in bolstering UC San Diego’s emerging leadership in data science
education and research.
We believe that this proposal lays out a rigorous course of study in data science that the faculty
associated with the HDSI – which includes some of our faculty – are highly qualified to deliver. It will
deliver foundational skills early in the program and important applied machine learning skills as students
progress. We are especially encouraged by the inclusion of a course in Arts, Humanities, Society, Policy
and Social Sciences, which will connect students with the diverse disciplinary strengths of our campus.
We believe that students graduating with a Ph.D. in the proposed data science degree would have strong
placement prospects both within and outside of academia. Within academia, there is an increasing demand
across disciplines to hire faculty with data science expertise. The Department of Political Science has
itself hired successfully in data science and is continuously looking to expand the group of political
methodologists with a data science focus. Data scientists are also in high demand in the non-academic
sector, and supply is still limited. The Political Science Department has been very successful in placing its
students with data science expertise in a variety of companies, ranging from Amazon to Facebook to
Google. We therefore believe that students who graduate with a dedicated data science degree would have
very strong non-academic job prospects.
The proposed program also offers potential synergy effects across departments on campus. A data science
PhD program would offer additional courses that would be attractive to PhD students from other
programs. At the same time, Data Science could potentially draw from existing courses that fit very well
in the proposed curriculum. If I can answer any further questions about this matter, please feel free to call
me at 858-246-0721 or to email me at [email protected].
Sincerely,
Thad Kousser
October 28, 2020
I am writing in enthusiastic support of the PhD Program in Data Science currently proposed at UC San
Diego. In my capacity as a Co-Director of the UC Davis Center for Data Science and Artificial Intelligence
Research and as chair of our campus-wide Data Science Steering Committee, I have been closely following
the developments at UC San Diego and the HDSI, viewing them as remarkably useful and exemplary for
our own deliberations. I believe UCSD has gotten crucial decisions right in the past and is in the process of
adding another successful piece to its data science portfolio.
The planned introduction of a PhD Program in Data Science is the natural next step for your campus and
completes the educational data science infrastructure, complementing the already existing undergraduate
major and the recently approved MS program. While the latter programs help provide industry,
government and academia with graduates versed in the application of diverse data scientific tools, a
maturing of the field will require mapping out and building the intellectual foundations that make data
science a unique new field, and placing it within the existing disciplinary landscape. This is best done in
conjunction with the development of a strong PhD program that allows for this process to play out in a
coordinated yet flexible fashion. Outside of academia, future PhD graduates will take on leadership
positions in industry and government that require more data science expertise than expected of those
with undergraduate and MS degrees. I imagine the presence of the HDSI will enable a seamless integration
of academia, industry and government interests into a coherent whole. Given the all-encompassing role
data science is expected to play in the future, the PhD program will be of great service to all constituents
at UC San Diego. It will in particular help satisfy student demand for rigorous PhD-level training in data
science.
I specifically like the proposers’ thoughtfulness in defining the aims of the PhD program and devising the
curriculum, clearly bearing in mind the transdisciplinarity and evolving nature of the field, within the UC
San Diego data science ecosystem and beyond. The strategy laid out in the proposal made available to me
is sound and constitutes a broad consensus of the involved parties. It is also laudable that guidelines put
forward by the National Academies have been followed in mapping out the structure of the coursework,
inclusive of important ethical components. The proposed curriculum will serve future PhD students in Data
Science well. Faculty members listed as having teaching responsibilities in HDSI programs include
renowned experts and leaders in their fields and should ensure the highest quality of instruction. I was glad to
see the on-ramping options that will allow students from diverse backgrounds to enter the program. Once
the program exists, I will make sure that undergraduate and MS students in Statistics and other disciplines at UC
Davis are made aware of this exciting new opportunity for graduate education at UC San Diego.
Overall, I view UC San Diego as primed to play a major role in data science research and education on the
national level. The PhD program is the last piece missing to complete the full educational pipeline. The
proposal is well thought out and administered by leading faculty at one of the foremost data science
institutes. The PhD degree will be an outstanding addition to the portfolio of graduate degrees on your
campus, providing your students with a pathway to the high-level jobs in data-intensive fields the US needs
to cultivate in order to ensure a prosperous as well as equitable future. The proposal has my enthusiastic
support.
Yours sincerely,
Alexander Aue
Professor and Chair
Department of Statistics
Co-Director
Center for Data Science and Artificial Intelligence Research
University of California, Davis
+1-530-752-0560
[email protected]
Carnegie Mellon Department of Statistics & Data Science
232 Baker Hall
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213
Larry Wasserman
UPMC Professor
(412) 268-8727
[email protected]
www.stat.cmu.edu/∼larry
October 24, 2020
Rajesh Gupta
HDSI Director
UCSD
Data science is one of the fastest growing areas in academia. The reason is that, with
the flood of information that we now have, data science plays a role in every science
and in understanding societal issues. Every statistics, data science and machine
learning doctoral program that I know of has experienced an unprecedented increase
in demand both in terms of applicants and in demand for graduates.
Increasing the capacity to service more doctoral candidates in data science is thus
critical to the infrastructure of science and public policy. In short, we need more
doctoral programs.
HDSI is well positioned to offer a doctoral program. There is already a B.Sc. and
there is a large pool of talented faculty with an impressive array of research expertise.
I have reviewed the proposal and it is clear that the proposed program has
been carefully thought out. I should add that UCSD is unusual in that it does not have
a statistics department. Having a doctoral program in data science will thus fill a
serious gap.
Sincerely,
Larry Wasserman
UF Informatics Institute E251 CISE Bldg
PO Box 118545
Gainesville, FL 32611-8545
352-294-3912
It is my pleasure to write a letter of support for the new Ph.D. program in Data Science proposed by the
Halicioglu Data Science Institute (HDSI) of UC San Diego. The demand for graduates in data science is
very high in all sectors of the economy, even during the pandemic, and the need for Ph.D. level
researchers is consequently becoming evident in industry, as well as academia.
I have been involved with the design of two data science programs during my career. The first is the
Masters in Data Science at the University of Michigan, designed while I was on the faculty there and launched in late
2015. It is jointly administered by the Departments of Computer Science and Statistics and the School
of Information and provides training in both core methodologies (programming, data structures, data
management, probability, statistical inference, data modeling, machine learning, optimization and
computational methods) and domain expertise through elective coursework. The degree also requires a
capstone course that requires students to do an end-to-end data science project involving
understanding the scientific question of interest, data collection and curation, modeling and
computation and finally communication of the results through a written report and an oral presentation.
The second data science program I have been involved with is the Data Science undergraduate major
at the University of Florida, a program jointly administered by the Departments of Statistics,
Mathematics and Computer Science. Its philosophy is analogous to the previous program and aims to
provide training in core methodologies, but also expose students to additional training in data science
problems in specific domains (e.g. social sciences, natural sciences, public health) through specific
thrusts.
The Ph.D. program proposal developed by HDSI is elaborate and well thought out: the
proposed coursework covers state-of-the-art technical topics in depth but also provides
exposure to a wide range of topics necessary to produce well-rounded data scientists. I was
particularly impressed by the fact that the Ph.D. program will be open to students from various and
diverse backgrounds and it is designed to prepare them for success. To that end, students will attend
(as needed) certain carefully designed preparatory classes on core methods: computing, mathematics
and statistics. Hence, all incoming students will be brought onto the same page by the end of the first year,
including students who are admitted with little quantitative background.
UCSD has a large Department of Mathematics, but (surprisingly) it does not have a Department of
Statistics. The Statistics group within the UCSD Mathematics Department is very strong in
In summary, I believe this new Ph.D. is carefully designed to accommodate a wide range of students
and thus it represents an exciting development for UCSD. I believe it will be highly successful and I fully
support the proposal.
Sincerely,
George Michailidis
Founding Director, UF Informatics Institute
U Florida Research Foundation Professor
Professor of Statistics and Computer Science
University of Florida
BERKELEY • DAVIS • IRVINE • LOS ANGELES • MERCED • RIVERSIDE • SAN DIEGO • SAN FRANCISCO SANTA BARBARA • SANTA CRUZ
I would like to commend UC San Diego and HDSI for their initiative to create a Ph.D. program in
Data Sciences. This program, spearheaded by a set of very talented and dedicated faculty, will
undoubtedly continue the meteoric trajectory UC San Diego is on in the important and timely
area of data sciences. As a database faculty member with a keen interest in data sciences, I have been closely
monitoring UCSD's efforts led by HDSI over the past several years.
It is now well recognized that data science is destined to be the catalyst for disruptive innovations
in science and technology leading to unprecedented changes and improvements to all walks of
modern society. We live in a time of data revolution in which machines, sensors, and a variety of data
capture devices enable us to collect and monitor every aspect of our lives whether they be personal
experiences, health, social interactions, or our interactions with the engineered and physical
systems. The ability to automatically and seamlessly monitor social as well as physical worlds at
various spatial and temporal granularities has created unprecedented opportunities leading to major
data-centric innovations, new opportunities, new efficiencies, and new industries. Companies such
as Google and Yahoo! have used such data to provide improved search and better personalized
experiences for individuals on the internet, and have designed novel ways of monetizing and funding new
ideas through the placement of advertisements. Moving beyond internet companies, organizations such
as the health care providers, product companies, political activist groups, and news media have
developed tools to monitor public opinion feedback about their goods and services and use such
feedback to launch new product lines or new models. While the above emphasizes the impact
of data-centric approaches on industry, their role in the future of science and technology, and in new
discoveries whether they be in medicine, health sciences, oceanography, or cosmology, will be
even more profound.
UC San Diego was amongst the early schools to realize the central role data science was to play
in the future of education and, now with its proposed Ph.D. program, it is all set to lead the
academic community in creation of the foundational principles that form the core of data-driven
explorations, as well as, to expand the boundaries of knowledge and contribute to tools and
techniques that will expand the nascent field of Data Sciences. While I cannot emphasize enough
the timeliness of creating such a program and the very strong arguments the proposal makes as to
how such a program will help not just UC San Diego but the academic community, what I found
truly exceptional was how well thought out the operational plans for creating such a program are.
In particular, the proposal clearly articulates the important role of multidisciplinary research and
education in such a program and, based on such a realization, it is noteworthy that the leadership
at HDSI systematically approached over 200 faculty (drawn from Engineering, Computer
Science, Physical Sciences, Arts, Humanities, Social Sciences, Medicine, and Health) who are
now part of the affiliates program to create a cohesive integrative vision of Data Sciences
outlined in the proposal. The approach emphasizes not just principles, algorithms, mathematical
foundations, tools and technologies at the core of data-driven approaches, but provides an
integrative view that incorporates domain sciences to set a path forward for doctoral dissertations
based on interdisciplinary collaborations that open up opportunities for breakthrough in areas such
as physics, medicine, social sciences, etc. Indeed, the architects of the proposal have this view
firmly in their minds when they observe two unique aspects of the proposed Ph.D. program compared
to existing data science efforts at UC San Diego and other universities that are typically part of
Computer Sciences and Machine Learning. While existing efforts can be expected to advance
algorithmic solutions, machine learning, and data management principles that form the theoretical
underpinning of data science, a truly effective program (such as the one promoted by the proposal)
must seek involvement of researchers with multidisciplinary background that embrace an
interdisciplinary curriculum with faculty and students from disciplines interested in exploring data
sciences in order to advance science and technology using data-driven approaches. Indeed,
interactions of specialists and research at the cross-boundaries of disciplines is where the largest
advances in data sciences and benefit of data driven approaches are expected to be.
In looking through the details of the program articulated in the proposal, it is clear that the proposal
writers have done their homework and tried to strike a balance in terms of courses and requirements
that highlights quality and academic rigor while at the same time ensuring success of the program
from the very beginning. As is always the case, additional/new needs will emerge when the
program is launched. The proposal includes mechanisms necessary for such future adaptations
based on emerging needs. With the faculty talent associated with the proposal, both at the
leadership levels as well as excellent new hires associated with HDSI, and faculty affiliated with
HDSI, I have no doubt that once the Ph.D. program is launched, it will be monitored and improved
based on initial lessons learnt and the progress of the program will set the example for other
universities, including my own, UC Irvine, to follow in its footsteps.
It is for all these reasons that I very enthusiastically support the proposed UCSD effort.
The proposal is very well thought out. It addresses an emerging need and is the logical next step
for HDSI as it establishes itself to be a center of excellence and leadership in data sciences.
Dear Rajesh,
This is to confirm that I support the listing of the graduate course “MAE247: Cooperative Control of Multi-Agent
Systems” as an elective for the HDSI Masters program under the Networks specialization and warmly welcome
qualified graduate students in the course.
Sincerely,
Jorge Cortés
Professor
UNIVERSITY OF CALIFORNIA, SAN DIEGO UCSD
Dear Colleagues:
This is to confirm that I support the listing of the above course as an elective for the HDSI
Masters program under the Networks specialization and welcome qualified graduate
students in this course.
Sincerely,
James H. Fowler
Professor
University of California, San Diego
[email protected]
UC San Diego Health
Department of Medicine
9500 Gilman Drive MC-0688
La Jolla, CA 92093-0688
T: +1 858.822.4558
F: +1 858.534.4246
[email protected]
idekerlab.ucsd.edu
April 29, 2020
Professor Rajesh Gupta, Director HDSI
UC San Diego
[email protected]
Re: BNFO 286 / MED 283: Network Biology and Biomedicine
May 4, 2020
To: Prof. Rajesh Gupta, Director HDSI
This is to confirm that I support the listing of the following courses that I teach as electives
for the HDSI MS program under the Bio specialization and welcome qualified graduate
students in this course.
Sincerely,
May 3, 2020
Dear Colleagues,
To: Professor Rajesh Gupta, Director HDSI
This is to confirm that I support the listing of the course Math 277A: Topics in Computational and Applied Math:
Diffusion Geometry and Metric Graph Learning as an elective for the HDSI Masters program under the Networks
specialization, and welcome qualified graduate students in the course.
Sincerely yours,
MAY 1, 2020
Dear Colleagues:
To: Professor Rajesh Gupta, Director HDSI
Re: COGS 243: Statistical Inference and Data Analysis (4 units)
From: Angela Yu, Associate Professor
This is to confirm that I support the listing of the above course as a general elective for the HDSI Masters
program under Group B (Core Knowledge and Skills Areas), and welcome qualified graduate students in this
course.
Sincerely,
Angela Yu, PhD
Associate Professor
University of California, San Diego
2020 May 01
This letter confirms my support for including the above class, COGS 280: Neural Oscillations, as an
elective for the Halıcıoğlu Data Science Institute Master program, Computational Neuroscience
Specialization Area.
I look forward to working with students from HDSI!
Sincerely,
UC San Diego
Department of Cognitive Science
Halıcıoğlu Data Science Institute
Neurosciences Graduate Program
Cc:
Professor Doug Nitz, Chair, Cognitive Science
Jennifer Morgan, MSO, HDSI
DEPARTMENT OF COGNITIVE SCIENCE, 0515
La Jolla, CA 92093 Fax: (858) 534-1128
This letter confirms my support for including the above class, COGS 225: Image Recognition, as an elective for the
Halıcıoğlu Data Science Institute Master program, Computational Neuroscience Specialization Area.
Sincerely,
Zhuowen Tu
Professor
Department of Cognitive Science,
Department of Computer Science and Engineering (affiliate)
University of California, San Diego
Email: [email protected]
Tel: +1-858-822-0908
Cc:
Professor Doug Nitz, Chair, Cognitive Science
Jennifer Morgan, MSO, HDSI
SIAVASH MIRARABBAYGI PHONE: (858) 822-6245
ASSISTANT PROFESSOR OF ELECTRICAL AND COMPUTER ENGINEERING E-MAIL: [email protected]
9500 GILMAN DRIVE, MC 0407
LA JOLLA, CA 92093
May 4, 2020
This is to confirm that I support the listing of the above course that I teach for the graduate
program in ECE at the Masters level as an elective for the HDSI Masters program under any
appropriate specialization and welcome qualified graduate students in this course.
Sincerely,
Siavash Mirarab
MASSIMO FRANCESCHETTI PHONE: (858) 822-2284
PROFESSOR OF ELECTRICAL ENGINEERING FAX: (858) 534-2486
9500 GILMAN DRIVE, MC 0407 E-MAIL: [email protected]
LA JOLLA, CALIFORNIA 92093-0407
This is to confirm that I support the listing of the above course that I teach for the machine
learning and data science graduate program in ECE at the Masters level as an elective for the
HDSI Masters program under the Networks specialization and welcome qualified graduate
students in this course.
With kindest regards
Massimo Franceschetti
Appendix C: Catalogue Copy Description [Draft]
Admission to the graduate program is done through the Graduate Division, UC San Diego. The
application deadline is in December. Admissions are always effective the following fall quarter.
For admission deadline and requirements, please refer to the departmental web page:
http://datascience.ucsd.edu.
Admission decisions for the MS and PhD programs are made separately. A current MS student
who wishes to enter the PhD program must submit a petition, including a new statement of
purpose and three new letters of recommendation, to the HDSI graduate admissions committee.
The goal of the doctoral program is to create leaders in the field of Data Science who will lay the
foundation and expand the boundaries of knowledge in the field.
Course Requirements
There are Foundation, Core, Elective and Research requirements for the graduate
program. These course requirements are intended to ensure that students are exposed to (1)
fundamental concepts and tools (Foundation), (2) advanced, up-to-date views in topics central
to Data Science for all students (the Core requirement), and (3) a deep, current view of their
research or application area (the Elective requirement). Courses may not fulfill more than one
requirement.
The doctoral program is structured as a total of 52 units in courses grouped into foundational,
core, professional preparation and research experience areas as described below. Successful
completion of the program requires successful and timely completion of three examinations and
completion of a doctoral dissertation. Out of the 52 units, 48 units must be taken for a letter grade
and at least 40 units must be earned in graduate-level courses. Out of the 12 courses, at least 10
must be graduate-level courses; at most two can be upper-level undergraduate courses. 36
units or 9 courses must be completed within six quarters from the start of the degree program.
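As an informal illustration of the unit arithmetic above, the following Python sketch checks a course plan against the stated thresholds. The `Course` record, its field names, and the function `check_unit_requirements` are assumptions made for illustration only and are not part of the proposal. The six-quarter timing rule and the examinations are not modeled, and treating every non-graduate course as an upper-level undergraduate course is a simplifying assumption.

```python
from dataclasses import dataclass

# Thresholds taken from the paragraph above.
TOTAL_UNITS = 52
MIN_LETTER_GRADE_UNITS = 48
MIN_GRADUATE_UNITS = 40
MAX_UPPER_LEVEL_UNDERGRAD_COURSES = 2


@dataclass
class Course:
    # Hypothetical record; field names are illustrative only.
    code: str
    units: int
    graduate_level: bool
    letter_grade: bool


def check_unit_requirements(courses):
    """Return a list of unmet requirements (empty when all are met)."""
    problems = []
    total = sum(c.units for c in courses)
    letter = sum(c.units for c in courses if c.letter_grade)
    graduate = sum(c.units for c in courses if c.graduate_level)
    # Assumption: any course not flagged graduate-level counts as
    # an upper-level undergraduate course.
    undergrad = sum(1 for c in courses if not c.graduate_level)

    if total < TOTAL_UNITS:
        problems.append(f"total units {total} < {TOTAL_UNITS}")
    if letter < MIN_LETTER_GRADE_UNITS:
        problems.append(f"letter-grade units {letter} < {MIN_LETTER_GRADE_UNITS}")
    if graduate < MIN_GRADUATE_UNITS:
        problems.append(f"graduate-level units {graduate} < {MIN_GRADUATE_UNITS}")
    if undergrad > MAX_UPPER_LEVEL_UNDERGRAD_COURSES:
        problems.append(
            f"{undergrad} undergraduate courses > {MAX_UPPER_LEVEL_UNDERGRAD_COURSES}"
        )
    return problems
```

For example, twelve 4-unit letter-graded graduate courses plus one 4-unit course taken without a letter grade add up to 52 units, of which 48 are letter-graded, so the function returns an empty list for that plan.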
Courses are organized into Group A, Group B and Group C. Group A courses are introductory level courses taught at the
level of undergraduate senior or mezzanine courses. Group B are core graduate level courses
with prerequisites from Group A courses. Group C are advanced, specialized and free-standing
courses, often part of the required courses in the Data Science specialization of Graduate
Program in other departments. In all three groups, required courses are indicated as such; they
cannot be substituted by other courses without exception approval from the graduate program
committee.
In addition, a doctoral student must select at least 2 out of the following 8 core courses:
1. DSC 203: Data Visualization and Scalable Visual Analytics
2. DSC 204B: Big Data Analytics and Applications
3. DSC 242: High-dimensional Probability and Statistics
4. DSC 243: Advanced Optimization
5. DSC 244: Large-Scale Statistical Analysis
6. DSC 245: Introduction to Causal Inference
7. DSC 250: Advanced Data Mining
8. DSC 261: Responsible Data Science
Thus, together with Group A and Group C courses, doctoral students are required to take a
minimum of 5 courses for letter-grade credit. On the other end, students can satisfy all letter
grade course requirements except (satisfactory completion of professional preparation)
teaching, survival skills and research seminar courses. These students are expected to enroll
in individual research (DSC 298) in a section offered by the faculty advisor to meet residency
requirements and maintain graduate student standing during the period of dissertation research.
DSC 205, DSC 231, DSC 251, DSC 252, DSC 253, DSC 254, DSC 213
CSE 234, MATH 181 A-B-C, MATH 284, MATH 285, MATH 287A-B, COGS 243.
After a student successfully completes the preliminary assessment examination, in the next
annual review of the student (conducted annually in the Fall Quarter as a part of the Annual
Faculty Retreat), the GradCom of the HDSI Faculty Council assigns the academic advisor to
provide necessary updates to the GradCom and helps in setting up the doctoral dissertation
committee.
In order for the program to respond, a student requiring accommodation for disability may make
a request for accommodation upon submission of the student’s intent to apply to the Graduate
Program. Declaration of any disability information is not part of the admissions review process
and will not be a factor in admissions.
UC San Diego has organized the Halicioglu Data Science Institute (HDSI) as an academic unit
tasked with creation and operation of academic programs related to the field of Data Science,
broadly defined as the study of mathematical models, computational methods and analysis
tools for navigating, securing and understanding data, data-driven systems and decisions and
applying these skills to all areas of human enquiry, creativity and applications in natural, social
and engineered systems. Due to its breadth, Data Science is considered a transdisciplinary
subject, that is, spanning and overlapping many existing disciplines such as Mathematics,
Computer Science, Electrical Engineering as well as topical areas of Machine Learning,
Artificial Intelligence, Data/Cyber Infrastructure and Digital Humanities.
Serving as the hub for data science talent and programs, HDSI builds upon unique strengths of
UC San Diego. In particular, UC San Diego seeks institutional presence – both of UCSD in
Data Science as well as Data Science at UCSD – that benefits all existing units, departments,
programs and schools with potential to contribute to the academic discipline. As an academic
unit, HDSI's mission consists of three core components: (a) train talent in data science at all
levels via courses, degree and professional training programs; (b) catalyze research in data
science via integrative research projects and initiatives; and (c) cultivate an ecosystem of data
science by engaging industry partners, non-profit and civic organizations with potential to
contribute to data science practice.
Institutionally, HDSI is designated an academic unit with responsibilities that include functions
carried out by traditional academic departments and schools. Accordingly, HDSI functions
under a divisional budget model and direct oversight by the Academic Senate and
administration. The Institute is also endowed by a founding gift to ensure HDSI is the hub for
Data Science and is able to carry out its three-part mission by bringing the campus together.
Halicioglu Data Science Institute (HDSI) Bylaws
1. INTRODUCTION
Preamble
The authority for the departments to establish a form of departmental governance is
established by Academic Senate Bylaws (Part I), Title VI, Bylaw 55.A:
“According to the Standing Orders of The Regents, “... the several departments of the
University, with the approval of the President, shall determine their own form of
administrative organization.... No department shall be organized in such a way that would
deny to any of its non-emeritae/i faculty who are voting members of the Academic Senate, as
specified in Standing Order 105.1(a), the right to vote on substantial departmental
questions...”
The source documents, referred to as “Higher Authority” for the Bylaws, are the UCSD
PPM, the UC APM and the UC and UCSD Academic Senate Bylaws. Important, frequently
used policies that are specified in detail in these documents are described and referenced
here, along with Department policies that fill in additional details. In the case of
discrepancies, Higher Authority takes precedence.
Principles
Responsibilities
These bylaws describe procedures for discharging faculty responsibilities in an
academic unit. Some responsibilities are assigned by the UCSD Policies and Procedures
Manual (PPM) specifically to the faculty as a whole, others to the Director, and some are
assigned to both. These bylaws describe responsibilities, the rules and guidelines for
performing a responsibility, and the selection of faculty members who will perform a role that
fulfills a responsibility. A basic principle that has been followed in the case of shared
responsibilities is for the Director to initiate decisions and actions, and for the faculty to
approve them or take remedial action. On all academic matters, including courses, degree
programs and senate faculty appointments, faculty approval as a group is required
without exception.
Voting
The HDSI Faculty Council consists of all faculty members that are members of the
Academic Senate, and hold a full or partial (even 0%) appointment with HDSI; HDSI fellows
are also members of HDSI Faculty Council. Faculty members whose HDSI appointment is
solely as adjunct or affiliate are not part of the HDSI Faculty Council.
A faculty member is considered to be in residence for the purposes of a vote if the member is
not on leave from HDSI or the university, nor on sabbatical away from the campus, in the quarter
during which the vote is taken.
The HDSI voting population consists of all members of the HDSI Faculty Council that are in
residence.
The means of taking a vote include a show of hands at a faculty meeting, a secret written ballot
conducted at a faculty meeting, a secret ballot circulated by mail, and an email or fax ballot. Votes
conducted by written ballot, email and/or fax are referred to as mail votes. Written and
non-written votes conducted at meetings are referred to as meeting votes.
A secret ballot vote must be used for a vote if it is requested by at least one member of the
voting population.
For non-secret votes, a fax or email vote may be used at the discretion of a voting faculty
member. For secret ballot votes, fax and email votes will be allowed for faculty members
who are not in residence, or who are unable to be on campus for any reason. They are also
allowed for other members unless two faculty members request that email and fax ballots be
restricted for that secret vote. Email votes are counted by the chief administrative officer of
HDSI.
Proxy voting is not allowed. A faculty member who is eligible to vote during a faculty
meeting, but who will be absent for the meeting, may request a ballot in advance, which can
be submitted to and entered into the voting process by the supervisor for that vote.
All written ballots will allow a simple yes, no, or abstain choice on an issue and provide a
space for remarks. The members of a voting population who are in residence and who do
not vote are reported as abstain. The members of a voting population who are not in
residence and who do not vote are reported as absent. The remarks shall be reported along
with any report of the vote’s results, and must be included in any HDSI letter that
summarizes HDSI’s position.
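As an illustration only, the reporting rule for written ballots can be sketched in Python. The names used here (`tally_written_ballot`, `eligible`, `ballots`) are hypothetical and are not part of these bylaws. The sketch assumes that each eligible member is flagged as in residence or not, and that remarks are collected separately.

```python
VALID_CHOICES = ("yes", "no", "abstain")


def tally_written_ballot(eligible, ballots):
    """Count a written ballot.

    eligible: dict mapping member name -> True if in residence (assumed input).
    ballots:  dict mapping member name -> "yes", "no" or "abstain".
    Members who return no ballot are reported as "abstain" when in
    residence and as "absent" otherwise.
    """
    counts = {"yes": 0, "no": 0, "abstain": 0, "absent": 0}
    for member, in_residence in eligible.items():
        choice = ballots.get(member)
        if choice is None:
            # No ballot returned: the report depends on residence status.
            choice = "abstain" if in_residence else "absent"
        elif choice not in VALID_CHOICES:
            raise ValueError(f"invalid ballot from {member}: {choice!r}")
        counts[choice] += 1
    return counts


# Hypothetical example: B is in residence but silent, C is away and silent.
example = tally_written_ballot(
    {"A": True, "B": True, "C": False, "D": False},
    {"A": "yes", "D": "no"},
)
print(example)
```

In this sketch, ballots from anyone not listed in `eligible` are simply ignored; whether that is the desired handling is an assumption.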
The supervisor of a vote will report the results of all votes to the relevant voting population in
a timely manner. A vote shall not be considered completed until it has been reported to the
voting population.
Some issues will specify a required vote. If a required vote is part of some process, it must be
held. Others will involve a requested vote. A requested vote only occurs under certain
circumstances, which include a certain number or percentage of members of a voting
population requesting the vote. When requested, it must be held in a timely fashion. A
requested vote can be either a mail or meeting vote.
Super-approval: the number of yes votes is greater than 2/3 of the size of the HDSI voting
population.
Regular approval: the number of yes votes is greater than 50% of the size of the HDSI voting
population.
Simple approval: the number of yes votes is greater than 50% of the number of votes cast.
As a general rule, super-approval is required for all required and requested votes that
result in the recruitment of faculty members, changes to the bylaws, creation of (or
significant changes to) descriptions of permanent committee responsibilities, revoking
senate committee assignments, and voting on personnel matters outside the default
voting rules. Regular approval is sufficient for the promotion and/or academic review of
HDSI faculty, and for voting on departmental actions required by policy. Simple
approval is typically used for minor issues in the context of a faculty meeting.
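The three approval levels defined above are strict inequalities, which matters at the boundary (for example, 8 yes votes in a population of 12 is exactly 2/3 and therefore not super-approval). A minimal Python sketch follows. The function name `approval_levels` and its parameters are hypothetical, and the sketch assumes the caller decides what counts as a "vote cast", since the bylaws do not say whether abstentions are included.

```python
def approval_levels(yes_votes, votes_cast, population_size):
    """Return which approval levels a vote reaches.

    Integer arithmetic is used so that the strict "greater than"
    comparisons are exact (no floating-point rounding at 2/3).
    """
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 <= yes_votes <= votes_cast:
        raise ValueError("need 0 <= yes_votes <= votes_cast")
    return {
        # yes > 2/3 of the voting population
        "super": 3 * yes_votes > 2 * population_size,
        # yes > 50% of the voting population
        "regular": 2 * yes_votes > population_size,
        # yes > 50% of the votes actually cast
        "simple": 2 * yes_votes > votes_cast,
    }


# Hypothetical example: 12 members in the voting population, 10 votes cast.
print(approval_levels(8, 10, 12))
print(approval_levels(9, 10, 12))
```

Because super-approval and regular approval are measured against the whole voting population, members who do not vote have the same effect on those two levels as a no vote.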
3. FACULTY MEETINGS
A faculty meeting, also referred to as a faculty council meeting, is used to carry out a number
of different responsibilities. It may be used by the Director and other faculty members to
make announcements, to provide a forum for discussion of issues of importance, and to
facilitate decisions and actions for which there is no specific provision in the bylaws.
Scheduling
(i) Faculty meetings shall occur at least once a quarter, excluding the summer.
(ii) The Director must call a meeting within a reasonable amount of time if any three
members of the HDSI voting population petition the Director to do so on any issue.
Agenda
The Director shall announce the agenda in writing for each faculty meeting at least two
working days in advance. Urgent items may be added to the agenda at the last minute, but
the case for urgency must be stated explicitly.
(i) Issues that were the reason for scheduling the meeting, as described above, will be
automatically placed on the agenda.
(ii) For all other issues, any two members of the HDSI voting population may request
that any issue be placed on the agenda for a meeting, and the Director may delay
placing that issue on the agenda for at most one meeting.
Motions
Any issue related to an agenda item may be brought to a vote if it is proposed and seconded
by two members of the HDSI voting population. The Director shall allow a reasonable
amount of time in each faculty meeting to consider faculty motions. If the issue is one for
which the Director has sole, specific authority, the result of the vote will be advisory, but
otherwise it will be binding. The level of required approval will depend on the issue.
Operation
The Director presides over faculty meetings, or delegates this duty to an Associate Director,
or another member of the HDSI voting population. If necessary, a meeting shall have both an
open and a closed part. All academic personnel matters shall be discussed only during closed
parts of the meetings that are restricted to relevant voting members of the HDSI voting
population, the Associate and/or Assistant Director(s), and the relevant academic specialists.
A faculty member may request that an item be discussed in a closed session which is
restricted to members of the relevant HDSI voting population.
Minutes
The Director shall ensure that the minutes for each faculty meeting are published within five
working days following the meeting. At a minimum, minutes shall include a record of each
motion voted on and the outcome of the vote. Faculty members have five working days to
submit corrections to the minutes. Minutes shall be stored in a safe place for at least five
years. Access to the minutes for the closed part of a meeting shall be restricted to faculty
members who were eligible to attend the closed part.
Responsibilities
The specific responsibilities for which the Director has an authority that cannot be delegated
are analogous to those of a Department Chair as described in university regulations (PPM
230-1.IV.B).
Department Consultation: The Director is expected to inform the HDSI faculty and seek
advice for major decisions made with respect to the above responsibilities. The Director shall
inform the faculty of staff organization and responsibilities, and seek advice on how these
arrangements can best be used to support faculty duties and responsibilities. The Director
must be receptive to questions and facilitate appropriate remedial procedures as required.
Associate Directors are appointed by the Chancellor at the recommendation of the Director.
Associate Directors must be tenured members of the Academic Senate.
Assistant Director appointments are made by the Director. Assistant Director(s) can be staff
members.
The Director and Associate/Assistant Director(s) can be re-appointed by the Chancellor for
an unlimited number of consecutive terms.
Associate Director appointments are for a period of three years, subject to annual review.
If a Director does not wish to be reappointed, then the new appointments procedure specified
in [PPM 230 2 III A] will be followed. The procedure requires that the tenured members of
the HDSI voting population meet to consider their recommendation of a new Director. In the
case where the recommendation is not unanimous, a vote will take place whose results will
be included as part of the recommendation to the Chancellor.
In the case where a Director wishes to be reappointed, the reappointments procedure specified
in [PPM 230 2 III B] will be followed. The procedure requires that the reappointment ad hoc
committee consult with faculty members. In the year of the reappointment, the tenured
members of the HDSI voting population will meet to determine their recommendation, which
will be forwarded to the committee. In the case where the recommendation is not unanimous, a
vote will take place whose results will be included as part of the recommendation to the
Chancellor.
Any two tenured faculty members may request a secret ballot vote of no confidence in the
Director. The faculty will meet to select, by vote, a committee of two HDSI faculty members
who will administer the vote and report its result to the office of the Chancellor via the Dean and/or
Vice Chancellor.
5. STANDING COMMITTEES
Responsibilities
Certain responsibilities will be managed by permanent (standing) committees listed below in
no particular order:
• Space Planning & Collaboratories (SpaceCom)
• Computing and Cyber-Infrastructure (CI)
• Graduate Admissions & Scholarships (GradAdmin)
• Grad and Post-doctoral Programs (GradCom)
• Undergraduate Programs and Scholarships (UGS)
• Colloquia, DLS and Sponsorships (DLS)
• Equality, Diversity and Inclusion (EDI)
• Industry Liaison and Institutional Partnerships (ILIP)
• Recruiting (RecCom): multiple recruiting committees may be appointed to conduct
searches in broadly different areas.
Committee Selection
In order to facilitate effective governance, the Director chooses committee chairs and, in
consultation with the committee chair, chooses the members of the committee. All permanent
committee members must be members of the appropriate HDSI voting population. The
Director and/or Associate Directors may be a member or chair of one or more standing
committees.
In some cases, such as the Recruiting Committee, the policies and procedures are
specified by a higher authority, such as the UCSD PPM or the UC APM. Some of the
more important procedures and policies for this committee are summarized in the
Recruiting Section of these Bylaws.
In other cases, such as the undergraduate, Master's and Ph.D. programs, the
significant policies and procedures are the responsibility of the faculty and must be
approved by a faculty vote. Examples include the procedure for selecting applicants for admission to
the Ph.D. program.
Advisory Board
The chair may create an Advisory Board that consists of up to five distinguished
individuals who need not belong to the HDSI Faculty Council. The Advisory Board
does not have authority to make decisions. Its creation and operation must follow the
procedures specified in Sections 5.2, 5.3 and 5.4 above.
6. FACULTY APPOINTMENTS
UC Academic Bylaws assign the responsibility for faculty appointments to the tenured
members of a Department [VI.55.B.1]. A vote will be held among the tenured members of
the HDSI voting population to extend the responsibility for faculty appointments to all
members of the Faculty Council, tenured or not; this vote will require super-approval.
Responsibilities for faculty appointments include identifying, evaluating and voting on new
HDSI faculty members. All appointments, except part-time lecturers, require a vote with
super-approval. This includes research series and adjunct appointments. The Director has
sole responsibility for the appointment of part-time lecturers, which can be delegated to the
chair of the appropriate committee. Appointments to visiting positions will either be voted on or
the HDSI voting population will delegate this responsibility to the Director or a duly
appointed committee.
Adjunct, visiting and research appointments will be processed individually and require a
vote with regular approval.
The operation of the HDSI faculty as a recruiting committee of the whole is authorized by
the PPM. Some of these responsibilities are delegated by the faculty to the Recruiting
committee. The Recruiting committee and its chair are chosen by the Director. They use the
following procedures.
Hiring Plan
At the beginning of the recruiting season, the Director, in consultation with the HDSI Faculty
Council, will formulate a hiring plan. This plan will be based on the expected number of
positions that will be available, the expected levels of appointments, and targeted areas. Any
specific strategy to be adopted to meet diversity goals will be part of this plan. Examples of
strategy include flexibility in target areas, seniority or any specific target(s) of opportunity
available that season.
Screening
The Recruiting committee, which will be made up from members of the HDSI
Faculty Council, will be responsible for initial screening of applicants for all open positions.
These comprise all positions except those of part-time lecturers.
The Recruiting committee will evaluate candidates, solicit letters of reference, and
recommend candidates to be invited for a visit to HDSI. The committee will make every
effort to consider all candidates fairly and to use an appropriate comparison process. The
Recruiting committee will consider both the plan for hiring and the excellence of candidates
which may result in exceptions to the plan consistent with the strategy specified.
The Recruiting committee shall provide the Director its recommendation concerning which
candidates to invite for a visit. The Director will share the committee’s recommendation
with the HDSI faculty. All members of the appropriate HDSI voting population may
examine the applicant files and suggest to the Committee that specific additional candidates
be added to the committee’s recommendation list. In the case of disagreement, a vote will
be scheduled in a timely manner to allow new candidates to be considered along with the
others. Adding an applicant to the list will require, under the general rules for faculty
meetings, regular approval.
Institutional/Departmental Evaluation
After all candidates in a particular search or search specialization who were approved by the
screening process have completed their interviews with HDSI, the Director will call a meeting
of the relevant HDSI voting population to discuss the candidates. At this meeting, a vote will
be held in which each faculty member may vote yes or no for each candidate. Super-approval
is required for a candidate to be further considered. The result of Institutional Evaluation is a
list of faculty candidates recommended for making an offer of a faculty appointment in HDSI
and/or jointly with another department/school. The list may be unordered or partially ordered.
The actual offer and offer order will be specified by an offer strategy discussed next.
Offer Strategy
In the case of multiple candidates for one or more positions, the Director may formulate a
strategy for scheduling the approved candidates for a formal vote and offer. This will take into
account: the original plan for hiring, the approved candidates, the maximal number of offers to
have out at one time, balance between areas, financial cost considerations and the responsiveness
of the candidates to academic and EDI goals of the Institute and of the hiring plans. The Director
will present the strategy to the faculty for advice and consent. If requested by a member of the
voting population, the strategy will be put to a faculty vote, where it will require majority
approval. If no strategy is proposed, or no proposed strategy is approved, the candidates will be
offered positions in an order determined by the number of votes they received. In the case of ties,
the Director will make the decision.
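The default ordering described above (most votes first, with ties left to the Director) can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the function name `default_offer_order` is hypothetical, "votes received" is taken to mean yes votes, and tied candidates are returned as a group rather than being ordered automatically.

```python
from itertools import groupby


def default_offer_order(yes_votes):
    """Group candidates by yes-vote count, highest count first.

    yes_votes: dict mapping candidate name -> number of yes votes.
    Returns a list of groups; a group with more than one name is a
    tie that the Director must resolve.
    """
    # Sort by descending vote count; names are a stable secondary key
    # only so the output is deterministic, not to break ties.
    ranked = sorted(yes_votes.items(), key=lambda item: (-item[1], item[0]))
    return [
        [name for name, _ in tied]
        for _, tied in groupby(ranked, key=lambda item: item[1])
    ]


# Hypothetical candidates and counts.
order = default_offer_order({"Candidate A": 9, "Candidate B": 11, "Candidate C": 9})
print(order)
```

Returning ties as groups keeps the Director's discretion explicit instead of hiding it behind an arbitrary sort order.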
Documentation
The documentation required for a proposed HDSI appointment is covered in PPM 230.20.IX.
Included in this documentation is the Departmental Recommendation Letter. This letter is
meant to summarize the Department position. The file, including this letter, shall be made
available for inspection by the HDSI voting faculty for a period of not less than five working
days before submission of the file. The Director shall announce to the members of the HDSI
voting population when the letter is available for inspection. If a faculty member objects to the
Department letter, or to the process that was used, the faculty member has the option of
including a letter of dissent, which may be signed by one or more faculty members. Dissenting
faculty members must submit their letter within the five-day inspection period. A file shall not
be submitted to the administration that has not had a five-day inspection period.
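For scheduling purposes, the end of a five-working-day inspection period can be estimated with a short Python sketch. The helper `add_working_days` is hypothetical. It assumes that working days are Monday through Friday, that any campus holidays are supplied by the caller, and that counting starts on the day after the letter is announced; the bylaws do not fix these conventions.

```python
from datetime import date, timedelta


def add_working_days(start, days, holidays=()):
    """Return the date that falls `days` working days after `start`.

    Weekends (Saturday, Sunday) and any dates in `holidays` are skipped.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0 for Monday through 6 for Sunday.
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current


# Hypothetical example: letter announced on Thursday, January 28, 2021.
period_end = add_working_days(date(2021, 1, 28), 5)
print(period_end.isoformat())
```

The same helper could be applied to the other working-day deadlines in these bylaws, such as the publication of meeting minutes.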
If desired, the Director may also include a confidential letter in a file, which can be used to
express the Director’s personal opinion.
If significant additional evidence for a file arrives after the file has been submitted, the
Director has the option of submitting the additional information or recalling the file
for additional faculty consideration and/or processing.
Endowed Chairs
The procedure for the awarding of an endowed Chair, including the required faculty consultation,
is described in [PPM 230 8]. In the case where an endowed chair is to be used in faculty
recruiting, the candidate must satisfy both the procedure for faculty hiring and the endowed chair
appointment.
7. FACULTY PROMOTIONS
The relevant responsibilities in this category include: identification of HDSI faculty members
who may be eligible for normal or accelerated advancement, assembling a promotion file, and
carrying out a faculty vote; this includes HDSI faculty members with partial HDSI
appointments as long as they have a non-zero % HDSI appointment.
Responsibilities with regard to faculty promotions are carried out by the Director, the relevant
voting population subset of the faculty, an ad hoc committee, and the individual faculty
member who is up for promotion.
The Director shall select the chair of an ad hoc committee, which will consist of the chair and
two additional faculty members. The Director shall select the members of the ad hoc
committee in consultation with the ad hoc chair. The committee members must be chosen
from the voting population for the candidate’s promotion.
In the case that a faculty member has a partial (but non-zero) HDSI appointment, HDSI shall
follow the above procedure and arrive at its own recommendation, even if the candidate’s
other department(s) are also conducting a separate review. If the academic units disagree
in their assessment of the candidate’s file, the candidate will be
given the option to apply to move their FTE to the academic unit of their choice before
the promotion recommendation is filed with the campus.
Screening
In the academic year preceding a normal or proposed accelerated promotion, the Director
shall determine which faculty members are eligible for a normal merit promotion within
rank, or promotion to the next rank.
Any faculty member may request consideration for an accelerated promotion, either at the
time of what would be a normal promotion or at an intermediate time in a promotion cycle.
A faculty member who would be a member of the voting population for a proposed
accelerated promotion may also propose such a promotion for another faculty member.
Institutional/Departmental Evaluation
Faculty members who are eligible for a normal promotion, or who have been proposed
for an accelerated promotion, will be informed in a timely manner of the procedures for
preparing their promotion files, and the deadlines for submission of materials for which
they are responsible.
The ad hoc committee will, if necessary, choose references and oversee the assembly of a
candidate’s file. The Director has the final authority over the selection of references.
A faculty member may object to the choice in a dissenting letter to be added
to the file before it is submitted.
It is required by the PPM that voting members have the opportunity to express their
opinions of the promotion case. There will be a pre-vote meeting scheduled for the voting
faculty that must be held far enough in advance of the vote to allow suggestions related to
the processing of the file to be implemented.
The chair of the ad hoc committee shall prepare a letter to the Director that details the
committee’s recommendation.
The relevant voting population shall vote on all promotions to a new rank, advancements
from Full Professor Step V to Step VI, advancements from Full Professor Step IX to
Above Scale (AS), and all accelerated advancements either within or to a new rank. In the
case of merit advancements that are not accelerated, for which the Director in consultation
with the ad hoc committee recommends approval, no faculty vote is required. In the case
of merit advancements for which the Director in consultation with the ad hoc committee
recommends disapproval, the candidate may request that their file be put forward for a
vote, along with the department's negative recommendation.
Voting Population
The voting populations for promotions of members of the Academic Senate will be based
on the default populations specified in UC Academic Senate Bylaw 55. They are defined in
the following table. All references to Professor in the table, unless designated otherwise,
refer to ladder rank Professor appointments who are members of the HDSI voting
population.
For voting purposes, all cases that involve the removal of the Acting modifier from the
title of a member of the Academic Senate shall be treated as promotions to the rank in
question. The table contains an entry for the voting population for promotion to Assistant
Professor. This may occur as the result of the removal of the "Acting" designation, or a
promotion from Instructor. There is no corresponding case for Lecturer PSOE. An
appointment to Senior Lecturer PSOE is considered to be comparable to that of an initial
appointment as Acting Full Professor, based on the salary restrictions. A subsequent
promotion from Senior Lecturer PSOE to Senior Lecturer SOE is covered by row 5 in the
table.
In the case of promotions for non-academic senate members, these promotions will be
determined by faculty members of the Academic Senate using a voting population that
parallels that for Academic Senate promotions. Such appointments include adjunct and
research series appointments and are referred to as “at a level equal to ...” in the table
below.
Promotion to: Assistant Professor, Assistant Teaching Professor
Voting population: Full Professors, Associate Professors, Assistant Professors, Full Teaching
Professors, Associate Teaching Professors

Promotion to: Teaching Professor (Lecturer SOE)
Voting population: Full Professors, Associate Professors, Full Teaching Professors,
Associate Teaching Professors (Senior Lecturers SOE)

Promotion to: Associate Professor, Associate Teaching Professor, Associate Professor In Residence
Voting population: Full Professors, Associate Professors, Associate Teaching Professors
Documentation
The clarification and additional documentation details for appointments, as contained in
the Section for recruiting, shall also apply to promotions. In addition, a faculty member
who is a candidate for promotion may, after examination of the redacted promotion file,
include a letter in the file.
These are responsibilities that are not covered in the bylaws. They may be unanticipated,
infrequent, or minor in nature. These responsibilities may be carried out by the Director or
Associate Director(s), by temporary committees, or by the faculty as a whole.
The Director and the faculty shall have the authority to create a temporary committee
and choose its members. The committee is expected to be advisory.
The Director will have primary authority for these responsibilities. In substantial matters for
which the Director and/or faculty has authority, the Director may request a faculty vote.
In matters for which the faculty shares or has authority, the faculty may request a vote.
All votes will be approved by regular approval, except in matters that are covered in the
other sections of the bylaws, for which a higher level of approval is indicated.
The Director should keep the faculty informed of all important issues and decisions
taken with respect to the issues.
9. BYLAW CHANGES
Changes to, additions to, and deletions from the bylaws are carried out by the HDSI voting
population.
Suggestions for changes to the bylaws, and requests for a vote on the suggested changes,
may be made by any member of the HDSI voting population in accordance with the
regulations for faculty meetings. A vote is required on a suggested change, and such a
vote may be either a meeting vote or a mail vote. All such votes require super-approval.
Ilkay Altintas
San Diego Supercomputer Center Telephone: (858) 822-5453
9500 Gilman Drive Fax: (858) 822-3693
MC 0505 E-mail: [email protected]
La Jolla, CA 92093-0505
Professional Preparation
Middle East Technical University, B.S. Computer Engineering 1999
Ankara, Turkey
Middle East Technical University, M.S. Computer Engineering 2001
Ankara, Turkey
University of Amsterdam, Ph.D. Computational Science 2011
Amsterdam, Netherlands
Appointments
2018-. Fellow, Halicioglu Data Science Institute, UCSD
2016-. Associate Research Scientist, San Diego Supercomputer Center, UCSD
2016-. Faculty Co-Director, Master of Advanced Studies in Data Science and Engineering, UCSD
2015-. Chief Data Science Officer, San Diego Supercomputer Center (SDSC), UCSD
2015-. Division Director, Cyberinfrastructure Research, Education and Development, SDSC,
UCSD
2014-. Founder and Director, Workflows for Data Science Center of Excellence, SDSC, UCSD
2012-. Lecturer, Department of Computer Science and Engineering, UCSD
2012-2016 Assistant Research Scientist, San Diego Supercomputer Center, UCSD
2008-2014 Deputy Coordinator for Research, San Diego Supercomputer Center, UCSD
2004-2014 Founder and Director, Scientific Workflow Automation Technologies Laboratory, SDSC,
UCSD
2005-2007 Assistant Director, National Laboratory for Advanced Data Research (NLADR) - Data,
SDSC, UCSD
2001-2004 Research Programmer (P/A III), SDSC, UCSD
1999-2001 Research Assistant, Middle East Technical University (Ankara, TURKEY)
4. J. Wang, P. Korambath, I. Altintas, J. Davis, D. Crawl. Workflow as a Service in the Cloud:
Architecture and Scheduling Algorithms. In Proceedings of International Conference on
Computational Science (ICCS 2014), pages 546-556. DOI: 10.1016/j.procs.2014.05.049
5. B. Ludaescher, I. Altintas, C. Berkley, D. Higgins, E. Jaeger-Frank, M. Jones, E. Lee, J. Tao, Y.
Zhao, Scientific Workflow Management and the Kepler System, Concurrency and Computation:
Practice & Experience, 18(10), pp. 1039-1065, 2006. (Cited by 2124 in October 2019.)
Other Selected Products
6. I. Altintas, M.K. Anand, T. Vuong, S. Bowers, B. Ludaescher, P.M.A. Sloot, “A Data Model for
Analyzing User Collaborations in Workflow-Driven eScience,” The International Journal of
Computers and Their Applications (IJCA), 2011. Vol. 18, No. 3, p.160 – 180, Dec, 2011.
7. I. Altintas, A.W. Lin, J. Chen, C. Churas, M. Gujral, S. Sun, W. Li, R. Manansala, M. Sedova, J.S.
Grethe, and M. Ellisman, “CAMERA 2.0: A Data-centric Metagenomics Community Infrastructure
Driven by Scientific Workflows,” In Proceedings of the SWF 2010 at IEEE SERVICES '10, pp.
352-359, 2010. DOI: 10.1109/SERVICES.2010.89
8. A. Goderis, C. Brooks, I. Altintas, E. Lee, and C. Goble, “Heterogeneous composition of models of
computation,” FGCS, vol. 25, no. 5, pp. 552–560, 2009.
9. I. Altintas, O. Barney, E. Jaeger-Frank, Provenance Collection Support in the Kepler Scientific
Workflow System, in Provenance and Annotation of Data, LNCS Volume 4145/2006, pages 118-
132, 2006. (Cited by 361 in October 2019.)
10. I. Altintas, C. Berkley, E. Jaeger, M. Jones, B. Ludaescher, and S. Mock, “Kepler: An extensible
system for design and execution of scientific workflows,” in Intl. Conference on Scientific and
Statistical Database Management (SSDBM), Greece, 2004. (Cited by 1103 in October 2019.)
ERY ARIAS-CASTRO
CONTACT INFORMATION
Department of Mathematics Voice: (858) 534-3590
University of California, San Diego Fax: (858) 534-5273
La Jolla, CA 92093-0112 (USA) E-mail: [email protected]
EDUCATION
Ph.D. in Statistics, Stanford University 2004
M.S. in Artificial Intelligence and Applied Mathematics, École Normale Supérieure de Cachan and Washington
University in Saint Louis 1998
B.S. in Mathematics, École Normale Supérieure de Cachan 1997
PROFESSIONAL
Professor, Mathematics, University of California, San Diego 2015–present
Associate Professor, Mathematics, University of California, San Diego 2011–2015
Assistant Professor, Mathematics, University of California, San Diego 2005–2011
Postdoctoral Fellow, Mathematical Sciences Research Institute Spring 2005
Postdoctoral Fellow, Institute for Pure and Applied Mathematics Fall 2004
MEMBERSHIPS
Institute of Mathematical Statistics
SERVICE
Committee work within UCSD: Academic Integrity Review Board [2012-13], Faculty mediator [2018-]
Associate Editor: Annals of Statistics [2013-19], Journal of the American Statistical Association [2014-],
Journal of the Royal Statistical Society [2016-], Electronic Journal of Statistics [2015-], ESAIM - Probability and
Statistics [2015-], ALEA [2019-]
Area Chair: Conference on Learning Theory (COLT) [2016], Artificial Intelligence and Statistics (AISTATS)
[2017, 2019, 2020]
Guest Editor: Special Issue on Detection, IEEE Journal of Selected Topics in Signal Processing, 2012
Reviewer: IEEE Transactions on Image Processing, IEEE Transactions on Information Theory, IEEE Transactions
on Signal Processing, IEEE Transactions on Pattern Analysis and Machine Intelligence, Journal of Mathematical
Imaging and Vision, Annals of Statistics, Electronic Journal of Statistics, Journal of Multivariate Analysis,
Journal of the Royal Statistical Society (Series B), Annals of Applied Statistics, Statistical Science, Journal
of Nonparametric Statistics, Solar Energy, The Astrophysical Journal, Bernoulli, Journal of the American Statistical
Association, ESAIM: Probability and Statistics, Journal of Machine Learning Research, Conference on
Learning Theory, Artificial Intelligence and Statistics, International Conference on Machine Learning, etc.
Conference Organization: Math+Stats+X, a conference in honor of David Donoho’s 60th birthday, 2017
(co-organizer); IMS Meeting, 2014 (session organizer); Probability and Statistics Day, 2013 (co-organizer); Meeting
of New Researchers in Statistics and Probability, 2014 (board member); Meeting of New Researchers in Statistics
and Probability, 2013 (board member); Meeting of New Researchers in Statistics and Probability, 2012 (chair
and local chair); Quality and Productivity Research Conference, 2012 (session organizer)
Other: NSF grant panel [2012, 2016]
7. Briefly list the most important publications and presentations from the past five years –
title, co-authors if any, where published and/or presented, date of publication or
presentation.
• Mikhail Belkin, Daniel Hsu, Siyuan Ma, Soumik Mandal, Reconciling modern
machine learning practice and the bias-variance trade-off, PNAS, 2019, 116 (32).
• Chaoyue Liu, Libin Zhu, Mikhail Belkin, Toward a theory of optimization for
over-parameterized systems of non-linear equations: the lessons of deep learning,
arxiv, 2020.
• Chaoyue Liu, Mikhail Belkin, Accelerating Stochastic Training for
Over-parametrized Learning, ICLR 2020.
• Mikhail Belkin, Daniel Hsu, Ji Xu, Two models of double descent for weak
features, arxiv 2019.
• Mikhail Belkin, Siyuan Ma, Soumik Mandal, To understand deep learning we
need to understand kernel learning, ICML 2018.
• Siyuan Ma, Mikhail Belkin, Kernel machines that adapt to GPUs for effective
large batch training, SysML 2019.
• Mikhail Belkin, Daniel Hsu, Partha Mitra, Overfitting or perfect fitting? Risk
bounds for classification and regression rules that interpolate, Neural Inf. Proc.
Systems (NeurIPS) 2018.
• Siyuan Ma, Raef Bassily, Mikhail Belkin, The power of interpolation:
understanding the effectiveness of SGD in modern over-parametrized learning,
ICML 2018.
Jelena Bradic
Education
Academic Experience
Stanford University, Statistics Department,
Visiting Associate Professor, 2019-present
University of California San Diego, Halicioglu Data Science Institute,
Associate Professor (with tenure), 2019-present
University of California San Diego, Mathematics Department,
Associate Professor (with tenure), 2018-present
University of California San Diego, Mathematics Department,
Assistant Professor (on maternity leave 2011/2012), 2011-2018
Professional Associations
Institute of Mathematical Statistics, American Statistical Association, Bernoulli Society
Publications
Bradic, Jelena and Fan, Jianqing and Zhu, Yinchu (2020), Testability of high-dimensional linear
models with non-sparse structures, to appear in the Annals of Statistics.
Bradic, Jelena and Claeskens, Gerda and Gueuning, Thomas (2020), Testing fixed effects in
high-dimensional misspecified linear mixed models, Journal of the American Statistical Association:
Theory & Methods, 115 (529), 1-16.
Zhu, Yinchu and Bradic, Jelena (2018), Linear hypothesis testing in dense high-dimensional linear
models, Journal of the American Statistical Association: Theory & Methods, 113 (524), 1583-1600.
Li, Alexander Hanbo and Bradic, Jelena (2018), Boosting in the presence of outliers: classification
with non-convex loss functions, Journal of the American Statistical Association: Theory & Methods,
113 (512), 660-674.
Ryzhov, Ilya and Han, Bin and Bradic, Jelena (2016), Cultivating Disaster Donors: A Case
Application of Scalable Analytics for Big Data, Management Science, 62(3), 849-866.
ALEXANDER C. CLONINGER
[email protected]
EMPLOYMENT
UNIVERSITY OF CALIFORNIA, SAN DIEGO La Jolla, CA
Assistant Professor July 2017-present
Mathematics Department 2017-present
Halıcıoğlu Data Science Institute 2020-present
EDUCATION
UNIVERSITY OF MARYLAND College Park, MD
Ph.D. in Applied Math and Scientific Computing Program May 2014
Advisers: Wojciech Czaja and John J. Benedetto
Thesis: Exploiting Data-Dependent Structure for Improving Sensor Acquisition and Integration
WORK EXPERIENCE
INSTITUTE FOR DEFENSE ANALYSES Bowie, MD
Center for Computing Sciences 6/2010 - 9/2013
SERVICE ACTIVITIES
NSF Panel Reviewer Apr. 2020
Organizer of Mini-symposium “Distance Metrics High Dim. Point Clouds”, ICIAM Jul. 2019
Organizer of Mini-symposium “High Dim. Machine Learning”, DSCO SF Institute Mar. 2019
Organizer of Panel “AI and DNN in Radiation Oncology”, ASTRO Oct. 2018
Organizer of Undergraduate Math Colloquium Talks Fall 2018
Founding Faculty HDSI Institute, UCSD Mar. 2018
Organizer of Mini-symposium “Laplacians and Applications”, SIAM PDE Conference Dec. 2017
Organizer of Applied Math Seminar at Yale University 2014-2016
SELECTED PUBLICATIONS
A Potapov, I Colbert, K Kreutz-Delgado, A Cloninger, S Das. “PT-MMD: A Novel Statistical Framework
for the Evaluation of Generative Systems.” ASILOMAR, 2019.
X. Cheng, A. Cloninger, and R.R. Coifman. “Two Sample Statistics Based on Anisotropic Kernels.”
Information and Inference, 2019.
A. Cloninger, B. Roy, C. Riley, and H. Krumholz. “People Mover’s Distance: Class level geometry using
fast pairwise data adaptive transportation costs.” Applied and Computational Harmonic Analysis,
2019.
G. Mishne, U. Shaham, A. Cloninger, I. Cohen. “Diffusion Nets.” Applied and Computational Harmonic
Analysis, 2018.
A Cloninger, S Steinerberger. “On the dual geometry of Laplacian eigenfunctions.” Experimental
Mathematics, 2018.
J. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, Y. Kluger. “Deep Survival: A Deep Cox
Proportional Hazards Network.” BMC Medical Research Methodology, 2018.
A. Cloninger. “A Note on Markov Normalized Magnetic Eigenmaps.” Applied and Computational
Harmonic Analysis, 2017.
A. Cloninger, W. Czaja, and T. Doster. “The Pre-image Problem for Laplacian Eigenmaps Utilizing
L1 Regularization with Applications to Data Fusion.” Inverse Problems, 2017.
A. Cloninger, S. Steinerberger. “Spectral Echolocation via the Wave Embedding.” Applied and
Computational Harmonic Analysis, 2017.
U. Shaham, A. Cloninger, R. Coifman. “Provable approximation properties for deep neural networks.”
Applied and Computational Harmonic Analysis, 2017.
A. Cloninger. “Prediction models for graph-linked data with localized regression.” SPIE: Wavelets
and Sparsity XVII, 2017.
Nicholas S Downing, Alexander Cloninger, Arjun K Venkatesh, Angela Hsieh, Elizabeth E Drye, Ronald
R Coifman, Harlan M Krumholz. “Describing the performance of US hospitals by applying big data
analytics.” PloS One, 2017.
A Cloninger, S Steinerberger. “On suprema of autoconvolutions with an application to Sidon sets.”
Proceedings of the American Mathematical Society, 2017.
A. Cloninger, R. Coifman, N. Downing, H. Krumholz. “Bigeometric Organization with Deep Nets.”
Applied and Computational Harmonic Analysis, 2016.
A. Cloninger, W. Czaja. “Eigenvector Localization on Data-Dependent Graphs.” SampTA, 2015.
A. Hafftka, H. Celik, A. Cloninger, W. Czaja, R. Spencer. “2D Sparse Sampling Algorithm for ND
Fredholm Equations with Applications to NMR Relaxometry.” SampTA, 2015.
R. Bai, P. Basser, A. Cloninger, W. Czaja. “Efficient 2D MRI Relaxometry Using Compressed
Sensing.” Journal of Magnetic Resonance, 2015.
N Jamil, X Chen, A Cloninger. “Hildreth’s algorithm with applications to soft constraints for user
interface layout.” Journal of Computational and Applied Mathematics, 2015.
SELECTED TALKS
Kernel approaches in global statistical distances, local measure detection, and active learning.
Colloquium talk, Claremont Graduate University, Claremont, CA, February 5, 2020.
Dual Geometry of Laplacian Eigenfunctions with Applications to Graph Wavelets, Cuts, and
Visualization. Jubilee of Fourier Analysis and Applications, College Park, MD, September 21, 2019.
Manifold Learning with Diffusion Variational Autoencoders. Approximation Theory 16, Vanderbilt
University, Nashville, TN, May 21, 2019.
Fast Detection of Inter-Group Differences in Images. Statistical, Variational, and Learning Techniques
in Image Analysis, Joint Math Meetings, Baltimore, MD, January 19, 2019.
New Developments in AI/Deep Learning. Artificial Intelligence and Deep Learning Within Radiation
Oncology, 2018 ASTRO Meeting, San Antonio, TX, October 23, 2018.
Fast Point Cloud Distances and Multi-Sample Testing. Applied Harmonic Analysis and Data
Processing, Mathematisches Forschungsinstitut Oberwolfach, Oberwolfach, Germany, March 29, 2018.
Deep Learning Function Approximation on Manifolds. Applied Harmonic Analysis, Massive Data Sets,
Machine Learning, and Signal Processing Workshop, Casa Matematica Oaxaca, October 18, 2016
Defining Distances Between High-Dimensional Point Clouds. Symposium on Advanced Computational
Methods in Biomedical Imaging, National Institutes of Health, October 6, 2016
BIOGRAPHICAL SKETCH
NAME: de Sa, Virginia R
EDUCATION (Degree; Completion Date; Field of Study)
1994-1995 Postdoctoral Fellow, Computer Science, University of Toronto (mentor: Geoff Hinton)
1996-2001 Postdoctoral Fellow, Physiology, University of California at San Francisco (mentors: Michael
Merzenich, Michael Stryker)
2001-2008 Assistant Professor, Cognitive Science, University of California at San Diego
2008-2018 Associate Professor, Cognitive Science, University of California at San Diego
2018-present Professor, Cognitive Science, University of California at San Diego
2019-present Associate Director, Halıcıoğlu Data Science Institute, University of California at San Diego
1988 Medal in Mathematics and Engineering (88), Queen's University [highest (in ME dept) standing]
1988 Professional Engineer's Gold Medal (88), Queen's University [highest standing in final year]
1988 Governor-General's Medal, Queen's University [highest standing throughout 4 years of Eng]
1988-1992 Natural Sciences and Engineering Research Council of Canada (NSERC) 1967 Science and
Engineering Scholarship [one of 47 given to graduating students across Canada]
1994-1995 Natural Sciences and Engineering Research Council of Canada (NSERC) Postdoctoral Fellowship
1996-1998 Sloan Postdoctoral Fellowship
2001-2007 NSF CAREER Award
2003-2004 UCSD Faculty Career Development Program Award
2007-2008 UCSD Chancellor’s Collaboratories award
2012-2013, 2016-2017, 2018-2019 Kavli Innovative Research Award
2016-2017, 2017-2018 UCSD Frontiers of Innovation Scholars Program
2019-2020 Kavli Symposium Inspired Proposal Award
1999, 2000 Advanced Tutor, EU Advanced Course in Computational Neuroscience, Trieste, Italy
2001-2002 Co-chair for the Neural Information Processing Systems workshops
2002 Member of NSF, Knowledge and Cognitive Systems, grant review panel
2003 Member of NSF, Machine Learning, grant review panel
2008 Member of NSF, Robust Intelligence, grant review panel
2009-present Institutional Review Board for Neurosky
2013,2016 Member of NSF, Human Centered Computing, grant review panel
2007,2014,2017,2018,2019,2020 Program Committee (Neural Information Processing Systems (NIPS))
2016,2017,2018,2019,2020 Program Committee (Cognitive Science Conference)
2018,2019 Program Committee (International Conference on Learning Representations (ICLR))
2019 Program Committee (International Joint Conference on Artificial Intelligence (IJCAI))
2020 NSF grant Panel Review CISE
Noh, E., Liao, K., Mollison, M.V., Curran, T., & de Sa, V.R. (2018). Single-trial EEG analysis predicts memory
retrieval and reveals source-dependent differences. Frontiers in Human Neuroscience 12:258. doi:
10.3389/fnhum.2018.00258
Mousavi, M., & de Sa, V.R. (2019). Temporally Adaptive Common Spatial Patterns with Deep Convolutional
Neural Networks. Proceedings of the 41st Annual International Conference of the IEEE EMBS Engineering in
Medicine and Biology Society (EMBC'19)
Xu, X. Huang, J. & de Sa, V.R. (2019) Pain Evaluation in Video using Extended Multitask Learning from
Multidimensional Measurements. Proceedings of Machine Learning Research (Machine Learning for Health
ML4H at NeurIPS 2019).
Liao, K., Mollison, M., Curran, T., and de Sa, V.R. (2018). Single-Trial EEG Predicts Memory Retrieval Using
Leave-One-Subject-Out Classification. First International Workshop on Machine Learning for EEG Signal
Processing (MLESP 2018).
Mousavi, M. & de Sa, V.R. (2019) Spatio-temporal analysis of error-related brain activity in active and passive
brain-computer interfaces. Brain-Computer Interfaces. https://doi.org/10.1080/2326263X.2019.1671040
Mousavi, M., Koerner, A.S., Zhang, Q., Noh, E., & de Sa, V.R. (2017) Improving motor imagery BCI with user
response to feedback. Brain-Computer Interfaces. doi 10.1080/2326263X.2017.1303253
de Sa, V.R. Using insights from cortical architectures for neural networks. Invited presentation at Cell Press
Beijing Conference: AI and the Brain. Nov 6-7, 2019 Sunrise Kempinski Hotel, Beijing China
Tang, S. & de Sa, V.R. (2019). Exploiting Invertible Decoders for Unsupervised Sentence Representation
Learning. ACL 2019.
Eldridge | 2
Publications Conference Papers
Unperturbed: spectral analysis beyond Davis-Kahan.
J. Eldridge, M. Belkin, Y. Wang
Algorithmic Learning Theory (ALT), 2018.
Journal Articles
Robust features for the automatic identification of autism spectrum
disorder in children.
J. Eldridge, A.E. Lane, M. Belkin, S. Dennis.
Journal of Neurodevelopmental Disorders, 2014.
Workshop Abstracts
Graphons, mergeons, and so on!
J. Eldridge, M. Belkin, Y. Wang.
Abstract, talk. Workshop on Geometry and Machine Learning,
2016.
Technical Reports
Denali: A tool for visualizing scalar functions as landscape metaphors.
J. Eldridge, M. Belkin, Y. Wang.
http://denali.cse.ohio-state.edu/tech_report.pdf
Reviewing Theoretical Computer Science, special issue on Algorithmic Learning
Theory.
IEEE Transactions on Pattern Analysis and Machine Intelligence.
Talks Invited
Tulane CS Colloquium, November 2017.
Air Force Research Laboratory ATR Summer Seminar, June 2017.
Information Theory and Applications, Graduation Day Talk, Feb. 2017.
Italian Institute of Technology Machine Learning Seminar, Dec. 2016.
Conference
NIPS 2016, full oral. Video: https://youtu.be/en_qtNAtkUs
COLT 2015, best student paper. Video: https://goo.gl/c7M42J
Seminar
Consistent Clustering. AI Seminar, OSU, November 2017.
Graphons, mergeons, and so on! Topology, Geometry, and Data Analysis
(TGDA) seminar, OSU, November 2016.
Graphons, mergeons, and so on! AI Seminar, OSU, November 2016.
What do we seek in a hierarchical clustering?, AI Seminar, OSU, April
2015.
Name: Aaron Fraenkel
Academic Experience:
● UCSD, Assistant Teaching Professor, Chair Undergraduate Program, 2018-2020
● Boston College, Visiting Assistant Professor, 2012-2014
● Pennsylvania State University, Chowla Research Assistant Professor, 2011-2012
Non-Academic Experience:
● ID Analytics, Senior Data Scientist, Fraud Modeling and Identity Resolution, 2014-2016
● Amazon.com, Senior Machine Learning Scientist, Security and Abuse, 2016-2018
EDUCATION
Indian Institute of Technology, Kanpur Electrical Engineering B. Tech, 1984.
UC Berkeley, Berkeley, CA Electrical Engineering & Computer Science M.S., 1986.
Stanford University, Stanford, CA Electrical Engineering Ph. D., 1994.
ACADEMIC APPOINTMENTS
2018-now Distinguished Professor, Computer Science & Engineering, UC San Diego
2018-now Director, Halicioglu Data Science Institute, UC San Diego
2003-2018 Qualcomm Chair Professor, Computer Science & Eng., UC San Diego
2006 Visiting Professor, EPFL, Lausanne, Switzerland
2005 Visiting Professor, Electrical Engineering, Stanford University
2002-2003 Professor of Information and Computer Science, UC Irvine
1998-2002 Associate Professor of Information and Computer Science, UC Irvine
1996-1997 Assistant Professor of Information and Computer Science, UC Irvine
1994-1996 Assistant Professor of Computer Science, U. Illinois, Urbana-Champaign.
1986-1993 Senior Design Engineer, Intel Corporation, Santa Clara, California.
MAJOR SERVICE
Organizer: Associate Editor for VLDB ’21, XLDB ’18, SIGMOD ’18 DEEM Workshop,
SIGKDD ’18 CMI Workshop, SoCal DB Day 2018
PC Member: SIGMOD ’17–’20, VLDB ’18–’21, MLSys ’19–’20, ICDE ’17, SIGMOD ’17
Demo and SRC, HotCloud ’16, SIGMOD ’16 URC
Reviewer: ACM TODS 2017 and 2015, IEEE TKDE 2014
Proposal Reviewer/Panelist:
NSF SBIR/STTR Phase II 2020, DOE Solar Office 2020, NSF HDR Data Science
Corps 2019
Key Department/University Service:
2019–20: HDSI Faculty Recruiting Committee; 2019–20: CSE Bylaws Committee;
2018: UCSD LGBTQIA+ Undergraduate Scholarships Committee; 2017–20: CSE
MS Committee; 2017: UCSD SDSC Sustainability Committee; 2016–17: CSE PhD
Admissions Committee
Key Contributions to Diversity:
2019–20: Represented UCSD CSE at NSF Workshop on Departmental BPC Plans;
co-authored CSE’s Departmental BPC plan
2019: Organized a panel on LGBTQ+ community resources at UCSD for the CSE Celebration
of Diversity Day
2017–18: Represented UCSD and CSE twice at oSTEM National Conference
2017: Co-proposed/created a new UCSD CSE PhD diversity-focused scholarship
2017–: Active member of UCSD CSE DEI Committee
2017–: Actively involved with UCSD LGBT Resource Center and oSTEM activities
(Q & A panels, talks, scholarships, etc.) as an out faculty member
E-mail: [email protected]
Website: https://sites.google.com/view/yianma
EXPERIENCE
Assistant Professor, Halıcıoğlu Data Science Institute, University of California, San Diego July 2020 –
Visiting Faculty, Google Brain Health and Google Research August 2019 – July 2020
Post-doctoral Fellow, Electrical Engineering and Computer Sciences September 2017 – August 2019
University of California, Berkeley, CA, USA
Advisor: Michael I. Jordan
EDUCATION
Ph.D. of Science, Applied Mathematics June 2017
University of Washington, Seattle, WA, USA
Advisors: Emily B. Fox and Hong Qian
Bachelor of Engineering, Computer Science and Engineering (honor thesis) June 2012
Shanghai Jiao Tong University, Shanghai, China
SELECTED PUBLICATIONS
• Yi-An Ma, Yuansi Chen, Chi Jin, Nicolas Flammarion, Michael I. Jordan. Sampling can be faster than
optimization, Proc. Natl. Acad. Sci., 2019.
• Chris Aicher, Yi-An Ma, Nick Foti, Emily B. Fox. Stochastic gradient MCMC methods for state space
models, SIAM J. Math. Data Sci., 2019.
• Yi-An Ma, Emily B. Fox, Tianqi Chen, Lei Wu. Irreversible samplers from jump and continuous
Markov processes, Stat. Comput. (2018).
• Niladri S. Chatterji, Nicolas Flammarion, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan. On
the theory of variance reduction for stochastic gradient Monte Carlo, in Proceedings of International
Conference on Machine Learning 35 (ICML 2018).
• Yi-An Ma, Nick Foti, Emily B. Fox. Stochastic gradient MCMC methods for hidden Markov models,
in Proceedings of International Conference on Machine Learning 34 (ICML 2017).
• Xiaojie Qiu, Andrew Hill, Jonathan Packer, Dejun Lin, Yi-An Ma, Cole Trapnell. Single-cell mRNA
quantification and differential analysis with Census, Nature Methods (2017).
• Yi-An Ma, Tianqi Chen, Emily B. Fox. A complete recipe for stochastic gradient MCMC, in Advances
in Neural Information Processing Systems 28 (NIPS 2015).
SELECTED TALKS
• Bridging MCMC and Optimization
– Statistics Department Seminar, Mathematics Department, University of California, Davis; April 2020.
– Mathematics Department Seminar, Duke University; Sept. 2019.
– Invited Talk at Microsoft Research New England, Boston, MA; Aug. 2019.
– Invited Talk at Google Research, San Francisco, CA; July 2019.
– Statistics Department Seminar, University of Warwick; Feb. 2019.
– Machine Learning Department Seminar, Carnegie Mellon University; Feb. 2019.
– Halıcıoğlu Data Science Institute (HDSI) Seminar, University of California, San Diego; Feb. 2019.
– Statistics Department Seminar, Rutgers University; Feb. 2019.
– Department of Statistics and Data Science Seminar, Yale University; Feb. 2019.
– Statistics Department Seminar, Eberly College of Science, Penn State; Jan. 2019.
– Courant Institute and Center for Data Science Seminar, New York University; Jan. 2019.
– Stewart School of Industrial and Systems Engineering (ISyE) Seminar, Georgia Tech; Jan. 2019.
SELECTED AWARDS
• 2017 Stein fellowship (declined for other opportunities).
• Best undergraduate thesis “Lyapunov functions for oscillatory and chaotic dynamical systems” awarded
by the computer science department, Shanghai Jiao Tong University.
SERVICES
• Journals: reviewer for Journal of the American Statistical Association (JASA), Biometrika, Bernoulli,
Journal of Machine Learning Research (JMLR), Statistics and Computing.
• Conferences: reviewer for Advances in Neural Information Processing Systems (NeurIPS/NIPS),
International Conference on Machine Learning (ICML), Annual Conference on Learning Theory (COLT);
served on the program committee of AAAI conference on Artificial Intelligence.
• Secretary for the University of Washington Chapter of Society for Industrial and Applied Mathematics
(SIAM), 2015-2016.
PATENT
• Patent: Real Time Supervise Machine against Traffic Law Violation, awarded by the State Intellectual
Property Office of the People's Republic of China in 2012 (Patent No.: 201120076406.X).
1. Gal Mishne
2. Education –
BSc, Electrical Engineering, 2009
BSc, Physics, 2009
PhD, Electrical Engineering 2017
3. Academic experience –
Yale University, Gibbs Assistant Professor, 2017-2019
UC San Diego, Assistant Professor, 2019-present
4. Non-academic experience –
Rafael Advanced Defense Systems Ltd., Image processing engineer, 2008-2014
5. Awards
AMS-Simons Travel Grant, 2018
SIAM Early Career Travel Award, 2017
Wolf Foundation Award for Ph.D. students, 2016
6. Service activities
• Hiring committee – HDSI/Neurobiology, 2020
• PhD program committee – HDSI, 2020
• Reviewer – Cosyne 2020; ICML 2020; Neural Computation; Involve; IEEE
Transactions on Image Processing; IEEE ICASSP; Elsevier Information Sciences;
Journal of Mathematical Imaging and Vision; Neurons, Behavior, Data analysis,
and Theory; Advances in Computational Mathematics.
• DeepMath 2020 – Co-organizer
7. Briefly list the most important publications from the past five years
• X. Cheng and G. Mishne, “Spectral embedding norm: To look deep into the spectrum of
the graph Laplacian”, accepted to SIAM Imaging Sciences.
• G. C. Linderman, G. Mishne, A. Jaffe, Y. Kluger and S. Steinerberger, “Randomized
nearest neighbor graphs, giant components and applications in data science”, accepted to
Advances in Applied Probability.
• S. Gigante, A. S. Charles, S. Krishnaswamy and G. Mishne, “Visualizing the PHATE of
Neural Networks”, NeurIPS 2019, December 2019.
• G. Mishne, Eric C. Chi and R. R. Coifman, “Co-manifold learning with missing data”,
ICML 2019, June 2019.
• X. Cheng, G. Mishne, and S. Steinerberger, “The geometry of nodal sets and outlier
detection”, Journal of Number Theory, vol. 185, pp. 48–64, 2018.
• G. Mishne, R. Talmon, I. Cohen, Y. Kluger and R. R. Coifman, “Data-driven tree
transforms and metrics”, IEEE Transactions on Signal and Information Processing over
Networks, vol. 4, no. 3, pp. 451–466, Sept. 2018.
• G. Mishne, U. Shaham, A. Cloninger and I. Cohen, “Diffusion Nets”, Applied and
Computational Harmonic Analysis, Aug. 2017.
• G. Mishne, R. Talmon, R. Meir, J. Schiller, M. Lavzin, U. Dubin and R. R. Coifman,
“Hierarchical coupled-geometry analysis for neuronal structure and activity pattern
discovery”, IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 7, pp.
1238–1253, Oct. 2016.
Education
Academic Positions
Professional Associations
2020 Co-Principal investigator, NIH grant T32 MH122376-01 [304118-00001] "Advanced data
analytics training for behavioral and social sciences research".
2019 Principal investigator, NSF grant DMS 19-14556, "Computer-intensive methods for
nonparametric analysis of dependent data".
2013 Awarded the Econometric Theory Multa Scripsit Award.
2013 Fellow of the Institute of Advanced Study, Technische Universitaet Muenchen.
2012 Co-Principal investigator, NSF grant DMS 12-23137, "ATD: Detection of Clusters in
Spatial Data and Images".
2012 Awarded the Tjalling C. Koopmans Econometric Theory Prize 2009-2011 for the paper
"Higher-Order Accurate, Positive Semi-Definite Estimation of Large-Sample Covariance and
Spectral Density Matrices," Econometric Theory, Vol. 27, No. 4, August 2011, pp. 703-744.
2011 Fellowship from the John Simon Guggenheim Memorial Foundation for the project
"Model-free Prediction and Regression".
2011 Elected Fellow of the American Statistical Association (ASA). Citation reads: "For
path-breaking research in nonparametric statistics, for outstanding applications of this methodology
to time series analysis, resampling, subsampling, and function estimation; and for exemplary
leadership and service to the profession, especially for conference organization and prolific
editorial work."
2004 Elected Fellow of the Institute of Mathematical Statistics (IMS). Citation reads: "Prof.
Politis received the award for innovative methodology in the analysis of time series and models
of spatial dependence, as well as groundbreaking theory in nonparametric statistics".
Professional Service
Editorial Work
Publications
Co-author of over 100 journal papers (see list at: www.math.ucsd.edu/~politis/) and of the books:
--SUBSAMPLING, D.N. Politis, J.P. Romano, M. Wolf, Springer, New York, 1999.
--MODEL-FREE PREDICTION AND REGRESSION: A TRANSFORMATION-BASED
APPROACH TO INFERENCE, D.N. Politis, Springer, New York, 2015.
--TIME SERIES: A FIRST COURSE WITH BOOTSTRAP STARTER, T.S. McElroy and
D.N. Politis, Chapman and Hall/CRC Press, 2020.
1. Name: Rayan Saab
2. Education:
3. Academic experience:
• 2017–present: Associate Professor, Mathematics, The University of California,
San Diego (UCSD), San Diego, CA
• 2013–2017: Assistant Professor, Mathematics, The University of California, San
Diego (UCSD), San Diego, CA
• 2011–2013: Visiting Assistant Professor, Mathematics, Duke University,
Durham, NC
• 2010–2011: Postdoctoral Researcher, Mathematics, The University of British
Columbia, Vancouver, Canada
9. Briefly list the most important publications and presentations from the past five years –
title, co-authors if any, where published and/or presented, date of publication or
presentation
• T. Huynh, R. Saab, “Fast binary embeddings, and quantized compressed sensing
with structured matrices”, Communications on Pure and Applied Mathematics,
Vol. 73, no. 1, pp. 110–149, 2020.
• D. Needell, R. Saab, T. Woolf, “Simple Classification using Binary Data”, The
Journal of Machine Learning Research, Vol. 19, no. 1, pp. 2487–2516, 2018.
• R. Saab, R. Wang, Ö. Yılmaz, “Quantization of compressive samples with stable
and robust recovery”, Applied and Computational Harmonic Analysis, Vol. 44,
pp. 123–143, 2018.
• K. Knudson, R. Saab, R. Ward, “One-bit compressive sensing with norm
estimation”, IEEE Transactions on Information Theory, Vol. 62, no. 5, pp.
2748–2758, 2016.
• M. A. Iwen, B. Preskitt, R. Saab, A. Viswanathan, “Phase retrieval from local
measurements: Improved robustness via eigenvector-based angular
synchronization”, Applied and Computational Harmonic Analysis, Vol. 48, no. 1,
pp. 415–444, 2020.
• R. Saab, R. Wang and Ö. Yılmaz, “From Compressed Sensing to Compressed
Bit-Streams: Practical Encoders, Tractable Decoders”, IEEE Transactions on
Information Theory, Vol. 64, no. 9, pp. 6098–6114, 2018.
Service Activities
1. Academic Advising: 9 postdoctoral scholars, 17 PhD students (4 as main thesis adviser + 5
as PhD committee member + 8 other), 4 MSc students, 2 BSc students.
2. University Service:
UC San Diego: HDSI MS Program Cmte. (AY 2019-20); HDSI PhD Program Cmte. (AY
2019-20); HDSI Faculty Council (AY 2019-20); HDSI Advisory Board (AY 2018-19);
Biostatistics Executive Committee (AY 2018-20); Biostatistics PhD Program Admissions
Cmte. (AY 2016-20); Biostatistics PhD Program Education Cmte. (AY 2016-20); BS in
Public Health Steering Cmte. (AY 2018-19); Biostatistics Hiring Cmte. (AY 2016-17).
NC State University: Written Preliminary Exam Committee (AY 2014-15); Hiring
Committee (AY 2014-15); Basic Exam Committee (AY 2013-14).
Harvard SPH: High Dimensional Data Seminar co-Chair (AY 2007-12); Qualifying Exam
Cmte. (AY 2008-11); Newsletter Cmte. (AY 2007-09); Degree Program Cmte. (AY
2007-08); Diversity Cmte. (AY 2007-08); Seminar Cmte. (AY 2006-07).
3. Reviewer services:
a. Associate Editor, Electronic Journal of Statistics (2016-Present); Econometrics and
Statistics, Special issue on Neuroimaging (2018 - 2020)
b. Journal reviewer: 77 times for 17 statistical journals; 29 times for 16 scientific journals.
c. Grant Reviewer: Emerging Imaging Technologies in Neuroscience (EITN) Study
Section, NIH (2020); Dutch Research Council (NWO) (2019); Israeli-Québec
Collaboration in Medical Bio-Imaging (2017); Statistics Program, Division of
Mathematical Sciences, NSF (2017); Biostatistical Methods and Research Design Study
Section (BMRD), NIH (2015); Network for Translational Research: Optical Imaging,
NCI/NIH (2008); In-vivo Cellular and Molecular Imaging Centers, NCI/NIH (2007).
4. Conference organization: Comp. and Methodological Statistics, Pisa, Italy (2018), London,
UK (2017) and Sevilla, Spain (2016); Joint Statistical Meetings, Montreal, QC (2013) and
Vancouver, BC (2010); International Biometric Society, Fort Collins, CO (2012) and San
Luis Obispo (2011); Harvard Cancer Center (2009); Radcliffe Institute for Advanced Study
(2008).
1. Name
Jingbo Shang
9. Briefly list the most important publications and presentations from the past five years –
title, co-authors if any, where published and/or presented, date of publication or
presentation
There are about 40 publications in the past 5 years. The complete list can be found on my
website (https://shangjingbo1226.github.io/publications/) or my Google scholar page
(https://scholar.google.com/citations?user=0SkFI4MAAAAJ&hl=en). Here are a few examples:
1. “Contextualized Weak Supervision for Text Classification,” D. Mekala and J. Shang. Annual
Meeting of the Association for Computational Linguistics (ACL), 2020.
2. “Empower Entity Set Expansion via Language Model Probing,” Y. Zhang, J. Shen, J. Shang
and J. Han. Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
3. “NetTaxo: Automated Topic Taxonomy Construction from Large-Scale Text-Rich Network,”
J. Shang, X. Zhang, L. Liu, S. Li and J. Han. The Web Conference (WWW), 2020.
4. “Integrating Local Context and Global Cohesiveness for Open Information Extraction,” Q.
Zhu, X. Ren, J. Shang, Y. Zhang, A. El-Kishky and J. Han. International Conference on Web
Search and Data Mining (WSDM), 2019.
5. “CrossWeigh: Training Named Entity Tagger from Imperfect Annotations,” Z. Wang, J.
Shang, L. Liu, L. Lu, J. Liu and J. Han. ACL SIGDAT Empirical Methods in Natural
Language Processing (EMNLP), 2019.
6. “Learning Named Entity Tagger using Domain-Specific Dictionary,” J. Shang, L. Liu, X. Gu,
X. Ren, T. Ren and J. Han. ACL SIGDAT Empirical Methods in Natural Language
Processing (EMNLP), 2018.
7. “Empower Sequence Labeling with Task-Aware Neural Language Model,” L. Liu, J. Shang,
F. Xu, X. Ren, H. Gui, J. Peng and J. Han. AAAI Conference on Artificial Intelligence
(AAAI), 2018.
8. “Automated Phrase Mining from Massive Text Corpora,” J. Shang, J. Liu, M. Jiang, X. Ren,
C. Voss and J. Han. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2018.
9. “MetaPAD: Meta Pattern Discovery from Massive Text Corpora,” M. Jiang, J. Shang, T.
Cassidy, X. Ren, L. Kaplan, T. Hanratty and J. Han. ACM SIGKDD Conference on
Knowledge Discovery and Data Mining (KDD), 2017.
2. EDUCATION.
Ph.D., Neurobiology and Behavior Program, University of Washington.
B.S., Biological Sciences, University of California at Santa Cruz.
3. POSITION.
Assistant Professor in Bioengineering and the Halıcıoğlu Data Science Institute, University of California 3/2019-
NIH K99 fellow, UC Berkeley, 2017-2019
NIH F32 fellow, UC Berkeley, 2014-2017
Postdoctoral researcher, Kriegsfeld lab, UC Berkeley, 2013
5. CURRENT MEMBERSHIPS
Society for Behavioral Neuroendocrinology, 2014-
Endocrine Society, 2012-
Society for Research in Biological Rhythms, 2011-
Society for Neuroscience, 2007-
Inaugural student president, Center for Sensorimotor Neural Engineering, UW, 2011-2012
President, Neurobiology and Behavior Community Outreach, UW, 2010-2012
Organizing member of Neurobiology and Behavior Community Outreach, UW, 2006-2012
7. SERVICE
Science in Society Outreach
Vocal proponent of data rights in biomedicine, and frequent public speaker / interviewee about the role of technology
in advancing public health, and the importance of the individual in this process.
Recent publications include Wired, The Economist, BBC Business Daily, San Francisco Chronicle, Reader’s Digest, and many
others globally.
Education outreach
Over 100 K-12 “What do Brains do” school visits and lab tours;
Special emphasis on schools in lower socioeconomic status neighborhoods;
Organizer, “Brain Days Fair” at University of Washington through the Neurobiology and Behavior Community Outreach;
Created lesson plans for neuroscience classroom activities with over 10,000 downloads;
My most important work of the last 5 years has been leveraging time series data into insights about health outcomes. In
animal models these include the ability to detect pregnancy within hours of conception, and to predict pregnancy outcome
within the first day of pregnancy; to develop within-individual tracking of Huntington’s disease progression; to
demonstrate that female animals show less variability than males – not more - across ovulatory cycles when faster
biological rhythms are included in analyses; and that disruptions to circadian rhythms during pregnancy cause autism-like
outcomes in resultant offspring.
I have transitioned this work into human populations, contributing a number of important insights in the last few years.
These include: determining wearable device design for inferring circadian phase from body temperature; inferring internal
hormonal concentrations from time series features of wearable sensor data; predicting student academic performance
from sleep and circadian metrics; co-discovering circadian and ultradian rhythms within the gastric system; and
characterizing seasonal variation in cardiac output across large populations.
My work is now focused on developing early insights into COVID-19 and other illnesses from wearable device data.
Manuscripts in preparation include detection of fever, prediction of illness onset, and classification of illness variants from
physiological time series features. This work is being carried out with participation from 50,000 wearable device users,
and is an important template for expanded capture of “natural experiments” for developing tools for individuals at public-
health scale using distributed hardware infrastructure of personal tracking devices.
Together, these works focus on unlocking the novel potential of continuous physiological data generated from wearable
devices and related technologies, with a focus on modernizing women’s health, education outcomes, and long-term care
and monitoring in illness.
Academic Service
Conference & Workshop Organization: NeurIPS Workshop Selection Program Committee (2020),
FAT* – Co-Chair for the Computer Science Track (2020), FAT/ML – Workshop
Organizer & Webmaster (2017, 2018), INFORMS Session Organizer (2013-2019)
Grant Reviewing: National Science Foundation Panelist (2019)
Journal Reviewing: Management Science, IEEE Transactions on Signal Processing, Statistical
Analysis & Data Mining, Artificial Intelligence, Information Sciences, Minds & Machines,
Big Data, Epidemiology, Nature Digital Medicine, Artificial Intelligence & Law,
Journal of Quantitative Criminology, IBM Journal of Research & Development.
Conference Program Committee: NeurIPS (2018, 2019, 2020), ICML (2019), ICLR (2020),
FAT* (2018, 2019), AISTATS (2019), AAAI (2019), HCOMP (2019), UAI (2018), ISIT (2018)
Advising
PhD Students
Jennifer Chien, UCSD CSE 2020 – Present
Jamelle Watson-Daniels, Harvard SEAS 2020 – Present
Eric Mibuari, Harvard SEAS 2018 – Present
Hao Wang, Harvard SEAS 2017 – Present
MS Students
Haorang Zhang, University of Toronto 2020 – Present
Vinith Suriyakumar, University of Toronto 2020 – Present
Alexander Spangher, Columbia University 2017 – 2019
Undergraduates
Tynan Seltzer, Harvard SEAS 2018 – Present
Charles Marx, Haverford College Summer 2019
Jiaming Zeng, MIT 2014 – 2016
Selected Papers
1. Predictive Multiplicity in Classification.
Charles Marx, Flavio Calmon, Berk Ustun. International Conference on Machine Learning, 2020
2. Learning Optimized Risk Scores.
Berk Ustun, Cynthia Rudin. Journal of Machine Learning Research, 2019
3. Fairness without Harm: Decoupled Classifiers with Preference Guarantees.
Berk Ustun, Yang Liu, David C. Parkes. International Conference on Machine Learning, 2019
4. Repairing without Retraining: Avoiding Disparate Impact with Counterfactual Distributions.
Hao Wang, Berk Ustun, Flavio Calmon. International Conference on Machine Learning, 2019
5. Actionable Recourse in Linear Classification.
Berk Ustun, Alexander Spangher, Yang Liu. ACM Conference on Fairness, Accountability, and
Transparency, 2019
6. The World Health Organization Adult ADHD Self-Report Screening Scale for DSM-5.
Berk Ustun, Lenard Adler, Cynthia Rudin, Stephen Faraone, Thomas Spencer, Patricia Berglund,
Michael Gruber, Ronald C. Kessler. JAMA Psychiatry, 2017
7. Association of an EEG-Based Risk Score With Seizure Probability in Hospitalized Patients.
Aaron Struck, Berk Ustun, Andres Ruiz, Jong Woo Lee, Suzette LaRoche, Lawrence Hirsch, Emily
Gilmore, Jan Vlachy, Hiba A. Haider, Cynthia Rudin, Brandon Westover. JAMA Neurology, 2017
8. Interpretable Classification Models for Recidivism Prediction.
Jiaming Zeng, Berk Ustun, Cynthia Rudin. JRSS Series A, 2016
9. Supersparse Linear Integer Models for Optimized Medical Scoring Systems.
Berk Ustun, Cynthia Rudin. Machine Learning, 2015
1. Name
Tsui-Wei (Lily) Weng
9. Briefly list the most important publications and presentations from the past five years –
title, co-authors if any, where published and/or presented, date of publication or
presentation
Period of employment (From - To) | Institution, firm or organization | Location | Rank, title, or position
2020-Present | Univ. California, San Diego | California | Professor
2018-2020 | Foundation Research Community of Practice (CoP) at Translational Data Analytics Institute @ OSU | Ohio | Co-director of CoP
2017-2020 | Ohio State University | Ohio | Professor
2011-2017 | Ohio State University | Ohio | Associate Professor
2012-2013 | Institute of Science and Technology | Austria | Visiting Professor
2005-2011 | Ohio State University | Ohio | Assistant Professor
Steering Committee:
(06/2020-present) Computational Geometry Steering Committee
Associate Editors:
(2019-present) SIAM Journal on Computing (SICOMP)
(2020-present) Computational Geometry: Theory and Applications
(2010-present) Journal of Computational Geometry
Program committee (Co-)Chair:
(2016) Joint STOC/SoCG Workshop Day
(2019) 35th Symposium on Computational Geometry (SoCG)
Co-organizers:
TGDA@OSU Conference (2016, 2018), TGDA@OSU summer school (2018), NSF CBMS
Conference on Elastic Functional and Shape Data Analysis (2019), AMW workshop on
Women in Computational Topology at JMM (Joint Mathematics Meeting) (2019).
9. Briefly list the most important publications and presentations from the past five years
d. Research Interests.
Probability, stochastic processes and their applications; multidimensional reflected diffusions;
stochastic differential (delay) equations; measure-valued processes; fluid and diffusion approximations
for complex networks; analysis and control of stochastic networks with applications
to operations management, telecommunications and systems biology.
e. Five Selected Recent Publications.
1. Yingjia Fu and Ruth J. Williams, Stability of a subcritical fluid model for fair bandwidth
sharing with general file size distributions, Stochastic Systems, in press.
2. J. A. Mulvany, A. L. Puha and R. J. Williams, Asymptotic behavior of a critical fluid
model for a multiclass processor sharing queue via relative entropy, Queueing Systems, 93
(2019), 351-397.
3. S. C. Leite and R. J. Williams, A constrained Langevin approximation for chemical reaction
networks, Annals of Applied Probability, 29 (2019), 1541-1608.
4. R. J. Williams, Stochastic Processing Networks, invited article for Annual Review of
Statistics and Its Application, 3 (2016), 323-345.
5. D. Lipshutz and R. J. Williams, Existence, uniqueness and stability of slowly oscillating
periodic solutions for delay differential equations with non-negativity constraints, SIAM J.
on Mathematical Analysis, 47 (2015), 4467–4535.
f. External Professional Activities (5 illustrative examples).
(i) Member of the Council of the National Academy of Sciences, 2019-2022.
(ii) Member of the selection committee for the biennial INFORMS Impact Prize, 2018, 2020
(committee chair in 2020).
(iii) Member of Governance Board of MATRIX (Australian Mathematics Inst.), 2015-pres.
(iv) Associate Editor for Applied Probability Trust Journals: Journal of Applied Probability
and Advances in Applied Probability (2016-present).
(v) President, Institute of Mathematical Statistics, 2012.
g. UC San Diego Recent Service Activities (5 illustrative examples).
(i) Founding Faculty and Council Member, Halıcıoğlu Data Science Institute, 2018-present
(ii) Faculty Advisory Committee for Moore Science Communication grant, 2017–19.
(iii) Physical Sciences Task Force on the Status of Women in the Physical Sciences, 2017–18.
(iv) Council Member, Mathematics Department, 2016–2020 (one of three Council members
elected to represent the faculty).
(v) Chair, Mathematics Department Hiring Committee, 2016-2018.
1. Name: Arya Mazumdar
6. Briefly list the most important publications and presentations from the past five years –
title, co-authors if any, where published and/or presented, date of publication
a. A Mazumdar, S Pal, “Semisupervised clustering by queries and locally encodable
source coding,” IEEE Transactions on Information Theory, vol. 67, no. 2, 2021.
Preliminary Version in NeurIPS 2017 (Spotlight paper).
b. V Gandikota, D Kane, R Maity, A Mazumdar, “vqSGD: vector quantized
stochastic gradient descent,” AISTATS, 2021.
c. A Ghosh, R Maity, A Mazumdar, “Distributed Newton can communicate less and
resist byzantine workers,” NeurIPS, 2020.
d. V Gandikota, A Mazumdar, S Pal, “Recovery of sparse linear classifiers from
mixture of responses,” NeurIPS, 2020.
e. S Ubaru, S Dash, A Mazumdar, O Gunluk, “Multilabel classification by
hierarchical partitioning and data-dependent grouping,” NeurIPS, 2020.
f. R McKenna, R Maity, A Mazumdar, G Miklau, “A workload-adaptive mechanism
for linear queries under local differential privacy,” VLDB, 2020.
g. Arya Mazumdar, Soumyabrata Pal, “Recovery of sparse signals from a mixture of
linear samples,” International Conference on Machine Learning (ICML), 2020.
h. A Krishnamurthy, A Mazumdar, A McGregor, S Pal, “Algebraic and Analytic
Approaches for Parameter Learning in Mixture Models,” ALT, 2020.
i. A Krishnamurthy, A Mazumdar, A McGregor, S Pal, “Sample complexity of
learning mixture of sparse linear regressions,” NeurIPS, 2019.
j. W Huleihel, A Mazumdar, M Medard, S Pal, “Same-cluster querying for
overlapping clusters,” NeurIPS, 2019.
k. L Flodin, V Gandikota, A Mazumdar, “Superset technique for approximate
recovery in one-bit compressed sensing,” NeurIPS, 2019.
l. A Mazumdar, A McGregor, S Vorotnikova, “Storage capacity as information
theoretic vertex cover and the index coding rate,” IEEE Transactions on Information
Theory, vol. 65, no. 9, 2019.
m. A Mazumdar, B Saha, “Clustering with noisy queries,” NIPS, 2017.
n. A Mazumdar, B Saha, “Query complexity of clustering with side information,”
NIPS, 2017.
o. A Barg, A Mazumdar, “Group testing schemes from codes and designs,” IEEE
Transactions on Information Theory, vol. 63, no. 11, Nov 2017.
p. S Ubaru, A Mazumdar, Y Saad, “Low rank approximation and decomposition of
large matrices using error correcting codes,” IEEE Transactions on Information Theory,
vol. 63, no. 9, Sep 2017.
q. A Mazumdar, “Nonadaptive group testing with random set of defectives,” IEEE
Transactions on Information Theory, vol. 62, no. 12, Dec 2016.
Selected Pubs
Guan, J, Ryali, C, & Yu, A J (2018). Computational modeling of social face perception in
humans: Leveraging the active appearance model. bioRxiv, https://doi.org/10.1101/360776.
Ryali, C, & Yu, A J (2018). Beauty-in-averageness and its contextual modulations: A Bayesian
statistical account. Adv. in Neural Information Processing Systems, 32.
Guo, D, Yu, A J (2018). Why so gloomy? A Bayesian explanation of human pessimism bias
in the multi-armed bandit task. Adv. in Neural Information Processing Systems, 32.
Ryali, C, Gautam, R, & Yu, A J (2018). Demystifying excessively volatile human learning: A
Bayesian persistent prior and a neural approximation. Adv. in Neural Information Processing
Systems, 32.
Wang, W, Hu, S, Ide, J S, Zhornitsky, S, Zhang, S, & Yu, A J, Li, C-S R (2018). Motor
preparation disrupts proactive control in the stop signal task. Frontiers in Human Neuroscience,
doi: 10.3389/fnhum.2018.00151.
Cogliati Dezza, I, Yu, A J, Cleeremans, A, Alexander, W (2017). Learning the value of information
and reward over time when solving exploration-exploitation problems. Nature Scientific
Reports, 7:16919.
Harlé, K M, Guo, D, Zhang, S, Paulus, M, Yu, A J (2017). Anhedonia and anxiety underlying
depressive symptomatology have distinct effects on reward-based decision-making. PLoS ONE,
12(10):e0186473.
Harlé, K M, Zhang, S, Ma, N, Yu*, A J, & Paulus, M P* (2016). Reduced neural recruitment
for Bayesian adjustment of inhibitory control in methamphetamine dependence. Biological
Psychology: Cog. Neurosci. and Neuroimaging, 1: 48-459. *Co-senior authors.
Li L, Malave, V, Song, A, & Yu, A J (2016). Extracting Human Face Similarity Judgments:
Pairs or Triplets? Proceedings of the Cognitive Science Society Conference.
Ma, N & Yu, A J (2016). Inseparability of Go and Stop in Inhibitory Control: Go Stimulus
Discriminability Affects Stopping Behavior. Frontiers in Decision Neuroscience, 10 (54).
Harlé, K M, Zhang, S, Schiff, M, Mackey, S, Paulus, M P, & Yu, A J (2015). Altered statistical
learning and decision-making in methamphetamine dependence: Evidence from a two-armed
bandit task. Frontiers in Psychology, 6 (1910).
Harlé, K M, Steward, J L, Zhang, S, Tapert, S, Paulus, M P, & Yu, A J (2015). Bayesian
neural adjustment of inhibitory control predicts emergence of problem stimulant use. Brain,
138:3413-26.
Ma, N & Yu, A J (2015). Statistical Learning and Adaptive Decision-Making Underlie Human
Response Time Variability in Inhibitory Control. Frontiers in Psychology, 6(1046).
Ide, J S, Hu, S, Zhang, S, Yu, A J & Li, C-S R (2015). Impaired Bayesian learning for cognitive
control in cocaine dependence. Drug and Alcohol Dependence, 151: 220-227.
Ahmad, S & Yu, A J (2015). A rational model for individual differences in preference choice.
Proceedings of the Cognitive Science Society Conference.
Zhang, S, Song, M, & Yu, A J (2015). Bayesian hierarchical model of local-global processing:
Visual crowding as a case-study. Proceedings of the Cognitive Science Society Conference.
Yu, A J & Huang, H (2014). Maximizing masquerading as matching: Statistical learning and
decision-making in choice behavior. Decision, 1 (4): 275-287.
Harlé, K M, Shenoy, P, Steward, J L, Tapert, S, Yu*, A J, Paulus*, M P (2014). Altered
neural processing of the need to stop in young adults at risk for stimulus dependence. Journal
of Neuroscience, 34(13): 4567-4580. *Co-senior authors.
Ahmad, S, Huang, H, & Yu, A J (2013). Context-sensitivity in human active sensing. Adv. in
Neural Information Processing Systems 26.
Zhang, S & Yu, A J (2013). Forgetful Bayes and myopic planning: Human learning and
decision-making in a bandit setting. Adv. in Neural Information Processing Systems 26.
Shenoy, P & Yu, A J (2013). A rational account of contextual effects in preference choice:
What makes for a bargain? Cognitive Science Society Conference.
Dayanik, S & Yu, A J (2013). Reward-rate maximization in sequential identification under a
stochastic deadline. SIAM Journal on Control and Optimization, 51 (4), 2922-2948.
Yu, A J (2013). Bayesian Models of Attention. Chapter in Handbook of Attention, Eds. S.
Kastner & K. Nobre. Oxford, UK: Oxford University Press.
Ide, J S, Shenoy, P, Yu*, A J, & Li*, C-R (2013). Bayesian prediction and evaluation in the
anterior cingulate cortex. Journal of Neuroscience, 33: 2039-2047. *Co-senior authors.
Shenoy, P & Yu, A J (2012). Rational impatience in perceptual decision-making: a Bayesian
account of discrepancy between two-alternative forced choice and Go/NoGo behavior. Adv. in
Neural Information Processing Systems 25.
Yu, A J (2012). Change is in the eye of the beholder. Nature Neuroscience 15: 933-935.
Shenoy, P, Rao, R, & Yu, A J (2010). A rational decision making framework for inhibitory
control. Adv. in Neural Information Processing Systems 23: 2146-2154.
Yu, A J & Cohen, J D (2009). Sequential effects: Superstition or rational behavior? Adv. in
Neural Information Processing Systems 21: 1873-1880.
Yu, A J, Dayan, P, & Cohen J D (2008). Dynamics of attentional selection under conflict:
Toward a rational Bayesian account. J. Exp. Psy.: Human Perc. and Perf., 35: 700-717.
Frazier, P & Yu, A J (2008). Sequential hypothesis testing under stochastic deadlines. Adv.
in Neural Information Processing Systems 20: 465-72.
Yu, A J. (2007) Adaptive behavior: Humans act as Bayesian learners. Current Biology 17.
Cohen, J D, McClure, S M, & Yu, A J (2007). Should I stay or should I go? How the human
brain manages the tradeoff between exploitation and exploration. Philosophical Transactions
of the Royal Society B: Biological Sciences 362: 933-942.
Yu, A J (2007). Optimal change-detection and spiking neurons. Adv. in Neural Information
Processing Systems 19: 1545-52. MIT Press, Cambridge, MA.
Dayan, P & Yu, A J (2006). Norepinephrine and neural interrupts. Adv. in Neural Information
Processing Systems 18: 243-50.
Yu, A J & Dayan, P (2005). Uncertainty, neuromodulation, and attention, Neuron, 46: 681-692.
Yu, A J & Dayan, P (2005). Inference, attention, and decision in a Bayesian neural architecture.
In Adv. in Neural Information Processing Systems 17: 1577-84.
Yu, A J & Dayan, P (2003). Expected and unexpected uncertainty: ACh and NE in the
neocortex. In Adv. in Neural Information Processing Systems 15.
Yu, A J & Dayan, P (2002). Acetylcholine in cortical inference. Neural Networks, 15 (4/5/6):
719-730.
Dayan, P & Yu, A J (2002). Acetylcholine, uncertainty, and cortical inference. Adv. in Neural
Information Processing Systems 14.
Other Activities
Journal editor: Decision, Frontiers in Behavioral Neurosci., Frontiers in Human Neurosci.
Journal Reviewer: Adaptive Behavior, Brain Research, Cognition, Cognitive Psychology,
Computational Brain & Behavior, Current Biology, Decision, eLife, European Journal of Neuroscience,
Frontiers in Computational Neuroscience, Frontiers in Decision Neuroscience, Frontiers
in Behavioral Neuroscience, Frontiers in Human Neuroscience, Journal of Autonomous Agents
and Multi-Agent Systems, Journal of Neuroscience, Journal of Theoretical Biology, Memory &
Cognition, Nature Communications, Nature Human Behavior, Nature Reviews, Neural Computation,
Neuron, PLoS Computational Biology, PLoS ONE, PNAS, Psychological Review,
Psychonomic Bulletin & Review, Psychopharmacology, Science
Conference Reviewer/Organizer: Cogsci, Cosyne, IJCAI, RLDM, NIPS (AC, SAC)
Zhiting Hu
Halıcıoğlu Data Science Institute
University of California San Diego
La Jolla, CA 92093
Phone: +1 (412) 320-0630
Email: [email protected]
[email protected]
Homepage: http://zhiting.ucsd.edu/
Education
2020 Ph.D., Machine Learning Department, Carnegie Mellon University
Advisor: Eric P. Xing
2016 M.S., Language Technologies Institute, Carnegie Mellon University
Advisor: Eric P. Xing
2014 B.S., Computer Science, Peking University, China
Academic Experience
starting 2021.9 Assistant Professor, Halıcıoğlu Data Science Institute, UC San Diego
Non-Academic Experience
2020 – 2021.9 Full-time Visiting Academic, Amazon Alexa AI
2017 – 2020 Full-time Research Scientist, Petuum Inc.
Service Activities
Co-organizer, NeurIPS 2019 Workshop on Learning with Rich Experience: Integration of Learning Paradigms
Co-organizer, CVPR 2019 Workshop on Towards Causal, Explainable and Universal Medical Visual Diagnosis
Co-organizer, ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Representations
Co-organizer, ICML 2018 Workshop on Theoretical Foundations and Applications of Deep Generative Models
Reviewer, NeurIPS, ICML, ACL, EMNLP, NAACL, CVPR, AAAI, KDD, WWW, JMLR, MLJ, TPAMI, etc
Outstanding reviewer, EMNLP 2020
Publications (Selected)
[1] Yue Wu, Pan Zhou, Andrew Gordon Wilson, Eric P Xing, Zhiting Hu.
Improving GAN Training with Probability Ratio Clipping and Sample Reweighting
Neural Information Processing Systems (NeurIPS 2020).
[2] Zhengzhong Liu, Guanxiong Ding, Avinash Bukkittu, Mansi Gupta, Pengzhi Gao, Atif Ahmed,
Shikun Zhang, Xin Gao, Swapnil Singhavi, Linwei Li, Wei Wei, Zecong Hu, Haoran Shi, Xiaodan
Liang, Teruko Mitamura, Eric P Xing, Zhiting Hu.
A Data-Centric Framework for Composable NLP Workflows
Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), demo.
[3] Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Tom Mitchell, Eric P Xing.
Learning Data Manipulation for Augmentation and Weighting
Neural Information Processing Systems (NeurIPS 2019).
[4] Zhiting Hu, Haoran Shi, Bowen Tan, Wentao Wang, Zichao Yang, Tiancheng Zhao, Junxian He,
Lianhui Qin, Di Wang, Xuezhe Ma, Zhengzhong Liu, Xiaodan Liang, etc.
Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL 2019).
Best Demo Paper Nomination, https://github.com/asyml
[5] Zhiting Hu*, Bowen Tan* (equal contrib.), Zichao Yang, Ruslan Salakhutdinov, Eric P Xing.
Connecting the Dots between MLE and RL for Sequence Prediction
ICLR 2019 Workshop on Deep Reinforcement Learning Meets Structured Prediction,
Best Paper Award
[6] Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Xiaodan Liang, Lianhui Qin, Haoye Dong, Eric P
Xing.
Deep Generative Models with Learnable Knowledge Constraints
Neural Information Processing Systems (NeurIPS 2018).
[7] Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, Eric P Xing.
On Unifying Deep Generative Models
International Conference on Learning Representations (ICLR 2018).
[8] Zhiting Hu, Zichao Yang, Xiaodan Liang, Ruslan Salakhutdinov, Eric P Xing.
Toward Controlled Generation of Text
International Conference on Machine Learning (ICML 2017).
[9] Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard Hovy, Eric P Xing.
Harnessing Deep Neural Networks with Logic Rules
Annual Meeting of the Association for Computational Linguistics (ACL 2016).
Outstanding Paper Award
Curriculum Vitae (abbreviated)
R. Stuart Geiger, Ph.D
[email protected]
Education
Academic Experience
Service activities
1. Lead organizer of the Best Practices in Data Science working group at the UC-Berkeley Institute for
Data Science (2018-2020)
2. Co-organizer of workshop and event series on Diversity & Inclusion in Data Science at UC-Berkeley
(2018-2020)
3. Program committee member of the ACM Conference on Collective Intelligence (2019-present)
4. Co-organizer of the Data Science Studies / Critical Data Studies conference track at the Annual
Meeting of the Society for the Social Studies of Science (4S) (2016-2018)
5. Undergraduate Research Apprenticeship Mentor, UC-Berkeley (2018-2020)
1. Geiger, R.S., K. Yu, Y. Yang, M. Dai, J. Qiu, R. Tang, and J. Huang. 2020. "Garbage In, Garbage Out:
Do Machine Learning Application Papers in Social Computing Report Where Human-Labeled
Training Data Comes From?" In Proceedings of the ACM Conference on Fairness, Accountability, and
Transparency.
2. Geiger, R.S. 2019. “The Visible and Invisible Work of Maintaining and Sustaining Open-Source
Software.” Keynote at the SciPy (Scientific Python) 2019 conference, Austin, Texas. July 10, 2019.
3. Geiger, R.S. 2018. “Key Values: What We Talk About When We Talk About ‘Open Science.’” Keynote
at the 2018 Hawai’i Open Science Symposium, Manoa, HI. Apr 20, 2018.
4. Geiger, R.S., N. Varoquaux, C. Mazel-Cabasse, and C. Holdgraf. 2018. “The Types, Roles, and
Practices of Documentation in Data Analytics Open Source Software Libraries: A Collaborative
Ethnography of Documentation Work.” Computer Supported Cooperative Work.
https://doi.org/10.1007/s10606-018-9333-1
5. Geiger, R.S. and Halfaker, A. 2017. “Operationalizing conflict and cooperation between
automated software agents in Wikipedia: A replication and expansion of Even Good Bots Fight."
In Proceedings of the ACM on Human-Computer Interaction (Nov 2017 issue, CSCW 2018
Online First). https://doi.org/10.1145/3134684
6. Geiger, R.S. 2017. "Beyond opening up the black box: Investigating the role of algorithmic systems in
Wikipedian organizational culture." Big Data & Society 4(2).
http://stuartgeiger.com/algoculture-bds.pdf
7. Geiger, R.S. 2016. “Bot-based collective blocklists in Twitter: the counterpublic moderation of
harassment in a networked public space.” Information, Communication, and Society 19(6).
http://stuartgeiger.com/blockbots-ics.pdf
Academic Appointments
University of California, San Diego
Chancellor’s Associates Endowed Chair, 2020-.
Associate Professor, Dept of Political Science and Halıcıoğlu Data Science Institute, July 2020-.
Associate Professor, Dept of Political Science, July 2018-2020
Assistant Professor, Dept of Political Science, July 2014-2018.
Education
Harvard University,
Ph.D., Government, 2014
Stanford University
M.S. Statistics, June 2009
Stanford University
B.A. International Relations & Economics, June 2009
Books
- Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. Text as Data. Princeton
University Press. (In Press)
- Roberts, Margaret E. Censored: Distraction and Diversion Inside China’s Great Firewall.
Princeton University Press. (2018)
Selected Publications
- Eddie Yang and Margaret E. Roberts. 2021. “Censorship of Online Encyclopedias: Implications
for NLP Models.” In Conference on Fairness, Accountability, and Transparency (FAccT
‘21)
- Grimmer, Justin, Margaret E. Roberts, and Brandon M. Stewart. “Machine Learning for
Social Science: An Agnostic Approach.” Annual Review of Political Science. (In Press)
- Roberts, Margaret E. “Resilience to online censorship.” Annual Review of Political Science 23
(2020): 401-419.
- Iyad Rahwan, Manuel Cebrian, Nick Obradovich, Josh Bongard, Jean-François Bonnefon, Cynthia
Breazeal, Jacob W. Crandall, Nicholas A. Christakis, Iain D. Couzin, Matthew O. Jackson,
Nicholas R. Jennings, Ece Kamar, Isabel M. Kloumann, Hugo Larochelle, David Lazer, Richard
McElreath, Alan Mislove, David C. Parkes, Alex ‘Sandy’ Pentland, Margaret E. Roberts, Azim
Shariff, Joshua B. Tenenbaum, Michael Wellman. “Machine behaviour.” Nature. 2019 Apr
568(7753):477
- Hobbs, William R., and Margaret E. Roberts. “How sudden censorship can increase access to
information.” American Political Science Review 112.3 (2018): 621-636.
- King, Gary, Jennifer Pan, and Margaret E. Roberts. 2017. “How the Chinese Government
Fabricates Social Media Posts for Strategic Distraction, not Engaged Argument.” American
Political Science Review.
- King, Gary, Patrick Lam, and Margaret E. Roberts. 2017. “Computer-Assisted Keyword and
Document Set Discovery from Unstructured Text.” American Journal of Political Science.
- Roberts, Margaret E, Brandon M. Stewart, and Edo M. Airoldi. 2016. “A model of text for
experimentation in the social sciences.” Journal of the American Statistical Association, 111
(515): 988-1003.
- King, Gary, Jennifer Pan, and Margaret E. Roberts. 2014. “Reverse Engineering Chinese
Censorship: Randomized Experimentation and Participant Observation.” Science, 345 (6199):
1-10.
- Roberts, Margaret E, Brandon M. Stewart, Dustin Tingley, Christopher Lucas, Jetson Leder-Luis,
Shana Gadarian, Bethany Albertson, David Rand. 2014. “Structural Topic Models for
Open-Ended Survey Responses.” American Journal of Political Science, 58 (4): 1064-1082.
- King, Gary, Jennifer Pan, and Margaret E. Roberts. 2013. “How Censorship in China Allows
Government Criticism but Silences Collective Expression.” American Political Science Review,
107(2), 326-343.
Selected Service
• Editorial Board Member: Political Analysis, Asian Survey, American Journal of Political
Science, American Political Science Review, China Quarterly, Political Behavior, World
Politics
• American Political Science Association Methodology Section Chair (2019)
• Text as Data Association Board (2019-)
• Society for Political Methodology Diversity Committee (2014-present)
David Danks
Education
Ph.D., Philosophy, University of California, San Diego, 2001
M.A., Philosophy, University of California, San Diego, 1999
A.B., Philosophy, Princeton University, 1996
Academic experience
(As of July 2021: University of California, San Diego, Professor of Data Science & Philosophy)
Carnegie Mellon University, L.L. Thurstone Professor of Philosophy & Psychology, 2016 –
CMU, Head, Department of Philosophy, 2014 -
CMU, Professor of Philosophy & Psychology, 2014 - 2016
CMU, Associate Professor of Philosophy & Psychology, 2008 - 2014
CMU, Assistant Professor of Philosophy, 2003 - 2008
Florida Institute for Human & Machine Cognition, Research Scientist, 2001 – 2012 (full-time for
2001-2003; part-time from 2003-2012)
Colorado College, Visiting Assistant Professor of Philosophy, 2002 - 2003