UCR Syllabus
Syllabus
Introduction to Data Science with Python
I. Course Information
Course Description:
This class introduces students to data science and its methods, starting with essential
exploratory data analysis techniques using the Python programming language. These
techniques are typically applied before formal modeling commences and can help inform
the development of more complex statistical models. Exploratory techniques are also
important for eliminating or sharpening potential hypotheses about the world that can be
addressed by the data. We will cover in detail how to create informative plots, as well as
some of the basic principles of constructing data graphics, and then move on to common
multivariate statistical techniques used to visualize and understand high-dimensional
data and to build models.
In the class, we will apply the fundamentals of statistical inference in a practical manner
for getting things done. After investigating correlations and trends, we will move to
building interactive models that are used in a wide variety of industries, to give students
a “real life” use case for the application of data science. After taking this course, students
will be familiar with using Python for data science tasks and will be able to use this
knowledge to make informed choices in analyzing data.
Instructional methods
The course will be taught via Moodle.
Assignments
There will be hands-on homework and project assignments to give students an
opportunity to apply what they learn in the class.
Visualization
- Matplotlib
- Seaborn
- Bokeh
o ANOVA
o Categorical Analysis
- Multivariate Analysis
o Correlations
o Dimensionality Reduction
o Principal Components Analysis
o Clustering
- Regression
o Assumptions
o Linear Regression
o Logistic Regression
- Decision Trees
o Assumptions
o Single Decision Tree
o A forest of trees
o Gradient Boosted Trees
- Neural Networks
o Assumptions
o Sigmoid, Linear and Stochastic Components
o Building a multi-layer, multi-node neural network using Keras
The relevant datasets and IPython notebooks will be posted on the Moodle site
before each class so that students can get accustomed to the material before the
concepts and examples are introduced.
The course will be delivered online, and students are expected to participate. Attendance
and participation will count toward the final grade.
The final grade will be based on a data science project. This will be a Kaggle-style project
in which students turn in their predicted results for a test dataset along with the Python
code used to create the predictions. Students can select any method; it could be as
simple as a linear fit or as complicated as a multi-layer, multi-node neural network
solution. The final grade determination will be based on the performance of the model on
the test dataset.
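As a rough illustration of the simplest end of that range, the sketch below fits a straight
line to a toy training set and writes predictions for a test set in a CSV layout. It is only a
minimal sketch under stated assumptions: the data values, the column names "id" and
"prediction", and the helper names fit_line and write_submission are all hypothetical, and
the actual project dataset and submission format will be specified separately.

```python
import csv
import io


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def write_submission(ids, predictions, out):
    """Write one (id, prediction) row per test example to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["id", "prediction"])  # hypothetical column names
    for row_id, pred in zip(ids, predictions):
        writer.writerow([row_id, f"{pred:.3f}"])


# Hypothetical toy data standing in for the course datasets.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]
test_ids = [101, 102]
test_x = [5.0, 6.0]

# Fit on the training set, then predict for the held-out test set.
slope, intercept = fit_line(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]

# An in-memory buffer is used here; a real submission would be written to a file.
buffer = io.StringIO()
write_submission(test_ids, predictions, buffer)
submission_text = buffer.getvalue()
print(submission_text)
```

The same two steps (fit on the training data, predict on the test data) apply unchanged if
the linear fit is swapped for a more complex model such as a tree ensemble or a neural
network.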
Your email account is an important tool for your participation in this course. Make sure that
your mailbox has enough room to accept messages and attachments. If you are using an
email account provided by your employer, check to see that your account can receive
email from outside your local network. School districts frequently reject emails from our
server because of filtering software and many students never receive course
announcements or other materials. Additionally, do not use an automated responder with
the email account you are using with your course. If you have concerns about getting
unwanted emails because your email account is visible to others in your course, set up an
account specifically for your online course using a free service (Google, Yahoo,
Hotmail).
X. Plagiarism
All written work must be the product of the student submitting the work. While students
may be permitted by the instructor to work together on in-class assignments, all work
done outside the classroom must be done by the student without collaboration or sharing
with other students or non-students. Credit must be given for any material used which is
not created by the student, including images. If a student is determined to have violated
this policy, he/she will receive a zero for the assignment and be reported to the Program
Director. A second finding of plagiarism or cheating will result in the student being
withdrawn from the course by the instructor and reported to the Registrar.
Introduction
This course uses the Moodle course management system. To participate in the course,
you will need Internet access and a web browser which works with Moodle. You may
visit the following website for current information about which web browsers are
compatible with Moodle: http://www.delhi.edu/cis/moodle/browsercheck.php
If you need additional instruction you may access the Moodle site tutorial at:
http://docs.moodle.org/20/en/Student_tutorials
Security
If you access the course from a public computer, be sure to log out of the course and
completely close the browser when you are done. This will prevent others from accessing
the course using your student identification. Do not share your NetID and password with
others.
Participation Guidelines
Your online presence is an important part of the class. You should log on at least twice a
week and make contributions to the online forums. Responding to someone's forum post
with "Yes, that's a good point" or "I agree with that" doesn't count as adding to the
discussion. Start a new topic or make a substantive contribution to the existing
discussion.
Participation in the online forums each week is required to earn a passing grade in this
course.
Support