3.Question bank
Dr. N.G.P. INSTITUTE OF TECHNOLOGY
(An Autonomous Institution)
Approved by AICTE, New Delhi, Affiliated to Anna University, Chennai,
Recognized by UGC & Accredited by NAAC with A+ and NBA (BME, CSE, ECE, EEE & Mech)
Kalapatti Road, Coimbatore-641048
UNIT I
INTRODUCTION TO DATA SCIENCE
Introduction to Data Science - Benefits and uses of data science and big data - Facets of data:
Structured data, Unstructured data, Natural Language, Machine generated data, Audio, Image and
video streaming data.
QUESTIONS
PART A
1 What is Data Science?
2 Why is Data Science important?
3 Mention the uses of Data Science and Big Data.
4 List the applications of Data Science and Big Data.
5 Differentiate between big data and data science.
6 List the characteristics of big data.
7 Define Volume.
8 What varieties of data are handled by data science?
9 Why is data a fact?
10 Compare structured and unstructured data.
11 Which characteristic of big data measures the data processing speed?
12 What is meant by natural language?
13 Mention the benefits of data science.
14 Define Machine-generated data.
15 Write short notes on video streaming data.
PART B
1 Explain the various facets/categories of data with examples.
2 Discuss the benefits and uses of data science.
3 Compare and contrast the structured and unstructured data.
4 Summarize the Audio, Image and video streaming data.
5 Write short notes on the following:
i) Structured data ii) Video streaming data iii) Machine-generated data
6 Summarize the concepts of data science.
PART-C
1 What is Data Science, and how does it play a pivotal role in extracting actionable insights from large
and diverse datasets?
2 Discuss the challenges and potential risks associated with the utilization of large datasets and
advanced analytics techniques, and elaborate on the ethical considerations that arise when handling
and processing sensitive information in the context of Data Science and Big Data.
3 How are NLP techniques used to analyze and extract insights from unstructured text data? Provide
examples of real-world applications in industries such as healthcare, finance, and customer service.
4 Differentiate between human-generated data and machine-generated data. Elaborate on the various
sources and types of machine-generated data, such as sensor data, log files, and IoT data.
UNIT II
THE DATA SCIENCE PROCESS
Overview of the data science process - defining research goals and creating a project charter,
retrieving data, cleansing, integrating and transforming data, exploratory data analysis, building
the models, presenting findings and building applications on top of them.
PART A
1 Explain the concept of the data science process.
2 List the steps in the data science process.
3 How to set research goals?
4 Define project charter.
5 What is meant by retrieving data?
6 Name the types of retrieving data.
7 List the steps of data preparation.
8 Mention a few data cleaning techniques.
9 Differentiate between data validation and data cleaning.
10 What are the essential items to be available in the project charter?
11 Write short notes on the data exploration process.
12 What is Data preparation?
13 List the processes involved in data cleansing.
14 What is meant by integrating data?
15 Name the 4 types of data transformation.
16 What is the process of transforming data?
17 What is the difference between integration and transformation?
18 List the issues in data integration.
19 Name the three types of data integration.
20 What is Exploratory Data Analysis?
21 State the goals of EDA in data science.
22 Classify the EDA in data science.
23 Mention the steps involved in building the model.
24 What is the purpose of model building?
25 What are the applications that are built on the concept of data science?
PART B
Types of Data - Types of Variables - Describing Data with Tables and Graphs - Outliers,
Relative Frequency Distributions, Cumulative Frequency Distributions, Frequency
Distributions for Qualitative (Nominal) Data, Graphs for Quantitative Data, Histogram,
Frequency Polygon, Stem and Leaf Display, Typical Shapes, A Graph for Qualitative
(Nominal) Data, Describing Data with Averages, Mode, Median, Mean.
PART A
1 Define graph-based or network data.
2 Define an outlier.
3 How can an outlier be found?
4 How can a missing value problem be handled?
5 How can one combine data from different data sources?
6 How can two tables be joined?
7 Define Views.
8 List the steps involved in data transformation.
9 What is a histogram?
10 What can be observed from a box plot?
11 What does a model contain?
12 Define predictor and target variables.
2 Illustrate how frequency distributions for qualitative (nominal) data help uncover patterns, trends, and
key insights.
3 How does the integration of data tables and graphs enhance the management and analysis of
large-scale data in the context of big data processing?
4 Explain the concept of a stem and leaf display with suitable examples.
5 Describe data with averages and mode.
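Worked illustration for the Part B topics above (frequency distributions for nominal data, stem and leaf display, averages). This is only a minimal sketch using the Python standard library; the sample data and the helper name stem_and_leaf are made up for illustration and are not part of the syllabus.

```python
from collections import Counter, defaultdict
import statistics

# Hypothetical nominal data: blood groups of ten students.
blood_groups = ["A", "B", "O", "O", "AB", "A", "O", "B", "O", "A"]

# Frequency and relative frequency distribution for qualitative data.
freq = Counter(blood_groups)
rel_freq = {group: count / len(blood_groups) for group, count in freq.items()}

# Hypothetical quantitative data: two-digit test scores.
scores = [52, 58, 61, 64, 64, 67, 70, 73, 73, 73, 85, 91]


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) and units digit (leaf)."""
    display = defaultdict(list)
    for value in sorted(values):
        display[value // 10].append(value % 10)
    return dict(display)


stems = stem_and_leaf(scores)
for stem, leaves in stems.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))

# Averages: mean, median, and mode of the scores.
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
mode_score = statistics.mode(scores)
print(mean_score, median_score, mode_score)
```

Each printed row shows one stem followed by its leaves, so the shape of the distribution can be read off much like a histogram turned on its side.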
Basics of NumPy Arrays - Computation on NumPy Arrays, Aggregation: Min, Max. Operating on
Data in Pandas, Handling Missing Data. Implementation of basic regression analysis.
PART A
1 What is a NumPy array?
2 How can you create a NumPy array in Python?
3 Write short notes on types of arrays.
4 Write the syntax and an example for the fromiter() function.
5 Write short notes on the operations of NumPy arrays.
6 Implement the arithmetic operations using NumPy arrays.
7 What are the aggregation functions in NumPy?
8 What is aggregate in programming?
9 Write a Python program using the min and max aggregation functions.
10 List a few Pandas operations for handling null values.
11 Name a few Pandas operations for handling null values.
12 Define Python Pandas.
13 Mention the key features of Pandas.
14 List the benefits of Pandas.
15 Define DataFrame.
16 Create a DataFrame in Pandas using a list.
17 Write example functions of a Pandas DataFrame.
18 Differentiate between Pandas and NumPy.
19 What is meant by Dataset?
20 Write strategies to handle missing values in the dataset.
21 Why do we need to handle missing data?
22 How do you implement regression analysis?
23 What is the step-by-step procedure of regression analysis?
24 What is the basic method used in regression analysis?
25 What are the types of regression analysis?
26 Write short notes on the importance of regression analysis.
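Worked illustration for the NumPy questions above (array creation, fromiter(), arithmetic, min/max aggregation). This is a minimal sketch that assumes NumPy is installed; the array values are arbitrary and chosen only for illustration.

```python
import numpy as np

# Create an array from a Python list.
a = np.array([4, 9, 1, 7, 3])

# Create an array from an iterable; np.fromiter requires an explicit dtype.
squares = np.fromiter((i * i for i in range(5)), dtype=np.int64)

# Element-wise (vectorized) arithmetic.
total = a + squares
scaled = a * 2

# Aggregation over the whole array.
smallest = a.min()
largest = a.max()

# Aggregation along an axis of a 2-D array.
matrix = np.array([[1, 8, 3], [6, 2, 9]])
col_min = matrix.min(axis=0)  # minimum of each column
row_max = matrix.max(axis=1)  # maximum of each row

print(total, scaled, smallest, largest, col_min, row_max)
```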
PART B
1 Discuss the Basics of NumPy Arrays.
2 Implement the basic Computation on NumPy Arrays.
3 Compare and contrast NumPy and Pandas.
4 Write a Python program to perform the aggregation functions Min and Max.
5 Explain the methods of handling missing data.
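Worked illustration for handling missing data in Pandas. This is a minimal sketch that assumes pandas and NumPy are installed; the student names, column names, and marks are hypothetical. It shows three common strategies (detecting, dropping, and filling), not an exhaustive list.

```python
import numpy as np
import pandas as pd

# Build a DataFrame from a list of rows; np.nan marks a missing value.
df = pd.DataFrame(
    [["Asha", 82.0, 74.0], ["Ravi", np.nan, 68.0], ["Meena", 91.0, np.nan]],
    columns=["name", "maths", "physics"],
)

# Detect: count missing values in each column.
missing_counts = df.isnull().sum()

# Drop: remove every row that contains at least one missing value.
dropped = df.dropna()

# Fill: replace missing values with a constant ...
filled_zero = df.fillna(0)

# ... or with the mean of each numeric column.
filled_mean = df.fillna(df.mean(numeric_only=True))

print(missing_counts)
print(filled_mean)
```

Dropping rows is the simplest option but discards data, while filling keeps every row at the cost of introducing estimated values.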
UNIT V
DATA VISUALIZATION WITH MATPLOTLIB
General Matplotlib Tips, Simple Line Plots, Simple Scatter Plots, Visualizing Errors, Density and
Contour Plots, Histograms, Binnings, and Density, Customizing Plot Legends, Customizing Colorbars,
Multiple Subplots, Text and Annotation, Customizing Ticks, Customizing Matplotlib: Configurations
and Stylesheets.
PART A
1 Summarize the general Matplotlib tips.
2 What is the general concept of Matplotlib?
3 Define Matplotlib (Python Plotting Library).
4 What is meant by data visualization?
5 List the four key plots that are used for data visualization.
6 Why is data visualization needed?
7 Mention the benefits of data visualization.
8 What are the three different layers in the architecture of Matplotlib?
9 What is a simple line plot in data science?
10 Draw a simple scatter plot.
11 How do you visualize errors?
12 Define Matplotlib - Contour Plot.
13 What is a contour plot of a function in Matplotlib?
14 How do I draw contours in Matplotlib?
15 What is a histogram in Matplotlib?
16 What is the function of histogram in Python?
17 How do you bin data in Pandas?
18 What is the purpose of binning data?
19 Define Density Plots with Pandas in Python.
20 How do you find density data?
21 Which graph is best for showing density? Justify.
22 Write the syntax for customizing colorbars in a Python program.
23 What are multiple subplots?
24 How many subplots do you need?
25 What is Text Annotation?
26 List the Text Annotation types.
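Worked illustration for the Matplotlib questions above (line plot, scatter plot, histogram with bins, multiple subplots, legend, annotation). This is a minimal sketch that assumes Matplotlib and NumPy are installed; the data are synthetic, and the non-interactive "Agg" backend is chosen here only so the script runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only.
x = np.linspace(0, 10, 50)
rng = np.random.default_rng(0)
samples = rng.normal(size=500)

# One figure with three subplots arranged in a single row.
fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# Simple line plot with a legend.
axes[0].plot(x, np.sin(x), label="sin(x)")
axes[0].legend()
axes[0].set_title("Simple line plot")

# Simple scatter plot.
axes[1].scatter(x, np.cos(x), marker="o")
axes[1].set_title("Simple scatter plot")

# Histogram with 20 bins; hist returns the counts and the bin edges.
counts, bin_edges, _ = axes[2].hist(samples, bins=20)
axes[2].set_title("Histogram")

# Text annotation placed at the height of the tallest bar.
axes[2].annotate("tallest bin", xy=(0, counts.max()))

fig.tight_layout()
# fig.savefig("unit5_sketch.png")  # uncomment to write the figure to disk
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() to display the figure instead.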