Report Seminar 6741

A Technical Seminar report entitled
On
DATA SCIENCE TOOLS AND LIBRARIES

In partial fulfillment of the requirements for the award of
BACHELOR OF TECHNOLOGY
In
Computer Science and Engineering(data science specialization)
Submitted by
SAGINA VIJAY(20E51A6741)
Under the Esteemed guidance of

Dr. P. MADHURI
Assistant Professor
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
HYDERABAD INSTITUTE OF TECHNOLOGY AND MANAGEMENT

Gowdavelly (Village), Medchal, Hyderabad, Telangana, 501401
(UGC Autonomous, Affiliated to JNTUH, Accredited by NAAC (A+) and NBA)
2023-2024
HYDERABAD INSTITUTE OF TECHNOLOGY AND
MANAGEMENT
CERTIFICATE
This is to certify that the Technical Seminar entitled “Data Science Tools and Libraries”
is being submitted by Sagina Vijay bearing hall ticket number 20E51A6741 in partial
fulfillment of the requirements for the degree BACHELOR OF TECHNOLOGY in
COMPUTER SCIENCE AND ENGINEERING by the Jawaharlal Nehru Technological
University, Hyderabad, during the academic year 2023-2024. The matter contained in this
document has not been submitted to any other University or institute for the award of any
degree or diploma.
Internal Supervisor Head of the Department

Dr. P. Madhuri Dr. Ila Chandana Kumari
Associate Professor
HYDERABAD INSTITUTE OF TECHNOLOGY AND
MANAGEMENT

(DATA SCIENCE)
DECLARATION
I, Sagina Vijay, student of Bachelor of Technology in CSD, session 2023-2024,
Hyderabad Institute of Technology and Management, Gowdavelly, Hyderabad, Telangana
State, hereby declare that the work presented in this Technical Seminar entitled “Data
Science Tools and Libraries” is the outcome of my own bona fide work, is correct to
the best of my knowledge, and has been undertaken with due regard for engineering
ethics. It contains no material previously published or written by another person, nor
material which has been accepted for the award of any other degree or diploma of the
university or other institute of higher learning, except where due acknowledgment has
been made in the text.
Sagina Vijay (20E51A6741)
ACKNOWLEDGEMENT
An endeavor over a long period can be successful only with the advice of many
well-wishers.
I would like to thank our Chairman, Sri. Arutla Prashanth, for
providing all the facilities to carry out the Technical Seminar successfully.
I would like to thank our Principal, Dr. P. Rajesh Kumar, who has inspired
me greatly through their speeches and provided this opportunity to carry out the
Technical Seminar successfully.
I am very thankful to our Head of the Department, Dr. Ila Chandana Kumari,
and to the B.Tech Technical Seminar Coordinator, Dr. P. Madhuri. I would
especially like to thank my internal supervisor, Dr. P. Madhuri, Associate Professor,
for technical guidance. I wish to convey my gratitude and sincere
thanks to all D.C. (Departmental Committee) and T.R.C. (Technical
Review Committee) members and the non-teaching staff for their support and
cooperation in the successful submission of this Technical Seminar.
I also want to express my sincere gratitude to my family members and
friends for their care and everlasting moral support.
Sagina Vijay (20E51A6741)
TABLE OF CONTENTS
LIST OF FIGURES
ABSTRACT
1. CHAPTER - 01
● INTRODUCTION
2. CHAPTER - 02
● DATA SCIENCE TOOLS AND LIBRARIES
2.1 What is Data Science and its Operations?
2.2 Which Tool is Most Used for Data Science?
2.3 Benefits of Tools and Libraries
3. CHAPTER - 03
● DATA SCIENCE TOOLS
3.1 Programming Languages
3.2 Integrated Development Environments
3.3 Data Visualization Tools
3.4 Notebook Sharing and Collaboration
4. CHAPTER - 04
● DATA SCIENCE LIBRARIES
4.1 Introduction to Libraries
4.2 Advanced Tools and Libraries
5. CONCLUSION
6. REFERENCES
LIST OF FIGURES
Sl.no CAPTION
1. Data Scientist tools

2. Data Scientist libraries
3. Matplotlib Plot result
4. Info about Python libraries from git-hub
5. Convolution Neural Network
ABSTRACT
Data science tools and libraries are essential resources that empower data professionals to
work efficiently with data, extract insights, and build predictive models. They encompass
a variety of software solutions for the different stages of the data science workflow, from data
acquisition and preprocessing to analysis, visualization, and model deployment. These
tools abstract away complex underlying processes, enabling data scientists to focus on
extracting meaningful information from data. The tools and libraries covered in this report
are:
Tools                                    Libraries
Programming Languages                    Pandas
Integrated Development Environments      NumPy
Data Warehousing and Processing          Scikit-learn
Data Visualization Tools                 PyTorch
Notebook Sharing and Collaboration       Matplotlib
Keywords: Data Science Tools and Libraries
1. INTRODUCTION
In modern society, technology has emerged and evolved as a robust
tool for solving present-day problems and challenges. Computers, which
were initially used as calculating devices for mathematics, have extended their
compatibility with other machines and improved their capability to support a wide range
of operations across distinct and diverse kinds of applications. This computing revolution
pushed every industry toward rapid growth through better performance and quick
improvements in overcoming challenges. Several sub-fields of computer science have
emerged: data science, which uses statistics, probability, and related methods to analyze
data and extract insights from it; machine learning, for exploratory data analysis and building
models by training on data; AI, which is used to build intelligent systems; and deep learning,
which uses multiple layers in a network to make predictions. These technologies have
become an important need within the technology industry for finding solutions to
ever more challenging problems. The last decade witnessed an extraordinary
growth in the amount of stored data in every industry, including healthcare,
automotive, manufacturing, finance, and food processing. With it came a desire to use
this data to build and invent new products, to improve
existing ones, and to enhance customer experience in the respective fields. To
handle such amounts of data, mathematical tools like
statistics, calculus, and probability are needed; they play a prominent role in
understanding, interpreting, and converting data into information. There is also a
need for a good programming language that is powerful and versatile enough to implement
the methods required to develop data science applications, simple to use, and
popular among developers. Python is a high-level, general-purpose programming
language with built-in data types like lists, tuples, and dictionaries. Python source code is
compiled to bytecode automatically, without a separate compilation step. In recent years,
mathematical libraries like NumPy, Pandas, SciPy, and
scikit-learn have made Python well suited to machine learning and deep learning.
2. DATA SCIENCE TOOLS AND LIBRARIES
2.1 What is Data Science and its Operations?
 Data science is an interdisciplinary field that combines subject-matter expertise with

skills in mathematics, statistics, computer programming, advanced analytics, artificial
intelligence (AI), and machine learning to extract actionable insights from an
organization's data. These insights can form the basis for making decisions and can be
factored into long-term plans.
 Data science is one of the fastest-growing fields across all sectors since the
availability of data is increasing at an unprecedented rate.
 OPERATIONS IN DATA SCIENCE
 Data Extraction:
Data extraction is the process of obtaining data from a database or SaaS
platform so that it can be replicated to a destination, such as a data warehouse, designed to
support online analytical processing (OLAP). Data science operations start with
extracting data from the real world; this data can be in any format, shape, or size. Python
provides many libraries for extracting data from the web and from files, such as
requests, Beautiful Soup, Scrapy, and pypdf. Data can also be extracted from SQL
files and databases using the Pandas library, either by reading a database table or
by running an SQL query.
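As a minimal sketch of the SQL route described above, the following example loads the result of a query into a Pandas DataFrame. It assumes an in-memory SQLite database created only for this illustration; the "sales" table and its columns are hypothetical.

```python
import sqlite3

import pandas as pd

# Build a throwaway in-memory database so the example is self-contained.
# The "sales" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 45.0)],
)
conn.commit()

# Run an SQL query and load the result straight into a DataFrame.
df = pd.read_sql_query("SELECT region, amount FROM sales ORDER BY amount", conn)
conn.close()

print(df)
```

In a real project the connection would point at an existing database file or server instead of ":memory:".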
 Data Processing:
This operation entails the steps that transform raw data into usable information. Missing
values, corrupted values, time zone differences, and date range issues are all crucial
checks to make during this procedure. Python provides the NumPy and Pandas libraries
for data processing, which is also known as data cleaning. Raw data is
data in the form in which it was collected, before any cleaning or
transformation has been applied.
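The checks listed above can be sketched with Pandas as follows. This is only an illustration under stated assumptions: the column names and values are made up, and filling a gap with the column mean is just one of several possible strategies.

```python
import numpy as np
import pandas as pd

# Hypothetical raw records: one duplicate row, one missing reading,
# and timestamps stored as plain strings without a time zone.
raw = pd.DataFrame(
    {
        "timestamp": [
            "2024-01-01 10:00",
            "2024-01-01 10:00",
            "2024-01-02 09:30",
            "2024-01-03 14:15",
        ],
        "temperature": [21.5, 21.5, np.nan, 19.0],
    }
)

# Remove exact duplicate rows.
clean = raw.drop_duplicates().copy()

# Parse the strings into datetimes and attach an explicit time zone.
clean["timestamp"] = pd.to_datetime(clean["timestamp"]).dt.tz_localize("UTC")

# Fill the missing reading with the mean of the remaining values.
clean["temperature"] = clean["temperature"].fillna(clean["temperature"].mean())

print(clean)
```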
 Data Visualization & Analytics:

Once the data has been cleaned and prepared for use, it is critical to understand the
insights it contains. Graphs are one of the best ways to learn about data, since they show
its overall shape at a glance. The Python modules pandas and matplotlib are excellent graph
visualization tools. Every firm or corporation relies heavily on data. To uncover
information helpful for corporate decision-making, it is necessary to gather, handle, and
evaluate the data flow quickly and accurately. Data analysis is the process of gathering,
transforming, and organizing data in order to make predictions and data-driven
decisions. It also aids in the discovery of potential solutions to a business problem.
 Data Modelling:
After data analysis, a machine learning algorithm is chosen to create a model
based on the data. The design of models relies heavily on statistics and probability.
Python provides the scikit-learn library, which has built-in methods for machine learning
models such as linear regression and logistic regression, covering supervised and
unsupervised learning.
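To make the modelling step concrete, here is a small sketch of fitting a linear regression with scikit-learn. The data is synthetic and noise-free (y = 3x + 2), chosen only so the fitted values are easy to check; it is not taken from any dataset in this report.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data that follows y = 3x + 2 exactly (an assumption for illustration).
X = np.arange(10, dtype=float).reshape(-1, 1)  # one feature, ten samples
y = 3.0 * X.ravel() + 2.0

# Fit the model, then read back the learned parameters.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
prediction = model.predict(np.array([[12.0]]))[0]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={prediction:.2f}")
```

The same fit/predict pattern applies to the other estimators in scikit-learn, such as logistic regression.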
 Scientific Computations:
For scientific computations by researchers, students, and scientists, Python provides
a library called SciPy, which offers methods for many mathematical and
scientific operations.
2.2 Which tool is most used for Data Science?
 Python:
Python is the most widely used data science programming language and is also considered a data
science tool. It helps data science professionals perform data analysis over
large datasets and data of different sorts. Because of the large number of
platforms and environments that must be integrated, data science needs a language
with a clean, basic syntax that is flexible yet robust. Python satisfies all these
qualities and is also easy to learn.
 Let’s discuss some important characteristics of python
 Integration:
Python is a programming language that is well known for its ability to integrate
with other languages and systems. It can be used together with code written in
C, C++, and Java, with middleware such as CORBA, and with a wide range of
machine learning frameworks and services, including TensorFlow, Google Cloud ML
Engine, and Amazon Machine Learning. Python not only interacts with platforms and
programming language interfaces, but also has a library stack that demonstrates the
strength of its integration capabilities.
 Ease of Use:
Python is simple to use because its syntax reads close to plain English rather
than relying on complicated rules, which makes simple programs quick to learn and write.
Downloading and installing Python is also simple.
 OOPS:
In Python, object-oriented programming (OOP) is a programming paradigm that
uses objects and classes. It models real-world entities in code and supports concepts like
inheritance, polymorphism, and encapsulation. The main idea
of OOP is to bind the data and the functions that work on it together as a single unit, so
that other parts of the code reach the data only through that unit's interface.
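The three concepts named above can be illustrated with a short sketch. The Shape, Rectangle, and Circle classes are hypothetical names chosen for this example. Note that Python marks internal data with a leading underscore by convention rather than enforcing privacy.

```python
import math


class Shape:
    """Base class that bundles data (a name) with behaviour (describe)."""

    def __init__(self, name):
        self._name = name  # leading underscore: internal by convention

    @property
    def name(self):
        # Encapsulation: callers read the name through this property.
        return self._name

    def area(self):
        raise NotImplementedError("subclasses must implement area()")

    def describe(self):
        return f"{self.name} with area {self.area():.2f}"


class Rectangle(Shape):  # inheritance: Rectangle reuses Shape's behaviour
    def __init__(self, width, height):
        super().__init__("rectangle")
        self._width = width
        self._height = height

    def area(self):  # polymorphism: each subclass supplies its own area()
        return self._width * self._height


class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self._radius = radius

    def area(self):
        return math.pi * self._radius ** 2


shapes = [Rectangle(2, 3), Circle(1)]
descriptions = [shape.describe() for shape in shapes]
print(descriptions)
```

The same describe() call works on every shape because each subclass provides its own area() method.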
 Python's Built-in Data Structures: Python has a variety of mutable and immutable
data structures, including strings and tuples for immutable data and lists, sets, and
dictionaries for mutable data. We can simply organize and perform operations on
data using these data structures.
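The difference between mutable and immutable built-in types can be checked directly. In this small sketch the variable names are arbitrary: lists, sets, and dictionaries are changed in place, while tuples and strings cannot be.

```python
# Mutable types can be changed in place.
numbers = [1, 2, 3]
numbers.append(4)

unique = {1, 2}
unique.add(3)

ages = {"ana": 30}
ages["ben"] = 25

# Immutable types reject in-place changes.
point = (1, 2)
try:
    point[0] = 10
    tuple_error = None
except TypeError as exc:
    tuple_error = type(exc).__name__

# String methods return a new string and leave the original untouched.
word = "data"
upper = word.upper()

print(numbers, unique, ages, tuple_error, word, upper)
```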
 Compilation:
Python is generally called an interpreted language; however, it combines compiling
and interpreting. When we execute source code, Python first compiles it into
bytecode. Bytecode is a low-level, platform-independent representation of the
source code. It is not binary machine code and cannot be run by the target
machine directly. Instead, bytecode is a set of instructions for a
virtual machine, the Python Virtual Machine (PVM), which executes it. Bytecode is thus
a lower-level, platform-independent, and efficient intermediate representation.
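The bytecode that the interpreter produces can be inspected with the standard-library dis module. This is a small sketch; the function add is a made-up example, and the exact instruction names differ between Python versions, so no specific listing is shown here.

```python
import dis


def add(a, b):
    return a + b


# Every function carries its compiled bytecode in a code object.
raw_bytes = add.__code__.co_code

# dis decodes those bytes into readable instruction names.
opnames = [instr.opname for instr in dis.get_instructions(add)]

print(type(raw_bytes).__name__, len(raw_bytes))
print(opnames)
```

Calling dis.dis(add) prints the same instructions as a formatted listing.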
2.3 Benefits of Tools and Libraries
Data science tools and libraries offer a wide
range of benefits, including:
o Increased productivity and efficiency:
Data science tools and libraries can automate many of the repetitive and time-
consuming tasks involved in data science, such as data cleaning, preparation, and
analysis. This frees up data scientists to focus on more strategic and creative work.
o Improved accuracy and reliability:
Data science tools and libraries are typically well-tested and maintained, which
helps to ensure that the results of data analysis are accurate and reliable.
▪ Reproducibility:
Data science tools and libraries make it easier to reproduce data science workflows,
which is essential for scientific research and for ensuring that data science models are
used responsibly.
▪ Collaboration:
Data science tools and libraries are often open source, which makes it easy for data
scientists to collaborate with each other and share their work.
▪ Reduced costs:
Data science tools and libraries can help to reduce the costs associated with data
science projects by automating tasks and reducing the need for custom development.
3. Data Science Tools
Data science tools are used for diving into raw and complicated data (unstructured or
structured), then processing, extracting, and analyzing it to dig out valuable insights
by applying techniques from statistics, computer science,
predictive modeling and analysis, and deep learning.
Fig3: Data scientist tools
3.1 Programming Languages (e.g., Python, R):

▪ Python and R are among the most widely used programming languages in data
science. They offer extensive libraries, frameworks, and packages specifically
designed for data manipulation, analysis, visualization, and machine learning.
3.2 Integrated Development Environments (IDEs):
IDEs provide an interactive environment for
writing, executing, and documenting code. Jupyter Notebook and RStudio are popular
choices that allow code, visualizations, and explanatory text to be combined in a
single document.
3.3 Data Visualization Tools (e.g., Matplotlib, Seaborn, Tableau):
▪ Visualization tools help transform data into meaningful graphs, charts,
and interactive dashboards, making it easier to communicate insights to
non-technical stakeholders.
3.4 Notebook sharing and Collaboration
Notebook sharing and collaboration is an important aspect of data science. Data
scientists often work on projects together, and they need to be able to share their work
with each other easily and efficiently.
There are a number of ways to share and collaborate on notebooks:
o Use a cloud-based notebook platform. Cloud-based notebook platforms make
it easy to share notebooks with others and collaborate in real time. Google Colab and
Kaggle Notebooks are two popular options.
o Document your notebooks. It is important to document your notebooks so that
others can understand your work. This includes adding comments to your code,
explaining your analysis steps, and providing context for your findings.
 Jupyter Notebook
▪ Jupyter Notebook is an interactive web-based tool used in data science
projects. In addition to offering kernels for programming languages like Python, Scala,
and R, Jupyter notebooks include other useful capabilities.
▪ First, a notebook blends material written in natural language with code. Second,
Jupyter Notebooks are interactive, which makes them well suited to data scientists and
researchers: they can experiment with data and see how the code responds to each
command they input.
4.DATA SCIENCE LIBRARIES
Introduction to Libraries:
Fig4: Data Science Libraries

4.1 Numpy:
Python was not originally designed for heavy numerical computation. But the rising interest in
Python from the engineering, scientific, and research communities led
developers to create a package with a high-level array
implementation.
NumPy is built mainly around a core data structure called an array. An N-dimensional
array (ndarray) is a matrix-like array whose elements are all of the same type. A NumPy array by
default has a fixed size and shape, for example m * n with m rows and n columns. When a
new element is added that exceeds this size, NumPy copies all the
elements from the existing array into a new, larger array, and the
original array is discarded. NumPy can be used in any Python integrated development
environment. For example, it is imported with: import numpy as np.
The SciPy and Pandas libraries are built on the array structure of NumPy.
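The fixed-size behaviour described above can be observed directly. In this short sketch the array values are arbitrary: appending a row with np.append returns a new array and leaves the original one unchanged.

```python
import numpy as np

# A 2 x 3 array: every element has the same type.
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape, a.dtype)

# np.append does not grow `a` in place; it builds and returns a new array.
b = np.append(a, [[7, 8, 9]], axis=0)
print(a.shape, b.shape)

# Arithmetic is applied element by element, without explicit loops.
doubled = a * 2
print(doubled)
```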
4.2 Pandas:
 Pandas is a fast, powerful, flexible, and easy-to-use open source data analysis and
manipulation tool, built on top of the Python programming language.
 Pandas provides a fast and efficient DataFrame object for data manipulation with
integrated indexing. It offers tools for reading and writing data between in-memory
data structures and different formats: CSV and text files, Microsoft Excel,
SQL databases, and the fast HDF5 format. Intelligent data alignment and integrated
handling of missing data provide automatic label-based alignment in computations and
make it easy to turn messy data into an orderly form. Pandas also supports flexible
reshaping and pivoting of data sets. Its main strengths include:
 Fast and efficient data structures: Pandas uses high-performance data
structures, such as NumPy arrays, to store and manipulate data efficiently. This
makes Pandas well-suited for working with large datasets.
 Powerful data manipulation tools: Pandas provides a number of powerful
tools for data manipulation, such as filtering, sorting, grouping, and aggregation.
These tools make it easy to clean, prepare, and analyze data.
 Flexible data analysis tools: Pandas also provides a number of flexible data
analysis tools, such as statistical functions and time series analysis, and it works
well alongside machine learning libraries. These tools make it easy to perform
complex data analysis tasks.
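The manipulation tools listed above (filtering, sorting, grouping, aggregation, and pivoting) can be sketched on a small DataFrame. The "orders" table, its columns, and its values are hypothetical and serve only as an illustration.

```python
import pandas as pd

# Hypothetical order records.
orders = pd.DataFrame(
    {
        "city": ["Pune", "Delhi", "Pune", "Delhi", "Pune"],
        "product": ["pen", "pen", "book", "book", "pen"],
        "amount": [10, 20, 150, 120, 15],
    }
)

# Filtering and sorting: orders above 12, largest first.
large = orders[orders["amount"] > 12].sort_values("amount", ascending=False)

# Grouping and aggregation: total amount per city.
totals = orders.groupby("city")["amount"].sum()

# Pivoting: cities as rows, products as columns, summed amounts as values.
table = orders.pivot_table(
    index="city", columns="product", values="amount", aggfunc="sum"
)

print(large)
print(totals)
print(table)
```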
4.3 Matplotlib:
 Python's Matplotlib library is a comprehensive tool for building static, animated, and
interactive visualizations. Matplotlib makes easy things easy and
hard things possible.
 A tool for visualizing data, Matplotlib is a low-level graph plotting library
written in Python.
 Matplotlib is open source, so we are free to use it. For platform portability, it
is primarily written in Python, with a few pieces also written in C, Objective-C, and
JavaScript.
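Example:
import matplotlib.pyplot as plt
import numpy as np

# 200 evenly spaced points covering one full period of the sine wave
x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)

# Create a figure with a single set of axes and draw the curve
fig, ax = plt.subplots()
ax.plot(x, y)
plt.show()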
Fig4.3: Matplotlib Plot Result
4.4 Scipy :
SciPy (Scientific Python) is a library for scientific computing on N-dimensional arrays. This
library is built on top of NumPy. It provides numerous methods for scientific
computations such as optimization, linear programming, and calculating distances.
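Two of the capabilities mentioned above, optimization and distance calculation, can be sketched as follows. The function bowl is a hypothetical test function whose minimum is known to lie at (1, -2); it is used here only so the result is easy to verify.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.distance import euclidean


def bowl(p):
    """A simple convex function with its minimum at (1, -2)."""
    x, y = p
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


# Optimization: search for the minimum, starting from the origin.
result = minimize(bowl, x0=np.array([0.0, 0.0]))

# Distance: straight-line distance between two points.
distance = euclidean([0, 0], [3, 4])

print(result.x, distance)
```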
Library        Stars    Forks    Contributors
NumPy          20.8k    7k       1330
SciPy          9.8k     4.3k     1158
Pandas         34.3k    14.6k    2612
Matplotlib     15.7k    6.4k     1172
Seaborn        5722     905      87
Scikit-learn   50.5k    23.2k    2399
TensorFlow     166k     86.9k    3129
PyTorch        56.9k    15.8k    2326
Keras          55.5k    19.1k    1029
Apache Spark   33.2k    25.8     1808
Fig4.4: Information about Python libraries from GitHub

 Advanced Data Science Tools and Libraries:
TensorFlow and PyTorch:
They provide tools for building, training, and deploying neural
networks for tasks like image recognition, natural language processing,
and more.
Apache Spark:
Apache Spark is a powerful framework for distributed data processing and analytics.
Hadoop:
Hadoop is a distributed storage and processing framework for big data.
scikit-learn-extensions:
This library provides additional functionality on top of Scikit-Learn, offering tools for
feature engineering, preprocessing, and model evaluation.
 DEEP LEARNING
▪ Deep learning is a branch of machine learning that learns layered representations of
data with neural networks; it can be used for supervised as well as
unsupervised learning. The Keras framework is one of the most important libraries
available in Python for deep learning. Numerous modules, including initializers,
regularizers, constraints, activations, losses, metrics, and optimizers, are supported by
Keras.
We can create a wide variety of cutting-edge
applications with Keras, including robotics, image
recognition, audio/video recognition, and more.
 ARTIFICIAL NEURAL NETWORKS
▪ Many Python packages, modules, and libraries are available for artificial intelligence.
One such neural network library is neurolab. Single-layer neural
networks and multi-layer neural networks are among its primary functionalities.
It is used together with the NumPy, SciPy, and Matplotlib libraries.
 Convolutional Neural Networks (CNNs):
▪ CNNs are a class of deep neural networks primarily used for image analysis and
computer vision tasks. They are designed to automatically learn hierarchical features
from images by using convolutional layers that apply filters to capture spatial
patterns.
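The way a convolutional layer applies a filter can be illustrated without a deep learning framework. The sketch below is a plain NumPy version with no padding and a stride of 1. The function name conv2d_valid, the image, and the filter values are all chosen for this example. As in most deep learning libraries, the filter is not flipped, so strictly speaking this is a cross-correlation.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the patch by the filter element-wise and sum the result.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# A 4 x 4 image: dark left half, bright right half.
image = np.array(
    [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ],
    dtype=float,
)

# A filter that responds where brightness increases from left to right.
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])

feature_map = conv2d_valid(image, kernel)
print(feature_map)
```

The feature map is non-zero only in the middle column, where the dark and bright halves meet, which is the spatial pattern this filter detects.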
Fig4.5: Convolutional Neural Network
5. CONCLUSION
In this report we have discussed the characteristics of the Python
programming language and the reasons Python has become the most
popular language for data science. We also discussed various Python libraries and their
functionality for developing data science applications and analysis. We
discussed the disadvantages of using Python in data science projects
and the improvements required to meet the future needs of the industry. We also
discussed deep learning and artificial neural networks and the Python
libraries which support them.
Machine learning is a rapidly growing area, and its sub-branches such as deep
learning and neural networks are headed towards new innovations and
advancements. Every technology needs to evolve to meet
machine learning needs in the future; this evolution can happen either by
advancing the existing systems or by understanding their limitations and improving
them. Many other technologies that are still in their
developing stages are getting ready to offer greater computational speed,
flexibility, and robustness. But today Python libraries are the more
popular choice in the data science industry for their dynamic usage and
functionality.
6. REFERENCES
1. python-oops-concepts @ https://www.javatpoint.com/
2. www.Tutorialspoint.com
3. Top-Python-Libraries-for-Data-Science-[email protected]
4. https://www.w3schools.com/
5. pypi.org
6. https://www.w3schools.com/python/matplotlib_pyplot.asp
7. Jupyter.Org
8. www.spyder-ide.org
9. Matthew Mayo, KDnuggets, November 2, 2020, in Automated
Machine Learning, AutoML, Data Exploration, Data Processing, Data
Science, Data Visualization, Explainability, Machine
Learning, Python. https://www.researchgate.net/publication/347444225_Python_And_Its_libraries_in_Data_Science_and_Related_fields
