Internship-Review Hiranmai 045


An Internship Review

By D HIRANMAI [4511-21-733-045]

CONTENTS

• History & Need of Python
• Introduction to Data Analytics
• Basic Libraries for Data Analytics
• Real World Usage - Assignment
• Result Samples
• Conclusion

HISTORY & NEED OF PYTHON

• Python was designed and created by Guido van Rossum, a Dutch programmer. Today the language is managed by the Python Software Foundation.

• Python is considered a successor to the ABC programming language.

• Development of Python began in the late 1980s, and the language was first released in 1991.

• The first version, Python 0.9.0, was published by Guido van Rossum to the alt.sources newsgroup in February 1991.
Need of Python

• Graphics & Visualization

• Built-in and Third-party Data Analytics Tools

• Large and Growing Python Community


INTRODUCTION TO DATA ANALYTICS
• Data analysis is a comprehensive method of inspecting, cleansing, transforming, and modeling data to discover useful information, draw conclusions, and support decision-making.

• It is a multifaceted process involving various techniques and methodologies to interpret data from many sources in different formats, both structured and unstructured.

• Data analysis is more than a process; it is a tool that empowers organizations to make informed decisions, predict trends, and improve operational efficiency.

• It is the backbone of strategic planning in businesses, governments, and other organizations.

• Data analysis can be categorized into four main types, each serving a unique purpose and providing different insights: descriptive, diagnostic, predictive, and prescriptive analysis.

• Descriptive analysis, as the name suggests, describes or summarizes raw data and makes it interpretable.

• It involves analyzing historical data to understand what has happened in the past. This type of analysis is used to identify patterns and trends over time.
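A minimal sketch of descriptive analysis, assuming a small list of hypothetical monthly sales figures: Python's standard `statistics` module reduces the raw numbers to a few interpretable values, and the month-over-month differences expose the trend over time. For tabular data, pandas' `DataFrame.describe()` gives a similar summary.

```python
import statistics

# Hypothetical monthly sales figures (illustrative data only)
monthly_sales = [120, 135, 128, 150, 162, 158, 171, 180]


def describe(values):
    """Summarize a list of numbers, similar in spirit to pandas' describe()."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


summary = describe(monthly_sales)

# Month-over-month changes reveal the trend over time
changes = [later - earlier for earlier, later in zip(monthly_sales, monthly_sales[1:])]

print(summary)
print(changes)
```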

• Diagnostic analysis goes a step further than descriptive analysis by determining why something happened. It involves more detailed data exploration and comparing different data sets to understand the cause of a particular outcome.
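Diagnostic analysis often starts by comparing groups. The sketch below uses hypothetical sales figures tagged by whether a promotion ran (the labels `promo` and `no_promo` are illustrative), computes the mean of each group, and takes the difference between them.

```python
import statistics
from collections import defaultdict

# Hypothetical (group, daily_sales) pairs; "promo" marks days with a promotion
sales = [
    ("promo", 210), ("promo", 190), ("promo", 200),
    ("no_promo", 150), ("no_promo", 160), ("no_promo", 140),
]


def mean_by_group(rows):
    """Group values by their label and return the mean of each group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {group: statistics.mean(values) for group, values in groups.items()}


means = mean_by_group(sales)
lift = means["promo"] - means["no_promo"]  # difference worth investigating further
print(means, lift)
```

A gap between group means points to a possible cause but does not prove it; other factors such as season or region would still need to be checked.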

• Predictive analysis uses statistical models and forecasting techniques to estimate what is likely to happen. It uses data from the past to predict future outcomes. This type of analysis is often used in risk assessment, marketing, and sales forecasting.
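One of the simplest predictive models is a least-squares trend line. The sketch below fits one to five months of hypothetical sales data using the closed-form formulas (slope = Sxy / Sxx, intercept = mean_y - slope * mean_x) and extrapolates one month ahead. Real forecasting would usually rely on richer models and a library such as scikit-learn or statsmodels.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical sales for months 1-5
months = [1, 2, 3, 4, 5]
sales = [100, 110, 125, 130, 145]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # extrapolate to month 6
print(f"slope={slope}, intercept={intercept}, month 6 forecast={forecast}")
```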

• Prescriptive analysis is the most advanced type of data analysis. It not only predicts future outcomes but also suggests actions that take advantage of those predictions. It uses sophisticated tools and technologies such as machine learning and artificial intelligence to recommend decisions.
• Data Analysis Process
1. Defining the Objectives and Questions
2. Data Collection
3. Data Cleaning
4. Data Analysis
5. Data Interpretation and Visualization
6. Data Storytelling

BASIC LIBRARIES FOR DATA ANALYTICS
• NumPy
• Pandas
• SciPy
• Matplotlib
• Scikit-learn
• Seaborn
• TensorFlow
• Keras
• PyTorch

ASSIGNMENT

Twitter Tweets Sentiment Analysis

Analyze the sentiment of Twitter tweets to determine whether users' posts about various situations are positive or negative, despite the challenges of noisy data, sarcasm, and context-dependent language, in order to provide actionable insights.
1. Understanding the Objective & Mapping an Outline for Analysis
   • Classify tweets as positive or negative.
   • Detect emotions such as anger and surprise in tweets.
   • Monitor sentiment trends over time to understand public opinion.

2. Identifying the Required Variables and Collecting the Data
   • Required variables: Label, Tweets, and others as needed.
   • Suitable datasets are available on www.github.com and www.kaggle.com.


3. Preprocessing the Data: remove unnecessary variables and handle null values.
   • One unnecessary column was removed, and some data cells contained "NaN" values.
   • The "NaN" values were replaced with a suitable category according to each variable.
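The preprocessing step can be sketched with the standard `csv` module. The column names `id`, `label`, and `tweet`, the choice to drop `id`, and the per-column fill values are assumptions for illustration; the slides do not name the removed column or the replacement values. With pandas, the same step is typically `df.drop(columns=["id"])` followed by `df.fillna({...})`.

```python
import csv
import io

# Hypothetical raw data; the column names and values are illustrative only
RAW_CSV = """id,label,tweet
1,0,loving the new phone
2,1,NaN
3,,worst service ever
"""

# Strings treated as missing values (pandas' read_csv treats these as NaN by default too)
MISSING = {"", "NaN", "nan"}


def preprocess(text, drop_columns, fill_values):
    """Drop unwanted columns and replace missing cells with per-column fill values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        cleaned = {}
        for column, value in row.items():
            if column in drop_columns:
                continue  # remove the unnecessary column
            if value in MISSING:
                value = fill_values.get(column, value)
            cleaned[column] = value
        rows.append(cleaned)
    return rows


# Hypothetical choices: drop "id", fill empty tweets with "" and missing labels with "unknown"
rows = preprocess(RAW_CSV, drop_columns={"id"}, fill_values={"tweet": "", "label": "unknown"})
for row in rows:
    print(row)
```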

4. Analyzing the Preprocessed Data: the analysis examines tweets posted on Twitter to determine their nature.
   • It identifies the most common type of tweet, which gives an idea of which types of users are most active on social media (Twitter).

Result Samples
 Data Sample  Positive & Negative
Data
Composition Count

“1” – Negative
“0” - Positive
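A minimal sketch of the composition count, using the label mapping from the slide ("1" is Negative, "0" is Positive) and a short hypothetical list of labels. `collections.Counter` tallies each class, and dividing by the total gives each class's share.

```python
from collections import Counter

# Label mapping taken from the slide: "1" is Negative, "0" is Positive
LABEL_NAMES = {"1": "Negative", "0": "Positive"}

# Hypothetical labels for a handful of tweets
labels = ["0", "0", "1", "0", "1", "0"]

counts = Counter(LABEL_NAMES[label] for label in labels)
shares = {name: count / len(labels) for name, count in counts.items()}

print(counts.most_common())  # classes ordered from most to least common
print(shares)
```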
 Data Distribution  Length of Tweets
 Word Count of each tweet
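The length and word-count distributions can be computed per tweet before plotting. This sketch, on a few hypothetical tweets, measures length in characters and counts words by splitting on whitespace; a histogram of either list (for example with Matplotlib's `hist`) would produce distribution plots of this kind.

```python
import statistics

# Hypothetical tweets used only to illustrate the calculation
tweets = ["loving the new phone", "worst service ever!!", "ok"]


def tweet_stats(tweet):
    """Return the character length and whitespace-separated word count of a tweet."""
    return {"length": len(tweet), "word_count": len(tweet.split())}


stats = [tweet_stats(t) for t in tweets]
lengths = [s["length"] for s in stats]
word_counts = [s["word_count"] for s in stats]
avg_words = statistics.mean(word_counts)

print(lengths, word_counts, avg_words)
```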

Frequent Words

Frequency of Negative & Positive Tweets

Removing Special Characters, Symbols & Others
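A sketch of the special-character cleanup and the frequent-word count, assuming a simple regular-expression approach: it lowercases the text, drops @mentions and URLs, replaces everything except letters and whitespace with spaces, and removes a tiny hypothetical stop-word list before counting. Word clouds for each label's data are usually drawn from counts like these, for instance with the third-party `wordcloud` package's `WordCloud.generate_from_frequencies`.

```python
import re
from collections import Counter

# A tiny hypothetical stop-word list; real projects usually use a fuller list
STOPWORDS = {"a", "and", "i", "is", "my", "the", "to"}


def clean_tweet(text):
    """Lowercase a tweet and strip mentions, URLs, digits, punctuation, and symbols."""
    text = text.lower()
    text = re.sub(r"@\w+", " ", text)      # drop @user mentions
    text = re.sub(r"http\S+", " ", text)   # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace ('#' goes too)
    return " ".join(text.split())          # collapse repeated spaces


def frequent_words(tweets, n=3):
    """Return the n most frequent non-stop-words across the cleaned tweets."""
    counter = Counter()
    for tweet in tweets:
        counter.update(w for w in clean_tweet(tweet).split() if w not in STOPWORDS)
    return counter.most_common(n)


# Hypothetical label "0" (positive) tweets
positive_tweets = ["Loving my new phone!! #happy", "@user happy days, new phone"]
print(frequent_words(positive_tweets))
```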
Word Cloud of Label "0" Data

Word Cloud of Label "1" Data
Predicted Labels by Logistic Regression (False: "0", True: "1")
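The slides do not show the classifier code. As a hedged, dependency-free sketch, the block below trains a logistic regression on bag-of-words counts with stochastic gradient descent, using a few hypothetical labeled tweets (1 is Negative and 0 is Positive, as on the slides). The helper names `tokenize`, `build_vocab`, `vectorize`, `train`, and `predict` are illustrative. In practice the usual choice is scikit-learn's `LogisticRegression` combined with `CountVectorizer` or `TfidfVectorizer`.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_vocab(texts):
    """Map each distinct word to a column index."""
    vocab = {}
    for text in texts:
        for word in tokenize(text):
            vocab.setdefault(word, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Sparse bag-of-words: {column index: count}; unknown words are ignored."""
    counts = Counter(w for w in tokenize(text) if w in vocab)
    return {vocab[w]: c for w, c in counts.items()}


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, weights, bias):
    return bias + sum(weights[i] * v for i, v in x.items())


def train(texts, labels, epochs=200, lr=0.5):
    """Fit logistic regression with per-example (stochastic) gradient descent on log loss."""
    vocab = build_vocab(texts)
    weights = [0.0] * len(vocab)
    bias = 0.0
    data = [(vectorize(t, vocab), y) for t, y in zip(texts, labels)]
    for _ in range(epochs):
        for x, y in data:
            error = sigmoid(score(x, weights, bias)) - y  # d(log loss)/d(score)
            bias -= lr * error
            for i, v in x.items():
                weights[i] -= lr * error * v
    return vocab, weights, bias


def predict(text, vocab, weights, bias):
    """Return 1 (Negative) if P(label = 1) >= 0.5, else 0 (Positive)."""
    return 1 if sigmoid(score(vectorize(text, vocab), weights, bias)) >= 0.5 else 0


# Hypothetical training tweets; 1 = Negative, 0 = Positive as on the slides
texts = [
    "i love this phone", "great service and happy staff", "what a lovely day",
    "i hate this phone", "worst service ever", "terrible awful day",
]
labels = [0, 0, 0, 1, 1, 1]

model = train(texts, labels)
print(predict("love this lovely phone", *model))   # classify an unseen tweet
print(predict("hate this awful service", *model))  # classify another unseen tweet
```

Each update uses the log-loss gradient, sigmoid(score) - y, so words that appear mainly in negative tweets end up with positive weights and words from positive tweets end up with negative weights, while words shared by both classes stay near zero.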
CONCLUSION

Finally, I conclude that analyzing the Twitter tweets dataset revealed different insights that are helpful for classifying tweets as positive or negative. This internship also provided a practical, implementation-oriented view of the knowledge I gained.
