Internship-Review Hiranmai 045

An
Internship Review
By
D HIRANMAI [4511-21-733-045]
CONTENTS
 History & Need of Python
 Introduction to Data Analytics
 Basic Libraries for Data Analytics
 Real World Usage - Assignment
 Results Sample
 Conclusion
HISTORY & NEED OF PYTHON
Python was designed by Guido van Rossum, a Dutch programmer known as the creator of the language; it is now
developed and maintained under the Python Software Foundation.
 Python is regarded as a successor to an earlier programming language, ABC.
 Development of Python started in the late 1980s, and the language was first released in 1991.
 The first version, Python 0.9.0, was published by Guido van Rossum to alt.sources in February
1991.
■ Need of Python
 Graphics & Visualization
 Built-in Data Analytics Tools
 Large and Growing Python Community

INTRODUCTION TO DATA ANALYTICS
 Data analysis is a comprehensive method of inspecting, cleansing, transforming, and modeling data to discover
useful information, draw conclusions, and support decision-making.
 It is a multifaceted process involving various techniques and methodologies to interpret data from various
sources in different formats, both structured and unstructured.
 Data analysis is more than a process; it is a tool that enables organizations to make informed
decisions, predict trends, and improve operational efficiency.
 It is the backbone of strategic planning in businesses, governments, and other organizations.
 Data analysis can be categorized into four main types, each serving a unique purpose and providing
different insights. These are descriptive, diagnostic, predictive, and prescriptive analyses.
 Descriptive analysis, as the name suggests, describes or summarizes raw data and makes it
interpretable.
 It involves analyzing historical data to understand what has happened in the past. This type of analysis
is used to identify patterns and trends over time.
 Diagnostic analysis goes a step further than descriptive analysis by determining why something
happened. It involves more detailed data exploration and comparing different data sets to understand
the cause of a particular outcome.
 Predictive analysis uses statistical models and forecasting techniques to estimate future outcomes. It
uses data from the past to predict what could happen next. This type of analysis is
often used in risk assessment, marketing, and sales forecasting.
 Prescriptive analysis is the most advanced type of data analysis. It not only predicts future outcomes
but also suggests actions to benefit from these predictions. It uses sophisticated tools and technologies
like machine learning and artificial intelligence to recommend decisions.
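As a rough illustration of the difference between descriptive and predictive analysis, the sketch below summarizes a small series and then extends its trend by one step. The sales figures and the helper name `fit_line` are made up for this example; only the Python standard library is assumed.

```python
from statistics import mean, median

# Hypothetical monthly sales figures (made-up numbers for illustration only).
monthly_sales = [120, 135, 150, 160, 178, 190]

# Descriptive analysis: summarize what has already happened.
summary = {
    "mean": mean(monthly_sales),
    "median": median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}


def fit_line(ys):
    """Least-squares line y = slope * x + intercept for x = 0, 1, 2, ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Predictive analysis: extend the past trend one step into the future.
slope, intercept = fit_line(monthly_sales)
next_month_forecast = slope * len(monthly_sales) + intercept

print(summary)
print(round(next_month_forecast, 1))
```

Diagnostic and prescriptive analysis build on the same data, but they need domain context (why the trend occurred, what action to take) that a short snippet cannot capture.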
 Data Analysis Process
1. Defining the objectives and questions
2. Data Collection
3. Data Cleaning
4. Data Analysis
5. Data interpretation and Visualization
6. Data Storytelling
BASIC LIBRARIES FOR DATA ANALYTICS
 NumPy
 Pandas
 SciPy
 Matplotlib
 Scikit-learn
 Seaborn
 TensorFlow
 Keras
 PyTorch
ASSIGNMENT
Twitter Tweet Sentiment Analysis
Analyze the sentiment of Twitter tweets to determine whether each posted tweet is
positive or negative across various situations, despite the challenges of noisy
data, sarcasm, and context-dependent language, in order to provide actionable insights.
 1. Understanding the objective and mapping an outline for the analysis
 Classify tweets as positive or negative.
 Detect emotions such as anger and surprise in tweets.
 Monitor sentiment trends over time to understand public opinion.
 2. Identifying the required variables and collecting the data as per requirements
 Required variables: Label, Tweets, and so on.
 Required datasets are available on www.github.com and www.kaggle.com.

 3. Preprocessing the data if it contains unnecessary variables, null
values, etc.
 One column was removed, and some data cells contained “NaN”.
 “NaN” values were replaced with a suitable category value according to the variable.
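The preprocessing step described above can be sketched in plain Python as follows. The rows, the column names `label`, `tweet`, and `source`, and the helper names are assumptions made for illustration; the actual dataset columns and the tooling used in the assignment are not given in this review.

```python
import math

# Hypothetical rows; the column names "label", "tweet" and "source" are assumptions.
rows = [
    {"label": 0, "tweet": "great day at the park", "source": "web"},
    {"label": 1, "tweet": float("nan"), "source": "mobile"},
    {"label": 0, "tweet": "loving this weather", "source": float("nan")},
]


def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def preprocess(rows, drop_columns, fill_values):
    """Drop unneeded columns and replace missing cells with per-column defaults."""
    cleaned = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            if column in drop_columns:
                continue  # unnecessary variable: remove the whole column
            if is_missing(value):
                # Keep the original value when no default is configured.
                value = fill_values.get(column, value)
            new_row[column] = value
        cleaned.append(new_row)
    return cleaned


cleaned_rows = preprocess(rows, drop_columns={"source"}, fill_values={"tweet": ""})
print(cleaned_rows)
```

The fill value is chosen per column so that each variable keeps a consistent type, for example an empty string for a text column.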
 4. Analyzing the preprocessed data. The analysis examines tweets posted on
Twitter to determine their nature.
 The analysis identifies the most common type of tweet and gives an idea of which types of users are most
active on social media (Twitter).
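A minimal sketch of this kind of exploratory analysis, using only the standard library, is shown below. The example tweets are invented for illustration, and the label convention ("0" positive, "1" negative) follows the one used in this review.

```python
from collections import Counter

# Hypothetical labelled tweets; 0 = positive and 1 = negative.
tweets = [
    (0, "what a lovely sunny morning"),
    (0, "lovely coffee with friends"),
    (1, "terrible traffic this morning"),
    (0, "friends make every day lovely"),
]

# Composition count: how many tweets carry each label.
label_counts = Counter(label for label, _ in tweets)
most_common_label, _ = label_counts.most_common(1)[0]

# Distribution features: character length and word count per tweet.
tweet_lengths = [len(text) for _, text in tweets]
word_counts = [len(text.split()) for _, text in tweets]

# Frequent words across the whole dataset.
word_frequency = Counter(word for _, text in tweets for word in text.split())

print(label_counts)
print(word_counts)
print(word_frequency.most_common(3))
```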
Result Samples
 Data Sample
 Positive & Negative Data Composition Count (“1” – Negative, “0” – Positive)
 Data Distribution
 Length of Tweets
 Word Count of each tweet
 Frequent Words
 Frequency of Negative & Positive tweets
 Removing Special Characters, Symbols & other
 Word Cloud of Label “0” Data
 Word Cloud of Label “1” Data
 Predicted Labels by Logistic Regression (False – “0”, True – “1”)
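The cleaning and classification results listed above can be illustrated with a small from-scratch sketch. This is not the implementation used in the assignment, which is not shown in this review; in practice a library implementation of logistic regression would normally be used. The training tweets, the function names, and the hyperparameters are all assumptions chosen for illustration.

```python
import math
import re


def clean_tweet(text):
    """Lowercase and strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return " ".join(text.split())


def vectorize(text, vocabulary):
    """Bag-of-words counts in a fixed vocabulary order."""
    words = clean_tweet(text).split()
    return [words.count(term) for term in vocabulary]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.5, epochs=500):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    weights, bias = [0.0] * len(features[0]), 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + error * xi for g, xi in zip(grad_w, x)]
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def predict(text, vocabulary, weights, bias):
    """Return 1 (negative) if the predicted probability is at least 0.5, else 0."""
    x = vectorize(text, vocabulary)
    return int(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5)


# Hypothetical training data; 0 = positive, 1 = negative.
train = [
    ("I love this!! #happy", 0),
    ("Such a great day :) http://example.com", 0),
    ("@user I hate waiting...", 1),
    ("This is awful, really bad", 1),
]
vocabulary = sorted({w for text, _ in train for w in clean_tweet(text).split()})
features = [vectorize(text, vocabulary) for text, _ in train]
weights, bias = train_logistic_regression(features, [y for _, y in train])
predictions = [predict(text, vocabulary, weights, bias) for text, _ in train]
print(predictions)
```

A real evaluation would hold out a separate test set; predicting on the training tweets here only shows that the pieces fit together.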
CONCLUSION
In conclusion, analyzing the Twitter tweets dataset yielded several insights
that help classify tweets as positive or negative.
This internship also provided a practical, hands-on view of how this knowledge is applied.
