Internship-Review Hiranmai 045
Internship Review
By
D HIRANMAI [4511-21-733-045]
CONTENTS
Results Sample
Conclusion
HISTORY & NEED OF PYTHON
Python was designed by Guido van Rossum, a Dutch programmer known as the creator of the language.
Its development is now overseen by the Python Software Foundation.
Development of Python started in the late 1980s, and the language was first released in 1991.
The first version, Python 0.9.0, was posted by Guido van Rossum to alt.sources in February
1991.
■ Need of Python
Data analysis is a multifaceted process involving various techniques and methodologies to interpret data from
various sources in different formats, both structured and unstructured.
Data analysis is more than a process; it is a tool that empowers organizations to make informed
decisions, predict trends, and improve operational efficiency.
It's the backbone of strategic planning in businesses, governments, and other organizations.
Data analysis can be categorized into four main types, each serving a unique purpose and providing
different insights. These are descriptive, diagnostic, predictive, and prescriptive analyses.
Descriptive analysis, as the name suggests, describes or summarizes raw data and makes it
interpretable.
It involves analyzing historical data to understand what has happened in the past. This type of analysis
is used to identify patterns and trends over time.
Diagnostic analysis goes a step further than descriptive analysis by determining why something
happened. It involves more detailed data exploration and comparing different data sets to understand
the cause of a particular outcome.
Predictive analysis uses statistical models and forecasting techniques to understand the future. It
involves using data from the past to predict what could happen in the future. This type of analysis is
often used in risk assessment, marketing, and sales forecasting.
Prescriptive analysis is the most advanced type of data analysis. It not only predicts future outcomes
but also suggests actions to benefit from these predictions. It uses sophisticated tools and technologies
like machine learning and artificial intelligence to recommend decisions.
Data Analysis Process
1. Defining the objectives and opinions
2. Data Collection
3. Data Cleaning
4. Data Analysis
6. Data Storytelling
BASIC LIBRARIES FOR DATA ANALYTICS
NumPy
Pandas
SciPy
Matplotlib
Scikit-learn
Seaborn
Tensorflow
Keras
PyTorch
ASSIGNMENT
Analyze the sentiment of tweets posted on Twitter to determine whether each tweet is
positive or negative across various situations, despite the challenges of noisy
data, sarcasm, and context-based language, in order to provide actionable insights.
1. Understanding the Objective & Mapping Outline for analysis
2. Identifying the required variables and Collecting the Data as per requirements
3. Cleaning the data: one column was removed, and some data cells contained “NaN”.
“NaN” values were replaced with a suitable category value for each variable.
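The cleaning step above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the assignment. The column names and rows are hypothetical, and filling with the most frequent category (the mode) is an assumption, since the slides do not say which replacement value was chosen.

```python
import math
from collections import Counter


def is_missing(value):
    """Return True for None or a float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_missing_with_mode(rows, column):
    """Return copies of rows where missing `column` values become the mode.

    Assumption: the "suitable category" is the most frequent observed value.
    """
    observed = [row[column] for row in rows if not is_missing(row[column])]
    mode = Counter(observed).most_common(1)[0][0]
    return [
        {**row, column: mode} if is_missing(row[column]) else dict(row)
        for row in rows
    ]


# Hypothetical rows; "tweet" and "source" are placeholder column names.
rows = [
    {"tweet": "good morning", "source": "web"},
    {"tweet": "so tired", "source": float("nan")},
    {"tweet": "great game", "source": "web"},
    {"tweet": "bad service", "source": "mobile"},
]

cleaned = fill_missing_with_mode(rows, "source")
print(cleaned[1]["source"])
```

The function returns new dictionaries instead of editing the input, so the raw data stays available for comparison after cleaning.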
4. The preprocessed data is analyzed. The analysis covers tweets posted on
Twitter to determine the nature of each tweet.
It identifies the most common type of tweet and gives an idea of which types of users are most
active on social media (Twitter).
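Finding the most common type of tweet comes down to counting labels. The sketch below assumes the label coding shown in the result slides ("0" for positive, "1" for negative). The label list is hypothetical and stands in for the real dataset.

```python
from collections import Counter

# Label coding as given in the result slides: 0 = positive, 1 = negative.
LABEL_NAMES = {0: "positive", 1: "negative"}


def label_composition(labels):
    """Return {label name: (count, share of all tweets)}."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    return {
        name: (counts[code], counts[code] / total)
        for code, name in LABEL_NAMES.items()
    }


# Hypothetical labels, for illustration only.
labels = [0, 0, 1, 0, 1, 0, 0, 0]

composition = label_composition(labels)
most_common = max(composition, key=lambda name: composition[name][0])
print(composition, most_common)
```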
Result Samples
Data Sample
Positive & Negative Data Composition Count
“1” – Negative
“0” – Positive
Data Distribution: Length of Tweets
Word Count of each tweet
Frequent Words
Frequency of Negative & Positive tweets
Removing Special Characters, Symbols & other
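The text statistics and cleaning shown in these slides can be approximated with the standard library. This is a sketch only. The sample tweets are hypothetical, and the exact cleaning rules (dropping URLs, user mentions, digits, and punctuation while keeping hashtag words) are assumptions, since the slides do not list them.

```python
import re
from collections import Counter


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, mentions, and non-letter characters."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"@\w+", " ", text)          # drop user mentions
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits, punctuation, symbols
    return re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace


# Hypothetical tweets, for illustration only.
tweets = [
    "@user Loving the new update!!! #happy",
    "Worst service ever... see http://example.com :(",
    "Happy happy day :) #happy",
]

cleaned = [clean_tweet(t) for t in tweets]
word_counts = [len(t.split()) for t in cleaned]   # word count of each tweet
char_lengths = [len(t) for t in cleaned]          # length of each tweet
frequent = Counter(w for t in cleaned for w in t.split()).most_common(1)
print(cleaned, word_counts, frequent)
```

Because the "#" symbol is removed but the word after it is kept, hashtag words contribute to the frequency counts in this sketch.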
Word Cloud of Label “0” Data
Word Cloud of Label “1” Data
Predicted Labels by Logistic Regression
False – “0”
True – “1”
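The slides do not show how the logistic regression model was built. The sketch below is a from-scratch version using only the standard library, with word counts as features and stochastic gradient descent for training. It is not the assignment's implementation. The training texts, learning rate, and epoch count are all hypothetical. Predictions are returned as booleans to mirror the slide's coding (False for "0", True for "1").

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def build_vocab(texts):
    """Map each distinct word to a feature index."""
    vocab = {}
    for text in texts:
        for word in text.split():
            vocab.setdefault(word, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Bag-of-words counts; words outside the vocabulary are ignored."""
    vec = [0.0] * len(vocab)
    for word in text.split():
        if word in vocab:
            vec[vocab[word]] += 1.0
    return vec


def train(X, y, lr=0.5, epochs=200):
    """Fit weights and bias by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(X, y):
            p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
            error = p - target
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


def predict(text, vocab, weights, bias):
    """Return True for label "1" (negative), False for label "0" (positive)."""
    features = vectorize(text, vocab)
    p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    return p >= 0.5


# Hypothetical, already-cleaned training tweets: 0 = positive, 1 = negative.
train_texts = ["love this phone", "great happy day",
               "hate this service", "awful sad day"]
train_labels = [0, 0, 1, 1]

vocab = build_vocab(train_texts)
X = [vectorize(t, vocab) for t in train_texts]
weights, bias = train(X, train_labels)

predictions = [predict(t, vocab, weights, bias)
               for t in ["love happy", "hate awful"]]
print(predictions)
```

On real data, a library model such as the one in Scikit-learn (listed among the libraries above) would normally replace the hand-written training loop.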
CONCLUSION