Twitter-Sentiment Documentation

ACKNOWLEDGEMENT

Determination and dedication with sincerity and hard work lead to the heights of success.
In spite of the obstacles faced, valuable suggestions and best wishes helped us to complete the
project titled "TWITTER SENTIMENT ANALYSIS" successfully.
We would like to express our gratitude to all the people behind the screen who have
helped us transform an idea into a real-time application.
We would like to express our heartfelt gratitude to our parents, without whom we
would not have been privileged to achieve and fulfil our dreams. A special thanks to our Secretary,
XXXXXXXXXXX, for having founded such an esteemed institution. We are also grateful to our
Principal, XXXXXXXXX, who most ably runs the institution and has had a major hand in
enabling us to do our project.
We profoundly thank XXXXXXX, Head of the Department of Computer Science and
Engineering, who has been an excellent guide and also a great source of inspiration for our work. We
would also like to thank XXXXXXXX for her technical guidance and constant encouragement.
The satisfaction and euphoria that accompany the successful completion of a task
would be great, but incomplete without mentioning the people who made it possible, whose
constant guidance and encouragement crown all efforts with success. In this context, we would
like to thank all the other staff members, both teaching and non-teaching, who have extended their
timely help and eased our task.
ABSTRACT

Sentiment analysis is the process of computationally determining whether a piece of writing is
positive, negative or neutral. It is also known as opinion mining: deriving the opinion or attitude of
a speaker.
In the last few years, the use of social networking sites has increased tremendously. Nowadays,
social networking sites generate a large amount of data. Millions of people conveniently express
their views and opinions on a wide array of topics via microblogging websites. In this paper, we
discuss the extraction of sentiment from a famous microblogging website, Twitter, where users
post their views and opinions. We have performed sentiment analysis on tweets, which helps to
provide some predictions for business intelligence. We use Python and its statistical and
text-processing libraries for processing the data. The data can come from any sector in which a
hashtag (#) or mention (@) is associated with the keywords. The result is an application that
continuously updates its results whenever the analysis is re-run. The results of sentiment analysis
on Twitter data are displayed in different sections presenting positive, negative and neutral
sentiments.
INDEX
CHAPTER NO CONTENTS
 ABSTRACT
 FIGURES
 SCREENS
 ABBREVIATIONS
1. INTRODUCTION
2. LITERATURE SURVEY
2.1 Big Data
2.2 About Python
3. SYSTEM ANALYSIS
3.1 Existing System
3.1.1 Disadvantages
3.2 Proposed System
3.2.1 Advantages
4. SOFTWARE REQUIREMENTS SPECIFICATION
4.1 Feasibility Study
4.2 Software Requirements
4.3 Hardware Requirements
5. SYSTEM DESIGN
5.1 System Architecture
5.2 Modules
5.3 UML Diagrams
5.4 Data Flow Diagrams
6. IMPLEMENTATION
6.1 Source Code
7. SYSTEM TESTING
7.1 Introduction To Testing
7.2 Test Cases
7.3 Screenshots
8. CONCLUSION & FUTURE ENHANCEMENTS
9. REFERENCES
LIST OF DIAGRAMS

Fig. No Name of the Diagram

2.1 Big Data
2.2 Data Measurements
3.1 All the Big Names Using Python
4.2 Hardware Requirements
5.1 System Architecture
5.3.1 Use Case Diagram
5.3.2 Class Diagram
5.3.3 Sequence Diagram
5.3.4 Collaboration Diagram
5.3.5 State Chart Diagram
5.3.6 Component Diagram
5.3.7 Deployment Diagram
5.4.1 Level 0 DFD
5.4.2 Level 1 DFD
5.4.3 Level 2 DFD
5.4.4 Level 3 DFD
LIST OF ABBREVIATIONS

 GIS : Geographic Information System


 NumPy : Numerical Python
 Scipy : Scientific Computing Function in Python
 Matlab : Matrix Laboratory
 DEAP : Distributed Evolutionary Algorithm in Python
 SCOOP : Scalable Concurrent Operations in Python
 Mlpy : Machine Learning Python
 Sympy : Symbolic Mathematics Function in Python
 Sunpy : Solar Physics in Python
 Bokeh : Data Visualization Library
 ODM : Object Data Manager
 STUB : Small Program Routine
 RAM : Random Access Memory
 ROM : Read Only Memory
1. INTRODUCTION

The opinions of others have a significant influence in our daily decision-making process. These
decisions range from buying a product such as a smart phone to making investments to choosing a
school—all decisions that affect various aspects of our daily life. Before the Internet, people would
seek opinions on products and services from sources such as friends, relatives, or consumer reports.
However, in the Internet era, it is much easier to collect diverse opinions from different people
around the world. People look to e-commerce sites (e.g., Amazon, eBay) and social media (e.g.,
Facebook, Twitter) to get feedback on how a particular product or service may be perceived in the
market.
Similarly, organizations use surveys, opinion polls, and social media as mechanisms to obtain
feedback on their products and services. Sentiment analysis, or opinion mining, is the computational
study of opinions, sentiments, and emotions expressed in text. The use of sentiment analysis is
becoming more widespread because the information it yields can help in the monetization of
products and services.

2. LITERATURE SURVEY

2.1 BIG DATA:


Big data is an evolving term that describes any voluminous amount of structured, semi-structured
and unstructured data that can be mined for information. Although big data does not refer to a
specific quantity, the term is often used when discussing petabytes and exabytes of data.
Big data is a term for data sets that are so large or complex that traditional data processing
application software is inadequate to handle them. The term describes a volume of data so
large that it is difficult to process and that exceeds current processing capacity. Big data denotes an
enormous volume of both structured and unstructured data that is so extensive it is difficult to
process using traditional database and software techniques. In most enterprise scenarios the
volume of data is too big, it moves too fast, or it exceeds current processing capacity. Big data has
the potential to help organizations improve operations and make faster, smarter decisions. This
data, when captured, formatted, manipulated, stored, and analysed, can help a company gain
useful insight to increase revenues, acquire or retain customers, and improve operations.

Fig:2.1 Big Data 3V’s


Big data can be characterized by three Vs: the extreme volume of data, the wide variety of data
types, and the velocity at which the data must be processed.
VOLUME:
Volume is the V most associated with big data because, well, volume can be huge. Organizations
collect data from a variety of sources, including business transactions, social media and data from
sensors or machine-to-machine communication. In the past, storing it would have been a problem.
For example, Facebook stores around 250 billion images.
VELOCITY:
Velocity is the measure of how fast the data is coming in. Data streams in at an unprecedented
speed and must be dealt with in a timely manner. For instance, Facebook has to handle a torrent of
photographs every day. It has to ingest it all, process it, file it, and somehow, later, be able to
retrieve it.
VARIETY:
Data comes in all types of formats, from structured, numeric data in traditional databases to
unstructured text documents, email, video, audio, stock ticker data and financial transactions.

Fig:2.2 Data Measurements


Big data may amount to petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data
consisting of billions to trillions of records.
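These units can be related with a short calculation (a sketch using the binary convention of 1,024 quoted above):

KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB
PB = 1024 * TB        # 1 petabyte = 1,024 terabytes
EB = 1024 * PB        # 1 exabyte  = 1,024 petabytes

print(PB // TB)       # 1024 terabytes in a petabyte
print(EB)             # about 1.15 * 10**18 bytes in an exabyte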
2.2 ABOUT PYTHON:
Python is a programming language, which means it's a language both people and computers can
understand. Python was developed by a Dutch software engineer named Guido van Rossum, who
created the language to solve some problems he saw in the computer languages of the time.
Python is an interpreted high-level programming language for general-purpose programming.
Created by Guido van Rossum and first released in 1991, Python has a design philosophy that
emphasizes code readability, and a syntax that allows programmers to express concepts in fewer
lines of code, notably using significant whitespace. It provides constructs that enable clear
programming on both small and large scales.
Python features a dynamic type system and automatic memory management. It supports multiple
programming paradigms, including object-oriented, imperative, functional and procedural, and has
a large and comprehensive standard library.
Python interpreters are available for many operating systems. C Python, the reference
implementation of Python, is open source software and has a community-based development
model, as do nearly all of its variant implementations. C Python is managed by the non-profit
Python Software Foundation.
YOU CAN USE PYTHON FOR PRETTY MUCH ANYTHING:
One significant advantage of learning Python is that it's a general-purpose language that can be
applied in a large variety of projects. Below are just some of the most common fields where Python
has found its use:
 Data science
 Scientific and mathematical computing
 Web development
 Computer graphics
 Basic game development
 Mapping and geography (GIS software)
PYTHON IS WIDELY USED IN DATA SCIENCE:
Python's ecosystem has been growing over the years, and it is more and more capable of statistical
analysis.
It is the best compromise between scale and sophistication (in terms of data processing).
Python emphasizes productivity and readability.
Python is used by programmers who want to delve into data analysis or apply statistical techniques
(and by developers who turn to data science).
There are plenty of Python scientific packages for data visualization, machine learning, natural
language processing, complex data analysis and more. All of these factors make Python a great tool
for scientific computing and a solid alternative for commercial packages such as MatLab. The most
popular libraries and tools for data science are:
PANDAS:
A library for data manipulation and analysis. The library provides data structures and operations
for manipulating numerical tables and time series.
NUMPY:
The fundamental package for scientific computing with Python, adding support for large, multi-
dimensional arrays and matrices, along with a large library of high-level mathematical functions to
operate on these arrays.
SCIPY:
A library used by scientists, analysts, and engineers doing scientific computing and technical
computing.
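As a brief sketch of how these three libraries work together (the numbers and column names below are invented for illustration only):

import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sentiment scores for six tweets (illustrative values only)
scores = np.array([0.8, -0.3, 0.1, 0.6, -0.7, 0.0])
df = pd.DataFrame({'tweet_id': range(1, 7), 'score': scores})

print(df.describe())                # pandas: quick descriptive statistics
print('mean:', np.mean(scores))     # NumPy: vectorised arithmetic
print('skew:', stats.skew(scores))  # SciPy: a higher-level statistical function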
Being a free, cross-platform, general-purpose and high-level programming language, Python has
been widely adopted by the scientific community. Scientists value Python for its precise and
efficient syntax, relatively flat learning curve and the fact that it integrates well with other languages
(e.g. C/C++).
As a result of this popularity there are plenty of Python scientific packages for data visualization,
machine learning, natural language processing, complex data analysis and more. All of these factors
make Python a great tool for scientific computing and a solid alternative for commercial packages
such as MatLab.
Fig: Job Roles of Python Developers

HERE’S OUR LIST OF THE MOST POPULAR PYTHON SCIENTIFIC


LIBRARIES AND TOOLS:
ASTROPY:
The Astropy Project is a collection of packages designed for use in astronomy. The core astropy
package contains functionality aimed at professional astronomers and astrophysicists, but may be
useful to anyone developing astronomy software.
BIOPYTHON:
Biopython is a collection of non-commercial Python tools for computational biology and
bioinformatics. It contains classes to represent biological sequences and sequence annotations, and
it is able to read and write to a variety of file formats.
CUBES:
Cubes is a light-weight Python framework and set of tools for the development of reporting and
analytical applications, Online Analytical Processing (OLAP), multidimensional analysis and
browsing of aggregated data.
DEAP:
Deap is an evolutionary computation framework for rapid prototyping and testing of ideas. It
incorporates the data structures and tools required to implement most common evolutionary
computation techniques such as genetic algorithm, genetic programming, evolution strategies,
particle swarm optimization, differential evolution and estimation of distribution algorithm.
SCOOP:
Scoop is a Python module for distributing concurrent parallel tasks on various environments, from
heterogeneous grids of workstations to supercomputers.
PSYCHOPY:
PsychoPy is a package for the generation of experiments for neuroscience and experimental
psychology. PsychoPy is designed to allow the presentation of stimuli and collection of data for a
wide range of neuroscience, psychology and psychophysics experiments.
PANDAS:
Pandas is a library for data manipulation and analysis. The library provides data structures and
operations for manipulating numerical tables and time series.
MLPY:
Mlpy is a machine learning library built on top of NumPy/SciPy, the GNU Scientific Libraries.
Mlpy provides a wide range of machine learning methods for supervised and unsupervised problems
and it is aimed at finding a reasonable compromise between modularity, maintainability,
reproducibility, usability and efficiency.
MATPLOTLIB:
Matplotlib is a python 2D plotting library which produces publication quality figures in a variety
of hardcopy formats and interactive environments across platforms. Matplotlib allows you to
generate plots, histograms, power spectra, bar charts, errorcharts, scatterplots, and more.
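For instance, a minimal sketch of the kind of bar chart matplotlib can produce (the sentiment counts below are invented for illustration):

import matplotlib.pyplot as plt

labels = ['Positive', 'Negative', 'Neutral']   # sentiment classes
counts = [120, 85, 40]                         # invented example counts

plt.bar(labels, counts, color=['green', 'red', 'grey'])
plt.title('Tweet sentiment distribution')
plt.ylabel('Number of tweets')
plt.show()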
NUMPY:
NumPy is the fundamental package for scientific computing with Python, adding support for large,
multi-dimensional arrays and matrices, along with a large library of high-level mathematical
functions to operate on these arrays.
NETWORKX:
NetworkX is a library for studying graphs which helps you create, manipulate, and study the
structure, dynamics, and functions of complex networks.
TOMOPY:
TomoPy is an open-sourced Python toolbox to perform tomographic data processing and image
reconstruction tasks. TomoPy provides a collaborative framework for the analysis of synchrotron
tomographic data with the goal to unify the effort of different facilities and beamlines performing
similar tasks.
THEANO:
Theano is a numerical computation Python library. Theano allows you to define, optimize, and
evaluate mathematical expressions involving multi-dimensional arrays efficiently.
SYMPY:
SymPy is a library for symbolic computation and includes features ranging from basic symbolic
arithmetic to calculus, algebra, discrete mathematics and quantum physics. It provides computer
algebra capabilities either as a standalone application, as a library to other applications, or live on
the web.
SCIPY:
SciPy is a library used by scientists, analysts, and engineers doing scientific computing and
technical computing. SciPy contains modules for optimization, linear algebra, integration,
interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks
common in science and engineering.
SCIKIT-LEARN:
Scikit-learn is a machine learning library. It features various classification, regression and clustering
algorithms including support vector machines, random forests, gradient boosting, k- means and
DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy
and SciPy.
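As an illustration only (not part of this project's delivered code), a minimal scikit-learn text classification sketch could look like the following; the example sentences and labels are invented:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set: 1 = positive, 0 = negative
texts = ['great phone, love it', 'terrible battery, very bad',
         'awesome camera', 'worst purchase ever']
labels = [1, 0, 1, 0]

model = make_pipeline(CountVectorizer(), MultinomialNB())  # bag-of-words + Naive Bayes
model.fit(texts, labels)
print(model.predict(['love the battery']))  # expected output: [1]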
SCIKIT-IMAGE:
Scikit-image is an image processing library. It includes algorithms for segmentation, geometric
transformations, color space manipulation, analysis, filtering, morphology, feature detection, and
more.
SCIENTIFICPYTHON:
ScientificPython is a collection of modules for scientific computing. It contains support for
geometry, mathematical functions, statistics, physical units, IO, visualization, and parallelization.
SAGEMATH:
SageMath is mathematical software with features covering many aspects of mathematics, including
algebra, combinatorics, numerical mathematics, number theory, and calculus. SageMath is built on
Python and supports procedural, functional and object-oriented constructs.
VEUSZ:
Veusz is a scientific plotting and graphing package designed to produce publication-quality plots
in popular vector formats, including PDF, PostScript and SVG.
GRAPH-TOOL:
Graph-tool is a module for the manipulation and statistical analysis of graphs.
SUNPY:
SunPy is a data-analysis environment specializing in providing the software necessary to analyze
solar and heliospheric data in Python.
BOKEH:
Bokeh is a Python interactive visualization library that targets modern web browsers for
presentation. Bokeh can help anyone who would like to quickly and easily create interactive plots,
dashboards, and data applications. Its goal is to provide elegant, concise construction of novel
graphics in the style of D3.js, but also deliver this capability with high-performance interactivity
over very large or streaming datasets.
TENSORFLOW:
TensorFlow is an open source software library for machine learning across a range of tasks,
developed by Google to meet their needs for systems capable of building and training neural
networks to detect and decipher patterns and correlations, analogous to the learning and reasoning
which humans use. It is currently used for both research and production in Google products, often
replacing its closed-source predecessor, DistBelief.
NILEARN:
Nilearn is a Python module for fast and easy statistical learning on NeuroImaging data. Nilearn
makes it easy to use many advanced machine learning, pattern recognition and multivariate
statistical techniques on neuroimaging data for applications such as MVPA (Multi-Voxel Pattern
Analysis), decoding, predictive modelling, functional connectivity, brain parcellations, and
connectomes.
DMELT:
DataMelt, or DMelt, is software for numeric computation, statistics, analysis of large data
volumes ("big data") and scientific visualization. The program can be used in many areas, such as
natural sciences, engineering, modeling and analysis of financial markets. DMelt can be used with
several scripting languages including Python/Jython, BeanShell, Groovy, Ruby, as well as with
Java.
PYTHON-WEKA-WRAPPER:
Weka is a suite of machine learning software written in Java, developed at the University of
Waikato, New Zealand. It contains a collection of visualization tools and algorithms for data
analysis and predictive modeling, together with graphical user interfaces for easy access to these
functions. The python-weka-wrapper package makes it easy to run Weka algorithms and filters
from within Python.
DASK:
Dask is a flexible parallel computing library for analytic computing, composed of two components:
1) dynamic task scheduling optimized for interactive computational workloads, and 2) "Big Data"
collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy,
Pandas, or Python iterators to larger-than-memory or distributed environments.
PYTHON SAVES TIME:
Even the classic "Hello, world" program illustrates this point:

print("Hello, world")

For comparison, this is what the same program looks like in Java:

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello, world");
    }
}
ALL THE BIG NAMES USE PYTHON:

Fig:3.1 All the Big names using python

PYTHON KEYWORDS AND IDENTIFIER:


Keywords are the reserved words in Python.
We cannot use a keyword as variable name, function name or any other identifier. They are used
to define the syntax and structure of the Python language.
In Python, keywords are case sensitive.
There are 33 keywords in Python 3.3. This number can vary slightly over time.
All the keywords except True, False and None are in lowercase, and they must be written exactly as
they are. The list of all the keywords is given below.
Fig: Key Words in Python
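Since the exact keyword list depends on the interpreter version, it can also be printed directly with the standard library's keyword module (a short sketch):

import keyword

print(len(keyword.kwlist))  # number of keywords in the running interpreter
print(keyword.kwlist)       # e.g. ['False', 'None', 'True', 'and', 'as', ...]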
An identifier is the name given to entities like classes, functions and variables in Python. It helps to
differentiate one entity from another.
RULES FOR WRITING IDENTIFIERS:
Identifiers can be a combination of lowercase letters (a to z), uppercase letters (A to Z), digits (0 to
9) or an underscore (_). Names like myClass, var_1 and print_this_to_screen are all valid examples.
An identifier cannot start with a digit. 1variable is invalid, but variable1 is perfectly fine.
Keywords cannot be used as identifiers.
>>> global = 1
File "<interactive input>", line 1
global = 1
^

SyntaxError: invalid syntax


We cannot use special symbols like !, @, #, $, % etc. in our identifier.

>>> a@ = 0
File "<interactive input>", line 1
a@ = 0
^

SyntaxError: invalid syntax


An identifier can be of any length.
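These rules can also be checked programmatically with str.isidentifier() and keyword.iskeyword(); the helper below is a small sketch, not part of the project code:

import keyword

def is_valid_identifier(name):
    # Valid if it follows the identifier rules and is not a reserved keyword
    return name.isidentifier() and not keyword.iskeyword(name)

print(is_valid_identifier('var_1'))      # True
print(is_valid_identifier('1variable'))  # False - cannot start with a digit
print(is_valid_identifier('global'))     # False - reserved keyword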

PYTHON:

Fig: Python Logo


3. SYSTEM ANALYSIS
3.1 EXISTING SYSTEM:
The existing database systems are not able to process large amounts of data within a specified
amount of time. Such databases are also limited to processing structured data and have limitations
when dealing with a large amount of data. So, the traditional solution cannot help an organization
to manage and process unstructured data.
3.1.1 DISADVANTAGES OF EXISTING SYSTEM:
 The available systems are not sufficient to deal with the complex structure of big data. This
section presents some of the limitations of the existing system.
 The available systems, like Twitter-Monitor and the Real Time Twitter Trend Mining System,
require extensive data cleaning, data scraping and integration strategies that ultimately
increase the overhead.
 For real-time analytics, the available systems are inefficient.
 Analyzing a huge amount of data in a short period of time is a very time-consuming process.
3.2 PROPOSED SYSTEM:
Python is a great language for writing sentiment analysis applications: programs which start small
with a few lines of experimental code and then grow. The proposed system consists of a few
easy-to-use, easy-to-configure command-line tools, typically used in conjunction with Twitter data.
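As a hedged sketch of how such a small sentiment check could look in Python (this uses NLTK's VADER analyser rather than the classifier built in Chapter 6, and the example tweet is invented):

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')           # one-time download of the sentiment lexicon
sia = SentimentIntensityAnalyzer()

tweet = "I really love the new update!"  # invented example tweet
scores = sia.polarity_scores(tweet)      # dict with 'neg', 'neu', 'pos', 'compound'
if scores['compound'] > 0.05:
    label = 'positive'
elif scores['compound'] < -0.05:
    label = 'negative'
else:
    label = 'neutral'
print(label, scores)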
3.2.1 ADVANTAGES:
 Analysis helps the organisation know more about its customers
 Adjust the marketing strategy
 Improve product quality
 Improve customer service
4. SOFTWARE REQUIREMENT SPECIFICATION

The purpose of this SRS document is to identify the requirements and functionality of the Twitter
Sentiment Analysis application. The SRS defines how our team and the client view the final
product and the characteristics or functionality it must have. This document also notes the
optional requirements which we plan to implement but which are not required for the working of
the project.
This phase evaluates the requirements of the system. For a systematic way of evaluating the
requirements, several processes are involved. The first step in analysing the requirements of the
system is recognizing the nature of the system, so that a sound analysis can be made and all the
cases are defined to better understand the analysis of the dataset.
INTENDED AUDIENCE AND READING SUGGESTIONS:
This document is intended for project developers, managers, users, testers and documentation
writers. It aims at discussing design and implementation constraints, dependencies, system
features, external interface requirements and other non-functional requirements.
IDENTIFICATION OF NEEDS:
The first and most important need for a business firm or an organization is to know how it is
performing in the market and, in parallel, how to overcome its rivals in the market. To do so we
need to analyse our data in view of all the available variables.
4.1 FEASIBILITY STUDY:
A feasibility study aims to objectively and rationally uncover the strengths and weaknesses of an
existing business or a proposed venture, the opportunities and threats present in the environment,
the resources required to carry it through, and ultimately the prospects for success. In its simplest
terms, the two criteria used to judge feasibility are the cost required and the value to be attained.
A well-designed feasibility study should provide a historical background of the business or
project, a description of the product or service, accounting statements, details of the operations
and management, marketing research and plans, financial data, legal requirements and tax
obligations. Generally, feasibility studies precede technical development and project
implementation. There are three types of feasibility:
 Economical Feasibility
 Technical Feasibility
 Operational Feasibility
ECONOMICAL FEASIBILITY:
The electronic system handles the existing system's data flow and processes completely and
should produce all the reports of the manual system as well as a large number of other
management reports. It should work as a web-based application with a dedicated web server and
database server. Some of the associated transactions take place in different areas. Open source
software like TOMCAT, JAVA, MySQL and Linux is used to minimize the cost for the customer.
No special investment is needed to manage the tool.
TECHNICAL FEASIBILITY:
Assessing technical feasibility is the trickiest part of a feasibility study. This is because, at this
point, not much detailed design of the system is available, making it difficult to assess issues like
performance and costs (depending on the kind of technology to be deployed).
Several issues must be considered while doing a technical analysis. Understand the different
technologies involved in the proposed system. Before starting the project, we should be clear
about which technologies are required for the development of the new system, and check whether
the organization currently has them. Is the required technology available to the organization?
If so, is the capacity sufficient?
For instance: "Will the current printer be able to handle the new reports and forms required for
the new system?"
OPERATIONAL FEASIBILITY:
Proposed projects are beneficial only if they can be turned into information systems that will
meet the organization's operating requirements. Simply stated, this test of feasibility asks whether
the system will work when it is developed and installed. Are there major barriers to
implementation? Here are questions that will help test the operational feasibility of a project.
 Is there sufficient support for the project from management and from users? If the
present system is well liked and used to the extent that people will not be able to see reasons for
change, there may be resistance.
 Are the current business methods acceptable to the users? If they are not, users
may welcome a change that will bring about a more operational and useful system.
 Have the users been involved in the planning and development of the project?
Early involvement reduces the chances of resistance to the system.
4.2 SOFTWARE REQUIREMENTS:
Operating System : Windows 7, Windows 8 (or higher versions)
Language : Python 3.5
Browser : Mozilla Firefox (or any browser)
4.3 HARDWARE REQUIREMENTS:
Processor : Pentium 3, Pentium 4 or higher
RAM : 2GB/4GB or higher
Hard disk : 40GB or higher

Fig:4.2 Hardware Requirements
5. SYSTEM DESIGN

The System Design Document describes the system requirements, operating environment, system
and subsystem architecture, files and database design, input formats, output layouts, human-
machine interfaces, detailed design, processing logic, and external interfaces.
This section describes the system in narrative form using non-technical terms. It should provide a
high-level system architecture diagram showing a subsystem breakout of the system, if applicable.
The high-level system architecture or subsystem diagrams should, if applicable, show interfaces to
external systems. Supply a high-level context diagram for the system and subsystems, if applicable.
Refer to the requirements traceability matrix (RTM) in the Functional Requirements Document
(FRD) to identify the allocation of the functional requirements into this design document.
This section describes any constraints in the system design (reference any trade-off analyses
conducted such, as resource use versus productivity, or conflicts with other systems) and includes
any assumptions made by the project team in developing the system design.
This section also identifies the organization code and title of the key points of contact (and
alternates if appropriate) for the information system development effort. These points of contact
should include the Project Manager, System Proponent, User Organization, Quality Assurance
(QA) Manager, Security Manager, and Configuration Manager, as appropriate.
5.1 SYSTEM ARCHITECTURE:
Fig:5.1 System architecture
5.2 MODULES:
 Loading Data
 Analyse Data
 Exports
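A minimal sketch of how these three modules could be wired together (the file names and column names are assumptions that mirror the sample code in Chapter 6):

import pandas as pd

def load_data(path):
    # Loading Data: read the tweet dataset from a CSV file
    return pd.read_csv(path)[['text', 'sentiment']]

def analyse_data(df):
    # Analyse Data: count tweets per sentiment class
    return df['sentiment'].value_counts()

def export_results(summary, out_path):
    # Exports: write the summary to disk for reporting
    summary.to_csv(out_path)

if __name__ == '__main__':
    data = load_data('Sentiment.csv')  # assumed file name
    export_results(analyse_data(data), 'sentiment_summary.csv')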
5.3 UML DIAGRAMS:
UML (Unified Modeling Language) is a standard language for specifying, visualizing, constructing,
and documenting the artifacts of software systems. UML is a pictorial language used to make
software blueprints. It is also used to model non-software systems, such as the process flow in a
manufacturing unit.
UML is not a programming language, but tools can be used to generate code in various languages
from UML diagrams. UML has a direct relationship with object-oriented analysis and design, and
plays a fundamental role in describing the different views of a system.
USE CASE DIAGRAM:
The use case diagram models the behaviour of the system. It contains the set of use cases, the
actors and their relationships. This diagram may be used to represent the static view of the system.
Fig:5.3.1 Use Case Diagram
In the above diagram, the actors are the customer, system, client, server, Python and data
cleaning. The client uploads the data to the system, which separates the data into blocks and passes
it to Python. Python then performs the data cleaning, that is, data validation and data repair, and
the results are stored. These results can be viewed using Python and can be stored on the server for
future use. The obtained results can be generated as reports by the customer.
CLASS DIAGRAM:
The class diagram is the most commonly drawn diagram in UML. It represents the static design
view of the system. It consists of the set of classes, interfaces, collaborations and their
relationships.
Fig:5.3.2 Class Diagram

In the above class diagram, the relationships, that is, the dependencies between all of the classes,
are sketched out. In addition, the operations performed in each class are also shown.
SEQUENCE DIAGRAM:
This is an interaction diagram which represents the time ordering of messages. It consists of a set
of objects and the messages sent and received by those objects. This diagram is used to represent
the dynamic view of the system.
Fig:5.3.3 Sequence Diagram

A sequence diagram shows object interactions arranged in time sequence. In the above diagram,
there are five objects interacting with each other. Each object has a vertical dashed line which
represents the existence of the object over a period of time. The diagram also has a tall, thin
rectangle, called the focus of control, which shows the period of time during which an object is
performing an action, either directly or through a subordinate operation.
COLLABORATION DIAGRAM:
This is an interaction diagram which represents the structural organization of the objects that send
and receive messages. It consists of a set of parts, connectors that connect the parts, and the
messages sent and received by those parts. This diagram is used to represent the dynamic view of
the system.
Fig:5.3.4 Collaboration Diagram

The collaboration diagram contains objects, links and sequence numbers. In the above diagram,
there are five objects, namely customer, client, system, Python and server. These objects are
connected to each other using links. A sequence number indicates the time order of a message.
STATE CHART DIAGRAM:
The state chart diagram contains the set of states, events and activities. This diagram is important
for representing the behaviour of an interface, class or collaboration. The key focus of a state chart
is to model the event-ordered behaviour of an object. The state chart diagram shows the dynamic
view of the system.
Fig:5.3.5 State Chart Diagram

A state chart diagram contains two elements called states and transitions. States represent
situations during the life of an object. We can easily draw a state in SmartDraw by using a
rectangle with rounded corners. A transition is a solid arrow that represents the path between
different states of an object. The transition is labelled with the event that triggered it and the
action that results from it.
COMPONENT DIAGRAM:
The important element of a component diagram is the component. This diagram shows the
internal parts, connectors and ports that realize the component. When a component is instantiated,
copies of its internal parts are also instantiated.

Fig:5.3.6 Component Diagram

A component diagram is represented using components. A component is a physical building block
of the system. It is represented as a rectangle with tabs. The component diagram describes the
internal processing of the project. The data is sent to Python, where data cleaning is performed
and the reports are generated.
DEPLOYMENT DIAGRAM:
The fundamental element in a deployment diagram is a node. The set of nodes and their
relationships with one another is represented using the deployment diagram. The deployment
diagram is related to the component diagram, in that one node of a deployment diagram often
encloses one or more components. This diagram is also important for representing the static view
of the system.

Fig:5.3.7 Deployment Diagram

A deployment diagram is represented using nodes. A node is a physical resource that executes
code components. Deployment diagrams are also used to describe the run-time processing of
nodes. The data is sent to Python, where data cleaning is performed and the reports are generated.
5.4 DATA FLOW DIAGRAMS:
A data flow diagram (DFD) is a graphical representation of the "flow" of data through an
information system, modelling its process aspects. A DFD is often used as a preliminary step to
create an overview of the system, which can later be elaborated. DFDs can also be used for the
visualization of data processing. A DFD shows what kind of information will be input to and
output from the system, where the data will come from and go to, and where the data will be
stored. It does not show information about the timing of processes, or about whether processes
will operate in sequence or in parallel.
DFD SYMBOLS:
In the DFD, there are four symbols
 A square defines a source or destination of system data.

 An arrow identifies data flow. It is the pipeline through which the information flows.

 A circle represents a process that transforms incoming data flow into outgoing data flow.

 An open rectangle is a data store, data at rest or a temporary repository of data.

LEVEL 0: SYSTEM INPUT/ OUTPUT LEVEL:


A level 0 DFD describes the system-wide boundaries, showing the inputs to and the output flows
from the system and the major processes.

Fig:5.4.1 Level 0 DFD


The Level 0 DFD is also called a Context Diagram. It is a basic overview of the whole system or
process being analysed or modelled. It is designed to be an at-a-glance view, showing the system
as a single high-level process with its relationships to external entities.
LEVEL 1: SUB SYSTEM LEVEL DATA FLOW:
The Level 1 DFD depicts the next level of detail, showing the data flow between subsystems. The
Level 1 DFD shows how the system is divided into sub-systems (processes), each of which deals
with one or more of the data flows to or from an external agent, and which together provide all of
the functionality of the system as a whole.

Fig:5.4.2 Level 1 DFD

LEVEL 2: FILE LEVEL DETAIL DATA FLOW :


Feasibility and risk analysis are related here in several ways. The Level 2 DFD elaborates the next
level of detail about the system's working.

Fig:5.4.3 Level 2 DFD

LEVEL 3:

Fig:5.4.4 Level 3 DFD


6. IMPLEMENTATION

6.1 SAMPLE CODE:


import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.model_selection import train_test_split  # function for splitting data into train and test sets

import nltk
from nltk.corpus import stopwords
from nltk.classify import SklearnClassifier

from wordcloud import WordCloud, STOPWORDS

import matplotlib.pyplot as plt

data = pd.read_csv('C:/Users/sravan/Desktop/Sentiment.csv')
data = data[['text', 'sentiment']]
train, test = train_test_split(data, test_size=0.1)
train = train[train.sentiment != "Neutral"]
train_pos = train[train['sentiment'] == 'Positive']
train_pos = train_pos['text']
train_neg = train[train['sentiment'] == 'Negative']
train_neg = train_neg['text']


def wordcloud_draw(data, color='black'):
    words = ' '.join(data)
    cleaned_word = " ".join([word for word in words.split()
                             if 'http' not in word
                             and not word.startswith('@')
                             and not word.startswith('#')
                             and word != 'RT'])
    wordcloud = WordCloud(stopwords=STOPWORDS,
                          background_color=color,
                          width=2500,
                          height=2000).generate(cleaned_word)
    plt.figure(1, figsize=(13, 13))
    plt.imshow(wordcloud)
    plt.axis('off')
    plt.show()


print("Positive words")
wordcloud_draw(train_pos, 'white')
print("Negative words")
wordcloud_draw(train_neg)

nltk.download('stopwords')

tweets = []
stopwords_set = set(stopwords.words("english"))

for index, row in train.iterrows():
    words_filtered = [e.lower() for e in row.text.split() if len(e) >= 3]
    words_cleaned = [word for word in words_filtered
                     if 'http' not in word
                     and not word.startswith('@')
                     and not word.startswith('#')
                     and word != 'RT']
    words_without_stopwords = [word for word in words_cleaned if word not in stopwords_set]
    tweets.append((words_cleaned, row.sentiment))

test_pos = test[test['sentiment'] == 'Positive']
test_pos = test_pos['text']
test_neg = test[test['sentiment'] == 'Negative']
test_neg = test_neg['text']


def get_words_in_tweets(tweets):
    all_words = []
    for (words, sentiment) in tweets:
        all_words.extend(words)
    return all_words


def get_word_features(wordlist):
    wordlist = nltk.FreqDist(wordlist)
    features = wordlist.keys()
    return features


w_features = get_word_features(get_words_in_tweets(tweets))


def extract_features(document):
    document_words = set(document)
    features = {}
    for word in w_features:
        features['contains(%s)' % word] = (word in document_words)
    return features


wordcloud_draw(w_features)

training_set = nltk.classify.apply_features(extract_features, tweets)
classifier = nltk.NaiveBayesClassifier.train(training_set)

neg_cnt = 0
pos_cnt = 0
for obj in test_neg:
    res = classifier.classify(extract_features(obj.split()))
    if res == 'Negative':
        neg_cnt = neg_cnt + 1
for obj in test_pos:
    res = classifier.classify(extract_features(obj.split()))
    if res == 'Positive':
        pos_cnt = pos_cnt + 1

print('[Negative]: %s/%s ' % (neg_cnt, len(test_neg)))
print('[Positive]: %s/%s ' % (pos_cnt, len(test_pos)))
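Once trained, the same classifier can be applied to a single new tweet in the same way the test loop above does; a short usage sketch (the example tweet is invented):

new_tweet = "I love this phone, the camera is amazing"  # invented example tweet
prediction = classifier.classify(extract_features(new_tweet.split()))
print(prediction)  # prints 'Positive' or 'Negative'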
7. SYSTEM TESTING

7.1 INTRODUCTION TO TESTING:


Testing is a procedure which uncovers errors in the program. Software testing is a critical element
of software quality assurance and represents the ultimate review of specification, design and
coding. The increasing visibility of software as a system element, and the costs associated with a
software failure, are motivating factors for well-planned, thorough testing. Testing is the process of
executing a program with the intent of finding an error. The design of tests for software and other
engineered products can be as challenging as the initial design of the product itself. Testing is the
major quality measure employed during software development. During testing, the program is
executed with a set of test cases and the output of the program for the test cases is evaluated to
determine whether the program is performing as it is expected to perform.
TESTING STRATEGIES:
A strategy for software testing integrates the design of software test cases into a well-planned
series of steps that result in the successful development of the software. The strategy provides a
road map that describes the steps to be taken, when they are to be taken, and how much effort,
time, and resources will be required. The strategy incorporates test planning, test case design, test
execution, and test result collection and evaluation. It provides guidance for the practitioner and a
set of milestones for the manager. Because of time pressures, progress must be measurable and
problems must surface as early as possible.
In order to make sure that the system does not have errors, the different levels of testing strategies
that are applied at different phases of software development are:
UNIT TESTING:
Unit testing is done on individual modules as they are completed and become executable. It is
confined only to the designer's requirements. It focuses testing on the function or software module
and concentrates on the internal processing logic and data structures. It is simplified when a
module is designed with high cohesion:
 Reduces the number of test cases
 Allows errors to be more easily predicted and uncovered
BLACK BOX TESTING:
It is also known as functional testing: a software testing technique whereby the internal workings
of the item being tested are not known by the tester. For example, in a black box test on a software
design, the tester only knows the inputs and what the expected outcomes should be, not how the
program arrives at those outputs. The tester never examines the programming code and does not
need any further knowledge of the program other than its specifications. In this technique, test
cases are generated as input conditions that fully exercise all functional requirements of the
program. This testing has been used to find errors in the following categories:
 Incorrect or missing functions
 Interface errors
 Errors in data structures or external database access
 Performance errors
 Initialization and termination errors
In this testing only the output is checked for correctness.
WHITE BOX TESTING:
It is also known as glass box, structural, clear box and open box testing: a software testing
technique whereby explicit knowledge of the internal workings of the item being tested is used to
select the test data. Unlike black box testing, white box testing uses specific knowledge of the
programming code to examine outputs. The test is accurate only if the tester knows what the
program is supposed to do. He or she can then check whether the program diverges from its
intended goal. White box testing does not account for errors caused by omission, and all visible
code must also be readable. For a complete software examination, both white box and black box
tests are required.
In this testing, the test cases are generated from the logic of each module by drawing flow charts
of that module, and logical decisions are tested in all cases. It has been used to generate test cases
in the following situations:
 Guarantee that all independent paths have been executed.
 Execute all logical decisions on their true and false sides.
INTEGRATION TESTING:
Integration testing ensures that software and subsystems work together as a whole. It tests the
interfaces of all the modules to make sure that the modules behave properly when integrated
together. It is defined as a systematic technique for constructing the software architecture. While
integration is occurring, tests are conducted to uncover errors associated with interfaces. Its
objective is to take unit tested modules and build a program structure based on the prescribed
design.
Two approaches of integration testing:
 Non-incremental Integration Testing
 Incremental Integration Testing
SYSTEM TESTING:
System testing involves in-house testing of the entire system before delivery to the user. Its aim is
to satisfy the user that the system meets all the requirements of the client's specifications. This
testing evaluates the working of the system from the user's point of view, with the help of the
specification document. It does not require any internal knowledge of the system, such as the
design or structure of the code.
It covers functional and non-functional areas of the application/product. System testing is
considered a superset of all types of testing, as all the major types of testing are covered in it,
although the focus of the testing may vary on the basis of the product, organizational processes,
timeline and requirements. System testing is the beginning of real testing, where you test a product
as a whole and not a module/feature.
ACCEPTANCE TESTING:
Acceptance testing is a testing technique performed to determine whether the software system has
met the requirement specifications. The main purpose of this test is to evaluate the system's
compliance with the business requirements and to check whether it has met the required criteria
for delivery to end users. It is a pre-delivery testing in which the entire system is tested at the
client's site on real data to find errors. The acceptance test cases are executed against the test data
or using an acceptance test script, and then the results are compared with the expected ones.
The acceptance test activities are carried out in phases. First, the basic tests are executed, and if
the test results are satisfactory then the execution of more complex scenarios is carried out.
TEST APPROACH:
A test approach is the test strategy implementation of a project; it defines how testing will be
done. The choice of test approach or test strategy is one of the most powerful factors in the
success of the test effort and the accuracy of the test plans and estimates.
Testing can be done in two ways:
 Bottom up approach
 Top down approach
BOTTOM UP APPROACH:
Testing can be performed starting from the smallest and lowest level modules, proceeding one at a
time. In this approach testing is conducted from the sub modules to the main module; if the main
module is not yet developed, a temporary program called a DRIVER is used to simulate the main
module. When the bottom level modules are tested, attention turns to those on the next level that
use the lower level ones: they are tested individually and then linked with the previously examined
lower level modules.
TOP DOWN APPROACH:
In this approach testing is conducted from the main module to the sub modules. If a sub module
is not yet developed, a temporary program called a STUB is used to simulate the sub module. This
type of testing starts from the upper level modules. Since the detailed activities usually performed
in the lower level routines are not available, stubs are written. A stub is a module shell called by
an upper level module; when reached properly, it returns a message to the calling module
indicating that proper interaction occurred.
VALIDATION:
Validation is the process of evaluating software during the development process, or at the end of
the development process, to determine whether it satisfies the specified business requirements.
Validation testing ensures that the product actually meets the client's needs. It can also be defined
as demonstrating that the product fulfils its intended use when deployed in an appropriate
environment.
The system has been tested and implemented successfully, thereby ensuring that all the
requirements as listed in the software requirements specification are completely fulfilled.
7.2 TEST CASES:
Test cases involve a set of steps, conditions and inputs that can be used while performing testing
tasks. The main intent of this activity is to determine whether a software product passes or fails in
terms of functionality and other aspects. The process of developing test cases can also help find
problems in the requirements or design of an application. A test case acts as the starting point for
test execution, and after applying a set of input values, the application has a definitive outcome
and leaves the system at some end point, also known as the execution post-condition.
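As an illustration of how such test cases could be automated for this project, the sketch below uses Python's unittest module; the clean_words helper is an assumption that loosely mirrors the tweet-cleaning rules from Chapter 6 and is not part of the delivered source.

import unittest

def clean_words(text):
    # Assumed helper that loosely mirrors the tweet-cleaning rules used in Chapter 6
    return [w.lower() for w in text.split()
            if 'http' not in w and not w.startswith('@')
            and not w.startswith('#') and w != 'RT']

class TestTweetCleaning(unittest.TestCase):
    def test_removes_mentions_hashtags_and_links(self):
        tweet = "RT @user Great phone! #review http://example.com"
        self.assertEqual(clean_words(tweet), ['great', 'phone!'])

if __name__ == '__main__':
    unittest.main()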
TABLE TEST CASES:

7.3 SCREENSHOTS:
8. CONCLUSION

In this project, sentiment analysis has been performed on Twitter data. Tweets collected for a
given keyword or hashtag are cleaned of links, mentions, hashtags and stop words, and a Naive
Bayes classifier built with NLTK is trained on the labelled tweets to recognize positive and
negative sentiment. The results of the analysis are presented in separate sections showing positive,
negative and neutral sentiment, along with word clouds of the most frequent sentiment-bearing
terms, which can help provide some insight for business intelligence.

FUTURE ENHANCEMENT

The present system analyses a stored dataset of tweets. In future, the application can be extended
to fetch live tweets so that the results are updated continuously as new data arrives, to handle
neutral tweets during training, and to compare the Naive Bayes classifier with other machine
learning techniques available in the Python ecosystem.
9. REFERENCES

[1] S. Johnson, "Internet changes everything: Revolutionizing public participation and access to
government information through the Internet", Administrative Law Review, Vol. 50, No. 2 (Spring
1998), pp. 277-337.
[2] D. Chrysanthos, "Strategic manipulation of internet opinion forums: Implications for
consumers and firms", Management Science 52.10 (2006): 1577-1593.
[3] M. Wollmer, et al., "YouTube movie reviews: Sentiment analysis in an audio-visual context",
IEEE Intelligent Systems (2013): pp. 46-53.
[4] J. Naughton, "The internet: is it changing the way we think?", The Guardian, Saturday 14 August
2010.
[5] G. Mishne and N. S. Glance, "Predicting movie sales from blogger sentiment", in AAAI 2006
Spring Symposium on Computational Approaches to Analyzing Weblogs, 2006.
[6] L. Barbosa and J. Feng, "Robust sentiment detection on Twitter from biased and noisy data", in
Proceedings of the International Conference on Computational Linguistics (COLING-2010), 2010.
[7] E. Cambria, N. Howard, Y. Xia, and T. S. Chua, "Computational Intelligence for Big Social
Data Analysis", IEEE Computational Intelligence Magazine, 11(3), 8-9, 2016.
[8] E. Cambria, B. Schuller, Y. Xia, and B. White, "New avenues in knowledge bases for natural
language processing", Knowledge-Based Systems, 108(C), 1-4, 2016.
[9] M. Bautin, L. Vijayarenu, and S. Skiena, "International sentiment analysis for news and blogs",
in Proceedings of the International AAAI Conference on Weblogs and Social Media (ICWSM-2008),
2008.
[10] I. Becker and V. Aharonson, "Last but definitely not least: on the role of the last sentence in
automatic polarity-classification", in Proceedings of the ACL 2010 Conference Short Papers, 2010.
