Predicting Hourly Boarding Demand of Bus Passengers


Predicting hourly boarding demand of bus passengers using imbalanced records from smart-cards: A deep learning approach
ABSTRACT

The tap-on smart-card data provides a valuable source to learn passengers' boarding behaviour and predict future travel demand. However, when examining the smart-card records (or instances) by the time of day and by boarding stop, the positive instances (i.e. boarding at a specific bus stop at a specific time) are rare compared to negative instances (not boarding at that bus stop at that time). Imbalanced data has been demonstrated to significantly reduce the accuracy of machine learning models deployed for predicting hourly boarding numbers at a particular location. This paper addresses this data imbalance issue in smart-card data before applying it to predict bus boarding demand. We propose deep generative adversarial nets (Deep-GAN) to generate dummy travelling instances and build a synthetic training dataset with more balanced travelling and non-travelling instances. The synthetic dataset is then used to train a deep neural network (DNN) that predicts the travelling and non-travelling instances at a particular stop in a given time window. The results show that addressing the data imbalance issue can significantly improve the predictive model's performance and better fit the actual profile of ridership. Comparing the performance of the Deep-GAN with that of other traditional resampling methods shows that the proposed method can produce a synthetic training dataset with higher similarity and diversity and, thus, stronger prediction power. The paper highlights the significance of addressing data imbalance and provides practical guidance for improving data quality and model performance in travel behaviour prediction and individual travel behaviour analysis.
INTRODUCTION

THE rapid progress of urbanization leads to population expansion in urban areas, increased demand for travel, and associated adverse effects such as traffic congestion and air pollution [1]–[3]. Public transport has been widely recognized as a green and sustainable mode of transportation to relieve such transport problems. As a conventional public transport mode, buses have always played a dominant role in passenger transportation [4], [5]. However, unreliable travel times, bus bunching and crowding have led to low levels of service for buses [6]–[8]. This has decreased bus ridership in many cities, particularly with the advent of ride-hailing services in recent years [9]–[11]. To sustain and increase bus patronage, bus operators must find ways to improve their performance and enhance their image and attractiveness. Advanced operation and management of bus systems can significantly improve the level of service and service reliability, which in turn helps increase bus ridership [12]–[14]. This requires understanding the spatial and temporal variations in passenger demand and making necessary changes on the supply side [15]–[18]. The smart-card system was initially designed for automatic fare collection. As the system also records boarding information (for example, who gets on which bus, where and when), smart-card data has become a ready-made and valuable data source for spatio-temporal demand analysis [19], public transport planning [20]–[23], and further analysis of emission reduction for sustainable transport [24], [25]. From the smart-card data, we can easily observe the passenger flow at bus stops and on bus lines, and from it derive the spatial and temporal characteristics of bus trips [26], [27].
However, automatically extracting useful information from big data still poses a significant challenge. In recent years, machine learning techniques have emerged as an efficient and effective approach to analyzing large smart-card datasets. For instance, Liu et al. [28] captured key features in public transport passenger flow prediction via a decision tree model. Zuo et al. [29] built a three-stage framework with a neural network model to forecast individual accessibility in bus systems. In our own recent research [30], we demonstrated that smart-card data combined with machine learning techniques can be a powerful approach for predicting the spatial and temporal patterns of bus boarding. The predictions were found to be highly accurate at an aggregated level, averaged over all travelers. However, our research has also thrown light on data imbalance issues when trying to predict travel behavior at the level of individual travelers and fine spatio-temporal detail. For instance, the boarding of an individual smart-card holder at a specific stop during a particular time window (e.g. an hour) is a rare event: most of the records denote negative instances (non-travelling, i.e. not boarding at this bus stop during this time window), and only a few are positive instances (travelling, i.e. boarding at this stop at this time). Such data imbalance can significantly reduce the efficiency and accuracy of machine learning models deployed for predicting travel behavior at this disaggregate level. This motivates the current study, in which we propose an over-sampling method, the deep generative adversarial nets (Deep-GAN) model (initially developed in the context of image generation), to address the data imbalance issue in predicting disaggregate boarding demand (i.e. individual passengers' boarding behavior during each hour of the day). We show that, with the synthesized and more balanced database, the prediction accuracy improves significantly. The performance of the proposed approach, based on the Deep-GAN method, is further benchmarked against other resampling methods (including the Synthetic Minority Oversampling Technique and Random Under-Sampling) and is shown to be superior.
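
To make the imbalance concrete, the following is a small illustrative sketch (the column names and grid sizes are hypothetical, not the paper's data schema): expanding one card holder's two daily taps into a full stop-by-hour label grid yields only 2 positive instances against 2,398 negatives.

import pandas as pd

# Two observed boardings for one card holder (hypothetical schema).
taps = pd.DataFrame({"stop_id": [12, 12], "hour": [8, 17], "boarded": [1, 1]})

# Full grid of candidate instances: 100 stops x 24 hours = 2,400 rows.
grid = pd.MultiIndex.from_product(
    [range(100), range(24)], names=["stop_id", "hour"]
).to_frame(index=False)

# Label each stop-hour instance: 1 if a tap occurred there and then, else 0.
labels = grid.merge(taps, how="left", on=["stop_id", "hour"])["boarded"].fillna(0)
print(labels.value_counts())  # 2,398 negatives vs. only 2 positives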
The rest of the paper is organized as follows. Section II reviews the key resampling methods and their applications in transport studies. Section III describes the specific data imbalance issue in predicting hourly boarding demand. Section IV uses a Deep-GAN to provide a synthesized, more balanced training data sample and a deep neural network (DNN) to predict the individual smart-card holders' boarding actions (boarding or not boarding) in any hour of a day. Section V applies the proposed method to a real-world case study, and the results are discussed in Section VI. Finally, Section VII summarizes the main findings and contributions of this paper and suggests future investigations.
EXISTING SYSTEM

Smart-card data has emerged in recent years and provides a comprehensive and cheap source of information for planning and managing public transport systems. This paper presents a multi-stage machine learning framework to predict passengers' boarding stops using smart-card data.

The framework addresses the challenges arising from the imbalanced nature of the data (e.g. many non-travelling records) and the 'many-class' issue (e.g. many possible boarding stops) by decomposing the prediction of hourly ridership into three stages: whether to travel or not in that one-hour time slot, which bus line to use, and at which stop to board. A simple neural network architecture, fully connected networks (FCN), and two deep learning architectures, recurrent neural networks (RNN) and long short-term memory networks (LSTM), are implemented. The proposed approach is applied to a real-life bus network.

We show that the data imbalance has a profound impact on the accuracy of prediction at the individual level. At the aggregated level, although FCN is able to accurately predict the ridership at individual stops, it is poor at capturing the temporal distribution of ridership. RNN and LSTM are able to capture the temporal distribution but lack the ability to capture the spatial distribution across bus lines.
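
As an illustration of that decomposition, the following is a hedged sketch of the three-stage cascade on synthetic data; the logistic-regression classifiers and feature layout are illustrative stand-ins, not the FCN/RNN/LSTM architectures used in the study:

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                  # hypothetical hourly feature vectors
travel = (rng.random(1000) < 0.05).astype(int)  # stage 1 label: rare positive class
line = rng.integers(0, 3, size=1000)            # stage 2 label: bus line
stop = rng.integers(0, 10, size=1000)           # stage 3 label: boarding stop

stage1 = LogisticRegression(max_iter=1000).fit(X, travel)
stage2 = LogisticRegression(max_iter=1000).fit(X[travel == 1], line[travel == 1])
stage3 = LogisticRegression(max_iter=1000).fit(X[travel == 1], stop[travel == 1])

def predict_boarding(x):
    # Only if stage 1 predicts "travel" are the line and stop predicted.
    if stage1.predict(x.reshape(1, -1))[0] == 0:
        return None                             # non-travelling in this hour
    return (stage2.predict(x.reshape(1, -1))[0],
            stage3.predict(x.reshape(1, -1))[0])

print(predict_boarding(X[0]))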

Disadvantages
• The data generated by SMOTE and ADASYN are susceptible to outliers. These methods may generate samples in the majority data space because of minority outlier instances (usually noisy data), blurring the classification borderline and making learning more difficult for the classification model (see the resampling sketch after this list).
• Under-sampling methods usually pay the price of losing part of the information in the majority class, because they must remove a portion of the data. Although Easy Ensemble and Balance Cascade try to address this information loss, they multiply the number of models tens of times, significantly increasing the computational burden.
• Few studies have examined the loss caused by the data imbalance issue in public transport systems, and no research has validated the efficiency of the existing resampling methods on imbalanced data in the boarding prediction task.
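
As a sketch of the two resampling families discussed above, the following uses the imbalanced-learn package (a library choice of ours for illustration; the toy class weights are arbitrary) to apply SMOTE over-sampling and random under-sampling:

from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

# Toy data mimicking rare "boarding" (1) vs. common "non-travelling" (0) records.
X, y = make_classification(n_samples=5000, weights=[0.97, 0.03], random_state=0)
print(Counter(y))  # roughly 97% negatives, 3% positives

X_over, y_over = SMOTE(random_state=0).fit_resample(X, y)                 # adds synthetic positives
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)  # drops negatives
print(Counter(y_over), Counter(y_under))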

Proposed System

• The data imbalance issue in the public transport system has received little attention; this study is the first to focus on this issue and propose a deep learning approach, Deep-GAN, to solve it.
• This study compares the similarity and diversity of the real travelling instances and the synthetic ones generated by Deep-GAN and other over-sampling methods. It also compares how different resampling methods improve data quality, by evaluating the performance of the subsequent travel behaviour prediction model. This is the first validation and evaluation of the performance of different data resampling methods on real data in the public transport system.
• This paper innovatively models individual boarding behaviour, which is uncommon in other travel demand prediction tasks. Compared with the popular aggregated prediction, this individual-based model can provide more detail on passengers' behaviour, and the results will benefit the analysis of similarities and heterogeneities across passengers.
Advantages

 The system proposes an over-sampling method, the deep generative adversarial nets (Deep-GAN) model (initially developed in the context of image generation), to address the data imbalance issue in predicting disaggregate boarding demand (i.e. individual passengers' boarding behaviour during each hour of the day). A minimal Deep-GAN sketch is given after this list.
 The system shows that, with the synthesized and more balanced database, the prediction accuracy improves significantly. The performance of the proposed approach, based on the Deep-GAN method, is further benchmarked against other resampling methods (including the Synthetic Minority Oversampling Technique and Random Under-Sampling) and is shown to be superior.
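
A minimal sketch of the adversarial training loop behind such a Deep-GAN, written in PyTorch (an assumed framework choice; the feature width, layer sizes and learning rates below are illustrative, not the paper's configuration). The generator learns to map random noise to synthetic minority-class (travelling) instances, while the discriminator learns to tell real instances from generated ones:

import torch
import torch.nn as nn

FEATURE_DIM = 16  # hypothetical width of one encoded travelling instance
NOISE_DIM = 8

generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 32), nn.ReLU(),
    nn.Linear(32, FEATURE_DIM), nn.Sigmoid(),  # features scaled to [0, 1]
)
discriminator = nn.Sequential(
    nn.Linear(FEATURE_DIM, 32), nn.LeakyReLU(0.2),
    nn.Linear(32, 1), nn.Sigmoid(),  # probability that the input is real
)

bce = nn.BCELoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

def train_step(real_batch):
    # real_batch: float tensor of shape (n, FEATURE_DIM) holding real travelling instances.
    n = real_batch.size(0)
    # 1) Update the discriminator on real vs. generated samples.
    fake_batch = generator(torch.randn(n, NOISE_DIM)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(n, 1)) + \
             bce(discriminator(fake_batch), torch.zeros(n, 1))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()
    # 2) Update the generator to fool the discriminator.
    fake_batch = generator(torch.randn(n, NOISE_DIM))
    g_loss = bce(discriminator(fake_batch), torch.ones(n, 1))
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    return d_loss.item(), g_loss.item()

After training, generator(torch.randn(k, NOISE_DIM)) yields k synthetic travelling instances that can be added to the training set before fitting the DNN classifier.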

SYSTEM REQUIREMENTS

➢ H/W System Configuration:-

➢ Processor - Pentium IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Key Board - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA

SOFTWARE REQUIREMENTS:

 Operating system : Windows 7 Ultimate
 Coding Language : Python
 Front-End : Python
 Back-End : Django-ORM
 Designing : HTML, CSS, JavaScript
 Data Base : MySQL (WAMP Server)

Architecture Diagram

[Figure: system architecture. The Service Provider logs in to the Web Server and can Browse and Train & Test Data Sets, View Trained and Tested Accuracy in Bar Chart, View Trained and Tested Accuracy Results, View Prediction Of Hourly Boarding Demand Type, View Hourly Boarding Demand Type Ratio, Download Trained Data Sets, and View All Remote Users. The Web Server processes all user queries and handles storage and retrieval against the Database. The Remote User registers and logs in, predicts the Hourly Boarding Demand Type, and views their profile.]
Modules
Service Provider

In this module, the Service Provider has to login using a valid user name and password. After successful login, he can perform operations such as Browse and Train & Test Data Sets, View Trained and Tested Accuracy in Bar Chart, View Trained and Tested Accuracy Results, View Prediction Of Hourly Boarding Demand Type, View Hourly Boarding Demand Type Ratio, Download Trained Data Sets, View Hourly Boarding Demand Type Ratio Results, and View All Remote Users.

View and Authorize Users

In this module, the admin can view the list of all registered users. The admin can view the users' details, such as user name, email and address, and authorize the users.

Remote User
In this module, any number of remote users may be present. A user should register before performing any operations. Once a user registers, their details are stored in the database. After successful registration, the user has to login using the authorized user name and password. Once login is successful, the user can perform operations such as Register and Login, Predicting Hourly Boarding Demand Type, and View Your Profile.
Decision tree classifiers
Decision tree classifiers are used successfully in many diverse areas. Their most important feature is the capability of capturing descriptive decision-making knowledge from the supplied data. A decision tree can be generated from training sets. The procedure for such generation, based on the set of objects (S), each belonging to one of the classes C1, C2, …, Ck, is as follows:

Step 1. If all the objects in S belong to the same class, for example Ci, the decision tree for S consists of a leaf labelled with this class.
Step 2. Otherwise, let T be some test with possible outcomes O1, O2, …, On. Each object in S has one outcome for T, so the test partitions S into subsets S1, S2, …, Sn, where each object in Si has outcome Oi for T. T becomes the root of the decision tree, and for each outcome Oi we build a subsidiary decision tree by invoking the same procedure recursively on the set Si.
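
A brief illustration of this recursive partitioning using scikit-learn (the dataset and tree depth are arbitrary choices for the example):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Each internal node of the fitted tree is a test T partitioning the objects.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))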

Gradient boosting
Gradient boosting is a machine learning technique used in regression and classification tasks, among others. It gives a prediction model in the form of an ensemble of weak prediction models, which are typically decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it usually outperforms random forest. A gradient-boosted trees model is built in a stage-wise fashion, as in other boosting methods, but it generalizes the other methods by allowing optimization of an arbitrary differentiable loss function.
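
A compact scikit-learn illustration of stage-wise boosting with shallow trees as weak learners (parameter values are illustrative):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# 100 boosting stages, each fitting a depth-3 tree to the current residuals.
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                 max_depth=3).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))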

K-Nearest Neighbors (KNN)

 Simple, but a very powerful classification algorithm
 Classifies based on a similarity measure
 Non-parametric
 Lazy learning: it does not "learn" until a test example is given
 Whenever we have a new data point to classify, we find its K nearest neighbors in the training data

Example

 Classification of a query point is based on the k closest examples in the feature space
 The feature space is the space spanned by the input (categorization) variables
 Learning is instance-based and therefore lazy: computation is deferred until a test or prediction instance arrives, at which point the instances closest to the input vector are searched for in the training dataset
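
A minimal illustration of this lazy, similarity-based classification (k = 5 and the dataset are arbitrary choices):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# "Training" only stores the data; distances are computed at query time.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))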

Logistic regression Classifiers

Logistic regression analysis studies the association between a categorical dependent variable and a set of independent (explanatory) variables. The name logistic regression is used when the dependent variable has only two values, such as 0 and 1 or Yes and No. The name multinomial logistic regression is usually reserved for the case when the dependent variable has three or more unique values, such as Married, Single, Divorced, or Widowed. Although the type of data used for the dependent variable is different from that of multiple regression, the practical use of the procedure is similar.

Logistic regression competes with discriminant analysis as a method for analyzing categorical-response variables. Many statisticians feel that logistic regression is more versatile and better suited for modeling most situations than discriminant analysis, because logistic regression does not assume that the independent variables are normally distributed, as discriminant analysis does.

This program computes binary logistic regression and multinomial logistic regression on both numeric and categorical independent variables. It reports on the regression equation as well as the goodness of fit, odds ratios, confidence limits, likelihood, and deviance. It performs a comprehensive residual analysis including diagnostic residual reports and plots. It can perform an independent-variable subset selection search, looking for the best regression model with the fewest independent variables. It provides confidence intervals on predicted values and ROC curves to help determine the best cutoff point for classification. It allows you to validate your results by automatically classifying rows that are not used during the analysis.
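
A short sketch of binary logistic regression with predicted probabilities, from which a classification cutoff can be chosen (scikit-learn and the synthetic data are our illustrative choices):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)  # y takes values 0/1
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]  # estimated P(y = 1 | x)
print("predictions at a 0.5 cutoff:", (proba >= 0.5).astype(int)[:10])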

Naïve Bayes

The naive Bayes approach is a supervised learning method which is based on a simplistic hypothesis: it assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature.
Yet, despite this, it proves robust and efficient, and its performance is comparable to other supervised learning techniques. Various explanations have been advanced in the literature. In this tutorial, we highlight an explanation based on representation bias: the naive Bayes classifier is a linear classifier, like linear discriminant analysis, logistic regression or the linear SVM (support vector machine). The difference lies in the method of estimating the parameters of the classifier (the learning bias).

While the naive Bayes classifier is widely used in the research world, it is not widespread among practitioners who want to obtain usable results. On the one hand, researchers have found that it is very easy to program and implement, its parameters are easy to estimate, learning is very fast even on very large databases, and its accuracy is reasonably good in comparison with other approaches. On the other hand, the final users do not obtain a model that is easy to interpret and deploy, and they do not see the interest of such a technique.

Thus, we introduce a new presentation of the results of the learning process, which makes the classifier easier to understand and its deployment easier. In the first part of this tutorial, we present some theoretical aspects of the naive Bayes classifier. Then, we implement the approach on a dataset with Tanagra. We compare the obtained results (the parameters of the model) to those obtained with other linear approaches such as logistic regression, linear discriminant analysis and the linear SVM. We note that the results are highly consistent, which largely explains the good performance of the method in comparison with others. In the second part, we use various tools on the same dataset (Weka 3.6.0, R 2.9.2, Knime 2.1.1, Orange 2.0b and RapidMiner 4.6.0). We try above all to understand the obtained results.
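
A minimal Gaussian naive Bayes example in scikit-learn (the Gaussian variant and dataset are illustrative choices; Tanagra itself is not scripted here):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# Features are assumed conditionally independent given the class.
clf = GaussianNB().fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))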

Random Forest
Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time. For classification tasks, the output of the random forest is the class selected by the most trees. For regression tasks, the mean or average prediction of the individual trees is returned. Random decision forests correct for decision trees' habit of overfitting to their training set. Random forests generally outperform decision trees, but their accuracy is lower than that of gradient-boosted trees. However, data characteristics can affect their performance.
The first algorithm for random decision forests was created in 1995 by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg.
An extension of the algorithm was developed by Leo Breiman and Adele Cutler, who registered "Random Forests" as a trademark in 2006 (as of 2019, owned by Minitab, Inc.). The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho and later independently by Amit and Geman, in order to construct a collection of decision trees with controlled variance.
Random forests are frequently used as "black-box" models in businesses, as they generate reasonable predictions across a wide range of data while requiring little configuration.
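
An illustrative random forest, with the bootstrap sampling and random feature selection handled internally by the library (the tree count is arbitrary):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# Each of the 200 trees sees a bootstrap sample and random feature subsets.
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))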

SVM

In classification tasks, a discriminant machine learning technique aims at finding, based on an independent and identically distributed (iid) training dataset, a discriminant function that can correctly predict labels for newly acquired instances. Unlike generative machine learning approaches, which require computations of conditional probability distributions, a discriminant classification function takes a data point x and assigns it to one of the different classes that are a part of the classification task. Less powerful than generative approaches, which are mostly used when prediction involves outlier detection, discriminant approaches require fewer computational resources and less training data, especially for a multidimensional feature space and when only posterior probabilities are needed. From a geometric perspective, learning a classifier is equivalent to finding the equation for a multidimensional surface that best separates the different classes in the feature space.

SVM is a discriminant technique and, because it solves the convex optimization problem analytically, it always returns the same optimal hyperplane parameters, in contrast to genetic algorithms (GAs) or perceptrons, both of which are widely used for classification in machine learning. For perceptrons, solutions are highly dependent on the initialization and termination criteria. For a specific kernel that transforms the data from the input space to the feature space, training returns uniquely defined SVM model parameters for a given training set, whereas the perceptron and GA classifier models are different each time training is initialized. The aim of GAs and perceptrons is only to minimize error during training, which can translate into several hyperplanes meeting this requirement.
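
A short example of fitting an SVM with an RBF kernel; for a fixed kernel and training set the learned hyperplane is deterministic, as discussed above (the kernel and C value are illustrative):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
# The RBF kernel implicitly maps inputs to a feature space where the classes separate.
clf = SVC(kernel="rbf", C=1.0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))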
 Flow Chart: Remote User

[Flow chart: Start → Register and Login → enter username and password → if the login status is valid, the user can access Predicting Hourly Boarding Demand Type and View Your Profile, then Logout; if the username or password is wrong, control returns to the login step.]
 Flow Chart: Service Provider

[Flow chart: Start → Login with username and password → if the login status is valid, the Service Provider can Browse and Train & Test Data Sets, View Trained and Tested Accuracy in Bar Chart, View Trained and Tested Accuracy Results, View Prediction Of Hourly Boarding Demand Type, View Hourly Boarding Demand Type Ratio, Download Trained Data Sets, View Hourly Boarding Demand Type Ratio Results, and View All Remote Users, then Log Out; if the username or password is wrong, control returns to the login step.]

1. Class Diagram:

[Class diagram with the following classes:

Service Provider
Methods: Login(), Browse and Train & Test Data Sets, View Trained and Tested Accuracy in Bar Chart, View Trained and Tested Accuracy Results, View Prediction Of Hourly Boarding Demand Type, View Hourly Boarding Demand Type Ratio, Download Trained Data Sets, View Hourly Boarding Demand Type Ratio Results, View All Remote Users
Members: Fid, Trip ID, Route ID, Stop ID, Stop Name, Week Beginning, Number Of Members Boardings, Prediction, User Name, Password

Login
Methods: Login(), Reset()
Members: User Name, Password

Register
Methods: Register(), Reset()
Members: User Name, Password, E-mail, Mobile, Address, DOB, Gender, Pin code, Image

Remote User
Methods: Register and Login, Predicting Hourly Boarding Demand Type, View Your Profile
Members: Fid, Trip ID, Route ID, Stop ID, Stop Name, Week Beginning, Number Of Members Boardings, Prediction]
 Use Case Diagram

[Use case diagram. Service Provider use cases: Browse and Train & Test Data Sets; View Trained and Tested Accuracy in Bar Chart; View Trained and Tested Accuracy Results; View Prediction Of Hourly Boarding Demand Type; View Hourly Boarding Demand Type Ratio; Download Trained Data Sets; View Hourly Boarding Demand Type Ratio Results; View All Remote Users. Remote User use cases: Register and Login; Predicting Hourly Boarding Demand Type; View Your Profile.]
1.1 PYTHON

Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is designed to be highly readable. It uses English keywords frequently, whereas other languages use punctuation, and it has fewer syntactical constructions than other languages.

 Python is Interpreted: Python is processed at runtime by the interpreter. You do not need to compile your program before executing it. This is similar to PERL and PHP.
 Python is Interactive: You can actually sit at a Python prompt and interact with the interpreter directly to write your programs.
 Python is Object-Oriented: Python supports the Object-Oriented style or technique of programming that encapsulates code within objects.
 Python is a Beginner's Language: Python is a great language for beginner-level programmers and supports the development of a wide range of applications, from simple text processing to WWW browsers to games.

1.2 History of Python

Python was developed by Guido van Rossum in the late eighties and early nineties at the National Research Institute for Mathematics and Computer Science in the Netherlands.

Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68, SmallTalk, and Unix shell and other scripting languages.

Python is copyrighted. Like Perl, Python source code is available under an open-source licence (the Python Software Foundation License, which is GPL-compatible).

Python is now maintained by a core development team at the institute, although Guido van Rossum still holds a vital role in directing its progress.

1.3 Python Features

Python's features include:

 Easy-to-learn: Python has few keywords, a simple structure, and a clearly defined syntax. This allows the student to pick up the language quickly.
 Easy-to-read: Python code is more clearly defined and visible to the eyes.
 Easy-to-maintain: Python's source code is fairly easy to maintain.
 A broad standard library: The bulk of Python's library is very portable and cross-platform compatible on UNIX, Windows, and Macintosh.
 Interactive Mode: Python has support for an interactive mode which allows interactive testing and debugging of snippets of code.
 Portable: Python can run on a wide variety of hardware platforms and has the same interface on all platforms.
 Extendable: You can add low-level modules to the Python interpreter. These modules enable programmers to add to or customize their tools to be more efficient.
 Databases: Python provides interfaces to all major commercial databases.
 GUI Programming: Python supports GUI applications that can be created and ported to many system calls, libraries and windowing systems, such as Windows MFC, Macintosh, and the X Window system of Unix.
 Scalable: Python provides a better structure and support for large programs than shell scripting.

Python also has a big list of good features:

 It supports functional and structured programming methods as well as OOP.
 It can be used as a scripting language or can be compiled to byte-code for building large applications.
 It provides very high-level dynamic data types and supports dynamic type checking.
 It supports automatic garbage collection.
 It can be easily integrated with C, C++, COM, ActiveX, CORBA, and Java.
2.1 ARITHMETIC OPERATORS

(Assume a = 10 and b = 20.)

+ (Addition): Adds values on either side of the operator. Example: a + b = 30

- (Subtraction): Subtracts the right-hand operand from the left-hand operand. Example: a - b = -10

* (Multiplication): Multiplies values on either side of the operator. Example: a * b = 200

/ (Division): Divides the left-hand operand by the right-hand operand. Example: b / a = 2

% (Modulus): Divides the left-hand operand by the right-hand operand and returns the remainder. Example: b % a = 0

** (Exponent): Performs exponential (power) calculation on the operands. Example: a ** b = 10 to the power 20

// (Floor Division): Division where the result is the quotient with the digits after the decimal point removed. If one of the operands is negative, the result is floored, i.e. rounded away from zero (towards negative infinity). Examples: 9 // 2 = 4, 9.0 // 2.0 = 4.0, -11 // 3 = -4, -11.0 // 3 = -4.0
2.2 ASSIGNMENT OPERATORS

= (Assign): Assigns the value of the right-side operand to the left-side operand. Example: c = a + b assigns the value of a + b to c

+= (Add AND): Adds the right operand to the left operand and assigns the result to the left operand. Example: c += a is equivalent to c = c + a

-= (Subtract AND): Subtracts the right operand from the left operand and assigns the result to the left operand. Example: c -= a is equivalent to c = c - a

*= (Multiply AND): Multiplies the right operand with the left operand and assigns the result to the left operand. Example: c *= a is equivalent to c = c * a

/= (Divide AND): Divides the left operand by the right operand and assigns the result to the left operand. Example: c /= a is equivalent to c = c / a

%= (Modulus AND): Takes the modulus of the two operands and assigns the result to the left operand. Example: c %= a is equivalent to c = c % a

**= (Exponent AND): Performs exponential (power) calculation on the operands and assigns the value to the left operand. Example: c **= a is equivalent to c = c ** a

//= (Floor Division AND): Performs floor division on the operands and assigns the value to the left operand. Example: c //= a is equivalent to c = c // a

2.3 IDENTITY OPERATORS

is: Evaluates to true if the variables on either side of the operator point to the same object, and false otherwise. Example: x is y results in 1 if id(x) equals id(y).

is not: Evaluates to false if the variables on either side of the operator point to the same object, and true otherwise. Example: x is not y results in 1 if id(x) is not equal to id(y).

2.4 BITWISE OPERATORS

(Assume a = 60, i.e. 0011 1100 in binary, and b = 13, i.e. 0000 1101; these values are consistent with the examples below.)

& (Binary AND): Copies a bit to the result if it exists in both operands. Example: a & b = 12 (0000 1100)

| (Binary OR): Copies a bit if it exists in either operand. Example: a | b = 61 (0011 1101)

^ (Binary XOR): Copies the bit if it is set in one operand but not both. Example: a ^ b = 49 (0011 0001)

~ (Binary Ones Complement): Unary; has the effect of 'flipping' bits. Example: ~a = -61 (1100 0011 in 2's complement form as a signed binary number)

<< (Binary Left Shift): The left operand's value is moved left by the number of bits specified by the right operand. Example: a << 2 = 240 (1111 0000)

>> (Binary Right Shift): The left operand's value is moved right by the number of bits specified by the right operand. Example: a >> 2 = 15 (0000 1111)

2.5 LOGICAL OPERATORS

and (Logical AND): If both operands are true, the condition becomes true. Example: (a and b) is true.

or (Logical OR): If either of the two operands is non-zero, the condition becomes true. Example: (a or b) is true.

not (Logical NOT): Reverses the logical state of its operand. Example: not(a and b) is false.

2.6 MEMBERSHIP OPERATORS

in: Evaluates to true if it finds a variable in the specified sequence, and false otherwise. Example: x in y results in 1 if x is a member of sequence y.

not in: Evaluates to true if it does not find a variable in the specified sequence, and false otherwise. Example: x not in y results in 1 if x is not a member of sequence y.

Python Operators Precedence (from highest to lowest)

**: Exponentiation (raise to the power)

~ + -: Complement, unary plus and minus (method names for the last two are +@ and -@)

* / % //: Multiply, divide, modulo and floor division

+ -: Addition and subtraction

>> <<: Right and left bitwise shift

&: Bitwise 'AND'

^ |: Bitwise exclusive 'OR' and regular 'OR'

<= < > >=: Comparison operators

<> == !=: Equality operators

= %= /= //= -= += *= **=: Assignment operators

is, is not: Identity operators

in, not in: Membership operators

not, or, and: Logical operators

3.1 LIST

The list is the most versatile data type available in Python, and can be written as a list of comma-separated values (items) between square brackets. An important thing about a list is that the items in a list need not be of the same type.

Creating a list is as simple as putting different comma-separated values between square brackets. For example −

list1 = ['physics', 'chemistry', 1997, 2000];
list2 = [1, 2, 3, 4, 5];
list3 = ["a", "b", "c", "d"]

Basic List Operations

Lists respond to the + and * operators much like strings; they mean concatenation and repetition here too, except that the result is a new list, not a string.

len([1, 2, 3]) → 3 (Length)
[1, 2, 3] + [4, 5, 6] → [1, 2, 3, 4, 5, 6] (Concatenation)
['Hi!'] * 4 → ['Hi!', 'Hi!', 'Hi!', 'Hi!'] (Repetition)
3 in [1, 2, 3] → True (Membership)
for x in [1, 2, 3]: print x, → 1 2 3 (Iteration)


Built-in List Functions & Methods:
Python includes the following list functions −

SN Function with Description

1 cmp(list1, list2)

Compares elements of both lists.

2 len(list)

Gives the total length of the list.

3 max(list)

Returns item from the list with max value.

4 min(list)

Returns item from the list with min value.

5 list(seq)

Converts a tuple into list.

Python includes the following list methods

SN Methods with Description

1 list.append(obj)

Appends object obj to list


2 list.count(obj)

Returns count of how many times obj occurs in list

3 list.extend(seq)

Appends the contents of seq to list

4 list.index(obj)

Returns the lowest index in list that obj appears

5 list.insert(index, obj)

Inserts object obj into list at offset index

6 list.pop(obj=list[-1])

Removes and returns last object or obj from list

7 list.remove(obj)

Removes object obj from list

8 list.reverse()

Reverses objects of list in place

9 list.sort([func])

Sorts objects of list, use compare function if given

3.2 TUPLES
A tuple is a sequence of immutable Python objects. Tuples are sequences, just like lists. The differences between tuples and lists are that tuples cannot be changed, unlike lists, and that tuples use parentheses, whereas lists use square brackets.

Creating a tuple is as simple as putting different comma-separated values. Optionally, we can put these comma-separated values between parentheses. For example −

tup1 = ('physics', 'chemistry', 1997, 2000);
tup2 = (1, 2, 3, 4, 5);
tup3 = "a", "b", "c", "d";

The empty tuple is written as two parentheses containing nothing −

tup1 = ();

To write a tuple containing a single value you have to include a comma, even though there is only one value −

tup1 = (50,);

Like string indices, tuple indices start at 0, and tuples can be sliced, concatenated, and so on.

 Accessing Values in Tuples:

To access values in a tuple, use the square brackets for slicing along with the index or indices to obtain the value available at that index. For example −

tup1 = ('physics', 'chemistry', 1997, 2000);


tup2 = (1, 2, 3, 4, 5, 6, 7 );
print "tup1[0]: ", tup1[0]
print "tup2[1:5]: ", tup2[1:5]

When the code is executed, it produces the following result −

tup1[0]: physics
tup2[1:5]: [2, 3, 4, 5]

Updating Tuples:
Tuples are immutable, which means you cannot update or change the values of tuple elements. We are able to take portions of existing tuples to create new tuples, as the following example demonstrates −

tup1 = (12, 34.56);


tup2 = ('abc', 'xyz');
tup3 = tup1 + tup2;
print tup3
When the above code is executed, it produces the following result −

(12, 34.56, 'abc', 'xyz')

Delete Tuple Elements

Removing individual tuple elements is not possible. There is, of course, nothing wrong with putting together another tuple with the undesired elements discarded.

To explicitly remove an entire tuple, just use the del statement. For example:

tup = ('physics', 'chemistry', 1997, 2000);

print tup
del tup;
print "After deleting tup : "
print tup

Note that the final print statement raises a NameError, because tup no longer exists after the del statement.

Basic Tuples Operations:

len((1, 2, 3)) → 3 (Length)
(1, 2, 3) + (4, 5, 6) → (1, 2, 3, 4, 5, 6) (Concatenation)
('Hi!',) * 4 → ('Hi!', 'Hi!', 'Hi!', 'Hi!') (Repetition)
3 in (1, 2, 3) → True (Membership)
for x in (1, 2, 3): print x, → 1 2 3 (Iteration)

Built-in Tuple Functions

SN Function with Description

1
cmp(tuple1, tuple2):Compares elements of both tuples.

2
len(tuple):Gives the total length of the tuple.

3
max(tuple):Returns item from the tuple with max value.

4
min(tuple):Returns item from the tuple with min value.

5
tuple(seq):Converts a list into tuple.

3.3 DICTIONARY
In a dictionary, each key is separated from its value by a colon (:), the items are separated by commas, and the whole thing is enclosed in curly braces. An empty dictionary without any items is written with just two curly braces, like this: {}.

Keys are unique within a dictionary while values may not be. The values of a dictionary can be of
any type, but the keys must be of an immutable data type such as strings, numbers, or tuples.

Accessing Values in Dictionary:


To access dictionary elements, you can use the familiar square brackets along with the key to
obtain its value. Following is a simple example −

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

print "dict['Name']: ", dict['Name']


print "dict['Age']: ", dict['Age']
Result –

dict['Name']: Zara
dict['Age']: 7

Updating Dictionary
We can update a dictionary by adding a new entry or a key-value pair, modifying an existing
entry, or deleting an existing entry as shown below in the simple example −

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

dict['Age'] = 8; # update existing entry


dict['School'] = "DPS School"; # Add new entry
print "dict['Age']: ", dict['Age']
print "dict['School']: ", dict['School']

Result −

dict['Age']: 8
dict['School']: DPS School

Delete Dictionary Elements

We can either remove individual dictionary elements or clear the entire contents of a dictionary. You can also delete the entire dictionary in a single operation.

To explicitly remove an entire dictionary, just use the del statement. Following is a simple example −

dict = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

del dict['Name']; # remove entry with key 'Name'
dict.clear(); # remove all entries in dict
del dict ; # delete entire dictionary
print "dict['Age']: ", dict['Age']
print "dict['School']: ", dict['School']

Note that the final print statements raise an exception, because the dictionary no longer exists after del dict.

Built-in Dictionary Functions & Methods –


Python includes the following dictionary functions −

SN Function with Description

1 cmp(dict1, dict2)

Compares elements of both dict.

2 len(dict)

Gives the total length of the dictionary. This would be equal to the number of items in
the dictionary.

3 str(dict)

Produces a printable string representation of a dictionary

4 type(variable)

Returns the type of the passed variable. If the passed variable is a dictionary, then it would return a dictionary type.

Python includes following dictionary methods −


SN Methods with Description

1 dict.clear():Removes all elements of dictionary dict

2 dict.copy():Returns a shallow copy of dictionary dict

3 dict.fromkeys():Create a new dictionary with keys from seq and values set to value.

4 dict.get(key, default=None):For key key, returns value or default if key not in


dictionary

5 dict.has_key(key):Returns true if key in dictionary dict, false otherwise

6 dict.items():Returns a list of dict's (key, value) tuple pairs

7 dict.keys():Returns list of dictionary dict's keys

8 dict.setdefault(key, default=None):Similar to get(), but will set dict[key]=default


if key is not already in dict

9 dict.update(dict2):Adds dictionary dict2's key-values pairs to dict

10 dict.values():Returns list of dictionary dict's values

A function is a block of organized, reusable code that is used to perform a single, related action. Functions provide better modularity for your application and a high degree of code reuse. Python gives you many built-in functions, like print(), but you can also create your own functions. These functions are called user-defined functions.

Defining a Function
Simple rules to define a function in Python.
 Function blocks begin with the keyword def followed by the function name and parentheses
( ( ) ).
 Any input parameters or arguments should be placed within these parentheses. You can
also define parameters inside these parentheses.
 The first statement of a function can be an optional statement - the documentation string of
the function or docstring.
 The code block within every function starts with a colon (:) and is indented.
 The statement return [expression] exits a function, optionally passing back an expression to
the caller. A return statement with no arguments is the same as return None.

def functionname( parameters ):


"function_docstring"
function_suite
return [expression]

Calling a Function
Defining a function only gives it a name, specifies the parameters that are to be included in the function, and structures the blocks of code. Once the basic structure of a function is finalized, you can execute it by calling it from another function or directly from the Python prompt. Following is an example calling the printme() function −

# Function definition is here


def printme( str ):
"This prints a passed string into this function"
print str
return;
# Now you can call printme function
printme("I'm first call to user defined function!")
printme("Again second call to the same function")
When the above code is executed, it produces the following result −

I'm first call to user defined function!


Again second call to the same function

Function Arguments
You can call a function by using the following types of formal arguments:

 Required arguments
 Keyword arguments
 Default arguments
 Variable-length arguments

Scope of Variables
All variables in a program may not be accessible at all locations in that program. This depends on
where you have declared a variable.

The scope of a variable determines the portion of the program where you can access a particular
identifier. There are two basic scopes of variables in Python −

Global variables Local variables

Global vs. Local variables


Variables that are defined inside a function body have a local scope, and those defined outside
have a global scope.

This means that local variables can be accessed only inside the function in which they are
declared, whereas global variables can be accessed throughout the program body by all functions.
When you call a function, the variables declared inside it are brought into scope. Following is a
simple example −

total = 0; # This is global variable.

# Function definition is here

def sum( arg1, arg2 ):

# Add both the parameters and return them.


total = arg1 + arg2; # Here total is local variable.

print "Inside the function local total : ", total

return total;

sum( 10, 20 );

print "Outside the function global total : ", total

Result −

Inside the function local total : 30

Outside the function global total : 0

A module allows you to logically organize your Python code. Grouping related code into a module makes the code easier to understand and use. A module is a Python object with arbitrarily named attributes that you can bind and reference. Simply put, a module is a file consisting of Python code. A module can define functions, classes and variables. A module can also include runnable code.

Example:
The Python code for a module named aname normally resides in a file named aname.py. Here's an example of a simple module, support.py

def print_func( par ):

print "Hello : ", par

return

The import Statement

The import statement has the following syntax:

import module1[, module2[,... moduleN]]

When the interpreter encounters an import statement, it imports the module if the module is present in the search path. A search path is a list of directories that the interpreter searches before importing a module. For example, to import the module support.py, you need to put the following command at the top of the script −

import support

A module is loaded only once, regardless of the number of times it is imported. This prevents the module execution from happening over and over again if multiple imports occur.

Packages in Python
A package is a hierarchical file directory structure that defines a single Python application
environment that consists of modules and sub packages and sub-sub packages.

Consider a file Pots.py available in Phone directory. This file has following line of source code −

def Pots():

print "I'm Pots Phone"

In a similar way, we have another two files having different functions with the same names as above −

 Phone/Isdn.py file having function Isdn()


 Phone/G3.py file having function G3()

Now, create one more file __init__.py in Phone directory −

 Phone/__init__.py

To make all of your functions available when you have imported Phone, put explicit import statements in __init__.py as follows −

from Pots import Pots

from Isdn import Isdn

from G3 import G3

After you add these lines to __init__.py, you have all of these functions available when you import the Phone package.

# Now import your Phone Package.

import Phone
Phone.Pots()

Phone.Isdn()

Phone.G3()

RESULT:

I'm Pots Phone

I'm ISDN Phone

I'm G3 Phone

In the above example, we have taken the example of a single function in each file, but you can keep multiple functions in your files. You can also define different Python classes in those files and then create your packages out of those classes.

This chapter covers all the basic I/O functions available in Python.

Printing to the Screen

The simplest way to produce output is using the print statement, where you can pass zero or more expressions separated by commas. It converts the expressions you pass into a string and writes the result to standard output as follows −

print "Python is really a great language,", "isn't it?"

Result:

Python is really a great language, isn't it?

Reading Keyboard Input

Python provides two built-in functions to read a line of text from standard input, which by default
comes from the keyboard. These functions are −

 raw_input
 input

The raw_input Function

The raw_input([prompt]) function reads one line from standard input and returns it as a string (removing the trailing newline).

str = raw_input("Enter your input: ");
print "Received input is : ", str

This prompts you to enter any string, and it displays the same string on the screen. When "Hello Python" is typed in, the output is as follows −

Enter your input: Hello Python
Received input is : Hello Python

The input Function

The input([prompt]) function is equivalent to raw_input, except that it assumes the input is a valid
Python expression and returns the evaluated result to you.

str = input("Enter your input: ");


print "Received input is : ", str

This would produce the following result against the entered input −

Enter your input: [x*5 for x in range(2,10,2)]

Received input is : [10, 20, 30, 40]

Opening and Closing Files

Until now, you have been reading and writing to the standard input and output. Now, we will see
how to use actual data files.

Python provides basic functions and methods necessary to manipulate files by default. You can do
most of the file manipulation using a file object.

The open Function

Before you can read or write a file, you have to open it using Python's built-in open() function.
This function creates a file object, which would be utilized to call other support methods
associated with it.
Syntax
file object = open(file_name [, access_mode][, buffering])

Here are parameter details:

 file_name: The file_name argument is a string value that contains the name of the file that
you want to access.
 access_mode: The access_mode determines the mode in which the file has to be opened, i.e., read, write, append, etc. A complete list of possible values is given below in the table. This is an optional parameter, and the default file access mode is read (r).
 buffering: If the buffering value is set to 0, no buffering takes place. If the buffering value
is 1, line buffering is performed while accessing a file. If you specify the buffering value as
an integer greater than 1, then buffering action is performed with the indicated buffer size.
If negative, the buffer size is the system default(default behavior).

Here is a list of the different modes of opening a file −

Modes Description

r Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the
default mode.

rb Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file.

r+ Opens a file for both reading and writing. The file pointer is placed at the beginning of the file.

rb+ Opens a file for both reading and writing in binary format. The file pointer is placed at the beginning of the file.

w Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist,
creates a new file for writing.

wb Opens a file for writing only in binary format. Overwrites the file if the file exists. If the file does
not exist, creates a new file for writing.

w+ Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file
does not exist, creates a new file for reading and writing.

wb+ Opens a file for both writing and reading in binary format. Overwrites the existing file if the file
exists. If the file does not exist, creates a new file for reading and writing.

a Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the
file is in the append mode. If the file does not exist, it creates a new file for writing.

ab Opens a file for appending in binary format. The file pointer is at the end of the file if the file
exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for
writing.

a+ Opens a file for both appending and reading. The file pointer is at the end of the file if the file
exists. The file opens in the append mode. If the file does not exist, it creates a new file for
reading and writing.

ab+ Opens a file for both appending and reading in binary format. The file pointer is at the end of the
file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new
file for reading and writing.

The file Object Attributes

Once a file is opened and you have one file object, you can get various information related to that
file.

Here is a list of all attributes related to file object:


Attribute Description

file.closed Returns true if file is closed, false otherwise.

file.mode Returns access mode with which file was opened.

file.name Returns name of the file.

file.softspace Returns false if space explicitly required with print, true otherwise.

Example
# Open a file
fo = open("foo.txt", "wb")
print "Name of the file: ", fo.name
print "Closed or not : ", fo.closed
print "Opening mode : ", fo.mode
print "Softspace flag : ", fo.softspace

This produces the following result −

Name of the file: foo.txt


Closed or not : False
Opening mode : wb
Softspace flag : 0

The close() Method

The close() method of a file object flushes any unwritten information and closes the file object,
after which no more writing can be done.Python automatically closes a file when the reference
object of a file is reassigned to another file. It is a good practice to use the close() method to close
a file.

Syntax
fileObject.close();
Example
# Open a file
fo = open("foo.txt", "wb")
print "Name of the file: ", fo.name
# Close opened file
fo.close()

Result −

Name of the file: foo.txt

Reading and Writing Files

The file object provides a set of access methods to make our lives easier. We would see how to
use read() and write() methods to read and write files.

The write() Method

The write() method writes any string to an open file. It is important to note that Python strings can have binary data and not just text. The write() method does not add a newline character ('\n') to the end of the string.

Syntax

fileObject.write(string);

Here, passed parameter is the content to be written into the opened file. Example

# Open a file
fo = open("foo.txt", "wb")
fo.write( "Python is a great language.\nYeah its great!!\n");

# Close opened file
fo.close()

The above method would create foo.txt file and would write given content in that file and finally it
would close that file. If you would open this file, it would have following content.

Python is a great language.


Yeah its great!!
The read() Method

The read() method reads a string from an open file. It is important to note that Python strings can have binary data, apart from text data.

Syntax
fileObject.read([count]);

Here, passed parameter is the number of bytes to be read from the opened file. This method starts
reading from the beginning of the file and if count is missing, then it tries to read as much as
possible, maybe until the end of file.

Example

Let's take a file foo.txt, which we created above.

# Open a file
fo = open("foo.txt", "r+")
str = fo.read(10);
print "Read String is : ", str
# Close opened file
fo.close()

This produces the following result −

Read String is : Python is

File Positions

The tell() method tells you the current position within the file; in other words, the next read or
write will occur at that many bytes from the beginning of the file.


The seek(offset[, from]) method changes the current file position. The offset argument indicates
the number of bytes to be moved. The from argument specifies the reference position from where
the bytes are to be moved.
If from is set to 0, the beginning of the file is used as the reference position; if it is set to 1, the current position is used as the reference position; and if it is set to 2, the end of the file is taken as the reference position.

Example

Let us take a file foo.txt, which we created above.

# Open a file
fo = open("foo.txt", "r+")
str = fo.read(10);
print "Read String is : ", str

# Check current position


position = fo.tell();
print "Current file position : ", position

# Reposition pointer at the beginning once again


position = fo.seek(0, 0);
str = fo.read(10);
print "Again read String is : ", str
# Close opened file
fo.close()

This produces the following result −

Read String is : Python is


Current file position : 10
Again read String is : Python is

Renaming and Deleting Files

Python os module provides methods that help you perform file-processing operations, such as
renaming and deleting files.

To use this module you need to import it first and then you can call any related functions.
The rename() Method

The rename() method takes two arguments, the current filename and the new filename.

Syntax
os.rename(current_file_name, new_file_name)

Example

Following is the example to rename an existing file test1.txt:

import os

# Rename a file from test1.txt to test2.txt


os.rename( "test1.txt", "test2.txt" )

The remove() Method

You can use the remove() method to delete files by supplying the name of the file to be deleted as
the argument.

Syntax
os.remove(file_name)

Example

Following is the example to delete an existing file test2.txt −

#!/usr/bin/python
import os

# Delete file test2.txt


os.remove("text2.txt")

Directories in Python

All files are contained within various directories, and Python has no problem handling these too.
The os module has several methods that help you create, remove, and change directories.
The mkdir() Method

You can use the mkdir() method of the os module to create directories in the current directory.
You need to supply an argument to this method which contains the name of the directory to be
created.

Syntax
os.mkdir("newdir")

Example

Following is the example to create a directory test in the current directory −

#!/usr/bin/python
import os

# Create a directory "test"


os.mkdir("test")

The chdir() Method

You can use the chdir() method to change the current directory. The chdir() method takes an
argument, which is the name of the directory that you want to make the current directory.

Syntax
os.chdir("newdir")

Example

Following is the example to go into "/home/newdir" directory −

#!/usr/bin/python
import os

# Changing a directory to "/home/newdir"


os.chdir("/home/newdir")

The getcwd() Method

The getcwd() method displays the current working directory.


Syntax
os.getcwd()

Example

Following is the example to give current directory −

import os

# This would give location of the current directory


os.getcwd()

The rmdir() Method

The rmdir() method deletes the directory, which is passed as an argument in the method.

Before removing a directory, all the contents in it should be removed.

Syntax:
os.rmdir('dirname')

Example

Following is the example to remove "/tmp/test" directory. It is required to give fully qualified
name of the directory, otherwise it would search for that directory in the current directory.

import os
# This would remove "/tmp/test" directory.
os.rmdir( "/tmp/test" )

File & Directory Related Methods

There are two important sources, which provide a wide range of utility methods to handle and
manipulate files & directories on Windows and Unix operating systems. They are as follows −

 File Object Methods: The file object provides functions to manipulate files.
 OS Object Methods: The os module provides methods to process files as well as directories.

Python provides two very important features to handle any unexpected error in your
Python programs and to add debugging capabilities in them −
 Exception Handling: This is covered in this tutorial. A list of standard exceptions available in Python is given below.
 Assertions: This is covered in Assertions in Python.

List of Standard Exceptions −

Exception: Base class for all exceptions.
StopIteration: Raised when the next() method of an iterator does not point to any object.
SystemExit: Raised by the sys.exit() function.
StandardError: Base class for all built-in exceptions except StopIteration and SystemExit.
ArithmeticError: Base class for all errors that occur for numeric calculation.
OverflowError: Raised when a calculation exceeds the maximum limit for a numeric type.
FloatingPointError: Raised when a floating point calculation fails.
ZeroDivisionError: Raised when division or modulo by zero takes place for all numeric types.
AssertionError: Raised in case of failure of the assert statement.
AttributeError: Raised in case of failure of attribute reference or assignment.
EOFError: Raised when there is no input from either the raw_input() or input() function and the end of file is reached.
ImportError: Raised when an import statement fails.
KeyboardInterrupt: Raised when the user interrupts program execution, usually by pressing Ctrl+C.
LookupError: Base class for all lookup errors.
IndexError: Raised when an index is not found in a sequence.
KeyError: Raised when the specified key is not found in the dictionary.
NameError: Raised when an identifier is not found in the local or global namespace.
UnboundLocalError: Raised when trying to access a local variable in a function or method but no value has been assigned to it.
EnvironmentError: Base class for all exceptions that occur outside the Python environment.
IOError: Raised when an input/output operation fails, such as the print statement or the open() function when trying to open a file that does not exist.
OSError: Raised for operating system-related errors.
SyntaxError: Raised when there is an error in Python syntax.
IndentationError: Raised when indentation is not specified properly.
SystemError: Raised when the interpreter finds an internal problem, but when this error is encountered the Python interpreter does not exit.
SystemExit: Raised when the Python interpreter is quit by using the sys.exit() function. If not handled in the code, causes the interpreter to exit.
TypeError: Raised when an operation or function is attempted that is invalid for the specified data type.
ValueError: Raised when the built-in function for a data type has the valid type of arguments, but the arguments have invalid values specified.
RuntimeError: Raised when a generated error does not fall into any category.
NotImplementedError: Raised when an abstract method that needs to be implemented in an inherited class is not actually implemented.

What is Exception?
An exception is an event, which occurs during the execution of a program that
disrupts the normal flow of the program's instructions. In general, when a Python
script encounters a situation that it cannot cope with, it raises an exception. An
exception is a Python object that represents an error.

When a Python script raises an exception, it must either handle the exception
immediately or it terminates and quits.

Handling an exception
If you have some suspicious code that may raise an exception, you can defend your
program by placing the suspicious code in a try: block. After the try: block, include
an except: statement, followed by a block of code which handles the problem as
elegantly as possible.
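For instance, the following minimal sketch guards a file-writing operation with try/except; the
file name "testfile" and the messages are illustrative −

#!/usr/bin/python

try:
   # The suspicious code goes inside the try: block
   fh = open("testfile", "w")
   fh.write("This is a test file for exception handling!")
except IOError:
   # Handle the problem as elegantly as possible
   print "Error: can't find file or write data"
else:
   # Runs only if no exception was raised
   print "Written content in the file successfully"
   fh.close()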
Python Database Access

The Python standard for database interfaces is the Python DB-API. Most Python database
interfaces adhere to this standard.

You can choose the right database for your application. Python Database API supports a wide
range of database servers such as −

 GadFly
 mSQL
 MySQL
 PostgreSQL
 Microsoft SQL Server 2000
 Informix
 Interbase
 Oracle
 Sybase

The DB API provides a minimal standard for working with databases using Python structures and
syntax wherever possible. This API includes the following steps, sketched in the example after
this list −

 Importing the API module.
 Acquiring a connection with the database.
 Issuing SQL statements and stored procedures.
 Closing the connection.
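As a minimal sketch of these steps, assuming the MySQLdb module is installed and a MySQL
server is reachable (the host, user, password and database name below are placeholders):

#!/usr/bin/python
import MySQLdb

# Acquire a connection with the database (placeholder credentials)
db = MySQLdb.connect("localhost", "testuser", "test123", "TESTDB")

# Issue an SQL statement through a cursor
cursor = db.cursor()
cursor.execute("SELECT VERSION()")
data = cursor.fetchone()
print "Database version : %s " % data

# Close the connection
db.close()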
SYSTEM TESTING

TESTING METHODOLOGIES

The following are the Testing Methodologies:

o Unit Testing.
o Integration Testing.
o User Acceptance Testing.
o Output Testing.
o Validation Testing.

Unit Testing

Unit testing focuses verification effort on the smallest unit of software design, that is, the
module. Unit testing exercises specific paths in a module’s control structure to ensure complete
coverage and maximum error detection. This test focuses on each module individually, ensuring
that it functions properly as a unit; hence the name unit testing.

During this testing, each module is tested individually and the module interfaces are
verified for consistency with the design specification. All important processing paths are tested
for the expected results. All error handling paths are also tested. A minimal sketch of a unit
test follows.
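The sketch below uses Python's built-in unittest module; the add function is a hypothetical
unit under test, included only for illustration −

#!/usr/bin/python
import unittest

# Hypothetical unit (module) under test
def add(a, b):
    return a + b

class TestAdd(unittest.TestCase):
    # Each test exercises one path and checks the expected result
    def test_positive(self):
        self.assertEqual(add(2, 3), 5)

    def test_negative(self):
        self.assertEqual(add(-1, -1), -2)

if __name__ == '__main__':
    unittest.main()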

Integration Testing

Integration testing addresses the issues associated with the dual problems of verification
and program construction. After the software has been integrated, a set of high-order tests is
conducted. The main objective of this testing process is to take unit-tested modules and build a
program structure that has been dictated by the design.
The following are the types of Integration Testing:

1. Top Down Integration

This method is an incremental approach to the construction of program structure. Modules
are integrated by moving downward through the control hierarchy, beginning with the main
program module. The modules subordinate to the main program module are incorporated into the
structure in either a depth-first or breadth-first manner. In this method, the software is tested from
the main module, and individual stubs are replaced as the test proceeds downwards, as sketched
below.
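The following toy sketch illustrates a stub; the module and function names are purely
illustrative. The main module is exercised first, with a stand-in for a subordinate module that
has not yet been integrated −

#!/usr/bin/python

# Stub standing in for a subordinate module that is not yet integrated;
# it returns canned data instead of performing the real processing.
def fetch_records_stub():
    return ["record-1", "record-2"]

# Main program module under test; the stub is swapped for the real
# subordinate module as integration proceeds downwards.
def main_module(fetch=fetch_records_stub):
    records = fetch()
    return len(records)

print "Processed %d records" % main_module()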

2. Bottom-up Integration

This method begins the construction and testing with the modules at the lowest level in the
program structure. Since the modules are integrated from the bottom up, processing required for
modules subordinate to a given level is always available, and the need for stubs is eliminated. The
bottom-up integration strategy may be implemented with the following steps:

 The low-level modules are combined into clusters that perform a specific software sub-function.
 A driver (i.e. the control program for testing) is written to coordinate test case input and output.
 The cluster is tested.
 Drivers are removed and clusters are combined moving upward in the program structure.

The bottom-up approach tests each module individually; each module is then integrated with a
main module and tested for functionality.

7.1.3 User Acceptance Testing

User acceptance of a system is the key factor for the success of any system. The system
under consideration is tested for user acceptance by constantly keeping in touch with the
prospective system users at the time of development and making changes wherever required. The
system developed provides a friendly user interface that can easily be understood even by a person
who is new to the system.

7.1.4 Output Testing

After performing the validation testing, the next step is output testing of the proposed
system, since no system can be useful if it does not produce the required output in the specified
format. The outputs generated or displayed by the system under consideration are tested by asking
the users about the format they require. The output format is therefore considered in two ways:
one on screen and the other in printed format.

7.1.5 Validation Checking

Validation checks are performed on the following fields.

Text Field:

The text field can contain only a number of characters less than or equal to its size. The
text fields are alphanumeric in some tables and alphabetic in other tables. An incorrect entry
always flashes an error message.

Numeric Field:

The numeric field can contain only numbers from 0 to 9. An entry of any other character flashes
an error message; a minimal sketch of such a check is given at the end of this subsection. The
individual modules are checked for accuracy and for what they have to perform. Each module is
subjected to a test run along with sample data. The individually tested modules are integrated
into a single system. Testing involves executing the program with real data; the existence of any
program defect is inferred from the output. The testing should be planned so that all the
requirements are individually tested.

A successful test is one that brings out the defects for inappropriate data and produces
an output revealing the errors in the system.
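A minimal sketch of the numeric-field check described above; the function name and messages
are illustrative −

#!/usr/bin/python

# Accept only the digits 0-9, as required of a numeric field
def validate_numeric(value):
    if value.isdigit():
        return True
    print "Error: numeric field may contain only numbers from 0 to 9"
    return False

validate_numeric("12345")   # passes
validate_numeric("12a45")   # flashes an error message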
Preparation of Test Data

The above testing is done by taking various kinds of test data. Preparation of test data plays a
vital role in system testing. After preparing the test data, the system under study is tested
using that test data. While testing the system with test data, errors are again uncovered and
corrected using the above testing steps, and the corrections are also noted for future use.

Using Live Test Data:

Live test data are those that are actually extracted from organization files. After a system is
partially constructed, programmers or analysts often ask users to key in a set of data from their
normal activities. Then, the systems person uses this data as a way to partially test the system. In
other instances, programmers or analysts extract a set of live data from the files and have them
entered themselves.

It is difficult to obtain live data in sufficient amounts to conduct extensive testing. And,
although it is realistic data that will show how the system will perform for the typical processing
requirement, assuming that the live data entered are in fact typical, such data generally will not test
all combinations or formats that can enter the system. This bias toward typical values then does not
provide a true systems test and in fact ignores the cases most likely to cause system failure.

Using Artificial Test Data:

Artificial test data are created solely for test purposes, since they can be generated to test all
combinations of formats and values. In other words, the artificial data, which can quickly be
prepared by a data-generating utility program in the information systems department, make possible
the testing of all logic and control paths through the program.
The most effective test programs use artificial test data generated by persons other than
those who wrote the programs. Often, an independent team of testers formulates a testing plan,
using the systems specifications.
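As a toy sketch of such a data-generating utility, the field types and boundary values below are
illustrative; the point is simply to enumerate all combinations of formats and values −

#!/usr/bin/python
import itertools

# Illustrative field formats and boundary values for artificial test data
field_types = ["alphabetic", "alphanumeric", "numeric"]
boundary_values = ["", "a", "0", "zzzz", "9999"]

# Enumerate every combination of format and value
for combo in itertools.product(field_types, boundary_values):
    print "field=%s, value=%r" % combo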

The developed package has satisfied all the requirements specified in the software
requirement specification and was accepted.

7.2 USER TRAINING

Whenever a new system is developed, user training is required to educate users about the
working of the system so that it can be put to efficient use by those for whom it has been
primarily designed. For this purpose, the normal working of the project was demonstrated to the
prospective users. Its working is easily understandable, and since the expected users are people
who have a good knowledge of computers, the system is very easy to use.

7.3 MAINTENANCE

This covers a wide range of activities, including correcting code and design errors. To
reduce the need for maintenance in the long run, we have more accurately defined the user’s
requirements during the process of system development. Depending on the requirements, this
system has been developed to satisfy the needs to the largest possible extent. With developments in
technology, it may be possible to add many more features based on future requirements. The
coding and design are simple and easy to understand, which will make maintenance easier.
TESTING STRATEGY:

A strategy for system testing integrates system test cases and design techniques into a well-
planned series of steps that results in the successful construction of software. The testing strategy
must incorporate test planning, test case design, test execution, and the resultant data collection and
evaluation. A strategy for software testing must accommodate low-level tests that are necessary
to verify that a small source code segment has been correctly implemented, as well as high-level
tests that validate major system functions against user requirements.

Software testing is a critical element of software quality assurance and represents the ultimate
review of specification, design and coding. Testing represents an interesting anomaly for the
software engineer. Thus, a series of tests is performed on the proposed system before the system is
ready for user acceptance testing.

SYSTEM TESTING:

Software, once validated, must be combined with other system elements (e.g. hardware,
people, databases). System testing verifies that all the elements mesh properly and that overall
system function and performance are achieved. It also tests to find discrepancies between the
system and its original objective, current specifications and system documentation.

UNIT TESTING:

In unit testing, different modules are tested against the specifications produced for the
modules during design. Unit testing is essential for verification of the code produced during the
coding phase; hence the goal is to test the internal logic of the modules. Using the detailed
design description as a guide, important control paths are tested to uncover errors within the
boundary of the modules. This testing is carried out during the programming stage itself. In this
type of testing step, each module was found to be working satisfactorily as regards the expected
output from the module.

In due course, the latest technology advancements will be taken into consideration. As part
of the technical build-up, many components of the networking system will be generic in nature so
that future projects can either use or interact with them. The future holds a lot to offer to the
development and refinement of this project.
Implementation:

This project is built with Django. To run it, use the following command inside the project
directory, and then paste the URL shown into the browser to reach the user interface.
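As a sketch, assuming the standard Django project layout (a manage.py file at the project root),
the development server is typically started with −

python manage.py runserver

Django then prints a local URL, usually http://127.0.0.1:8000/, which can be pasted into the
browser to open the user interface.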
CONCLUSION

The motivation for this study was the challenge of imbalanced data that we faced
when using real-world bus smart-card data to predict the boarding behavior of
passengers in a given time window. In this research, we proposed a Deep-GAN to
over-sample the travelling instances and to re-balance the ratio of travelling and
non-travelling instances in the smart-card dataset, in order to improve a DNN-based
prediction model of individual boarding behavior. The performance of Deep-GAN
was evaluated by applying the models to real-world smart-card data collected from
seven bus lines in the city of Changsha, China. Comparing different imbalance
ratios in the training dataset, we found that, in general, the performance of the
model improves as the data becomes more balanced, and the most significant
improvement comes at a 1:5 ratio between positive and negative instances. From the
perspective of the prediction accuracy of the hourly distribution of bus ridership, a
high rate of imbalance will cause misleading load profiles, while absolutely balanced
data may over-predict the ridership during peak hours. A comparison of different
resampling methods reveals that both over-sampling and under-sampling benefit the
performance of the model. Deep-GAN has the best recall score, and its precision
scores best among the over-sampling methods. Although the performance of the
predictive model trained on the Deep-GAN data is not significantly beyond other
resampling methods, the Deep-GAN also presented a powerful ability to improve
the quality of the training dataset and the performance of predictive models, especially
when under-sampling is not suitable for the data. A toy illustration of what a 1:5
re-balancing means is given after the list of contributions below.
The contributions of this study are:
• The data imbalance issue in the public transport system has received little attention,
and this study is the first to focus on this issue and propose a deep learning approach,
Deep-GAN, to solve it.
• This study compared the differences in similarity and diversity between the real
travelling instances and the synthetic ones generated by Deep-GAN and other over-
sampling methods. It also compared different resampling methods for the
improvement of data quality by evaluating the performance of the next-travel-
behavior prediction model. This is the first validation and evaluation of the
performance of different data resampling methods based on real data in the public
transport system.
• This paper innovatively modeled individual boarding behavior, which is
uncommon in other travel demand prediction tasks. Compared to the popular
aggregated prediction, this individual-based model is able to provide more details on
the passengers’ behavior, and the results will benefit the analysis of the similarities
and heterogeneities across passengers.
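As a toy illustration of the imbalance ratio discussed above (this is plain random over-sampling
with numpy, not the paper's Deep-GAN), re-balancing positive and negative instances to a 1:5
ratio could look like this; the counts are illustrative −

#!/usr/bin/python
import numpy as np

# Toy data: 100 travelling (positive) and 10000 non-travelling (negative) instances
positives = np.ones(100)
negatives = np.zeros(10000)

# Over-sample the positives until positives:negatives reaches 1:5
target = len(negatives) // 5
extra = np.random.choice(positives, target - len(positives), replace=True)
positives = np.concatenate([positives, extra])

print "Imbalance ratio 1:%d" % (len(negatives) / len(positives))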

As technology and computing power develop, prediction models will
become more and more refined. In the field of demand prediction for public
transport systems, the target will gradually evolve from the bus network and bus lines
to individual travel behavior. This advancement can greatly benefit public transport
planning and management, such as the digital twin of the public transport system. It
is foreseeable that future prediction work in public transport systems will also
encounter the challenge of imbalanced data. Our research proposes a Deep-GAN
model to address the data imbalance issue in travel behavior prediction. The
validation via real-world data illustrated that the Deep-GAN shows a better ability
to deal with the data imbalance issue and benefits the predictive models compared to
other resampling methods. This research provides valuable experience for more
researchers and managers in dealing with similar data imbalance issues, especially in
public transport.
It may be noted that despite the great performance of the Deep-GAN
and DNN models, there are still some limitations. First, in this research, Deep-GAN
is solely applied for over-sampling. However, there is also a hybrid variant of
Deep-GAN where positive instances are over-sampled and negative instances are
under-sampled. The promising results of the Deep-GAN over-sampling serve as a
motivation to test the performance of the hybrid Deep-GAN in future research.
Second, this study makes the prediction at the individual level, which creates an
explosion of information and makes the computation more difficult. Classifying the
passengers (using clustering methods, for instance) may be useful in terms of
reducing the size of the dataset. Third, the current Deep-GAN does not consider the
spatio-temporal characteristics of boarding behavior. Customizing the networks of the
generator and discriminator in the GAN based on the characteristics of the boarding
behavior will further improve the quality of the generated dummy travelling instances
and the performance of the subsequent predictive models. Finally, the proposed Deep-
GAN selected the features and the variants of the data augmentation independently,
so the improvements are likely to be sub-optimal. Jointly selecting the features and the
optimum imbalance ratio is likely to result in further improvements, but at the cost of
computational complexity. This can be tested in the future. Similarly, the optimum rate
of imbalance for Deep-GAN has been assumed to be the optimum rate for the other
resampling methods; this assumption needs to be tested in future research. Even in
its current form, this research demonstrates the extent of improvement offered by the
Deep-GAN method in addressing the data imbalance issue in modeling boarding
behavior. By better predicting the boarding behavior, the findings can help public
transport authorities to improve the level-of-service and efficiency of the public
transport system. It can also be extended to other components of public transport
usage behavior – better prediction of the alighting or transfer behavior, for instance.
REFERENCES

[1] X. Guo, J. Wu, H. Sun, R. Liu, and Z. Gao, “Timetable coordination of
first trains in urban railway network: A case study of Beijing,” Applied
Mathematical Modelling, vol. 40, no. 17, pp. 8048–8066, 2016.
[2] W. Wu, P. Li, R. Liu, W. Jin, B. Yao, Y. Xie, and C. Ma, “Predicting
peak load of bus routes with supply optimization and scaled shepard
interpolation: A newsvendor model,” Transportation Research Part E:
Logistics and Transportation Review, vol. 142, p. 102041, 2020.
[3] N. Bešinović, L. De Donato, F. Flammini, R. M. Goverde, Z. Lin, R. Liu,
S. Marrone, R. Nardone, T. Tang, and V. Vittorini, “Artificial intelligence
in railway transport: Taxonomy, regulations and applications,” IEEE
Transactions on Intelligent Transportation Systems, 2021.
[4] S. C. Kwan and J. H. Hashim, “A review on co-benefits of mass public
transportation in climate change mitigation,” Sustainable Cities and
Society, vol. 22, pp. 11–18, 2016.
[5] Y. Wang, W. Zhang, T. Tang, D. Wang, and Z. Liu, “Bus
OD matrix reconstruction based on clustering Wi-Fi probe data,”
Transportmetrica B: Transport Dynamics, pp. 1–16, 2021, doi:
10.1080/21680566.2021.1956388.
[6] S. J. Berrebi, K. E. Watkins, and J. A. Laval, “A real-time bus
dispatching policy to minimize passenger wait on a high frequency
route,” Transportation Research Part B: Methodological, vol. 81, pp.
377–389, 2015.
[7] A. Fonzone, J.-D. Schmöcker, and R. Liu, “A model of bus bunching
under reliability-based passenger arrival patterns,” Transportation Research
Part C: Emerging Technologies, vol. 59, pp. 164–182, 2015.
[8] J.-D. Schmöcker, W. Sun, A. Fonzone, and R. Liu, “Bus bunching
along a corridor served by two lines,” Transportation Research Part
B: Methodological, vol. 93, pp. 300–317, 2016.
[9] D. Chen, Q. Shao, Z. Liu, W. Yu, and C. L. P. Chen, “Ridesourcing
behavior analysis and prediction: A network perspective,” IEEE Transactions
on Intelligent Transportation Systems, pp. 1–10, 2020.
[10] E. Nelson and N. Sadowsky, “Estimating the impact of ride-hailing app
company entry on public transportation use in major US urban areas,”
The B.E. Journal of Economic Analysis & Policy, vol. 19, no. 1, p.
20180151, 2019.
[11] Z. Chen, K. Liu, J. Wang, and T. Yamamoto, “H-convlstm-based bagging
learning approach for ride-hailing demand prediction considering
imbalance problems and sparse uncertainty,” Transportation Research
Part C: Emerging Technologies, vol. 140, p. 103709, 2022.
[12] R. Liu and S. Sinha, “Modelling urban bus service and passenger
reliability,” 2007.
[13] J. A. Sorratini, R. Liu, and S. Sinha, “Assessing bus transport reliability
using micro-simulation,” Transportation Planning and Technology,
vol. 31, no. 3, pp. 303–324, 2008.
[14] Y. Wang, W. Zhang, T. Tang, D. Wang, and Z. Liu, “Bus OD matrix
reconstruction based on clustering Wi-Fi probe data,” Transportmetrica
B: Transport Dynamics, pp. 1–16, 2021.
[15] Y. Hollander and R. Liu, “Estimation of the distribution of travel times
by repeated simulation,” Transportation Research Part C: Emerging
Technologies, vol. 16, no. 2, pp. 212–231, 2008.
[16] W. Wu, R. Liu, and W. Jin, “Modelling bus bunching and holding control
with vehicle overtaking and distributed passenger boarding behaviour,”
Transportation Research Part B: Methodological, vol. 104, pp. 175–197,
2017.
[17] W. Wu, R. Liu, W. Jin, and C. Ma, “Stochastic bus schedule coordination
considering demand assignment and rerouting of passengers,”
Transportation Research Part B: Methodological, vol. 121, pp. 275–303,
2019.
[18] W. Wu, R. Liu, and W. Jin, “Designing robust schedule coordination
scheme for transit networks with safety control margins,” Transportation
Research Part B: Methodological, vol. 93, pp. 495–519, 2016.
[19] S. Zhong and D. J. Sun, A Spatio-temporal Distribution Model for Determining
Origin–Destination Demand from Multisource Data. Springer,
Singapore, 2022, pp. 33–52.
[20] M. Bordagaray, L. dell’Olio, A. Fonzone, and A. Ibeas, “Capturing the
conditions that introduce systematic variation in bike-sharing travel
behavior using data mining techniques,” Transportation Research Part
C: Emerging Technologies, vol. 71, pp. 231–248, 2016.
[21] B. Chidlovskii, “Mining smart card data for travellers’ mini activities,”
IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 11,
pp. 3676–3685, 2018.
[22] T. Tang, R. Liu, and C. Choudhury, “Incorporating weather conditions
and travel history in estimating the alighting bus stops from smart card
data,” Sustainable Cities and Society, vol. 53, p. 101927, 2020.
[23] X. Zhang, Q. Zhang, T. Sun, Y. Zou, and H. Chen, “Evaluation of
urban public transport priority performance based on the improved TOPSIS
method: A case study of Wuhan,” Sustainable Cities and Society, vol. 43,
pp. 357–365, 2018.
[24] F. Chen, Z. Yin, Y. Ye, and D. Sun, “Taxi hailing choice behavior and
economic benefit analysis of emission reduction based on multi-mode
travel big data,” Transport Policy, vol. 97, pp. 73–84, 2020.
[25] D. J. Sun, Y. Zheng, and R. Duan, “Energy consumption simulation
and economic benefit analysis for urban electric commercial-vehicles,”
Transportation Research Part D: Transport and Environment, vol. 101,
p. 103083, 2021.
