Crime Rate Prediction Based On Clustering
A Project Work
submitted in partial fulfillment of the
requirements for the degree of
Bachelor of Technology
in
Computer Science and Engineering
Submitted By
Degree for which the project work is submitted: Bachelor of Technology in Computer Science and Engineering
I declare that the presented project represents largely my own ideas and work in my own words. Where others' ideas or words have been included, I have adequately cited and listed them in the reference materials. The report has been prepared without resorting to plagiarism. I have adhered to all principles of academic honesty and integrity. No falsified or fabricated data have been presented in the report. I understand that any violation of the above will be cause for disciplinary action by the Institute, including revocation of the conferred degree, if conferred, and can also invoke penal action from the sources which have not been properly cited or from whom proper permission has not been taken.
Yash Jaiswal (15SCSE105066)
Sharique Salam (15SCSE101738)
Umer Farooq (16SCSE101849)
Date:
CERTIFICATE
It is certified that the work contained in this project entitled “Crime Rate Prediction Based On Clustering”, submitted by Yash Jaiswal (15SCSE105066), Sharique Salam (15SCSE101738), and Umer Farooq (16SCSE101849) for the degree of Bachelor of Technology in Computer Science and Engineering, is absolutely based on their own work carried out under my supervision, and this project work has not been submitted elsewhere for any degree.
Date:
Countersigned by
Acknowledgements
First and foremost, we would like to express our sincere gratitude to our project supervisor, Prof. HIMANSHU SHARMA, for always extending the fullest cooperation and dedicated guidance throughout the project.
We would like to thank Ilavendhan A and Indrakumari for being panel members for
our evaluations and providing us with pointers for improvements.
We would also like to thank Suyel Namasudra for coordinating final year projects
successfully.
And last but not least, we would like to extend our sincere gratitude towards the Department of Computer Science and Engineering, Galgotias University, for providing the support for us to successfully finish the project.
Abstract
Crime, in all its aspects, is a well-known social problem affecting the quality of life and the socio-economic development of a society. As such, it is a major concern for law enforcement agencies, who are using different advanced technologies to tackle such issues. The ability to predict future crimes based on location, pattern, and time can serve as a valuable source of knowledge for law enforcement agencies and help to prevent crime. Predicting crime trends is a problem with a rich history of attempts, but recent developments in machine learning offer a new approach. There are two approaches to predicting crimes: identifying hotspots of crime, and predicting the criminal event for a specific time based on spatial and temporal patterns in a crime dataset. Hotspots of crime are those geospatial locations where there is a higher probability of a crime occurring. The objective of this dissertation work is to use advanced machine learning techniques to predict crime incidents on the basis of the number of crimes occurring at a place and whether the accused or criminal is arrested or not. The experimentation is conducted on a dataset containing the City of Chicago’s crime records from 2001 to 2017.
First we go through the K-means clustering technique to see how a flat division of crime frequencies into 5 classes can be done; this is just a naive technique for distribution. In the last section of code, we perform the distribution and analysis of the complete data, featuring map-based analysis, the hourly rate of crimes, and a division by the types of crimes that are most prominent. But this is only an analysis of the present data; to gain predictive capability we apply machine learning. First we dive into logistic regression to get the most basic model for predicting the date of a crime type given its other parameters. As this is not a great success, we move on to the more advanced techniques of Convolutional Neural Networks (CNN), which treat each row of the input data as independent, and Recurrent Neural Networks (RNN), which capture the temporal effects in the data. At last we run an ensemble model to see how it affects the accuracy of these individual classifiers.
Table of Contents
1. Introduction
1.1 Overall description
1.2 Purpose
1.3 Motivations
1.4 Objective
2. Literature Survey
2.1 Crimes and Analysis
2.1.1 Crimes and its effect
2.1.2 Crime Analysis
2.2 Existing Models
2.3 Tools and Algorithms
2.3.1 Tools
2.3.2 Algorithms
2.3.3 Visualization
3. Proposed Model
3.1 Solutions
3.1.1 Feature provided by platform
3.2 Architecture
3.2.1 Preprocessor
3.2.2 Statistical Analyzer
3.2.3 Machine Learner and Predictive Analyser
3.2.4 Descriptive Analyzer
3.2.5 Prescriptive Analyzer
3.2.6 Application Programming Interface (API)
3.2.7 Visualizer
3.3 Design Documents
3.3.1 Use Case View
3.3.2 Data Description
4. Implementation
4.1 Anaconda Console
4.1.1 Feature provided
4.2 Police District Boundary Redrawing
4.2.1 Overview
4.2.2 Proposed Algorithm
4.2.3 Model Work
4.2.4 Feed forward network
4.2.5 Convolutional network
4.3 Model specific
4.3.1 Data and preparation
4.4 Experiments
5. Future Work
6. References
1. Introduction
1.2 Purpose
The research problem that this project tries to address can be stated as follows:
How to develop a software platform to conduct descriptive, predictive, and
prescriptive analysis of diverse crime data?
1.3 Motivation
High or increased crime levels make communities decline, as crime negatively affects house prices, neighborhood satisfaction, and the desire to move in. To reduce and prevent crimes it is important to identify the reasons behind crimes, predict crimes, and prescribe solutions. Due to the large volumes of data and the number of algorithms that need to be applied to crime data, manual analysis is unrealistic. Therefore, it is necessary to have a platform capable of applying any algorithm required for descriptive, predictive, and prescriptive analysis of large volumes of crime data. Through these three methodologies, law-enforcement authorities will be able to take suitable actions to prevent crimes. Moreover, by predicting the targets highly likely to be attacked during a specific period of time and in a specific geographical location, police will be able to identify better ways to deploy their limited resources and also to find and fix the problems leading to crimes. Designing a tool which is easy to use with minimal training would help law-enforcement bodies all around the world to reduce crimes.
1.4 Objectives
• Develop a platform that can be used to analyze crime data using descriptive and predictive data analytics techniques.
• Using the proposed platform, analyze the spatial and temporal (time of day, day of week, and seasons) relationships in crime data.
2. Literature Survey
2.1 Crimes
A crime can be defined as any action or omission that violates a law and results in a punishment. Usually, what constitutes a crime depends on the government bodies and laws in existence in a given place. To understand the nature of crimes, one has to understand the nature of the crime itself, the victim-offender relationship, the role of guardians, and the history of similar incidents. Regardless of the reasons why crimes take place, they put a strain on communities, towns, and cities. Crimes are a social nuisance, and being able to solve them faster is very important.
According to John and David, theories of crime can be divided into two categories: those that seek to explain the development of criminal offenders and those that seek to explain the development of criminal events. Criminology has been developed mainly through theories and research on offenders. Only recently has it begun to explain the crimes themselves rather than the criminality of the people involved. Criminology consists of many theories that explain how and why some offenders act the way they do. Following are some of the theories that explain how places are associated with crimes.
1. Rational Choice: Rational choice suggests that offenders will select targets and define means to achieve their goals in a manner that can be explained. Further, it holds that human actions are based on rational decisions, that is, they are informed by the probable consequences of those actions.
2. Routine Activity Theory: This theory explains the occurrence of crimes as the result of several circumstances: namely, a motivated offender, a desirable target, the target and offender being at the same place at the same time, and lastly the absence of other types of controllers (intimate handlers, guardians, and place managers).
3. Crime Pattern Theory: This theory combines the above two theories and goes on to say that how targets come to the attention of offenders is influenced by the distribution of crime events over time, space, and among targets. An offender will come to know of criminal opportunities while engaging in their day-to-day legitimate work, so a given offender will only know about a subset of available targets. The concept of place is essential to crime pattern theory.
Crime analysis is a difficult task, as it requires both the collection and the analysis of large volumes of data. For example, Brown states that Richmond city in the USA has approximately 100,000 criminal records per year. The data volume and the need to apply different algorithmic techniques forbid manual analysis, whereas an automated analysis of such a rich data set could identify complex crime patterns and assist in solving crimes faster.
Data mining techniques can be used in law enforcement for crime data analysis, criminal career analysis, bank fraud analysis, and analysis of other critical problems. Some of the traditional data mining techniques are association analysis, classification and prediction, cluster analysis, and outlier analysis, which identify patterns in structured data. Using criminology theories along with modern technology would help to identify crime patterns quickly and efficiently. To simplify the workload, a crime data analytic platform could be used, which would help to simplify the process while driving more accurate and insightful conclusions and predictions.
2.2 Existing Models
Several applications have already been developed for the purpose of crime analysis. Most of these tools were developed to help police forces identify different crime patterns and even to predict criminal activities.
1. ReCAP
Regional Crime Analysis Program (ReCAP) is a system designed to aid local police forces through crime analysis and prevention. It works with the Pistol 2000 records management system. The system is quite old, is based on Windows 95 and NT, and works only with a Local Area Network (LAN). This system has three main parts, namely a database, a geographic information system, and data mining tools. It provides the following key functions.
(a) Hot Sheet: gives a summary of the most important crime activity of the region.
(b) Summary Reports: these reports tally specific crime occurrences over a user-defined time and area.
(c) Map-oriented Searches: provide a GIS display of the area along with plotted criminal activity. The crimes can be displayed on the map according to the type of crime, time/date of crime, location, suspect description, weapons involved, etc.
(d) Cluster Analysis: uses algorithms such as k-means and nearest neighbor to perform clustering, to identify statistically significant groupings of crimes in an area.
Another solution is Ozgul's Crime Prediction Model (CPM), which predicts offenders of terrorist events based on location, date, and modus operandi attributes. It uses both solved and unsolved crime information, learning from the attributes of each crime. These crimes are clustered according to their attributes. CPM looks for perpetrators of crimes, with the assumption that the majority of crimes in a single cluster were committed by the same single offender group. This platform focuses mainly on terrorist activities and groups.
2.3.1 Tools
1. Apache Storm
Apache Storm is a free and open-source distributed real-time computation system. Storm processes large volumes of high-velocity data in real time. It is extremely fast and can process over a million records per second per node on a cluster of modest size.
2. Apache Spark
2.3.2 Algorithms
Clustering Algorithms
Several papers have mentioned many available algorithms that can be used for the purpose of analyzing crime data. Shyam Varan Nath says in his paper that 10% of criminals commit 50% of the crimes. Data mining has a strong influence in fields such as law enforcement, for crime problems, crime data analysis, criminal career analysis, bank frauds, and other critical problems. Following are some common clustering algorithms that can be used for crime data analysis.
1. K-means: This algorithm is mainly used to partition data into clusters based on their means. As a first step, the objects are grouped into a specified number of k clusters, and k objects are initially selected as the cluster centers. The remaining objects are then assigned to the nearest cluster center, and the cluster means are updated; these steps repeat until the assignments stabilize. This algorithm is used as a base for many other clustering algorithms.
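As a concrete illustration, the following is a minimal sketch of such a flat k-means division of crime frequencies into 5 classes using scikit-learn; the file name and column names are hypothetical placeholders, not the actual dataset schema.

```python
# Minimal k-means sketch: cluster locations by crime frequency into 5 groups.
# The file name and column names ("Latitude", "Longitude", "CrimeCount")
# are hypothetical placeholders.
import pandas as pd
from sklearn.cluster import KMeans

df = pd.read_csv("crimes_aggregated.csv")
features = df[["Latitude", "Longitude", "CrimeCount"]].dropna()

# Partition into k = 5 clusters, as in the flat division described above.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
df.loc[features.index, "Cluster"] = kmeans.fit_predict(features)

# Cluster centers give the mean location and crime frequency of each group.
print(kmeans.cluster_centers_)
print(df["Cluster"].value_counts())
```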
2.3.3 Visualization
It is important to visualize the crime data in order to get a clear idea about its distribution and also to display the analyzed results in a user-friendly manner. When achieving this task, it is important to display the time dimension within the GIS visualization system, but more focus has been given to displaying the spatial distribution of high-crime areas or hot spots of crime. A new range of statistical and simulation modeling techniques is proving very useful in spatial analyses of crime sites and patterns through computer mapping. Examples of statistical and simulation modeling techniques include point pattern analysis, nearest neighborhood analysis, location quotient analysis, travel pattern analysis, neural networks, and numerous statistical techniques including cluster analysis, spatial regression, and autocorrelation. For example, using these tools, investigators are gaining insight into the location behavior of offenders. Moreover, new developments in criminological theories like environmental criminology require more reliance on visual graphical methods such as topology and fractal geometry, so that patterns, designs, and edges stand out. However, almost all crime mapping efforts and techniques have focused on spatial domains, typically by aggregating the crime data over time. So it is important to identify techniques to display crime data in a way that presents the spatio-temporal attributes of crimes.
1. Pin Maps
A pin map can give the viewer a very good feel for the density and distribution of crime. The viewer gets a very good overall picture without losing sight of the individual crimes. These pin maps can be rotated, zoomed in, translated, and tilted to examine the data more easily from any angle. Figure 2.1 shows how pin maps appear when zoomed in. Furthermore, different variants of pin maps can be created easily. Perhaps the greatest limitation of a pin map is that it is a purely spatial display: in order to produce a pin map, the user has to pre-specify a time period for which the pin map is created.
Figure: Pin map
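A basic pin map can be sketched as a scatter plot of crime coordinates; the file and column names below are assumptions for illustration.

```python
# Pin-map sketch: plot each reported crime as a point at its coordinates.
# File and column names are assumptions for illustration.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("crimes.csv").dropna(subset=["Longitude", "Latitude"])

plt.figure(figsize=(6, 8))
plt.scatter(df["Longitude"], df["Latitude"], s=1, alpha=0.3)
plt.xlabel("Longitude")
plt.ylabel("Latitude")
plt.title("Pin map of reported crimes")
plt.show()
```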
2. Side-by-side Height Bars
This three-dimensional display technique is helpful in comparing spatial data pertaining to two different time periods. In this technique, one type of bar indicates the crimes committed during one time frame of the day while the other type of bar indicates the crimes committed during another time frame of the day. This view is useful in allocating police personnel to areas during different patrol shifts. One advantage of this method is the familiarity of the user with height bars in two-dimensional displays. Depending upon the cell size, this technique can be extended to compare more than two time periods.
Figure: Side-by-side height bars
3. Stacked Time Aggregated Cumulative Graph
4. Hot Spots
Crime hot spots are prime exemplars of the potential value of place in the analysis of crime. Using this technique we can identify highly concentrated crime areas. The importance of this is that we can use it to get an idea of the expected dispatched police calls for service. While often motivated by pragmatic concerns about what interventions are likely to be effective in reducing crime, results like these also serve to sharply focus crime theory on developing satisfactory accounts of these apparently strong relationships between crime and place. Crime studies that examine the spatial distribution of crime clearly demonstrate that certain land uses and population characteristics are associated with crime hot spots. Hot spots are systematic (regular and predictable) and not just random occurrences.
3. Proposed Model
3.1 Solutions
The platform to be developed directly targets the crime domain. It has the ability to analyze crime data in three different ways, namely descriptive, prescriptive, and predictive analytics. The most important aspect of this platform is that it is designed to be scalable to support different types of crime data analysis.
The Descriptive Analyzer uses both quantitative and qualitative data along with analytical techniques. The descriptive analyzer basically provides relationships between crimes and identifies the pattern of crimes and the temporal and spatial relationships between crimes. It also provides a statistical summary of a given data set.
The Prescriptive Analyzer tries to identify the reasons behind the crimes and gives suggestions to avoid or reduce crimes. It has the ability to identify significant factors related to the crimes committed. Through plugins, users can manipulate and extend the platform for specific needs.
Predictive analytics methods are mainly used for predicting the category of a crime which may occur somewhere at a given time. The crime category prediction integrates population and race data according to the given crime data set.
6. Discover spatial and temporal patterns of crime data using visualization
features.
3.2 Architecture
This section describes the high-level architecture of the proposed Inquisitors Crime Data Analytic Platform (CDAP). The Data Receiver is used to provide data to the platform, and the data persistence unit is used to store data used by the system and trained models. The Preprocessor is used to preprocess raw data received by the Data Receiver, which is in turn fed into the next layer. The platform consists of two core modules, the Statistical Analyzer and the Machine Learner. Using those two modules, the Descriptive Analyzer, Prescriptive Analyzer, and Predictive Analyzer have been implemented. There are also a few other components which make a vital contribution to the CDAP. The functionalities provided by the prescriptive, descriptive, and predictive analyzers are exposed to the user through an API. The results provided by CDAP according to the user's requests can be visualized using the Visualizer component.
3.2.1 Preprocessor
The Preprocessor module, which is used by all other modules, consists of the features and functionalities needed to preprocess the data before feeding it into other modules such as the Machine Learner and the Statistical Analyzer.
Within this module, the functionalities needed to preprocess the input data have been implemented. After receiving the CSV data file from the user-specified location, the data extracted from that file is processed in three stages.
2. Data Integration: Data with different representations are put together and conflicts within the data are resolved.
3.2.2 Statistical Analyzer
The Statistical Analyzer provides basic statistics of the data set fed to the framework. It provides an API for everything from simple statistics, like mean, variance, median, and column statistics, to complex statistics, like FP-growth-based frequent item sets. The Statistical Analyzer module provides an API to perform basic statistical analysis on the crime dataset provided by the user. It enables the user to obtain the following analytical outcomes.
• Column-wise statistics: mean, variance, median.
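A minimal sketch of such column-wise statistics with PySpark (the platform is Spark-based, though this exact code is illustrative and the column name is a hypothetical example):

```python
# Sketch of the Statistical Analyzer's column-wise statistics using PySpark.
# The column name "CrimeCount" is a hypothetical example.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("CDAP-stats").getOrCreate()
df = spark.read.csv("crimes.csv", header=True, inferSchema=True)

# Mean and variance via built-in aggregate functions.
df.select(F.mean("CrimeCount").alias("mean"),
          F.variance("CrimeCount").alias("variance")).show()

# Median via an approximate quantile (exact medians are costly on large data).
median = df.approxQuantile("CrimeCount", [0.5], 0.01)[0]
print("median:", median)
```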
3.2.3 Machine Learner and Predictive Analyser
In this platform, predictive analytics methods are mainly used for predicting the category of a crime which may occur somewhere at a given time. In order to integrate predictive analytics features, it is necessary to have a machine learning component as well.
3.2.4 Descriptive Analyzer
The Descriptive Analyzer uses both quantitative and qualitative data and analytical techniques. Qualitative data and analytical techniques refer to non-numerical data as well as the examination and interpretation of observations for the purpose of discovering underlying meanings and patterns of relationships. This is most typical of field research, content analysis, and historical research. Quantitative data are data primarily in numerical or categorical format, and quantitative analysis consists of manipulations of observations for the purpose of describing and explaining the phenomena that those observations reflect; it is primarily statistical. Descriptive crime analysis employs both types of data and techniques depending on the analytical and practical need. For example, crime data can be used in various ways, both quantitatively and qualitatively. Information such as the date, time, location, and type of crime is quantitative in that statistics can be used to analyze these variables. Apart from the spatial distribution of crimes, the analyzer considers the temporal relationships between crimes, the geographical situation, population density, and weather conditions to give a more detailed description of crime.
Urban areas are divided into several police districts, usually based on demographic and geographic parameters. This is mainly done to divide the workload and to address issues including homicides, burglaries, and other kinds of crime more effectively, by dividing the resources available to the police forces in a more efficient way. For example, San Francisco is divided into 10 police districts. Most of the police districts that exist today were formed many years ago, and with the changes and movements of population that have happened over time, we have to question whether the existing police districts serve the purpose for which they were created.
As part of the solution, the Inquisitors Crime Data Analytic platform can redraw police district boundaries with regard to the distribution of population in a more uniform manner. The platform provides the capability to redistrict a given country into any number of police districts according to the population distribution.
The Crime Data Analytic platform uses a heuristic-based approach to rationalize police patrol beats in an efficient way. To improve the functionality of police patrol beats and make them more effective, the crime workload is distributed equally among beats, providing very low police patrol response times.
3.2.6 Application Programming Interface (API)
The functionalities of the platform are exposed through facades and, more completely, through the Crime Data Analytics Framework API; both the functionalities included in the facades and those not included can be accessed through the API. For example, the Preprocessor Facade in the preprocessor module only provides the functionality to handle missing data by deleting the entire row, but through the API the user can select whichever missing-value handling option implemented in the preprocessor module they prefer (such as predicting the missing value or replacing it using the most frequent item).
3.2.7 Visualizer
The Visualizer component enables the user to interactively analyze the dataset and the results provided by the predictive, prescriptive, and statistical analyzers. This component provides data visualization through models such as histograms, heat maps, tables, line graphs, and pin maps. The data required for the visualization is retrieved by connecting with the prescriptive, predictive, and statistical analyzers through the provided API.
3.3 Design Documents
3.3.1 Use Case View
The Crime Data Analysis platform has been designed to handle specific use cases; the use case diagram shows the use cases of the CDAP.
3.3.2 Data Description
This platform is based on the Apache Spark engine. Therefore, within the framework inside the platform, data is kept and processed using data structures provided by Apache Spark, such as DataFrame and Dataset. But since the user should not have to be aware of how to use Spark, those Spark data structures have been hidden within new custom data structures, which enables the user to use the platform easily without any knowledge of Apache Spark. This arrangement defines the basic data flow within the platform.
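A possible shape for such a wrapper is sketched below; the class and method names are assumptions for illustration, not the platform's actual API.

```python
# Sketch of hiding Spark data structures behind a custom type so that
# platform users never touch Spark APIs directly. Class and method names
# are assumptions, not the platform's actual API.
from pyspark.sql import SparkSession, DataFrame

class CrimeDataset:
    """Thin wrapper around a Spark DataFrame used inside the platform."""

    def __init__(self, df: DataFrame):
        self._df = df  # internal Spark structure, hidden from the user

    @classmethod
    def from_csv(cls, path: str) -> "CrimeDataset":
        spark = SparkSession.builder.appName("CDAP").getOrCreate()
        return cls(spark.read.csv(path, header=True, inferSchema=True))

    def columns(self):
        return self._df.columns

    def head(self, n: int = 5):
        return self._df.limit(n).toPandas()  # plain pandas for the user
```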
4. Implementation
We implemented a web application to visualize the project output to the user. This application provides the user access to all the implemented features of the platform, including descriptive, predictive, and prescriptive analytics.
4.2.1 Overview
A police beat is the smallest geographical area for which police statistics are kept and reported to interested parties.
Better districting plans lead to lower response times, officers' familiarization with their assigned areas, more efficient use of personnel, a more equal division of workload, a visible police presence, enhanced officer safety, officer accountability, and balanced police response to calls. Traditionally, these geographic patrol boundaries are drawn by hand based on a police department's knowledge, experience, and the available police resources. Most police departments also lack a method for formally evaluating and comparing the performance of competing district plans, instead relying on the judgement and intuition of police planners. However, given the complexities of the police districting problem, it is unlikely that an optimal districting plan will be chosen by chance using this method. Moreover, as these districts or zones are almost always formed for some form of jurisdiction, the districts formed should be compact in space in order to facilitate the application of laws and regulations within the district.
4.2.2 Proposed Algorithm
Districting is the process of dividing a larger geographic space into smaller regions, districts, or zones. In our heuristic-based clustering method:
3. Each cluster is grown by adding neighbouring polygons to it, chosen based on a heuristic value F (a sketch of this growth step follows below).
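Since the heuristic value F is not reproduced in this report section, the sketch below substitutes a simple workload-balancing rule for it; the inputs (a neighbour map and per-polygon workloads) are likewise assumptions.

```python
# Hedged sketch of growing district clusters from seed polygons. The real
# heuristic F is not reproduced here; as a stand-in, the cluster with the
# lightest workload grows first and absorbs its lightest unassigned neighbour,
# keeping workloads roughly balanced. `neighbors` maps a polygon id to its
# adjacent polygon ids; `workload` maps a polygon id to its crime workload.

def grow_districts(seeds, neighbors, workload):
    clusters = {s: {s} for s in seeds}
    unassigned = set(workload) - set(seeds)
    while unassigned:
        grew = False
        for cid in sorted(clusters,
                          key=lambda c: sum(workload[p] for p in clusters[c])):
            frontier = {n for p in clusters[cid]
                        for n in neighbors[p]} & unassigned
            if frontier:
                pick = min(frontier, key=lambda p: workload[p])
                clusters[cid].add(pick)
                unassigned.discard(pick)
                grew = True
                break
        if not grew:  # disconnected polygons remain; stop rather than loop
            break
    return clusters
```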
4.2.3 Model Work
In this work we use four different types of neural networks to forecast crime, which are described in this section in order of increasing complexity. The specifics of data preparation are discussed in Section 4.3, but the basic format is as follows for the purpose of this section. Each city is split into grid cells (beats for Chicago and a square grid for Portland), and within each cell there is a set of features corresponding to a certain day. One record contains all the features of all cells for a given day. The prediction of the crime count for the next day is done by predicting one of some number of bins, B, for the count in each cell (e.g. crime count in ranges 0-5, 6-10, etc.).
4.2.4 Feed forward network
The first model we use is the simplest, a feed forward network. This network consists
of several layers of units, where the first layer is the input, the last layer is the output,
and the inner layers are called hidden layers. The input and output layers have fixed
sizes. The size of the input layer corresponds to the number of features in the data, and
the output layer has the same number of units as the number of possible predictions.
The hidden layers have no connection to the data or prediction, hence the name, and so
the number of hidden layers and their sizes are determined experimentally. Increasing the number of hidden units increases the network’s ability to learn, but too many hidden units tend to cause overfitting.
Each unit is connected to every unit in the previous and following layers by a weight (see Figure 1). Each data record is inserted into the input layer, and then, using the network
weights, the values are fed forward through the network until the output layer. The
values are fed from one layer to another by performing a matrix multiplication (which
includes a bias) and then putting the results of this through a nonlinear activation
function. In our work, we use rectified linear units for the hidden layers and a softmax
activation for the final layer. The softmax is key for our predictions, as it gives the
probability of each classification value.
$$z_t = f(x_t; W, b)$$
where $x_t$ is one input record at time $t$ and $f$ is a nonlinear function representing the neural network, which is parametrized by the matrix $W$ representing the weights for the layer and by the bias vector $b$. Let $C$ be the number of spatial units (cells or beats). The softmax layer takes the final activation $z_t = (z_{t1}, z_{t2}, \ldots, z_{tC}) \in \mathbb{R}^{B \cdot C}$, where $z_{tc} \in \mathbb{R}^{B}$ for $c = 1, \ldots, C$, as input, and outputs a vector of probabilities $o$ for each bin, where the probability of bin $j \in \{1, \ldots, B\}$ is

$$o_{tcj} = \frac{\exp(z_{tcj})}{\sum_{k=1}^{B} \exp(z_{tck})}.$$

The loss for cell $c$ at time $t$ is the cross-entropy

$$L_{tc} = -\sum_{j=1}^{B} y_{tcj} \log o_{tcj},$$

where $y_{tcj} = 1$ if the crime count corresponding to record $x_t$ in cell $c$ falls in bin $j$ and 0 otherwise. Of course, we fit the model for each beat and input record, and so our loss function is given by

$$L = \sum_{t} \sum_{c=1}^{C} L_{tc}. \tag{1}$$
The loss is back-propagated through the network by taking derivatives of the activation
functions and the matrix multiplication by use of the chain rule. The backpropagation
calculates the gradient, and the weights are updated accordingly. These equations are
applicable to all subsequent models. During training, it is most efficient to split the
training records into minibatches and then to calculate the gradient for each record in
parallel. After the gradients are all computed, the average is taken and the network
weights are updated before the next minibatch is introduced.
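A minimal sketch of this feed forward network with a per-cell softmax, written in PyTorch (the framework choice and all sizes are illustrative assumptions; the cross-entropy call folds the softmax of the equations above into the loss):

```python
# Minimal sketch of the feed forward model in PyTorch (framework and sizes
# are illustrative assumptions). C cells, B count bins per cell.
import torch
import torch.nn as nn

C, B, n_features = 274, 10, 120  # illustrative sizes only

class CrimeFFN(nn.Module):
    def __init__(self, n_features, hidden=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),  # hidden layers use ReLU
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, C * B),                  # one B-way head per cell
        )

    def forward(self, x):
        return self.body(x).view(-1, C, B)  # z_t reshaped to (batch, C, B)

model = CrimeFFN(n_features)
x = torch.randn(32, n_features)        # one minibatch of records
y = torch.randint(0, B, (32, C))       # true bin index per cell
# Cross-entropy applies the per-cell softmax and sums L_tc over t and c.
loss = nn.CrossEntropyLoss()(model(x).view(-1, B), y.view(-1))
loss.backward()                        # backpropagation of Eq. (1)
```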
4.2.5 Convolutional network
Convolutional networks were built to handle images, but for the purposes of our
problem, we can think of a map of a city as an “image.” In this “image,” each grid cell
corresponds to a pixel and the value of each different feature corresponds to the pixel
value for that feature’s channel. A convolutional network works by first sliding a
number of smaller filters over an image in the convolutional layers. This computes a
dot product between the weights in each filter and the local region of the image that the
filter is currently covering. An activation function is applied to the dot product, and the
resulting pixel grid is downsampled in the spatial dimensions by means of max pooling.
Max pooling works by sliding a smaller grid over the pixels and only keeping the value
of the highest responding pixel in that grid. This sequence of operations is performed
some number of times until the final pixel grid is flattened, and then used as input to a
traditional feed forward network for classification.
Since the weights of the filters are fixed as they pass over the pixels, the filters in the
lower levels learn local features seen at a small spatial scale. As the pixels are down
sampled, the later filters pick up on global features built from the lower level ones.
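A sketch of this convolution, max pooling, convolution sequence, again in PyTorch as an assumption, with sizes matching those reported in Section 4.4 (a 30 x 10 grid, 8 channels, 64 filters, and a 4800-unit flattened layer):

```python
# Sketch of the convolutional variant (PyTorch as an assumption). The city
# map is an "image" with one channel per feature. Sizes follow Section 4.4:
# a 30 x 10 grid, 8 channels, 64 filters, and a 4800-unit flattened layer.
import torch
import torch.nn as nn

class CrimeCNN(nn.Module):
    def __init__(self, channels=8, C=274, B=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 30x10 -> 15x5
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),                      # 64 * 15 * 5 = 4800
            nn.Linear(4800, 2400), nn.ReLU(),  # hidden layer at half the size
            nn.Linear(2400, C * B),
        )

    def forward(self, x):  # x: (batch, channels, 30, 10)
        return self.head(self.features(x))

out = CrimeCNN()(torch.randn(4, 8, 30, 10))  # -> shape (4, 274 * 10)
```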
4.3 Model specific
4.3.1 Data and preparation
The datasets we use are in the areas of crime, public transportation, weather, and census data. For Chicago, the first two are publicly available through the City of Chicago’s data portal website, https://data.cityofchicago.org/. For Portland, the crime data was obtained from the National Institute of Justice Real-Time Crime Forecasting Challenge. Unfortunately, there was no source for public transport data in Portland. The latter two datasets are also publicly available: the census data is available through the United States Census Bureau at https://www.census.gov/data.html, and the weather data is available through the National Oceanic and Atmospheric Administration at https://www.ncdc.noaa.gov/. The details of each dataset are discussed next.
We split each city into grids, and so all this data must be appropriately mapped to the
grids. For Chicago, we have 274 grid cells, corresponding to the 274 police beats in the
city. The city provides a shapefile of the beats, and so all the datasets that include
latitude and longitude can easily be mapped into the correct police beat. For Portland,
only the districts are provided, and these are too large to be useful for predictions.
Instead, we create our own uniform grid with given maximum/minimum latitudes and
longitudes. We do not use the maximum/minimum of the city limits, since this includes
many irrelevant areas such as rivers or the airport, but rather manually choose these
values.
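Mapping a coordinate into such a uniform grid can be sketched as follows; the bounds are hypothetical stand-ins for the manually chosen values:

```python
# Sketch of mapping a coordinate onto the uniform Portland grid. The bounds
# are hypothetical stand-ins for the manually chosen values.
LAT_MIN, LAT_MAX = 45.43, 45.60
LON_MIN, LON_MAX = -122.75, -122.55
N_ROWS, N_COLS = 30, 10  # 300 cells

def to_cell(lat, lon):
    """Return the (row, col) grid cell for a coordinate, or None if outside."""
    if not (LAT_MIN <= lat < LAT_MAX and LON_MIN <= lon < LON_MAX):
        return None
    row = int((lat - LAT_MIN) / (LAT_MAX - LAT_MIN) * N_ROWS)
    col = int((lon - LON_MIN) / (LON_MAX - LON_MIN) * N_COLS)
    return row, col

print(to_cell(45.52, -122.68))
```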
For the crime-specific data for Chicago, we use roughly 6 million records going back to 2001, where each record is a reported crime with many informative fields. Of course,
geographic information is provided, such as longitude, latitude, beat, district, etc., but
there are also fields regarding the type of crime and binary fields, such as if an arrest
was made or if it was a domestic crime. These records are all first aggregated by beat
and day to form the input for our networks. This means the input to the networks
corresponds to a given day, and the true labels are the counts of reported crimes for
each beat. We place the true counts into one of ten bins and we use these bins for
classification. Figure 5 shows the counts of each bin over all days for three Chicago
beats representative of beats with low, medium, and high standard deviations for crime
count. There is no significant change in the distributions over the timeframe of the
dataset.
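A sketch of this aggregation and binning with pandas; the column names follow the public portal's conventions but should be treated as assumptions:

```python
# Sketch of aggregating records by beat and day, then binning daily counts
# into the ten classes used for classification. Column names ("Beat", "Date")
# follow the public portal's conventions but are assumptions here.
import pandas as pd

df = pd.read_csv("chicago_crimes.csv", parse_dates=["Date"])
daily = (df.assign(day=df["Date"].dt.date)
           .groupby(["Beat", "day"]).size()
           .rename("count").reset_index())

# Bin edges matching 0, 1-3, 4-6, 7-9, 10-12, 13-15, 16-20, 21-24, 25-29, 30+.
edges = [-1, 0, 3, 6, 9, 12, 15, 20, 24, 29, float("inf")]
daily["bin"] = pd.cut(daily["count"], bins=edges, labels=False)
print(daily.head())
```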
Aside from just aggregating the total count, we also obtain counts for other fields
including different crime primary types (e.g. theft), number of arrests/nonarrests, and
domestic crimes. Including different primary types is important because they behave
differently from the total count and from one another. Figure 6 shows the different
distributions for three different crime primary types, for which we use Assault, Theft, and Narcotics type crimes.
For Portland, the dataset is not as rich, but it has several of the same fields. The provided
information does not include all reported crimes, but rather calls for service that the
police received. The calls are broken into groups based on type, and they include
coordinates that allow for easy mapping. Thus for Portland we are predicting the
number of calls for service, rather than the raw crime count. Unlike Chicago, the
Portland data does not include any fields indicating whether or not an arrest was made or if the crime was domestic, so these features are missing.
The public transportation data is only used for Chicago, as no similar datasets are available for Portland. The data provides station and line information for the Chicago Transit Authority (CTA) trains and buses. Specifically, we use the daily ridership numbers for each train and bus line, as well as the number of entries at each train station. Total ridership numbers are determined by summing up the entries at all stops along a given bus or train line. This number is clearly not tied directly to the beats, but these
numbers do affect the beats that each line passes through. The latitude and longitude of each train station is available, and so each station is mapped to a beat, and then the number of entries is summed up for each beat.

Figure: Choropleth maps for counts of Assault, Theft, and Narcotics crime types in the Chicago beats over all records. Blue is the lowest count and red is the highest.
The weather data is available at various weather stations across the country, but some
of these stations have more complete records than others. To map the data for each city,
at each day we find the nearest available weather station with an entry in the relevant
field and assign it to the beat or cell. The NOAA provides a large number of fields, but the majority are incomplete and so are not usable for our purposes. The weather fields we use are maximum temperature, minimum temperature, midday temperature, precipitation, snowfall, and snow depth.
The census data is more static than the other datasets mentioned, and the only one that
is not provided at a daily level. We use data from the last two censuses, 2000 and 2010.
The census information we use for each day is the one that was taken closest to that
day. The census tracts are mapped into the beats and cells in a similar manner to the
train stations from the transportation data. There are too many features from this dataset to list here, but they include socioeconomic fields such as income, racial makeup, family information, age, etc.
For Chicago, we have the following feature breakdown. For FFN, for each beat there
are approximately 40 crime specific features (e.g. number of arrests made and counts
of crime sub-types), 6 weather features, 2 public transportation features, and around 40
census features. About 10 of these crime features contain the history of the beat, such
as the running average of the crime count for the past few days. There are also non-beat
features formed from the crime and transportation datasets. For non-beat specific
transportation features, we use the total ridership numbers for each bus and train line,
of which there are approximately 200. Additionally, this dataset includes day-type information: day of week, month of year, and day classification as a weekday, weekend, or holiday. For crime, we also include aggregation counts of different crime
sub-types at the larger spatial region of police districts and community areas. For
models using a CNN, the top spatially correlated variables are used as channels. The
spatial correlation is calculated by determining the Global Moran Index of each feature,
which takes into account both the feature’s value and location. Since having too many
channels can greatly increase the computational cost, only the top eight spatially
correlated variables are chosen for the CNN: the total count from the previous day,
arrests made, domestic crimes, crimes without arrests, narcotics crimes, theft crimes,
assault crimes, and CTA boardings. For the RNN models, the features are essentially
the same as the FFN models, except the beat’s historical features are no longer needed
since they are captured by the hidden state. For Portland, the weather and census
features are the same, and there is no transportation data set. There are around 30 crime
specific features per beat, and there are no non-beat features as with the Chicago data
set.
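A minimal sketch of the Global Moran Index computation used for this ranking, assuming a binary neighbour-weights matrix with a zero diagonal:

```python
# Sketch of the Global Moran Index used to rank features by spatial
# correlation. `w` is a (C, C) binary neighbour-weights matrix with a zero
# diagonal and `x` holds one feature value per beat; both are assumed inputs.
import numpy as np

def morans_i(x, w):
    x = np.asarray(x, dtype=float)
    z = x - x.mean()                  # deviations from the mean
    num = (w * np.outer(z, z)).sum()  # sum_ij w_ij * z_i * z_j
    den = (z ** 2).sum()
    return len(x) / w.sum() * num / den

# Features with the highest index would be kept as CNN channels.
```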
4.4 Experiments
In this section, we discuss three experiments designed to test the accuracy and insights
of our models. The purpose of the first experiment is to establish the best network
structure. The second experiment is about the importance of various datasets used. The
final experiment focuses on the importance of the features.
The first experiment uses all of the data sources. In addition to the total crime count
in each beat, we also ran experiments for three sub-types in Chicago. Types 1, 2, and 3
correspond to violent crimes (e.g. assault), theft crimes, and narcotics crimes,
respectively. For the FFN, the model has two hidden layers and a softmax layer for each beat on top of these layers. There are ten classes in each softmax layer corresponding to the following count bins: 0, 1-3, 4-6, 7-9, 10-12, 13-15, 16-20, 21-24, 25-29, 30+. Similarly for the RNN, we use two layers, both of which are connected to the previous temporal time steps.
For the CNN models, we use an input grid of size 30 x 10. For Portland we split the
city into 300 cells so filling in the channels is simple, but for Chicago we map the beat
coordinates onto a 30 x 10 grid and combine the features for beats falling in the same
cell. Since 30 x 10 is a small input for a CNN, we simply use the sequence of
convolution, max pooling, and convolution. This means the final output grid size is
halved, 15 x 5, and the flattened output layer is of size 4800 because the last
convolutional layer uses 64 filters. After flattening, there is one hidden layer with half
the dimensionality of the flattened layer. This hidden layer is combined with the feed
forward features.
The classification accuracies for each network and each prediction task are found in Table 1. For each task, the FFN has the lowest accuracies and the RNN+CNN has the
highest accuracies, which is expected. The most common bin, corresponding to 1-3
crimes, occurs for 41% of the records, and so these accuracies are well above this
baseline. Although there are engineered temporal and spatial features for the FFN, the
network structure of the RNN+CNN is better able to account for these aspects of the
problem with a simpler feature set. This is likely because the engineered features have
hard limits, such as average crime counts for the past four days or crime count on the
previous day in the larger police districts, whereas the RNN+CNN model has no such
restrictions.
In addition to classification accuracies, we also calculated the Mean Absolute Scaled
Error (MASE) for each task using the top performing RNN+CNN model. The MASE
is calculated using the equation

$$\mathrm{MASE} = \frac{1}{n} \sum_{t=1}^{n} |q_t|, \qquad q_t = \frac{e_t}{\frac{1}{n-1} \sum_{i=2}^{n} |Y_i - Y_{i-1}|},$$

where $e_t$ is the absolute value of the error at time $t$, $Y_t$ is the actual value for the prediction at time $t$, and $n$ is the total number of predictions made. We use MASE instead
of the more popular MAPE since many actual values are 0 leading to divisions by 0 in
MAPE. Since we use bins instead of exact numerical predictions, the predicted values
used are simply the midpoints of their bins (e.g. 2 for the 1-3 bin). We calculate each qt
for each beat or cell, and then take the average for each task. An error less than one
indicates the forecast is better than the one-step na¨ıve prediction. Using the best models
for each task, we obtained the following MASE values: 0.74 for Chicago total, 0.77 for
Chicago Type 1, 0.78 for Chicago Type 2, 0.83 for Chicago Type 3, and 0.77 for
Portland Total.
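For concreteness, a small sketch of the MASE computation for a single beat or cell, with toy values in place of real predictions:

```python
# Sketch of the MASE computation for one beat or cell. Predicted values are
# bin midpoints (e.g. 2 for the 1-3 bin); the numbers below are toy values.
import numpy as np

def mase(actual, predicted):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    n = len(actual)
    # Denominator: mean absolute error of the one-step naive forecast.
    naive = np.abs(np.diff(actual)).sum() / (n - 1)
    q = np.abs(actual - predicted) / naive
    return q.mean()

print(mase([3, 5, 4, 8, 2], [4, 4, 4, 7, 4]))
```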
Table 1 also shows that the different sub-types have different levels of improvement
for the more complex models. Type 2 crimes, corresponding to theft, have the highest
improvement of all tasks when a CNN is added to the model. On the other hand, Type
3 crimes, narcotics, have a significantly smaller improvement with the addition of only
a CNN. This suggests that spatial aspects are more important for predicting theft crimes
than narcotics. The Type 3 narcotics crimes did have a bigger improvement for models
with an RNN, indicating that temporal patterns are more important for this type of
crime.
For the second experiment, we use the RNN+CNN model for predicting Chicago total
count by systematically removing individual datasets. The size of these networks is the
same as in the previous section, with the exception of a reduction in the input layer size.
In order to leverage the results of the previous experiment, we transfer over the weights
from the input layer in the network corresponding to the remaining datasets as well as
the full weights in the hidden layers. We train using the same walk-forward method as
in the first experiment. The average classification accuracy decreases are reported in
Table 2.
Dataset excluded          Accuracy drop (%)
Public Transportation     2.3
Weather                   0.7
Census                    4.1

Table: Accuracy drop when excluding individual datasets from the RNN+CNN model on total crime count for Chicago.
From the table, it is clear that although each dataset does not contribute equally, all of
them do give some noticeable improvement to the classification accuracy. Census data
offering the largest improvement is not surprising, as there are well known links
between socioeconomic factors and crime rates in communities. Public transportation
also offers a
significant improvement, likely because it gives a sense of the number of people and
traffic through a given beat on given days. With more people passing through a
particular area of the city, there is an increased chance for crime. Weather is the least
informative of the three, since Chicago is a relatively small area and thus fairly
homogeneous with respect to weather patterns, and as a result the values for weather
do not vary much from beat to beat.
For the third and final experiment, we explore the conditions that affect model
performance on the Chicago crime data. For beat performance, the most accurate beats
are those that have consistently low counts, e.g. falling in the first two bins. However,
this causes the model to generally under predict the actual count, since the model
predicts the first two bins under most circumstances. The second most accurate subset
of beats are those with consistently high crime counts. The predictions for the high
count beats slightly trail the low count beats overall, but they are better at predictions
when the actual count changes by one bin and much better when the actual count
changes by two or more bins. The mid count beats have the highest variability, with the
performance values in between those of the high and low count beats. These results are summarized in the table below for the Chicago total crime count with our RNN+CNN model.
Table. Accuracy for different subsets of beats. Low, Mid, and High correspond to the
beats with the lowest, middle, and highest 10% of crime counts.
Next, we study the impact of weather conditions on model accuracy. To carry out this
accuracy test, we split the test records into ten percentile-based groups for each weather
condition. The accuracies of these record subsets are plotted against the centers of the
percentile bins in Figure 8. We find that temperature does not have an effect on the
accuracy any more than random noise around the overall accuracy. Days with higher
amounts of precipitation and snow, however, do decrease the accuracy of the best model
slightly. The bottom 65% and 85% of precipitation and snowfall values respectively
are 0, and so there is no significant difference in accuracy among these values.
However, looking at the days with the highest values of precipitation and snowfall, the
accuracy does drop slightly. It is unclear whether this is because these types of days are
inherently less predictable or because there are fewer training examples for these days.
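A sketch of this percentile-based grouping, assuming a results table holding per-record weather values and a correctness flag:

```python
# Sketch of splitting test records into ten percentile-based groups for one
# weather condition and measuring accuracy per group. `results` is an assumed
# DataFrame with per-record weather values and a boolean "correct" column.
import pandas as pd

def accuracy_by_percentile(results, condition="precipitation"):
    groups = pd.qcut(results[condition], q=10, labels=False, duplicates="drop")
    return results.groupby(groups)["correct"].mean()
```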
The effect of weather conditions on the accuracy seen in Figure 8 is consistent with the results of the second experiment.
Finally, we look at the effect of the public transportation data. While bus routes run
across the whole city and through the vast majority of beats, the train lines run through
a smaller subset of the beats. An interesting result is that the accuracy of a beat containing a train station is on average 1.2% higher than that of its neighboring beats. The
accuracy of a beat which contains one or more train lines passing through it (but not
necessarily containing a station) is 0.5% more accurate than its neighboring beats
provided these beats do not also include a train line. This indicates that both the station
entries and total ridership of the line are important for those beats that contain a train
line and/or station. Also contained in the public transportation data is a flag for the type
of day: weekday, weekend, and holiday. Weekdays saw better performance than
weekends and on average had an accuracy 0.9% higher. They also outperformed
holidays by 1.1%. This shows the model performs better during weekdays, which is not
surprising because there are more events and activities that may happen on weekends
or holidays, leading to less consistent conditions in the city.
5. Future Work
As the data size and the geographical area covered increase, the solution provided by the Crime Data Analytic Platform needs more computation power, so it needs various parallelization and distributed-system techniques. The system currently uses crime data, census block data, census tract data, population data, and race data to do data mining, prediction, police district boundary generation, and patrol beat generation, but it can be extended to integrate other relevant data such as offender residences and serial-killer data.
Also, the proposed algorithm can be further improved to consider the prioritization of calls for service. The proposed algorithm can also be improved to provide police patrol routes, rather than an optimal position for a police car to be located. This can be done by considering different time periods, seasons, and also special occasions like New Year's Eve. Furthermore, using a proper GIS plan, the GIS data can be integrated with the crime data set. In this way, we can significantly improve the precision and the recall of the prediction model trained by the CDAP.
Regarding the visualizer, its capability can be greatly improved by integrating more visualization models such as various graph types. Also, the binding of data from the backend to the frontend can be optimized further, which will improve the user experience significantly. Finally, by providing interactive guidance to the user, the platform can be made usable by anyone, with or without domain knowledge.
References
[8] A. Verma, R. Ramyaa, S. Marru, Y. Fan, and R. Singh, “Rationalizing police patrol beats using Voronoi tessellations,” IEEE Intl. Conf. on Intelligence and Security Informatics (ISI), pp. 165–167, 2010.
[9] J. E. Eck and D. L. Weisburd, “Crime places in crime theory,” Crime and Place: Crime Prevention Studies, vol. 4, 2015.
[10] H. Chen, W. Chung, J. J. Xu, G. Wang, Y. Qin, and M. Chau, “Crime data mining: a general framework and some examples,” Computer, vol. 37, no. 4, pp. 50–56, 2004.
[13] S. Gopalani and R. Arora, “Comparing Apache Spark and MapReduce with performance analysis using k-means,” International Journal of Computer Applications, vol. 113, no. 1, pp. 8–11, 2015.
[17] Saedsayad.com. (2016) Decision trees. [Online]. Available: http://www.saedsayad.com/decisiontree.htm
[22] M. Pazzani, “On the optimality of the simple Bayesian classifier under zero-one loss,” Machine Learning, vol. 29, no. 2–3, pp. 103–130, 1997.