Cartoonifying An Image
Submitted in partial fulfillment of the requirements
of the degree of
T. E. Computer Engineering
By
Kartik Parekh (37)
Shubh Shah (38)
Guide(s): Ms. Varsha Nagpurkar
University of Mumbai
2020-2021
CERTIFICATE
This is to certify that the project entitled “Cartoonifying An Image” is a bonafide work of
“Kartik Parekh (37), Shubh Shah (38)” submitted to the University of Mumbai in partial
fulfillment of the requirement for the award of the degree of T.E. in Computer Engineering.
Examiners
1.
2.
Date:
Place: Mumbai
Declaration
I declare that this written submission represents my ideas in my own words
and where others' ideas or words have been included, I have adequately cited and
referenced the original sources. I also declare that I have adhered to all principles
of academic honesty and integrity and have not misrepresented or fabricated or
falsified any idea/data/fact/source in my submission. I understand that any
violation of the above will be cause for disciplinary action by the Institute and can
also evoke penal action from the sources which have thus not been properly cited
or from whom proper permission has not been taken when needed.
(Signature)
Kartik Parekh 37
Shubh Shah 38
Date:
Abstract
A user of a cartoon-image retrieval system aims to retrieve, from a database, images relevant to a query
image that contain the same object (for example, a user who has a cartoon image of a realistic photo or a
human face will want all relevant images containing that object). An important step in cartoon image
retrieval is therefore identifying the object within the cartoon image. In this work, an efficient method for
extracting objects from ordinary images is introduced; it is based on general assumptions about the colours
and locations of objects in cartoon images: objects are generally drawn near the centre of the image,
background colours appear more frequently near the edges of the image, and object colours rarely touch
the edges. The processes of colour quantization, seed filling, and object-ghost detection are used. The
results of the conducted tests indicate that the system has promising efficiency for extracting both single
and multiple objects lying on simple and complex backgrounds of cartoon images. We have performed
qualitative and quantitative analysis and tested the system on many images to obtain more accurate
results.
Contents

1 INTRODUCTION
  1.1 Description
  1.2 Problem Formulation
  1.3 Motivation
  1.4 Proposed Solution
  1.5 Scope of the project
2 REVIEW OF LITERATURE
3 SYSTEM ANALYSIS
  3.1 Functional Requirements
  3.2 Non-Functional Requirements
  3.3 Specific Requirements
  3.4 Use-Case Diagrams and description
1 INTRODUCTION

Cartoon images play essential roles in our everyday lives, especially in entertainment, education, and
advertisement, and they have become a topic of increasingly intensive research in multimedia and
computer graphics. Automatic extraction of cartoon objects is very useful in many applications; one of the
most important is cartoon image retrieval, where the user of a retrieval system aims to get images from the
database that are similar to a query image in terms of character (i.e., a user who has a cartoon image will
want all relevant images with the same real-life character). Today, a number of researchers have exploited
concepts from content-based image retrieval (CBIR) to search for cartoon images containing particular
object(s) of interest, and several region-based retrieval methods have been proposed.
Some of the automatic methods that discriminate the region(s) of interest from other, less useful
regions in an image have been adapted to retrieve cartoon characters; they use partial features for
recognizing regions and/or aspects that are suitable for cartoon characterization or gesture recognition.
Some efforts go beyond extracting central objects; others use Salient Object Detection (SOD). In this
work, a simple automatic method for extracting objects from a cartoon image is proposed; it is based on
the assumption that the wanted object is found within or close to the central part of the image.
1.1 Description:
The process carried out here is cartoonization, in which a given realistic image or any human face
is converted with a cartoon-style filter. Any given realistic photo gets converted into a cartoon-style
photo at the click of a 'Convert' button. Essentially, the filter involves smoothing the image and
masking the sharp edges in the converted image.
Cartoons were an integral part of our childhood; we were all very fond of cartoons and animated
characters. We therefore propose a system with which you can convert any given image (even an image
of yourself) into a cartoonified version of it. We have used one of the simpler ML-style algorithms to
convert an image into a proper cartoonified version of it.
1.3 Motivation:
The following observations motivated this work:
The literature review reveals that there are varied gaps in the study of converting an image to
a cartoon image.
An obvious disadvantage of smoothing is the fact that it does not only smooth noise, but also
blurs important features such as edges and, thus, makes them harder to identify.
Linear diffusion filtering dislocates edges when moving from finer to coarser scales.
To implement multiple bilateral filters.
To apply multiple values to the existing parameters.
1.4 Proposed Solution:
In this project, we have proposed a technique wherein we import an image and then run it through
multiple processing steps to reach the final cartoonified output. First we convert the image to grayscale;
then we use an edge detector to detect the sharp edges of the image. After the edges are
extracted, we blur the image to give it more of a cartoon effect, and finally we apply a colour
quantization process to convert the image into a cartoon-style painting, after which our final output is
ready.
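The following sketch strings these steps together with OpenCV; the parameter values are only illustrative, and each step is explained in detail in the Design and Implementation chapters:

import cv2

img = cv2.imread("input.jpg")                          # load the photo (file name is an example)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)           # step 1: convert to grayscale
gray = cv2.medianBlur(gray, 5)                         # step 2: suppress noise before edge detection
edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                              cv2.THRESH_BINARY, 9, 5)  # step 3: detect and emphasize edges
# (a K-means colour quantization step, shown later, can additionally be applied to img here)
color = cv2.bilateralFilter(img, 9, 200, 200)          # step 4: smooth colours while keeping edges sharp
cartoon = cv2.bitwise_and(color, color, mask=edges)    # step 5: combine the edge mask with the smoothed colours
cv2.imwrite("cartoon.jpg", cartoon)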
1.5 Scope of the project:
The project showed that an image can be successfully converted into a cartoon-style image with the help of a
colour quantization process, masking of the edges, and a smoothing effect; video
clips were also transformed into animation clips with the help of the Python library cv2 (OpenCV). In future
work, we would like to focus more on generating a well-defined HD portrait image; we used a
loss function but still failed to reach that result. We also plan to focus more on video conversion so that we
get HD or 4K quality video, which will be more beneficial. People around the globe like to see better
quality images, movies, and other visuals, so achieving HD quality remains our main focus and
a scope for extending this project.
2 REVIEW OF LITERATURE:
We have referred to the following three research papers for more information on our project topic,
Cartoonifying An Image.
Paper 1:
Paper 2:
Paper 3:
Dataset: We used around 1000 images (including realistic images, some real-life photos, and human faces), which
were run through our system for testing and measuring accuracy. We used different kinds of images, such as light-coloured
and dark-coloured images, to check whether the behaviour changes. The images taken into consideration and
their corresponding outputs are shown in Fig 1. All the images taken into consideration gave a cartoonified output.
Libraries in Python:
numpy==1.19.5
opencv-python==4.5.1.*
scipy==1.6.3
3.2 Non-Functional Requirements: These are essentially the quality constraints that the system
must satisfy according to the project contract. The priority or extent to which these factors are
implemented varies from one project to another. They are also called non-behavioural requirements.
The processing of each request should be done within 10 seconds.
The system should provide high accuracy.
The image should be clear.
The system should be user friendly.
Hardware:
The hardware environment consists of the following:
CPU: Intel Pentium IV 600MHz or above
Mother Board: Intel 810 or above
Hard disk space: 20GB or more
Display: Color Monitor
Memory: 128 MB RAM
Other Devices: Keyboard, mouse
Client side:
Monitor screen: The software shall display information to the user via the monitor screen
Mouse: The software shall interact with the movement of the mouse and the mouse buttons.
The mouse shall activate areas for data input, command buttons and select options from menus.
Keyboard: The software shall interact with the keystrokes of the keyboard.
Software:
Development Tools:
Front End: Django
Back End: Python
Operating System: Windows 10
The actual program that will perform the operations is written in Python.
(Fig 1)
(Fig 2)
4.2 Functional Diagram:
(Fig 3)
5 DESIGN
Although Canny is an excellent edge detector that we could use in many cases, in our code we will use
a thresholding method that gives us more satisfying results. It uses a threshold pixel value to convert a
grayscale image into a binary image. For instance, if a pixel value in the original image is above the
threshold, it will be assigned 255; otherwise, it will be assigned 0, as we can see in the following
image.
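As a small illustration (assuming img is the loaded BGR photo and 127 is an arbitrary threshold value), plain global thresholding in OpenCV looks like this:

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# every pixel brighter than 127 becomes 255 (white); every other pixel becomes 0 (black)
ret, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)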
However, a simple threshold may not be good if the image has different lighting conditions in different
areas. In this case, we opt to use cv2.adaptiveThreshold() function which calculates the threshold for
smaller regions of the image. In this way, we get different thresholds for different regions of the same
image. That is the reason why this function is very suitable for our goal. It will emphasize black edges
around objects in the image.
So, the first thing that we need to do is to convert the original colour image into a grayscale image. Also,
before the threshold, we want to suppress the noise from the image to reduce the number of detected
edges that are undesired. To accomplish this, we will apply the median filter which replaces each pixel
value with the median value of all the pixels in a small pixel neighbourhood. The
function cv2.medianBlur() requires only two arguments: the image on which we will apply the filter and
the size of the filter.
The next step is to apply the cv2.adaptiveThreshold() function. As the parameters for this function we need to
define:
max value, which will be set to 255
cv2.ADAPTIVE_THRESH_MEAN_C: the threshold value is the mean of the neighbourhood area.
cv2.ADAPTIVE_THRESH_GAUSSIAN_C: the threshold value is the weighted sum of
neighbourhood values where the weights are a Gaussian window.
Block Size – it determines the size of the neighbourhood area.
C – a constant which is subtracted from the calculated mean (or the weighted mean).
For better illustration, let’s compare the differences when we use a median filter, and when we do not
apply one.
# adaptive threshold without median filtering
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 9, 5)

# adaptive threshold with a 5x5 median filter applied first
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray_1 = cv2.medianBlur(gray, 5)
edges = cv2.adaptiveThreshold(gray_1, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 9, 5)
As you can see, we obtain much better results when we apply a median filter first. Naturally, edge
detection is still not perfect. One idea that we will not explore here, and that you can try on your
own, is to apply morphological operations on these images; for instance, erosion-style operations can help
eliminate tiny lines that are not part of a large edge.
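This is not used in the project, but as a rough sketch of that idea: since the edge mask above has dark edges on a white background, a morphological closing with a small kernel (equivalently, an erosion of the inverted mask) removes isolated dark specks:

import numpy as np

kernel = np.ones((3, 3), np.uint8)
# closing = dilation followed by erosion; tiny dark specks on the white background disappear
edges_clean = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)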
2. Image filtering
Now we need to choose a filter that is suitable for converting an RGB image into a color painting or a
cartoon. There are several filters that we can use. For example, if we choose to
use cv2.medianBlur() filter we will obtain a solid result. We will manage to blur the colors of the image
so that they appear more homogeneous. On the other hand, this filter will also blur the edges and this is
something that we want to avoid.
The most suitable filter for our goal is a bilateral filter because it smooths flat regions of the image while
keeping the edges sharp.
Bilateral filter
Bilateral filter is one of the most commonly used edge-preserving and noise-reducing filters. In the
following image you can see an example of a bilateral filter in 3D when it is processing an edge area in
the image.
Similarly to the Gaussian filter, the bilateral filter replaces each pixel value with a weighted average of nearby
pixel values. However, the difference between these two filters is that a bilateral filter also takes into account
the variation of pixel intensities in order to preserve edges. The idea is that two pixels should influence each
other not only when they occupy nearby spatial locations, but also when they have similar intensity levels.
To better understand this, let’s have a look at the following equation:
BF[I]_p = (1 / W_p) · Σ_{q ∈ S} G_σs(‖p − q‖) · G_σr(|I_p − I_q|) · I_q

where the normalization factor W_p is

W_p = Σ_{q ∈ S} G_σs(‖p − q‖) · G_σr(|I_p − I_q|)

Here, the factor 1/W_p normalizes the weighted average taken over the neighbourhood S of the pixel p.
Parameters σ_s and σ_r control the amount of filtering. G_σs is a spatial Gaussian function that controls the
influence of distant pixels, and G_σr is a range Gaussian function that controls the influence of pixels with
an intensity value different from the central pixel intensity I_p. So, this function makes sure that only
those pixels with similar intensities to the central pixel are considered for smoothing. Therefore, it will
preserve the edges, since pixels at edges will have large intensity variation.
Now, to visualize this equation let’s have a look at the following image. On the left we have an input
image represented in 3D. We can see that it has one sharp edge. Then, we have a spatial weight and a
range weight function based on pixel intensity. Now, when we multiply range and spatial weights we
will get a combination of these weights. In that way the output image will still preserve the sharp edges
while flat areas will be smoothed.
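In code, applying the bilateral filter and combining it with the edge mask computed earlier can look like the following (the filter values are illustrative and can be tuned):

color = cv2.bilateralFilter(img, d=9, sigmaColor=200, sigmaSpace=200)
cartoon = cv2.bitwise_and(color, color, mask=edges)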
This is our final result, and you can see that indeed we do get something similar to a cartoon or a comic
book image. Would you agree that this looks like Superman from coloured comic books?
4. Creating a cartoon effect using color quantization
Another interesting way to create a cartoon effect is by using the color quantization method. This method
will reduce the number of colours in the image and that will create a cartoon-like effect. We will perform
colour quantization by using the K-means clustering algorithm for displaying output with a limited
number of colours.
First, we need to define color_quantization() function.
def color_quantization(img, k):
    # Defining input data for clustering
    data = np.float32(img).reshape((-1, 3))
    # Defining criteria
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    # Applying cv2.kmeans function
    ret, label, center = cv2.kmeans(data, k, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
    center = np.uint8(center)
    result = center[label.flatten()]
    result = result.reshape(img.shape)
    return result
Different values for K will determine the number of colours in the output picture. So, for our goal, we
will reduce the number of colours to 7. Let’s look at our results.
img_1 = color_quantization(img, 7)
cv2_imshow(img_1)
Not bad at all! Now, let’s see what we will get if we apply the median filter on this image. It will create
more homogeneous pastel-like colouring.
blurred = cv2.medianBlur(img_1, 3)
cv2_imshow(blurred)
And finally, let’s combine the image with detected edges and this blurred quantized image.
cartoon_1 = cv2.bitwise_and(blurred, blurred, mask=edges)
cv2_imshow(cartoon_1)
For better comparison let’s take a look at all our outputs.
So, there you go. You can see that our Superman looks pretty much like a cartoon superhero.
5.2 User Interface Design
Streamlit is a Python framework that lets you build web apps for data science projects very quickly.
You can easily create a user interface with various widgets in a few lines of code. Furthermore,
Streamlit is a great tool for deploying machine learning models to the web and for adding rich
visualizations of your data. Streamlit also has a powerful caching mechanism that optimizes the
performance of your app. Furthermore, Streamlit Sharing is a service provided freely by the library's
creators that lets you easily deploy and share your app with others.
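To illustrate how compact such an interface can be, here is a minimal sketch of a Streamlit app built around the pipeline described earlier (the widget labels and filter parameters are our own illustrative choices, not the project's exact code):

import cv2
import numpy as np
import streamlit as st

def cartoonify(bgr):
    # edge mask + bilateral smoothing, as described in the Design chapter
    gray = cv2.medianBlur(cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY), 5)
    edges = cv2.adaptiveThreshold(gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, 9, 5)
    color = cv2.bilateralFilter(bgr, 9, 200, 200)
    return cv2.bitwise_and(color, color, mask=edges)

st.title("Cartoonify An Image")
uploaded = st.file_uploader("Upload a photo", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    # decode the uploaded bytes into an OpenCV BGR image
    img = cv2.imdecode(np.frombuffer(uploaded.read(), np.uint8), cv2.IMREAD_COLOR)
    cartoon = cartoonify(img)
    # Streamlit expects RGB, so convert before displaying
    st.image(cv2.cvtColor(cartoon, cv2.COLOR_BGR2RGB), caption="Cartoonified output")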
(Fig 6)
(Fig 7)
(Fig 8)
(Fig 9)
6 IMPLEMENTATION
To create a cartoon effect, we need to pay attention to two things; edge and colour palette. Those are
what make the differences between a photo and a cartoon. To adjust that two main components, there are
Before jumping to the main steps, don’t forget to import the required libraries in your notebook,
especially cv2 and NumPy.
import cv2
import numpy as np

# required if you use Google Colab
from google.colab.patches import cv2_imshow
from google.colab import files
1. Load Image
The first main step is loading the image. Define a read_file function, which loads the image from disk
and displays it.
I chose the image below to be transformed into a cartoon.
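A minimal sketch of that read_file helper, reusing the imports above (the file name below is only an example):

def read_file(filename):
    # load the image from disk in BGR order and display it in the notebook
    img = cv2.imread(filename)
    cv2_imshow(img)
    return img

img = read_file("photo.jpg")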
Commonly, a cartoon effect emphasizes the thickness of the edges in an image. We can detect the edges in
an image by defining an edge_mask function. In that function, we transform the image into grayscale. Then, we
reduce the noise of the blurred grayscale image by using cv2.medianBlur. A larger blur value means fewer
black noise specks appear in the image.
And then, we apply the adaptiveThreshold function and define the line size of the edges. A larger line size
means thicker edges will be emphasized in the image.
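A sketch of such an edge_mask function, following the description above (the line size and blur value here are illustrative and can be tuned):

def edge_mask(img, line_size, blur_value):
    # grayscale, median-blur to suppress noise, then adaptive threshold to emphasize edges
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray_blur = cv2.medianBlur(gray, blur_value)
    edges = cv2.adaptiveThreshold(gray_blur, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                  cv2.THRESH_BINARY, line_size, blur_value)
    return edges

line_size = 7
blur_value = 7
edges = edge_mask(img, line_size, blur_value)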
The main difference between a photo and a drawing, in terms of colour, is the number of distinct
colours in each of them. A drawing has fewer colours than a photo. Therefore, we use colour quantization
to reduce the colour palette.
Color Quantization
To do colour quantization, we apply the K-Means clustering algorithm provided by the OpenCV
library. To make the next steps easier, we reuse the color_quantization function defined in the Design chapter.
We can adjust the k value to determine the number of colours that we want to apply to the image.
total_color = 9
img = color_quantization(img, total_color)
In this case, I used 9 as the k value for the image. The result is shown below.
After Color Quantization
Bilateral Filter
After doing colour quantization, we can reduce the noise in the image by using a bilateral filter. It gives a
slightly blurred, sharpness-reduced effect to the image while keeping the edges intact.
There are three parameters that you can adjust based on your preferences:
d – diameter of each pixel neighbourhood
sigmaColor – a larger value of the parameter means larger areas of semi-equal colour.
sigmaSpace – a larger value of the parameter means that farther pixels will influence each other, as long as
their colours are close enough.
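With the quantized image from the previous step stored in img, a call along these lines (the values are illustrative) produces the blurred image used below:

blurred = cv2.bilateralFilter(img, d=7, sigmaColor=200, sigmaSpace=200)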
The final step is combining the edge mask that we created earlier, with the colour-processed image. To do
so, use the cv2.bitwise_and function.
cartoon = cv2.bitwise_and(blurred, blurred, mask=edges)
And there it is! We can see the “cartoon-version” of the original photo below.
Final Result
6.2 Working of the project:
# USAGE
# python train_mask_detector.py --dataset dataset
# partition the data into training and testing splits, using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
    test_size=0.20, stratify=labels, random_state=42)
# load the MobileNetV2 network, ensuring the head FC layer sets are
# left off
baseModel = MobileNetV2(weights="imagenet", include_top=False,
    input_tensor=Input(shape=(224, 224, 3)))
# construct the head of the model that will be placed on top of
# the base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(7, 7))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(128, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)
# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)
# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
    layer.trainable = False
# for each image in the testing set we need to find the index of the
# label with corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)
# serialize the model to disk
print("[INFO] saving mask detector model...")
model.save(args["model"], save_format="h5")
# USAGE
# python detect_mask_image.py --image images/pic1.jpeg
# load the input image from disk, clone it, and grab the image spatial
# dimensions
image = cv2.imread(args["image"])
orig = image.copy()
(h, w) = image.shape[:2]
# pass the blob through the network and obtain the face detections
print("[INFO] computing face detections...")
net.setInput(blob)
detections = net.forward()
face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)
face = cv2.resize(face, (224, 224))
face = img_to_array(face)
face = preprocess_input(face)
face = np.expand_dims(face, axis=0)
def detect_and_predict_mask(frame, faceNet, maskNet):
    # grab the dimensions of the frame and then construct a blob
    # from it
    (h, w) = frame.shape[:2]
    blob = cv2.dnn.blobFromImage(frame, 1.0, (300, 300),
        (104.0, 177.0, 123.0))
    # pass the blob through the network and obtain the face detections
    faceNet.setInput(blob)
    detections = faceNet.forward()
locs.append((startX, startY, endX, endY))
# only make predictions if at least one face was detected
if len(faces) > 0:
    # for faster inference we'll make batch predictions on *all*
    # faces at the same time rather than one-by-one predictions
    # in the above `for` loop
    faces = np.array(faces, dtype="float32")
    preds = maskNet.predict(faces, batch_size=32)
# initialize the video stream and allow the camera sensor to warm up
print("[INFO] starting video stream...")
vs = VideoStream(src=0).start()
time.sleep(2.0)
# detect faces in the frame and determine if they are wearing a
# face mask or not
(locs, preds) = detect_and_predict_mask(frame, faceNet, maskNet)
# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
7. CONCLUSIONS
In the field of face recognition, CNNs have become the main method, where the convolutional layers are
combined into a single model. This combined convolutional neural network gives higher
accuracy than the other algorithms, and it is also quite fast compared with them.
Detection of a masked face showed a higher accuracy rate, and the system is capable of detecting a
person's masked or unmasked face faster, which makes systematic detection of a person more reliable
than visual inspection.
One of the main reasons behind achieving this accuracy lies in MaxPooling. It provides
rudimentary translation invariance to the internal representation along with a reduction in the
number of parameters the model has to learn. This sample-based discretization process down-samples
the input representation (the image) by reducing its dimensionality. The number of
neurons has an optimized value of 64, which is not too high; a much higher number of neurons
and filters can lead to worse performance. The optimized filter values and pool_size help to filter
out the main portion (the face) of the image to detect the existence of a mask correctly without causing
over-fitting.
The system can efficiently detect partially occluded faces either with a mask or hair or hand. It
considers the occlusion degree of four regions – nose, mouth, chin and eye to differentiate
between annotated mask or face covered by hand. Therefore, a mask covering the face fully
including nose and chin will only be treated as “with mask” by the model.
The main challenges faced by the method comprise varying angles and lack of clarity.
Indistinct moving faces in the video stream make detection more difficult. However, following the
trajectories of faces across several frames of the video helps produce a better decision: "with mask" or
"without mask".
Appendix
(Fig 13)
(Fig 14)
(Fig 15)
(Fig 16)
Acknowledgements
We express our deep sense of gratitude to our project guide Ms. Varsha Nagpurkar for encouraging us
and guiding us throughout this project. We were able to successfully complete this project with the
help of her deep insights into the subject and constant help.
We are very much thankful to Dr. Kavita Sonawane, HOD of the Computer Department at St. Francis
Institute of Technology for providing us with the opportunity of undertaking this project which has led
to us learning so much in the domain of Machine Learning.
Last but not least, we would like to thank all our peers who greatly contributed to the completion of
this project with their constant support and help.
List of Figures
List of Abbreviations