Floor Plan Generation Using GAN


TRIBHUVAN UNIVERSITY

INSTITUTE OF ENGINEERING

Khwopa College of Engineering


Libali, Bhaktapur
Department of Computer Engineering

A FINAL REPORT ON
FLOOR PLAN GENERATION USING GAN

Submitted in partial fulfillment of the requirements for the degree

BACHELOR OF COMPUTER ENGINEERING

Submitted by
Anusha Bajracharya KCE074BCT011
Luja Shakya KCE074BCT022
Niranjan Bekoju KCE074BCT025
Sunil Banmala KCE074BCT045

Under the Supervision of


Er. Aayush Adhikari

Khwopa College Of Engineering


Libali, Bhaktapur
April 11, 2022
CERTIFICATE

This is to certify that this major project work entitled “Floor Plan Generation
using GAN”, submitted by Anusha Bajracharya (KCE074BCT011), Luja Shakya
(KCE074BCT022), Niranjan Bekoju (KCE074BCT025) and Sunil Banmala
(KCE074BCT045), has been examined and accepted as partial fulfillment of the
requirements for the degree of Bachelor in Computer Engineering.

..........................................
Er. Anil Verma
External Examiner
Assistant Professor
Dept. of Electronics and Computer
IOE, Pulchowk

..........................................
Er. Aayush Adhikari
Project Supervisor
CTO
Deepmind Creations

..........................................
Er. Dinesh Gothe
Head of Department,
Department of Computer Engineering
Khwopa College of Engineering

Copyright
The author has agreed that the library of Khwopa College of Engineering may make
this report freely available for inspection. Moreover, the author has agreed that
permission for extensive copying of this project report for scholarly purposes
may be granted by the supervisor who supervised the project work recorded herein
or, in their absence, by the Head of the Department in which the project report
was done. It is understood that due recognition will be given to the author of the
report and to the Department of Computer Engineering, KhCE in any use of the
material of this project report. Copying, publication or any other use of this
report for financial gain without the approval of the department and the author’s
written permission is prohibited.
Requests for permission to copy or to make any other use of the material in this
report, in whole or in part, should be addressed to:

Head of Department
Department of Computer Engineering
Khwopa College of Engineering
Libali,
Bhaktapur, Nepal

Acknowledgement
We take this opportunity to express our deepest and sincere gratitude to our HoD,
Er. Dinesh Gothe, for his insightful advice and motivating suggestions for this
project, and also for his constant encouragement and advice throughout our
Bachelor’s program.

Also, we would like to thank Er. Bindu Bhandari for providing valuable sug-
gestions and for supporting the project.

Anusha Bajracharya KCE074BCT011


Luja Shakya KCE074BCT022
Niranjan Bekoju KCE074BCT025
Sunil Banmala KCE074BCT045

Abstract
Whenever a landowner wants to build a house, a design (floor plan) of the house
must be prepared. The landowner needs to decide where the main entrance and
openings will be, how the space will be split into rooms, and what portion of the
building will be set aside for the bedroom, kitchen, bathroom and so on. These are
general questions that come to mind, and to answer them the landowner consults
an architect, who uses different planning tools to generate the plan of the building.
Initially, it is difficult for an architect to produce a plan from nothing. Floor Plan
Generation using GAN was therefore introduced to produce conceptual floor plans
that best suit a parcel of land and provide a vision that can help architects.
Architects can choose among the generated plans and then modify them
accordingly, which is considerably easier than drawing a plan from scratch. To
generate a plan, the system takes the parcel of land from the architect and maps it
to a building footprint, then to a room split, and finally to a furnished plan. The
system uses a conditional GAN for generation and also produces a 3D model of the
generated floor plan. The training datasets were prepared automatically with 55.3%
accuracy for the parcel and footprint pairs and manually for the remainder;
similarly, the furnished datasets were prepared with 98.27% accuracy using
template matching and parameter tuning. The GAN models generated images with
Inception Scores of 1.6629 ± 0.1558 for the footprint, 2.0637 ± 0.1436 for the
room split and 1.7543 ± 0.0949 for the furnished plan, with corresponding FID
scores of 99.148, 55.375 and 65.957 respectively.

Keywords: Conditional GAN, U-Net architecture, 3D model generation

Contents
Copyright . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
List of Abbreviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

1 Introduction 1
1.1 Background Introduction . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Goals and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.5 Scope and Applications . . . . . . . . . . . . . . . . . . . . . . . . . 2

2 Literature Review 3
2.1 AI + Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 House-GAN: Relational Generative Adversarial Networks for Graph-
constrained House Layout Generation . . . . . . . . . . . . . . . . . 3
2.3 Intelligent Home 3D: Automatic 3D-House Design from Linguistic
Descriptions Only . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.4 U-Net: Convolutional Networks for Biomedical Image Segmentation 4
2.5 Double U-Net: A Deep Convolutional Neural Network for Medical
Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.6 A U-Net Based Discriminator for Generative Adversarial Networks . 5
2.7 Image-to-Image Translation with Conditional Adversarial Networks 5
2.8 Unpaired Image-to-Image Translation using Cycle-Consistent Ad-
versarial Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.9 Momentum Batch Normalization for Deep Learning with Small
Batch Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.10 Plan2Scene: Converting Floorplans to 3D Scenes . . . . . . . . . . 6
2.11 Pixels, voxels, and views: A study of shape representations for
single view 3D object shape prediction . . . . . . . . . . . . . . . . 6
2.12 Raster-to-Vector: Revisiting Floorplan Transformation . . . . . . . 7
2.13 SUGAMAN: Describing Floor Plans for Visually Impaired by An-
notation Learning and Proximity based Grammar . . . . . . . . . . 7
2.14 Learning a Probabilistic Latent Space of Object Shapes via 3D
Generative-Adversarial Modeling . . . . . . . . . . . . . . . . . . . 7
2.15 Interactive 3D Modeling with a Generative Adversarial Network . . 8
2.16 Learning Shape Priors for Single-View 3D Completion and Recon-
struction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.17 A Colour Alphabet and the Limits of Colour Coding . . . . . . . . 8
2.18 Fully Convolutional Networks for Semantic Segmentation . . . . . . 9
2.19 Semantic Segmentation using Adversarial Networks . . . . . . . . . 9

2.20 Parsing Floor Plan Images . . . . . . . . . . . . . . . . . . . . . . . 9
2.21 The Rendering Equation . . . . . . . . . . . . . . . . . . . . . . . . 10
2.22 Improved Techniques for Training GANs . . . . . . . . . . . . . . . 10
2.23 GANs Trained by a Two Time-Scale Update Rule Converge to a
Local Nash Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . 10

3 Requirement Analysis 11
3.1 Software Requirement . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Hardware Requirement . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Functional Requirement . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Non-Functional Requirement . . . . . . . . . . . . . . . . . . . . . . 11
3.4.1 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4.2 Maintainability . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4.3 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4.4 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

4 Feasibility Study 13
4.1 Technical Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2 Operational Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.3 Economic Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.4 Time Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

5 Methodology 14
5.1 Agile methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.1.1 Scrum Framework . . . . . . . . . . . . . . . . . . . . . . . . 14
5.2 Workload by Project Members . . . . . . . . . . . . . . . . . . . . . 17

6 System (or Project) Design and Architecture 18


6.1 System Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . 18
6.2 Generation of Floor Plans . . . . . . . . . . . . . . . . . . . . . . . 19
6.2.1 Foot-print generation . . . . . . . . . . . . . . . . . . . . . . 19
6.2.2 Room Split generation . . . . . . . . . . . . . . . . . . . . . 19
6.2.3 Furnishing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
6.3 Qualifying Metrics for Floor Plans . . . . . . . . . . . . . . . . . . 23
6.3.1 Footprint . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
6.3.2 Program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
6.3.3 Orientation . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
6.3.4 Thickness & Texture . . . . . . . . . . . . . . . . . . . . . . 26
6.3.5 Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
6.3.6 Circulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6.4 GAN used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.4.1 pix2pix GAN (Paired Image Translation) . . . . . . . . . . . 29
6.5 Datasets and Color Code . . . . . . . . . . . . . . . . . . . . . . . . 30
6.5.1 Types of Room . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.5.2 Types of Furniture . . . . . . . . . . . . . . . . . . . . . . . 32
6.5.3 Color Coding . . . . . . . . . . . . . . . . . . . . . . . . . . 32
6.6 HSV Color Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.7 Chain Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.8 Template Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

6.9 Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.10 Morphological Operation . . . . . . . . . . . . . . . . . . . . . . . . 36
6.10.1 Erosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.10.2 Dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.10.3 Opening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.10.4 Closing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.10.5 Skeletonization . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.11 Canny Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . 39

7 Experiments 40
7.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7.2 Color Coding to Dataset . . . . . . . . . . . . . . . . . . . . . . . . 40
7.3 Dataset Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.3.1 Augmented Dataset Creation . . . . . . . . . . . . . . . . . 42
7.3.2 Parcel and Footprint Generation . . . . . . . . . . . . . . . . 43
7.3.3 Algorithm for furnished generation using template matching 46
7.4 Dataset Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.4.1 Footprint Qualify . . . . . . . . . . . . . . . . . . . . . . . . 47
7.4.2 Program Qualify . . . . . . . . . . . . . . . . . . . . . . . . 48
7.4.3 Orientation Qualify . . . . . . . . . . . . . . . . . . . . . . . 49
7.5 Parcel Generation from Cadastral . . . . . . . . . . . . . . . . . . . 49
7.6 GAN Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.6.1 Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.6.2 Discriminator . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.6.3 GAN Architectures . . . . . . . . . . . . . . . . . . . . . . . 52
7.7 Condition for the Training . . . . . . . . . . . . . . . . . . . . . . . 52
7.8 Model Comparison between different types of architecture . . . . . 52
7.9 Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7.9.1 Loading the dataset and preparing for training . . . . . . . . 52
7.9.2 Generator Model Architecture U-net . . . . . . . . . . . . . 54
7.9.3 Generator Model Architecture U-net summary . . . . . . . . 55
7.9.4 Defining the generator loss . . . . . . . . . . . . . . . . . . . 56
7.9.5 Training Procedure for the generator . . . . . . . . . . . . . 57
7.9.6 Discriminator Model Architecture . . . . . . . . . . . . . . . 58
7.9.7 Discriminator Model Summary . . . . . . . . . . . . . . . . 59
7.9.8 Defining the Discriminator loss . . . . . . . . . . . . . . . . 60
7.9.9 Training procedure for Discriminator . . . . . . . . . . . . . 61
7.9.10 Generator Model Architecture for Triple-U-net brief . . . . . 62
7.10 Generator Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.10.1 Inception Score . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.10.2 FID Score . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.11 Activation Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.11.1 Sigmoid Activation Function . . . . . . . . . . . . . . . . . . 63
7.11.2 ReLU Activation Function . . . . . . . . . . . . . . . . . . . 64
7.11.3 Leaky ReLU Activation Function . . . . . . . . . . . . . . . 64
7.11.4 Softmax Activation Function . . . . . . . . . . . . . . . . . . 64
7.11.5 Tanh Activation Function . . . . . . . . . . . . . . . . . . . 65
7.12 Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

7.12.1 Binary Cross Entropy . . . . . . . . . . . . . . . . . . . . . 65
7.12.2 Categorical Cross Entropy . . . . . . . . . . . . . . . . . . . 65
7.12.3 Mean Squared Error . . . . . . . . . . . . . . . . . . . . . . 66
7.12.4 Mean Absolute Error . . . . . . . . . . . . . . . . . . . . . . 66
7.13 Optimization Function . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.13.1 Bracketing Algorithm . . . . . . . . . . . . . . . . . . . . . . 67
7.13.2 Local Descent Algorithm . . . . . . . . . . . . . . . . . . . . 67
7.13.3 First Order Algorithm . . . . . . . . . . . . . . . . . . . . . 67
7.13.4 Second Order Algorithm . . . . . . . . . . . . . . . . . . . . 69
7.14 3D Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.15 Convolution Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.15.1 Same padding over valid padding . . . . . . . . . . . . . . . 72
7.15.2 Convolution Layer follows Transpose Convolution . . . . . . 74
7.16 Pooling Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.17 Batch normalization . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.17.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.17.2 These things are considered while using batch normalization 75
7.18 Skip connection in U-net architecture . . . . . . . . . . . . . . . . . 76
7.18.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.18.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.19 Upsampling of an image . . . . . . . . . . . . . . . . . . . . . . . . 76
7.19.1 Upsampling by Nearest Neighbour Method . . . . . . . . . . 76
7.19.2 Upsampling by Bi-linear Interpolation . . . . . . . . . . . . 77
7.20 Inference Engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.20.1 Use case diagram of FPGAN . . . . . . . . . . . . . . . . . . 78
7.20.2 Pre-processing of Cadastral Map to Generate Parcel . . . . . 79
7.20.3 Step by step Generation . . . . . . . . . . . . . . . . . . . . 80
7.20.4 Furniture Mapping . . . . . . . . . . . . . . . . . . . . . . . 81
7.20.5 Wall Segmentation and 3D Generation . . . . . . . . . . . . 82
7.21 Issue: Footprint area are generated outside the Parcel area . . . . . 83

8 Expected Outcomes 85

9 Actual Outcome 87
9.1 Review of ROBIN datasets . . . . . . . . . . . . . . . . . . . . . . . 87
9.2 Accuracy of Automatic Generated Parcel and Footprint . . . . . . . 89
9.3 Parameter Tuning for Template Matching for Furnished Datasets . 90
9.4 Templates for template matching . . . . . . . . . . . . . . . . . . . 90
9.5 Accuracy of furnished datasets . . . . . . . . . . . . . . . . . . . . . 91
9.6 Inception Score of prepared datasets . . . . . . . . . . . . . . . . . 91
9.7 Orientation of the Prepared Datasets . . . . . . . . . . . . . . . . . 91
9.8 Footprint of the Footprint Datasets . . . . . . . . . . . . . . . . . . 91
9.9 Program of the Roomsplit Datasets . . . . . . . . . . . . . . . . . . 92
9.10 Inception Score of Generated Image using U-net . . . . . . . . . . . 92
9.11 Inception Score of Generated Image using Triple U-net . . . . . . . 92
9.12 Comparison of Inception Score and Interpretation . . . . . . . . . . 93
9.13 FID Score of Generated Image using U-net . . . . . . . . . . . . . . 93
9.14 FID score of generated images using triple U-Net . . . . . . . . . . 94

9.15 Unit Testing of Models . . . . . . . . . . . . . . . . . . . . . . . . . 95
9.15.1 U-net Based Models . . . . . . . . . . . . . . . . . . . . . . 95
9.15.2 Triple U-net Based Models . . . . . . . . . . . . . . . . . . . 98
9.16 Integration Testing of Models . . . . . . . . . . . . . . . . . . . . . 101
9.16.1 U-net Based Models . . . . . . . . . . . . . . . . . . . . . . 101
9.16.2 Triple U-net Based Models . . . . . . . . . . . . . . . . . . . 104
9.17 Furniture Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.18 Wall Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
9.19 3D Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

10 Conclusion and Future Enhancements 110

Bibliography 113

Appendix 114
A Mockup Demonstration . . . . . . . . . . . . . . . . . . . . . . . . . 114
B Expected Outcome Screenshots . . . . . . . . . . . . . . . . . . . . 114
B.1 Splash Screen . . . . . . . . . . . . . . . . . . . . . . . . . . 114
B.2 Main Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
B.3 Manual image file with appropriate scale . . . . . . . . . . . 115
B.4 Manual Map upload from Malpot and area marking . . . . . 116
B.5 Free drawing shape for concept design . . . . . . . . . . . . 116
B.6 Constraint given to plan . . . . . . . . . . . . . . . . . . . . 117
B.7 Choose and proceed GAN design . . . . . . . . . . . . . . . 117
B.8 Generate 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
C GAN with U-net Generator and U-net Discriminator . . . . . . . . 118
D Qualification Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . 119
D.1 Orientation of the Prepared Datasets . . . . . . . . . . . . . 119
D.2 Footprint of the Prepared footprint Datasets . . . . . . . . . 120
D.3 Program of the Prepared roomsplit Datasets . . . . . . . . . 121
E Actual Outcome Screenshots . . . . . . . . . . . . . . . . . . . . . 122
E.1 Get Started Page . . . . . . . . . . . . . . . . . . . . . . . . 122
E.2 Upload Cadastral Map Page . . . . . . . . . . . . . . . . . . 122
E.3 After Uploading Cadastral Map . . . . . . . . . . . . . . . . 123
E.4 Preprocessing Cadastral Map for Parcel . . . . . . . . . . . . 123
E.5 Display Parcel and Choosing Model . . . . . . . . . . . . . . 124
E.6 Parcel to Footprint Generation . . . . . . . . . . . . . . . . 124
E.7 Footprint to Roomsplit Generation . . . . . . . . . . . . . . 125
E.8 Roomsplit to Furnished Generation . . . . . . . . . . . . . . 125
E.9 Furniture Mapping . . . . . . . . . . . . . . . . . . . . . . . 126
E.10 Wall Segmentation for 3D Generation . . . . . . . . . . . . . 126
E.11 3D Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
E.12 Complete Result . . . . . . . . . . . . . . . . . . . . . . . . 128

List of Figures
5.1 Agile methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.3 Scrum framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.5 Workload of the project by project members . . . . . . . . . . . . . 17

6.1 Block Diagram of Floor Plan Generation . . . . . . . . . . . . . . . 18


6.2 Generation of footprint . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.3 Room Split from footprint . . . . . . . . . . . . . . . . . . . . . . . 20
6.4 Free Plan Generation . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.5 Program Specific Generation . . . . . . . . . . . . . . . . . . . . . . 21
6.6 Structure Specific Generation . . . . . . . . . . . . . . . . . . . . . 22
6.7 Furnishing the splited room . . . . . . . . . . . . . . . . . . . . . . 22
6.8 Footprint metrics of building footprint . . . . . . . . . . . . . . . . 23
6.9 Query input to the Program . . . . . . . . . . . . . . . . . . . . . . 24
6.10 Result of the Program . . . . . . . . . . . . . . . . . . . . . . . . . . 24
6.11 Program of the generated floorplans . . . . . . . . . . . . . . . . . . 24
6.12 Orientation of the floor plan . . . . . . . . . . . . . . . . . . . . . . 25
6.13 Beaux Arts Hall . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.14 Villa Style Building and floor plan . . . . . . . . . . . . . . . . . . . 26
6.15 Thickness and Texture of the floorplan . . . . . . . . . . . . . . . . 27
6.16 Connectivity of room in floor plan . . . . . . . . . . . . . . . . . . . 27
6.17 Circulation of room in floor plan . . . . . . . . . . . . . . . . . . . . 28
6.18 Image to Image Translation using Conditional GAN . . . . . . . . . 29
6.19 GAN Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
6.20 Architecture of the generator . . . . . . . . . . . . . . . . . . . . . . 30
6.21 Data set from ROBIN Dataset . . . . . . . . . . . . . . . . . . . . . 31
6.22 CVC-FP data set . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
6.23 Types of Furniture and corresponding symbol . . . . . . . . . . . . 32
6.24 Kelly’s 22 Color of Maximum Contrast . . . . . . . . . . . . . . . . 33
6.26 HSV Color cone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.28 Chain code directions . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.29 Templates example . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.30 Threshold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.31 Erosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.32 Dilate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.33 Opening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.34 Closing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.35 Skeletonization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.36 The original image . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.37 Image has been reduced to grayscale, and a 5 * 5 Gaussian filter
with σ = 1.4 has been applied . . . . . . . . . . . . . . . . . . . . . 39
6.38 The intensity gradient of the previous image. The edges of the
image have been handled by replicating. . . . . . . . . . . . . . . . 39
6.39 Non-maximum suppression applied to the previous image. . . . . . . 39
6.40 Double Thresholding applied to the previous image. . . . . . . . . . 39

6.41 Hysteresis applied to the previous image. . . . . . . . . . . . . . . . 39

7.1 Overall system’s architecture being followed. . . . . . . . . . . . . . 40


7.2 Required Paired image of parcel and required footprint . . . . . . . 42
7.3 Principal to generate Paired image of parcel and required footprint 42
7.4 Original dataset from ROBIN Floor Plan datasets. . . . . . . . . . 43
7.5 Augmented Dataset with padding and 30 degree clockwise rotation
placed in a square frame, resized to 512*512. . . . . . . . . . . . . . 43
7.6 Dataset with border area around. . . . . . . . . . . . . . . . . . . . 43
7.7 Final dataset rotated back to normal for further operation of dila-
tion and erosion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
7.8 Result when rotated final image is dilated by stride of 5*5. . . . . . 44
7.9 Result when dilated image is eroded by stride of 5*5. . . . . . . . . 44
7.10 Parcel image of the final dataset. . . . . . . . . . . . . . . . . . . . 44
7.11 Difference of final dataset and parcel. . . . . . . . . . . . . . . . . . 45
7.12 Result when difference image is dilated by stride of 2*2 then eroded
by stride of 127*127. . . . . . . . . . . . . . . . . . . . . . . . . . . 45
7.13 Dilated image is eroded back by stride of 125*125. . . . . . . . . . . 45
7.14 Dilated image is thresholded to get black and white print. . . . . . 45
7.15 Footprint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.16 parcel added to footprint . . . . . . . . . . . . . . . . . . . . . . . . 46
7.17 Paired image of parcel and required footprint. . . . . . . . . . . . . 46
7.18 Block Diagram of Template Matching for furnished generation . . . 47
7.19 Block Diagram of Footprint Qualify . . . . . . . . . . . . . . . . . . 48
7.20 Block Diagram of Program Qualify . . . . . . . . . . . . . . . . . . 48
7.21 Block Diagram of Orientation Qualify . . . . . . . . . . . . . . . . . 49
7.22 Cadastral map from Land Revenue Offices . . . . . . . . . . . . . . 50
7.23 Area of land drawn over cadastral map. . . . . . . . . . . . . . . . . 50
7.24 Area of land selected by the user. . . . . . . . . . . . . . . . . . . . 50
7.25 parcel area formed by placing selected area over background. . . . . 50
7.26 U-net Architecture used for Generator . . . . . . . . . . . . . . . . 54
7.27 Generator Training Procedure . . . . . . . . . . . . . . . . . . . . . 57
7.28 Discriminator Architecture . . . . . . . . . . . . . . . . . . . . . . . 58
7.29 Discriminator Training Procedure . . . . . . . . . . . . . . . . . . . 61
7.30 Triple U-net Architecture . . . . . . . . . . . . . . . . . . . . . . . . 62
7.31 Segmented image required for 3D plotting. . . . . . . . . . . . . . . 70
7.32 Graylevel slicing approach . . . . . . . . . . . . . . . . . . . . . . . 71
7.33 3D Model Plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.34 Convolutional layer convolved feature . . . . . . . . . . . . . . . . . 71
7.35 Convolution of three channel image . . . . . . . . . . . . . . . . . . 72
7.36 Convolution using valid padding . . . . . . . . . . . . . . . . . . . . 72
7.37 Convolution using same padding . . . . . . . . . . . . . . . . . . . . 73
7.38 Transpose Convolution . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.39 checkerboard Effect due to Transpose Convolution . . . . . . . . . . 74
7.41 Pooling layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.42 Nearest neighbour upsampling . . . . . . . . . . . . . . . . . . . . . 77
7.43 Bi-linear interpolation . . . . . . . . . . . . . . . . . . . . . . . . . 77
7.44 Use case diagram of FPGAN . . . . . . . . . . . . . . . . . . . . . . 78

7.45 Sequence diagram of Pre-processing of Cadestral Map to Generate
Parcel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.46 Sequence diagram of Parcel to Furnished Generation . . . . . . . . 80
7.47 Algorithm for Furniture Mapping . . . . . . . . . . . . . . . . . . . 81
7.48 Wall Segmentation and 3D Generation . . . . . . . . . . . . . . . . 82
7.49 Bordered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.50 No bordered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.51 Parcel Line width 3 case#1 . . . . . . . . . . . . . . . . . . . . . . 84
7.52 Parcel Line width 3 case#2 . . . . . . . . . . . . . . . . . . . . . . 84
7.53 Parcel Line width 5 case#1 . . . . . . . . . . . . . . . . . . . . . . 84
7.54 Parcel Line width 3 case#2 . . . . . . . . . . . . . . . . . . . . . . 84

9.1 Templates for template matching . . . . . . . . . . . . . . . . . . . 90


9.2 Unit testing for U-net based footprint generation model #1 . . . . . 95
9.3 Unit testing for U-net based footprint generation model #2 . . . . . 95
9.4 Unit testing for U-net based footprint generation model #3 . . . . . 95
9.5 Unit testing for U-net based roomsplit generation model #1 . . . . 96
9.6 Unit testing for U-net based roomsplit generation model #2 . . . . 96
9.7 Unit testing for U-net based roomsplit generation model #3 . . . . 96
9.8 Unit testing for U-net based furnished generation model #1 . . . . 97
9.9 Unit testing for U-net based furnished generation model #2 . . . . 97
9.10 Unit testing for U-net based furnished generation model #3 . . . . 97
9.11 Unit testing for Triple U-net based footprint generation model #1 . 98
9.12 Unit testing for Triple U-net based footprint generation model #2 . 98
9.13 Unit testing for Triple U-net based footprint generation model #3 . 98
9.14 Unit testing for Triple U-net based roomsplit generation model #1 . 99
9.15 Unit testing for Triple U-net based roomsplit generation model #2 . 99
9.16 Unit testing for Triple U-net based roomsplit generation model #3 . 99
9.17 Unit testing for Triple U-net based furnished generation model #1 . 100
9.18 Unit testing for Triple U-net based furnished generation model #2 . 100
9.19 Unit testing for Triple U-net based furnished generation model #3 . 100
9.20 Integration testing for U-net based footprint generation model #1 . 101
9.21 Integration testing for U-net based roomsplit generation model #1 . 101
9.22 Integration testing for U-net based furnished generation model #1 . 101
9.23 Integration testing for U-net based footprint generation model #2 . 102
9.24 Integration testing for U-net based roomsplit generation model #2 . 102
9.25 Integration testing for U-net based furnished generation model #2 . 102
9.26 Integration testing for U-net based footprint generation model #3 . 103
9.27 Integration testing for U-net based roomsplit generation model #3 . 103
9.28 Integration testing for U-net based furnished generation model #3 . 103
9.29 Integration testing for Triple U-net based footprint generation model
#1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
9.30 Integration testing for Triple U-net based roomsplit generation model
#1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
9.31 Integration testing for Triple U-net based furnished generation model
#1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
9.32 Integration testing for Triple U-net based footprint generation model
#2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

9.33 Integration testing for Triple U-net based roomsplit generation model
#2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9.34 Integration testing for Triple U-net based furnished generation model
#2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9.35 Integration testing for Triple U-net based footprint generation model
#3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.36 Integration testing for Triple U-net based roomsplit generation model
#3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.37 Integration testing for Triple U-net based furnished generation model
#3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.38 Furniture mapping#1 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.39 Furniture mapping#2 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.40 Furniture mapping#3 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.41 Wall segmentation of generated roomsplit #1 . . . . . . . . . . . . 108
9.42 Wall segmentation of generated roomsplit #2 . . . . . . . . . . . . 108
9.43 Wall segmentation of generated roomsplit #3 . . . . . . . . . . . . 108
9.44 Wall segmentation of generated roomsplit #1 . . . . . . . . . . . . 109
9.45 Wall segmentation of generated roomsplit #2 . . . . . . . . . . . . 109
9.46 Wall segmentation of generated roomsplit #3 . . . . . . . . . . . . 109

List of Tables
5.1 Sprint Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . 16

6.1 Chain Code mapping . . . . . . . . . . . . . . . . . . . . . . . . . . 35

7.1 Color coding to datasets . . . . . . . . . . . . . . . . . . . . . . . . 41


7.2 Gray level format in segmented image for 3D generation. . . . . . . 70

8.1 Inception score of each model for each pipeline steps . . . . . . . . . 86

9.1 Different types of footprint shape . . . . . . . . . . . . . . . . . . . 87


9.2 Accuracy of Automatic Generated Parcel and Footprint . . . . . . . 89
9.3 Template Matching for Furnished Datasets . . . . . . . . . . . . . . 90
9.4 Accuracy of furnished datasets . . . . . . . . . . . . . . . . . . . . . 91
9.5 Inception Score of Prepared Datasets . . . . . . . . . . . . . . . . . 91
9.6 Inception Score of Generated Image using U-net after 200K steps
training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
9.7 Inception Score of Generated Image using U-net after 400K steps
training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
9.8 Inception Score of Generated Image using Triple U-net . . . . . . . 92
9.9 Inception Score of Generated Image using U-net after 400k steps
training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
9.10 FID Score of Generated Image using U-net . . . . . . . . . . . . . 93
9.11 FID Score of Generated Image using Triple U-net . . . . . . . . . . 94

List of Abbreviation

Abbreviations Meaning
AI Artificial Intelligence
ASPP Atrous Spatial Pyramid Pooling
cGAN Conditional Generative Adversarial Network
CV Computer Vision
CVC-FP Computer Vision Center - Floor Plan
DNN Deep Neural Network
FID Frechet Inception Distance
FCN Fully Convolutional Network
GAN Generative Adversarial Network
GC-LPN Graph Conditioned - Layout Prediction Network
ISCC-NBS Inter-Society Color Council-National Bureau of Standards
LCT-GAN Language Conditioned Textures - Generative Adversarial Network
LOFD Local Orientation and Frequency Description
OCR Optical Character Recognition
R-FP Real-estate Floor Plan
ROBIN Repository Of BuildIng plaNs
SUGAMAN Supervised and Unified framework using Grammar and
Annotation Model for Access and Navigation
VAE-GAN Variational Autoencoder- Generative Adversarial Network
VCN Volumetric Convolutional Network
VGG net Visual Geometry Group Network

Chapter 1
Introduction
1.1 Background Introduction
During the early phases of conceptual design in architecture, designers often look
for past references in collections of printed or digitally created floor plans in order
to stimulate creativity and inspiration and to assess the building design. The design
of sustainable architecture is also a concern for many architects. Some designers
are concerned about architectural hurdles that can occur with the traditional
approach. Given that the point of AI is to create machines or programs capable of
self-direction and learning, this concern is logical.

However, most experts agree AI has the potential to make architecture easier,
more efficient, and even more secure. The most obvious way to support these
phases with computer-aided means is to provide a retrieval method (e.g., in the
form of a specific software solution) that is able to find similar references in a
collection of previously created building designs.

So, an approach was introduced that provides real-world experience with the
proposed building model without having to break ground. This not only reduces
building costs but also identifies potential risk factors and decreases safety hazards
and the delays that might occur.

1.2 Motivation
Apartment layout is a challenging yet fundamental task for any architect. Know-
ing how to place rooms, decide their size, find the relevant adjacencies among
them, while defining relevant typologies are key concerns that any drafter takes
into account while designing floor plans. While creating new designs, architects
usually go through past designs and the data prepared throughout the making of
the building. Similarly, making building calculations and environmental analysis is
not a simple task if done manually. This leads to wastage of time, money and effort.

So, instead of investing a lot of time and energy to create something new, our
approach is to make a computer able to analyze the data in a short time period
and give recommendations accordingly. With this, an architect will be able to do
testing and research simultaneously, sometimes even without pen and paper.

1.3 Problem Definition
Constructing a building is not a one-day task, as it needs a lot of pre-planning.
However, this pre-planning is sometimes not enough, and more effort is required
to bring an architect’s vision to life. The countless hours of research at the start
of any new architectural design project are time consuming. About seven percent
of the world’s labor force is in the construction industry, yet it has traditionally
been one of the least technologically advanced industries. From this it can be
concluded that using autonomous or semi-autonomous construction machinery to
help with excavation and preparatory work is not enough for sustainable
development in the architectural field.

Computers excel at solving problems with clear answers, crunching data and doing
repetitive tasks, which frees up time for humans to be creative and work on more
open-ended problems, and there is no shortage of those in architectural design.
AI can lead organizations or clients to turn to computers for masterplans and
construction, can explore better building efficiency, and can even walk clients
through a structure before it is built.

1.4 Goals and Objectives


The main objective of this project is:

• To generate a highly diverse set of conceptual floor plan designs

1.5 Scope and Applications


The major applications of this project are:

• To make architectural design easier and more efficient

• To explore better building efficiency using AI for masterplan and construction

Chapter 2
Literature Review
2.1 AI + Architecture
In this thesis [1], Stanislas Chaillou offered promising results to generate, qualify
and allow users to browse through generated floor plan design options. For
qualifying floor plans, he used six metrics, proposing a framework that captures
architecturally relevant parameters of floor plans. On one hand, footprint, orien-
tation, thickness & texture are three metrics capturing the essence of a given floor
plan’s style. On the other hand, program, connectivity, and circulation are meant
to depict the essence of any floor plan organization. For generation of architecture
floor plan, he proposed three steps pipeline which are:

a. Parcel to building Footprint


b. Footprint to Room Split
c. Room Split to Furnished

Each step has been carefully engineered and trained with a pix2pix GAN. Using
an extensive database of Boston’s building footprints and a dataset of around 700+
annotated floor plans, a broad array of models was trained successfully. To further
refine the output quality, an array of additional models, one for each room type
(living room, bedroom, kitchen, etc.), was trained using a color code for each
furniture type.

2.2 House-GAN: Relational Generative Adver-


sarial Networks for Graph-constrained House
Layout Generation
The architecture proposed by Nelson Nauata, et al. [2] introduced a new house
layout generation problem: take an architectural constraint represented as a graph
(i.e., the number and types of rooms with their spatial adjacency) and produce a
set of axis-aligned bounding boxes of rooms. It makes use of the LIFULL HOME’s
database, which offers five million real floorplans, from which 117,587 retrieved
floorplans are used. It uses a bubble diagram to illustrate the number of rooms
with their types and connections in a graph, where

a. nodes encode rooms with their room types, and

b. edges encode their spatial adjacency.

The network then generates a diverse set of realistic and compatible house layouts
as output.
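
To make the graph constraint concrete, the sketch below shows one way such a
bubble diagram could be represented in Python. The room names, ids and
adjacencies are invented for illustration and are not taken from the LIFULL
HOME's data or from House-GAN's actual input format.

```python
# Hypothetical bubble diagram: nodes are rooms (with a type label),
# edges are required spatial adjacencies between rooms.
bubble_diagram = {
    "nodes": {
        1: "living_room",
        2: "kitchen",
        3: "bedroom",
        4: "bathroom",
    },
    "edges": [
        (1, 2),  # living room adjacent to kitchen
        (1, 3),  # living room adjacent to bedroom
        (3, 4),  # bedroom adjacent to bathroom
    ],
}

def neighbours(diagram, node_id):
    """Room ids that the constraint requires to be adjacent to `node_id`."""
    return [b if a == node_id else a
            for a, b in diagram["edges"]
            if node_id in (a, b)]

print(neighbours(bubble_diagram, 1))  # -> [2, 3]
```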

2.3 Intelligent Home 3D: Automatic 3D-House
Design from Linguistic Descriptions Only
Architects design homes by collecting a list of requirements and then generating
the layout of the house using a trial-and-error approach. This takes a lot of time,
so to save time and allow people without expertise to participate in the design,
Chen Qi, et al. proposed this new model [3], which consists of the following
components:

a. Stanford scene graph parser


b. GC-LPN (Graph Conditioned - Layout Prediction Network) based on Graph
Convolutional Network
c. Floor plan post processing
d. LCT-GAN (Language Conditioned Textures - Generative Adversarial Net-
work)
e. 3D scene generation and rendering

These components work sequentially to generate 3D house models automatically
from linguistic descriptions.

2.4 U-Net: Convolutional Networks for Biomed-


ical Image Segmentation
U-Net [4] is a convolutional neural network for fast and precise segmentation of im-
ages. U-Net, evolved from the traditional convolutional neural network, was first
designed and applied in 2015 to process biomedical images. Olaf Ronneberger,
et al. proposed a network that is based on the fully convolutional network,
modified and extended to work with fewer training images and to yield more precise
segmentations. The main idea is to supplement a usual contracting network by
successive layers, where pooling operations are replaced by upsampling operators.
Hence, these layers increase the resolution of the output and successive convolu-
tional layer can then learn to assemble a precise output based on this information.
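
As a rough illustration of the contracting/expanding structure with skip
connections described above, here is a deliberately tiny U-Net-style model written
with tf.keras. The depth, filter counts and input size are placeholders, not the
configuration used in the original paper or in this project.

```python
import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(input_shape=(128, 128, 1), n_classes=2):
    """A minimal U-Net-style sketch: one contracting level, one expanding
    level, and a skip connection between them."""
    inputs = tf.keras.Input(shape=input_shape)

    # Contracting path: convolutions, then pooling halves the resolution.
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(c1)
    p1 = layers.MaxPooling2D()(c1)

    # Bottleneck.
    b = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)

    # Expanding path: upsampling restores resolution; the skip connection
    # concatenates high-resolution features from the contracting path.
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(b)
    u1 = layers.Concatenate()([u1, c1])
    c2 = layers.Conv2D(16, 3, padding="same", activation="relu")(u1)

    outputs = layers.Conv2D(n_classes, 1, activation="softmax")(c2)
    return tf.keras.Model(inputs, outputs)

model = tiny_unet()
model.summary()
```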

2.5 Double U-Net: A Deep Convolutional Neu-


ral Network for Medical Image Segmentation
To improve the performance of U-Net on various segmentation tasks, a novel
architecture called Double U-Net [5] was proposed by Debesh Jha, et al. Double
U-Net consists of a combination of two intertwined U-Nets, with a pretrained
VGG-19 as the encoder of the first U-Net and a non-pretrained custom decoder
block. Similarly, it includes Atrous Spatial Pyramid Pooling, which is employed to
capture contextual information within the network, along with skip connections.
To reduce redundant information, squeeze-and-excitation blocks are used, which
also improve the capture of contextual information.

2.6 A U-Net Based Discriminator for Genera-
tive Adversarial Networks
One of the major challenges for GANs is the capacity to synthesize globally and
locally coherent images with object shapes and textures indistinguishable from
real images. To overcome that, a U-Net based discriminator architecture [6] was
introduced by Edgar Schonfeld, et al., where the discriminator outputs both global
and local decisions on whether the image is real or fake. This discriminator
architecture is created using an encoder-decoder network, i.e. a U-Net. Here, the
encoder downsamples the image and captures global image context; the decoder
then upsamples to produce a matching output resolution, aided by skip
connections.

2.7 Image-to-Image Translation with Conditional


Adversarial Networks
In 2016, Phillip Isola, et al. [7] published their paper Image-to-Image Translation
with Conditional Adversarial Networks. The paper describes the inner workings
of a Conditional Generative Adversarial Network (cGAN), that goes by the name
of pix2pix, designed for general purpose image-to-image translation. The model
is trained in the standard two-step way. First, the generator receives an image
mask as input and outputs a single image, which is fed into the discriminator
along with the input mask; the discriminator then predicts whether this image is
real or fake. The loss of the generator is computed as the mean square of one
minus this prediction. The discriminator is also conditioned on the input: it is
fed the image mask concatenated with the generated image, and a mean squared
error loss is taken between this predicted output and a same-sized tensor of zeros.
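
The following sketch restates the losses exactly as described in the paragraph
above (mean-squared error against a target of one for the generator and zero for
the discriminator on generated pairs). The real-image term of the discriminator
is the usual counterpart, and the arrays are toy values, so this is illustrative
rather than the authors' code.

```python
import numpy as np

def generator_adversarial_loss(d_pred_on_fake):
    # Push D(mask, G(mask)) towards 1: mean square of (1 - prediction).
    return np.mean((1.0 - d_pred_on_fake) ** 2)

def discriminator_loss(d_pred_on_real, d_pred_on_fake):
    # Generated pairs are compared against a tensor of zeros,
    # real pairs against a tensor of ones.
    fake_term = np.mean((d_pred_on_fake - 0.0) ** 2)
    real_term = np.mean((d_pred_on_real - 1.0) ** 2)
    return real_term + fake_term

# Toy discriminator outputs in [0, 1].
fake_scores = np.array([0.3, 0.4])
real_scores = np.array([0.9, 0.8])
print(generator_adversarial_loss(fake_scores))       # -> 0.425
print(discriminator_loss(real_scores, fake_scores))  # -> 0.15
```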

2.8 Unpaired Image-to-Image Translation using


Cycle-Consistent Adversarial Networks
In this paper, Jun-Yan Zhu, et al. [8] presented a method that can learn to capture
special characteristics of one image collection and figure out how these character-
istics could be translated into the other image collection, all in the absence of any
paired training examples. In addition to the generator and discriminator loss, it
involves one more type of loss, the cycle-consistency loss. This loss uses the
intuition that if we translate a sample from domain X to Y using mapping
function G and then map it back to X using function F, the result should be close
to the original sample. Similarly, it calculates the loss incurred by translating a
sample from Y to X and then back again to Y.
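
A minimal sketch of the cycle-consistency idea, using the L1 distance between a
sample and its round-trip reconstruction. The toy arrays stand in for images and
for the outputs of the mapping functions G and F; they are not actual CycleGAN
generators.

```python
import numpy as np

def cycle_consistency_loss(x, x_cycled, y, y_cycled):
    """L1 penalty on the round trips x -> G -> F -> x and y -> F -> G -> y."""
    forward = np.mean(np.abs(x - x_cycled))    # ||F(G(x)) - x||_1
    backward = np.mean(np.abs(y - y_cycled))   # ||G(F(y)) - y||_1
    return forward + backward

# Toy tensors standing in for images and their reconstructions.
x = np.array([0.2, 0.8]); x_cycled = np.array([0.25, 0.75])
y = np.array([0.5, 0.1]); y_cycled = np.array([0.5, 0.2])
print(cycle_consistency_loss(x, x_cycled, y, y_cycled))  # -> 0.1
```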

2.9 Momentum Batch Normalization for Deep
Learning with Small Batch Size
Batch normalization [9] is a method used to make artificial neural networks faster
and more stable through normalization of the layers’ inputs by re-centering and
re-scaling. Hongwei Yong, et al. proposed a method for producing scaled and
normalized inputs so that the activation functions work in the right way. It is
well known that normalizing the input data makes training faster, so batch
normalization normalizes the data during training, which can improve the
effectiveness and efficiency of optimizing various deep networks. As proposed,
momentum batch normalization uses the moving average of the sample mean and
variance of a mini-batch during training, which ensures proper training inside the
DNN even with small batch sizes.
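
To illustrate the idea of normalizing with moving statistics rather than raw
per-batch statistics, here is a simplified NumPy sketch. The momentum value and
update rule are illustrative and do not reproduce the paper's exact momentum
batch normalization scheme.

```python
import numpy as np

class MovingStatsBatchNorm:
    """Simplified batch norm that normalizes with an exponential moving
    average of the batch mean and variance (helpful when batches are small)."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.gamma = np.ones(num_features)    # learnable scale
        self.beta = np.zeros(num_features)    # learnable shift
        self.momentum, self.eps = momentum, eps
        self.moving_mean = np.zeros(num_features)
        self.moving_var = np.ones(num_features)

    def __call__(self, x, training=True):
        if training:
            # Update the moving statistics from the current mini-batch.
            self.moving_mean = (self.momentum * self.moving_mean
                                + (1 - self.momentum) * x.mean(axis=0))
            self.moving_var = (self.momentum * self.moving_var
                               + (1 - self.momentum) * x.var(axis=0))
        # Normalize with the smoother moving statistics, not the batch ones.
        x_hat = (x - self.moving_mean) / np.sqrt(self.moving_var + self.eps)
        return self.gamma * x_hat + self.beta

bn = MovingStatsBatchNorm(num_features=3)
print(bn(np.random.randn(4, 3)).shape)  # (4, 3)
```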

2.10 Plan2Scene: Converting Floorplans to 3D


Scenes
This paper [10] addresses the task of converting a floorplan and a set of associated
photos of a residence into a textured 3D mesh model. The system proposed by
Madhawa Vidanapathirana, et al. does the following:
a. lifts floorplan image to a 3D mesh model
b. synthesizes surface textures based on the input photos
c. infers textures for unobserved surfaces using a graph neural network archi-
tecture
The final output is a textured 3D mesh model of a house.

2.11 Pixels, voxels, and views: A study of shape


representations for single view 3D object
shape prediction
This paper [11] makes a comparison between surface-based and volumetric 3D
object shape representations. Daeyun Shin, et al. describe the following
representations:
a. Voxel or volumetric based representation
Volumetric or voxel-based modeling is based on an approximate represen-
tation of a 3D object in terms of a set of tiny volume elements using an
encoder-decoder network. This is most often associated with solid model-
ing.
b. Surface or Multi-surface representation
This allows the representation of surface subdivisions by a set of polygons.
3D shapes look better qualitatively, as they can encode higher resolution.

The paper shows that surface-based methods outperform voxel representations for
objects from novel classes and produce higher resolution outputs.

2.12 Raster-to-Vector: Revisiting Floorplan Trans-


formation
This paper [12] addresses the problem of converting a rasterized floorplan image
into a vector-graphics representation. The vectorization proposed by Chen Liu,
et al. enables two applications of the vector-graphics representation:

a. 3D model popup
A neural architecture first converts a floorplan image into a junction layer,
where data is represented by a set of junctions. Then integer programming
is formulated to aggregate junctions into a set of simple primitives (e.g. wall
lines, door lines, or icon boxes) to produce a vectorized floorplan.
b. Interactive editing
The vector-graphics representation allows direct floorplan manipulation by
hand for error correction or modeling. Demonstrated editing operations
include removing a wall or door, moving multiple walls, etc.

2.13 SUGAMAN: Describing Floor Plans for Vi-


sually Impaired by Annotation Learning and
Proximity based Grammar
In this study, Shreya Goyal, et al. [13] propose a framework, SUGAMAN, which
synthesizes a textual description from a given floor plan image for the visually
impaired. With the help of text reader software, the target user can understand
the rooms within the building and the arrangement of furniture in order to
navigate. For room segmentation, symbol spotting and retrieval in floor plan
images, they used the ROBIN dataset, since it contains a significant number of
floor plans with both intra-class similarity and inter-class dissimilarity; it helps
in better visualization of floor plans and aids in efficiently capturing various
high-level features for fine-grained retrieval. However, ROBIN provides no textual
description for a given floor plan image, so they further augmented the ROBIN
dataset to produce automated textual descriptions.

2.14 Learning a Probabilistic Latent Space of


Object Shapes via 3D Generative-Adversarial
Modeling
Jiajun Wu, et al. [14] propose a novel framework, namely the 3D Generative
Adversarial Network, which generates 3D objects from a probabilistic space by
leveraging recent advances in VCNs and GANs. This 3D generation model makes
use of the IKEA dataset to generate novel objects and reconstruct 3D objects from
images. The discriminator in the GAN, learned without supervision, can be used
as an informative feature representation for 3D objects, achieving impressive
performance on shape classification. It requires training of a VAE-GAN to capture
the mapping from 2D images to 3D for high quality 3D objects.

2.15 Interactive 3D Modeling with a Generative


Adversarial Network
Jerry Liu, et al. [15] propose the idea of using a generative adversarial network
(GAN) to assist users in designing real-world shapes with a simple interface. The
paper presents a novel means of performing 3D shape manipulation which makes
use of a 3D-GAN as the generative model. In the process, the discriminator is
employed to provide a feature space as well as a measurement of realism. It also
supports editing operations with an easy-to-use interface in which users edit a
voxel grid with a Minecraft-like interface and then hit the “SNAP” button, which
replaces the current voxel grid with a similar one generated by the 3D GAN.

2.16 Learning Shape Priors for Single-View 3D


Completion and Reconstruction
Jiajun Wu, et al. [16] proposed to use learned shape priors to overcome the 2D-3D
ambiguity and to learn from the multiple hypotheses that explain a single-view
observation. This model completes or reconstructs the object’s full 3D shape with
fine details from a single depth or RGB image. This model consists of:
a. 2.5D sketch estimation network
2.5D sketch estimator has an encoder-decoder structure that predicts the
object’s depth, surface normals, and silhouette from an RGB image.
b. 3D shape completion network
3D estimator is an encoder-decoder network that predicts a 3D shape in the
canonical view from 2.5D sketches.
c. Deep naturalness model
This model penalizes the shape estimator if the predicted shape is unnatural
using two losses: a supervised loss on the output shape, and a naturalness
loss offered by the pretrained discriminator.

2.17 A Colour Alphabet and the Limits of Colour


Coding
In 2010, Paul Green-Armytage [17] elaborated on an on-going interest in problems
of colour coding and the ways in which colours and shapes can be used for
communicating information. This paper focuses on ways to determine the maximum
number of different colours that can be used in a colour code without risk of
confusion. In response to requests for sets of colours that would be as different
from each other as possible for purposes of colour coding, the paper also discusses
a benchmark proposed by Kenneth Kelly: a sequence of colours from which it is
possible to select up to 22 colours of maximum contrast.

2.18 Fully Convolutional Networks for Semantic


Segmentation
Jonathan Long, et al. [18] presented the insight of building “fully convolutional”
networks that take input of arbitrary size and produce correspondingly sized
output with efficient inference and learning. They define and detail the space of
fully convolutional networks, explain their application to spatially dense
prediction tasks, and draw connections to prior models. They also adapt
contemporary classification networks such as AlexNet, VGG net and GoogLeNet
into FCNs and transfer their learned representations by fine-tuning to the
segmentation task. According to the paper, fully convolutional training is far
faster than patch-wise training, since it takes the whole image at once and
produces the corresponding output in a single pass.

2.19 Semantic Segmentation using Adversarial


Networks
In this paper, Pauline Luc, et al. [19] proposed adversarial approach to train
semantic segmentation models. They train a convolutional semantic segmenta-
tion network along with an adversarial network that discriminates segmentation
maps coming either from the ground truth or from the segmentation network.
They used two datasets, namely the Stanford Background and Pascal VOC datasets.
Their results show that the adversarial training approach leads to improvements
in semantic segmentation accuracy on both datasets. The same approach is also
suggested by [7] by performing semantic labels ↔ photo, trained on the Cityscapes
dataset.

2.20 Parsing Floor Plan Images


In this paper, Samuel Dodge, et al. [20] introduce a method for analyzing floor
plan images using wall segmentation, object detection, and OCR. They used the
real-estate floor plan dataset R-FP and the CVC-FP dataset. Here, an FCN was
used for wall segmentation, and OCR was then used to extract the sizes of walls
and furniture. Finally, they show applications in automatic 3D model building
and interactive furniture fitting.

2.21 The Rendering Equation
Kajiya [21] presents an integral equation that generalizes a wide variety of known
rendering algorithms. The author discusses a Monte Carlo solution, presents a new
form of variance reduction, and describes a range of optical phenomena that can
be effectively simulated. The result is an equation that is well suited for
computer graphics.

2.22 Improved Techniques for Training GANs


The Inception Score [22] is an ad-hoc metric that has gained popularity for
evaluating the quality of generative models for images. The inception score was
proposed by Tim Salimans, et al. in their 2016 paper, where they use a crowd-
sourcing platform (Amazon Mechanical Turk) to evaluate a large number of GAN
generated images. They developed the inception score as an attempt to remove the
subjective human evaluation of images and discovered that their scores correlated
well with the subjective evaluation. The inception score involves using a pre-
trained deep learning neural network model for image classification to classify the
generated images.
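
As a reference for how the score is computed from classifier outputs, the sketch
below implements the usual formula IS = exp(E_x[KL(p(y|x) || p(y))]). In practice
p(y|x) comes from a pretrained Inception v3 model; the probabilities here are toy
values only.

```python
import numpy as np

def inception_score(p_yx, eps=1e-16):
    """p_yx: (num_images, num_classes) softmax outputs of a pretrained classifier."""
    p_y = p_yx.mean(axis=0, keepdims=True)                 # marginal class distribution
    kl = p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))   # KL(p(y|x) || p(y)) per image
    return float(np.exp(kl.sum(axis=1).mean()))

# Confident and diverse predictions give a higher score.
p_yx = np.array([[0.90, 0.05, 0.05],
                 [0.05, 0.90, 0.05],
                 [0.05, 0.05, 0.90]])
print(inception_score(p_yx))  # -> about 2.02 for this toy example
```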

2.23 GANs Trained by a Two Time-Scale Up-


date Rule Converge to a Local Nash Equi-
librium
The Frechet Inception Distance score [23] is a metric that calculates the distance
between feature vectors calculated for real and generated images. The inception
score does not capture how synthetic images compare to real images, so Martin
Heusel, et al. introduced the FID score, which summarizes how similar the two
groups are in terms of statistics on computer vision features of the raw images,
calculated using the Inception v3 model. Lower scores indicate
the two groups of images are more similar, or have more similar statistics, with
a perfect score being 0.0 indicating that the two groups of images are identical.
The FID score is used to evaluate the quality of images generated by generative
adversarial networks, and lower scores have been shown to correlate well with
higher quality images.
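
For reference, here is a minimal sketch of the FID computation from two sets of
feature vectors (in practice, Inception v3 activations of real and generated
images). The random features below are placeholders only.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid_score(real_features, fake_features):
    """Frechet distance between Gaussians fitted to the two feature sets."""
    mu1, mu2 = real_features.mean(axis=0), fake_features.mean(axis=0)
    c1 = np.cov(real_features, rowvar=False)
    c2 = np.cov(fake_features, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):      # discard tiny imaginary parts from numerics
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(size=(64, 8))           # stand-ins for Inception features
fake = rng.normal(loc=0.5, size=(64, 8))
print(fid_score(real, fake))              # near 0 only for matching distributions
```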

Chapter 3
Requirement Analysis
3.1 Software Requirement
Software requirements for the prepared system include:
1. Python
2. Star UML
3. Visual Studio Code
4. Google Colaboratory
5. CUDA
6. Git
7. Slack
8. Texmaker
9. Beamer
10. Clickup
11. Microsoft Team
12. Google Drive
13. Google Calendar
14. Docker

3.2 Hardware Requirement


The project required following hardware requirements:
1. Desktop with an NVIDIA graphics card (RTX 3060, 12 GB) and 32 GB RAM.

3.3 Functional Requirement


The functional requirements for the prepared systems are:
1. The prepared system must be able to generate appropriate designs of foot-
prints, room splits, furnishing and room rendering and allow end users to
choose preferred designs.

3.4 Non-Functional Requirement


These are essential for better performance of the system. The points below focus on the non-functional requirements of the system to be prepared.

11
3.4.1 Reliability
The system should be reliable. It should consider all necessary rooms and furniture that must be present in a normal house.

3.4.2 Maintainability
A maintainable system is required.

3.4.3 Performance
The system should prepare designs as quickly as possible in the implementation phase.

3.4.4 Accuracy
The system must accurately estimate the total cost of implementing the floor plan, and the dimensions at every corner must be accurate as well.

12
Chapter 4
Feasibility Study
4.1 Technical Feasibility
All the required hardware was available, i.e. a computer with a good GPU and 32 GB RAM, and some test programs were run in Google Colaboratory. So there was no problem with hardware devices.
Python was used for development of the system, and all the team members are familiar with the language. All other utility software being used is free of cost and easy to use. So there was no problem with the language or software.
As for datasets to train the GAN model, some datasets were collected from the internet, then filtered and processed as per need, and the required datasets were prepared. So there was no problem with datasets either.
So the project is technically feasible.

4.2 Operational Feasibility


As for operational feasibility, the final system is prepared to run on a normal CPU. The system is interactive, with a user-friendly interface, and is easy to operate.

4.3 Economic Feasibility


For development of the system, all the software used is free of cost, and the hardware needed to train the model was already available.
Deployment of the system is smooth on low-end PCs as well, so deployment is not expensive. The project can also be developed to a business level.
So the project is economically feasible, since there seems to be no economic problem for the project.

4.4 Time Feasibility


The project was scheduled to be finished within one year. As scheduled in the Gantt chart, the project was successfully completed in time.

13
Chapter 5
Methodology
Agile project development can be used to help guarantee the overall success of the project. Therefore, the Agile methodology described below was used.

5.1 Agile methodology


Agile methodologies are used to develop software based on the concept of an iterative development process. This reduces the risk of spending months or years on a process that ultimately fails because of some small mistake in an early phase. Within Agile, the Scrum framework was used to manage the whole project through ClickUp. Scrum is an Agile-based project management framework where teams develop products in short project cycles called sprints. At the end of each sprint, feedback was gathered and suggestions were incorporated before continuing the development process.

Figure 5.1: Agile methodology


source: https://www.javatpoint.com/advantage-and-disadvantage-of-agile-methodology

5.1.1 Scrum Framework


In the Scrum method, an incremental and iterative approach to project management was followed. Here the project's life cycle is split into smaller cycles of specific time periods (called sprints) that are tackled independently. Each sprint has a recommended duration of 2–4 weeks, which helped to develop the project quickly and reach deadlines on time.
After each sprint cycle was complete, the working software (called an increment) was presented to the stakeholders (usually colleagues and the supervisor) for their feedback.

14
It is required to keep a project backlog containing all the TODOs of the whole project in one place. Keeping a sprint backlog, which only contains the items belonging to that sprint cycle, also helps to properly focus on one thing at a time.

Figure 5.3: Scrum framework


source: https://www.scrum.org/resources/what-is-scrum

Other than that, the scrum master role was assigned to the project manager, who was responsible for forming a vision of the final product of every sprint, presenting the project increment (the result of each sprint) to the supervisor, and relaying the feedback to the scrum team.
Continuous interaction between the team and the supervisor supports a proficient speed of development as well as quality software development. The weekly meeting proved to be very effective in boosting the productivity of team members and keeping the supervisor updated.

The use of iterative planning and feedback results in teams that can continuously align the delivered product with the product envisioned by the manager. The process easily adapts to changing requirements by measuring and evaluating the status of the project, which allows accurate and early visibility into its progress. The ongoing change can sometimes give both the client and the team more than they had originally envisioned for the product. The Agile method really is a winning solution for everyone involved in software development.

So, in the actual project development phase, the above-mentioned method was followed as closely as possible. A scrum master was indeed appointed, and a weekly meeting with the supervisor was conducted to present the progress and ideas.

Also, a daily 15-minute scrum meeting from 8:45 to 9:00 AM was held in order to present and refresh the working progress among the team members.

15
Table 5.1: Sprint Configuration

Sprint length: 2 weeks
Story point weight: 1 story point = 1 hr
No. of working days per week: 4 days
No. of working hours per day: 3 hrs
No. of working days per sprint: 8 days
No. of working hours per sprint: 8 * 3 = 24 hrs

Scrum was implemented in the form described below.

Altogether, 12 spaces were used inside the project workspace:

a. Project plan
b. wiki
c. sprint - 1
d. sprint - 2
e. sprint - 3
f. sprint - 4
g. sprint - 5
h. Datasets preparation and Validation
i. Training for footprint generation, roomsplit generation and furnished gener-
ation
j. Inference Engine
k. Documentation
l. Final Touch

Then, for each of these sprint spaces, a list named sprint backlog was made, in which the ideas or tasks from the project plan's backlog were listed. As per discussion in the meetings, each task was first created inside the project plan's backlog.
As tasks were assigned to team members, each member had to create an issue in the related Git repository project folder, which was then converted into a branch for working on it in a technical way. After all the commits were pushed into that branch, a merge request was created for the project manager to check the work in that branch, verify it and finally merge it with the master branch.

In this procedure, the whole ideation of problem solving was broken down into small tasks, the tasks were converted into Git issues, and the branches relating to those issues were finally merged with master.

16
5.2 Workload by Project Members
According to ClickUp Productivity, the workload covered by each of the project members was observed to be as follows: Niranjan Bekoju (28.8%), Luja Shakya (26.3%), Sunil Banmala (24.3%), Anusha Bajracharya (20.6%).

Figure 5.5: Workload of the project by project members

The overall working schedule of the project until now is specified and represented as a timeline graph in [24]. For section 2, the Gantt chart is in [25].

17
Chapter 6
System (or Project) Design and Architecture
6.1 System Block Diagram
The block Diagram of the proposed system is shown in the figure below:

Figure 6.1: Block Diagram of Floor Plan Generation

18
6.2 Generation of Floor Plans
From a given parcel land structure to a well-furnished floor plan and its 3D view, the following main steps were followed:

a. Footprint generation from parcel land structures.


b. Room split generation from selected Footprint.
c. Well furnished floor plan from selected room split.
d. 3D view of finally selected well furnished floor plan.

6.2.1 Foot-print generation


This is the first step in the generation pipeline. It creates appropriate building footprints for a given parcel geometry. A pix2pix GAN (paired image translation using GAN) model was trained. The authors of [1] create an array of models for specific property types:

a. Commercial
b. Resident(House)
c. Resident(Condo)
d. Industrial

Here, each model would create a set of relevant footprints for a given parcel.

Figure 6.2: Generation of footprint

6.2.2 Room Split generation


The footprint generated in the previous process is customized as needed. Then, the footprint is split into a number of rooms. The room split generation model is trained for a specific room count and yields relevant results on empty building footprints.

19
Figure 6.3: Room Split from footprint

There are 3 types of generation paradigms for room split, based on

a. Conditions of the placement of walls in space


b. Necessity of having given room at a given place

The types of Generation are:

a. Free Plan Generation


b. Program-Specific Generation
c. Structure Specific Generation

6.2.2.1 Free Plan Generation


Only the footprint of the building and the position of the openings are specified. No structure of room placement conditions the layout of elements in space. The GAN model freely plans out the space and lays out rooms, walls and openings between rooms.

Figure 6.4: Free Plan Generation

20
6.2.2.2 Program-Specific Generation
In this paradigm, user would specify:

a. Footprint of building
b. Position of facade opening
c. Position of a given room within building footprint

Figure 6.5: Program Specific Generation

6.2.2.3 Structure Specific Generation


In this paradigm, user would specify:

a. Footprint of building
b. Position of facade Opening
c. Existence of load-bearing walls as initial constraints. By marking the input images of the training set with green lines, the presence of walls is signalled to the GAN model, which is then trained to generate the room layout.

21
Figure 6.6: Structure Specific Generation

Among these generation paradigms, Free Plan Generation was used, since the focus was more on the room split itself, with good orientation, connectivity and circulation between rooms, rather than on the specific position of a room or the existence of load-bearing walls.

6.2.3 Furnishing
Now that we have the room split, the natural next process is furnishing each room (i.e. adding furniture across the space of each room). Here, the geometry of the furniture is not always perfect, but the furniture types and their relative placement are reasonable. The user has the ability to edit the output of the model before transferring it to the next model, which keeps the user in control of the design process.

Figure 6.7: Furnishing the split rooms

22
6.3 Qualifying Metrics for Floor Plans
6.3.1 Footprint
Footprint is used to analyze the shape of the floor plan perimeter and translate it into a histogram. It is used to determine whether the building footprint is thin, bulky or symmetrical, as shown in figure 6.8.

6.3.1.1 Technical Standpoint


a. It uses polar convexity to turn a given outline into a list of discrete values, which is then used to compare with other plans.
b. It is used for qualifying the generated footprint.

Figure 6.8: Footprint metrics of building footprint

6.3.2 Program
Program is a quantity analysis tool used to analyse the area covered by each specific room within the total area of the footprint. Program displays the type of each room and the area it contains, representing each room with a color code in any given floor plan. It provides a color band that becomes a proxy for describing the program, aggregating the quantity of each room within the floor plan. This color band allows us to compute the programmatic similarities and dissimilarities between any given pair of floor plans. It has mainly two representations:

a. Colored floor plan


b. One-Dimensional color vector

23
Figure 6.9: Query input to the Program

Figure 6.10: Result of the Program

Figure 6.11: Program of the generated floorplans

24
6.3.3 Orientation
The orientation of walls is a valuable source of information, as it describes the enclosure and style of a plan. Some of the styles are:

a. Baroque
b. Manhattan
c. Row House
d. Sub Urban

For instance, a modern house and a gothic cathedral can be distinguished by simply extracting the histogram of wall orientations.

6.3.3.1 Technical Standpoint


a. Extract wall of a given floor plan.
b. Sum their length along each direction of space from 0 to 360
c. Plot in the histogram
d. Use to compare across the plan

Figure 6.12: Orientation of the floor plan

25
6.3.4 Thickness & Texture
Thickness and texture are used for qualifying the fatness of a plan. Thickness is the average depth of each wall, and texture is the variation of that depth. The thickness of walls across a plan and the geometry of wall surfaces differ from style to style.
For example, a Beaux Arts hall displays columns and indented thick walls.

(a) Beaux Arts Hall Building (b) Beaux Arts Hall floor plan

Figure 6.13: Beaux Arts Hall

Similarly for villa buildings from Mies van der Rohe:

(a) Villa Building (b) Villa floorplan

Figure 6.14: Villa Style Building and floor plan

6.3.4.1 Technical Standpoint


a. First of all, isolate all the walls of a given plan
b. Calculate the thickness of the wall at each point
c. Output the histogram of wall thickness
d. Also,compute variation of thickness to describe the wall texture

26
Figure 6.15: Thickness and Texture of the floorplan

6.3.5 Connectivity
Connectivity is used to tackles room adjacency. It provides proximity of rooms
to one another and it is a key dimension of a floor plan. Connection between the
room through door and corridor defines the existence of connection between them.

6.3.5.1 Technical Standpoint


a. Fenestration is the arrangement, partitioning and design of windows and doors in a building. By using fenestration, the graph among rooms can be deduced.
b. A connectivity metric is generated.
c. Then, an adjacency matrix is built.
d. Finally, a graph representation is generated.

This graph is used to compare floor plans taking into account the similarity of
connection among rooms.

Figure 6.16: Connectivity of room in floor plan

27
6.3.6 Circulation
Circulation captures how people move across the floor plan. By extracting the skeleton of circulation, people's movement across a floor plan can be both quantified and qualified.

6.3.6.1 Technical Standpoint


a. Extract the circulation of a given floor plan
b. Sum its length along each direction from 0 to 360 degrees
c. The resulting histogram is used to compare against other floor plans' circulation.

Figure 6.17: Circulation of room in floor plan

28
6.4 GAN used
For the given parcel, it was required to generate the footprint, generate the room split and then furnish the split rooms. For each step, paired image translation, i.e. the pix2pix GAN, was planned to be used.

6.4.1 pix2pix GAN (Paired Image Translation)


Image-to-image translation with a conditional adversarial network is a general-purpose solution for translating one image into another. Here, the image translation is paired image translation. It also learns a loss function to train this mapping; because the loss formulation is learned automatically, it is general purpose and there is no need to hand-engineer loss functions.
In conditional GANs, the generator learns a mapping from an observed image x and a random noise vector z to an output image y:

G : {x, z} → y (6.1)

Figure 6.18: Image to Image Translation using Conditional GAN

Here, the generator is trained to produce outputs that cannot be distinguished from real images by an adversarially trained discriminator, while the discriminator is trained to do as well as possible at detecting the generator's fake images.

Figure 6.19: GAN Model

29
6.4.1.1 Objective
The objective function of a conditional GAN can be expressed as:

L_cGAN(G, D) = E_{x,y}[log D(x, y)] + E_{x,z}[log(1 − D(x, G(x, z)))] (6.2)

Here, G tries to minimize this objective against an adversarial D that tries to maximize it.

6.4.1.2 Network Architectures


In a conditional GAN, both the generator and the discriminator use modules of the form convolution-BatchNorm-ReLU. The generator uses the U-Net architecture, an encoder-decoder with skip connections between mirrored layers in the encoder and decoder stacks.
Generally, a skip connection is added between each layer i and layer n − i, where n is the total number of layers.

Figure 6.20: Architecture of the generator

6.5 Datasets and Color Code


For training with the conditional GAN, paired datasets for conversion from one set of images to another were required. The datasets required for training are:

a. Paired data set between parcel and corresponding footprint


b. Paired data set between footprint and corresponding room split
c. Paired data set between room split and corresponding furnished data

But, unfortunately, only these datasets were available:

a. ROBIN dataset [26] (Repository Of BuildIng plaNs)


b. CVC-FP [27] for parsing the floor plan that can be used for image segmen-
tation

30
(a) Three Room Floor Plan (b) Four Room Floor Plan (c) Five Room Floor Plan

Figure 6.21: Data set from ROBIN Dataset

(a) Floor Plan (b) Wall Segmented Image

Figure 6.22: CVC-FP data set

Now, it was required to prepare datasets suitable for this purpose, and the following tasks had to be done to prepare them.

a. Identify the furniture in the ROBIN datasets


b. Identify the room in the ROBIN datasets
c. Color code for the furniture in each room
d. Color code for the room in the building
e. Script to convert the furnished data into the furniture color code
f. Script to convert the furnished data into the room color code
g. Script to generate the footprint from the ROBIN data
h. Using the footprint and the criteria of the footprint, the parcel will be created

31
6.5.1 Types of Room
There are a total of 5 types of rooms mentioned in the ROBIN dataset.

a. Bedroom
b. Bathroom
c. Entry
d. Kitchen
e. Hall

6.5.2 Types of Furniture


There are altogether 12 types of furniture mentioned in the ROBIN dataset. They are illustrated in figure 6.23.

Figure 6.23: Types of Furniture and corresponding symbol

6.5.3 Color Coding


At this point, the furnished floor plans, the list of furniture types and the list of room types were available, so the dataset could be prepared. It was planned to prepare the dataset using a Python script, which would help automate its preparation. Before preparing the dataset, color coding of the different furniture items and rooms was done using distinct, highly contrasting colors. For this, it was planned to use Kelly's 22 colors of maximum contrast, which can be used for color coding without risk of confusion; this helps the model to train without confusion.
The order of colours in Kelly's list was planned so that there would be maximum contrast between colours in a set if the required number of colours were always selected in order from the top.

32
Figure 6.24: Kelly’s 22 Color of Maximum Contrast
source: https://www.researchgate.net/figure/Kellys-22-colours-of-maximum-contrast-
set-beside-similar-colours-from-the-colour-alphabet fig9 237005166

6.6 HSV Color Model


In the HSV color model, a color is defined by its hue (H), its saturation (S) and its
lightness or blackness value (V) and so, it resembles the human color perception
more than the additive and the subtractive color models. It is easy to adjust a
color by its saturation and brightness.

Because of these advantages, the color selection using the HSV color space is
used, for example, in many common graphics programs. The standard color se-
lection dialog, for example from the Windows operating system, is also based on
the HSV color model: There is a color field in which the color can be selected
arranged according to hue and saturation, as well as an additional controller for
the brightness from white to black, with which the selected color can be adjusted.

The hue (H) is given as an angle on the chromatic circle, so it can take values between 0° and 360°: 0° corresponds to red, 120° to green and 240° to blue. The saturation (S) is given as a percentage and can therefore take values between 0% and 100% (or 0 to 1). A saturation of 100% means a completely saturated, pure color; the smaller the saturation, the more the color turns towards a neutral gray. The lightness or blackness value (V) is also given as a percentage, where 0% means no brightness (hence black) and 100% means full brightness, giving a spectrum between the pure color (saturation of 100%) and white (saturation of 0%).

If both the saturation and the lightness are 100%, a pure color results. If the saturation is 0% and the lightness is 100%, the result is white, and in all cases where the lightness is 0% the result is black.

Figure 6.26: HSV Color cone


source: https://www.pngwing.com/en/free-png-nhpas

6.7 Chain Code


Chain code is a lossless compression technique used for representing an object in images. The coordinates of any continuous boundary of an object can be represented as a string of numbers, where each number represents the particular direction in which the next point on the connected line lies. One point is taken as the reference/starting point, and by plotting the points generated from the chain, the original figure can be re-drawn. Using chain codes from 0 to 7, the direction of the lines can be determined.

34
Table 6.1: Chain code mapping

Chain code (dx, dy)
0 (1, 0)
1 (1, -1)
2 (0, -1)
3 (-1, -1)
4 (-1, 0)
5 (-1, 1)
6 (0, 1)
7 (1, 1)

Figure 6.28: Chain code directions
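A minimal Python sketch of the 8-direction chain coding described above, using the (dx, dy) mapping of Table 6.1; the function and variable names here are illustrative only, not the project's actual script.

# Encode/decode a connected boundary with 8-direction chain codes (Table 6.1).
CHAIN_DIRECTIONS = {
    0: (1, 0), 1: (1, -1), 2: (0, -1), 3: (-1, -1),
    4: (-1, 0), 5: (-1, 1), 6: (0, 1), 7: (1, 1),
}
STEP_TO_CODE = {v: k for k, v in CHAIN_DIRECTIONS.items()}

def encode_chain(points):
    """Encode a list of connected boundary points [(x, y), ...] as chain codes."""
    return [STEP_TO_CODE[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points, points[1:])]

def decode_chain(start, codes):
    """Re-draw the boundary from a start point and its chain codes."""
    points = [start]
    for code in codes:
        dx, dy = CHAIN_DIRECTIONS[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points

# Example: a short segment moving right, right, then up gives codes [0, 0, 2].
print(encode_chain([(0, 0), (1, 0), (2, 0), (2, -1)]))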

6.8 Template Matching


Template matching is a method for searching and finding the location of a template image in a larger image. It simply slides the template image over the input image (as in 2D convolution) and compares the template with the patch of the input image under it. Several comparison methods are implemented in OpenCV. It returns a grayscale image where each pixel denotes how well the neighbourhood of that pixel matches the template.

Template images are prepared in such a way that they cover all the essential detail of the template while covering an optimal area for the best possible matching.

All the possible orientations of each template object are prepared and fed to the program. For example, in the case of an arm chair a total of 4 orientations are possible, but in the case of a dining table only 2 orientations are enough.

35
Figure 6.29: Templates example
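A minimal OpenCV sketch of the matching described above, finding the single best location of one template; the file names and the 0.8 confidence threshold are assumptions for illustration.

import cv2

# Load the plan and one furniture template in grayscale (placeholder paths).
plan = cv2.imread("floor_plan.png", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("arm_chair_template.png", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

# Slide the template over the plan; the result is a map of match scores.
result = cv2.matchTemplate(plan, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:  # assumed confidence threshold
    top_left = max_loc
    bottom_right = (top_left[0] + w, top_left[1] + h)
    print("best match at", top_left, "score", max_val)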

6.9 Thresholding
Thresholding is a point operation. It can be used to create a binary image. The technique is based on the simple concept that a parameter 'θ', called the threshold, is chosen and applied to the image. The same threshold value is applied to every pixel: if the pixel value is smaller than the threshold, it is set to 0; otherwise it is set to a maximum value.

Figure 6.30: Threshold
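A minimal sketch of global binary thresholding with OpenCV; the threshold value 127 and the file names are illustrative assumptions, not the project's exact settings.

import cv2

gray = cv2.imread("floor_plan.png", cv2.IMREAD_GRAYSCALE)
# Pixels above 127 become 255 (white); the rest become 0 (black).
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite("floor_plan_binary.png", binary)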

6.10 Morphological Operation


Basic Morphology Operation are:

6.10.1 Erosion
The basic idea of erosion is just like soil erosion: it erodes away the boundaries of the foreground object. It is normally performed on binary images. It needs two inputs: one is the original image, and the second is called the structuring element or kernel, which decides the nature of the operation. A pixel in the original image (either 1 or 0) will be considered 1 only if all the pixels under the kernel are 1; otherwise it is eroded (made zero).

Figure 6.31: Erosion

6.10.2 Dilation
Dilation also takes two inputs: one is the input image, and the second is the structuring element or kernel, which decides the nature of the operation. Dilation increases the object area: the white region in the image, i.e. the size of the foreground object, grows.

Figure 6.32: Dilate

6.10.3 Opening
Opening is just another name of erosion followed by dilation. It is useful in
removing noise.

6.10.4 Closing
Closing is reverse of Opening, i.e. Dilation followed by Erosion. It is useful in
closing small holes inside the foreground objects, or small black points on the
object.

37
Figure 6.33: Opening

Figure 6.34: Closing
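A minimal OpenCV sketch of the four basic morphological operations discussed above; the 5×5 kernel and file name are illustrative assumptions.

import cv2
import numpy as np

binary = cv2.imread("floor_plan_binary.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((5, 5), np.uint8)

eroded  = cv2.erode(binary, kernel, iterations=1)            # shrinks white regions
dilated = cv2.dilate(binary, kernel, iterations=1)           # grows white regions
opened  = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)   # erosion then dilation
closed  = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)  # dilation then erosion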

6.10.5 Skeletonization
Skeletonization is a process of reducing foreground regions in a binary image to a
skeletal remnant that largely preserves the extent and connectivity of the original
region while throwing away most of the original foreground pixels.
In simpler words, Skeletonization makes a BLOB very thin (typically 1 pixel).
BLOB (Binary Large Object) refers to a group of connected pixels in a binary
image.

Figure 6.35: Skeletonization

38
6.11 Canny Edge Detection
The Canny edge detector is used for edge detection. It uses multiple stages to detect a wide range of edges in an image. Its five steps are illustrated below:

Figure 6.36: The original image.
Figure 6.37: Image reduced to grayscale, with a 5 × 5 Gaussian filter (σ = 1.4) applied.
Figure 6.38: The intensity gradient of the previous image; the edges of the image have been handled by replicating.
Figure 6.39: Non-maximum suppression applied to the previous image.
Figure 6.40: Double thresholding applied to the previous image.
Figure 6.41: Hysteresis applied to the previous image.
source: https://en.wikipedia.org/wiki/Canny_edge_detector
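A minimal sketch of Canny edge detection with OpenCV; the Gaussian blur parameters and the hysteresis thresholds (100, 200) are illustrative assumptions.

import cv2

gray = cv2.imread("floor_plan.png", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)   # 5x5 kernel, sigma = 1.4
edges = cv2.Canny(blurred, 100, 200)            # gradient, suppression, hysteresis
cv2.imwrite("floor_plan_edges.png", edges)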

39
Chapter 7
Experiments
7.1 System Architecture
The overall work flow of the system for processing from cadastrial map to the 3D
model include the key steps as shown in figure 7.1.

Figure 7.1: Overall system’s architecture being followed.

7.2 Color Coding to Dataset


Five colors were required to represent the 5 different rooms in the room split dataset built from the ROBIN floor plan datasets, another 12 colors to represent the 12 different furniture types in the furnished dataset, and one more color to represent windows and doors: altogether 18 different colors, other than black and white, which could easily be differentiated without confusion. For this, Kelly's 22 colors of maximum contrast were used. First, the exact color codes of Kelly's colors were found from the ISCC-NBS colour system [28], since Kelly's colors were standardized using ISCC-NBS numbers. The first two, black and white, were ignored, since black was the color of the walls and white was the background. The next five colors were chosen to represent the 5 different rooms, the next 12 colors were assigned to the 12 different furniture types, and finally one color was selected to represent doors and windows in the datasets, as shown in table 7.1.

40
Table 7.1: Color coding to datasets

Entity Color ISCC-NBS No. Hex Code


Bedroom Yellow 82 #f1bf15
Bathroom Purple 218 #9352a8
Entry Orange 48 #f7760b
Kitchen Light Blue 180 #99c6f9
Hall Red 11 #d51c3c
Arm Chair Grey 265 #c8b18b
Bed Green 139 #23eaa5
Coffee Table Purplish Pink 247 #f483cd
Round Table Blue 178 #276cbd
Large Sofa Yellowish Pink 26 #f59080
Small Sofa Violet 207 #61419c
Sink Purplish Red 255 #b83773
Twin Sink Greenish Yellow 97 #ebdd21
Small Sink Reddish Brown 40 #8b1c0e
Large Sink Yellow Green 115 #a7dc26
Tub Reddish Orange 34 #e83b1b
Dining Table Olive Green 126 #20340b
Door Yellowish Brown 75 #673f0b
Window Yellowish Brown 75 #673f0b

In this way, the colors used to represent the different entities were standardized. In addition, the border of the land area, i.e. the parcel, was drawn in black for easy creation of the subsequent datasets using a simple Python script.
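A sketch of how the colour assignments of Table 7.1 could be kept in one place for the dataset-preparation scripts; the hex values are copied from the table, while the dictionary name and helper function are illustrative assumptions.

# Entity -> hex colour code, as listed in Table 7.1.
ENTITY_COLORS = {
    "bedroom": "#f1bf15", "bathroom": "#9352a8", "entry": "#f7760b",
    "kitchen": "#99c6f9", "hall": "#d51c3c", "arm_chair": "#c8b18b",
    "bed": "#23eaa5", "coffee_table": "#f483cd", "round_table": "#276cbd",
    "large_sofa": "#f59080", "small_sofa": "#61419c", "sink": "#b83773",
    "twin_sink": "#ebdd21", "small_sink": "#8b1c0e", "large_sink": "#a7dc26",
    "tub": "#e83b1b", "dining_table": "#20340b", "door": "#673f0b",
    "window": "#673f0b",
}

def hex_to_bgr(hex_code):
    """Convert '#rrggbb' to the BGR tuple that OpenCV drawing functions expect."""
    r, g, b = (int(hex_code[i:i + 2], 16) for i in (1, 3, 5))
    return (b, g, r)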

7.3 Dataset Preparation


The ROBIN dataset was available, consisting of well-furnished room layouts for 3, 4 and 5 rooms. However, parcel, footprint, room split and furnished room datasets were required, with each segmentation expressed through color representation. So a Python cv2 script was created with which all the datasets could be browsed sequentially in order to place the required parcel lines around the floor plans, place a footprint overlay over the original datasets (creating the footprint datasets), place overlays of different colors over the different rooms (giving the room split datasets), and place overlays of different colors over the different furniture items (giving the furnished datasets).
For training the GAN models to get the required results, three different sets of paired image datasets were required, in which the left image is the input to the model and the right image is the result that the GAN model needs to produce. The three sets of data are:
a. Parcel to Footprint
b. Footprint to Roomsplit
c. Roomsplit to Furnished Rooms
These are all the paired image datasets as shown in figure 7.2.

41
Figure 7.2: Required Paired image of parcel and required footprint

To get these paired datasets, the steps shown in figure 7.3 were followed.

Figure 7.3: Principle used to generate the paired image of a parcel and the required footprint

7.3.1 Augmented Dataset Creation


In the first stage of dataset creation, parcels were placed around the original datasets, as shown in figure 7.4, by creating enough space around the original floor plans. The following steps were taken. First, the largest width and largest height among all the datasets were found, and the maximum of the two gave the max size. The max size was then increased by 25% and a background image, i.e. the foundation canvas of the datasets, was created. This was done to ensure that all the datasets were of the same size and not squeezed. Then all the datasets were placed on the square background image and saved in a separate folder with the same names. Also, to increase the size of the dataset, all the images were placed on the background a second time with a slight change in orientation, i.e. a 30 degree rotation in the clockwise direction. In this way the datasets were refined, as shown in figure 7.5, so that further light processing could create the other required datasets; these were named the augmented datasets.

Figure 7.4: Original dataset from the ROBIN floor plan datasets.
Figure 7.5: Augmented dataset with padding and 30 degree clockwise rotation, placed in a square frame and resized to 512 × 512.
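A sketch of the augmentation step described above: pad each plan onto a common square canvas (25% larger than the biggest side) and also produce a copy rotated 30 degrees clockwise before resizing to 512 × 512; the function names and exact canvas handling are assumptions, not the project's actual script.

import cv2
import numpy as np

def place_on_canvas(image, canvas_size):
    # White square background with the plan centred on it.
    canvas = np.full((canvas_size, canvas_size, 3), 255, dtype=np.uint8)
    h, w = image.shape[:2]
    y0, x0 = (canvas_size - h) // 2, (canvas_size - w) // 2
    canvas[y0:y0 + h, x0:x0 + w] = image
    return canvas

def augment(image, canvas_size, out_size=512):
    plain = place_on_canvas(image, canvas_size)
    # A negative angle in OpenCV rotates clockwise.
    m = cv2.getRotationMatrix2D((canvas_size / 2, canvas_size / 2), -30, 1.0)
    rotated = cv2.warpAffine(plain, m, (canvas_size, canvas_size),
                             borderValue=(255, 255, 255))
    return (cv2.resize(plain, (out_size, out_size)),
            cv2.resize(rotated, (out_size, out_size)))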

7.3.2 Parcel and Footprint Generation


After getting the augmented datasets, the border of the land around each floor plan was drawn manually by loading the datasets sequentially with a Python cv2 script, and the floor plans with the border around them were saved for further processing. These datasets were labelled as the final datasets, as in figure 7.6.

Figure 7.6: Dataset with the border area drawn around it.
Figure 7.7: Final dataset rotated back to normal for the further operations of dilation and erosion.

For generation of the footprint, a Python cv2 script was first created to draw a black colored overlay over the floor plan, producing the footprint datasets, but doing so manually for so many datasets was quite tedious. So a new method based on the image processing operations of dilation and erosion was introduced. Dilation increases the width of the white lines or pixels in the image, and erosion decreases it. Using this concept the footprint was created by the following method. First, the image was rotated 30 degrees anticlockwise, as shown in figure 7.7, if it had been rotated before (identified using the naming convention), and then dilated with a 5×5 kernel, which increases the width of the white parts of the image as in figure 7.8 and thereby decreases the width of the black lines; lines thinner than 5 pixels disappear in this step. Then erosion with a 5×5 kernel is applied, which brings back the previous image as shown in figure 7.9, except that the thin lines that disappeared during dilation do not come back. This step cleans the floor plans and gives a clean parcel.

Figure 7.8: Result when the rotated final image is dilated with a 5×5 kernel.
Figure 7.9: Result when the dilated image is eroded with a 5×5 kernel.

If required, the parcel is rotated again by 30 degrees clockwise to get the required parcel image of the final image, as in figure 7.10.

Figure 7.10: Parcel image of the final dataset.

Then the difference between the original image and the parcel was taken, giving the floor plan without the border line as in figure 7.11. From this, the doors and the furniture are first cleared by dilation with a 2×2 kernel. After the wall segmentation, the image is eroded with a 127×127 kernel, giving a highly eroded image: the wall segmentation is turned into very thick walls, i.e. the footprint of the floor plan in figure 7.12.

Figure 7.11: Difference of the final dataset and the parcel.
Figure 7.12: Result when the difference image is dilated with a 2×2 kernel and then eroded with a 127×127 kernel.

Then the image is dilated with a 125×125 kernel, bringing the floor plan back to approximately its original size as shown in figure 7.13, and finally thresholded to get an exact black and white print as shown in figure 7.14.

Figure 7.13: Image dilated back with a 125×125 kernel.
Figure 7.14: Dilated image thresholded to get a black and white print.

The plan is now the footprint, which is then rotated 30 degrees clockwise if required, giving the final footprint shown in figure 7.15. The footprint is then combined with the parcel to get the required footprint dataset shown in figure 7.16.

Since paired images for translation from parcel to footprint were required, the footprint dataset is concatenated with the parcel, giving the footprint generation dataset shown in figure 7.17.

45
Figure 7.15: Footprint
Figure 7.16: Parcel added to the footprint

Figure 7.17: Paired image of parcel and required footprint.
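A sketch of the morphological footprint-extraction pipeline described in this section (small dilation to remove furniture, large erosion to thicken walls, dilation back and thresholding); the kernel sizes follow the text, while the file names and variable names are placeholders.

import cv2
import numpy as np

plan = cv2.imread("final_without_parcel.png", cv2.IMREAD_GRAYSCALE)

small = np.ones((2, 2), np.uint8)
clean = cv2.dilate(plan, small)             # grow white: thin doors/furniture lines vanish

big_erode = np.ones((127, 127), np.uint8)
thick = cv2.erode(clean, big_erode)         # shrink white: black walls merge into a solid blob

big_dilate = np.ones((125, 125), np.uint8)
restored = cv2.dilate(thick, big_dilate)    # grow white back to roughly the original size

_, footprint = cv2.threshold(restored, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite("footprint.png", footprint)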

7.3.3 Algorithm for furnished generation using template matching
Furnished data generation using template matching:

a. Get furnished image


b. Get template image of furniture
c. Convert images to grayscale
d. Get size of template image
e. Match Template Image with furnished image
f. Threshold the result and get the centre coordinates
g. Draw a filled rectangle at center coordinates

46
Figure 7.18: Block Diagram of Template Matching for furnished generation
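A minimal OpenCV sketch of the steps listed above: threshold the template-matching response and draw a filled, colour-coded rectangle at each detected location. The 0.8 threshold, the file names and the BGR colour (the "bed" code from Table 7.1) are illustrative assumptions.

import cv2
import numpy as np

plan = cv2.imread("furnished_plan.png")
gray = cv2.cvtColor(plan, cv2.COLOR_BGR2GRAY)
template = cv2.imread("bed_template.png", cv2.IMREAD_GRAYSCALE)
h, w = template.shape

response = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(response >= 0.8)          # all locations above the threshold

for x, y in zip(xs, ys):
    # Paint the matched furniture with its colour code (BGR for #23eaa5).
    cv2.rectangle(plan, (int(x), int(y)), (int(x) + w, int(y) + h),
                  (165, 234, 35), thickness=-1)

cv2.imwrite("furnished_color_coded.png", plan)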

7.4 Dataset Validation


7.4.1 Footprint Qualify
The steps followed to compute the qualification metric for the footprint datasets are:

a. Read footprint Image


b. remove parcel border
c. extract boundary of footprint
d. get Centroid of boundary
e. Create mask image using Centroid and mask value along a region from angles
0 to 360 degree at interval of 5 degree
f. Masked footprint image
g. Count Pixel
h. Draw histogram between angles and area

47
Figure 7.19: Block Diagram of Footprint Qualify

7.4.2 Program Qualify


The steps involved in computing the qualifying metric "Program" on the room split dataset are:

a. Get the image in BGR of which program is to be formed.


b. Convert the image from BGR to HSV format.
c. Define the mean color value of different rooms and get the range of color for
the room from -(5,50,50) to +(5,50,50).
d. With the color range of each room, get the total number of pixel with HSV
value within the range for each rooms.
e. With total number of pixels in each rooms, get the percent of area covered
by each room.
f. With the percent of area covered by each room, plot the data in program.

Figure 7.20: Block Diagram of Program Qualify
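A sketch of the "Program" computation listed above: for each room colour, count the pixels whose HSV value lies within a ±(5, 50, 50) band around the room's mean colour and convert the counts to percentages. The mean HSV values and file name used here are illustrative assumptions.

import cv2
import numpy as np

room_split = cv2.imread("room_split.png")
hsv = cv2.cvtColor(room_split, cv2.COLOR_BGR2HSV)

# Assumed mean HSV colour per room (not the project's exact values).
ROOM_MEANS = {"bedroom": (22, 230, 240), "hall": (175, 200, 215)}

areas = {}
for room, (h, s, v) in ROOM_MEANS.items():
    lower = np.array([max(h - 5, 0),   max(s - 50, 0),   max(v - 50, 0)])
    upper = np.array([min(h + 5, 179), min(s + 50, 255), min(v + 50, 255)])
    mask = cv2.inRange(hsv, lower, upper)
    areas[room] = int(np.count_nonzero(mask))

total = sum(areas.values()) or 1
percentages = {room: 100.0 * n / total for room, n in areas.items()}
print(percentages)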

48
7.4.3 Orientation Qualify
Steps:

a. Read images from furnished datasets.


b. Resize the image to (512 x 512).
c. Perform morphological closing by (2,2) for removing furniture and doors.
d. Convert image from BGR to Grayscale then threshold the image
e. Inverse the threshold image to make walls white.
f. Skeletonize the image using skeletonization technique.
g. Filter the image using Sobel kernel.
h. Find the contours.
i. Calculate the chain codes for each lines detected from the contour points.
j. Count the chain codes.
k. Plot the polar Histogram orientation for 8 directions.

Figure 7.21: Block Diagram of Orientation Qualify

7.5 Parcel Generation from the Cadastral Map


For generation of the parcel from the cadastral map at deployment time, the parcel area is grabbed from the cadastral map by allowing the end user to manually draw over the map and select their land area; the corners of the selected area are then grabbed and placed over a plain white background, giving a parcel that is ready for further processing.
The overall steps followed in this process are summarized below. First, the selected image of the cadastral map is displayed in a new window and the user is allowed to select the corner points of their land area; the system draws the corresponding lines as in figure 7.23.

49
Figure 7.22: Cadastral map from the Land Revenue Office.
Figure 7.23: Area of land drawn over the cadastral map.

Then the corner points of the selected area were saved in an array. The coordinates of the corner points were brought toward the origin by subtracting the minimum x value from all x coordinates and the minimum y value from all y coordinates, i.e. subtracting (min(x), min(y)) from all the corner points. The resulting points were plotted in a new image, giving the land area of the user as in figure 7.24. Since the land area image was not yet ready for further processing by the system, it was placed at the center of a square white background 25% bigger than the maximum side of the land area image, giving the parcel as in figure 7.25, ready for further processing. The parcel image is then saved remotely after resizing it to the required size, i.e. 512×512.

Figure 7.24: Area of land selected by the user.
Figure 7.25: Parcel area formed by placing the selected area over a background.

50
7.6 GAN Architectures
For the preparation of the GAN model, it was first necessary to know the components of a GAN. A GAN model has three components, namely:
a. Generator
b. Discriminator
c. Loss Function
Among these components, the focus was mainly on the generator and discriminator. For the loss function, binary cross entropy, mean squared error and mean absolute error were used initially.

7.6.1 Generator
For the generator, the constraint was that a new type of image is to be generated from an existing image: the condition is taken from one image and its content is transferred to generate another image. So a generator model was required whose input and output have the same dimensions.

The same scenario is observed in medical image segmentation in [4]. A similar but more advanced version of U-Net, i.e. a Double U-Net based architecture, can also be used for this kind of image generation. So, here, two models for the generation of images can be used.

7.6.2 Discriminator
For the discriminator, a custom CNN model that takes two input images and produces a 16×16 output map was initially used.

Here, the discriminator works on a concept similar to a classifier with the classes real and fake. So it was planned to prepare a model using GoogLeNet pre-trained on the ImageNet dataset and to change its top layer. A pre-trained model is used because it has already learned the most generic features in its lower layers (the layers closest to the input) as well as some high-level features in its higher layers; the concept of transfer learning was thus used for fast training and high accuracy.

In addition, a U-Net based architecture was also used, built from an encoder and decoder with skip connections. This model can classify the whole image as real or fake and can also classify each pixel as real or fake, making it one of the more powerful discriminator architectures. Since the discriminator is more powerful, the generator also becomes more powerful, because a GAN is a competitive algorithm between the generator and the discriminator.

So, in summary, 2 generator architectures and 3 discriminator architectures were used, and in total 6 GAN architectures for pix2pix image translation were proposed.

51
7.6.3 GAN Architectures
a. U-net Generator with Custom CNN Discriminator
b. U-net Generator with Google net Discriminator
c. U-net Generator with U-net based Discriminator as shown in C
d. Double U-net Generator with Custom CNN Discriminator
e. Double U-net Generator with Google net Discriminator
f. Double U-net Generator with U-net based Discriminator

7.7 Condition for the Training


Datasets for training: The datasets for the training purpose were prepared for

a. Parcel to Footprint [29]


b. Footprint to Room Split [30]
c. Room Split to Furnished [31]

Number of iterations: For now, the number of iterations for each model is configured to 200,000 and 400,000.

GPU machine: An RTX 3060 with 3584 CUDA cores and 12 GB of memory will be used for training.

Then, by observing the validation loss and validation accuracy, it can be seen where the model starts to overfit. If the model overfits, it simply memorizes the style and content and will not be generic, so the point where the model starts overfitting must be observed.

7.8 Model Comparison between different types of architecture


As mentioned in 7.6, there are 6 types of architecture to be implemented in this project, and these models will be compared. For the comparison of the different models, a fixed configuration of data, as mentioned in 7.7, is needed. After training is over, each model will generate some outputs, and the Inception Score and FID score will be used for model evaluation. These scores will be compared across the models as well as against the ground truth images.

7.9 Model Training


7.9.1 Loading the dataset and preparing for training
Before starting the training, the dataset needs to be loaded and augmented so that the learning procedure goes well. The following steps were followed to make the images/data suitable for training:

52
a. Load zip Datasets
b. Extract zip file
c. Load Image
d. Convert Image to float format and divide by 255.0
e. Resize image to 286
f. Random Crop resulting to (256,256)
g. Mirror image randomly
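A TensorFlow sketch of the preprocessing steps listed above for one paired image; the normalisation by 255 follows the list, while the file handling and the split of the paired image into input/target halves are assumptions.

import tensorflow as tf

def load_and_jitter(path):
    # Load and convert to float in [0, 1] by dividing by 255.
    image = tf.io.decode_png(tf.io.read_file(path), channels=3)
    image = tf.cast(image, tf.float32) / 255.0

    # The paired dataset stores input (left) and target (right) side by side.
    width = tf.shape(image)[1] // 2
    pair = tf.stack([image[:, :width, :], image[:, width:, :]])

    # Resize to 286x286, randomly crop back to 256x256, randomly mirror.
    pair = tf.image.resize(pair, [286, 286])
    pair = tf.image.random_crop(pair, size=[2, 256, 256, 3])
    if tf.random.uniform(()) > 0.5:
        pair = tf.image.flip_left_right(pair)
    return pair[0], pair[1]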

53
7.9.2 Generator Model Architecture U-net

Figure 7.26: U-net Architecture used for Generator

54
7.9.3 Generator Model Architecture U-net summary
Model: "U-net Generator"
____________________________________________________________________________
Layer Output Shape Param # Connected to
============================================================================
input_3 [(None, 256, 256, 3)] 0 []

sequential_32 (None, 128, 128, 64) 3072 [’input_3[0][0]’]

sequential_33 (None, 64, 64, 128) 131584 [’sequential_32[0][0]’]

sequential_34 (None, 32, 32, 256) 525312 [’sequential_33[0][0]’]

sequential_35 (None, 16, 16, 512) 2099200 [’sequential_34[0][0]’]

sequential_36 (None, 8, 8, 512) 4196352 [’sequential_35[0][0]’]

sequential_37 (None, 4, 4, 512) 4196352 [’sequential_36[0][0]’]

sequential_38 (None, 2, 2, 512) 4196352 [’sequential_37[0][0]’]

sequential_39 (None, 1, 1, 512) 4196352 [’sequential_38[0][0]’]

sequential_40 (None, 2, 2, 512) 4196352 [’sequential_39[0][0]’]

concatenate_14 (None, 2, 2, 1024) 0 [’sequential_40[0][0]’,


’sequential_38[0][0]’]

sequential_41 (None, 4, 4, 512) 8390656 [’concatenate_14[0][0]’]

concatenate_15 (None, 4, 4, 1024) 0 [’sequential_41[0][0]’,


’sequential_37[0][0]’]

sequential_42 (None, 8, 8, 512) 8390656 [’concatenate_15[0][0]’]

concatenate_16 (None, 8, 8, 1024) 0 [’sequential_42[0][0]’,


’sequential_36[0][0]’]

sequential_43 (None, 16, 16, 512) 8390656 [’concatenate_16[0][0]’]

concatenate_17 (None, 16, 16, 1024) 0 [’sequential_43[0][0]’,


’sequential_35[0][0]’]

sequential_44 (None, 32, 32, 256) 4195328 [’concatenate_17[0][0]’]

concatenate_18 (None, 32, 32, 512) 0 [’sequential_44[0][0]’,

55
’sequential_34[0][0]’]

sequential_45 (None, 64, 64, 128) 1049088 [’concatenate_18[0][0]’]

concatenate_19 (None, 64, 64, 256) 0 [’sequential_45[0][0]’,


’sequential_33[0][0]’]

sequential_46 (None, 128, 128, 64) 262400 [’concatenate_19[0][0]’]

concatenate_20 (None, 128, 128, 128) 0 [’sequential_46[0][0]’,


’sequential_32[0][0]’]

conv2d_ (None, 256, 256, 3) 6147 [’concatenate_20[0][0]’]


transpose_24

============================================================================
Total params: 54,425,859
Trainable params: 54,414,979
Non-trainable params: 10,880
____________________________________________________________________________

7.9.4 Defining the generator loss


GANs learn a loss that adapts to the data; cGANs additionally learn a structured loss that penalizes output structure that differs from the target image, as described in [7].
The generator loss is composed of two losses:

a. Sigmoid cross-entropy loss (gan loss)

This loss is calculated between the discriminator's output when the generated image is fed in and an array of ones.
b. L1 loss (L1 loss)
This is the mean absolute error between the generated image and the target image.

Total loss = gan loss + (LAMBDA * L1 loss)

where LAMBDA = 100, as decided by the authors of the paper.
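A TensorFlow sketch of the generator loss described above (adversarial term plus LAMBDA-weighted L1 term); the function and variable names are illustrative.

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100

def generator_loss(disc_generated_output, gen_output, target):
    # The generator wants the discriminator to output "real" (ones) for fakes.
    gan_loss = bce(tf.ones_like(disc_generated_output), disc_generated_output)
    # L1 (mean absolute error) between the generated image and the target.
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))
    total_loss = gan_loss + LAMBDA * l1_loss
    return total_loss, gan_loss, l1_loss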

56
7.9.5 Training Procedure for the generator

Figure 7.27: Generator Training Procedure

57
7.9.6 Discriminator Model Architecture

Figure 7.28: Discriminator Architecture

58
7.9.7 Discriminator Model Summary
Model: "Discriminator"
_________________________________________________________________________
Layer Output Shape Param # Connected to
=========================================================================
input_image [(None, 256, 256, 3)] 0 []

target_image [(None, 256, 256, 3)] 0 []

concatenate_29 (None, 256, 256, 6) 0 [’input_image[0][0]’,


’target_image[0][0]’]

sequential_65 (None, 128, 128, 64) 6144 [’concatenate_29[0][0]’]

sequential_66 (None, 64, 64, 128) 131584 [’sequential_65[0][0]’]

sequential_67 (None, 32, 32, 256) 525312 [’sequential_66[0][0]’]

zero_padd (None, 34, 34, 256) 0 [’sequential_67[0][0]’]


ing2d_2
conv2d_41 (None, 31, 31, 512) 2097152 [’zero_padding2d_2[0][0]’]

batch_nor (None, 31, 31, 512) 2048 [’conv2d_41[0][0]’]


malization_63

leaky_re_lu_40 (None, 31, 31, 512) 0 [’batch_normalizati


on_63[0][0]’]
zero_padd (None, 33, 33, 512) 0 [’leaky_re_lu_40[0][0]’]
ing2d_3
conv2d_42 (None, 30, 30, 1) 8193 [’zero_padding2d_3[0][0]’]

=========================================================================

Total params: 2,770,433


Trainable params: 2,768,641
Non-trainable params: 1,792
_________________________________________________________________________

59
7.9.8 Defining the Discriminator loss
The discriminator loss is calculated using the real image and the generated image.

a. Real loss (real loss)

The loss computed on the real image is the real loss:

real loss = sigmoid cross entropy(discriminator(real image), all ones) (7.1)

b. Generated loss (generated loss)
The loss computed on the generated image is the generated loss:

generated loss = sigmoid cross entropy(discriminator(generated image), all zeros) (7.2)

total loss = real loss + generated loss (7.3)
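A TensorFlow sketch of equations (7.1)–(7.3); the names are illustrative. Real outputs are compared against ones, generated outputs against zeros.

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(disc_real_output, disc_generated_output):
    real_loss = bce(tf.ones_like(disc_real_output), disc_real_output)
    generated_loss = bce(tf.zeros_like(disc_generated_output),
                         disc_generated_output)
    return real_loss + generated_loss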

60
7.9.9 Training procedure for Discriminator

Figure 7.29: Discriminator Training Procedure

61
7.9.10 Generator Model Architecture for Triple-U-net brief

Figure 7.30: Triple U-net Architecture

7.10 Generator Metrics


The metrics on the basis of which testing of the final models is carried out are:

• Inception score

• FID score

7.10.1 Inception Score


Better and more principled evaluation metrics are needed for deep generative models. Generative models need to be directly evaluated for the application they are intended for, and as they are integrated into more complex systems it becomes harder to discern their exact application aside from effectively capturing high-dimensional probability distributions, which necessitates high-quality evaluation metrics that are not application specific. The generative modeling community has developed various ad-hoc evaluative criteria, and the Inception Score is one of these ad-hoc metrics that has gained popularity for evaluating the quality of generative models for images.
The Inception Score (IS) was shown to correlate well with human scoring of the realism of generated images from the CIFAR-10 dataset. The IS uses an Inception v3 network pre-trained on ImageNet and calculates a statistic of the network's outputs when applied to generated images.

IS(G) = exp( E_{x∼p_g}[ D_KL( p(y|x) || p(y) ) ] ) (7.4)

62
where x ∼ p_g indicates that x is an image sampled from p_g, D_KL(p || q) is the KL-divergence between the distributions p and q, p(y|x) is the conditional class distribution and p(y) = ∫_x p(y|x) p_g(x) dx is the marginal class distribution. The exp in the expression is there to make the values easier to compare, so it can be dropped and ln(IS(G)) used without loss of generality.
Dropping the exponentiation, the score is computed in practice as:

s(G) = (1/N) Σ_{i=1}^{N} D_KL( p(y|x^{(i)}) || p̂(y) ) (7.5)
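A NumPy sketch of equations (7.4)–(7.5), given a matrix of class probabilities p(y|x) produced by a pre-trained Inception v3 for N generated images; obtaining those probabilities is assumed to happen elsewhere.

import numpy as np

def inception_score(p_yx, eps=1e-16):
    p_y = np.mean(p_yx, axis=0, keepdims=True)             # marginal p̂(y)
    kl = p_yx * (np.log(p_yx + eps) - np.log(p_y + eps))   # per-image KL terms
    mean_kl = np.mean(np.sum(kl, axis=1))                  # average D_KL as in (7.5)
    return float(np.exp(mean_kl))                          # exponentiate as in (7.4)

# Example with dummy predictions for 4 images over 3 classes.
probs = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90],
                  [0.34, 0.33, 0.33]])
print(inception_score(probs))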

7.10.2 FID Score


The Frechet Inception Distance score, or FID for short, is a metric that calculates
the distance between feature vectors calculated for real and generated images.
The score summarizes how similar the two groups are in terms of statistics on
computer vision features of the raw images calculated using the inception v3 model
used for image classification. Lower scores indicate the two groups of images
are more similar, or have more similar statistics, with a perfect score being 0.0
indicating that the two groups of images are identical.
The FID score is used to evaluate the quality of images generated by generative
adversarial networks, and lower scores have been shown to correlate well with
higher-quality images. The FID score was proposed as an improvement over the
existing Inception Score or IS.
The FID score is calculated by first loading a pre-trained Inception v3 model.
The output layer of the model is removed and the output is taken as the
activations from the last pooling layer, a global spatial pooling layer.
This output layer has 2,048 activations, therefore, each image is predicted as
2,048 activation features. This is called the coding vector or feature vector for the
image.
A 2,048 feature vector is then predicted for a collection of real images from
the problem domain to provide a reference for how real images are represented.
Feature vectors can then be calculated for synthetic images.
The result will be two collections of 2,048 feature vectors for real and generated
images. The FID score is then calculated using the following equation taken from
the paper:

d^2 = ||mu_1 − mu_2||^2 + Tr( C_1 + C_2 − 2·sqrt(C_1·C_2) ) (7.6)
The score is referred to as d^2, showing that it is a distance with squared units. Here mu_1 and mu_2 refer to the feature-wise means of the real and generated images, e.g. 2,048-element vectors where each element is the mean of that feature observed across the images, and C_1 and C_2 are the covariance matrices of the real and generated feature vectors.
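A NumPy/SciPy sketch of equation (7.6), given two sets of 2,048-dimensional Inception v3 feature vectors (real and generated); extracting the features is assumed to happen elsewhere.

import numpy as np
from scipy.linalg import sqrtm

def fid_score(real_feats, gen_feats):
    mu1, mu2 = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    c1 = np.cov(real_feats, rowvar=False)
    c2 = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(c1.dot(c2))
    if np.iscomplexobj(covmean):   # numerical noise can add tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2.0 * covmean))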

7.11 Activation Function


7.11.1 Sigmoid Activation Function
The sigmoid function gives an 'S'-shaped curve. It maps any real value to a value between 0 and 1, and is therefore especially used in models that are required to predict a probability as output.
Equation:

f(x) = s = 1 / (1 + e^{−x}) (7.7)

Derivative:

f'(x) = s · (1 − s) (7.8)

Range: (0, 1)

7.11.2 ReLU Activation Function


ReLU stands for rectified linear activation unit and is considered one of the few
milestones in the deep learning revolution.
Simple formula :
f (x) = max(0, x) (7.9)

7.11.3 Leaky ReLU Activation Function


The Leaky ReLU function is an improved version of the ReLU activation function. With ReLU, the gradient is 0 for all input values less than zero, which deactivates the neurons in that region and may cause the dying ReLU problem.

Leaky ReLU is defined to address this problem. Instead of defining the activation as 0 for negative input values x, Leaky ReLU defines it as an extremely small linear component of x. The formula for this activation function is

f (x) = max(0.01 ∗ x, x) (7.10)


Thus, it gives an output for negative values as well.

7.11.4 Softmax Activation Function


Softmax is not a traditional activation function. Other activation functions produce a single output for a single input; in contrast, softmax produces multiple outputs for an input array. It not only maps each output to the [0, 1] range but also makes the outputs sum to 1, so the output of softmax is a probability distribution.

Equation:

f(x_i) = e^{x_i} / Σ_j e^{x_j} (7.11)

Probabilistic interpretation:

S_j = P(y = j | x) (7.12)

Range: (0, 1)
The softmax function is often used in the final layer of a neural-network-based classifier. Softmax is used for multi-class classification in logistic regression models, and can be used to build neural network models that classify more than two classes instead of providing only a binary solution.

7.11.5 Tanh Activation Function


Tanh is similar to the logistic sigmoid but often works better. Tanh is also sigmoidal (S-shaped).
Range: (−1, 1)

The tanh function is mainly used for classification between two classes.

For the project, the Leaky ReLU activation function was used among all of the above. With tanh, softmax and the sigmoid function, the slope saturates when the input gets large; the ReLU activation function overcomes this problem. However, the slope of ReLU in the negative range is 0, so once a neuron's input becomes negative it is unlikely to recover; such neurons no longer play any role in discriminating the input and are essentially useless. Hence, to overcome all the above-mentioned problems, Leaky ReLU is the most convenient choice.
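A minimal NumPy sketch of the activation functions discussed in this section; the function names are illustrative.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def softmax(x):
    e = np.exp(x - np.max(x))      # subtract max for numerical stability
    return e / e.sum()

def tanh(x):
    return np.tanh(x)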

7.12 Loss Function


7.12.1 Binary Cross Entropy
Binary cross entropy is a loss function that is used in binary classification tasks. These are tasks that answer a question with only two choices (yes or no, A or B, 0 or 1, left or right). Several independent questions of this kind can be answered at the same time.
Binary cross entropy is very convenient for training a model to solve many classification problems at the same time, if each classification can be reduced to a binary choice (i.e. yes or no, A or B, 0 or 1).
The sigmoid function is the only activation function compatible with binary cross entropy. It can be written as:

H_p(q) = −(1/N) Σ_{i=1}^{N} [ y_i·log(p(y_i)) + (1 − y_i)·log(1 − p(y_i)) ] (7.13)

Here, H_p(q) is the binary cross entropy loss, x is the input, y is the label/output (for example, labels assigning a color to points x: label 1 is green and label 2 is red), p(y) is the probability of y being label 1 and 1 − p(y) is the probability of y being label 2. Binary cross entropy is thus the average of the sum of the logs of the probabilities of a point being green or red.

7.12.2 Categorical Cross Entropy


Categorical cross entropy is a loss function that is used in multi-class classifica-
tion tasks. These are tasks where an example can only belong to one out of many
possible categories, and the model must decide which one.

65
Formally, it is designed to quantify the difference between two probability distri-
butions.
The categorical cross entropy is well suited to classification tasks, since one exam-
ple can be considered to belong to a specific category with probability 1, and to
other categories with probability 0. It can be explained as:
f(s)_i = e^{s_i} / Σ_{j=1}^{C} e^{s_j} (7.14)

CE = − Σ_{i=1}^{C} t_i·log(f(s)_i) (7.15)

Here, t_i is the ground truth in one-hot form, e.g. [0, 0, 0, 1] for the multi-class case [cat, dog, horse, lion], and f(s)_i is the result of the softmax activation over the classes, e.g. [0.2, 0.1, 0.3, 0.4]. The higher the softmax probability of the actual class, the lower the loss, and vice versa.

7.12.3 Mean Squared Error


The mean squared error (MSE) or mean squared deviation (MSD) of an estimator (of a procedure for estimating an unobserved quantity) measures the average of the squares of the errors, that is, the average squared difference between the estimated values and the actual values. MSE is a risk function, corresponding to the expected value of the squared error loss.

MSE = (1/n) Σ_{i=1}^{n} (Y_i − Ŷ_i)^2 (7.16)

Here, Y_i is the observed value and Ŷ_i is the estimated or predicted value at each instance. MSE is the average of the squared error between the estimated and actual values.

7.12.4 Mean Absolute Error


Mean absolute error (MAE) is a measure of errors between paired observations expressing the same phenomenon. Examples of Y versus X include comparisons of predicted versus observed, subsequent time versus initial time, and one technique of measurement versus an alternative technique of measurement. MAE is calculated as:

MAE = \frac{\sum_{i=1}^{n} |y_i - x_i|}{n} = \frac{\sum_{i=1}^{n} |e_i|}{n}   (7.17)

where x denotes the observed values and y the predicted values. MAE gives the average of the absolute error between observed and predicted values over every instance.
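
For illustration, a brief NumPy sketch of equations (7.16) and (7.17) with made-up values:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, equation (7.16)."""
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):
    """Mean absolute error, equation (7.17)."""
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])
print(mse(y_true, y_pred))   # 0.375
print(mae(y_true, y_pred))   # 0.5
```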
The decision boundary in a classification task is large in comparison to regression, and MSE and MAE do not penalize misclassification strongly enough; they are the right losses for regression, where the distance between predicted and actual values is small. From a probabilistic point of view, binary cross-entropy arises as the natural cost function to use when a sigmoid function is used in the output layer of the network and the goal is to maximize the likelihood of classifying the input data correctly. Categorical cross entropy is used in the same way for multi-class classifiers.

7.13 Optimization Function


Optimization is the problem of finding the set of inputs to an objective function that results in the maximum or minimum function evaluation. Types of optimization problems:

• Continuous function optimization

• Combinatorial optimization problems

Continuous function optimization arises when the objective function takes real numbers as input arguments and produces real-valued output. Combinatorial optimization problems arise when the objective function takes discrete values as input and output.

7.13.1 Bracketing Algorithm


Its uses are:

a. Intended for optimization problems with only one input variable
b. Used when the optimum is known to exist within a specific range
c. Especially used for finding square roots and solving an equation
d. They assume a single optimum is present, i.e. a unimodal objective function.

7.13.2 Local Descent Algorithm


The local descent algorithm is used for optimization problems with more than one input variable and a single global optimum. A common example is the line search algorithm. It involves choosing a direction to move in the search space and then performing a bracketing-type search along a line or hyperplane in the chosen direction. It is computationally expensive to optimize each directional move in the search space.

7.13.3 First Order Algorithm


It uses the first derivative to choose the direction to move in the search space. Steps:

a. First calculate the gradient of the function.
b. Follow the gradient in the opposite direction (e.g. downhill to the minimum for minimization problems) using a step size (learning rate).
c. The step size is a hyperparameter that controls how far to move in the search space.
d. A small step size takes a long time and can get stuck.
e. A large step size results in zig-zagging or bouncing around the search space.

7.13.3.1 Gradient Descent


Gradient descent is a popular optimization algorithm, often used as a black-box optimizer, and it is the most common way to optimize neural networks. Gradient descent is a way to minimize an objective function: the learning rate determines the size of the steps taken towards a local minimum, and the algorithm follows the direction of the slope downhill until it reaches a valley.
Variants of gradient descent are:
Batch Gradient Descent: It computes the gradient of the cost function with respect to the parameters for the entire training dataset. It is very slow and cannot be used for datasets that do not fit in memory. It is guaranteed to converge to the global minimum for convex error surfaces.
Stochastic Gradient Descent: It performs a parameter update for each training example. This removes the redundant computation that batch gradient descent performs on large datasets. Its updates have high variance, which causes the objective function to fluctuate heavily; this fluctuation enables it to jump to new and potentially better local minima.
Mini-batch Gradient Descent: It updates the parameters for every mini-batch of training examples. It reduces the variance of the parameter updates and hence reduces fluctuation, and it can use highly optimized matrix operations. A common mini-batch size is 50–256.
Momentum: It helps stochastic gradient descent move in the relevant direction and dampens oscillation. This is done by adding a fraction of the update vector of the past time step:

v_t = \gamma v_{t-1} + \eta \nabla_\theta J(\theta)   (7.18)
\theta = \theta - v_t   (7.19)

Adagrad: It adapts the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent parameters. Adagrad uses a different learning rate for every parameter at every time step.

\theta_{t+1,i} = \theta_{t,i} - \frac{\eta}{\sqrt{G_{t,ii} + \epsilon}} \, g_{t,i}   (7.20)

g_{t,i} = \nabla_{\theta_t} J(\theta_{t,i})   (7.21)

Adadelta: It is an extension of Adagrad that reduces Adagrad's aggressively decreasing learning rate. Adagrad accumulates all past squared gradients, whereas Adadelta restricts the window of accumulated past gradients to some fixed size w. The sum of gradients is recursively defined as a decaying average of all past squared gradients:

E[g^2]_t = \gamma E[g^2]_{t-1} + (1 - \gamma) g_t^2   (7.22)
Adam: It computes an adaptive learning rate for each parameter and keeps an exponentially decaying average of past squared gradients. It can be seen as adding momentum to Adadelta/RMSprop.

\Delta w_i(t) = -\frac{\eta}{\sqrt{G_i(t) + \epsilon}} \, M_i(t)   (7.23)

M_i(t) = \alpha M_i(t-1) + (1 - \alpha) \frac{\partial L}{\partial w_i}(t)   (7.24)

7.13.4 Second Order Algorithm


Examples: Newton's method, the secant method and quasi-Newton methods.

Among all of these optimization functions, Adam was chosen for optimizing the deep neural networks in this project. Adam adds momentum to Adadelta/RMSprop, and Adadelta is itself an extension of Adagrad, so the Adam optimizer combines the advantages of momentum, Adagrad and Adadelta. In addition, Adam is widely used in practice as the optimizer for deep neural networks.
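
As a sketch only, this is how a model can be compiled with the Adam optimizer in Keras, assuming a TensorFlow/Keras setup; the toy model, learning rate and β1 are illustrative pix2pix-style values, not the project's exact configuration:

```python
import tensorflow as tf

# Hypothetical toy model; the real generator/discriminator are defined elsewhere.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 4, strides=2, padding="same",
                           input_shape=(256, 256, 3)),
    tf.keras.layers.LeakyReLU(0.2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
model.compile(optimizer=optimizer, loss="binary_crossentropy")
```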

7.14 3D Generation
For 3D generation, one approach is 3D plotting of the segmented image. For plotting the 3D model, three Python packages were used: NumPy, Matplotlib and OpenCV, with which the segmented image was plotted as a 3D model. First, a segmented image containing the segmentation of wall, door and window was produced, as shown in figure 7.31.

Figure 7.31: Segmented image required for 3D plotting.

Here, the white background represents the floor, the dark band with gray level 0 represents walls, the band with gray level 170 represents windows, and gray level 85 represents doors. The details are shown in the gray-level format table 7.2.

Table 7.2: Gray level format in segmented image for 3D generation.

Component   Gray level
floor       255
window      170
door        85
wall        0

However, the segmented image generated from the room split may not contain these exact gray levels, so a gray-level slicing approach was used, as shown in figure 7.32. Finally, each gray level was mapped to its corresponding component (wall, floor, door and window), and the result is displayed using Matplotlib's 3D projection. The output is as shown in figure 7.33.

Figure 7.32: Gray-level slicing approach
Figure 7.33: 3D model plotting
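
A simplified sketch of this step, assuming the gray-level conventions of table 7.2; the file name, slicing thresholds and wall height are illustrative values only, not the project's exact implementation:

```python
import cv2
import numpy as np
import matplotlib.pyplot as plt

img = cv2.imread("segmented.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name

# Gray-level slicing: snap each pixel to the nearest component level (table 7.2).
levels = np.array([0, 85, 170, 255])                       # wall, door, window, floor
sliced = levels[np.argmin(np.abs(img[..., None] - levels), axis=-1)]

# Extrude walls (and windows to half height) and plot as crude 3D bars.
height = np.where(sliced == 0, 30, np.where(sliced == 170, 15, 0))
ys, xs = np.nonzero(height)                                 # slow for large images; sketch only
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.bar3d(xs, ys, np.zeros_like(xs), 1, 1, height[ys, xs])
plt.show()
```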

7.15 Convolution Layer


The convolution layer is one of the most important layers in a deep neural network. It is used to extract features from the input image; its output is known as a feature map.

Figure 7.34: Convolutional layer convolved feature

V(m, n) = \sum_{(k,l) \in w} z(k, l)\, y(m - k,\, n - l)   (7.25)

Here, the input is a 5×5×1 image (shown in green) and the kernel is 3×3×1 (shown in yellow). The filter slides over the image to perform the convolution operation, and the result is shown as the convolved feature in the image. The filter moves to the right with a certain stride value until it has parsed the complete width, then hops down to the beginning of the next rows with the same stride value and repeats the process until the entire image is traversed. The depth of the kernel is normally equal to the depth of the input image. There are two types of padding operation:

• Same padding performs the convolution in such a way that the output image has the same dimensions as the input image. It first adds the required padding to the input image.

Figure 7.35: Convolution of three channel image

• Valid padding performs the same operation without padding the input image, so the output image dimensions are smaller than those of the input image.

7.15.1 Same padding over valid padding


Consider a 3×3 image and a 2×2 filter, as shown in figure 7.36 (the actual values are not important here). When the convolution is computed with stride one, four sections of the image are visited: top-left, top-right, bottom-left and bottom-right. The center pixel is visited four times, the corner pixels are visited only once, and the remaining pixels are visited twice. So with valid padding, the pixel at the center of the image is observed multiple times, while the pixels at the edges are not given as much importance. This means the information at the center gets more attention than the information at the edges; if the most important pixels lie at the edge, the most important content of the image will be missed. It is therefore necessary to make sure that the filter puts equal emphasis across the entire image.

Figure 7.36: Convolution using valid padding

To address this problem, a frame is put around the image so that the original information appears at the center of the whole picture, as in figure 7.37; this is known as padding. The padded value is usually zero, so it is also called zero padding. Now the filter scans the image along with the frame, visiting each pixel of the original image an equal number of times. This is called same padding.

Figure 7.37: Convolution using same padding
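
A quick sketch contrasting the two padding modes by their output shapes, assuming a TensorFlow/Keras implementation; the layer sizes are arbitrary examples:

```python
import tensorflow as tf

x = tf.random.normal((1, 3, 3, 1))                  # 3x3 single-channel image

same = tf.keras.layers.Conv2D(1, 2, padding="same")(x)
valid = tf.keras.layers.Conv2D(1, 2, padding="valid")(x)

print(same.shape)    # (1, 3, 3, 1) -> output size preserved by zero padding
print(valid.shape)   # (1, 2, 2, 1) -> output shrinks without padding
```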

73
7.15.2 Convolution Layer follows Transpose Convolution
The transpose convolution layer is a learnable layer that takes the input image and convolves it with a filter in such a way that the size of the output increases. The difference between transpose convolution and an upsampling layer is that upsampling uses a predefined method to enlarge the image, whereas transpose convolution uses learnable parameters.

Figure 7.38: Transpose Convolution

In figure 7.38, the values of the filter are learned. However, there is an issue: the center pixel is visited four times and is influenced by all of the input pixels while the others are not. This raises a common problem called the checkerboard effect, shown in figure 7.39, which is the main disadvantage of the transpose convolution layer.

Figure 7.39: checkerboard Effect due to Transpose Convolution


source: https://img-blog.csdnimg.cn/2020010915020417.png
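
For reference, a minimal Keras sketch of a transpose convolution that doubles spatial resolution; the filter count and kernel size are illustrative, and a kernel size divisible by the stride is often chosen to reduce the checkerboard effect:

```python
import tensorflow as tf

x = tf.random.normal((1, 64, 64, 128))              # example feature map

up = tf.keras.layers.Conv2DTranspose(
    filters=64, kernel_size=4, strides=2, padding="same")(x)

print(up.shape)   # (1, 128, 128, 64) -> spatial size doubled by learnable weights
```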

7.16 Pooling Layer


It is similar to the convolutional layer. It reduces the spatial size of the convolved feature, which helps to decrease the computational power required to process the data through dimensionality reduction, and it is used for extracting dominant features. Types of pooling:

• Max pooling: It extracts the maximum value of each region of the feature map. It also acts as a noise suppressant.
• Average pooling: It extracts the average value of each region of the feature map. It simply performs dimensionality reduction.

Figure 7.41: Pooling layers
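
A short Keras sketch comparing the two pooling types on the same feature map; the pool size of 2 is just an example:

```python
import tensorflow as tf

x = tf.random.normal((1, 8, 8, 32))                  # example convolved feature

max_pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(x)
avg_pooled = tf.keras.layers.AveragePooling2D(pool_size=2)(x)

print(max_pooled.shape, avg_pooled.shape)   # both (1, 4, 4, 32): spatial size halved
```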

7.17 Batch normalization


Batch normalization is a technique designed to automatically standardize the inputs to a layer for each mini-batch in a deep learning neural network. A mini-batch refers to one batch of data taken from a subset of the whole training data.

7.17.1 Advantages
a. Once implemented, batch normalization dramatically accelerates the training process of a neural network and improves the performance of the model, in some cases halving the number of epochs or better, and it provides some regularization, reducing generalization error.
b. It helps to coordinate the update of multiple layers in the model.
c. It does this by scaling the output of the layer, specifically by standardizing the activations of each input variable per mini-batch, such as the activations of a node from the previous layer.

7.17.2 Considerations while using batch normalization
a. Use with different network types
It can be used with most network types, such as multilayer perceptrons, convolutional neural networks and recurrent neural networks.
b. Probably use before the activation
Batch normalization may be applied to the inputs of a layer either before or after the activation function of the previous layer.
c. Use large learning rates
It may allow the use of much larger than normal learning rates, which in turn may further speed up the learning process.
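
As a sketch, one common encoder-block pattern with batch normalization placed before the activation; the layer sizes are placeholders, not the project's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def encoder_block(filters):
    """Conv -> BatchNorm -> LeakyReLU, a typical down-sampling block."""
    return tf.keras.Sequential([
        layers.Conv2D(filters, 4, strides=2, padding="same", use_bias=False),
        layers.BatchNormalization(),     # standardizes activations per mini-batch
        layers.LeakyReLU(0.2),
    ])

block = encoder_block(64)
print(block(tf.random.normal((1, 256, 256, 3))).shape)   # (1, 128, 128, 64)
```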

7.18 Skip connection in U-net architecture


Skip connections are fairly standard in CNNs. They are used to connect layers in the encoder with the corresponding layers in the decoder that have the same-sized feature maps.

7.18.1 Advantages
a. They help recover information that would otherwise be lost during the encoding stage.
b. They largely allow information to flow from earlier layers to later layers.

7.18.2 Implementation
U-net introduces skip connections from the encoder to the decoder. Every block in the encoder is connected to the block in the decoder that has the same resolution, and the encoder features are concatenated with the decoder features, so that information which might have been compressed too much can still trickle through to the later layers.
Forward pass: skip connections allow information flow to the decoder.
Backward pass: skip connections improve gradient flow to the encoder.
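
A condensed functional-API sketch of how such a skip connection can be wired, assuming Keras; only two levels are shown, and the actual U-net depth and filter counts differ:

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = layers.Input((256, 256, 3))
e1 = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(inp)   # 128x128
e2 = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(e1)   # 64x64

d1 = layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                            activation="relu")(e2)                             # back to 128x128
d1 = layers.Concatenate()([d1, e1])     # skip connection: encoder features join the decoder
out = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                             activation="tanh")(d1)                            # 256x256

model = tf.keras.Model(inp, out)
model.summary()
```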

7.19 Upsampling of an image


Upsampling is the technique to increase the size of the image.

7.19.1 Upsampling by Nearest Neighbour Method


In the nearest-neighbour method, each input pixel value is copied to its K nearest neighbours, where K depends on the expected output size. Nearest neighbour is best used for categorical data: the values that go into the grid stay exactly the same, so a 2 comes out as a 2 and 99 comes out as 99. The value of each output cell is determined by the nearest cell center on the input grid. Nearest neighbour can be used on continuous data, but the results can be blocky.

76
Figure 7.42: Nearest neighbour upsampling

7.19.2 Upsampling by Bi-linear Interpolation


In bilinear interpolation, the four nearest pixel values of the input pixel are taken and a weighted average is computed based on the distance to the four nearest cell centers, which smooths the output. The closer an input cell center is to the output cell center, the higher the influence of its value on the output cell value. This means the output value may differ from the nearest input, but it is always within the same range of values as the input. Since the values can change, bilinear interpolation is not recommended for categorical data; instead, it should be used for continuous data such as elevation and raw slope values.

Figure 7.43: Bi-linear interpolation
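
A small OpenCV sketch contrasting the two upsampling methods on a dummy image; the scale factor of 2 is illustrative:

```python
import cv2
import numpy as np

img = (np.random.rand(4, 4) * 255).astype(np.uint8)   # tiny dummy image

nearest = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_NEAREST)
bilinear = cv2.resize(img, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)

print(nearest.shape, bilinear.shape)   # both (8, 8); nearest is blocky, bilinear is smooth
```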

77
7.20 Inference Engine
7.20.1 Use case diagram of FPGAN

Figure 7.44: Use case diagram of FPGAN

78
7.20.2 Pre-processing of Cadastral Map to Generate Parcel

Figure 7.45: Sequence diagram of pre-processing of cadastral map to generate parcel

79
7.20.3 Step by step Generation

Figure 7.46: Sequence diagram of Parcel to Furnished Generation

80
7.20.4 Furniture Mapping

Figure 7.47: Algorithm for Furniture Mapping

81
7.20.5 Wall Segmentation and 3D Generation

Figure 7.48: Wall Segmentation and 3D Generation

82
7.21 Issue: Footprint area is generated outside the parcel area

Figure 7.49: Bordered

Possible problems
a. The width of the parcel line is small
b. There is a boundary line in the parcel image

Test #1: Remove the boundary line in parcel image

Figure 7.50: No bordered

83
Test #2: Increase the width of the parcel to about 3px and
then check the result

Figure 7.51: Parcel line width 3, case #1    Figure 7.52: Parcel line width 3, case #2

Test #3: Increase the width of the Parcel to about 5px and
then check the result

Figure 7.53: Parcel line width 5, case #1    Figure 7.54: Parcel line width 5, case #2

Result
From these observations it was concluded that the boundary line in the image must be removed and that the parcel line must be wide enough, because the training images have wide parcel lines. In addition, the parcel area must cover most of the image; otherwise, output will be produced outside the parcel area.

84
Chapter 8
Expected Outcomes
Firstly, the splash screen (B.1) is displayed to welcome users and introduce the application and how it works.

The main menu (B.2) consists of two options that let the user define the area structure and space to start a floor plan.

The first option uses the map provided by Malpot to get an accurate location for the building as well as to maintain an appropriate scale for planning the footprint. In the case of manual file upload, as in B.3, a fine-quality PNG image with an appropriate scale is required.

Then an option for precise map area selection is provided, as in B.4.

The other option is to freely draw and mark the area eligible for placing the building. For this, an empty canvas is given to the user with the ability to draw a free shape for abstract concept design at the early stage, as in B.5.

Also, after specifying the area of interest, an option to specify the shape of the area eligible for building by marking vertices over it can be given. Finally, in B.6 the user can define his/her constraints on the floor plan, with markings for openings and the entrance and the shape of the floor, to proceed towards the GAN-generated output.

At this stage, as in B.7, the user is given options of GAN-generated raster images displayed along the bottom of the screen, from which one can be selected to proceed to vector image generation as well as 3D model generation. The final step is to render the 3D model of the floor plan in the browser, as in B.8, and give the user an explorable rendered view.

A deployable web app using the simple Python Flask framework will be made. So far, the static screens to be populated with generated results from the backend GAN system have been completed. Despite the completion of the frontend parts, model training for FPGAN is not yet complete; therefore, a working prototype is not presented for now and will be the focus of the next phase.

So, it was decided to train GAN models based on the various combinations of generators and discriminators mentioned in 7.6.3. After training, each of the above models would be compared based on the Inception score as well as the FID score for each pipeline followed in the project. The expected model comparison table based on the Inception score is as follows:

Table 8.1: Inception score of each model for each pipeline steps

Models
Pipelines 1 2 3 4 5 6
Parcel to Footprint - - - - - -
Footprint to Roomsplit - - - - - -
Roomsplit to Furnished - - - - - -

A similar table would be used separately for the FID score comparison.

86
Chapter 9
Actual Outcome
9.1 Review of ROBIN datasets

Table 9.1: Different types of footprint shape

S.N   Category   No. of images   Remarks (footprint shape)
1     A          10              L shape
2     B          10              L shape
3     C          10              L shape
4     D          10              L shape
5     E          10              U shape
6     F          10              U shape
7     G          10              U shape
8     H          10              U shape
9     I          10              T shape
10    J          10              L shape
11    K          10              T shape
12    L          10              T shape
13    M          10              H shape
14    N          10              I shape
15    O          10              L shape
16    P          10              Z shape
17    Q          10              Z shape

9.2 Accuracy of Automatically Generated Parcel and Footprint

Table 9.2: Accuracy of Automatic Generated Parcel and Footprint

Total Image Correctly generated Accuracy %


340 188 55.3 %

The errors in the generation of parcels and footprints using the automatic generation script are due to the following reasons:

a. The parcel was cut during rotation through 30°.
b. Small spaces (alleys) whose width is smaller than the structuring element are taken within the footprint.

89
9.3 Parameter Tuning for Template Matching
for Furnished Datasets

Table 9.3: Template Matching for Furnished Datasets

Template no Threshold Color Code Inc size Dec size


1 to 4 0.67 139,177,200 -10 15
5 to 8 0.65 165,234,35 -5 10
9 0.66 205,131,244 3 3
10 — — — —
11 to 14 0.66 128,144,245 10 0
15 to 18 0.67 156,65,97 -6 3
19 to 22 0.66 115,55,184 2 2
23 to 26 0.63 33,221,235 5 5
27 to 30 0.64 14,28,139 2 2
31 to 34 0.67 38,220,167 2 2
35 to 38 0.663 27,59,232 5 5
39 to 40 0.625 11,52,32 0 20
41 to 44 0.65 165,234,35 -5 10
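
As a hedged illustration of how the threshold and colour-code parameters in table 9.3 might be applied with OpenCV, the file names, matching method and fill step below are assumptions rather than the project's exact implementation:

```python
import cv2
import numpy as np

plan = cv2.imread("furnished_plan.png")             # hypothetical input image
template = cv2.imread("templates/template_09.png")  # hypothetical template no. 9

threshold, color = 0.66, (244, 131, 205)            # values from table 9.3 (BGR order assumed)

result = cv2.matchTemplate(plan, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(result >= threshold)              # all match locations above threshold
h, w = template.shape[:2]
for x, y in zip(xs, ys):
    # Paint each matched region with the furniture's colour code.
    cv2.rectangle(plan, (x, y), (x + w, y + h), color, thickness=-1)

cv2.imwrite("matched.png", plan)
```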

9.4 Templates for template matching

Figure 9.1: Templates for template matching

90
9.5 Accuracy of furnished datasets

Table 9.4: Accuracy of furnished datasets

Type of furniture   No. of Objects   Total Mistakes   Total Accurate Detection   Accuracy %
Armchair 16 0 16 100.00%
Bed 249 2 247 99.20%
Coffee table 300 4 296 98.67%
Round table 0 0 0 –
Large sofa 218 4 214 98.17%
Small sofa 186 7 179 96.24%
Sink 130 0 130 100.00%
Twin sink 4 2 2 30.00%
Small sink 4 0 4 100.00%
Large sink 156 0 156 100.00%
Tub 167 1 166 99.40%
Dining table 165 8 157 95.15%
Total 1615 28 1587 Avg % =98.27%

9.6 Inception Score of prepared datasets

Table 9.5: Inception Score of Prepared Datasets

Data Inception score


Average Standard deviation
Parcel 1.6605539 0.09398122
Footprint 1.9369209 0.08145683
Room Split 1.6493725 0.062907636
Furnished 1.6338873 0.062331554
Final 2.092013 0.1116298

9.7 Orientation of the Prepared Datasets


Using the furnished data, the orientation of each plan was generated. This orientation helps to decide the length of walls in each direction. It can also be used to interpret the weakness of walls along each direction. Some orientations are shown in D.1.

9.8 Footprint of the Footprint Datasets


Using the footprint data, the footprint plot of each footprint was generated. This plot helps to relate the parcel area to the angle. It can also be used to interpret the weakness of the footprint along each direction. Some footprint plots are shown in D.2.

91
9.9 Program of the Roomsplit Datasets
Using the roomsplit data, the program of each roomsplit was generated. The program helps to determine the percentage of the footprint area covered by each room. Some programs are shown in D.3.

9.10 Inception Score of Generated Images using U-net

Table 9.6: Inception Score of Generated Image using U-net after 200K steps train-
ing

Data Inception score


Average Standard deviation
Footprint 1.6629143 0.15586044
Room Split 1.9552505 0.13677013
Furnished 1.749122 0.07123042

Table 9.7: Inception Score of Generated Image using U-net after 400K steps train-
ing

Data Inception score


Average Standard deviation
Footprint 1.6304417 0.20766288
Room Split 2.063714 0.14368616
Furnished 1.7543595 0.09494884

9.11 Inception Score of Generated Images using Triple U-net

Table 9.8: Inception Score of Generated Image using Triple U-net

Data Inception score


Average Standard deviation
Footprint 2.4829316 0.30623966
Room Split 1.9215015 0.14979385
Furnished 1.6710441 0.070594065

92
9.12 Comparison of Inception Scores and Interpretation

Table 9.9: Comparison of Inception scores of the prepared dataset and of images generated by U-net after 200K and 400K training steps

Data         Prepared dataset      200K steps            400K steps
             Average  Std dev      Average  Std dev      Average  Std dev
Footprint    1.9369   0.0814       1.6629   0.1558       1.6304   0.2076
Room Split   1.6493   0.0629       1.9552   0.1367       2.0637   0.1436
Furnished    1.6338   0.0623       1.7491   0.0712       1.7543   0.0949

From the comparison of the Inception scores for the images generated by the U-net model at 200K and 400K training steps, it was observed that the Inception score for footprint generation is higher at 200K steps, while for room split and furnished generation the score is higher at 400K steps. Hence, it can be concluded that the images generated by the model at 400K steps are more realistic than those produced at 200K steps for room split and furnished generation.

9.13 FID Score of Generated Image using U-net

Table 9.10: FID Score of Generated Image using U-net

Data FID score for U-net Model


200K steps 400K steps
Footprint 177.571 99.148
Room Split 98.833 55.375
Furnished 74.948 65.957

From the comparison, it was observed that the FID score of the 400K-step U-net model is lower than that of the 200K-step U-net model for footprint, room split and furnished generation. Since a lower FID score is better, the decision was taken to use the 400K-step U-net model in the inference engine.

Here, the FID score is prioritized over the Inception score because the Inception score tells how realistic a generated image is, whereas the FID score tells how close the generated image is to the required image. Since the image must be close to the required image rather than merely realistic, the FID score was prioritized and the 400K-step model was selected even for footprint generation, although its Inception score is lower than that of the 200K-step model.
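
For reference, a compact NumPy/SciPy sketch of the FID computation between two sets of Inception activations; this mirrors the standard definition, and the extraction of the activations themselves (from an Inception network) is omitted and assumed:

```python
import numpy as np
from scipy import linalg

def fid(act_real, act_gen):
    """Frechet Inception Distance between two activation matrices of shape (N, D)."""
    mu1, mu2 = act_real.mean(axis=0), act_gen.mean(axis=0)
    sigma1 = np.cov(act_real, rowvar=False)
    sigma2 = np.cov(act_gen, rowvar=False)
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real               # drop tiny imaginary parts from sqrtm
    diff = mu1 - mu2
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)

# Usage: act_real and act_gen would hold pooled activations of real and generated images.
```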

93
9.14 FID score of generated images using triple
U-Net

Table 9.11: FID Score of Generated Image using Triple U-net

Data FID score for Triple U-net Model


Footprint 169.006
Room Split 62.380
Furnished 39.279

94
9.15 Unit Testing of Models
The output of each model was tested with random input images from the testing dataset to check the working of the individual models. Paired images for each test are shown below, where the left image is the input and the right image is the output.

9.15.1 U-net Based Models


9.15.1.1 U-net Based Footprint Generation Model

Figure 9.2: Unit testing for U-net based footprint generation model #1

Figure 9.3: Unit testing for U-net based footprint generation model #2

Figure 9.4: Unit testing for U-net based footprint generation model #3

95
9.15.1.2 U-net Based Roomsplit Generation Model

Figure 9.5: Unit testing for U-net based roomsplit generation model #1

Figure 9.6: Unit testing for U-net based roomsplit generation model #2

Figure 9.7: Unit testing for U-net based roomsplit generation model #3

96
9.15.1.3 U-net Based Furnished Generation Model

Figure 9.8: Unit testing for U-net based furnished generation model #1

Figure 9.9: Unit testing for U-net based furnished generation model #2

Figure 9.10: Unit testing for U-net based furnished generation model #3

97
9.15.2 Triple U-net Based Models
9.15.2.1 Triple U-net Based Footprint Generation Model

Figure 9.11: Unit testing for Triple U-net based footprint generation model #1

Figure 9.12: Unit testing for Triple U-net based footprint generation model #2

Figure 9.13: Unit testing for Triple U-net based footprint generation model #3

98
9.15.2.2 Triple U-net Based Roomsplit Generation Model

Figure 9.14: Unit testing for Triple U-net based roomsplit generation model #1

Figure 9.15: Unit testing for Triple U-net based roomsplit generation model #2

Figure 9.16: Unit testing for Triple U-net based roomsplit generation model #3

99
9.15.2.3 Triple U-net Based Furnished Generation Model

Figure 9.17: Unit testing for Triple U-net based furnished generation model #1

Figure 9.18: Unit testing for Triple U-net based furnished generation model #2

Figure 9.19: Unit testing for Triple U-net based furnished generation model #3

100
9.16 Integration Testing of Models
The output of the full workflow was tested in the integrated inference engine in Flask.

9.16.1 U-net Based Models


Test Case #1

Figure 9.20: Integration testing for U-net based footprint generation model #1

Figure 9.21: Integration testing for U-net based roomsplit generation model #1

Figure 9.22: Integration testing for U-net based furnished generation model #1

101
Test Case #2

Figure 9.23: Integration testing for U-net based footprint generation model #2

Figure 9.24: Integration testing for U-net based roomsplit generation model #2

Figure 9.25: Integration testing for U-net based furnished generation model #2

102
Test Case #3

Figure 9.26: Integration testing for U-net based footprint generation model #3

Figure 9.27: Integration testing for U-net based roomsplit generation model #3

Figure 9.28: Integration testing for U-net based furnished generation model #3

103
9.16.2 Triple U-net Based Models
Test Case #1

Figure 9.29: Integration testing for Triple U-net based footprint generation model
#1

Figure 9.30: Integration testing for Triple U-net based roomsplit generation model
#1

Figure 9.31: Integration testing for Triple U-net based furnished generation model
#1

104
Test Case #2

Figure 9.32: Integration testing for Triple U-net based footprint generation model
#2

Figure 9.33: Integration testing for Triple U-net based roomsplit generation model
#2

Figure 9.34: Integration testing for Triple U-net based furnished generation model
#2

105
Test Case #3

Figure 9.35: Integration testing for Triple U-net based footprint generation model
#3

Figure 9.36: Integration testing for Triple U-net based roomsplit generation model
#3

Figure 9.37: Integration testing for Triple U-net based furnished generation model
#3

106
9.17 Furniture Mapping
The furniture items represented by colour codes are mapped to their corresponding furniture icons. Three sample outputs of furniture mapping are shown below:

Figure 9.38: Furniture mapping#1

Figure 9.39: Furniture mapping#2

Figure 9.40: Furniture mapping#3

107
9.18 Wall Segmentation
To get a 3D view of the generated floor plan, the wall segments of the roomsplit are required. Canny edge detection was used to detect the walls of the rooms, which were then used to generate the 3D images.
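
A minimal OpenCV sketch of this step; the threshold values and file names are assumptions:

```python
import cv2

roomsplit = cv2.imread("generated_roomsplit.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
blurred = cv2.GaussianBlur(roomsplit, (5, 5), 0)       # reduce noise before edge detection
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

cv2.imwrite("wall_segmentation.png", edges)            # white edges mark wall boundaries
```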

Figure 9.41: Wall segmentation of generated roomsplit #1

Figure 9.42: Wall segmentation of generated roomsplit #2

Figure 9.43: Wall segmentation of generated roomsplit #3

108
9.19 3D Generation
With the help of the segmented images obtained from the Canny edge detection algorithm, a 3D model of the floor plan was generated and displayed to the users.

Figure 9.44: 3D generation of roomsplit #1

Figure 9.45: 3D generation of roomsplit #2

Figure 9.46: 3D generation of roomsplit #3

109
Chapter 10
Conclusion and Future Enhancements
In conclusion, the project Floor Plan Generation using Generative Adversarial Networks has been completed: it was successfully deployed as a desktop application that generates floor plans using the U-net and Triple U-net architectures.

As future enhancements, the following can be considered:

• Use of StackGAN to stabilize the training of the conditional GAN

• Inclusion of other available evaluation metrics for better assessment of results

• Use of better training datasets to train the model for better inference output

110
Bibliography
[1] S. Chaillou, “Expliquer.” 2019. [Online]. Available: http://stanislaschaillou.
com/expliquer/
[2] N. Nauata, K.-H. Chang, C.-Y. Cheng, G. Mori, and Y. Furukawa, “House-
gan: Relational generative adversarial networks for graph-constrained house
layout generation,” in European Conference on Computer Vision. Springer,
2020, pp. 162–177.
[3] Q. Chen, Q. Wu, R. Tang, Y. Wang, S. Wang, and M. Tan, “Intelligent
home 3d: Automatic 3d-house design from linguistic descriptions only,” in
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, 2020, pp. 12 625–12 634.
[4] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks
for biomedical image segmentation,” in International Conference on Medical
image computing and computer-assisted intervention. Springer, 2015, pp.
234–241.
[5] D. Jha, M. A. Riegler, D. Johansen, P. Halvorsen, and H. D. Johansen,
“Doubleu-net: A deep convolutional neural network for medical image seg-
mentation,” in 2020 IEEE 33rd International symposium on computer-based
medical systems (CBMS). IEEE, 2020, pp. 558–564.
[6] E. Schonfeld, B. Schiele, and A. Khoreva, “A u-net based discriminator for
generative adversarial networks,” in Proceedings of the IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition, 2020, pp. 8207–8216.
[7] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation
with conditional adversarial networks,” in Proceedings of the IEEE conference
on computer vision and pattern recognition, 2017, pp. 1125–1134.
[8] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image
translation using cycle-consistent adversarial networks,” in Proceedings of the
IEEE international conference on computer vision, 2017, pp. 2223–2232.
[9] H. Yong, J. Huang, D. Meng, X. Hua, and L. Zhang, “Momentum batch nor-
malization for deep learning with small batch size,” in European Conference
on Computer Vision. Springer, 2020, pp. 224–240.
[10] M. Vidanapathirana, Q. Wu, Y. Furukawa, A. X. Chang, and M. Savva,
“Plan2scene: Converting floorplans to 3d scenes,” in Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021,
pp. 10 733–10 742.
[11] D. Shin, C. C. Fowlkes, and D. Hoiem, “Pixels, voxels, and views: A study
of shape representations for single view 3d object shape prediction,” in Pro-
ceedings of the IEEE conference on computer vision and pattern recognition,
2018, pp. 3061–3069.

111
[12] C. Liu, J. Wu, P. Kohli, and Y. Furukawa, “Raster-to-vector: Revisiting floor-
plan transformation,” in Proceedings of the IEEE International Conference
on Computer Vision, 2017, pp. 2195–2203.

[13] S. Goyal, S. Bhavsar, S. Patel, C. Chattopadhyay, and G. Bhatnagar, “Sugaman: describing floor plans for visually impaired by annotation learning and proximity-based grammar,” IET Image Processing, vol. 13, no. 13, pp. 2623–2635, 2019.

[14] J. Wu, C. Zhang, T. Xue, W. T. Freeman, and J. B. Tenenbaum, “Learning a probabilistic latent space of object shapes via 3d generative-adversarial modeling,” in Proceedings of the 30th International Conference on Neural Information Processing Systems, 2016, pp. 82–90.

[15] J. Liu, F. Yu, and T. Funkhouser, “Interactive 3d modeling with a generative adversarial network,” in 2017 International Conference on 3D Vision (3DV). IEEE, 2017, pp. 126–134.

[16] J. Wu, C. Zhang, X. Zhang, Z. Zhang, W. T. Freeman, and J. B. Tenenbaum, “Learning shape priors for single-view 3d completion and reconstruction,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 646–662.

[17] P. Green-Armytage, “A colour alphabet and the limits of colour coding,” JAIC-Journal of the International Colour Association, vol. 5, 2010.

[18] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 3431–3440.

[19] P. Luc, C. Couprie, S. Chintala, and J. Verbeek, “Semantic segmentation using adversarial networks,” arXiv preprint arXiv:1611.08408, 2016.

[20] S. Dodge, J. Xu, and B. Stenger, “Parsing floor plan images,” in 2017 Fifteenth IAPR international conference on machine vision applications (MVA). IEEE, 2017, pp. 358–361.

[21] J. T. Kajiya, “The rendering equation,” in Proceedings of the 13th annual conference on Computer graphics and interactive techniques, 1986, pp. 143–150.

[22] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training gans,” Advances in neural information processing systems, vol. 29, pp. 2234–2242, 2016.

[23] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “Gans trained by a two time-scale update rule converge to a local nash equilibrium,” Advances in neural information processing systems, vol. 30, 2017.

[24] “Clickup,” Sep 2021. [Online]. Available: https://sharing.clickup.com/g/h/38n7b-67/aa4b651efda6ef5

[25] “Google sheet,” Mar 2022. [Online]. Available: https://docs.google.com/spreadsheets/d/1y5eFKPdgFDfsafWFMCaDbp96Zv5SKaLcZQK6Et0ZF60/edit?usp=sharing

[26] A. C. Chattopadhyay, “Robin dataset,” 2021. [Online]. Available: https://github.com/gesstalt/ROBIN

[27] L.-P. de las Heras, O. R. Terrades, S. Robles, and G. Sánchez, “Cvc-fp and sgt: a new database for structural floor plan analysis and its groundtruthing tool,” International Journal on Document Analysis and Recognition (IJDAR), vol. 18, no. 1, pp. 15–30, 2015.

[28] P. Centore, “srgb centroids for the iscc-nbs colour system,” Munsell Colour Sci. Painters, 2016.

[29] 2022. [Online]. Available: https://raw.githubusercontent.com/banmala/fpgan-dataset/master/footprint generation Dataset 3rooms.tar.gz

[30] 2022. [Online]. Available: https://raw.githubusercontent.com/banmala/fpgan-dataset/master/roomsplit generation Dataset 3rooms.tar.gz

[31] 2022. [Online]. Available: https://raw.githubusercontent.com/banmala/fpgan-dataset/master/furnished generation Dataset 3rooms.tar.gz

113
Appendix

A Mockup Demonstration
The demo Prototype of the mockup design in figma is showcased in: Prototype
expected outcome.

B Expected Outcome Screenshots


B.1 Splash Screen

114
B.2 Main Menu

B.3 Manual image file with appropriate scale

115
B.4 Manual Map upload from Malpot and area marking

B.5 Free drawing shape for concept design

116
B.6 Constraint given to plan

B.7 Choose and proceed GAN design

117
B.8 Generate 3D

C GAN with U-net Generator and U-net Discriminator

118
D Qualification Metrices
D.1 Orientation of the Prepared Datasets

119
D.2 Footprint of the Prepared footprint Datasets

120
D.3 Program of the Prepared roomsplit Datasets

121
E Actual OutCome Screenshots
E.1 Get Started Page

E.2 Upload Cadastral Map Page

122
E.3 After Uploading Cadastral Map

E.4 Preprocessing Cadastral Map for Parcel

123
E.5 Display Parcel and Choosing Model

E.6 Parcel to Footprint Generation

124
E.7 Footprint to Roomsplit Generation

E.8 Roomsplit to Furnished Generation

125
E.9 Furniture Mapping

E.10 Wall Segmentation for 3D Generation

126
E.11 3D Plan

127
E.12 Complete Result

128
