Floor Plan Generation Using GAN
INSTITUTE OF ENGINEERING
A FINAL REPORT ON
FLOOR PLAN GENERATION USING GAN
Submitted by
Anusha Bajracharya KCE074BCT011
Luja Shakya KCE074BCT022
Niranjan Bekoju KCE074BCT025
Sunil Banmala KCE074BCT045
This is to certify that this major project work entitled “Floor Plan Generation using GAN” submitted by Anusha Bajracharya (KCE074BCT011), Luja Shakya (KCE074BCT022), Niranjan Bekoju (KCE074BCT025) and Sunil Banmala (KCE074BCT045) has been examined and accepted as the partial fulfillment of the requirements for the degree of Bachelor in Computer Engineering.
..........................................
Er. Anil Verma
External Examiner
Assistant Professor
Dept. of Electronics and Computer, IOE, Pulchowk

..........................................
Er. Aayush Adhikari
Project Supervisor
CTO, Deepmind Creations
..........................................
Er. Dinesh Gothe
Head of Department,
Department of Computer Engineering
Khwopa College of Engineering
Copyright
The author has agreed that the library of Khwopa College of Engineering may make this report freely available for inspection. Moreover, the author has agreed that permission for extensive copying of this project report for scholarly purposes may be granted by the supervisor who supervised the project work recorded herein or, in their absence, by the Head of the Department wherein the project report was done. It is understood that recognition will be given to the author of the report and to the Department of Computer Engineering, KhCE in any use of the material of this project report. Copying, publication or other use of this report for financial gain without approval of the department and the author's written permission is prohibited.
Request for the permission to copy or to make any other use of material in this
report in whole or in part should be addressed to:
Head of Department
Department of Computer Engineering
Khwopa College of Engineering
Liwali,
Bhaktapur, Nepal
Acknowledgement
We take this opportunity to express our deepest and sincere gratitude to our HoD, Er. Dinesh Gothe, for his insightful advice and motivating suggestions for this project, and also for his constant encouragement and advice throughout our Bachelor's program.
Also, we would like to thank Er. Bindu Bhandari for providing valuable sug-
gestions and for supporting the project.
Abstract
Whenever a landowner wants to build a house, they need to prepare a design (floor plan) of the house: where the main entrance and openings will be, how the space will be split into rooms, and what portion of the building will be separated for the bedroom, kitchen, bathroom and so on. These are the general questions that come to mind. To answer them, the landowner consults an architect, who uses different planning tools to generate the plan of the building. Initially, it is difficult for an architect to make a plan from scratch. So, Floor Plan Generation using GAN was introduced to produce conceptual floor plans that best suit a parcel of land and provide a vision that can help architects. Architects can choose among the generated plans and then modify them accordingly, which is considerably easier than generating a plan directly from scratch. To generate a plan, the system takes the parcel of land from the architect and maps it to a footprint, then a room split, and finally a furnished plan. The system uses a conditional GAN for generation, and it also generates a 3D model of the generated floor plan. Training datasets were prepared automatically with 55.3% accuracy for parcel and footprint, with the remainder done manually. Similarly, furnished datasets were prepared with 98.27% accuracy using template matching and parameter tuning. The GAN generated images with Inception Scores of 1.6629 ± 0.1558 for footprint, 2.0637 ± 0.1436 for roomsplit, and 1.7543 ± 0.0949 for furnished generation, with corresponding FID scores of 99.148, 55.375 and 65.957 respectively.
Contents
Copyright . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ix
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv
List of Abbreviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1 Introduction 1
1.1 Background Introduction . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.3 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.4 Goals and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.5 Scope and Applications . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Literature Review 3
2.1 AI + Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 House-GAN: Relational Generative Adversarial Networks for Graph-
constrained House Layout Generation . . . . . . . . . . . . . . . . . 3
2.3 Intelligent Home 3D: Automatic 3D-House Design from Linguistic
Descriptions Only . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.4 U-Net: Convolutional Networks for Biomedical Image Segmentation 4
2.5 Double U-Net: A Deep Convolutional Neural Network for Medical
Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.6 A U-Net Based Discriminator for Generative Adversarial Networks . 5
2.7 Image-to-Image Translation with Conditional Adversarial Networks 5
2.8 Unpaired Image-to-Image Translation using Cycle-Consistent Ad-
versarial Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.9 Momentum Batch Normalization for Deep Learning with Small
Batch Size . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.10 Plan2Scene: Converting Floorplans to 3D Scenes . . . . . . . . . . 6
2.11 Pixels, voxels, and views: A study of shape representations for
single view 3D object shape prediction . . . . . . . . . . . . . . . . 6
2.12 Raster-to-Vector: Revisiting Floorplan Transformation . . . . . . . 7
2.13 SUGAMAN: Describing Floor Plans for Visually Impaired by An-
notation Learning and Proximity based Grammar . . . . . . . . . . 7
2.14 Learning a Probabilistic Latent Space of Object Shapes via 3D
Generative-Adversarial Modeling . . . . . . . . . . . . . . . . . . . 7
2.15 Interactive 3D Modeling with a Generative Adversarial Network . . 8
2.16 Learning Shape Priors for Single-View 3D Completion and Recon-
struction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.17 A Colour Alphabet and the Limits of Colour Coding . . . . . . . . 8
2.18 Fully Convolutional Networks for Semantic Segmentation . . . . . . 9
2.19 Semantic Segmentation using Adversarial Networks . . . . . . . . . 9
2.20 Parsing Floor Plan Images . . . . . . . . . . . . . . . . . . . . . . . 9
2.21 The Rendering Equation . . . . . . . . . . . . . . . . . . . . . . . . 10
2.22 Improved Techniques for Training GANs . . . . . . . . . . . . . . . 10
2.23 GANs Trained by a Two Time-Scale Update Rule Converge to a
Local Nash Equilibrium . . . . . . . . . . . . . . . . . . . . . . . . 10
3 Requirement Analysis 11
3.1 Software Requirement . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.2 Hardware Requirement . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.3 Functional Requirement . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Non-Functional Requirement . . . . . . . . . . . . . . . . . . . . . . 11
3.4.1 Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4.2 Maintainability . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4.3 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.4.4 Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
4 Feasibility Study 13
4.1 Technical Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2 Operational Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.3 Economic Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.4 Time Feasibility . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
5 Methodology 14
5.1 Agile methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.1.1 Scrum Framework . . . . . . . . . . . . . . . . . . . . . . . . 14
5.2 Workload by Project Members . . . . . . . . . . . . . . . . . . . . . 17
6.9 Thresholding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.10 Morphological Operation . . . . . . . . . . . . . . . . . . . . . . . . 36
6.10.1 Erosion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.10.2 Dilation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.10.3 Opening . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.10.4 Closing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.10.5 Skeletonization . . . . . . . . . . . . . . . . . . . . . . . . . 38
6.11 Canny Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . 39
7 Experiments 40
7.1 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7.2 Color Coding to Dataset . . . . . . . . . . . . . . . . . . . . . . . . 40
7.3 Dataset Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . 41
7.3.1 Augmented Dataset Creation . . . . . . . . . . . . . . . . . 42
7.3.2 Parcel and Footprint Generation . . . . . . . . . . . . . . . . 43
7.3.3 Algorithm for furnished generation using template matching 46
7.4 Dataset Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
7.4.1 Footprint Qualify . . . . . . . . . . . . . . . . . . . . . . . . 47
7.4.2 Program Qualify . . . . . . . . . . . . . . . . . . . . . . . . 48
7.4.3 Orientation Qualify . . . . . . . . . . . . . . . . . . . . . . . 49
7.5 Parcel Generation from Cadastral Map . . . . . . . . . . . . . . . 49
7.6 GAN Architectures . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.6.1 Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.6.2 Discriminator . . . . . . . . . . . . . . . . . . . . . . . . . . 51
7.6.3 GAN Architectures . . . . . . . . . . . . . . . . . . . . . . . 52
7.7 Condition for the Training . . . . . . . . . . . . . . . . . . . . . . . 52
7.8 Model Comparison between different types of architecture . . . . . 52
7.9 Model Training . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
7.9.1 Loading the dataset and preparing for training . . . . . . . . 52
7.9.2 Generator Model Architecture U-net . . . . . . . . . . . . . 54
7.9.3 Generator Model Architecture U-net summary . . . . . . . . 55
7.9.4 Defining the generator loss . . . . . . . . . . . . . . . . . . . 56
7.9.5 Training Procedure for the generator . . . . . . . . . . . . . 57
7.9.6 Discriminator Model Architecture . . . . . . . . . . . . . . . 58
7.9.7 Discriminator Model Summary . . . . . . . . . . . . . . . . 59
7.9.8 Defining the Discriminator loss . . . . . . . . . . . . . . . . 60
7.9.9 Training procedure for Discriminator . . . . . . . . . . . . . 61
7.9.10 Generator Model Architecture for Triple-U-net brief . . . . . 62
7.10 Generator Metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.10.1 Inception Score . . . . . . . . . . . . . . . . . . . . . . . . . 62
7.10.2 FID Score . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.11 Activation Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
7.11.1 Sigmoid Activation Function . . . . . . . . . . . . . . . . . . 63
7.11.2 ReLU Activation Function . . . . . . . . . . . . . . . . . . . 64
7.11.3 Leaky ReLU Activation Function . . . . . . . . . . . . . . . 64
7.11.4 Softmax Activation Function . . . . . . . . . . . . . . . . . . 64
7.11.5 Tanh Activation Function . . . . . . . . . . . . . . . . . . . 65
7.12 Loss Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
7.12.1 Binary Cross Entropy . . . . . . . . . . . . . . . . . . . . . 65
7.12.2 Categorical Cross Entropy . . . . . . . . . . . . . . . . . . . 65
7.12.3 Mean Squared Error . . . . . . . . . . . . . . . . . . . . . . 66
7.12.4 Mean Absolute Error . . . . . . . . . . . . . . . . . . . . . . 66
7.13 Optimization Function . . . . . . . . . . . . . . . . . . . . . . . . . 67
7.13.1 Bracketing Algorithm . . . . . . . . . . . . . . . . . . . . . . 67
7.13.2 Local Descent Algorithm . . . . . . . . . . . . . . . . . . . . 67
7.13.3 First Order Algorithm . . . . . . . . . . . . . . . . . . . . . 67
7.13.4 Second Order Algorithm . . . . . . . . . . . . . . . . . . . . 69
7.14 3D Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
7.15 Convolution Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.15.1 Same padding over valid padding . . . . . . . . . . . . . . . 72
7.15.2 Convolution Layer follows Transpose Convolution . . . . . . 74
7.16 Pooling Layer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.17 Batch normalization . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.17.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
7.17.2 These things are considered while using batch normalization 75
7.18 Skip connection in U-net architecture . . . . . . . . . . . . . . . . . 76
7.18.1 Advantages . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.18.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . 76
7.19 Upsampling of an image . . . . . . . . . . . . . . . . . . . . . . . . 76
7.19.1 Upsampling by Nearest Neighbour Method . . . . . . . . . . 76
7.19.2 Upsampling by Bi-linear Interpolation . . . . . . . . . . . . 77
7.20 Inference Engine . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.20.1 Use case diagram of FPGAN . . . . . . . . . . . . . . . . . . 78
7.20.2 Pre-processing of Cadastral Map to Generate Parcel . . . . . 79
7.20.3 Step by step Generation . . . . . . . . . . . . . . . . . . . . 80
7.20.4 Furniture Mapping . . . . . . . . . . . . . . . . . . . . . . . 81
7.20.5 Wall Segmentation and 3D Generation . . . . . . . . . . . . 82
7.21 Issue: Footprint area are generated outside the Parcel area . . . . . 83
8 Expected Outcomes 85
9 Actual Outcome 87
9.1 Review of ROBIN datasets . . . . . . . . . . . . . . . . . . . . . . . 87
9.2 Accuracy of Automatic Generated Parcel and Footprint . . . . . . . 89
9.3 Parameter Tuning for Template Matching for Furnished Datasets . 90
9.4 Templates for template matching . . . . . . . . . . . . . . . . . . . 90
9.5 Accuracy of furnished datasets . . . . . . . . . . . . . . . . . . . . . 91
9.6 Inception Score of prepared datasets . . . . . . . . . . . . . . . . . 91
9.7 Orientation of the Prepared Datasets . . . . . . . . . . . . . . . . . 91
9.8 Footprint of the Footprint Datasets . . . . . . . . . . . . . . . . . . 91
9.9 Program of the Roomsplit Datasets . . . . . . . . . . . . . . . . . . 92
9.10 Inception Score of Generated Image using U-net . . . . . . . . . . . 92
9.11 Inception Score of Generated Image using Triple U-net . . . . . . . 92
9.12 Comparison of Inception Score and Interpretation . . . . . . . . . 93
9.13 FID Score of Generated Image using U-net . . . . . . . . . . . . . . 93
9.14 FID score of generated images using triple U-Net . . . . . . . . . . 94
9.15 Unit Testing of Models . . . . . . . . . . . . . . . . . . . . . . . . . 95
9.15.1 U-net Based Models . . . . . . . . . . . . . . . . . . . . . . 95
9.15.2 Triple U-net Based Models . . . . . . . . . . . . . . . . . . . 98
9.16 Integration Testing of Models . . . . . . . . . . . . . . . . . . . . . 101
9.16.1 U-net Based Models . . . . . . . . . . . . . . . . . . . . . . 101
9.16.2 Triple U-net Based Models . . . . . . . . . . . . . . . . . . . 104
9.17 Furniture Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.18 Wall Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
9.19 3D Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Bibliography 113
Appendix 114
A Mockup Demonstration . . . . . . . . . . . . . . . . . . . . . . . . . 114
B Expected Outcome Screenshots . . . . . . . . . . . . . . . . . . . . 114
B.1 Splash Screen . . . . . . . . . . . . . . . . . . . . . . . . . . 114
B.2 Main Menu . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
B.3 Manual image file with appropriate scale . . . . . . . . . . . 115
B.4 Manual Map upload from Malpot and area marking . . . . . 116
B.5 Free drawing shape for concept design . . . . . . . . . . . . 116
B.6 Constraint given to plan . . . . . . . . . . . . . . . . . . . . 117
B.7 Choose and proceed GAN design . . . . . . . . . . . . . . . 117
B.8 Generate 3D . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
C GAN with U-net Generator and U-net Discriminator . . . . . . . . 118
D Qualification Metrices . . . . . . . . . . . . . . . . . . . . . . . . . 119
D.1 Orientation of the Prepared Datasets . . . . . . . . . . . . . 119
D.2 Footprint of the Prepared footprint Datasets . . . . . . . . . 120
D.3 Program of the Prepared roomsplit Datasets . . . . . . . . . 121
E Actual Outcome Screenshots . . . . . . . . . . . . . . . . . . . . . 122
E.1 Get Started Page . . . . . . . . . . . . . . . . . . . . . . . . 122
E.2 Upload Cadastral Map Page . . . . . . . . . . . . . . . . . . 122
E.3 After Uploading Cadastral Map . . . . . . . . . . . . . . . . 123
E.4 Preprocessing Cadastral Map for Parcel . . . . . . . . . . . . 123
E.5 Display Parcel and Choosing Model . . . . . . . . . . . . . . 124
E.6 Parcel to Footprint Generation . . . . . . . . . . . . . . . . 124
E.7 Footprint to Roomsplit Generation . . . . . . . . . . . . . . 125
E.8 Roomsplit to Furnished Generation . . . . . . . . . . . . . . 125
E.9 Furniture Mapping . . . . . . . . . . . . . . . . . . . . . . . 126
E.10 Wall Segmentation for 3D Generation . . . . . . . . . . . . . 126
E.11 3D Plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
E.12 Complete Result . . . . . . . . . . . . . . . . . . . . . . . . 128
List of Figures
5.1 Agile methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.3 Scrum framework . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
5.5 Workload of the project by project members . . . . . . . . . . . . . 17
6.41 Hysteresis applied to the previous image. . . . . . . . . . . . . . . . 39
7.45 Sequence diagram of Pre-processing of Cadastral Map to Generate
Parcel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.46 Sequence diagram of Parcel to Furnished Generation . . . . . . . . 80
7.47 Algorithm for Furniture Mapping . . . . . . . . . . . . . . . . . . . 81
7.48 Wall Segmentation and 3D Generation . . . . . . . . . . . . . . . . 82
7.49 Bordered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.50 No bordered . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.51 Parcel Line width 3 case#1 . . . . . . . . . . . . . . . . . . . . . . 84
7.52 Parcel Line width 3 case#2 . . . . . . . . . . . . . . . . . . . . . . 84
7.53 Parcel Line width 5 case#1 . . . . . . . . . . . . . . . . . . . . . . 84
7.54 Parcel Line width 3 case#2 . . . . . . . . . . . . . . . . . . . . . . 84
9.33 Integration testing for Triple U-net based roomsplit generation model
#2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9.34 Integration testing for Triple U-net based furnished generation model
#2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9.35 Integration testing for Triple U-net based footprint generation model
#3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.36 Integration testing for Triple U-net based roomsplit generation model
#3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.37 Integration testing for Triple U-net based furnished generation model
#3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
9.38 Furniture mapping#1 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.39 Furniture mapping#2 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.40 Furniture mapping#3 . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.41 Wall segmentation of generated roomsplit #1 . . . . . . . . . . . . 108
9.42 Wall segmentation of generated roomsplit #2 . . . . . . . . . . . . 108
9.43 Wall segmentation of generated roomsplit #3 . . . . . . . . . . . . 108
9.44 3D generation from generated roomsplit #1 . . . . . . . . . . . . 109
9.45 3D generation from generated roomsplit #2 . . . . . . . . . . . . 109
9.46 3D generation from generated roomsplit #3 . . . . . . . . . . . . 109
List of Tables
5.1 Sprint Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . 16
List of Abbreviation
Abbreviations Meaning
AI Artificial Intelligence
ASPP Atrous Spatial Pyramid Pooling
cGAN Conditional Generative Adversarial Network
CV Computer Vision
CVC-FP Computer Vision Center - Floor Plan
DNN Deep Neural Network
FID Fréchet Inception Distance
FCN Fully Convolutional Network
GAN Generative Adversarial Network
GC-LPN Graph Conditioned - Layout Prediction Network
ISCC-NBS Inter-Society Color Council-National Bureau of Standards
LCT-GAN Language Conditioned Textures - Generative Adversarial Network
LOFD Local Orientation and Frequency Description
OCR Optical Character Recognition
R-FP Request for Proposal
ROBIN Repository Of BuildIng plaNs
SUGAMAN Supervised and Unified framework using Grammar and
Annotation Model for Access and Navigation
VAE-GAN Variational Autoencoder- Generative Adversarial Network
VCN Volumetric Convolutional Network
VGG net Visual Geometry Group Network
Chapter 1
Introduction
1.1 Background Introduction
During the early phases of conceptual design in architecture, designers often look
for past references in collections of printed or digitally created floor plans in order
to stimulate creativity, inspiration and assess the building design. The design of
sustainable architecture is also a concern for many architects. Some designers are concerned about the hurdles that can occur using the traditional approach. Given that the point of AI is to create machines or programs capable of self-direction and learning, this concern is logical.
However, most experts agree AI has the potential to make architecture easier,
more efficient, and even more secure. The most obvious way to support these
phases with computer-aided means is to provide a retrieval method (e.g., in the
form of a specific software solution) that is able to find similar references in a
collection of previously created building designs.
1.2 Motivation
Apartment layout is a challenging yet fundamental task for any architect. Know-
ing how to place rooms, decide their size, find the relevant adjacencies among
them, while defining relevant typologies are key concerns that any drafter takes
into account while designing floor plans. While creating new designs, architects
usually go through past designs and the data prepared throughout the making of
the building. Similarly, making building calculations and environmental analysis is
not a simple task if done manually. This leads to wastage of time,money and effort.
So,our approach is instead of investing a lot of time and energy to create something
new, making a computer able to analyze the data in a short time period which
will give recommendations accordingly. With this, an architect will be able to do
testing and research simultaneously and sometimes even without pen and paper.
1.3 Problem Definition
Constructing a building is not a one-day task, as it needs a lot of pre-planning. However, this pre-planning is sometimes not enough, and a little more effort is required to bring an architect's opinion to life. The countless hours of research at the start of any new architectural design project are time consuming. About seven percent of the world's labor force is in the construction industry, yet it has traditionally been one of the least technologically advanced industries. From this it can be concluded that even using autonomous or semi-autonomous construction machinery to help with excavation and prep work is not enough for sustainable development in the architectural field.
Computers excel at solving problems with clear answers, crunching data and doing repetitive tasks, which frees up time for humans to be creative and work on more open-ended problems, and there is no shortage of those in architectural design. AI can lead organizations or clients to turn to computers for masterplans and construction. AI can explore better building efficiency and even walk clients through a structure before it is built.
Chapter 2
Literature Review
2.1 AI + Architecture
In this thesis [1], Stanislas Chaillou offered promising results to generate, qual-
ify and allow users to browse through generated floor plans design options. For
qualifying floor plans, he used six metrics proposing a framework that captures
architecturally relevant parameters of floor plans. On one hand, footprint, orien-
tation, thickness & texture are three metrics capturing the essence of a given floor
plan’s style. On the other hand, program, connectivity, and circulation are meant
to depict the essence of any floor plan organization. For generation of architecture
floor plan, he proposed three steps pipeline which are:
Each step was carefully engineered and trained with a pix2pix GAN. With an extensive database of Boston's building footprints and a dataset of around 700+ annotated floor plans, training a broad array of models was successful. To further refine the output quality, an array of additional models, one for each room type (living room, bedroom, kitchen, etc.), was trained using a color code for each furniture type.
House-GAN [2] then generates a diverse set of realistic and compatible house layouts as output.
2.3 Intelligent Home 3D: Automatic 3D-House
Design from Linguistic Descriptions Only
Architects design homes by collecting a list of requirements and then generating the layout of the house using a trial-and-error approach. This takes a lot of time, so to save time and to allow people without expertise to participate in the design, Chen Qi et al. proposed this new model [3], which consists of the following components:
2.6 A U-Net Based Discriminator for Genera-
tive Adversarial Networks
One of the major challenges for GANs is the capacity to synthesize globally and locally coherent images with object shapes and textures indistinguishable from real images. To overcome this, a U-Net based discriminator architecture [6] was introduced by Edgar Schonfeld et al., where the discriminator outputs both a global and a local (per-pixel) decision on whether the image is real or fake. This discriminator is built as an encoder-decoder network, i.e. a U-Net. The encoder downsamples the image to capture global image context, while the decoder upsamples to produce an output at matching resolution, with skip connections between the two.
2.9 Momentum Batch Normalization for Deep
Learning with Small Batch Size
Batch normalization [9] is a method used to make artificial neural networks faster
and more stable through normalization of the layers’ inputs by re-centering and
re-scaling. Hongwei Yong et al. proposed a method for producing scaled and normalized input so that the activation function works the right way using batch normalization. It is well known that normalizing the input data makes training faster, so batch normalization normalizes the data for training, which can improve the effectiveness and efficiency of optimizing various deep networks. As proposed, momentum batch normalization uses the moving average of the sample mean and variance in a mini-batch for training, which ensures proper training inside the DNN.
The paper shows that surface-based methods outperform voxel representations for
objects from novel classes and produce higher resolution outputs.
a. 3D model popup
A neural architecture first converts a floorplan image into a junction layer, where data is represented by a set of junctions. Integer programming is then formulated to aggregate junctions into a set of simple primitives (e.g. wall lines, door lines, or icon boxes) to produce a vectorized floorplan.
b. Interactive editing
The vector-graphics representation allows direct floorplan manipulation by hand for error correction or modeling. Demonstrated common editing operations include removing walls/doors, moving multiple walls, etc.
recent advances in VCN and GAN. This 3D generation model makes use of the IKEA dataset to generate novel objects and reconstruct 3D objects from images. The discriminator in the GAN, learned without supervision, can be used as an informative feature representation for 3D objects, achieving impressive performance on shape classification. It requires training a VAE-GAN to capture the mapping from 2D to 3D images for high-quality 3D objects.
information. This paper focuses on ways to determine the maximum number of different colours that can be used in a colour code without risk of confusion. In response to requests for sets of colours that would be as different from each other as possible for purposes of colour coding, a benchmark proposed by Kenneth Kelly, a sequence of colours from which it is possible to select up to 22 colours of maximum contrast, is also discussed in this paper.
2.21 The Rendering Equation
Kajiya et al. [21] present an integral equation that generalizes a wide variety of known rendering algorithms. The authors discuss a Monte Carlo solution, present a new form of variance reduction, and discuss a range of optical phenomena that can be effectively simulated. The paper presents an equation that is well suited for computer graphics.
Chapter 3
Requirement Analysis
3.1 Software Requirement
Software requirement for the prepared system includes:
1. Python
2. Star UML
3. Visual Studio Code
4. Google Colaboratory
5. CUDA
6. Git
7. Slack
8. Texmaker
9. Beamer
10. Clickup
11. Microsoft Team
12. Google Drive
13. Google Calendar
14. Docker
3.4.1 Reliability
The system should be reliable. It should consider all the necessary rooms and furniture that must be present in a normal house.
3.4.2 Maintainability
A maintainable system is required.
3.4.3 Performance
The system should prepare designs as quickly as possible in the implementation phase.
3.4.4 Accuracy
The system must accurately estimate the total cost of implementing the floor plan. And the dimensions of every corner must be accurate too.
Chapter 4
Feasibility Study
4.1 Technical Feasibility
All the required hardware was available, i.e. a computer with a good GPU and 32GB RAM, and some test programmes were run in Google Colaboratory. So there was no problem with hardware devices.
Python was used for development of the system, and all the team members are familiar with the language. All other utility software used is free of cost and easy to use. So there was no problem with the language and software.
As for datasets to train the GAN model, some datasets were collected from the internet, then filtered and processed as per need, and the required datasets were prepared. So there was no problem with datasets either.
So the project is technically feasible.
Chapter 5
Methodology
Agile project development can be properly used to guarantee the overall success of the project. Therefore, the Agile methodology described below was used.
It is required to keep a project backlog containing all the TODOs of the whole project in one place. Also, keeping a sprint backlog which only has the backlog items related to that sprint cycle helps to properly focus on one thing.
Other than that, the scrum master role is assigned to the project manager, who is responsible for making a vision of the final product of every sprint, presenting the project increment (the result of each sprint) to the supervisor, and relaying the feedback to the scrum team.
Continuous interaction between the team and the supervisor helps achieve a proficient speed of development as well as quality software development. Weekly meetings proved to be very effective in boosting the productivity of team members and keeping the supervisor updated.
The use of iterative planning and feedback results in teams that can continuously align the delivered product with the product envisioned by the manager. Agile easily adapts to changing requirements throughout the process by measuring and evaluating the status of the project, which allows accurate and early visibility into its progress. The ongoing change can sometimes give both the client and the team more than they had originally envisioned for the product. The Agile method really is a winning solution for everyone involved in software development.
So, in the actual project development phase, the above mentioned method was followed as much as possible. Indeed, the scrum master role was applied, and it was decided to conduct a weekly meeting with the supervisor to present the progress and ideas.
Also, a daily 15-minute scrum meeting from 8:45 to 9:00 AM was held in order to present and refresh the working progress among each other.
Table 5.1: Sprint Configuration
Altogether, 12 spaces were used inside the project workspace:
a. Project plan
b. wiki
c. sprint - 1
d. sprint - 2
e. sprint - 3
f. sprint - 4
g. sprint - 5
h. Datasets preparation and Validation
i. Training for footprint generation, roomsplit generation and furnished gener-
ation
j. Inference Engine
k. Documentation
l. Final Touch
Then, for each of these sprint spaces, a list named sprint backlog was made, where the ideas or tasks mentioned inside the project plan's backlog were listed. As per discussion in the meetings, each task is first created inside the project plan's backlog.
As tasks are assigned to team members, one has to make an issue in the related git repo project folder, which is then converted to a branch to work on in a technical way. After all the commits are pushed into that branch, a merge request is created for the project manager to check the work in that branch, verify it, and finally merge it with the master branch.
In this procedure, the whole ideation of problem solving is broken down into small tasks, which are converted to git issues, and finally the branches relating to those issues are merged with master.
5.2 Workload by Project Members
According to Clickup Productivity, the workload covered by each of the project members was observed to be as follows: Niranjan Bekoju (28.8%), Luja Shakya (26.3%), Sunil Banmala (24.3%), Anusha Bajracharya (20.6%). The overall working schedule of the project until now is specified and represented as a timeline graph in [24]. For section 2, the Gantt chart is in [25].
Chapter 6
System (or Project) Design and Architecture
6.1 System Block Diagram
The block diagram of the proposed system is shown in the figure below:
6.2 Generation of Floor Plans
From a given parcel land structure to a well furnished floor plan and its 3D view, the following main steps were followed. For footprint generation, a separate model would be trained for each building type:
a. Commercial
b. Resident (House)
c. Resident (Condo)
d. Industrial
Here, each model would create a set of relevant footprints for a given parcel.
Figure 6.3: Room Split from footprint
6.2.2.2 Program-Specific Generation
In this paradigm, the user would specify:
a. Footprint of the building
b. Position of facade openings
c. Position of a given room within the building footprint
6.2.2.3 Structure-Specific Generation
In this paradigm, the user would specify:
a. Footprint of the building
b. Position of facade openings
c. Existence of load-bearing walls as initial constraints. By marking the input images of the training set with green lines, the presence of walls is signaled to the GAN model, which is trained to generate the room layout.
Figure 6.6: Structure Specific Generation
Among these generation paradigms, Free Plan Generation was used, since the focus was more on room split with good orientation, connectivity and circulation between rooms rather than on the specific position of a room or the existence of load-bearing walls.
6.2.3 Furnishing
Now that we have the room split, the natural next step is furnishing each room (i.e. adding furniture across the space in each room). Here, the geometry of the furniture is not always perfect, but the furniture types and their relative space are reasonable. The user's ability to edit the output of each model before transferring it to the next model keeps them in control of the design process.
6.3 Qualifying Metrics for Floor Plans
6.3.1 Footprint
Footprint is used to analyze the shape of the floor plan perimeter and translate it into a histogram. It is used to determine whether the building footprint is thin, bulky or symmetrical, as shown in figure 6.8.
6.3.2 Program
Program is a quantity analysis tool used to analyse the area covered by each specific room within the total area of the footprint.
Program is used to display the type of each room and the area it contains. It represents the rooms using a color code in any given floor plan. It provides a color band that becomes a proxy to describe the program, aggregating the quantity of each room within the floor plan. This color band allows us to compute the programmatic similarities and dissimilarities between any given pair of floor plans. It has mainly two representations:
Figure 6.9: Query input to the Program
6.3.3 Orientation
The orientation of walls is a valuable source of information, as it describes the enclosure and style of a plan. Some types of style are:
a. Baroque
b. Manhattan
c. Row House
d. Sub Urban
For instance, a modern house and a gothic cathedral can be distinguished simply by extracting the histogram of wall orientations.
6.3.4 Thickness & Texture
Thickness and Texture are used for qualifying the fatness of a plan. Thickness is a measure of wall thickness (the average depth of each wall) and texture is the variation of wall thickness (the variation of depth along each wall). The thickness of walls across the plan and the geometry of the wall surface differ from style to style.
For example, a Beaux Arts hall displays columns and indented thick walls.
(a) Beaux Arts Hall Building (b) Beaux Arts Hall floor plan
Figure 6.15: Thickness and Texture of the floorplan
6.3.5 Connectivity
Connectivity is used to tackle room adjacency. It provides the proximity of rooms to one another and is a key dimension of a floor plan. A connection between rooms through a door or corridor defines the existence of a connection between them. This graph is used to compare floor plans taking into account the similarity of connections among rooms.
6.3.6 Circulation
Circulation captures how people move across the floor plan. By extracting the skeleton of circulation, people's movement across a floor plan can be both quantified and qualified.
6.4 GAN used
For a given parcel, it was required to generate the footprint, generate the room split and then furnish the split rooms. For each step, paired image translation, i.e. the pix2pix GAN, was planned to be used. Its generator learns a mapping from an observed image x and random noise z to an output image y:

G : {x, z} → y    (6.1)
6.4.1.1 Objective
The objective function of conditional GAN can be expressed as:
L_cGAN(G, D) = E_{x,y}[log D(x, y)] + E_{x,z}[log(1 − D(x, G(x, z)))]    (6.2)

Here, G tries to minimize this objective against an adversarial D that tries to maximize it.
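For context, pix2pix (which the generation pipeline follows) combines this adversarial objective with an L1 reconstruction term weighted by a factor λ; the pix2pix paper uses λ = 100:

G* = arg min_G max_D L_cGAN(G, D) + λ · L_L1(G),   where   L_L1(G) = E_{x,y,z}[ ||y − G(x, z)||_1 ]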
(a) Three Room Floor Plan (b) Four Room Floor Plan (c) Five Room Floor Plan
Now, it was required to prepare a dataset suitable for this purpose, and the following tasks had to be done to prepare it.
6.5.1 Types of Room
There are a total of 5 types of room mentioned in the ROBIN dataset.
a. Bedroom
b. Bathroom
c. Entry
d. Kitchen
e. Hall
Figure 6.24: Kelly's 22 Colors of Maximum Contrast
source: https://www.researchgate.net/figure/Kellys-22-colours-of-maximum-contrast-set-beside-similar-colours-from-the-colour-alphabet_fig9_237005166
Because of these advantages, color selection using the HSV color space is used in many common graphics programs. The standard color selection dialog, for example from the Windows operating system, is also based on the HSV color model: there is a color field in which the color can be selected, arranged according to hue and saturation, as well as an additional controller for the brightness from white to black, with which the selected color can be adjusted.
The hue (H) is given as an angle on the chromatic circle, so it can take values between 0° and 360°. 0° corresponds to the color red, 120° to green and 240° to blue. The saturation (S) is given as a percentage and can therefore take values between 0% and 100% (or 0 to 1). A saturation of 100% means a completely saturated and pure color; the smaller the saturation, the more the color turns to a neutral gray. The lightness or blackness value (V) is also given as a percentage, where 0% means no brightness (hence black) and 100% full brightness, covering a spectrum between the pure color (saturation of 100%) and white (saturation of 0%).
If both the saturation and the lightness are 100%, a pure color results. If the saturation is 0% and the lightness is 100%, the result is white, and in all cases in which the lightness is 0%, the result is black.
Table 6.1: Chain Code mapping
Chain code    (dx, dy)
0             (1, 0)
1             (1, -1)
2             (0, -1)
3             (-1, -1)
4             (-1, 0)
5             (-1, 1)
6             (0, 1)
7             (1, 1)
Template images are prepared in such a way that each covers all the essential detail of the template while covering an optimal area, for the best possible matching.
All the possible orientations of each template object are prepared and fed to the program. For example, in the case of an armchair a total of 4 orientations are possible, but in the case of a dining table only 2 orientations are enough. A sketch of this multi-orientation matching follows.
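A hedged sketch of multi-orientation template matching with OpenCV; the file names and the 0.8 score threshold are assumptions, not values from the report:

import cv2
import numpy as np

plan = cv2.imread("roomsplit.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
# All 4 orientations for an armchair; a dining table would need only 2.
templates = [cv2.imread(f"armchair_{deg}.png", cv2.IMREAD_GRAYSCALE)
             for deg in (0, 90, 180, 270)]

matches = []
for tpl in templates:
    h, w = tpl.shape
    # Normalized cross-correlation: values near 1.0 are strong matches.
    res = cv2.matchTemplate(plan, tpl, cv2.TM_CCOEFF_NORMED)
    ys, xs = np.where(res >= 0.8)  # assumed matching threshold
    matches += [(x, y, w, h) for x, y in zip(xs, ys)]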
Figure 6.29: Templates example
6.9 Thresholding
Thresholding is a point operation. It can be used to create a binary image. The technique is based on a simple concept: a parameter θ, called the threshold, is chosen and applied to the image.
For every pixel, the same threshold value is applied. If the pixel value is smaller than the threshold, it is set to 0; otherwise, it is set to a maximum value. A minimal sketch follows.
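A minimal sketch of global thresholding with OpenCV; the threshold value of 127 and the file name are assumptions:

import cv2

img = cv2.imread("floorplan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
# Pixels with value above 127 become 255, all others become 0.
_, binary = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
cv2.imwrite("binary.png", binary)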
6.10.1 Erosion
The basic idea of erosion is just like soil erosion: it erodes away the boundaries of the foreground object. It is normally performed on binary images. It needs two inputs: one is the original image, and the second is called the structuring element or kernel, which decides the nature of the operation. A pixel in the original image (either 1 or 0) will be considered 1 only if all the pixels under the kernel are 1; otherwise it is eroded (made zero).
6.10.2 Dilation
Dilation takes two inputs, in which one is the input image and the second is called the structuring element or kernel, which decides the nature of the operation. Dilation increases the object area: it increases the white region in the image, i.e. the size of the foreground object grows.
6.10.3 Opening
Opening is just another name for erosion followed by dilation. It is useful in removing noise.
6.10.4 Closing
Closing is the reverse of opening, i.e. dilation followed by erosion. It is useful in closing small holes inside the foreground objects, or small black points on the object. A combined sketch of all four morphological operations follows.
Figure 6.33: Opening
6.10.5 Skeletonization
Skeletonization is a process of reducing foreground regions in a binary image to a skeletal remnant that largely preserves the extent and connectivity of the original region while throwing away most of the original foreground pixels.
In simpler words, skeletonization makes a BLOB very thin (typically 1 pixel). A BLOB (Binary Large Object) refers to a group of connected pixels in a binary image. A sketch follows.
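The report does not name a library for this step; as an assumption, one common option is scikit-image:

import cv2
from skimage.morphology import skeletonize

img = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
blob = img > 127                              # boolean foreground mask
skeleton = skeletonize(blob)                  # 1-pixel-wide skeletal remnant
cv2.imwrite("skeleton.png", (skeleton * 255).astype("uint8"))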
6.11 Canny Edge Detection
The Canny edge detector is used for edge detection. It uses multiple stages to detect a wide range of edges in an image. It has 5 steps:
a. Noise reduction with a Gaussian filter
b. Gradient calculation
c. Non-maximum suppression
d. Double thresholding
e. Edge tracking by hysteresis
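A minimal usage sketch; the hysteresis thresholds of 100 and 200 are assumptions:

import cv2

img = cv2.imread("floorplan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
# Gradients above 200 are strong edges; those between 100 and 200 are kept
# only when connected to a strong edge (hysteresis).
edges = cv2.Canny(img, 100, 200)
cv2.imwrite("edges.png", edges)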
Chapter 7
Experiments
7.1 System Architecture
The overall workflow of the system, from cadastral map to 3D model, includes the key steps shown in figure 7.1.
Table 7.1: Color coding to datasets
This way, the colors representing different entities were standardized. Apart from this, the border of the land area, i.e. the parcel, was drawn in black for easy creation of the subsequent datasets using a simple python script.
Figure 7.2: Required Paired image of parcel and required footprint
To get these paired datasets, the steps shown in figure 7.3 were followed.
Figure 7.3: Principle for generating the paired image of parcel and required footprint
was created. This was done to ensure that all the datasets were of the same size and not squeezed. Then all the images were placed in the square background image and saved in a separate folder with the same name. Also, to increase the size of the dataset, all the images were again placed in the background with a slight change in orientation, i.e. a 30 degree rotation in the clockwise direction. This way, the datasets were refined as shown in figure 7.5, so that some slight processing could create the other required datasets; these were named the augmented datasets.
Figure 7.4: Original dataset from ROBIN Floor Plan datasets.
Figure 7.5: Augmented Dataset with padding and 30 degree clockwise rotation, placed in a square frame and resized to 512*512.
For generation of footprints, a python cv2 script was first created to draw a black colored overlay over the floor plan, resulting in the footprint datasets, but it was quite a tedious job to do so many datasets manually. So a new method using the image processing operations dilation and erosion was introduced. Dilation increases the width of white lines or pixels in the image, and erosion decreases it. Using this concept, the footprint was created by the following method. First, the image was rotated 30 degrees anticlockwise as shown in figure 7.7 if it had been rotated once before (identified using the naming convention), and then dilated with a 5*5 kernel, which increases the width of the white parts of the image as in figure 7.8, thereby decreasing the width of the black lines. Lines with width less than 5 pixels disappear in this step. Then erosion with a 5*5 kernel is done, which brings back the previous image as shown in figure 7.9, except that the thin lines that disappeared during dilation do not come back. This step cleans the floor plan and gives a clean parcel.
Figure 7.8: Result when the rotated final image is dilated with a 5*5 kernel.
Figure 7.9: Result when the dilated image is eroded with a 5*5 kernel.
If required, the parcel is rotated again by 30 degrees clockwise to get the required parcel image of the final image as in figure 7.10.
Then the difference between the original image and the parcel is taken, giving the floor plan without the border line as in figure 7.11. From this, the doors and furniture are first cleared by dilation with a 2*2 kernel. Then, for wall segmentation, the image is eroded with a 127*127 kernel, giving a highly eroded image, i.e. the wall segmentation is changed into very thick walls, the footprint of the floor plan in figure 7.12.
Then the image is dilated using a 125*125 kernel, restoring the floor plan to roughly its original size as shown in figure 7.13, and thresholded to get an exact black and white print as shown in figure 7.14.
Figure 7.13: Dilated image is eroded back with a 125*125 kernel.
Figure 7.14: Dilated image is thresholded to get a black and white print.
The plan is now the footprint, which is then rotated clockwise 30 degrees if required, giving the final footprint shown in figure 7.15. The floor plan is then added to the parcel to get the required footprint dataset shown in figure 7.16.
Since paired images for translation from parcel to footprint were required, the footprint datasets were concatenated with the parcels to get the footprint generation datasets as shown in figure 7.17. A sketch of this pipeline follows.
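A hedged OpenCV sketch of this dilation-erosion pipeline; the kernel sizes come from the text above, while the file name and the exact sequencing of the difference step are assumptions:

import cv2
import numpy as np

plan = cv2.imread("augmented_plan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input

# Clean parcel: dilation then erosion with a 5x5 kernel removes thin
# interior lines (walls, furniture) but keeps the thick parcel border.
k5 = np.ones((5, 5), np.uint8)
parcel = cv2.erode(cv2.dilate(plan, k5), k5)

# Interior of the plan = difference between the original and the parcel.
interior = cv2.absdiff(plan, parcel)

# Thicken walls into a solid footprint: heavy erosion (127x127) followed
# by dilation back (125x125), then a threshold for a clean black/white print.
footprint = cv2.dilate(cv2.erode(interior, np.ones((127, 127), np.uint8)),
                       np.ones((125, 125), np.uint8))
_, footprint = cv2.threshold(footprint, 127, 255, cv2.THRESH_BINARY)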
Figure 7.15: Footprint
Figure 7.16: Parcel added to footprint
Figure 7.18: Block Diagram of Template Matching for furnished generation
Figure 7.19: Block Diagram of Footprint Qualify
7.4.3 Orientation Qualify
Steps:
Figure 7.22: Cadastral map from Land Revenue Offices
Figure 7.23: Area of land drawn over cadastral map.
Then the corner points of the selected area were saved in an array. The coordinates of the corner points were brought toward the origin by subtracting the minimum x value from all x coordinates and the minimum y value from all y coordinates, i.e. subtracting (min(x), min(y)) from all the corner points. The resulting points were plotted in a new image, giving the land area of the user as in figure 7.24. Since the land area image was not ready for further processing by the system, it was placed at the center of a square white background 25% bigger than the maximum side of the land area image, giving the parcel as in figure 7.25, ready for further processing. The parcel image is then saved remotely after resizing it to the required size, i.e. 512*512. A sketch of this normalization follows.
7.6 GAN Architectures
For the preparation of the GAN model, it was first necessary to know the components of a GAN. A GAN model has three components, namely:
a. Generator
b. Discriminator
c. Loss Function
Among these components, the main focus is on the generator and discriminator. For the loss function, binary cross entropy, mean squared error and mean absolute error were used initially.
7.6.1 Generator
For the generator, the first constraint was that a new image is to be generated from an input image: the condition is taken from one image and its content is transferred to generate another image. So a generator model designed in such a way that the input and output have the same dimensions was required. The same scenario is observed in medical image segmentation in [4]. A similar but more advanced version of U-net, i.e. a Double U-net based architecture, can also be used for the same kind of image generation. So two models for generation of images can be used here.
7.6.2 Discriminator
For the discriminator, initially a custom CNN model that takes two input images and produces a 16*16 output of real/fake decisions will be used.
Here, the discriminator works on a similar concept to a classifier with the classification classes real and fake. So, it was planned to prepare a model using GoogleNet pre-trained on ImageNet datasets, changing the top layer. The pre-trained model is used because it has already learned the most generic features in the lower layers (the layers closest to the input) and also some high-level features in the higher layers. So the concept of transfer learning was used for fast training and high accuracy.
Not only this, a U-net based architecture was also used, built from an encoder and decoder with skip connections. Here, the model can classify the whole image as real or fake and can also classify each pixel as real or fake, making it one of the more powerful architectures for a discriminator. Since the discriminator is more powerful, the generator must also become more powerful, because a GAN is a competition between generator and discriminator.
7.6.3 GAN Architectures
a. U-net Generator with Custom CNN Discriminator
b. U-net Generator with GoogleNet Discriminator
c. U-net Generator with U-net based Discriminator, as shown in C
d. Double U-net Generator with Custom CNN Discriminator
e. Double U-net Generator with GoogleNet Discriminator
f. Double U-net Generator with U-net based Discriminator
Number of iterations: for now, the number of iterations for each model is configured as 200,000 and 400,000.
GPU machine: an RTX 3060 with 3584 CUDA cores and 12 GB of RAM will be used for training.
Then, by observing the validation loss and validation accuracy, it can be seen where the model starts to overfit. If the model overfits, it simply memorizes the style and content and will not be generic. So the point where the model starts overfitting must be observed.
a. Load the zipped datasets
b. Extract the zip file
c. Load the images
d. Convert the images to float format and divide by 255.0
e. Resize the images to 286*286
f. Randomly crop to (256, 256)
g. Randomly mirror the images
A sketch of this pipeline follows.
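A hedged TensorFlow sketch of these steps; the normalization by 255.0 and jitter sizes follow the list above, while the file path and single-image layout are assumptions:

import tensorflow as tf

def load_and_preprocess(path):
    image = tf.io.decode_png(tf.io.read_file(path), channels=3)  # step c
    image = tf.cast(image, tf.float32) / 255.0                   # step d
    image = tf.image.resize(image, [286, 286])                   # step e
    image = tf.image.random_crop(image, size=[256, 256, 3])      # step f
    image = tf.image.random_flip_left_right(image)               # step g
    return image

# Hypothetical directory of extracted training images (steps a and b).
dataset = (tf.data.Dataset.list_files("dataset/train/*.png")
           .map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(1))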
7.9.2 Generator Model Architecture U-net
7.9.3 Generator Model Architecture U-net summary
Model: "U-net Generator"
____________________________________________________________________________
Layer Output Shape Param # Connected to
============================================================================
input_3 [(None, 256, 256, 3)] 0 []
... (intermediate layers omitted)
'sequential_34[0][0]']
============================================================================
Total params: 54,425,859
Trainable params: 54,414,979
Non-trainable params: 10,880
____________________________________________________________________________
7.9.4 Defining the generator loss
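A hedged sketch of the generator loss in the pix2pix style described in section 6.4: an adversarial term plus an L1 term. λ = 100 is the pix2pix default and an assumption here:

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
LAMBDA = 100  # pix2pix default L1 weight (assumption)

def generator_loss(disc_generated_output, gen_output, target):
    # Adversarial term: the generator wants the discriminator to output "real".
    gan_loss = bce(tf.ones_like(disc_generated_output), disc_generated_output)
    # L1 term: keep the generated image close to the ground-truth image.
    l1_loss = tf.reduce_mean(tf.abs(target - gen_output))
    return gan_loss + LAMBDA * l1_loss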
7.9.5 Training Procedure for the generator
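A hedged sketch of one generator training step, assuming generator, discriminator and generator_optimizer are defined elsewhere (the names are assumptions; the discriminator is assumed to take the input image and the generated image as a pair):

import tensorflow as tf

@tf.function
def train_generator_step(input_image, target):
    with tf.GradientTape() as tape:
        gen_output = generator(input_image, training=True)
        disc_generated = discriminator([input_image, gen_output], training=True)
        loss = generator_loss(disc_generated, gen_output, target)
    grads = tape.gradient(loss, generator.trainable_variables)
    generator_optimizer.apply_gradients(zip(grads, generator.trainable_variables))
    return loss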
7.9.6 Discriminator Model Architecture
7.9.7 Discriminator Model Summary
Model: "Discriminator"
_________________________________________________________________________
Layer Output Shape Param # Connected to
=========================================================================
input_image [(None, 256, 256, 3)] 0 []
=========================================================================
7.9.8 Defining the Discriminator loss
Discriminator loss is calculated using the real image and the generated image: real images should be classified as real, and generated images as fake.
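A hedged sketch, following the standard pix2pix formulation:

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(disc_real_output, disc_generated_output):
    real_loss = bce(tf.ones_like(disc_real_output), disc_real_output)  # real -> 1
    fake_loss = bce(tf.zeros_like(disc_generated_output),
                    disc_generated_output)                             # fake -> 0
    return real_loss + fake_loss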
7.9.9 Training procedure for Discriminator
7.9.10 Generator Model Architecture for Triple-U-net brief
7.10 Generator Metrics
The quality of the generated images was evaluated with two metrics:
• Inception score
• FID score
where X ~ p_g indicates that X is an image sampled from p_g, D_KL(p || q) is the KL-divergence between the distributions p and q, p(y|X) is the conditional class distribution and p(y) = ∫ p(y|X) p_g(X) dX is the marginal class distribution. The exp in the expression is there to make the values easier to compare, so it will be ignored and ln(IS(G)) used without loss of generality.
Dropping the exponentiation, the improved inception score is as follows:

s(G) = (1/N) · Σ_{i=1}^{N} D_KL( p(y|X^(i)) || p̂(y) )    (7.5)
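A hedged numpy sketch of equation 7.5; preds is assumed to be an N×K array of class probabilities p(y|X^(i)) taken from a pre-trained classifier (e.g. Inception):

import numpy as np

def log_inception_score(preds, eps=1e-12):
    # preds: (N, K) conditional class probabilities p(y|X_i).
    p_y = preds.mean(axis=0, keepdims=True)  # marginal distribution p̂(y)
    # KL(p(y|X_i) || p̂(y)) per image, averaged over all N images.
    kl = (preds * (np.log(preds + eps) - np.log(p_y + eps))).sum(axis=1)
    return kl.mean()  # this is ln(IS); np.exp(...) recovers the usual score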
7.11.1 Sigmoid Activation Function
The sigmoid activation function is used in models where we are required to predict a probability as an output.
Equation:

f(x) = s = 1 / (1 + e^{-x})    (7.7)

Derivative:

f'(x) = s · (1 − s)    (7.8)

Range: (0, 1)
7.11.3 Leaky ReLU Activation Function
Leaky ReLU is defined to address this problem. Instead of defining the ReLU activation function as 0 for negative values of the input x, Leaky ReLU defines it as an extremely small linear component of x. The formula for this activation function is:

f(x) = x for x ≥ 0, and f(x) = α·x for x < 0, where α is a small constant (e.g. 0.01)

7.11.4 Softmax Activation Function
Equation:

f(x)_i = e^{x_i} / Σ_j e^{x_j}    (7.11)

Probabilistic interpretation:

S_j = P(y = j | x)    (7.12)

Range: (0, 1)
The softmax function is often used in the final layer of a neural network-based classifier.
Softmax is used for multi-class classification in logistic regression models. Softmax can
be used to build neural network models that can classify more than two classes, instead of a binary class solution.
For this project, the leaky ReLU activation function was used among all of the above. The slope saturates when the input gets large in the tanh, softmax and sigmoid functions; the ReLU activation function overcomes this problem. However, the slope of ReLU in the negative range is 0, so once a neuron goes negative it is unlikely to recover. Such neurons play no role in discriminating the input and are essentially useless. Hence, to overcome all the above mentioned problems, leaky ReLU is the most convenient choice.
Binary cross entropy loss is given by

H_p(q) = −(1/N) · Σ_{i=1}^{N} [ y_i · log(p(y_i)) + (1 − y_i) · log(1 − p(y_i)) ]

Here, H_p(q) is the binary cross entropy loss, x is the input, y is the label (let the labels assign a color to points x: label 1 is green and label 2 is red), p(y) is the probability of a point being label 1, and 1 − p(y) is the probability of it being label 2. Binary cross entropy is thus the average of the sum of the logs of the probability of a point being green or red.
Formally, it is designed to quantify the difference between two probability distributions.
The categorical cross entropy is well suited to classification tasks, since one example can be considered to belong to a specific category with probability 1, and to the other categories with probability 0. It can be explained as:

f(s)_i = e^{s_i} / Σ_{j}^{C} e^{s_j}    (7.14)

CE = − Σ_{i}^{C} t_i · log(f(s)_i)    (7.15)

Here, t_i is the ground truth in one-hot form, e.g. [0, 0, 0, 1] for the classes [cat, dog, horse, lion] in a multi-class setting, and f(s)_i is the result of softmax activation over the classes, e.g. [0.2, 0.1, 0.3, 0.4]. The higher the probability of the actual class in the softmax result, the lesser the loss, and vice versa.
The mean squared error is

MSE = (1/n) · Σ_{i=1}^{n} (Y_i − Ŷ_i)^2

Here, Y_i is the observed value while Ŷ_i is the estimated or predicted value at any instance. MSE is the average of the squared errors between the estimated and actual values.
Binary cross entropy is the loss function to use when a sigmoid function is used in the output layer of the network, to maximize the likelihood of classifying the input data correctly, and categorical cross entropy is often used in the case of a multi-class classifier.
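Here is a compact numpy sketch of the three losses discussed above; y holds ground-truth values and p / y_hat hold model outputs (illustrative names):

    import numpy as np

    def binary_cross_entropy(y, p, eps=1e-12):
        # Average negative log-likelihood of the correct binary label.
        return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

    def categorical_cross_entropy(t, s, eps=1e-12):
        # Equation 7.15: t is a one-hot ground truth, s a softmax output.
        return -np.sum(t * np.log(s + eps))

    def mean_squared_error(y, y_hat):
        # Average squared error between actual and predicted values.
        return np.mean((y - y_hat) ** 2)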
c. Step size is a hyperparameter that controls how far to move in the search space.
d. A small step size takes a long time and can get stuck.
e. A large step size results in zig-zagging or bouncing around the search space.
Adagrad: It adapts the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent parameters. Adagrad uses a different learning rate for every parameter at every time step:

θ_{t+1, i} = θ_{t, i} − ( η / √(G_{t, ii} + ε) ) · g_{t, i}    (7.20)
Adam: It computes an adaptive learning rate for each parameter and keeps an exponentially decaying average of past squared gradients. It includes momentum on top of Adadelta or RMSprop:

Δw_i(t) = −( η / √(G_i(t) + ε) ) · M_i(t)    (7.23)

M_i(t) = α · M_i(t − 1) + (1 − α) · (∂L/∂w_i)(t)    (7.24)
Among all these optimization functions, Adam was chosen for optimizing the deep neural network, since Adam adds momentum to Adadelta/RMSprop, and Adadelta is itself an extension of Adagrad. The Adam optimizer therefore combines the advantages of momentum, Adagrad and Adadelta. Moreover, Adam is widely used as the optimization function for deep neural networks.
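In Keras this choice is a one-liner; the learning rate and beta_1 below are the pix2pix paper's common defaults, assumed here rather than taken from this report:

    import tensorflow as tf

    # Adam combines momentum with per-parameter adaptive learning rates.
    generator_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)
    discriminator_optimizer = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5)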
7.14 3D Generation
For 3D generation, one approach is 3D plotting of the segmented image. For the plotting of the 3D model, three Python packages were used: numpy, matplotlib and opencv, with which the segmented image was plotted as a 3D model. First, a segmented image containing the segmentation of wall, door and window was produced, as shown in figure 7.31.
Here, the white background represents the floor, the dark band represents the wall, the band with gray level 170 represents the window, and gray level 85 represents the door. The details are shown in gray-level format in table 7.2.
However, the segmented image generated from the room split may not contain these exact gray levels, so a gray-level slicing approach was used, as shown in figure 7.32.
Finally, each gray level was replaced with the corresponding wall, floor, door or window, and the result is displayed using the matplotlib 3D projection. The output is as shown in figure 7.33.
Figure 7.32: Gray-level slicing approach
Figure 7.33: 3D Model Plotting
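A minimal sketch of this pipeline with numpy, opencv and matplotlib, assuming the gray levels of table 7.2 (door = 85, window = 170, floor = white); the file name, slicing bands and wall heights are illustrative assumptions:

    import cv2
    import numpy as np
    import matplotlib.pyplot as plt

    img = cv2.imread("roomsplit_segmented.png", cv2.IMREAD_GRAYSCALE)

    # Gray-level slicing: snap each pixel to the nearest class level.
    heights = {0: 4.0, 85: 2.0, 170: 1.0, 255: 0.0}  # gray level -> height
    sliced = np.zeros_like(img)
    for level in heights:
        band = np.abs(img.astype(int) - level) < 43  # slicing band
        sliced[band] = level

    # Replace each gray level by a wall/door/window/floor height and
    # display the result with matplotlib's 3D projection.
    z = np.vectorize(heights.get)(sliced).astype(float)
    x, y = np.meshgrid(np.arange(img.shape[1]), np.arange(img.shape[0]))
    ax = plt.figure().add_subplot(projection="3d")
    ax.plot_surface(x, y, z, cmap="gray")
    plt.show()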
V(m, n) = Σ_{(k,l) ∈ w} z(k, l) · y(m − k, n − l)    (7.25)
Here, the input image of size 5×5×1 is represented by the green color, and a kernel of size 3×3×1 is represented by the yellow color. The filter slides over the image to perform the convolution operation, and the result is shown as the convolved feature in the image. The filter moves to the right with a certain stride value until it parses the complete width; it then hops down to the beginning of the image with the same stride value and repeats the process until the entire image is traversed. The depth of the kernel is normally equal to the depth of the input image.
There are two types of padding operation:
• Same padding performs the convolution in such a way that the output image has the same dimensions as the input image. It first includes the required padding in the input image.
Figure 7.35: Convolution of three channel image
• Valid padding performs the same operation without padding the input image, so the output image dimension is less than that of the input image.
When the filter scans the image, it visits the pixels at the border fewer times than the pixels near the center of the entire image. To address this problem, a frame is put around the image so that the original information appears at the center of the whole picture, as in figure 7.37; this is known as padding. The padded value is usually zero, so it is also called zero padding. Now the filter scans the image along with the frame, and this time it visits each pixel of the image an equal number of times. This is called same padding.
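The effect of the two padding modes can be checked directly in Keras; the input size here is illustrative:

    import tensorflow as tf

    x = tf.random.normal((1, 5, 5, 1))  # one 5x5 single-channel image
    valid = tf.keras.layers.Conv2D(1, 3, padding="valid")(x)
    same = tf.keras.layers.Conv2D(1, 3, padding="same")(x)
    print(valid.shape)  # (1, 3, 3, 1): no padding, the output shrinks
    print(same.shape)   # (1, 5, 5, 1): zero padding keeps the size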
7.15.2 Convolution Layer follows Transpose Convolution
The transpose convolution layer is a learnable layer that takes the input image and convolves it with a filter in such a way that the size of the input image increases. The difference between transpose convolution and the upsampling layer is that upsampling uses a predefined method to upsample the image, while transpose convolution uses learnable parameters. In figure 7.38, the values of the filter are learned. However, there is an issue: the center pixel is visited four times and is influenced by all the pixels, while the others are not. This raises a common problem called the checkerboard issue, shown in figure 7.39, which is the main disadvantage of the transpose convolution layer.
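Here is a short Keras sketch contrasting the two upsampling options discussed above; layer sizes are illustrative:

    import tensorflow as tf

    x = tf.random.normal((1, 8, 8, 64))

    # Learnable upsampling: the filter weights are trained, but it can
    # produce checkerboard artifacts.
    up_learned = tf.keras.layers.Conv2DTranspose(32, 4, strides=2,
                                                 padding="same")(x)

    # Fixed upsampling followed by a convolution: a common way to avoid
    # the checkerboard issue.
    up_fixed = tf.keras.layers.UpSampling2D(2)(x)
    up_fixed = tf.keras.layers.Conv2D(32, 3, padding="same")(up_fixed)

    print(up_learned.shape, up_fixed.shape)  # both (1, 16, 16, 32)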
• Max Pooling: It is used to extract the maximum value of the feature map. It also performs as a noise suppressant.
• Average Pooling: It is used to extract the average value of the feature map. It simply performs dimensionality reduction.
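Both pooling variants reduce the spatial dimensions in the same way, as a quick illustrative Keras check shows:

    import tensorflow as tf

    x = tf.random.normal((1, 4, 4, 1))
    max_pooled = tf.keras.layers.MaxPooling2D(pool_size=2)(x)      # maxima
    avg_pooled = tf.keras.layers.AveragePooling2D(pool_size=2)(x)  # means
    print(max_pooled.shape, avg_pooled.shape)  # both (1, 2, 2, 1)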
7.17 Batch Normalization
7.17.1 Advantages
a. Once implemented, batch normalization dramatically accelerates the training process of a neural network and improves the performance of the model, in some cases halving the number of epochs or better, and provides some regularization, reducing generalization error.
b. It helps to coordinate the updating of multiple layers in the model.
c. It does this by scaling the output of the layer, specifically by standardizing the activations of each input variable per mini-batch, such as the activations of a node from the previous layer.
c. Use Large Learning Rates
Batch normalization may allow the use of much larger than normal learning rates, which in turn may further speed up the learning process.
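In Keras, batch normalization is typically inserted between a convolution and its activation; here is a minimal illustrative block (not the project's exact architecture):

    import tensorflow as tf

    block = tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 4, strides=2, padding="same",
                               use_bias=False),
        # Standardize activations per mini-batch, then learn scale/shift.
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.LeakyReLU(),
    ])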
7.18 Skip Connections
7.18.1 Advantages
a. They help to recover information that is lost during the encoding stage.
b. They largely allow information to flow from the earlier layers to the later layers.
7.18.2 Implementation
U-net introduces skip connections from the encoder to the decoder. Every block in the encoder that has the same resolution as its corresponding block in the decoding stage gets an extra connection whose value is concatenated into the decoder, so that information which might have been compressed too much can still trickle through to some of the later layers.
Forward pass: skip connections allow information to flow to the decoder.
Backward pass: skip connections improve gradient flow to the encoder.
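Here is a minimal sketch of how such skip connections are wired with the Keras functional API; the depths and filter counts are illustrative, not the report's actual U-net:

    import tensorflow as tf
    from tensorflow.keras import layers

    inp = layers.Input((256, 256, 3))
    # Encoder: each block halves the resolution.
    e1 = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(inp)
    e2 = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(e1)
    b = layers.Conv2D(256, 4, strides=2, padding="same", activation="relu")(e2)
    # Decoder: each block doubles the resolution and concatenates the
    # encoder block of the same resolution (the skip connection).
    d2 = layers.Conv2DTranspose(128, 4, strides=2, padding="same",
                                activation="relu")(b)
    d2 = layers.Concatenate()([d2, e2])
    d1 = layers.Conv2DTranspose(64, 4, strides=2, padding="same",
                                activation="relu")(d2)
    d1 = layers.Concatenate()([d1, e1])
    out = layers.Conv2DTranspose(3, 4, strides=2, padding="same",
                                 activation="tanh")(d1)
    model = tf.keras.Model(inp, out)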
Figure 7.42: Nearest neighbour upsampling
7.20 Inference Engine
7.20.1 Use case diagram of FPGAN
7.20.2 Pre-processing of Cadastral Map to Generate Parcel
7.20.3 Step-by-step Generation
7.20.4 Furniture Mapping
7.20.5 Wall Segmentation and 3D Generation
7.21 Issue: Footprint area is generated outside the Parcel area
Possible problems:
a. The width of the parcel line is small
b. There is a boundary line in the parcel image
Test #2: Increase the width of the parcel line to about 3px and then check the result
Figure 7.51: Parcel Line width 3 case#1
Figure 7.52: Parcel Line width 3 case#2
Test #3: Increase the width of the parcel line to about 5px and then check the result
Figure 7.53: Parcel Line width 5 case#1
Figure 7.54: Parcel Line width 5 case#2
Result
From these observations, it was concluded that the boundary line in the image must be removed and the width of the parcel line must be wide enough, because the training images have wide parcel lines. In addition, the parcel area must cover most of the image, otherwise output will be produced outside the parcel area.
Chapter 8
Expected Outcomes
Firstly, the splash screen B.1 is displayed to welcome users and introduce them to the application and how it works.
The main menu B.2 consists of two options, basically to let the user define the area, structure and space to start the floor plan.
The first option uses a map provided by Malpot to get an accurate location for the building as well as to maintain the appropriate scale to plan the footprint. In the case of manual file upload, as in B.3, a fine-quality PNG image with an appropriate scale is required. Then, a precise map-area selection option is provided, as in B.4.
The other option is to freely draw and mark the area eligible to place the building. For this, an empty canvas is given to the user with access to draw a free shape, enabling abstract concept design at the early stage, as in B.5.
Also, after specifying the area of interest, an option to specify the shape of the area eligible for building, by marking vertices over it, can be given. Finally, in B.6 the user can define his/her constraints over the floor plan, with markings for openings as well as the entrance and shapes of the floor, to proceed towards GAN-generated output.
In this stage, as in B.7, users are given options of GAN-generated raster images, which are displayed along the bottom of the screen; one can be selected to proceed to vector image generation as well as 3D model generation. The final step is to render the 3D model of the floor plan in the browser, as in B.8, and let the user explore the rendered view.
A deployable web app using the simple Python Flask framework will be made. Until now, the static screens eligible for populating with generated results from the backend GAN model have been completed. Despite the completion of the frontend parts, model training for FPGAN is not yet completed; therefore, a working prototype is not presented for now and will be the focus of the next phase.
So, it was decided to train the GAN model based on the various combinations of generators and discriminators as mentioned in 7.6.3.
Then, after training each of the above models, these models would be compared based on the inception score as well as the FID score for each pipeline step followed in the project. The expected model comparison table based on the Inception score is shown in Table 8.1.
Table 8.1: Inception score of each model for each pipeline step

Pipeline step             Model 1   Model 2   Model 3   Model 4   Model 5   Model 6
Parcel to Footprint          -         -         -         -         -         -
Footprint to Roomsplit       -         -         -         -         -         -
Roomsplit to Furnished       -         -         -         -         -         -
For the FID score comparison, a similar table would be used separately.
Chapter 9
Actual Outcome
9.1 Review of ROBIN datasets
S.N.   Set   No. of Floor Plans   Shape
2      B     10                   L shape
3      C     10                   L shape
4      D     10                   L shape
5      E     10                   U shape
6      F     10                   U shape
7      G     10                   U shape
8      H     10                   U shape
9      I     10                   T shape
10     J     10                   L shape
11     K     10                   T shape
12     L     10                   T shape
13     M     10                   H shape
14     N     10                   I shape
15     O     10                   L shape
16     P     10                   Z shape
17     Q     10                   Z shape
The errors in the generation of parcel and footprint using the automatic generation script are due to the following reasons:
9.3 Parameter Tuning for Template Matching
for Furnished Datasets
9.5 Accuracy of furnished datasets
9.9 Program of the Roomsplit Datasets
Using the roomsplit data, the program of each roomsplit was generated. This program helps to decide the percentage of the footprint area covered by each room. Some programs are shown in D.3.
Table 9.6: Inception Score of Generated Image using U-net after 200K steps training
Table 9.7: Inception Score of Generated Image using U-net after 400K steps training
9.12 Comparison of Inception Score and Interpretation
Table 9.9: Inception Score of Generated Image using U-net after 400K steps training
From the comparison of the inception scores for the images generated using the U-net model at 200K and 400K training steps, it was observed that the inception score for footprint is higher at 200K, while for room split and furnished the inception score is higher at 400K. Hence, it can be concluded that the images generated by the model at 400K are more realistic than those produced at 200K for room split and furnished.
From the comparison, it was also observed that the FID score of the 400K-step U-net model is less than that of the 200K-step U-net model for all of footprint, room split and furnished generation, and a lower FID score is better. So, a decision was taken to use the 400K-step U-net model in the inference engine.
Here, the FID score is prioritized over the inception score because the inception score tells how realistic the generated image is, while the FID score tells how close the generated image is to the required image; the image must be closer to the required image rather than merely realistic. So, the FID score was prioritized and the 400K-step model was selected even for footprint, although the inception score at 200K steps is higher than that of the 400K-step model.
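For reference, given Inception feature vectors for real and generated images (assumed here as N×2048 activation arrays), the FID used in this comparison can be computed as follows; this is the standard formulation, not the project's exact evaluation script:

    import numpy as np
    from scipy.linalg import sqrtm

    def fid(act_real, act_gen):
        mu1, mu2 = act_real.mean(axis=0), act_gen.mean(axis=0)
        c1 = np.cov(act_real, rowvar=False)
        c2 = np.cov(act_gen, rowvar=False)
        covmean = sqrtm(c1 @ c2)
        if np.iscomplexobj(covmean):
            covmean = covmean.real  # drop tiny numerical imaginary parts
        # Distance between the two Gaussian fits: lower means closer.
        return float(np.sum((mu1 - mu2) ** 2)
                     + np.trace(c1 + c2 - 2 * covmean))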
9.14 FID score of generated images using triple
U-Net
9.15 Unit Testing of Models
The output of each model was tested with random input images from the testing dataset to check the working of the individual models. Paired images of each test are shown below, where the left image is the input image and the right one is the output image.
9.15.1 U-net Based Models
9.15.1.1 U-net Based Footprint Generation Model
Figure 9.2: Unit testing for U-net based footprint generation model #1
Figure 9.3: Unit testing for U-net based footprint generation model #2
Figure 9.4: Unit testing for U-net based footprint generation model #3
9.15.1.2 U-net Based Roomsplit Generation Model
Figure 9.5: Unit testing for U-net based roomsplit generation model #1
Figure 9.6: Unit testing for U-net based roomsplit generation model #2
Figure 9.7: Unit testing for U-net based roomsplit generation model #3
9.15.1.3 U-net Based Furnished Generation Model
Figure 9.8: Unit testing for U-net based furnished generation model #1
Figure 9.9: Unit testing for U-net based furnished generation model #2
Figure 9.10: Unit testing for U-net based furnished generation model #3
9.15.2 Triple U-net Based Models
9.15.2.1 Triple U-net Based Footprint Generation Model
Figure 9.11: Unit testing for Triple U-net based footprint generation model #1
Figure 9.12: Unit testing for Triple U-net based footprint generation model #2
Figure 9.13: Unit testing for Triple U-net based footprint generation model #3
9.15.2.2 Triple U-net Based Roomsplit Generation Model
Figure 9.14: Unit testing for Triple U-net based roomsplit generation model #1
Figure 9.15: Unit testing for Triple U-net based roomsplit generation model #2
Figure 9.16: Unit testing for Triple U-net based roomsplit generation model #3
9.15.2.3 Triple U-net Based Furnished Generation Model
Figure 9.17: Unit testing for Triple U-net based furnished generation model #1
Figure 9.18: Unit testing for Triple U-net based furnished generation model #2
Figure 9.19: Unit testing for Triple U-net based furnished generation model #3
9.16 Integration Testing of Models
The output of the full workflow was tested in the integrated inference engine in Flask.
9.16.1 U-net Based Models
Test Case #1
Figure 9.20: Integration testing for U-net based footprint generation model #1
Figure 9.21: Integration testing for U-net based roomsplit generation model #1
Figure 9.22: Integration testing for U-net based furnished generation model #1
Test Case #2
Figure 9.23: Integration testing for U-net based footprint generation model #2
Figure 9.24: Integration testing for U-net based roomsplit generation model #2
Figure 9.25: Integration testing for U-net based furnished generation model #2
Test Case #3
Figure 9.26: Integration testing for U-net based footprint generation model #3
Figure 9.27: Integration testing for U-net based roomsplit generation model #3
Figure 9.28: Integration testing for U-net based furnished generation model #3
9.16.2 Triple U-net Based Models
Test Case #1
Figure 9.29: Integration testing for Triple U-net based footprint generation model
#1
Figure 9.30: Integration testing for Triple U-net based roomsplit generation model
#1
Figure 9.31: Integration testing for Triple U-net based furnished generation model
#1
Test Case #2
Figure 9.32: Integration testing for Triple U-net based footprint generation model
#2
Figure 9.33: Integration testing for Triple U-net based roomsplit generation model
#2
Figure 9.34: Integration testing for Triple U-net based furnished generation model
#2
Test Case #3
Figure 9.35: Integration testing for Triple U-net based footprint generation model
#3
Figure 9.36: Integration testing for Triple U-net based roomsplit generation model
#3
Figure 9.37: Integration testing for Triple U-net based furnished generation model
#3
9.17 Furniture Mapping
The furniture items represented by the colors are mapped to their corresponding furniture icons. Three sample outputs of furniture mapping are shown below:
9.18 Wall Segmentation
To get a 3D image of the generated floor plan, the wall segments of the roomsplit were required; for this, Canny edge detection was used to detect the walls of the rooms, which were then used to generate the 3D images.
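A minimal OpenCV sketch of this step; the file names and hysteresis thresholds are illustrative, not the tuned values used in the project:

    import cv2

    roomsplit = cv2.imread("roomsplit.png", cv2.IMREAD_GRAYSCALE)
    # Blur slightly so Canny responds to walls rather than pixel noise.
    blurred = cv2.GaussianBlur(roomsplit, (5, 5), 0)
    # 50/150 are common default hysteresis thresholds.
    walls = cv2.Canny(blurred, 50, 150)
    cv2.imwrite("wall_segments.png", walls)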
9.19 3D Generation
With the help of the segmented images obtained from the Canny edge detection algorithm, a 3D image of the floor plan was generated and displayed to the users.
Chapter 10
Conclusion and Future Enhancements
In conclusion, the project Floor Plan Generation using Generative Adversarial Network is complete, as the project was successfully deployed as a desktop application for generating floor plans using the U-net and Triple U-net architectures.
• Better training datasets can be used to train the model and obtain better inference output.
Bibliography
[1] S. Chaillou, “Expliquer.” 2019. [Online]. Available: http://stanislaschaillou.
com/expliquer/
[2] N. Nauata, K.-H. Chang, C.-Y. Cheng, G. Mori, and Y. Furukawa, “House-
gan: Relational generative adversarial networks for graph-constrained house
layout generation,” in European Conference on Computer Vision. Springer,
2020, pp. 162–177.
[3] Q. Chen, Q. Wu, R. Tang, Y. Wang, S. Wang, and M. Tan, “Intelligent
home 3d: Automatic 3d-house design from linguistic descriptions only,” in
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, 2020, pp. 12625–12634.
[4] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks
for biomedical image segmentation,” in International Conference on Medical
image computing and computer-assisted intervention. Springer, 2015, pp.
234–241.
[5] D. Jha, M. A. Riegler, D. Johansen, P. Halvorsen, and H. D. Johansen,
“Doubleu-net: A deep convolutional neural network for medical image seg-
mentation,” in 2020 IEEE 33rd International symposium on computer-based
medical systems (CBMS). IEEE, 2020, pp. 558–564.
[6] E. Schonfeld, B. Schiele, and A. Khoreva, “A u-net based discriminator for
generative adversarial networks,” in Proceedings of the IEEE/CVF Confer-
ence on Computer Vision and Pattern Recognition, 2020, pp. 8207–8216.
[7] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation
with conditional adversarial networks,” in Proceedings of the IEEE conference
on computer vision and pattern recognition, 2017, pp. 1125–1134.
[8] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image
translation using cycle-consistent adversarial networks,” in Proceedings of the
IEEE international conference on computer vision, 2017, pp. 2223–2232.
[9] H. Yong, J. Huang, D. Meng, X. Hua, and L. Zhang, “Momentum batch nor-
malization for deep learning with small batch size,” in European Conference
on Computer Vision. Springer, 2020, pp. 224–240.
[10] M. Vidanapathirana, Q. Wu, Y. Furukawa, A. X. Chang, and M. Savva,
“Plan2scene: Converting floorplans to 3d scenes,” in Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021,
pp. 10733–10742.
[11] D. Shin, C. C. Fowlkes, and D. Hoiem, “Pixels, voxels, and views: A study
of shape representations for single view 3d object shape prediction,” in Pro-
ceedings of the IEEE conference on computer vision and pattern recognition,
2018, pp. 3061–3069.
[12] C. Liu, J. Wu, P. Kohli, and Y. Furukawa, “Raster-to-vector: Revisiting floor-
plan transformation,” in Proceedings of the IEEE International Conference
on Computer Vision, 2017, pp. 2195–2203.
[20] S. Dodge, J. Xu, and B. Stenger, “Parsing floor plan images,” in 2017 Fif-
teenth IAPR international conference on machine vision applications (MVA).
IEEE, 2017, pp. 358–361.
[25] “Google sheet,” Mar. 2022. [Online]. Available: https://docs.google.com/spreadsheets/d/1y5eFKPdgFDfsafWFMCaDbp96Zv5SKaLcZQK6Et0ZF60/edit?usp=sharing
[27] L.-P. de las Heras, O. R. Terrades, S. Robles, and G. Sánchez, “CVC-FP and SGT: a new database for structural floor plan analysis and its groundtruthing
tool,” International Journal on Document Analysis and Recognition (IJDAR),
vol. 18, no. 1, pp. 15–30, 2015.
[28] P. Centore, “sRGB centroids for the ISCC-NBS colour system,” Munsell Colour Sci. Painters, 2016.
Appendix
A Mockup Demonstration
The demo prototype of the mockup design in Figma is showcased in: Prototype expected outcome.
B.2 Main Menu
B.4 Manual Map upload from Malpot and area marking
B.6 Constraint given to plan
B.8 Generate 3D
D Qualification Metrics
D.1 Orientation of the Prepared Datasets
D.2 Footprint of the Prepared footprint Datasets
D.3 Program of the Prepared roomsplit Datasets
E Actual Outcome Screenshots
E.1 Get Started Page
E.3 After Uploading Cadastral Map
E.5 Display Parcel and Choosing Model
E.7 Footprint to Roomsplit Generation
E.9 Furniture Mapping
E.11 3D Plan
E.12 Complete Result