
GaMeS: Mesh-Based Adapting and Modification of Gaussian Splatting

Joanna Waczyńska¹*, Piotr Borycki¹*, Sławomir Tadeja², Jacek Tabor¹, Przemysław Spurek¹

arXiv:2402.01459v1 [cs.CV] 2 Feb 2024

Abstract

In recent years, a range of neural network-based methods for image rendering have been introduced. For instance, the widely-researched neural radiance fields (NeRF) rely on a neural network to represent 3D scenes, allowing for realistic view synthesis from a small number of 2D images. However, most NeRF models are constrained by long training and inference times. In comparison, Gaussian Splatting (GS) is a novel, state-of-the-art technique for rendering points in a 3D scene by approximating their contribution to image pixels through Gaussian distributions, warranting fast training and swift, real-time rendering. A drawback of GS is the absence of a well-defined approach for its conditioning, owing to the necessity of conditioning several hundred thousand Gaussian components. To solve this, we introduce the Gaussian Mesh Splatting (GaMeS) model, a hybrid of a mesh and Gaussian distributions that pins all Gaussian splats to the object surface (mesh). The unique contribution of our method is defining Gaussian splats solely by their location on the mesh, allowing for automatic adjustments in position, scale, and rotation during animation. As a result, we obtain high-quality renders while generating views in real time. Furthermore, we demonstrate that, in the absence of a predefined mesh, it is possible to fine-tune an initial mesh during the learning process.

Figure 1. GaMeS produces a hybrid of Gaussian Splatting and mesh representations. Therefore, GaMeS allows real-time modification and adaptation of Gaussian Splatting. The supplementary material provides more examples, including a video illustrating our results.

*Equal contribution. ¹Jagiellonian University, Faculty of Mathematics and Computer Science, Cracow, Poland. ²Department of Engineering, University of Cambridge, Cambridge, UK. Correspondence to: Joanna Waczyńska <[email protected]>.

1. Introduction

Recently, we have observed the emergence of several promising methods for rendering unseen views of 3D objects and scenes using neural networks. For instance, Neural Radiance Fields (NeRFs) (Mildenhall et al., 2020) have rapidly grown in popularity within the computer vision and graphics community, as they enable the creation of high-quality renders. Despite this interest and a growing body of related research, long training and inference times remain an unsolved challenge for NeRFs.

In contrast, the more recently introduced Gaussian Splatting (GS) (Kerbl et al., 2023) offers swift training and real-time rendering capabilities. What is unique to this method is that it represents 3D objects using Gaussian distributions (i.e., Gaussians). Hence, it does not rely on any neural network. Consequently, Gaussians can be manipulated in a manner akin to 3D point clouds or meshes, allowing for actions like resizing or repositioning in 3D space.


Nonetheless, practical challenges may arise when altering Gaussian positions, particularly in accurately tracking changes in the shape of Gaussian components, such as their ellipses. Moreover, scaling Gaussian components proves challenging when the object undergoes resizing, which is not an issue for classical meshes, as their triangle faces can be readily updated when vertex positions are adjusted.

The above constraints may be resolved by constructing Gaussians directly on the mesh, as shown by SuGaR (Guédon & Lepetit, 2023). Here, the authors introduced a regularization term in the Gaussian splat cost function to promote optimal alignment of the Gaussians with the scene's surface. SuGaR uses signed distance functions (SDF) from the vanilla GS and minimizes the difference between this SDF and its actual value computed for the Gaussians. In comparison, GaussianAvatars (Qian et al., 2023) utilizes a local coordinate system to generate Gaussians aligned with the relevant faces of the mesh, offering a technique specifically designed for avatars under the assumption of possessing a realistic external model for mesh fitting. However, it is not possible to train both the mesh and GS simultaneously. The above solutions have some advantages but do not directly combine Gaussian components with meshes. Therefore, we cannot automatically adapt the Gaussian parameters when the mesh changes.

Figure 2. GaMeS can be effectively trained on large scenes to allow their modifications while preserving high-quality renders.

To address this issue, we introduce a novel approach called Gaussian Mesh Splatting (GaMeS), combining the concepts of mesh and Gaussian distribution. Implementing GaMeS involves positioning Gaussian components on the faces of the mesh, ensuring proper alignment of these components with the mesh structure. Here, we propose a novel approach to defining and modifying Gaussians: the affine Gaussian transformation. Using our methodology, we can obtain high-quality outcomes for static scenes comparable to the state of the art, akin to the conventional GS method. Moreover, any alterations applied to the mesh automatically propagate updates to the corresponding Gaussian components, enabling real-time animation, see Fig. 1. Our approach can be applied in scenarios with a pre-existing mesh that we do not want to modify during training, or in scenarios requiring simultaneous optimization of both the mesh and Gaussian Splatting, see Fig. 2. In summary, this work makes the following contributions:

• We introduce a hybrid representation for 3D objects, seamlessly integrating mesh and GS.

• Attaching Gaussian splats to the mesh enables real-time modifications of the GS alongside mesh changes.

• Our method relies only upon basic vector operations; consequently, we are able to render dynamic scenes in a similar time to their static counterparts.

2. Related Works

Point-based Gaussian representations have found a large number of potential application scenarios, including substituting point-cloud data (Eckart et al., 2016), modeling molecular structures (Blinn, 1982), and shape reconstruction (Keselman & Hebert, 2023). These representations can also be utilized in applications involving shadow rendering (Nulkar & Mueller, 2001) and cloud rendering (Man, 2006). In addition, the new technique of 3D Gaussian Splatting (3D-GS) was recently introduced (Kerbl et al., 2023). This method couples splatting methods with point-based rendering to reach real-time rendering speed. The rendering quality of 3D-GS is comparable to that of Mip-NeRF (Barron et al., 2021), one of the best multilayer perceptron-based renderers.

GS surpasses NeRF in terms of both training and inference speed, distinguishing itself by not depending on a neural network for operation. Rather, GS stores the essential information within its 3D Gaussian components, a feature that renders it well-suited for dynamic scene modeling (Wu et al., 2023). Additionally, integrating GS with a dedicated 3D computer graphics engine is a straightforward process (Kerbl et al., 2023). However, conditioning GS is a challenging task due to the large number of Gaussian components typically involved, which can reach hundreds of thousands.

One potential approach is to combine Gaussian Splatting with a mesh. An example of this is SuGaR (Guédon & Lepetit, 2023), which introduces a regularization term in the GS cost function to encourage alignment between the Gaussians and the surface of the scene. SuGaR achieves this by using signed distance functions (SDF) and minimizing the difference between the SDF and the computed Gaussian values. Another method, GaussianAvatars (Qian et al., 2023), utilizes a local coordinate system to generate Gaussians corresponding to the mesh's faces. This approach is specifically designed for avatars and assumes the availability of a realistic (external) model for mesh fitting. However, training both the mesh and GS simultaneously is not feasible.


While these solutions offer certain advantages, they do not directly combine Gaussian components with the mesh. As a result, automatic adaptation of the Gaussian parameters to a changing mesh is impossible.

Figure 3. The left image presents the Gaussian component constructed on the face V = {v1, v2, v3}. We overlap the centers of the Gaussian and the face and parameterize the covariance matrix by the vertices V. The right image presents how our model uses the previously estimated Gaussian to construct k (in the figure, k = 3) components in GS. GaMeS components are built by scaling and translation on a mesh face.

3. GaMeS: Mesh-Based Gaussian Splatting

This section delves into the details of the GaMeS model, commencing with the fundamental aspects of vanilla GS. Subsequently, we elucidate how we parameterize Gaussian distributions on the mesh's faces. Finally, we introduce the novel GaMeS approach.

Gaussian Splatting The Gaussian Splatting (GS) technique captures a 3D scene through an ensemble of 3D Gaussians, each defined by its position (mean), covariance matrix, opacity, and color. Additionally, spherical harmonics (SH) are employed to depict the color attributes (Fridovich-Keil et al., 2022; Müller et al., 2022). The effectiveness of GS is primarily attributed to the rendering process, which utilizes projections of Gaussian components. GS relies on a dense set of 3D Gaussian components with color and opacity:

G = {(N(m_i, Σ_i), σ_i, c_i)}_{i=1}^{n},

where m_i denotes the position, Σ_i the covariance, σ_i the opacity, and c_i the SH colors of the i-th component.

The parameters of the Gaussian distributions undergo direct training through gradient optimization. To enhance adaptability to complex scenes, GS employs additional training strategies. Notably, significant Gaussian components are subdivided; if the parameter updates are substantial, Gaussian components are duplicated; and components can be removed due to their low transparency. In addition, the GS training procedure is implemented in a CUDA kernel, which supports fast training and real-time rendering.
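For illustration, the following minimal sketch shows the container such a scene representation reduces to; the field names and array shapes are our own and not those of the official implementation:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianCloud:
    """Plain container for G = {(N(m_i, Sigma_i), sigma_i, c_i)}, i = 1..n."""
    means: np.ndarray        # (n, 3) positions m_i
    covariances: np.ndarray  # (n, 3, 3) covariance matrices Sigma_i
    opacities: np.ndarray    # (n,) opacities sigma_i in [0, 1]
    sh_colors: np.ndarray    # (n, sh_dim) spherical-harmonics coefficients c_i
```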
Distribution on Mesh Faces In GaMeS, we place all Gaussian components onto the mesh surface. Let us examine a single triangle face with vertices:

V = {v1, v2, v3} ⊂ R^3.

We aim to parameterize the Gaussian components using the vertices of the face V. We express the mean vector as a convex combination of the vertices V, thus determining the positions of the Gaussian splats:

m_V(α1, α2, α3) = α1 v1 + α2 v2 + α3 v3,

where α1, α2, α3 are trainable parameters such that α1 + α2 + α3 = 1. Through this parameterization, we consistently keep the Gaussians positioned inside the face V.
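As an illustrative sketch, the mean of one splat can be computed from unconstrained trainable parameters; the softmax normalization below is our own choice, since the paper only requires the α's to be a convex combination:

```python
import numpy as np

def face_gaussian_mean(verts, alpha_logits):
    """m_V(a1, a2, a3) = a1*v1 + a2*v2 + a3*v3 for one triangle face.

    verts: (3, 3) array with rows v1, v2, v3.
    alpha_logits: (3,) unconstrained trainable parameters; the softmax
    enforces a_i > 0 and a1 + a2 + a3 = 1 (our choice of constraint).
    """
    a = np.exp(alpha_logits - alpha_logits.max())
    a /= a.sum()
    return a @ verts  # convex combination of the three vertices
```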
The covariance matrix could be defined as the empirical covariance calculated from the three points. However, such a solution is difficult to combine with the optimization proposed in GS. Instead, the covariance is parameterized by the factorization

Σ = R^T S S R,

where R is the rotation matrix and S the scaling matrix. Here, we define the rotation and scaling matrices so as to stay within the original framework. Let us start from orthonormal vectors r1, r2, r3 ∈ R^3, assembled into the rotation matrix R_V = [r1, r2, r3], where the first vector is defined by the normal vector:

r1 = n = ((v2 − v1) × (v3 − v1)) / ∥(v2 − v1) × (v3 − v1)∥,

where × is the cross product. Given an explicitly defined mesh, we consistently possess knowledge of the vertex order for any given face. Hence, to calculate r2, we use the vector from the center to the vertex v1:

r2 = (v1 − m) / ∥v1 − m∥,

where m = mean(v1, v2, v3), which corresponds to the centroid of the triangle.

The last vector is obtained by orthonormalizing with respect to the two existing vectors (a single step of the Gram–Schmidt process (Björck, 1994)):

orth(x; r1, r2) = x − proj(x, r1) − proj(x, r2), where proj(v, u) = (⟨v, u⟩ / ⟨u, u⟩) u.

To obtain r3, we use the vector from the center to the second vertex of the triangle:

r3 = orth(v2 − m; r1, r2) / ∥orth(v2 − m; r1, r2)∥.
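A minimal NumPy sketch of this rotation construction reads as follows; the small epsilon guards against degenerate triangles are our addition:

```python
import numpy as np

def face_rotation(v1, v2, v3, eps=1e-12):
    """R_V = [r1, r2, r3] for one triangle face, following the construction
    above: r1 is the face normal, r2 points from the centroid to v1, and r3
    is obtained by one Gram-Schmidt step. The eps guards are our addition."""
    m = (v1 + v2 + v3) / 3.0                    # centroid m = mean(v1, v2, v3)
    n = np.cross(v2 - v1, v3 - v1)
    r1 = n / (np.linalg.norm(n) + eps)          # normal vector
    r2 = (v1 - m) / (np.linalg.norm(v1 - m) + eps)
    x = v2 - m                                  # orth(v2 - m; r1, r2)
    x = x - (x @ r1) * r1 - (x @ r2) * r2       # remove projections onto r1, r2
    r3 = x / (np.linalg.norm(x) + eps)
    return np.stack([r1, r2, r3])               # rows r1, r2, r3
```

Because r1 and r2 are already unit vectors, proj(x, r) reduces to (⟨x, r⟩)r in the code above.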


Figure 4. Visualization of the affine transformation of Gaussian components when we modify the mesh. The mesh vertices parameterize the mean and covariance matrix of each Gaussian. Therefore, these parameters update when we change the mesh.

Figure 5. GaMeS uses a fixed number of Gaussians per face. In the case of a mesh containing faces of different sizes, finding suitable parameters causes problems. In such a case, we propose to divide large faces into smaller parts. The left side shows only the original mesh, while on the right, the sizable mesh faces of the brown pool surroundings have been subdivided into smaller ones.

In consequence, we obtain a rotation matrix R_V = [r1, r2, r3] which aligns with the triangular face. As the scaling matrix S we use

S_V = diag(s1, s2, s3),

where s1 = ε, s2 = ∥mean(v1, v2, v3) − v1∥, and s3 = ⟨v2, r3⟩. The first scaling parameter corresponds to the normal vector. Since, in our case, the Gaussians are positioned on the faces, aligning with the surface, s1 should be equal to zero. To avoid numerical problems, we fix it at a small constant value. On the other hand, s2 and s3 are proportional to the distances from the center to the triangle border.

Consequently, the covariance of a Gaussian distribution positioned on a face is given by

Σ_V = R_V^T S_V S_V R_V,

and corresponds to the shape of the triangle V. Fig. 3 shows the process of determining the Gaussians. There, α_j^i refers to the i-th Gaussian G_i^V and the j-th vertex, respectively.

Within our model, we train a scale parameter, denoted as ρ, to dynamically adjust the Gaussian component size. For one face, let us take k ∈ N Gaussians:

G_V = {N(m_V(α_1^i, α_2^i, α_3^i), ρ_i Σ_V)}_{i=1}^{k},

where α_1^i + α_2^i + α_3^i = 1 and ρ_i ∈ R_+.

This distribution method enables us to render Gaussian splats solely dependent on the positions of the mesh triangles. Consequently, we can readily perform geometric transformations when dealing with Gaussians within a single (triangle) face: a triangle transformation applies the corresponding Gaussian transformation as well. Such a novel approach is crucial to our work, and we can relate it to the affine Gaussian transformation. Fig. 4 shows examples of such transformations for rotation and scaling. A face at time zero corresponds to the triangle V = {v_1^V, v_2^V, v_3^V}; it is then transformed into V′ = {v_1′^V, v_2′^V, v_3′^V}. The three Gaussian splats G_1^V, G_2^V, G_3^V depend only on the positions of the vertices.
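Gathering the pieces, here is a sketch of generating the k Gaussians of one face, reusing face_rotation() from the sketch above; the softmax constraint on α and the reading of s3 relative to the centroid (matching the stated "distance from the center") are our assumptions:

```python
import numpy as np

EPS_S1 = 1e-8  # small constant replacing s1 = 0, avoiding numerical problems

def face_gaussians(v1, v2, v3, alpha_logits, rhos):
    """Means (k, 3) and covariances (k, 3, 3) of the k Gaussians on one face.

    alpha_logits: (k, 3) unconstrained barycentric parameters.
    rhos: (k,) positive per-Gaussian scale parameters rho_i.
    """
    m = (v1 + v2 + v3) / 3.0                 # triangle centroid
    R = face_rotation(v1, v2, v3)            # rows r1, r2, r3
    # S_V = diag(s1, s2, s3); the paper writes s3 = <v2, r3>, which we
    # evaluate relative to the centroid (our interpretation).
    s = np.array([EPS_S1,
                  np.linalg.norm(m - v1),    # s2 = ||mean(v1, v2, v3) - v1||
                  (v2 - m) @ R[2]])
    sigma = R.T @ np.diag(s * s) @ R         # Sigma_V = R^T S S R
    a = np.exp(alpha_logits - alpha_logits.max(axis=1, keepdims=True))
    a /= a.sum(axis=1, keepdims=True)        # alpha_1 + alpha_2 + alpha_3 = 1
    means = a @ np.stack([v1, v2, v3])       # m_V(alpha) for each splat
    covs = np.asarray(rhos)[:, None, None] * sigma   # rho_i * Sigma_V
    return means, covs
```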
GaMeS: Gaussian Splatting on Mesh The essential step in our model involves positioning Gaussians on the 3D mesh of the scene or object in question. This non-trivial approach disregards any hierarchy of Gaussian positions and facilitates ease of potential animation. Here, we have to consider two crucial experimental scenarios, namely, with and without using a mesh during training. First, we assume that a mesh is provided and exclusively train the Gaussian components' centers, colors, and opacity. Second, we conduct training simultaneously on the mesh and the Gaussian components. In the latter case, initialization of the mesh is required and is achieved by utilizing points, for example, from the COLMAP tool, together with straightforward meshing strategies. We also show initialization of the mesh using Faces Learned with an Articulated Model and Expressions (FLAME) (Li et al., 2017), i.e., a human face initialization. We adjust the mesh vertices and Gaussian parameters throughout the training process.

Let us consider a mesh denoted as M = (V, E), where V ⊂ R^3 represents vertices and E signifies edges. The mesh's faces, denoted by F_M = {{v_1^{V_i}, v_2^{V_i}, v_3^{V_i}}}_i, are described by sets of vertices. As previously detailed, such a mesh serves as a parameterization for shaping the Gaussian components.

For our experiments, we choose the number k ∈ N of Gaussian components per mesh face and define how densely they should be placed on its surface. For a mesh containing n ∈ N faces, the final number of Gaussians is fixed and equal to k · n.

For each Gaussian component, we define trainable parameters used to calculate the mean and covariance, {(α_1^i, α_2^i, α_3^i, ρ_i)}_{i=1}^{k·n}.


Therefore, GaMeS uses a dense set G_M of 3D Gaussians:

G_M = ⋃_{V_i ∈ F_M} {(G^{V_i}, σ_{i,j}, c_{i,j})}_{j=1}^{k},

where σ_{i,j} is the opacity and c_{i,j} the SH colors of the j-th Gaussian on the i-th face.

Figure 6. GaMeS parameterizes all Gaussian components by the mesh vertices so that they lie directly on the mesh. Therefore, when we modify the mesh positions, we automatically adapt the components' means and covariances. Consequently, we can see differences in the rendered image quality under the transformation when (a) only the Gaussian position is editable, and (b) the Gaussian rotation and scaling are editable as well.
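A sketch of assembling the full set over all n faces, reusing face_gaussians() from above (opacities and SH colors would be carried alongside in the same way and are omitted for brevity):

```python
import numpy as np

def build_gaussians_on_mesh(vertices, faces, alpha_logits, rhos):
    """Assemble all k*n Gaussians of G_M from a triangle mesh.

    vertices: (num_vertices, 3); faces: (n, 3) integer vertex indices.
    alpha_logits: (n, k, 3) and rhos: (n, k) trainable parameters.
    """
    means, covs = [], []
    for f in range(len(faces)):
        v1, v2, v3 = vertices[faces[f]]
        mu, sig = face_gaussians(v1, v2, v3, alpha_logits[f], rhos[f])
        means.append(mu)
        covs.append(sig)
    return np.concatenate(means), np.concatenate(covs)  # (k*n, 3), (k*n, 3, 3)
```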
Mesh initialization During training, we can modify the mesh vertices and the Gaussian parameters. We can also use external tools for mesh fitting (Takikawa et al., 2021a) and train only the Gaussian components. In the experimental section, we show both of these scenarios.

In our model, we use a fixed number of Gaussian components per face. In the case of models with various face sizes, finding suitable parameters is not trivial. To solve this problem, we propose to divide large faces into smaller parts, see Fig. 5 and the subdivision sketch below.
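One possible realization is a single pass of midpoint subdivision applied to oversized faces; the area threshold and the 1-to-4 splitting scheme are our illustrative choices, not necessarily the exact procedure used in our implementation:

```python
import numpy as np

def subdivide_large_faces(vertices, faces, max_area):
    """One pass of midpoint (1-to-4) subdivision applied to oversized faces."""
    vertices = [np.asarray(v) for v in vertices]
    new_faces = []
    for i1, i2, i3 in faces:
        v1, v2, v3 = vertices[i1], vertices[i2], vertices[i3]
        area = 0.5 * np.linalg.norm(np.cross(v2 - v1, v3 - v1))
        if area <= max_area:
            new_faces.append((i1, i2, i3))
            continue
        # insert edge midpoints (duplicates on shared edges kept for brevity)
        m12, m23, m31 = (v1 + v2) / 2, (v2 + v3) / 2, (v3 + v1) / 2
        base = len(vertices)
        vertices += [m12, m23, m31]
        a, b, c = base, base + 1, base + 2
        new_faces += [(i1, a, c), (a, i2, b), (c, b, i3), (a, b, c)]
    return np.array(vertices), np.array(new_faces)
```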
Figure 7. Compared to a method solely based on Gaussian means depending on position, we observe that, following rotation and stretching of the surface, Gaussians should dynamically adjust their scale and rotation to seamlessly conform to the change, as in the case of the GaMeS model.

Mesh modification GaMeS parameterizes all Gaussian components by the vertices of the mesh in such a way that they lie on the mesh. Therefore, when we modify the mesh positions, we automatically adapt the means and covariances of the relevant Gaussians. Such an approach allows us to work with transformations. To better visualize these properties, we compare our model with a solution where the Gaussian components' centers are placed on mesh faces but do not parameterize the covariance matrix. In Fig. 6, we can see that the GaMeS Gaussian components adapt automatically to the mesh modification.

In particular, when an object is bent, models that only make the positions (means) of the Gaussians depend on the mesh fail to adapt properly. We can observe this effect in Fig. 6 with the rendering of the ship's rope. In contrast, GaMeS provides a perfect fit with the modification; the operation scheme is shown in Fig. 7.
the source code is available in Github1 .
4. Experiments

Here, we delineate the implementation details and describe the used datasets and the reasons for their selection. Then, we describe three scenarios: (1) when we have an existing mesh, (2) when we train a mesh based on a COLMAP initialization, and (3) when we use a FLAME initialization. In the end, we show inference time on both static and animated scenes.

4.1. Implementation details and data description

The time taken by the optimization process for our model varied from a few minutes up to no more than sixty, as it depends on factors like scene complexity, dataset characteristics, the number of Gaussians per face, and vertex fine-tuning decisions. Nonetheless, all experiments were conducted within a reasonable time frame, underscoring the efficiency of our approach across diverse scenarios. The supplementary material provides all the relevant details, and the source code is available on GitHub¹.

¹https://github.com/waczjoan/gaussian-mesh-splatting


PSNR ↑
Chair Drums Lego Mic Materials Ship Hotdog Ficus
Static
NeRF 33.00 25.01 32.54 32.91 29.62 28.65 36.18 30.13
VolSDF 30.57 20.43 29.46 30.53 29.13 25.51 35.11 22.91
Ref-NeRF 33.98 25.43 35.10 33.65 27.10 29.24 37.04 28.74
ENVIDR 31.22 22.99 29.55 32.17 29.52 21.57 31.44 26.60
Plenoxels 33.98 25.35 34.10 33.26 29.14 29.62 36.43 31.83
Gaussian Splatting 35.82 26.17 35.69 35.34 30.00 30.87 37.67 34.83
Editable
GaMeS (our) 32.05 25.43 33.54 34.78 27.54 30.79 34.36 33.12
RIP-NeRF 34.84 24.89 33.41 34.19 28.31 30.65 35.96 32.23

Table 1. Quantitative comparisons (PSNR) on the NeRF-Synthetic dataset, showing that GaMeS gives results comparable to other models.

PSNR ↑
Outdoor scenes Indoor scenes
bicycle flowers garden stump treehill room counter kitchen bonsai
Static
Plenoxels 21.91 20.10 23.49 20.66 22.25 27.59 23.62 23.42 24.66
INGP-Big 22.17 20.65 25.07 23.47 22.37 29.69 26.69 29.48 30.69
Mip-NeRF360 24.37 21.73 26.98 26.40 22.87 31.63 29.55 32.23 33.46
GS - 7K 23.60 20.52 26.25 25.71 22.09 28.14 26.71 28.55 28.85
GS - 30K 25.25 21.52 27.41 26.55 22.49 30.63 28.70 30.32 31.98
Editable
R-SuGaR-15K 22.91 - 25.29 24.55 - 29.95 27.47 29.38 30.42
GaMeS (small) (Our) 22.32 19.18 25.93 24.26 21.56 28.49 25.88 25.92 26.98
GaMeS (large) (Our) 23.46 19.39 26.28 24.59 21.40 28.83 26.44 27.18 27.84

Table 2. The quantitative comparisons of reconstruction capability (PSNR) on Mip-NeRF360 dataset. R-SuGaR-15K uses 200K vertices.
GaMeS (large) uses 30 Gaussians per face on mesh from COLMAP and 10 for mesh from GS. GaMeS (small) uses 10 Gaussians per face
on mesh from COLMAP and 1 for mesh from GS.

We illustrate the fundamental gains of the GaMeS model through experiments with three distinct datasets.

Synthetic Dataset: Eight geometrically complex objects with realistic non-Lambertian materials, provided by NeRF (Mildenhall et al., 2020). In addition to containing images of the objects, the dataset also incorporates corresponding meshes. We leveraged this dataset for our initial experiments to unequivocally showcase the capabilities of our model and underscore its flexibility in mesh editing.

Mip-NeRF360 Dataset: A dataset comprising five outdoor and four indoor scenes, each featuring intricate central objects or areas against detailed backgrounds (Barron et al., 2022). We used this dataset to demonstrate the adaptability of our model in handling dense scenes, employing a non-conventional strategy during the initial training stage in mesh preparation.

Human Faces Dataset: A modified subset of the Stirling/ESRC 3D Face Database², which includes images of six people, generated using Blender software from various perspectives and consistently excluding the backs of heads. Only one expression is assigned to each face. Notably, the dataset lacks corresponding mesh representations for the depicted persons. This dataset is used to compare GaMeS with the NeRFlame model (Zając et al., 2023), mainly within the human-mesh fitting task.

²https://pics.stir.ac.uk/ESRC/index.htm

4.2. Scenario I: Model with existing mesh

In this scenario, we utilize the provided mesh, incorporating Gaussians by strategically placing them onto its surface. Our initial experiments used the Synthetic Dataset (Mildenhall et al., 2020), incorporating the shared meshes. Table 1 presents PSNR metrics demonstrating the competitiveness of the GaMeS approach with existing methods. Additional numerical comparisons are available in the supplementary material.


Notably, a significant contribution lies in the ease with which renders can be modified manually or in an automated process. Fig. 1 shows examples of reconstruction and simple object animations.

4.3. Scenario II: Joint training of mesh and GS

Our approach is capable of modeling unbounded scenes. In order to achieve that, we first create an initial mesh and then adjust it during the training. The results in Table 2 demonstrate that such modeling is feasible and reasonably comparable to other existing models. In particular, the results show parity with the GS-7K and R-SuGaR-15K models.

It is important to highlight that the rendering quality is contingent upon the quality of the acquired mesh. In our experiments, we opted for a hybrid approach to generating the initial mesh. This approach combines two meshes, each using a different number of Gaussian splats per face. The first mesh is extracted directly from the shared COLMAP point cloud, while the second mesh is inspired by the SuGaR technique (Guédon & Lepetit, 2023) and involves training GS for 7000 iterations so that the 3D Gaussian positions can be treated as a point cloud. Only 20% of these points are then transformed into a mesh using the Alpha Shapes algorithm (Edelsbrunner et al., 1983).
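As a sketch of this initialization step, assuming Open3D's alpha-shape reconstruction (the 20% subsampling ratio follows the text, while the alpha value and the choice of library are illustrative and may differ from our actual pipeline):

```python
import numpy as np
import open3d as o3d

def mesh_from_gaussian_positions(positions, alpha=0.03, keep_ratio=0.2):
    """positions: (n, 3) Gaussian centers after ~7000 GS iterations."""
    idx = np.random.choice(len(positions),
                           size=int(keep_ratio * len(positions)),
                           replace=False)      # keep only 20% of the points
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(positions[idx])
    # Alpha Shapes (Edelsbrunner et al., 1983) surface reconstruction
    return o3d.geometry.TriangleMesh.create_from_point_cloud_alpha_shape(
        pcd, alpha)
```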

As anticipated, we observe scene reconstruction, including both objects and details, see Fig. 2. Nevertheless, backgrounds and elements with large mesh faces may exhibit a lower-quality appearance. However, this does not hinder the feasibility of the GaMeS model for reconstruction and modification. Additionally, the experiment revealed that Gaussian splats do not need to form a hierarchical structure. Instead, they can be embedded quite flatly, which is a non-trivial observation and, to our knowledge, has not been considered before.

Numerous methods facilitate the generation of an initial mesh for objects, such as EG3D (Chan et al., 2022), NGLOD (Takikawa et al., 2021b) or NeRFMeshing (Rakotosaona et al., 2023), but our primary focus is not on this task. Instead, we emphasize addressing distinct issues that extend beyond the scope of initializing object meshes.

Figure 8. Reconstruction of 3D people and generation of new expressions using the GaMeS model.

Figure 9. The main idea of fitting a human mesh and how it can be transferred to a different person.

4.4. Scenario III: GaMeS with FLAME as Init Mesh

In this experiment, we show the possibility of fitting a mesh using an init mesh that, in contrast to the previous scenarios, was not acquired directly from our data. Here, we relied on the Human Faces Dataset, as it provides the required experimental data. To initialize the meshes, we used the FLAME (Li et al., 2017) framework, which generates fully controllable meshes of human faces. Consequently, we could highlight the key advantage of the GaMeS model, i.e., the ability to modify objects. We used the official implementation of FLAME, with the number of parameters suggested in RingNet (Sanyal et al., 2019). In addition, we implemented an extra detail-learning parameter to better represent hair or T-shirts.

Fig. 9 shows the main idea behind our approach. Initially, a human mesh is generated using the FLAME model. By accurately adjusting the mesh, we gain the ability to apply it, for example, to another person. By utilizing the FLAME parameters, we can modify various aspects, especially facial expressions, to generate expressions like smiling, sadness, eye closure, mouth opening, and flirty or disgusted expressions. The possibilities of reconstruction and modification for all six faces are depicted in Fig. 8.

All expressions were chosen randomly, as our main goal was to show the GaMeS model's ability to render objects as well as its modifications/animations.

PSNR ↑
Person      1      2      3      4      5      6
Static
NeRF        33.37  33.39  33.08  31.96  33.15  32.42
GS          50.16  49.04  49.81  49.97  49.77  49.17
Editable
NeRFlame    27.89  29.79  29.70  25.78  32.59  29.18
GaMeS       32.73  29.56  29.42  29.95  32.27  31.50

Table 3. Comparison of PSNR metrics between the static models (NeRF, GS) and the editable models referring to them (NeRFlame and GaMeS). We used 100 Gaussians per face.

            GaMeS                          GaMeS-A
            FPS    Num. of comp.   PSNR    FPS
bicycle     230    1 316 658       22.3    43.2
flowers     260    815 966         19.2    58.9
garden      163    2 092 692       25.9    31.9
stump       333    658 420         24.3    65.6
treehill    120    975 790         21.5    36.2
room        230    775 096         28.5    52.3
counter     213    502 808         25.9    60.0
kitchen     201    532 324         25.9    62.4
bonsai      174    1 274 648       27.0    42.4

Table 4. Comparison of achieved frame rate (FPS) on different scenes. GaMeS-A denotes FPS during modification, where each frame has different mesh positions. We can render static and dynamic scenes in real time. All results were obtained using an RTX 4080 GPU.

The comparison of the mesh-fitting capability of GaMeS with the NeRFlame model is shown in Fig. 10. Notably, the faces are distinctly delineated and closely resemble the modeled objects, capturing not only postures but also expressions and intricate details.

Table 3 shows the results of using PSNR metrics on the six faces. Comparisons are made between the two baseline models, NeRF and GS, and the models created from them, i.e., the corresponding NeRFlame and GaMeS with animation capabilities.

Figure 10. During the training of GaMeS, we simultaneously model the FLAME mesh. Here, we present the meshes fitted by GaMeS compared to the meshes fitted by the NeRFlame model.

4.5. Real-time rendering

We show that GaMeS can produce high-quality renders similar to GS or NeRF. Furthermore, we can manipulate mesh structures to obtain real-time modification of 3D objects, see Table 4. Our modification/simulation process does not involve any additional steps, which results in a considerable advantage of our approach.

To prove that, we conducted experiments comparing GS with GaMeS, examining their performance in static visualizations. We extended our experiments to include dynamic scenes by evaluating the performance of GaMeS in an animated scenario, see Fig. 2. By analyzing FPS metrics in static and animated contexts, we seek to demonstrate real-time rendering. For comparability, all the computations were performed using the same RTX 4080 GPU.
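A simple timing harness of the kind that could produce such FPS numbers (our own sketch, not the benchmark script used for the paper):

```python
import time

def measure_fps(render_frame, num_frames=1000):
    """Average frames per second over a fixed number of frames.

    render_frame(t) should draw one frame; for the GaMeS-A setting it would
    also move the mesh vertices before rebuilding the Gaussians.
    """
    start = time.perf_counter()
    for t in range(num_frames):
        render_frame(t)
    return num_frames / (time.perf_counter() - start)
```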

5. Conclusion

This paper presents a new GS-based model called GaMeS that leverages two main concepts, i.e., the classical mesh and vanilla GS. Here, we represent a 3D scene with a set of Gaussian components lying on the mesh. Moreover, we parameterize the Gaussian components by the mesh vertices, allowing single-stage training. At inference, we obtain a GS representation that can be modified in real time by manipulating the mesh components. As a result, when the mesh changes, all Gaussian components automatically adapt to the new shape.

Limitations GaMeS allows real-time modifications, but artifacts appear in the case of significant changes to meshes with large faces. In practice, large faces should be divided into smaller ones. How to change the Gaussian components in GaMeS when mesh faces are split is not apparent.

Societal impact Our GaMeS contributes to lowering the computational complexity of 3D modeling, therefore reducing the energy consumption of such models.

References

Barron, J. T., Mildenhall, B., Tancik, M., Hedman, P., Martin-Brualla, R., and Srinivasan, P. P. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 5855–5864, 2021.

Barron, J. T., Mildenhall, B., Verbin, D., Srinivasan, P. P., and Hedman, P. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In CVPR, 2022.

Björck, Å. Numerics of Gram–Schmidt orthogonalization. Linear Algebra and Its Applications, 197:297–316, 1994.

Blinn, J. F. A generalization of algebraic surface drawing. ACM Transactions on Graphics (TOG), 1(3):235–256, 1982.

Chan, E. R., Lin, C. Z., Chan, M. A., Nagano, K., Pan, B., Mello, S. D., Gallo, O., Guibas, L., Tremblay, J., Khamis, S., Karras, T., and Wetzstein, G. Efficient geometry-aware 3D generative adversarial networks. In CVPR, 2022.

Eckart, B., Kim, K., Troccoli, A., Kelly, A., and Kautz, J. Accelerated generative models for 3D point cloud data. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5497–5505, 2016.

Edelsbrunner, H., Kirkpatrick, D., and Seidel, R. On the shape of a set of points in the plane. IEEE Transactions on Information Theory, 29(4):551–559, 1983.

Fridovich-Keil, S., Yu, A., Tancik, M., Chen, Q., Recht, B., and Kanazawa, A. Plenoxels: Radiance fields without neural networks. In CVPR, pp. 5501–5510, 2022.

Gao, K., Gao, Y., He, H., Lu, D., Xu, L., and Li, J. NeRF: Neural radiance field in 3D vision, a comprehensive review. arXiv preprint arXiv:2210.00379, 2022.

Guédon, A. and Lepetit, V. SuGaR: Surface-aligned Gaussian splatting for efficient 3D mesh reconstruction and high-quality mesh rendering. arXiv preprint arXiv:2311.12775, 2023.

Kerbl, B., Kopanas, G., Leimkühler, T., and Drettakis, G. 3D Gaussian splatting for real-time radiance field rendering. ACM Transactions on Graphics, 42(4), 2023.

Keselman, L. and Hebert, M. Flexible techniques for differentiable rendering with 3D Gaussians. arXiv preprint arXiv:2308.14737, 2023.

Li, T., Bolkart, T., Black, M. J., Li, H., and Romero, J. Learning a model of facial shape and expression from 4D scans. ACM Transactions on Graphics (Proc. SIGGRAPH Asia), 36(6):194:1–194:17, 2017.

Man, P. Generating and real-time rendering of clouds. In Central European Seminar on Computer Graphics, volume 1, Častá-Papiernička, Slovakia, 2006.

Mildenhall, B., Srinivasan, P. P., Tancik, M., Barron, J. T., Ramamoorthi, R., and Ng, R. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.

Müller, T., Evans, A., Schied, C., and Keller, A. Instant neural graphics primitives with a multiresolution hash encoding. ACM Transactions on Graphics (TOG), 41(4):1–15, 2022.

Nulkar, M. and Mueller, K. Splatting with shadows. In Volume Graphics 2001: Proceedings of the Joint IEEE TCVG and Eurographics Workshop in Stony Brook, New York, USA, June 21–22, 2001, pp. 35–49. Springer, 2001.

Qian, S., Kirschstein, T., Schoneveld, L., Davoli, D., Giebenhain, S., and Nießner, M. GaussianAvatars: Photorealistic head avatars with rigged 3D Gaussians. arXiv preprint arXiv:2312.02069, 2023.

Rakotosaona, M.-J., Manhardt, F., Arroyo, D., Niemeyer, M., Kundu, A., and Tombari, F. NeRFMeshing: Distilling neural radiance fields into geometrically-accurate 3D meshes, 2023.

Sanyal, S., Bolkart, T., Feng, H., and Black, M. Learning to regress 3D face shape and expression from an image without 3D supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.

Takikawa, T., Litalien, J., Yin, K., Kreis, K., Loop, C., Nowrouzezahrai, D., Jacobson, A., McGuire, M., and Fidler, S. Neural geometric level of detail: Real-time rendering with implicit 3D shapes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11358–11367, 2021a.

Takikawa, T., Litalien, J., Yin, K., Kreis, K., Loop, C., Nowrouzezahrai, D., Jacobson, A., McGuire, M., and Fidler, S. Neural geometric level of detail: Real-time rendering with implicit 3D shapes, 2021b.

Wu, G., Yi, T., Fang, J., Xie, L., Zhang, X., Wei, W., Liu, W., Tian, Q., and Wang, X. 4D Gaussian splatting for real-time dynamic scene rendering. arXiv preprint arXiv:2310.08528, 2023.

Zając, W., Waczyńska, J., Borycki, P., Tabor, J., Zięba, M., and Spurek, P. NeRFlame: FLAME-based conditioning of NeRF for 3D face rendering. arXiv preprint arXiv:2303.06226, 2023.


A. Appendix

The appendix to the paper contains excerpts related to the NeRF Synthetic dataset and a continuation of the numerical results expounded in the main document.

A.1. Number of Gaussians per face

Here, we show the results of the experiments for the NeRF Synthetic dataset with a white background and the original mesh. The dependence of the PSNR, SSIM, and LPIPS results on the number of Gaussians per face is shown in Table 5.

Mostly, as a consequence of increasing the number of Gaussians, we get better results, and we can observe that a single Gaussian per face is insufficient. Materials and Lego have very dense meshes; consequently, positioning an excessive number of Gaussians within a confined space does not yield substantial improvements in the results.

Figure 11. A qualitative assessment delineating the impact of varying Gaussian quantities per face, discerned between the initial mesh and its subdivided counterpart. (Panels: initial mesh with k = 1, PSNR = 29.32; initial mesh with k = 10, PSNR = 31.11; subdivided mesh with k = 5, PSNR = 33.01.)

Figure 12. A juxtaposition of outcomes for a model varying in Gaussian counts per face and with a subdivided mesh. GaMeS with good mesh initialization is able to capture fine details.

However, experiments have shown that choosing a fixed number of Gaussians per mesh face is inefficient when the mesh contains different face sizes; in particular, it is difficult to cover large faces. Therefore, we decided to split the large facets. Splitting large facets, even using fewer Gaussians per face, allows for significantly better results and detail capturing. The effects are shown in Fig. 11 (initial positions) and Fig. 12 (modifications).

A.2. Modification

The reconstruction and modification enabled by the GaMeS model are illustrated in Fig. 13. GaMeS allows the creation of lifelike alterations, such as elevating the excavator blade, as well as imaginative and surrealistic bends.

In fact, modification depends only on user skills. Fig. 14 shows various modifications of the ficus, such as: a) scaling, b) modifying a certain part (like bending branches) or the whole object (a dancing ficus), and c) ignoring a certain part. All images are generated from the view of the same camera. Note that Gaussians represent color from two sides. Therefore, after removing the soil from the pot, we can still see the inner side of the pot.

A.3. Extension of numerical results from main paper

Here, we present extensions of the experiments proposed in the main part.

PSNR ↑
Number of Gaussians   Chair   Drums   Lego    Mic     Materials   Ship    Hotdog   Ficus
1                     29.32   25.18   32.74   32.48   25.37       27.45   32.10    32.05
5                     30.70   25.35   33.21   32.49   25.29       29.29   33.40    32.29
10                    31.11   25.36   OOM     32.29   25.25       29.56   33.65    32.35

SSIM ↑
1                     0.934   0.940   0.972   0.984   0.916       0.879   0.959    0.978
5                     0.950   0.942   0.975   0.984   0.914       0.894   0.968    0.978
10                    0.950   0.942   OOM     0.983   0.913       0.896   0.970    0.979

LPIPS ↓
1                     0.066   0.052   0.027   0.012   0.060       0.133   0.062    0.018
5                     0.050   0.049   0.022   0.012   0.061       0.107   0.042    0.018
10                    0.044   0.049   OOM     0.012   0.063       0.102   0.038    0.019

Table 5. The quantitative comparisons (PSNR / SSIM / LPIPS) on the Synthetic dataset with the original meshes and a white background, as a function of the number of Gaussians per face. OOM: CUDA out of memory. In this experiment we used a Tesla V100-SXM2-32GB GPU.

Figure 13. An example of reconstruction and modification using the GaMeS model. The model enables more realistic modifications, like lifting the excavator blade, as well as surrealistic drum bending.

Figure 14. Demonstrations of potential modifications facilitated by GaMeS include: a) scaling, b) transforming either specific components or entire objects, and c) selectively disregarding specific sections.


PSNR ↑
Outdoor scenes Indoor scenes
bicycle flowers garden stump treehill room counter kitchen bonsai
Static
Plenoxels 21.91 20.10 23.49 20.66 22.25 27.59 23.62 23.42 24.66
INGP-Big 22.17 20.65 25.07 23.47 22.37 29.69 26.69 29.48 30.69
Mip-NeRF360 24.37 21.73 26.98 26.40 22.87 31.63 29.55 32.23 33.46
GS - 7K 23.60 20.52 26.25 25.71 22.09 28.14 26.71 28.55 28.85
GS - 30K 25.25 21.52 27.41 26.55 22.49 30.63 28.70 30.32 31.98
Editable
R-SuGaR-15K 22.91 - 25.29 24.55 - 29.95 27.47 29.38 30.42
GaMeS (Our) 23.46 19.39 26.28 24.59 21.40 28.83 26.44 27.18 27.84
SSIM ↑
Outdoor scenes Indoor scenes
bicycle flowers garden stump treehill room counter kitchen bonsai
Static
Plenoxels 0.496 0.431 0.606 0.523 0.509 0.842 0.759 0.648 0.814
INGP-Big 0.512 0.486 0.701 0.594 0.542 0.871 0.817 0.858 0.906
Mip-NeRF360 0.685 0.583 0.813 0.744 0.632 0.913 0.894 0.920 0.941
GS - 7K 0.675 0.525 0.836 0.728 0.598 0.884 0.873 0.900 0.910
GS - 30K 0.771 0.605 0.868 0.775 0.638 0.914 0.905 0.922 0.938
Editable
R-SuGaR-15K 0.631 - 0.771 0.681 - 0.909 0.890 0.907 0.933
GaMeS(Our) 0.669 0.460 0.831 0.660 0.611 0.892 0.844 0.861 0.887
LPIPS ↓
Outdoor scenes Indoor scenes
bicycle flowers garden stump treehill room counter kitchen bonsai
Static
Plenoxels 0.506 0.521 0.386 0.503 0.540 0.419 0.441 0.447 0.398
INGP-Big 0.446 0.441 0.257 0.421 0.450 0.261 0.306 0.195 0.205
Mip-NeRF360 0.301 0.344 0.170 0.261 0.339 0.211 0.204 0.127 0.176
GS - 7K 0.318 0.417 0.153 0.287 0.404 0.272 0.254 0.161 0.244
GS - 30K 0.205 0.336 0.103 0.210 0.317 0.220 0.204 0.129 0.205
Editable
R-SuGaR-15K 0.349 - 0.218 0.336 - 0.243 0.234 0.166 0.219
GaMeS(Our) 0.326 0.481 0.142 0.347 0.384 0.257 0.294 0.218 0.287

Table 6. The quantitative comparisons (PSNR / SSIM / LPIPS) on the Mip-NeRF360 dataset. R-SuGaR-15K uses 200K vertices.


PSNR ↑
Chair Drums Lego Mic Materials Ship Hotdog Ficus
Static
NeRF 33.00 25.01 32.54 32.91 29.62 28.65 36.18 30.13
VolSDF 30.57 20.43 29.46 30.53 29.13 25.51 35.11 22.91
Ref-NeRF 33.98 25.43 35.10 33.65 27.10 29.24 37.04 28.74
ENVIDR 31.22 22.99 29.55 32.17 29.52 21.57 31.44 26.60
Plenoxels 33.98 25.35 34.10 33.26 29.14 29.62 36.43 31.83
Gaussian Splatting 35.82 26.17 35.69 35.34 30.00 30.87 37.67 34.83
Editable
GaMeS (our) 32.05 25.43 33.54 34.78 27.54 30.79 34.36 33.12
RIP-NeRF 34.84 24.89 33.41 34.19 28.31 30.65 35.96 32.23

SSIM ↑
Static
NeRF 0.967 0.925 0.961 0.980 0.949 0.856 0.974 0.964
VolSDF 0.949 0.893 0.951 0.969 0.954 0.842 0.972 0.929
Ref-NeRF 0.974 0.929 0.975 0.983 0.921 0.864 0.979 0.954
ENVIDR 0.976 0.930 0.961 0.984 0.968 0.855 0.963 0.987
Plenoxels 0.977 0.933 0.975 0.985 0.949 0.890 0.980 0.976
Gaussian Splatting 0.987 0.954 0.983 0.991 0.960 0.907 0.985 0.987
Editable
GaMeS (our) 0.976 0.945 0.973 0.986 0.928 0.895 0.977 0.979
RIP-NeRF 0.980 0.929 0.977 0.962 0.943 0.916 0.963 0.979

LPIPS ↓
NeRF 0.046 0.091 0.050 0.028 0.063 0.206 0.121 0.044
VolSDF 0.056 0.119 0.054 0.191 0.048 0.191 0.043 0.068
Ref-NeRF 0.029 0.073 0.025 0.018 0.078 0.158 0.028 0.056
Plenoxels 0.031 0.067 0.028 0.015 0.057 0.134 0.037 0.026
ENVIDR 0.031 0.080 0.054 0.021 0.045 0.228 0.072 0.010
Gaussian Splatting 0.012 0.037 0.016 0.006 0.034 0.106 0.020 0.012
Editable
GaMeS (our) 0.019 0.046 0.025 0.013 0.061 0.096 0.0258 0.020
RIP-NeRF - - - - - - - -

Table 7. Quantitative comparisons (PSNR / SSIM / LPIPS) on the NeRF-Synthetic dataset, showing that GaMeS gives results comparable to other models.
