
GANIME EXPLAINED

contents
1 Generation Algorithm
1.1 GANime generation
1.2 Upscaling
1.3 Tagging
1.4 Conclusion
2 Contract
2.1 Backend
3 Future
4 Useful links

list of figures
Figure 1 GANime girl random sample
Figure 2 Upscale sample, left is the original, right processed with SwinIR
Figure 3 Tagged examples

abstract
This paper outlines the current technology stack behind the GANime project, including the machine learning part, the software backend, and the NFT smart contract.

Figure 1: GANime girl random sample


Figure 2: Upscale sample, left is the original, right processed with SwinIR

1 generation algorithm
1.1 GANime generation

The main idea behind the algorithm is to map a latent space into image space, f(z) → x. This can be done using GAN models. In short, the idea is to take a set of real images together with images generated from z and train a model that learns to distinguish the real ones from the fake ones.
As a result, we obtain a model that can produce good, realistic images. We will not describe in detail here how GAN models are trained, since many good people have already done that for us here or here. We also want the latent space Z to be controllable. By controllable we mean that if we change our variable z slightly, the output moves slightly too, roughly f(z + a) → x + a.
For this purpose, we decided to use the latest state-of-the-art network, StyleGAN3. It is not much different from StyleGAN2 in terms of image generation, but for videos it avoids situations where the texture loses its temporal consistency. A great description of all the other methods can be found in this video. As a result, the StyleGAN model gives us a function f(z) → x, where z is obtained from a normally distributed random generator, g(y) → z.
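As a rough illustration of sampling and controllability, here is a minimal sketch that reuses the module names from the pseudocode in Section 1.4; the latent dimension N = 512 and the perturbation scale are assumptions, not the project's actual settings.

import numpy as np

from ml.models.stylegan3 import get_stylegan3_model  # module name taken from the Section 1.4 pseudocode

N = 512  # assumed latent dimension

rng = np.random.default_rng(42)
z = rng.standard_normal([1, N])          # g(y) -> z
a = 0.05 * rng.standard_normal([1, N])   # small latent perturbation

model = get_stylegan3_model()
x = model.run(z)                         # f(z) -> x
x_shifted = model.run(z + a)             # f(z + a): an image close to x, only slightly changed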

1.2 Upscaling

After we get a generated image, we run into the problem that its quality is not good enough, mainly in terms of resolution, so we decided to train a separate model for this purpose. In academic research this task is called super-resolution, and there are many approaches to solving it. We chose a single-image algorithm called SwinIR.
The idea of this work is to take an image, downscale it, show the network only the small sample, and ask it to produce an upscaled version that is as close as possible to the original. This can be done using Transformer networks, in particular Vision Transformers.
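For intuition, here is a minimal sketch of that training idea in PyTorch; the L1 loss, bicubic downscaling and 4x scale factor are illustrative assumptions, not the exact SwinIR recipe.

import torch.nn.functional as F

def training_step(model, x_hr, scale=4):
    # x_hr: a batch of original high-resolution images, shape [B, 3, H, W]
    x_lr = F.interpolate(x_hr, scale_factor=1 / scale, mode="bicubic", antialias=True)
    x_sr = model(x_lr)            # the network's upscaled prediction
    loss = F.l1_loss(x_sr, x_hr)  # penalize differences from the original
    return loss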

Figure 3: Tagged examples

1.3 Tagging

We want not only high-quality generation but also trait detection. Traits are very useful because they are the simplest way to estimate image rarity.
The 700k images used for generator training were reused for this task. All of them were tagged by the community, but some tags are not very accurate or overlap, like "1girl", "single girl", "single", and "one girl". So we selected the 1024 most common tags and later reduced that number to 512. As a backbone, we took a ResNet34 trained on the ImageNet dataset and later fine-tuned it on our dataset. Fine-tuned means that the original prediction classes were replaced with ours. We generated 21.5k images in total to estimate the tag distribution of our GAN, then manually selected the 32 tags with the best statistical properties. As a result, we have a model that classifies 32 tags on generated images. Classification is done on small 192x192 images to reduce computational complexity.
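A minimal sketch of this fine-tuning setup, assuming the standard torchvision ResNet34 and treating tagging as multi-label classification; the sigmoid threshold is an illustrative assumption.

import torch
import torch.nn as nn
from torchvision import models

NUM_TAGS = 512  # the reduced tag vocabulary described above

# ImageNet-pretrained backbone with its classification head replaced by our tag classes
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_TAGS)

def predict_tags(model, x_small, threshold=0.5):
    # x_small: a batch of 192x192 images, shape [B, 3, 192, 192]
    model.eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(x_small))  # one probability per tag
    return probs > threshold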

1.4 Conclusion

Overall, to generate a single sample we have to run our generator, then upscale the result and also downsize it to classify tags. In pseudocode, it can be described as

from ml.models.stylegan3 import get_stylegan3_model
from ml.models.upscaler_transformer import get_upscaler_model
from ml.models.autotagger import get_tagger_model, select_best32
from tools.cv import resize
from core.providers import RandomProvider

# x       - original generated image
# x_small - downsized original generated image
# x_big   - upscaled original generated image
# z       - random vector

# g(y) -> z
rnd = RandomProvider().provide_state_with_random_seed()
z = rnd.standard_normal([1, N])

# f(z) -> x, x is now an image
x = get_stylegan3_model().run(z)

# downsize for the autotagger
x_small = resize(x, (192, 192))
# detect tags
tags = get_tagger_model().run(x_small)
# select the best ones
tags_best = select_best32(tags)

# upscale
x_big = get_upscaler_model().run(x)

2 contract
There are two common ways to store content on the blockchain:

• Pre-generate the images and provide URLs to IPFS or another distributed storage while minting. In our opinion, this is the approach most commonly used in the community. It is attractive because it is easy to code and cheap in gas, but from a technical point of view there is an issue: the content is not generated while you are minting. It would be great if it were, because it would mean the blockchain itself gave birth to the content.
• Store the SVG/JPG images on-chain as a collection of traits and merge the parts when rendering starts. This solves the previous issue, and the content will stay stored for as long as the ETH platform exists. But a new issue appears here: the gas price. Because you have to store a lot of information on the blockchain, hundreds of KB or even MB, the deploy price is extremely high.

We found a way to combine the best of both worlds. The main idea is to generate a random, unique seed using a function like

keccak256(abi.encodePacked(block.difficulty, block.timestamp, msg.sender, nonce));

The seed is computed at minting time. After that, the URL with the content is stored in the contract too. The content is not revealed instantly but with a delay of up to a dozen minutes, depending on server load. The URL is provided by our backend, which runs neural-network inference using your random seed. The generation procedure is compute-intensive, so we generate the media for you only once. We will also provide the source code with instructions on how to run the model, so you can restore the media from your unique seed yourself. The model with all sources will be stored on IPFS, and code for downloading it will be provided too.
This method allows us to generate the NFT itself on-chain and store the assets needed for reproduction with a third-party provider. So your NFT will always be on-chain and born on-chain too. All sources and instructions will be stored using the IPFS provider.

2.1 Backend

After you mint, the blockchain emits an event with a unique ID. We subscribe to these events and retrieve that ID. After that, our AI uses this number to generate a picture. Then our server places it on IPFS and pins it, receiving a link to the image. Pinning is an action that tells IPFS that this file should not be deleted at the next "garbage collection", i.e. the file is now guaranteed not to be removed from the system. To pin files, we use the popular service pinata.cloud.
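A minimal sketch of the pinning step, assuming pinata.cloud's documented pinFileToIPFS HTTP endpoint and a JWT token; the gateway URL format and error handling are simplified, and the event-subscription and generation code is out of scope here.

import requests

PINATA_JWT = "..."  # assumed: a pinata.cloud API token

def pin_to_ipfs(image_path: str) -> str:
    # Upload a generated image to IPFS via pinata and return a gateway link to it.
    with open(image_path, "rb") as f:
        resp = requests.post(
            "https://api.pinata.cloud/pinning/pinFileToIPFS",
            headers={"Authorization": f"Bearer {PINATA_JWT}"},
            files={"file": f},
        )
    resp.raise_for_status()
    cid = resp.json()["IpfsHash"]  # content identifier of the pinned file
    return f"https://gateway.pinata.cloud/ipfs/{cid}"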

3 future
We are researching the idea of combining existing NFT tokens, i.e. their latent vectors. The plan is to identify which parts of the latent space are responsible for a concrete trait and, when combining, simply average those sectors or apply a crossover. Basically, this would let you combine two GANime girls: for example, one with glasses and one with blue hair, with a possible result of blue hair with glasses.
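A minimal sketch of that combination idea; which latent sectors correspond to which traits is still an open question, so the trait mask below is a hypothetical placeholder.

import numpy as np

def combine_latents(z_a, z_b, trait_mask=None):
    # z_a, z_b: latent vectors of two GANime girls, shape [1, N]
    if trait_mask is None:
        return (z_a + z_b) / 2            # plain averaging of the two latents
    # crossover: take the masked sectors from z_b and the rest from z_a
    return np.where(trait_mask, z_b, z_a)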
The next thing we are considering is style transfer, i.e. taking an existing image, mutating it, and getting a cyberpunk or otherwise styled image. This mechanic could be represented as a kind of potion in the form of an NFT token.
We plan to give the girls animated facial expressions and let you control them in your browser. Our long-term plans include making them talk with you. In terms of facial animation we have achieved acceptable results, but there is still a lot of work to do.
We will not lie to you: bringing these plans to life will take 1-2 years of full-time research and programming. All of these plans require a lot of computational resources for scientific experiments and model validation.

4 useful links
• Discord

• Twitter

• Beauty contest

• 10K GANime grid

• 12.1K GANime grid

• Talking head demo

• Style transfer 1

• Style transfer 2

• Video 1
