\$\begingroup\$

I am writing a 3D engine for Linux and have run into a performance problem. The following features were enabled at the time of testing:

  • Bloom
  • DOF (Depth Of Field)
  • Soft shadows
  • Point light source
  • 1k, 2k textures (ambient, diffuse, specular, normal mapping)

With all of this enabled I get only 48 fps (about 20.8 ms per frame).

Framebuffer size - 1366x768

Shadow map resolution - 1024x1024

Nvidia GeForce 920mx, Intel Core i5-6200u

Here is the order of my actions:

  1. Rendering to the shadow map (cost: 0.5-1 fps)
  2. Basic rendering: drawing shadows, lighting, and more
  3. Horizontal and vertical blur (each pass is performed 2 times): bloom with a fixed kernel and DOF with a kernel generated in the shader (cost: bloom 10-15 fps, DOF 12-16 fps; about 30 fps in total)
  4. Blending

Are these results normal for this video card, or are there ways to optimize the rendering? My blur implementation is below.

Fragment blur shader:

#version 330

#algdef

in vec2 texCoord;

#ifdef ALGINE_BLOOM_MODE_ENABLED
layout (location = 0) out vec4 bloomFragColor;
uniform sampler2D image; // bloom
uniform float bloom_kernel[BLOOM_KERNEL_SIZE];
vec3 bloom_result;
#endif

#ifdef ALGINE_DOF_MODE_ENABLED
layout (location = 1) out vec4 dofFragColor;

uniform sampler2D image2; // dof
uniform float max_sigma = 8.0;
uniform float min_sigma = 0.0001;

float dof_kernel[DOF_KERNEL_SIZE];
vec3 dof_result;

vec4 tmp;
float fdof;

const int DOF_LCR_SIZE = DOF_KERNEL_SIZE * 2 - 1; // left-center-right (lllcrrr)
const int DOF_MEAN = DOF_LCR_SIZE / 2;

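// Builds one half of a symmetric Gaussian kernel for the given sigma.
// dof_kernel[0] is the center weight, dof_kernel[i] the weight i texels away.
void makeDofKernel(float sigma) {
    float sum = 0.0; // accumulates the kernel values for normalization
    for (int x = DOF_MEAN; x < DOF_LCR_SIZE; x++)  {
        dof_kernel[x - DOF_MEAN] = exp(-0.5 * pow((x - DOF_MEAN) / sigma, 2.0));
        // Accumulate the kernel values
        sum += dof_kernel[x - DOF_MEAN];
    }

    // The kernel is symmetric: count both sides, but the center only once
    sum += sum - dof_kernel[0];

    // Normalize the kernel
    for (int x = 0; x < DOF_KERNEL_SIZE; x++) dof_kernel[x] /= sum;
}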
#endif

void main() {
    #ifdef ALGINE_BLOOM_MODE_ENABLED
    vec2 texOffset = 1.0 / textureSize(image, 0); // gets size of single texel
    #else
    vec2 texOffset = 1.0 / textureSize(image2, 0); // gets size of single texel
    #endif

    #ifdef ALGINE_DOF_MODE_ENABLED
    tmp = texture(image2, texCoord);
    fdof = tmp.a;
    makeDofKernel(max_sigma * fdof + min_sigma);
    dof_result = tmp.rgb * dof_kernel[0];
    #endif

    #ifdef ALGINE_BLOOM_MODE_ENABLED
    bloom_result = texture(image, texCoord).rgb * bloom_kernel[0]; // current fragment's contribution
    #endif

    #ifdef ALGINE_BLUS_HORIZONTAL
    #ifdef ALGINE_BLOOM_MODE_ENABLED
    for(int i = 1; i < BLOOM_KERNEL_SIZE; i++) {
        bloom_result +=
            bloom_kernel[i] * (
                texture(image, texCoord + vec2(texOffset.x * i, 0.0)).rgb +
                texture(image, texCoord - vec2(texOffset.x * i, 0.0)).rgb
            );
    }
    #endif

    #ifdef ALGINE_DOF_MODE_ENABLED
    for(int i = 1; i < DOF_KERNEL_SIZE; i++) {
        dof_result +=
            dof_kernel[i] * (
                texture(image2, texCoord + vec2(texOffset.x * i, 0.0)).rgb +
                texture(image2, texCoord - vec2(texOffset.x * i, 0.0)).rgb
            );
    }
    #endif
    #else
    #ifdef ALGINE_BLOOM_MODE_ENABLED
    for(int i = 1; i < BLOOM_KERNEL_SIZE; i++) {
        bloom_result +=
            bloom_kernel[i] * (
                texture(image, texCoord + vec2(0.0, texOffset.y * i)).rgb +
                texture(image, texCoord - vec2(0.0, texOffset.y * i)).rgb
            );
    }
    #endif

    #ifdef ALGINE_DOF_MODE_ENABLED
    for(int i = 1; i < DOF_KERNEL_SIZE; i++) {
        dof_result +=
            dof_kernel[i] * (
                texture(image2, texCoord + vec2(0.0, texOffset.y * i)).rgb +
                texture(image2, texCoord - vec2(0.0, texOffset.y * i)).rgb
            );
    }
    #endif
    #endif

    #ifdef ALGINE_BLOOM_MODE_ENABLED
    bloomFragColor = vec4(bloom_result, 1.0);
    #endif
    #ifdef ALGINE_DOF_MODE_ENABLED
    dofFragColor = vec4(dof_result, fdof);
    #endif
}

And the C++ code:

// configuring textures
for (int i = 0; i < 2; i++) {
    glBindFramebuffer(GL_FRAMEBUFFER, pingpongFBO[i]);

    // bloom
    glBindTexture(GL_TEXTURE_2D, pingpongBuffers[i]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGB16F, ALGINE_SCR_W, ALGINE_SCR_H, 0, GL_RGB, GL_FLOAT, NULL);

    // dof
    glBindTexture(GL_TEXTURE_2D, dofBuffers[i]);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, ALGINE_SCR_W, ALGINE_SCR_H, 0, GL_RGBA, GL_FLOAT, NULL);
    glClearColor(0, 0, 0, 1);
}

horizontal = true; 
firstIteration = true;
for (int i = 0; i < ALGINE_BLUR_AMOUNT; i++) {
    glUseProgram(blusPrograms[horizontal].programId);

    glBindFramebuffer(GL_FRAMEBUFFER, pingpongFBO[horizontal]);

    // bloom
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, firstIteration ? colorBuffers[1] : pingpongBuffers[!horizontal]);

    // dof
    glActiveTexture(GL_TEXTURE1);
    glUniform1i(blusPrograms[horizontal].samplerDof, 1);
    glBindTexture(GL_TEXTURE_2D, firstIteration ? colorBuffers[0] : dofBuffers[!horizontal]);

    // rendering
    renderQuad(blusPrograms[horizontal].inPos, blusPrograms[horizontal].inTexCoord);
    horizontal = !horizontal;
    if (firstIteration) firstIteration = false;
}

Screenshot: scene

I would be grateful for any answers.

\$\endgroup\$
  • 2
    \$\begingroup\$ Stop thinking in frames per second and start thinking in time per frame when optimizing; it makes the numbers far more meaningful. Going from 60 to 30 FPS is much more severe than going from 1030 to 1000 FPS: the first adds about 16.7 milliseconds per frame, the second about 0.03 milliseconds. A massive difference. \$\endgroup\$ Commented Oct 15, 2018 at 11:11
  • \$\begingroup\$ notebookcheck.net/NVIDIA-GeForce-920MX.156034.0.html - it's a weak GPU \$\endgroup\$ Commented Oct 15, 2018 at 11:44
  • \$\begingroup\$ I ran glxgears in a window of the same size as the one used for the engine test and got about 100 fps. With DOF and bloom disabled, my engine gives about 85 fps. So I think @MaximusMinimus is right, and the limiting factor is my video card. \$\endgroup\$
    – congard
    Commented Oct 20, 2018 at 15:43

1 Answer

\$\begingroup\$

Bottlenecks come in many forms.

A quick way to check whether fragment shading is the bottleneck is to render in a small 320x200 window.

If the frame rate stays the same, your bottleneck lies somewhere else. If it increases, your fragment shaders are too expensive.

Next, make your post-processing steps optional. Switch DOF and bloom on and off individually, and see how the frame time changes.
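When comparing runs with an effect on and off, it may help to convert the frame rates to milliseconds first, since frame-rate differences are not additive but frame times are. The following is only a minimal sketch: `frameTimeMs` and `effectCostMs` are hypothetical helper names, not part of the engine in the question.

```cpp
#include <cassert>

// Convert a frame rate (frames per second) into a frame time in milliseconds.
double frameTimeMs(double fps) {
    assert(fps > 0.0); // a frame rate of zero or less has no frame time
    return 1000.0 / fps;
}

// Estimate what an effect costs per frame, in milliseconds, from the frame
// rate measured with the effect enabled and with it disabled.
double effectCostMs(double fpsWithEffect, double fpsWithoutEffect) {
    return frameTimeMs(fpsWithEffect) - frameTimeMs(fpsWithoutEffect);
}
```

With the figures reported by the asker (48 fps with everything enabled, about 85 fps with DOF and bloom disabled), `effectCostMs(48.0, 85.0)` comes out at roughly 9 ms per frame for the two blurs together. This is only an estimate from wall-clock frame rates; a GPU profiler gives per-pass timings directly.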

Also, disable texturing, and see its impact.

And try skipping the shadow map generation, as it could be the limiting factor as well, or cause synchronization issues.

You may have unneeded GPU/CPU synchronization going on if you upload stuff to your GPU every frame. Look into buffer orphaning.

There are some free tools available to profile your GPU. Look into Intel's Graphics Performance Analyzers and the similar RenderDoc tool.

\$\endgroup\$
  • \$\begingroup\$ Thank you for your reply. As far as I can tell, my bottleneck is DOF and bloom. Shadows have almost no effect on the frame rate. Turning off all effects gives approximately twice as many frames. As I understand it, the blur is what I should optimize, and its code is in the post above. Is it possible to optimize it somehow? For example, by avoiding glUniform1i and glActiveTexture? \$\endgroup\$
    – congard
    Commented Oct 15, 2018 at 19:02
  • 1
    \$\begingroup\$ @congard try a lower DOF kernel size. A large kernel means a lot of reads from the framebuffer. \$\endgroup\$
    – Bram
    Commented Oct 15, 2018 at 21:30
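
Besides shrinking the kernel, one commonly used way to reduce the number of reads for a fixed kernel is the linear-sampling trick: two neighbouring texels are fetched with a single bilinear sample placed between them, so the same blur needs roughly half the fetches. This is not what the shader in the question does; it is a sketch under the assumption that the blurred texture uses GL_LINEAR filtering, and `BlurTap`, `mergeTapsForLinearSampling` and `fetchesPerPixel` are hypothetical names. It applies to the fixed bloom kernel only, since the DOF kernel in the question changes per pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One sample position on each side of the center (hypothetical helper type).
struct BlurTap {
    float offset; // distance from the center texel, in texels
    float weight; // weight applied to the sample taken at +offset and -offset
};

// weights[0] is the center weight; weights[i] is the weight of the texel
// i steps away on either side (the same layout as bloom_kernel in the shader).
// Neighbouring taps (1,2), (3,4), ... are merged into one bilinear sample each.
std::vector<BlurTap> mergeTapsForLinearSampling(const std::vector<float>& weights) {
    assert(!weights.empty());
    std::vector<BlurTap> taps;
    taps.push_back({0.0f, weights[0]}); // the center tap is kept as it is

    std::size_t i = 1;
    for (; i + 1 < weights.size(); i += 2) {
        const float a = weights[i];
        const float b = weights[i + 1];
        const float merged = a + b;
        // Weighted average of the two texel positions; bilinear filtering then
        // reproduces a * texel[i] + b * texel[i + 1] from a single fetch.
        const float offset = merged > 0.0f
            ? (static_cast<float>(i) * a + static_cast<float>(i + 1) * b) / merged
            : static_cast<float>(i);
        taps.push_back({offset, merged});
    }
    // An unpaired last tap stays at its integer offset.
    if (i < weights.size()) taps.push_back({static_cast<float>(i), weights[i]});
    return taps;
}

// Texture fetches per pixel for one blur pass: the center plus two per side tap.
std::size_t fetchesPerPixel(std::size_t tapCount) {
    return tapCount == 0 ? 0 : 2 * tapCount - 1;
}
```

For a kernel with 5 one-sided weights this turns 9 fetches per pixel per pass into 5. The shader would then sample at `texCoord ± offset * texOffset` for each merged tap instead of stepping one texel at a time.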
