
Varnish 6.4 crashes very regularly once about 5K items are in the cache.

The problem is that I don't see any MAIN.n_lru_nuked entry in varnishstat.

Does that mean that no eviction is taking place?

We have configured malloc storage with 5g. Varnish is running in a Docker container with 10g of memory allocated to it.

varnishd -F -f /etc/varnish/default.vcl -a http=:80,HTTP -a proxy=:8443,PROXY -s malloc,5g

Here is the VCL:

vcl 4.0;

import directors;

backend back1 {
  .host = "xxx.xx.xx.xx";
  .port = "80";
  .connect_timeout = 600s;
  .first_byte_timeout = 600s;
  .between_bytes_timeout = 600s;
}

acl purge {
  "localhost";

  #back1 1
  "xxx.xx.xx.xx";
}


sub vcl_init {
  new loadbalancer = directors.round_robin();
  loadbalancer.add_backend(back1);
}


sub vcl_backend_response {

  set beresp.grace = 30s;

  if (bereq.url ~ "assets") {
      unset beresp.http.set-cookie;
      set beresp.http.cache-control = "public, max-age=120";
      set beresp.ttl = 2h;
      return (deliver);
  }

  # Default: any other content is cached for 2 hours in Varnish and 120s in the browser, except for the admin area backend
  if ( !(bereq.url ~ "adminarea") )
  {
      unset beresp.http.set-cookie;
      set beresp.http.cache-control = "public, max-age=120";
      set beresp.ttl = 2h;
      return (deliver);
  }

}

sub vcl_deliver {

    # Dynamically set the Expires header on every request from the web.
    set resp.http.Expires = "" + (now + 120s);

    # Delete unwanted headers from the response.
    unset resp.http.via;
    unset resp.http.x-powered-by;
    # unset resp.http.server;
    # unset resp.http.x-varnish;
}


sub vcl_recv {


  if (req.method == "BAN") {
    if (!client.ip ~ purge) {
            return(synth(403, "Not allowed."));
    }

    ban("obj.http.Pid == " + req.http.Varnish-Ban-Pid ) ;

    # Throw a synthetic page so the
    # request won't go to the backend.
    return (synth(200, "Banned  pid "+ req.http.Varnish-Ban-Pid)) ;
  }


  # Enable caching only for GET/HEAD methods
  if (req.method != "GET" && req.method != "HEAD"  ) {
    set req.http.X-Varnish-Pass="y";
    return (pass);
  }

  # Do not cache multimedia
  if (req.url ~ "\.(mp3|mp4|flv)$") {
    return (pass);
  }

  # Do not check in the cache for TYPO3 backend and AJAX requests
  if (req.url ~ "^/adminarea/") {
    set req.http.X-Varnish-Pass="y";
    return (pass);
  }

  if (req.http.Accept-Language) {
    if (req.http.Accept-Language ~ "^fr") {
            set req.http.Accept-Language = "fr";
    } elsif (req.http.Accept-Language ~ "^es") {
            set req.http.Accept-Language = "es";
    } elsif (req.http.Accept-Language ~ "^en") {
            set req.http.Accept-Language = "en";
    } else {
      set req.http.Accept-Language = "fr";
    }
  }

  # Normalize Accept-Encoding: keep only gzip if the client accepts it, otherwise drop the header
  if (req.http.Accept-Encoding) {
    if (req.http.Accept-Encoding ~ "gzip") {
      set req.http.Accept-Encoding = "gzip";
    } else {
      unset req.http.Accept-Encoding;
    }
  }

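  # Update the X-Forwarded-For header by adding the client IP address to it.
  # Note: since Varnish 4.0, varnishd already appends client.ip to
  # X-Forwarded-For before vcl_recv runs, so this block adds it a second time.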
  if (req.http.X-Forwarded-For) {
      set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
  } else {
      set req.http.X-Forwarded-For = client.ip;
  }

  # Tell Varnish to cache anything stored in /fileadmin /assets /Resources
  # (ignoring web server cache control header directives)
  if (req.url ~ "assets") {
    return (hash);
  }

  # Tell Varnish to always cache the calendar
  if (req.url ~ "calendar") {
    return (hash);
  }

  if ( !(req.url ~ "adminarea") )
  {
    return (hash);
  }

  set req.http.X-Varnish-Pass="y";
  return (pass);
}
  • I'm going to need a lot more information & context before I'm able to answer your question. What does your VCL look like? What runtime parameters does your varnishd process have? Is your transient storage increasing? Commented Nov 19, 2020 at 12:09
  • @ThijsFeryn thank you. I have added more info in the original post
    – fran6
    Commented Nov 19, 2020 at 13:48
  • SMA.Transient.c_req 842
    SMA.Transient.c_bytes 17.37G
    SMA.Transient.c_freed 17.37G
    SMA.Transient.g_alloc 0
    SMA.Transient.g_bytes 0
    – fran6
    Commented Nov 19, 2020 at 15:47

1 Answer


DISCLAIMER: This is just a working theory; I cannot prove it.

Theory: transient storage makes the container run out of memory

I notice that over time 17.37G has been allocated to the Transient storage. Your stats show that the same amount has also been freed.

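Transient storage consumes memory that is not counted against the -s malloc,5g limit. When no Transient storage is defined explicitly, Varnish uses an unbounded malloc storage for it.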

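You say that your container has 10G allocated to it. With up to 5G used by the malloc storage, the container could run out of memory once transient storage grows to around 5G, or even earlier, since the varnishd process also needs memory of its own on top of both storages.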

What goes into transient?

As the name indicates, transient is temporary storage. This type of storage is used for:

  • Short-lived objects (objects whose TTL plus grace and keep is lower than the shortlived runtime parameter, which defaults to 10 seconds)
  • Non-cacheable objects that are in-flight
  • Request bodies

Transient is meant for items that won't stay in memory for long.

Even non-cacheable objects are temporarily put in transient, because you don't want fast backends to be blocked by slow clients. The backend response is streamed into transient storage, so the backend side can finish and move on to other tasks, while the client picks up the response at its own pace.
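To make the list above concrete, here is a minimal VCL sketch of two backend responses that would end up in Transient rather than in the -s malloc,5g storage. The `/ticker` URL and the backend address are hypothetical placeholders, and the storage decision described in the comments reflects my reading of how the shortlived parameter is applied (ttl + grace + keep), not something verified on your setup.

```vcl
vcl 4.0;

# Placeholder backend; replace with your own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Hypothetical URL. ttl + grace + keep = 5s, which is below the
    # default shortlived value of 10s, so this object should be stored
    # in Transient instead of the main malloc storage.
    if (bereq.url ~ "^/ticker") {
        set beresp.ttl = 5s;
        set beresp.grace = 0s;
        set beresp.keep = 0s;
        return (deliver);
    }

    # Responses marked uncacheable (hit-for-miss) are fetched into
    # Transient and released once they have been delivered. Requests
    # passed in vcl_recv, like the mp4 files in the question, are
    # handled the same way.
    if (beresp.http.Set-Cookie) {
        set beresp.uncacheable = true;
        set beresp.ttl = 120s;
        return (deliver);
    }
}
```

One consequence worth checking, assuming the ttl + grace + keep rule above: the question's VCL sets beresp.grace = 30s on every backend response, so none of those cached objects should qualify as short-lived. In that configuration, transient usage would come mainly from passed responses such as the mp4 files.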

What happened in your case?

Does your Varnish container serve large files, such as video or audio? Even if they are not cached, they still need to be kept in transient while they are being delivered.

Again, this is just a theory that I can't prove. But if you can reproduce the problem, please check the transient counters in varnishstat.

If you see the SMA.Transient.g_bytes increasing, you know that transient is the reason for the crash.
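If the mp4 traffic does turn out to be what fills transient, one option (beyond what this answer proposes, so treat it as a hedged sketch) is to pipe those requests instead of passing them. With return (pipe), Varnish copies bytes between the client and the backend without creating an object, so the response body is not written to Transient. The sketch assumes media files are recognized by extension, like the question's \.(mp3|mp4|flv)$ rule, extended here to tolerate a query string; the backend address is a placeholder.

```vcl
vcl 4.0;

# Placeholder backend; replace with your own.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Assumption: media files are identified by their extension,
    # optionally followed by a query string.
    if (req.url ~ "\.(mp3|mp4|flv)(\?.*)?$") {
        # Hand the connection over to the backend: bytes are copied
        # as-is and no object is created, so nothing lands in Transient.
        return (pipe);
    }
}
```

The trade-off: a piped request keeps a worker thread busy for the whole transfer, and the response never goes through vcl_deliver, so the Expires header set there is not added to these files.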

  • Thank you for this answer. In fact we have videos (mp4) on the site that are not cached but are served via Varnish. SMA.Transient.g_bytes is 0 most of the time. Sometimes it remains steady at ~700M. Either way, after a little while the Varnish child process restarts.
    – fran6
    Commented Nov 20, 2020 at 9:23
  • Are you suggesting limiting the space allocated to the transient storage by adding something like "-s Transient=malloc,128m"? What would be the impact on video streaming?
    – fran6
    Commented Nov 20, 2020 at 13:08
  • It is possible to limit the transient size, as you correctly indicated in your comment. However, you will also get errors when that limit is reached. Varnish will not crash, but you will get FetchError messages in your VSL logs stating "Could not get storage". The end result still isn't positive, but at least your Varnish doesn't crash. Commented Nov 23, 2020 at 7:28
  • Ok, thank you. What about the fact that I can't see any MAIN.n_lru_nuked? Is there anything else I could check to see if this process is running as expected?
    – fran6
    Commented Nov 23, 2020 at 9:00
  • The n_lru_limited counter indicates when the nuke limit was reached. The default value is 50 and can be set via the nuke_limit runtime parameter. Commented Nov 23, 2020 at 10:15
