Grafana MySQL charts can be inconsistent when zooming out
Open, Needs Triage · Public

Assigned To

None

Authored By

	fnegri
	Jul 31 2024, 12:21 PM

Description

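During the investigation for T367778: [wikireplicas] frequent replag spikes in clouddb hosts, I discovered an issue with our Grafana MySQL dashboard. Most charts use irate[5m], which computes the rate from only the last two samples in each 5-minute window. When zooming out, each plotted point then reflects a single scrape interval rather than the whole step, so a "spiky" metric can look very different from its real average.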

An example is the bytes_sent metric for server clouddb1019:13314 in the past 90 days. This is what the dashboard currently shows:

Screenshot 2024-07-31 at 13.48.19.png (626×1 px, 169 KB)

It looks like the values for bytes_sent before 2024-06-12 are very low, but if I change the query to use rate[1h], I get this graph:

Screenshot 2024-07-31 at 13.49.38.png (636×1 px, 216 KB)

Zooming in and using the original query (irate[5m]), you can see the actual shape of the traffic:

Screenshot 2024-07-31 at 13.52.18.png (540×1 px, 155 KB)
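The effect can be reproduced with a small simulation. The sketch below is only an illustration, not the dashboard's actual data: the 60-second scrape interval, the traffic shape (a 10-minute burst every hour), and the helper names are all assumptions, and the irate/rate functions are simplified stand-ins for the PromQL ones (no extrapolation, no counter-reset handling, window treated as inclusive on both ends).

```python
from bisect import bisect_left, bisect_right

SCRAPE_INTERVAL = 60  # seconds; assumed, not taken from the real setup


def build_counter(duration_s, base_rate, burst_rate, burst_start_s, burst_len_s, period_s):
    """Build a monotonically increasing counter with a burst once per period."""
    timestamps, values = [0], [0.0]
    total = 0.0
    for t in range(SCRAPE_INTERVAL, duration_s + 1, SCRAPE_INTERVAL):
        # Decide whether the scrape interval ending at t falls inside a burst.
        offset = (t - SCRAPE_INTERVAL) % period_s
        in_burst = burst_start_s <= offset < burst_start_s + burst_len_s
        total += (burst_rate if in_burst else base_rate) * SCRAPE_INTERVAL
        timestamps.append(t)
        values.append(total)
    return timestamps, values


def samples_in_window(timestamps, values, eval_t, window_s):
    """Return the (timestamp, value) samples in [eval_t - window_s, eval_t]."""
    lo = bisect_left(timestamps, eval_t - window_s)
    hi = bisect_right(timestamps, eval_t)
    return list(zip(timestamps[lo:hi], values[lo:hi]))


def irate(timestamps, values, eval_t, window_s=300):
    """Simplified irate: per-second rate from the last two samples in the window."""
    pts = samples_in_window(timestamps, values, eval_t, window_s)
    if len(pts) < 2:
        return None
    (t0, v0), (t1, v1) = pts[-2], pts[-1]
    return (v1 - v0) / (t1 - t0)


def rate(timestamps, values, eval_t, window_s):
    """Simplified rate: per-second rate from the first and last samples in the window."""
    pts = samples_in_window(timestamps, values, eval_t, window_s)
    if len(pts) < 2:
        return None
    (t0, v0), (t1, v1) = pts[0], pts[-1]
    return (v1 - v0) / (t1 - t0)


# One day of data: 10 B/s baseline, 1000 B/s for 10 minutes starting at minute 20 of each hour.
ts, vals = build_counter(86400, 10, 1000, 1200, 600, 3600)

# "Zoomed out": one point per hour.
hourly = range(3600, 86400 + 1, 3600)
irate_zoomed_out = [irate(ts, vals, t) for t in hourly]
rate_zoomed_out = [rate(ts, vals, t, 3600) for t in hourly]

# "Zoomed in": one point per scrape.
irate_zoomed_in = [irate(ts, vals, t) for t in range(60, 86400 + 1, 60)]

print(max(irate_zoomed_out))  # 10.0: every hourly point lands outside a burst
print(max(rate_zoomed_out))   # 175.0: the true hourly average, bursts included
print(max(irate_zoomed_in))   # 1000.0: the bursts are visible again at full resolution
```

In this made-up scenario the zoomed-out irate series never sees a burst, because each hourly evaluation only looks at the final scrape interval of the hour, while a rate over the full step reports the average and the zoomed-in irate shows the peaks.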

I can see two possible fixes:

1. Replacing irate[5m] with rate[$__rate_interval] seems to work fine in most situations, but it's possible it might hide some spikes that are captured by irate.
2. Copying the approach used by percona/grafana-dashboards where they use max(rate[$interval] or irate[5m]).

We should probably do this for all metrics in the MySQL dashboard that are currently using irate[5m].

Related Objects

Mentioned In: T367778: [wikireplicas] frequent replag spikes in clouddb hosts
Mentioned Here: T371102: Include long-retention Prometheus data from Thanos into Grafana queries
T367778: [wikireplicas] frequent replag spikes in clouddb hosts

Event Timeline

fnegri created this task. Jul 31 2024, 12:21 PM

Restricted Application added a subscriber: Aklapper. Jul 31 2024, 12:21 PM

fnegri mentioned this in T367778: [wikireplicas] frequent replag spikes in clouddb hosts. Jul 31 2024, 1:28 PM

If going with rate[$__rate_interval] is acceptable, I'd recommend that as the simplest solution. See also the (ongoing) discussion at T371102.

Krinkle subscribed. Aug 7 2024, 1:23 PM

lmata edited projects, added SRE Observability (FY2024/2025-Q1); removed observability. Aug 14 2024, 2:14 PM

joanna_borun removed a project: cloud-services-team. Aug 21 2024, 2:11 PM

lmata edited projects, added SRE Observability (FY2024/2025-Q2); removed SRE Observability (FY2024/2025-Q1). Nov 5 2024, 5:10 PM

