Tableau Performance Checklist


Question: How many workbooks are concerned?
Answer:
Comment: Quantity and percent of total.

Question: When do the performance issues happen?
Answer:
Comment: All the time? Early in the morning? When moving the "x" filter?

Question: Who is concerned?
Answer:
Comment: Only Jeremy from Sales? All the marketing team?

Question: What have you tried previously?
Answer:
Machine: Desktop / Type: one-shot performance recording
How:
To start recording performance: Help > Settings and Performance > Start Performance Recording
To stop recording and view a temporary workbook containing results from the recording session: Help > Settings and Performance > Stop Performance Recording
You can now view the performance workbook and begin your analysis.
Pro: very easy to use, with a good detailed "one-shot" report.
Cons/limit: be careful, you won't see the details if the data source is published on a server (in this case, download the data source and open it on your desktop).
References: https://help.tableau.com/current/

Machine: Desktop / Type: workbook optimizer
How: at the end of the "Publish Workflow" window.
Pro: a good starting point to get insights.
Cons/limit: guidelines are necessarily general and may not apply in every situation. These suggestions are a starting point only; always frame your decisions in the context of your environment and the goals of your workbook.

Machine: Server / Type: one-shot performance recording
How:

Enable Performance Recording for a Site
By default, performance recording is not enabled for a site. A server administrator can enable performance recording site by site.
Navigate to the site for which you want to enable performance recording.
Click Settings.
Under Workbook Performance Metrics, select Record workbook performance metrics.
Click Save.

Start a Performance Recording for a View
Open the view for which you want to record performance.
When you open a view, Tableau Server appends ":iid=<n>" to the URL. This is a session ID. For example:
http://10.32.139.22/#/views/Coffee_Sales2013/USSalesMarginsByAreaCode?:iid=1
Type :record_performance=yes& at the end of the view URL, immediately before the session ID. For example:
http://10.32.139.22/#/views/Coffee_Sales2013/USSalesMarginsByAreaCode?:record_performance=yes&:iid=1
Click the Refresh button in the toolbar.
Load the view.

View a Performance Recording
Click Performance to open a performance workbook. This is an up-to-the-minute snapshot of performance data. You can keep taking additional snapshots as you continue working with the view; the performance data is cumulative.
Move to a different page or remove :record_performance=yes from the URL to stop recording.

Machine: Server / Type: workbook optimizer
How: at the end of the "Publish Workflow" window or at the end of the Publish menu.

Machine: Server / Type: reporting
How: the http_requests table (and the associated view) from the workgroup database in the Tableau Server Repository can help. It is used in the Tableau Server Insights data source available here: https://github.com/tableau/community-tableau-server-insights (see the example query below).
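
As a sketch of such reporting, the query below (PostgreSQL, run with the Repository's readonly user) ranks the slowest view loads; the http_requests column names used here (created_at, completed_at, action, currentsheet) are typical but should be verified against your Tableau Server version:

```sql
-- Sketch: rank the slowest requests of the last 7 days from the
-- Tableau Server Repository (workgroup database).
-- Verify column names against your Tableau Server version.
SELECT currentsheet,
       action,
       COUNT(*) AS nb_requests,
       AVG(EXTRACT(EPOCH FROM (completed_at - created_at))) AS avg_seconds,
       MAX(EXTRACT(EPOCH FROM (completed_at - created_at))) AS max_seconds
FROM http_requests
WHERE created_at > NOW() - INTERVAL '7 days'
  AND completed_at IS NOT NULL
GROUP BY currentsheet, action
ORDER BY avg_seconds DESC
LIMIT 50;
```
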
Machine and Environment

Item: Machine: Server / Desktop / Cloud

Item: Tableau version
Comment: Consider upgrading Tableau to the latest release, especially if you're more than one year behind. Optimizations are frequently made by software editors (like cache management, improved queries...).

Item: Respect of requirements
Comment: Please check the Tableau requirements:
https://www.tableau.com/products/techspecs
https://help.tableau.com/current/server/en-us/server_hardware_min.htm

Item: CPU: number of cores, frequency (GHz)

Item: RAM: GB, DDR type

Item: Hard drive: SSD? PCIe generation
Comment: Not all SSDs are equal; PCIe 6 bandwidth is 8x higher than PCIe 3.

Item: Quantity of machines, if cluster

Item: Network between machine and data: idea of speed, usage of VPN
Comment: The network can affect the performance of your dashboard very badly in several ways, from firewall to VPN bottleneck, both in upload (the query you send) and in download (the query result you receive). If you use a VPN, try to reproduce without it. If you're on a desktop, try to reproduce on the server, etc. Some tools are designed especially for this, but even a ping can help.

Item: Web browser up to date
Comment: For rendering performance.


Data Model

Disclaimer: this part is probably the hardest because your data model is above all linked to a functional need. In this document, we give general advice that must be followed with precaution.

Item: Extract / direct (live)
Comment: Depending on your data source, extracts in Hyper format (avoid the legacy .tde) can improve your performance by several orders of magnitude. However, other considerations should be taken into account, such as data duplication (and the resources needed), the management of refresh tasks, etc. Rule of thumb: on a "non-analytic / non-MPP / transactional" database or a file, extracts will be great. If you have a dedicated analytic database, try to use it, for architecture, governance and monitoring reasons.

Item: Relationship / join / union
Comment: Union: try to precompute it as much as you can (before Tableau). Relationships are computed only when needed, contrary to joins (which are always computed), so they should provide better performance (however, joins can be needed for functional reasons). Be careful about the field types you use for joins (roughly: boolean is faster than integer, which is faster than string, etc.) and try to use only one field as the join key, e.g. a precomputed single-field key, as in the sketch below.
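
A minimal sketch of precomputing a single-field join key before Tableau; table and column names are hypothetical, and the arithmetic assumes country_id stays below 1000:

```sql
-- Sketch: replace a two-field join (store_id, country_id) with one
-- precomputed integer key. Hypothetical names.
ALTER TABLE sales     ADD COLUMN store_key BIGINT;
ALTER TABLE store_dim ADD COLUMN store_key BIGINT;

UPDATE sales     SET store_key = store_id * 1000 + country_id;  -- country_id < 1000
UPDATE store_dim SET store_key = store_id * 1000 + country_id;

-- In Tableau, join the two tables on the single integer store_key.
```
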

Item: Use the relationship performance options (cardinality and referential integrity)
Comment: If you know the shape of your data, using these options can allow Tableau to optimize the query (for example by removing useless joins). Do not use them if you're not sure: they can lead to wrong results.

Item: Snowflake / star / unique big table
Comment: General idea: the more joins you have in your model, the more CPU time you spend computing the joins; the fewer joins you have, the more memory you use (which can also mean more calculation time, depending on your calculations and the database). Rule of thumb: the snowflake schema is inefficient. The star schema vs. big fact table question cannot be answered without knowing the exact database and calculations, and even then, tests can be required. One idea: on a column-store database without "row calculation" on dimensions (like a calculation on the product to make a group), your model will work fine with a unique fact table.

Item: Blending
Comment: Mixing data from several sources is usually to be avoided, since Tableau has to redo the mix every time you act on the viz, and therefore has to "download" the blending keys each time.

Item: Number of connections
Comment: For the same reasons as blending: try to reduce the number of connections as much as you can, especially with live queries.

Item: Custom SQL
Comment: Try to avoid it as much as you can: Tableau will wrap it in sub/nested queries. Alternative: views on the database (see the sketch below). If you can't avoid it, write an efficient query without useless clauses (often: ORDER BY)...
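
A sketch of the view alternative, with hypothetical names; the point is that Tableau queries the view like a plain table instead of wrapping your Custom SQL in a subquery:

```sql
-- Sketch: a database view instead of Custom SQL in Tableau.
-- No ORDER BY: Tableau aggregates and sorts as needed.
CREATE VIEW v_sales_clean AS
SELECT sale_id,
       sale_date,
       store_key,
       unit_price * quantity AS amount      -- precomputed row-level calc
FROM sales
WHERE sale_date >= DATE '2020-01-01';
```
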
Item: Row-Level Security
Comment: If you use an entitlement table to link your data and your users/groups of users, don't forget that this table is... data, and the same rules seen above apply. In particular, join on numbers and keep your permission table in the same database/technology as the data, especially with live queries (e.g. on a live query, don't use an Excel permission table with a Hive table; load the permission table into Hive instead). See the sketch below.
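
A sketch of the entitlement-table pattern with hypothetical names; in Tableau the filter on the user is typically expressed with USERNAME() in a data source filter rather than a literal:

```sql
-- Sketch: row-level security via an entitlement table stored in the SAME
-- database as the fact table, joined on an integer key.
SELECT s.*
FROM sales s
JOIN entitlements e
  ON e.region_id = s.region_id        -- numeric join key
WHERE e.username = 'jeremy.sales';    -- in Tableau: a filter on USERNAME()
```
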
References

https://help.tableau.com/current/pro/desktop/en-us/datasource_relationships_perfoptions.htm
Data and Calculations

Item: Field types are OK
Comment: Use date instead of timestamp if you don't need hours, minutes, seconds... Cast things properly: a year is a number, not a string; use boolean if you have only two values, etc.

Item: Dimension (row-level) calculations
Comment: Try to move calculations upstream of the Tableau viz as much as you can (into the database, into the file... with the help of ETL/ELT/data prep tools). E.g. left(date,4) => create a Year field in the data; Unit Price * Quantity => create an Amount field (see the sketch below).
Please note that you can retrieve existing calculations:
- by typing c: in the search bar of your field panel (available everywhere you edit a viz)
- by downloading your workbook and then using this Python script: https://github.com/scinana/tableauCalculationExport/
- by extracting them with Alteryx, cf. https://www.theinformationlab.co.uk/2016/06/07/extract-calculated-fields-tableau-alteryx/
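
A sketch of moving those two row-level calculations into the database; names are hypothetical:

```sql
-- Sketch: precompute row-level calculations once in the data instead of
-- in Tableau calculated fields.
ALTER TABLE sales ADD COLUMN sale_year INTEGER;
ALTER TABLE sales ADD COLUMN amount    NUMERIC(12, 2);

UPDATE sales
SET sale_year = EXTRACT(YEAR FROM sale_date),  -- replaces left(date,4)
    amount    = unit_price * quantity;         -- replaces Unit Price*Quantity
```
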

Item: Calculations are simple
Comment: IFs are avoided as much as you can (try to use boolean logic, CASE WHEN syntax, etc.) and, when not avoided, the condition must be fast to compute. LODs are among the worst calculations in terms of performance: try to avoid them as much as you can and, when you have to use one, use only the fields required and only for the rows needed. Alternative: a table with several precomputed levels of granularity (see the sketch below).
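
A sketch of the precomputed-granularity alternative, assuming a hypothetical {FIXED [customer_id] : SUM([amount])} LOD:

```sql
-- Sketch: materialize the per-customer total once in the database
-- instead of a FIXED LOD recomputed at query time.
CREATE TABLE customer_totals AS
SELECT customer_id,
       SUM(amount) AS customer_amount
FROM sales
GROUP BY customer_id;
-- Relate customer_totals to the fact table in Tableau on customer_id.
```
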
Item: Groups are avoided
Comment: Even if they seem practical, Tableau "groups" often lead to poor performance, worse than a dedicated calculated field (you can do the same with a "CASE myfield WHEN value THEN my_group" syntax), or even better, a dedicated field in the data (see the sketch below).
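
A sketch of the dedicated-field option, with hypothetical values:

```sql
-- Sketch: precompute the "group" as a real column in the data instead of
-- a Tableau group (or a calculated field evaluated at query time).
ALTER TABLE sales ADD COLUMN product_group VARCHAR(20);

UPDATE sales
SET product_group = CASE product
                        WHEN 'Espresso'  THEN 'Coffee'
                        WHEN 'Latte'     THEN 'Coffee'
                        WHEN 'Green Tea' THEN 'Tea'
                        ELSE 'Other'
                    END;
```
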
Item: Only use the data you need
Comment: In terms of rows and columns, limit the data to the strict minimum: select columns, filter rows... and aggregate at the right level! These optimizations are mandatory when you're on a live query, but they also have an impact when you're on an extract.
Database: General

Item: Database version
Comment: A lot of performance improvements are made by database developers. Try to have the most recent version, or at least not more than one year behind.

Item: ODBC/JDBC/API... driver version
Comment: Usually, the newer the faster.

Item: Database choice
Comment: There is only a small probability that you can choose your database. However, MPP and column-store databases are known to be very efficient for dataviz-like queries (huge amount of rows, high aggregation): Vertica, MonetDB, ClickHouse...

Item: Database server requirements/resources

Item: Field types are OK
Comment: Database vision: same as the data and calculation vision, but with a focus on length as well (e.g. varchar(xxx), int/bigint...).

Item: Statistics
Comment: Usually there are several kinds of statistics: table, column, even partition. Statistics can be used to store some metrics and will also help the query execution plan. Ideally, update statistics every time the table is updated (of course, when you use a dedicated table, not when it's a transactional table). See the sketch below.
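
A sketch in PostgreSQL syntax; other engines use variants such as UPDATE STATISTICS or ANALYZE TABLE ... COMPUTE STATISTICS:

```sql
-- Sketch: refresh optimizer statistics right after the reporting table
-- is (re)loaded, e.g. as the last step of the load job.
ANALYZE sales;
```
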

Item: Partitions
Comment: Must be related to dimension fields that are actually used, not to technical fields (it's not an "ETL" table!). E.g. if you always filter by date, a partition by date can be a good idea (see the sketch below).
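
A sketch of date partitioning in PostgreSQL declarative-partitioning syntax; names are hypothetical:

```sql
-- Sketch: partition the fact table by the date users always filter on,
-- so dashboard queries prune to the matching partition.
CREATE TABLE sales (
    sale_id   BIGINT,
    sale_date DATE NOT NULL,
    amount    NUMERIC(12, 2)
) PARTITION BY RANGE (sale_date);

CREATE TABLE sales_2023 PARTITION OF sales
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
```
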

Item: Primary key / foreign key
Comment: PK/FK constraints will help optimize the query plan, especially join operations (see the sketch below). Caution: this is an old performance trick; not sure it's still up to date.
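
A sketch of the constraints, with hypothetical names:

```sql
-- Sketch: declare PK/FK so the optimizer can simplify or eliminate joins.
ALTER TABLE store_dim ADD CONSTRAINT pk_store PRIMARY KEY (store_key);
ALTER TABLE sales     ADD CONSTRAINT fk_sales_store
    FOREIGN KEY (store_key) REFERENCES store_dim (store_key);
```
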

Item: Database user can create temporary tables

Database specific

Item: Index (most databases)
Comment: Useful on the dimension fields you use the most.

Item: Vertica projections
Comment: On Vertica, projections are the real way the data is stored; you can have several per table. It's a way to precalculate.

Item: Microsoft SQL Server columnstore index
Comment: Faster for dataviz-style queries than row-store indexes.

Item: Hadoop data format (ORC/Parquet/Avro/textfile...)
Comment: Rule of thumb: ORC works well with Hive, Parquet works well with Impala; Avro and textfile should be avoided.

Item: Use of Hive/Impala/...
Comment: Rule of thumb: Impala is faster when there are a lot of aggregations (so for live queries).

Item: External/internal?
Comment: Not sure whether this has a performance impact.

Item: Bucket

Item: Size of container
Comment: The default container can be too small.

Item: Several queues
Comment: The idea is to isolate the dataviz production queries in a dedicated queue to keep resource pooling under control.
Dashboard Design

Item: Filters: displayed/quick filters
Comment: By default, a quick filter will query all the values in the database. You can use the Tableau Order of Operations to reduce that, by choosing "all values in context" or "only relevant values" (please note that this may cause two queries instead of one... so test it!). Try to limit the number of quick filters, especially with high cardinality. Alternative: action filters. See the illustration below.
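
An illustration of the difference; the exact SQL Tableau generates varies, and the table and field names here are hypothetical:

```sql
-- A displayed quick filter on [City] queries the whole domain, roughly:
SELECT DISTINCT city FROM sales;

-- With "only relevant values" or a context filter, the domain query is
-- constrained, roughly:
SELECT DISTINCT city FROM sales WHERE region = 'East';
```
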

Item: Context filter
Comment: Use context filters when filters reduce the data by at least 1/10 (rule of thumb), e.g. you always filter by date, and the date is even a partition key. Be careful: it can have an impact on LODs (especially FIXED), top N, etc. (cf. Tableau Order of Operations).

Item: Limit the number of worksheets
Comment: Instead of making two tables, can you merge them? Instead of making two separate pie charts on two sheets, can you make one pie chart with a column field so you only need one sheet? Can you split your dashboard into several dashboards? The idea is to reduce the number of queries on one dashboard.

Item: Avoid too many marks
Comment: E.g. 100,000 rows, scatterplot with 50,000 points.


Item: Only useful objects, fields and rows
Comment: Remove everything that does not make sense for your analysis. Keep it clean and simple.
Item: Tooltips
Comment: Tooltips may interfere very badly with UX, especially viz-in-tooltips that require heavy calculations (however, they don't impact the opening of the dashboard).
Item: Fixed size of dashboard instead of automatic
Comment: This doesn't affect the queries but the computation of your dashboard in the browser (basically JavaScript and all the responsiveness calculations). A fixed size is easier to compute, and you can define several sizes according to the device.
Item: Reduce image sizes (shapes, pictures...)
Comment: Use a transparent background when possible; you can also adjust the size/resolution of the picture. Note you can also use UTF-8 icons instead of pictures.
Item: Use View Acceleration
Comment: Tableau Server/Cloud only. Allows views to be precomputed (precomputed results are stored on the hard drive) and must be set by a Creator or an Explorer. (Allowed by default on a site, but it can be turned off, so check your site parameters.)

Design phase

Item: Sample
Comment: You can design your viz with a subset of data in order to iterate faster without being slowed down by performance. However, do not forget to stress test your work with the real data volumes.

Item: Pause automatic refresh
References

https://help.tableau.com/current/server/en-us/data_acceleration.htm
https://help.tableau.com/current/pro/desktop/en-us/extracting_data.htm
https://help.tableau.com/current/pro/desktop/en-us/datasource_relationships_perfoptions.htm
https://interworks.com/blog/bfair/2015/02/23/tableau-performance-checklist
https://help.tableau.com/current/pro/desktop/en-us/perf_record_create_desktop.htm
https://help.tableau.com/current/pro/desktop/en-us/wbo_streamline.htm
Disclaimer

The information and principles in this document come mostly from theory but also from experience. Despite all precautions, we do not guarantee that the advice given here is always true or will stay true in the future. Take some distance, test, iterate. And do not hesitate to get back to us with your ideas!

Change log

What | Who | When | Version
Hard drive | Simon AUBERT (Business & Decision, Orange Business) | 7/9/2023 | v1.1
First version of the document. | Simon AUBERT (Business & Decision, Orange Business) | 5/20/2023 | v1.0