Boost your Tableau calculations with Python

Data Boutiq blog

Why do we need calculations in Tableau? They are essential when the source data lacks the necessary information for visualization but can be derived through calculations.

Python excels in data processing and calculations but falls short in visualization. Therefore, it often makes sense to handle pre-processing in Python and then use Tableau for visualization.


Tableau also has limitations with table calculations. In one of our projects, we had to calculate ranks between columns in Python and then integrate them into Tableau Prep. Additionally, many advanced types of analysis, such as log-linear models, are not feasible in Tableau.


Why did we choose Python
to interact with Tableau?

connect Python with Tableau
To address challenges in the field of machine learning, which is a growing trend in the development of BI systems.
By leveraging existing Python solutions — whether your own or those available in numerous communities — you can enhance the Tableau environment.
Python offers more options for working with dates, text, and data structures, while Tableau complements these capabilities by creating interactive dashboards.
connect Python with Tableau

Advantages of Python for BI cases

calculations in Tableau
calculations in Tableau
Python allows you to write simple scripts from scratch with ease.
Its syntax is close to natural language, making it easy to read and write. Plus, it's interactive (interpreter).
There are numerous ready-made tools that can be adapted to meet your specific needs.
A wealth of literature, reference materials, and active communities are available to provide tips and tricks. One of Python's significant advantages is its extensive collection of libraries.

Top Python libraries
for data science

NumPy

https://numpy.org/
Responsible for representing and processing data in the form of arrays.

Pandas

https://pandas.pydata.org/
Responsible for the presentation and processing of one-dimensional vectors (Series) and two-dimensional data frames (Data frame).

Scikit-learn

https://scikit-learn.org/
Built on top of the NumPy, SciPy, and Matplotlib libraries.
Free and available for commercial use.
Scikit-learn Python library

TabPy installation & setup

TabPy (Tableau Python Server) is an Analytics Extension that enhances Tableau's capabilities. It allows users to execute Python scripts and saved functions directly within Tableau's table calculations.
tabpy
tabpy
The TabPy documentation strongly recommends running the server in a virtual environment.

We use the venv module for this purpose. It is included among the standard modules and does not require installation.

To proceed, you need to:
• Access the command line (execute cmd).
• Have Python installed.
To create a virtual environment, use the command:
python –m venv ‘name of the virtual environment’

For example: python –m venv myenv

Activation: myenv\Scripts\activate
Deactivation: deactivate

Watch setup examples videos:

TabPy module connection

Python virtual environment

Writing a Python script

Using TabPy, Tableau can execute Python scripts in real-time and display the results as visualizations. Users can control the data sent to TabPy by interacting with parameters in their Tableau worksheets, dashboards, or stories.
Profit prediction
To predict profit, we use the predict() method. This method takes the independent variable(s) as arguments and returns the predicted profit values.
x_full = pd.DataFrame(_arg1, _arg2)
p = regr.predict(x)
Functions for calculated in Python fields
SCRIPT_INT        for integer data (1, 2, 3, 5, 8);
SCRIPT_REAL       for floating-point data (3,14, 2,72, 6,07);
SCRIPT_STR        for categorical data (men/women);
SCRIPT_BOOL       for boolean data (True/False)

Let's build a regression Model!

How do we train a regression model in Python?
The fit() method of the LinearRegression class takes independent variable(s) (in the form of a DataFrame) and a dependent variable as arguments.
x = pd.DataFrame(_arg4, _arg5)
y = _arg3
regr = LinearRegression()
regr.fit(x,y)

Python function argument
We specify that the predicted values will be floating-point numbers. To achieve this, we use the SCRIPT_REAL() function, which will look like this:
SCRIPT_REAL(“PYTHON CODE…”,
SUM([Sales]) # forecasting
SUM([Discount]) # forecasting
SUM([filtered Profit]) # training
SUM([filtered Sales]) # training
SUM([filtered Discount])) # training

Passing calculation
results to Tableau

The result of the predict() method is a vector, but Tableau only understands returned lists. Therefore, before returning the results to Tableau, we convert them to list format using the tolist() method from pandas:

p = p.tolist()
return p
tabpy

How to use the results
in Tableau?

When using a calculated field in a visualization,
it is essential to remember:
• The granularity of this field (as well as the data sent to and received from TabPy) matches the granularity of the visualization.

• SCRIPT functions are table calculations and require adjustments to the fields used when the visualization changes.
tabpy
Otherwise, the fields received from TabPy can be used in visualizations as usual and incorporated into other calculated fields, provided they are already aggregated.

How do we identify
errors in scripts?

We use error messages to correct our calculations. It's important to distinguish between errors caused by Tableau and those arising from Tableau-Python communication and script execution.
Script execution error

We haven't yet discussed Python scripts and Tableau Prep

Python works directly with both Tableau and Tableau Prep, offering additional possibilities, especially for ETL processes. The connection mechanism is the same in both cases, using the TabPy module. However, the scripts are written slightly differently: Prep works purely with Python scripts, while Tableau requires you to "wrap" them in Tableau's formula.

Get a free consultation

By clicking the button, you agree to the processing of personal data.

We use it to stay connected. Read more