Enrollment Nerdery

A place to collect my thoughts on data analysis within Enrollment Management. Dare I call it Enrollment Science?



Where you can find me
GitHub
Twitter
LinkedIn
Resume
Email

Python, Rodeo, and Tableau Data Extract API

Many moons ago, I wrote some code to build a Tableau Data Extract from the work that I had munged together in python. I figured it was time to update the code since I recently discovered that the Tableau API has changed.

For a link to that old code, refer to the Jupyter Notebook in this repo.

Assumptions and Requirements

First off, I am using a Macbook, and while I believe things are getting easier on Windows machines with respect to coding, I prefer to write Terminal commands over point-and-click installs.

For tools, we will be using:

1. Create our Python environment

Assuming you have miniconda installed properly, we are going to create a Python 2.7 environment for our work with the Tableau API.

First, let’s explore what I have for environments on my system.

$ conda info --envs

yields

# conda environments:
#
chatbot                  /Users/btibert/miniconda3/envs/chatbot
ipy                      /Users/btibert/miniconda3/envs/ipy
tableau                  /Users/btibert/miniconda3/envs/tableau
root                  *  /Users/btibert/miniconda3

Above are the environments that I have set up on my machine. If you are not familiar with environments, think of them as isolated installs of python with various packages for your use-case. Environments are a great way to isolate the versions and dependencies relative to a project, and allows you to avoid collisions if you just manage a single python environment.

Now, lets create our environment. In this case, I am going to call it tableau-post.

conda create -n tableau-post python=2.7 pandas jupyter matplotlib

This will install python 2.7 , pandas, matplotlib and jupyter (formerly ipython).

These packages and version are what you need to:

To activate the environment, simply use:

 source activate tableau-post

The code above will activate our environment.

To confirm that you are good to go:

source activate tableau-post
which python

In my case, this yields:

/Users/btibert/miniconda3/envs/tableau-post/bin/python

We will use this when we configure Rodeo later on.

From this point forward, I am going to assume that you have tableau-post activated.

2. Get the Tableau SDK

The code below jumps into Downloads and will get the Tableau SDK from the web.

cd ~/Downloads/
wget https://downloads.tableau.com/tssoftware/Tableau-SDK-Python-OSX-64Bit-10-0-2.tar.gz
tar -xvzf Tableau-SDK-Python-OSX-64Bit-10-0-2.tar.gz

This will download and unpack the file, which will yields a directory that we need for installation.

cd TableauSDK-10000.16.1004.1720/

The command above will navigate into the folder.

sudo python setyp.py install

The command above will install the Tableau SDK for you. For more on this, go to this link

3. Fire up Rodeo and Configure it

With Rodeo open, navigate to Preferences. You will see a dialog box similar to the one below.

preferences

You can see above that for the Python Command I am using the same location that was printed from the which python command within the tableau-post environment.

It’s that simple; now you can use isolated environments when developing with Rodeo!

The majority of the code that I write is in R using the fantastic Rstudio IDE, and while the Rodeo tool is not as feature-rich (yet), it’s great for the way I develop. The fact that there is “type-ahead” functionality (read: you write less code), and the ability to commit the lines of my script 1x1, is extremely helpful for the way I work through my tasks.

I hardly debug in an efficient way, but I also tend to solve smaller problems and do not require test suites.

4. Build a Basic Extract

Below are a few helpful resources on how to use python to build out a Tableau Data Extract:

The code below builds out a basic extract from a sample dataset available on the web.

## import the libraries
import tableausdk.Extract as tde
import pandas as pd
import os

## bring in a sample Graduate School Admissions datasets
file_name = "http://www.ats.ucla.edu/stat/data/binary.csv"
df = pd.read_csv(file_name)
df.head()
df.shape

## create the extract name, but remove the extract if it already exists
fname = "example.tde"
try:  
    tdefile = tde.Extract(fname)
except:
    os.system('del ' + fname)
    os.system('del DataExtract.log')
    tdefile = tde.Extract(fname)


## define the table definition
tableDef = tde.TableDefinition()

## create a list of column names and types
colnames = df.columns
coltypes = df.dtypes

## create a dict for the field maps
## Caveat: I am not including all of the possibilities below
fieldMap = {
    'float64' :     tde.Types.Type.DOUBLE,
    'float32' :     tde.Types.Type.DOUBLE,
    'int64' :       tde.Types.Type.DOUBLE,
    'int32' :       tde.Types.Type.DOUBLE,
    'object':       tde.Types.Type.DOUBLE,
    'bool' :        tde.Types.Type.DOUBLE
}

## for each column, add the appropriate info the Table Definition
for i in range(0, len(colnames)):
    cname = colnames[i]
    ctype = fieldMap.get(str(coltypes[i]))
    tableDef.addColumn(cname, ctype)  


## create the extract from the Table Definition
## Super Hacky, but legible
## for each row, add the data to the table
## Again, not accounting for every type or errors
with tdefile as extract:
    table = extract.addTable("Extract", tableDef)
    for r in range(0, df.shape[0]):
        row = tde.Row(tableDef)
        for c in range(0, len(coltypes)):
            if str(coltypes[c]) == 'float64':
                row.setDouble(c, df.iloc[r,c])
            elif str(coltypes[c]) == 'float32':
                row.setDouble(c, df.iloc[r,c])
            elif str(coltypes[c]) == 'int64':
                row.setDouble(c, df.iloc[r,c])   
            elif str(coltypes[c]) == 'int32':
                row.setDouble(c, df.iloc[r,c])
            elif str(coltypes[c]) == 'object':
                row.setString(c, df.iloc[r,c])
            elif str(coltypes[c]) == 'bool':
                row.setBoolean(c, df.iloc[r,c])
            else:
                row.setNull(c)
        # insert the row
        table.insert(row)

## close the file
tdefile.close()

Summary

Some notes on the process:

I hope that this helps you as you develop with Python and Tableau. I recently have been trying to define a workflow locally to prototype my data needs, and since my dataset has millions of rows, using CSVs and local database tables just isn’t cutting it.

comments powered by Disqus