Tuesday, September 29, 2015

Email notifications for failed Tableau extracts using Pentaho

UPDATE: This article is pointless now that Tableau Server does this out of the box. It is nice that they roll out so many enhancements so I don't have to make (and support and maintain) as many hacks.

Our Tableau server has (at the moment) 28 sites spread over as many departments. Newer (v8+) versions of Tableau Server do a better job of notifying users of failed workbook extracts when they are logged in, but there still isn't a good way to push those notifications out. Site Admins can look at the system dashboards to see failed extracts, but that doesn't help users who only publish.

I was getting sick and tired of sending people emails saying "your extract keeps failing because of X. Please fix it or I will disable it." Cutting down on these failed extracts becomes especially important when they hit the backgrounder.querylimit (default is 2 hours), which means they occupy one of the backgrounders for 2 hours with nothing to show for it. I have a couple of dashboards that tell me about chronic offenders, but the single most effective tool has been sending automatic notifications to the owner of a workbook or datasource when it fails.

To accomplish this I used Pentaho Data Integration to:

  1. Query the internal Tableau Server database to find failed extracts and their owners.
  2. Email the owner.
  3. See your failed extracts diminish. 
The SQL looks at the last hour, so you will need to adjust that if you plan to run it on a different schedule.
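The filtering the SQL does can be sketched in a few lines. This is a hypothetical illustration, not the actual transform: the `finish_code`/`completed_at` field names mirror the internal workgroup database's background jobs table, but check the schema for your Tableau Server version before relying on them.

```python
from datetime import datetime, timedelta

def failed_extracts_last_hour(jobs, now=None):
    """jobs: iterable of dicts with 'finish_code', 'completed_at',
    'title', and 'owner_email' keys, as a SQL result set might yield.
    Returns (owner_email, title) pairs for jobs that failed
    (finish_code 1 is assumed to mean failure) in the last hour."""
    now = now or datetime.now()
    cutoff = now - timedelta(hours=1)
    return [
        (j["owner_email"], j["title"])
        for j in jobs
        if j["finish_code"] == 1 and j["completed_at"] >= cutoff
    ]
```

Each pair that comes back is one email to send; the one-hour cutoff is the piece you would change to match a different schedule.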

The Pentaho Transform is available here.


Wednesday, September 23, 2015

Using Pentaho to Sync Tableau Server AD Groups

I have been using Pentaho Data Integration for a while at work to ETL data here and there, but have also been finding it useful for managing our Tableau Server. I wanted to share some of this work so here is the first one.

I know that Tableau Server 9.1 can now automatically sync AD groups. If you are running an older version, this might be useful, or you could use it as a starting point for some other purpose. Run it on a schedule to keep your groups synced.

Prior to this we had a script that we manually added groups to. The issue was that we have 28 sites on our server and only a small percentage of the total groups ever made it into the script. This way everything stays in sync.

Basically it:
  1. Queries the internal Tableau Server DB for a list of all the AD groups in all the sites.
  2. Compares the results to a blacklist CSV file and filters out any groups on the blacklist.
  3. Runs a tabcmd syncgroup for each remaining group.
  4. Logs the results of the syncs to a database.
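Steps 2 and 3 above boil down to a filter plus a command per surviving group. Here is a minimal sketch of that logic (the blacklist format, one group name per CSV row, is an assumption, and you would log in to the appropriate site with tabcmd before running the sync commands):

```python
import csv

def load_blacklist(path):
    """Read a blacklist CSV with one group name per row (assumed format)."""
    with open(path, newline="") as f:
        return {row[0].strip() for row in csv.reader(f) if row}

def sync_commands(groups, blacklist):
    """groups: iterable of AD group names pulled from the internal DB.
    Returns the tabcmd invocations to run, skipping blacklisted groups."""
    return [["tabcmd", "syncgroup", g] for g in groups if g not in blacklist]
```

In the real transform each command's exit status would then be captured and written to the logging database (step 4).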
Download the Pentaho transform here.




Thursday, September 10, 2015

Attack of the Ashley Madison FemBots!

As a followup to questions about the hacked Ashley Madison data, I was asked how the bots that were designed to drum up revenue compare to real people. Here you go!

Annalee Newitz at Gizmodo had a good breakdown of insights from the second dump with the source code. 

Wednesday, September 2, 2015

Ashley Madison Data: Life is Short. Have a look at an Affair.

If you haven't heard, the affair/dating site Ashley Madison was hacked last month, and when the company didn't pay up, the data was released. When I heard the data was out I was really excited. The amount and kind of data make this topic really seductive.

This dashboard won't allow you to look up people you know (or your spouse). It is designed to show some aggregate information about AM accounts. 


On to the nerdy details! The data was released as a bunch of MySQL dump files, so it had to be loaded into a database. The credit card transactions were daily CSV files that had to be merged (I used Pentaho). Getting this all in place was a bit of work, and then the sheer amount of data was bringing my computer to a grinding halt. I ended up creating some views to shape the data how I wanted it and then exporting each view to a text file to load into Tableau. Once in Tableau I could aggregate further in an extract. I ended up with a few smaller aggregated datasets and a pretty small workbook.
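The CSV-merging step (which I did in Pentaho) amounts to concatenating the daily files while keeping a single header row. A minimal sketch, assuming all the daily files share the same header:

```python
import csv

def merge_csvs(paths, out_path):
    """Concatenate daily CSV files into one, writing the header
    from the first file and skipping it in the rest."""
    with open(out_path, "w", newline="") as out:
        writer = None
        for p in paths:
            with open(p, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)  # each daily file's header row
                if writer is None:
                    writer = csv.writer(out)
                    writer.writerow(header)  # keep only the first header
                writer.writerows(reader)
```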

The other issue was that there was no data dictionary. For example, I have a seeking field, but I have no idea what a 4, 6, or 2 means. Reports are that many of the female accounts are actually bots that trick men into buying more credits. Annalee Newitz at Gizmodo had a good breakdown of insights from the second dump with the source code. Using this I have excluded bots from almost everything.