Saturday, May 9, 2020

MotM May 2020: 3D Map of Portland

Turns out the city of Portland, OR has a ton of spatial data available, and ESRI built a 3D map of the city. 




This uses the ESRI CityEngine to build a view of Portland. You can even change the direction of the sun!


Description From the Owner:

This web scene allows you to explore the central business district of the City of Portland, OR. It contains two types of 3D buildings: procedurally generated volumes with automated facade textures, and manually textured facades. Both created in CityEngine. 3D trees, where present, have accurate LiDAR-derived heights. Use the bookmark (star) icon to take a tour of different features in the city.

This web scene has been built using the 3D Cities Workflow 4.

New Normal for Map of the Month

For years now I have been doing a Map of the Month at work. It has consisted of a physical map on the outside of my cubicle plus a Tableau dashboard with some details, past maps, etc.

Now it has been a couple of months since most people have been in our offices, and I am missing the Map of the Month. I think I need to take it digital! So in a few days, look forward to an online-focused MotM.

Wednesday, November 20, 2019

Deeper into the Tableau Repository Part 6: Cache how you want to



WARNING 

This is outside the box, non-standard, and not supported. Proceed at your own risk. Make sure you have a backup. Don't try this at home (or, more likely, at work). 
UPDATE 11/12/2020: I added some more details and comments about the code

Tableau Server only allows you to set one cache duration for everything, server-wide. I have our server set at 12 hours, which is a pretty good compromise for things that update monthly, weekly, or daily. Newer versions of Tableau will dump the cache if an extract refreshes or the workbook is republished, but a dashboard with a live database connection also gets that 12-hour cache. Mark Wu has a great article about how to manage this; I chatted with Mark last week at TC19 and got some really good insights. His solution is to add 1 millisecond to the last_published_at value for the workbook. Since the workbook thinks it has been updated, it will dump the cache. His blog doesn't include code, so below is what I came up with.

There are comments in the code, but there are a couple of things I am checking for:
  1. Who tagged it? Unless the tagger was the owner, a site admin, or a server admin, I will ignore the tag.
  2. Does the workbook have an extract? This is intended for live database connections where the data is constantly changing. Extract refreshes trigger a cache reload automatically, so for extracts we want to use the cache. 

I run it every 5 minutes, but there is no reason you couldn't extend the idea to multiple tags and a more customizable cache: something like a '1h-cache' tag to update hourly and a '15m-cache' tag that runs every 15 minutes (see the sketch after the code below). 

/*
The idea here is to refresh the cache a la http://enterprisetableau.com/cache3/
How it works:
This update statement increments the published date of any workbook whose view is tagged 'no-cache' by 1 millisecond.
That forces the cache to be ignored and the database to be requeried.
*/
UPDATE workbooks w
SET last_published_at = last_published_at + interval '1 millisecond'
FROM views v
    ,taggings ts
    ,tags t
    ,users u
    ,system_users su
WHERE v.workbook_id = w.id
    AND ts.user_id = u.id
    AND su.id = u.system_user_id
    AND v.id = ts.taggable_id
    AND ts.tag_id = t.id
    AND t.name = 'no-cache'       -- the tag we are looking for
    AND (w.owner_id = ts.user_id  -- the owner tagged the workbook
        OR su.admin_level = 10    -- OR a server admin tagged it
        OR u.site_role_id = 0     -- OR a site admin tagged it
        )
    -- skip the workbook if it has an extract; note this has to be AND,
    -- not OR: a workbook is extract-free only when BOTH flags are false
    AND w.refreshable_extracts = false
    AND w.incrementable_extracts = false;
 

For some reason, if I tag a workbook it shows up in the repository as the view being tagged, which is why the query joins through views. I also set it to only act on tags that were created by an admin (server or site) or the workbook owner.
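
To extend this to the multiple-tag idea mentioned above, here is a minimal sketch. The tag names are just examples, and I have trimmed the owner/admin and extract checks for brevity; you would keep them in a real version and schedule each copy of the statement at its matching interval.

-- Hypothetical multi-tag variant: schedule this copy every 15 minutes,
-- and a second copy with '1h-cache' every hour. Keep the owner/admin
-- and extract checks from the main query above.
UPDATE workbooks w
SET last_published_at = last_published_at + interval '1 millisecond'
FROM views v
    ,taggings ts
    ,tags t
WHERE v.workbook_id = w.id
    AND v.id = ts.taggable_id
    AND ts.tag_id = t.id
    AND t.name = '15m-cache'; -- swap for '1h-cache' in the hourly job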

Tuesday, October 15, 2019

Deeper into the Tableau Repository Part 5: Cracking the Workgroup Database Wide Open!


This is Part 5 of a series where we dig into the Tableau repository.


WARNING 

This is outside the box, non-standard, and not supported. Proceed at your own risk. Make sure you have a backup. Don't try this at home (or, more likely, at work). 

When Tableau 2019.2 came out, it had an annoying banner that shows up for everyone. I have been wanting a MOTD (Message Of The Day) for a long time so I can post upcoming events, upgrades, etc. to all users without spamming people. 



    Like many things, this is a case where we get a feature that isn't quite fleshed out and has no administrative controls (looking at you, Ask Data and Extensions), and then in a later release Tableau lets us control and customize it. Some of the issues I have with the banner:
    • Once it is hidden, it is hidden forever. You can't get it back. 
    • If you have an account on 2 sites you have to hide it for each site.
    • I can't control if it is shown.
    • I can't control what is shown. 
    Based on the user's license role, the banner will change. The issue here is that we have a Core-Based Server license and 40k users with publishing rights. Since they can publish, they get the links to download Tableau Desktop and Prep Builder. This banner means I potentially have thousands of users who are prompted to download software which:
    1. They aren't licensed for
    2. Isn't the version we are using
    Most users will get this even though they don't have a license for Tableau Desktop, and we have zero Prep Builder licenses.

    While I was searching for a solution I came upon this article by Jonathan MacDonald over at The Information Lab that suggests enabling write access to the workgroup database and just writing to the preferences table so the banner doesn't show up for users. It doesn't address what to do with new users, so you would actually have to run a script to find any new users and turn the banner off for them. 
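
    That script can just be a scheduled query. Here is a minimal sketch; the column names and the preference key are assumptions on my part (the article only describes the approach), so check your own preferences table before trying this:

    -- Hypothetical sketch: add a banner-dismissed preference row for any
    -- user who doesn't have one yet. Column names and the preference key
    -- ('welcome_channel_banner_dismissed') are assumptions to verify.
    INSERT INTO preferences (user_id, name, value)
    SELECT u.id, 'welcome_channel_banner_dismissed', 'true'
    FROM users u
    WHERE NOT EXISTS (
        SELECT 1 FROM preferences p
        WHERE p.user_id = u.id
          AND p.name = 'welcome_channel_banner_dismissed');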

    Now that I can write to the database what else is possible?

    I have so far avoided the REST API, which can probably handle some of this. Some things I have been experimenting with are:

    Canceling Jobs


    Have a long-running extract that you want to cancel? well, there is a table for canceling them called BACKGROUND_JOBS_CANCEL that is empty. Insert a row in there with a background_job_id and it will stop running! I would guess there is a REST API command to populate the table and kill the job too.
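
    A minimal sketch, assuming the table needs nothing beyond the job id (if your version has other required columns, add them):

    -- Find the long-running job you want to kill...
    SELECT id, job_name, title, started_at
    FROM background_jobs
    WHERE completed_at IS NULL
    ORDER BY started_at;

    -- ...then ask the backgrounder to cancel it. 12345 is a placeholder
    -- for the id you picked above.
    INSERT INTO background_jobs_cancel (background_job_id)
    VALUES (12345);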

    Turn off AskData


    You can do this for each data source or for the entire server. Turning it off requires multiple clicks per datasource, and we have hundreds. I would really like the default to be off but still allow datasource owners to enable it. Ask Data requires some prep to work well, and the first thing that happens when a user clicks on a datasource in Tableau Server is that it triggers indexing of the datasource. We have had some performance issues with indexing large datasources. If people actually use Ask Data that is great, but watching the logs, most of the time a user will trigger the indexing and then do something else. Rarely does a user actually go on to use Ask Data.

    To implement this I had to get the right permissions on the table by running:
    GRANT ALL ON SEQUENCE public.datasources_id_seq TO readonly;  
    GRANT ALL ON TABLE public.datasources TO readonly; 
    GRANT ALL ON SEQUENCE public.datasource_events_id_seq TO readonly;

    And then this SQL will turn off AskData for every datasource:
    UPDATE datasources
    SET nlp_setting = 'disabled'
    WHERE parent_workbook_id IS NULL; -- published datasources only; embedded ones have a parent workbook
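
    When an owner does want it on for a specific datasource, flipping it back is the same statement in reverse. A minimal sketch; the datasource name is a placeholder, and I am assuming 'enabled' is the inverse value (check what the UI writes on your version):

    -- Re-enable Ask Data for one published datasource by request
    UPDATE datasources
    SET nlp_setting = 'enabled'       -- assumed inverse of 'disabled'
    WHERE name = 'Some Datasource'    -- placeholder name
      AND parent_workbook_id IS NULL;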

    Reschedule Jobs


    With the newer REST API you can do this programmatically, but for a long time it wasn't possible. I have been working on thresholds to demote extracts that aren't used often. Here is an example: a workbook extract updates every day, but nobody has looked at the workbook in 20 days. Let's move it to a weekly schedule.
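
    A minimal sketch of the demotion, with the caveat that the column names on tasks (obj_id, obj_type, type) are from memory of our version and the ids are placeholders, so verify against your own repository:

    -- Hypothetical example: move workbook 42's extract refresh onto a
    -- weekly schedule. Look up your real schedule name in the schedules table.
    UPDATE tasks
    SET schedule_id = (SELECT id FROM schedules WHERE name = 'Weekly Saturday 2am')
    WHERE obj_type = 'Workbook'
      AND obj_id = 42                  -- placeholder workbook id
      AND type = 'RefreshExtractTask';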

    Wednesday, October 9, 2019

    Deeper into the Tableau Repository Part 4: Letting users know about their Tableau Content

    This is Part 4 of a series where we dig into the Tableau repository.

    Our Tableau Server is a shared resource. I understand it is really easy to create a dashboard, generate an extract, and publish it out there. The really hard part is getting people to use a dashboard. This leads to lots of orphaned content.

    We have recently been having issues with a pretty substantial backlog of extract jobs in the morning. Our main EMR (Electronic Medical Record) database doesn't finish updating until 6am, and then about 300 jobs get added to the queue between 6am and 10am. With only 4 backgrounders to execute these jobs (plus any system tasks and subscriptions), there can be quite a backlog; we were averaging hours of lag time during the morning. I ended up adding a backgrounder process, and there is also a configuration setting to sort same-priority jobs based on their historical runtime.
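
    If you want to measure the lag yourself, the background_jobs table has what you need. A minimal sketch:

    -- Average minutes extract jobs waited in the queue, by hour, last 7 days
    SELECT date_trunc('hour', created_at) AS queued_hour
        ,count(*) AS jobs
        ,round(avg(extract(epoch FROM (started_at - created_at)) / 60)) AS avg_wait_minutes
    FROM background_jobs
    WHERE job_name = 'Refresh Extracts'
      AND created_at > now() - interval '7 days'
    GROUP BY 1
    ORDER BY 1;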

    To better let users know what is going on, I created a dashboard that lets them see some stats. I invited all our users to subscribe themselves, and I subscribed some of the worst offenders myself. It contains sections that show, on a monthly basis:

    • What content is being used
    • What extracts have failed or been suspended
    • What content has not been accessed in the last 6 months (including how big it is)
    • How long do my extracts take compared to other users
    • How much time did my extracts take last month
    • How long did my extracts wait before executing
    • What extracts are being refreshed too frequently based on use

    The last one is a big deal. I found someone who had an hourly extract that hadn't been used in 6 months. I set some thresholds for hourly, daily, weekly, and monthly schedules, and my next step is to automatically demote extracts that exceed them (a rough sketch of the detection query is below). More on that to come!
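
    As a starting point for the detection, something like this finds workbooks whose extracts refreshed in the last 30 days but that nobody viewed in that window. It is only a sketch: matching jobs to workbooks by title is crude, and I am assuming views_stats has a last-viewed 'time' column, so verify both against your repository.

    -- Workbooks refreshed in the last 30 days with zero views in that window
    SELECT w.name, count(bj.id) AS refreshes_last_30d
    FROM workbooks w
    JOIN background_jobs bj
      ON bj.title = w.name             -- crude: title holds the workbook name
     AND bj.job_name = 'Refresh Extracts'
     AND bj.created_at > now() - interval '30 days'
    WHERE NOT EXISTS (
        SELECT 1
        FROM views v
        JOIN views_stats vs ON vs.view_id = v.id
        WHERE v.workbook_id = w.id
          AND vs.time > now() - interval '30 days')  -- 'time' = last view; an assumption
    GROUP BY w.name
    ORDER BY refreshes_last_30d DESC;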

    Monday, September 30, 2019

    Deeper into the Tableau Repository Part 3: So many AD Groups

    This is Part 3 of a series where we dig into the Tableau repository.

    In my previous lives as a sysadmin, I learned that there are very few things you actually have to do only once. From that experience, I learned that I should script EVERYTHING!!! As our Tableau Server use grew, people were confused about why a user or group wasn't on the server, and I had to explain that users and groups have to be imported, then import them. I took the next step of importing a group with all our employees, which took care of users. But what about groups?

    On our dev server I wrote a script to find every AD group and import it. It worked fine, except it took about 8 hours for the Tableau Server to sync them all; it turns out our organization has about 30,000 users and 35,000 groups. Since that wouldn't work, it was back to the drawing board: importing every group was out, so we needed a way to import the groups people actually wanted without them having to bother me.

    I came up with a solution that used a Google Form. Users could enter the group name and minimum site role for that group. I used Pentaho to make a job that:
    1. checks the Google Sheet with the results
    2. checks the Tableau Repository for groups that are already imported (see the sketch below)
    3. joins the two together and, if any groups need to be imported, imports them with tabcmd
    4. sends the user who requested the group a confirmation email with the results
    5. logs the results to another tab in the Google Sheet. 
    This job has been running 3 times a day for a couple of years now and really cuts down on emails from the 300+ people who publish content. I found early on that users would sometimes include the @domain.com instead of just the group name, so I added error checking to the Google Form that looks for an '@' character. Originally I allowed the requestor to choose the site role, but since all our employees already got a 'Publisher' role, when I updated the form for the new role names I made 'Creator Publisher' the only option.
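
    For step 2, the repository side is a simple query. A minimal sketch (column names on groups can vary a little by version, so verify):

    -- AD groups already imported into the server, per site
    SELECT s.name AS site_name, g.name AS group_name
    FROM groups g
    JOIN sites s ON s.id = g.site_id
    WHERE g.name <> 'All Users' -- skip the built-in group every site has
    ORDER BY 1, 2;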

    Tuesday, September 24, 2019

    Deeper into the Tableau Repository Part 2: Opening it up with published datasources

    This is Part 2 of a series where we dig into the Tableau repository.

    There are lots of good bits of info in the Tableau Repository, but Tableau only gives you one account (two really, but the readonly user is the one with good access) with one password to access it. Site and server admins can see the built-in status dashboards, but there are a couple of problems with them:

    1. Only admins? Come on, Tableau, it can't be that hard to add a row-level filter to your dashboards and let anyone see what they did or what they own.
    2. Why can't I subscribe? When Tableau embedded their own dashboards into Tableau Server, they turned off the ability to subscribe. Wouldn't it be nice to get a subscription showing how many extracts are failing on your server?
    A few years back Matt Coles shared his shareable datasources at TC16, and I loved the idea. I don't like being a bottleneck or doing the same thing twice, so these were a really great solution. I applied a user filter (more on that later) and then piloted the Background Tasks, Events, and a custom-built Comments datasource with a small group of power users. The pilot was a great success, and now they have been rolled out to all users.

    Row-Level headache

    First off, who gets to see what? I kicked around some ideas and decided that you should be able to see a row if:

    • if you were the actor (you did the thing)
    • if you own the object acted upon (Workbook, View, or Datasource)
    • if you own a Datasource and the Workbook acted upon connects to it
    • if you own the project the Datasource or Workbook lives in
    • if you are a project lead for the project the Datasource or Workbook lives in
    • if you are a site admin or server admin

    That list can be confusing, but I took it in chunks. Owners are easy; it gets harder when there are multiple site/server admins and project leads. This was back before multi-table extracts, and I wanted to be able to extract this datasource. To prevent exploding the extract, I ended up combining all the Project Leads into one comma-separated field and doing a CONTAINS() in Tableau:
    select
        pr.id as project_id
        ,pr.site_id
        ,',' || string_agg(COALESCE(_users.name, gmembers.name), ',') || ',' AS username_PLead
    from projects pr
    left outer join next_gen_permissions ng on (pr.id = ng.authorizable_id AND ng.authorizable_type = 'Project' AND (ng.permission = 1 OR ng.permission = 3))
    inner join capabilities cap on (ng.capability_id = cap.id AND cap.name = 'project_leader')
    left outer join _users on (ng.grantee_id = _users.id AND ng.grantee_type = 'User')      -- lead granted directly to a user
    left outer join _groups on (ng.grantee_id = _groups.id AND ng.grantee_type = 'Group')   -- or granted to a group...
    left outer join group_users on (_groups.id = group_users.group_id)
    left outer join _users gmembers on (group_users.user_id = gmembers.id)                  -- ...so expand the group's members
    group by 1, 2
    I joined the above Custom SQL to the project_id once each for the Datasource, Workbook, and View. I did something similar for site and server admins. Then my row-level filter had a big boolean statement that looked something like:
    USERNAME() = [item_owner_username]
    OR USERNAME() = [actor_username]
    OR CONTAINS([Current DataSource Project Leads], ',' + USERNAME() + ',')
    OR CONTAINS([Current View Project Leads], ',' + USERNAME() + ',')
    OR CONTAINS([Current WB Project Leads], ',' + USERNAME() + ',')
    OR CONTAINS([Admin Username], ',' + USERNAME() + ',')
    I now do this with a live query, so there is no extract to explode. Sometimes I get off-the-wall requests that I have to handle manually, but the published datasources keep those to just a handful each year.