I’ll let you in on my best known secret. You can grab data directly from published Tableau workbooks without even opening them. This is handy when you don’t have access to an underlying database or when you’re dealing with a database that requires complicated table joins to extract the data you need.
Here’s the trick:
To download data from a Tableau workbook add “.csv” to the end of a worksheet’s URL.
Let’s try it!
Let’s the grab the data behind the internal Volkswagen vehicle workbook.
library(readr)
# MPCA Tableau site
pca_tableau <- "http://tableau.pca.state.mn.us/views/"
# Tableau worksheet URL
## Delete the link garbage
## Remove the "/#/" in the middle and the question mark at the end w/ anything that comes after
workbook <- "Volkswagenownership/VWownership"
# Complete URL
tableau_url <- paste0(pca_tableau, workbook)
# Final URL will look like this
tableau_url <- "http://tableau.pca.state.mn.us/views/VWVolkswagendieselregistrationsinMN/VWownership"
# Paste ".csv" to end of Tableau URL
tableau_url <- paste0(tableau_url, ".csv")
# Read data into a data frame
vw_cars <- read_csv(tableau_url)
# Take a look
head(vw_cars)
## # A tibble: 6 x 5
## `County number` County `Latitude (gene~ `Longitude (gen~ `% of VW regis~
## <chr> <chr> <dbl> <dbl> <chr>
## 1 Aitkin AITKIN 46.6 -93.4 0.2%
## 2 Anoka ANOKA 45.2 -93.3 5.0%
## 3 Becker BECKER 46.9 -95.7 0.3%
## 4 Beltrami BELTR~ 47.9 -95.0 0.3%
## 5 Benton BENTON 45.7 -94.1 0.5%
## 6 Big Stone BIGST~ 45.5 -96.5 0.1%
Isn’t that great? But there is one small caveat.
As an example, take a look at the Watershed Pollutant Load Monitoring workbook. Since the workbook loads to a Story the nifty .csv
trick won’t work on this page. We need to jump over to a worksheet or dashboard behind the story.
In this case the tabs are shown at the top, but this won’t always be true. Let’s try the tab “Download Annual data”.
Now that you know the name of the worksheet, the data is free for the taking. Let’s read it into R.
library(readr)
# MPCA Tableau site
pca_tableau <- "http://tableau.pca.state.mn.us/views/"
# Tableau workbook
workbook <- "wplmn_data_browser/DownloadAnnualData"
# Complete URL
tableau_url <- paste0(pca_tableau, workbook)
# Paste ".csv" to end of Tableau URL
tableau_url <- paste0(tableau_url, ".csv")
# Read data into a data frame
water <- read_csv(tableau_url)
# How many rows are there?
nrow(water)
## [1] 5293
# Take a look
head(water)
## # A tibble: 6 x 21
## Blank `Bsn Name` `CatchArea (acr~ `End date` `Equis ID` `FWMC (mg/L)`
## <chr> <chr> <int> <chr> <chr> <dbl>
## 1 <NA> Cedar Riv~ 92160 10/31/2013 S004-432 16
## 2 <NA> Cedar Riv~ 92160 10/31/2013 S004-432 1.7
## 3 <NA> Cedar Riv~ 92160 10/31/2013 S004-432 0.133
## 4 <NA> Cedar Riv~ 92160 10/31/2013 S004-432 NA
## 5 <NA> Cedar Riv~ 92160 10/31/2013 S004-432 35
## 6 <NA> Cedar Riv~ 92160 10/31/2014 S004-432 0.185
## # ... with 15 more variables: Huc12 <chr>, `Hydstra ID` <chr>,
## # `Intentionally blank` <chr>, Latitude <dbl>, Longitude <dbl>, `Mass
## # (kg)` <int>, Name <chr>, `Number of samples` <int>, `Org Name` <chr>,
## # Parameter <chr>, `Site Type` <chr>, `Start date` <chr>, `Vol (acre
## # ft)` <int>, Year <int>, `Yield (lbs/ac)` <dbl>
Wow! There’s over 4,000 rows of data. It looks like we got ALL the data. That’s because everything is selected in the filters on the right. If you’d like to download data for only a single site or a single year, you can change the filter by adding a little magic to the end of the URL.
The filters on the worksheet are accessible by using the format Year=2014
. To add a filter to your data request you first add a ?
to the end of the workseet URL and then the filter. The final URL with the 2014 year filter would look like this http://tableau.pca.state.mn.us/views/wplmn_data_browser/DownloadAnnualData.csv?Year=2014
.
Let’s get all the data for year 2014.
library(readr)
# MPCA Tableau site
pca_tableau <- "http://tableau.pca.state.mn.us/views/"
# Tableau worksheet
workbook <- "wplmn_data_browser/DownloadAnnualData"
# Complete URL
tableau_url <- paste0(pca_tableau, workbook)
# Paste ".csv" to end of Tableau URL
tableau_url <- paste0(tableau_url, ".csv")
# Add the Year=2014 filter
tableau_url <- paste0(tableau_url, "?Year=2014")
# Read data into a data frame
water_2014 <- read_csv(tableau_url)
# Take a look
head(water_2014)
## # A tibble: 6 x 21
## Blank `Bsn Name` `CatchArea (acr~ `End date` `Equis ID` `FWMC (mg/L)`
## <chr> <chr> <int> <chr> <chr> <dbl>
## 1 <NA> Cedar Riv~ 92160 10/31/2014 S004-432 0.185
## 2 <NA> Cedar Riv~ 92160 10/31/2014 S004-432 NA
## 3 <NA> Cedar Riv~ 92160 10/31/2014 S004-432 9.9
## 4 <NA> Cedar Riv~ 92160 10/31/2014 S004-432 1.86
## 5 <NA> Cedar Riv~ 92160 10/31/2014 S004-432 0.288
## 6 <NA> Cedar Riv~ 92160 10/31/2014 S004-432 78
## # ... with 15 more variables: Huc12 <chr>, `Hydstra ID` <chr>,
## # `Intentionally blank` <chr>, Latitude <dbl>, Longitude <dbl>, `Mass
## # (kg)` <int>, Name <chr>, `Number of samples` <int>, `Org Name` <chr>,
## # Parameter <chr>, `Site Type` <chr>, `Start date` <chr>, `Vol (acre
## # ft)` <int>, Year <int>, `Yield (lbs/ac)` <dbl>
Perfect! Were down to only 926 rows of data.
Want to try one of the other filters? Things can get a little trickier when you filter on values that contain special characters, such as text with spaces - “Total phosphorous” - or slashes - “FWMC (mg/L)” -. To filter with special characters in a URL you’ll need to let the browser know that you really mean a text slash “/”, and that you don’t mean “move up a folder” as in Desktop/Documents/Reports
. To do this, URL’s have codes for the characters that usually mean something else to computers. For example, the forward slash “/” has the special code %2
. A space has the code %20
.
Here is a table of codes for the most common symbols. A more complete reference and a tool to convert your text to a URL friendly format is online here.
Let’s try filtering the data to one the values listed for Parameter
. If the filter doesn’t work, Tableau will be nice and send you all of the data. However, it isn’t helpful enough to give you clues about what’s wrong with your filter. Like if I happen to misspell phosphourus 3 times.
The code below filters the data to the Parameter
Total phosphorus.
library(readr)
# MPCA Tableau site
pca_tableau <- "http://tableau.pca.state.mn.us/views/"
# Tableau worksheet
workbook <- "wplmn_data_browser/DownloadAnnualData"
# Complete URL
tableau_url <- paste0(pca_tableau, workbook)
# Paste ".csv" to end of Tableau URL
tableau_url <- paste0(tableau_url, ".csv")
# Add the Measures="Mass+(kg)" filter
tableau_url <- paste0(tableau_url, "?Parameter=Total%20phosphorus")
# Read data into a data frame
water_phos <- read_csv(tableau_url)
# Take a look
head(water_phos, 5)
## # A tibble: 5 x 21
## Blank `Bsn Name` `CatchArea (acr~ `End date` `Equis ID` `FWMC (mg/L)`
## <chr> <chr> <int> <chr> <chr> <dbl>
## 1 <NA> Cedar Riv~ 92160 10/31/2014 S004-432 0.288
## 2 <NA> Cedar Riv~ 92160 10/31/2015 S004-432 0.214
## 3 <NA> Cedar Riv~ 122240 12/31/2009 S000-084 0.423
## 4 <NA> Cedar Riv~ 122240 12/31/2010 S000-084 0.349
## 5 <NA> Cedar Riv~ 122240 12/31/2011 S000-084 0.313
## # ... with 15 more variables: Huc12 <chr>, `Hydstra ID` <chr>,
## # `Intentionally blank` <chr>, Latitude <dbl>, Longitude <dbl>, `Mass
## # (kg)` <int>, Name <chr>, `Number of samples` <int>, `Org Name` <chr>,
## # Parameter <chr>, `Site Type` <chr>, `Start date` <chr>, `Vol (acre
## # ft)` <chr>, Year <int>, `Yield (lbs/ac)` <dbl>
Try filtering the sites to only the Hydstra ID
W38014001.
library(readr)
#Tableau worksheet URL
tableau_url <- "http://tableau.pca.state.mn.us/views/wplmn_data_browser/DownloadAnnualData"
# Paste ".csv" to end of Tableau URL
tableau_url <- paste0(tableau_url, ".csv")
# Add the Measures="Mass+(kg)" filter
tableau_url <- paste0(tableau_url, "?Hydstra%20ID=W38014001")
# Read data into a data frame
water_missip <- read_csv(tableau_url)
# Take a look
head(water_missip, 5)
## # A tibble: 5 x 21
## Blank `Bsn Name` `CatchArea (acr~ `End date` `Equis ID` `FWMC (mg/L)`
## <chr> <chr> <int> <chr> <chr> <dbl>
## 1 <NA> Lower Mis~ 29824000 12/31/2007 S000-132 NA
## 2 <NA> Lower Mis~ 29824000 12/31/2007 S000-132 2.9
## 3 <NA> Lower Mis~ 29824000 12/31/2007 S000-132 1.26
## 4 <NA> Lower Mis~ 29824000 12/31/2007 S000-132 0.16
## 5 <NA> Lower Mis~ 29824000 12/31/2007 S000-132 41
## # ... with 15 more variables: Huc12 <chr>, `Hydstra ID` <chr>,
## # `Intentionally blank` <chr>, Latitude <dbl>, Longitude <dbl>, `Mass
## # (kg)` <int>, Name <chr>, `Number of samples` <int>, `Org Name` <chr>,
## # Parameter <chr>, `Site Type` <chr>, `Start date` <chr>, `Vol (acre
## # ft)` <int>, Year <int>, `Yield (lbs/ac)` <dbl>
Simple site IDs are your friend. Dashes, slashes, and surprising capital letters are not.
To add multiple filters, separate them with the &
character with no spaces. As in ?Year=2014&Parameter=Flow
.
If the author of the worksheet hasn’t added the filter to the page, you won’t be able to use it in a magic URL.
If the workbook author has all of its worksheets and dashboards hidden, you won’t be able to access its data using the “.csv” trick. Contact the workbook author and politley request that they unhide one of their data tabs so you can more easily steal the data.
Congratulations! You found a shiny treasure. But you’re greedy. Return to the top of the page and keep hunting.