Collocated monitors

Analysis steps for collocated monitors include:

5.1 Evaluate sites with collocated monitors
5.2 Prioritize observations from collocated monitors

5.1 Evaluate collocated air monitors

Description

Collocated samples are multiple samples taken for a pollutant from the same monitoring site at the same time, using two pieces of monitoring equipment. Collocated samples provide an extra layer of quality assurance by taking multiple measurements under very similar conditions which helps detect machine, lab, and human error. Depending on the results from collocated monitors, only one of the monitors, or potentially neither may be considered as valid measurements.

Why not keep all the data?

Leaving observations for multiple POCs in the data set is likely to bias calculated summary statistics. For example, consider a site that starts with two monitors operating during the winter months, but after one malfunctions, the site is left with one monitor operating for the remainder of the year. If we were to calculate the annual average for this site, the mean would be biased by having twice as many observations during the winter months. By limiting the number of observations to 1 per day, we can remove the bias due to the extra winter observations.

Recommended steps

Viewing a scatterplot of collocated measurements helps to see both obvious and subtle irregularities between collocated measurements. If there is evidence of bias in one of the monitors (values from one collocated monitor are usually higher than the other), or there is large variance between the collocated monitors (values differ from each other by significant margins), then both collocated monitors need to be investigated further.

EPA recommends that collocated monitors have no higher than a 15% coefficient of variation (CoV) between their measurements. Calculating the CoV for every measurement pair does not make sense as small differences (i.e. 0.001 and 0.002) have the same CoV as large differences (i.e. 10 and 20). Therefore, we choose to look at the median CoV for each site, pollutant, and year. We then flag collocated monitors if the median CoV is greater than 15%, which suggests the differences between the collocated monitors are systematic.

Since there is lower confidence in values below the MDL, we only calculate the CoV for collocated monitors on days that have valid measurements greater than or equal to the MDL. Lastly, we only compare the median CoV to the threshold of 15% if there are at least 10 calculated CoV values. You are much more likely to find differences between monitors when there are only a few values above detection.

If collocated monitors do not agree, consult with the lab, field or quality assurance team to determine if there is a problem with one of the POCs. If they cannot determine which POC is the problem, then confidence in the measurements from both collocated monitors is reduced and no measurements taken from either collocated monitor should be used. If you are required to report the results, the site should be flagged for the low confidence in the accuracy of the results.

Sample R script

The function POC_compare() below generates scatterplots comparing sample values from POC 1 and POC 2 for sites with collocated monitors. If most points on the scatterplot are close to the red line and the correlation is high, then measurements from both POCs are considered equally valid. If points are not close to the red line or the correlation is not high, then both POCs need to be investigated to determine if one POC is more relaible.

Click below to view the script.

library(tidyverse)

data <- read_csv('https://raw.githubusercontent.com/MPCA-air/air-methods/master/airtoxics_data_2009_2013.csv')

names(data) <- c("AQSID", "POC", "Parameter", "Date", "Result", "null_code", "MDL", "Pollutant", "Year", "cas")

POC_compare = function(data) {

library(shiny)
library(tidyverse)
library(rsconnect)  
  
data <- distinct(data, AQSID, POC, Parameter, Date, Pollutant, Year, .keep_all = T) #replace with better cleaning function

data <- spread(data, POC, Result) %>% 
        mutate(Status = ifelse(`1` < MDL  & `2` < MDL, "POCs 1 and 2 below MDL", ifelse(`1` < MDL, "POC 1 below MDL", ifelse(`2` < MDL, "POC 2 below MDL", "POCs 1 and 2 above MDL") ) ) ) %>% drop_na(Status)


Pollutant <- unique(data$Pollutant)

Site <- unique(data$AQSID)

Year <- unique(data$Year)

shinyApp(
  ui = fluidPage(responsive = FALSE,
                 fluidRow(
                   column(3,
                          style = "padding-bottom: 20px;",
                          inputPanel(
                            selectInput("Pollutant", label="Choose a pollutant", choices = Pollutant),
                            selectInput("Year", label="Choose a year", choices = Year),
                            selectInput("Site", label="Choose a site", choices = Site))),
                   column(9,
                          plotOutput('normviz', height = "500px")))),
  
  
  server = function(input, output) {
    
    output$normviz <- renderPlot({
      
      print(input$Pollutant)
      
      print(input$Site)
      
      print(input$Year)
      
      data_sub <- filter(data, Pollutant==input$Pollutant, AQSID == input$Site, Year == input$Year)
      
      ggplot(data_sub, aes(x = `1`, y = `2`, color = Status)) +
        geom_point(size = 3) +
        geom_segment(x = -1000, xend = 1000, y = -1000, yend = 1000, color = "red", size = 1) +
        labs(title    = "POC comparison chart", 
             x = "POC 1", 
             y = "POC 2", 
             subtitle = paste("Correlation =", round(cor(data_sub$`1`,data_sub$`2`, use = "complete"), 2)) )
      
    })
    
  })

}

POC_compare(data)

5.2 Prioritize multiple air monitors (POCs)

Description

If both collocated monitors at a site are considered equally valid, then a single value must be chosen from multiple collocated values to avoid double-counting values at a single site.

Recommended steps

For air toxics, measurements from collocated monitors are considered equally valid unless there is a specific reason to treat them differently. Therefore, these methods are used for dealing with collocated samples where the POCs generally provide similar results.

If one monitor has a valid measurement, and the other does not, then the valid measurement is used.
If one monitor has a valid measurement greater than or equal to the MDL and the other has a valid measurement less than the MDL, then the measurement above the MDL is used as a conservative approach as there is greater confidence in the value above the MDL.
If both monitors have valid measurements above the MDL, then those values are averaged since both values are equally valid and the true value is assumed to likely be between the two values.

Sample R script

The script below uses the POC_average() function to combine results from multiple POCs into one combined site result according to the rules listed above.

Click below to view an example R script.

library(tidyverse)

data <- read_csv('https://raw.githubusercontent.com/MPCA-air/air-methods/master/airtoxics_data_2009_2013.csv')

names(data) <- c("AQSID", "POC", "Parameter", "Date", "Result", "Null_Data_Code", "MDL", "Pollutant", "Year", "CAS")

POC_average = function(data) {
  
  library(tidyverse)
  
  POC_averaging <- function(Result, Censored) {
    
    Result <- Result[!is.na(Result)]
    
    Censored <- Censored[!is.na(Censored)]
    
    if(all(Censored, na.rm = T)) {
      return (mean(Result, na.rm = T))
    }
    else {
      return (mean(Result[!Censored], na.rm = T ) )
    }
    
  }
  
  data <- data %>% 
          group_by(AQSID, Parameter, Pollutant, Date, MDL) %>% 
          summarise(Result = POC_averaging(Result, Censored), Censored = all(Censored, na.rm = T) ) %>%
          mutate(Result = ifelse(is.na(Result), NA, Result), Censored = ifelse(is.na(Result), NA, Censored))
  
  return (select(data, AQSID, Parameter, Pollutant, Date, Result, MDL, Censored) ) %>% ungroup()
}

POC_averaged_data <- data %>% mutate(Censored = Result < MDL) %>% POC_average()

Back to top