Fire Coop: A Peek at the Prevalence of Regime Change Discussions on r/tampabaylightning

Gist available on my GitHub

First, we’ll start by installing Mike Kearney’s rreddit package. It’s kinda still in development, I guess, but it gets the job done. Some functions from it were either not found or broken, but that may be because I was too lazy to update my version of R from 3.5.0. If you have problems, just throw “mkearney %broken function%” into the ol’ Google and you’ll be good to go. Below, I have included fixed/usable functions that should fulfill everything needed by the rreddit package.

#remotes::install_github("mkearney/rreddit")
#remotes::install_github("mkearney/tbltools")
library(rreddit)
library(tidyverse)
library(tbltools)
library(lubridate)
library(hexbin)

# Convert to a tibble; validate = FALSE skips the column-name checks
as_tbl <- function(x, ..., validate = FALSE) {
    tibble::as_tibble(x, ..., validate = validate)
}

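# Page backwards through Pushshift comment results, up to 1000 per request.
# Note: `after` is sent as the API's `before` parameter, so each request fetches older comments.
get_comment_reddit <- function(subreddit = "all", author = NULL, n = 1000, after = NULL) {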
  n <- ceiling(n / 1000)
  x <- vector("list", n)
  for (i in seq_along(x)) {
    url <- "https://api.pushshift.io/reddit/search/comment/?size=1000"
    if (!identical(subreddit, "all")) {
      url <- paste0(url, "&subreddit=", subreddit)
    }
    if (!is.null(author)) {
      url <- paste0(url, "&author=", author)
    }
    if (!is.null(after)) {
      url <- paste0(url, "&before=", as.numeric(after))
    }
    r <- httr::GET(url)
    j <- httr::content(r, as = "text", encoding = "UTF-8")
    j <- jsonlite::fromJSON(j)
    x[[i]] <- as_tbl(non_recs(j$data))
    if (!"created_utc" %in% names(x[[i]])) break
    x[[i]] <- formate_createds(x[[i]])
    after <- x[[i]]$created_utc[nrow(x[[i]])]
    if (length(after) == 0) break
    #tfse::print_complete(
    #  "#", i, ": collected ", nrow(x[[i]]), " posts"
    #)
  }
  tryCatch(docall_rbind(x),
           error = function(e) x)
}

# Same paging logic as get_comment_reddit(), but for submissions (posts).
get_r_reddit <- function(subreddit = "all", n = 1000, after = NULL) {
  n <- ceiling(n / 1000)
  x <- vector("list", n)
  for (i in seq_along(x)) {
    url <- "https://api.pushshift.io/reddit/search/submission/?size=1000"
    if (!identical(subreddit, "all")) {
      url <- paste0(url, "&subreddit=", subreddit)
    }
    if (!is.null(after)) {
      url <- paste0(url, "&before=", as.numeric(after))
    }
    r <- httr::GET(url)
    j <- httr::content(r, as = "text", encoding = "UTF-8")
    j <- jsonlite::fromJSON(j)
    x[[i]] <- as_tbl(non_recs(j$data))
    if (!"created_utc" %in% names(x[[i]])) break
    x[[i]] <- formate_createds(x[[i]])
    after <- x[[i]]$created_utc[nrow(x[[i]])]
    if (length(after) == 0) break
    # tfse::print_complete(
    #   "#", i, ": collected ", nrow(x[[i]]), " posts"
    # )
  }
  tryCatch(docall_rbind(x),
    error = function(e) x)
}

non_recs <- function(x) {
  x[!sapply(x, is.recursive)]
}

formate_createds <- function(d) {
  if ("created" %in% names(d)) {
    d$created <- as.POSIXct(d$created, origin = "1970-01-01")
  }
  if ("created_utc" %in% names(d)) {
    d$created_utc <- as.POSIXct(d$created_utc, origin = "1970-01-01", tz = "UTC")
  }
  d
}

docall_rbind <- function(...) {
  dfs <- list(...)
  if (length(dfs) == 1L && is.list(dfs[[1]]) &&
      is.data.frame(dfs[[1]][[1]])) {
    dfs <- dfs[[1]]
  }
  nms <- unlist(lapply(dfs, names))
  nms <- table(nms)
  max_n <- max(nms, na.rm = TRUE)
  nms <- names(nms)[nms == max_n]
  dfs <- lapply(dfs, function(.x) .x[nms])
  dfs <- do.call("rbind", dfs, quote = TRUE)
  dfs <- dfs[!duplicated(dfs$id), ]
  dfs
}

Through some trial and error, I discovered that there are 335,116 comments on r/tampabaylightning, dating as far back as May 5th, 2010, which brings me to the first interesting point. It’s fairly common knowledge that Jon Cooper is the longest-tenured coach in the NHL, but it’s interesting that he’s been the coach for two-thirds of the subreddit’s existence. It’s probably safe to say that Cooper is the only head coach that many of the subreddit’s users have seen in action.
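As a quick sanity check on that fraction, here is a rough back-of-the-envelope calculation. The start date is the earliest comment mentioned above and the hire date is the one used in the plot further down; the as-of date of December 22nd, 2019 is my assumption for when this was written, so treat the result as approximate.

```r
# First-comment and hire dates come from this post; `as_of` is an assumed writing date.
sub_start <- as.Date("2010-05-05")
hired     <- as.Date("2013-03-25")
as_of     <- as.Date("2019-12-22")  # assumption

sub_days     <- as.numeric(as_of - sub_start)  # age of the subreddit in days
cooper_days  <- as.numeric(as_of - hired)      # Cooper's tenure in days
tenure_share <- cooper_days / sub_days
round(tenure_share, 2)
```

Under those assumptions the share comes out at roughly 0.7, so "two-thirds" is a slight understatement.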

Anyways, we’ll pull all of the existing comments with the following… Actually, let’s get the posts, too, while we’re at it (trial and error tells me that there were 27,106 posts):

c <- get_comment_reddit("tampabaylightning", n = 335112)
d <- get_r_reddit("tampabaylightning", n = 27106)
# c <- get_comment_reddit("tampabaylightning", n = 5000)
# d <- get_r_reddit("tampabaylightning", n = 3000)
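For a sense of how much work those two calls do: the helper functions request results in pages of 1000 and run `ceiling(n / 1000)` requests, so the `n` values above translate into request counts like this (a small illustration, not part of the original analysis):

```r
# Number of requests each pull makes, given the 1000-item page size
# hard-coded in the helper functions.
page_size <- 1000
requests <- ceiling(c(comments = 335112, posts = 27106) / page_size)
requests
```

That works out to 336 requests for comments and 28 for posts, all made one after another.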

That took a few minutes, but the job got done. Let’s have a look at the general distribution of comments and posts over time.

ggplot() +
  geom_density(data = c, aes(x = created_utc, color = 'Comments')) +
  geom_density(data = d, aes(x = created_utc, color = 'Posts')) +
  geom_vline(aes(xintercept = as.POSIXct('2013-03-25'), color = 'Cooper Hired')) +
  scale_x_datetime(breaks = c(as.POSIXct('2010-01-01'), as.POSIXct('2020-01-01')), date_breaks = '1 year', date_labels = "%Y")

Those peaks clearly mark the playoffs and really show the drop in activity during the failure-to-launch 16-17 season, but the early exit of the 18-19 season is masked by the activity from the wildly successful regular season.

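# Comments and posts that contain "fire coop" (case-insensitive substring match)
e <- c %>% filter(str_detect(tolower(body), 'fire coop'))
f <- d %>% filter(str_detect(tolower(title), 'fire coop'))
# Flag matching comments and derive the Eastern-time calendar date and year
c <- c %>% mutate(fc_y = case_when(str_detect(tolower(body), 'fire coop') ~ 1,
                                   TRUE ~ 0),
                  date = as.Date(created_utc, tz = 'EST'),
                  year = year(date))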
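The time zone matters here because most games end late in the evening Eastern time, which is already the next day in UTC. A small sketch of the difference, using made-up timestamps; the `America/New_York` comparison is my own addition and not something the code above uses:

```r
# A made-up comment posted late on a game night, stored in UTC
late_night <- as.POSIXct("2019-12-20 02:30:00", tz = "UTC")
utc_date <- as.Date(late_night, tz = "UTC")  # calendar date in UTC
est_date <- as.Date(late_night, tz = "EST")  # calendar date at UTC-5 (9:30 PM the night before)

# "EST" is a fixed UTC-5 offset, so it ignores daylight saving time.
# A region-based zone (an assumption, not used in the original code) differs in summer.
summer     <- as.POSIXct("2019-06-01 04:30:00", tz = "UTC")
summer_est <- as.Date(summer, tz = "EST")
summer_ny  <- as.Date(summer, tz = "America/New_York")
```

During the season the two zones mostly agree, so the fixed offset is probably fine for this analysis, but summer timestamps between 11 PM and midnight EST would land on different dates.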

There are apparently only 266 comments and 15 posts that include the string “fire coop”, which I find crazy because it seems like every post-game thread has been absolutely spammed with that statement for the last year.
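Part of the reason the count feels low may be that this is an exact substring match. Here is a small sketch on made-up comments showing what the match catches and misses; the looser regular expression is purely hypothetical and was not used for any of the numbers in this post:

```r
library(stringr)

# Made-up example comments
toy <- c(
  "FIRE COOP",
  "Time to fire Cooper.",
  "Fire Jon Cooper already",
  "I love Coop",
  "fire  coop"  # two spaces
)

# The match used in this post: lowercase, then look for the exact substring
strict <- str_detect(tolower(toy), "fire coop")

# A hypothetical looser pattern: any whitespace, optional "jon"
loose <- str_detect(toy, regex("fire\\s+(jon\\s+)?coop", ignore_case = TRUE))

data.frame(toy, strict, loose)
```

The strict match picks up "fire Cooper" but not "Fire Jon Cooper" or anything with odd spacing, so the 266 is best read as a lower bound.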

c %>%
  filter(date >= as.Date('2015-01-01')) %>%
  group_by(date) %>%
  summarize(perc_fc = sum(fc_y) / n()) %>%
  ggplot() +
  geom_hex(aes(x = date, y = perc_fc)) +
  scale_x_date(breaks = c(as.Date('2015-01-01'), as.Date('2020-01-01')), date_breaks = '1 year', date_labels = "%Y")

c %>%
  filter(date >= as.Date('2015-01-01')) %>%
  group_by(date) %>%
  summarize(perc_fc = sum(fc_y) / n()) %>%
  mutate(year = year(date)) %>%
  ggplot() +
  geom_hex(aes(x = date, y = perc_fc)) +
  facet_wrap(~ year, scales = 'free_x') +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
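To make it clear what each plotted point represents, here is the same daily calculation on a tiny made-up data set. The column names mirror the ones above; the data and the extra count columns are invented for illustration:

```r
library(dplyr)

# Six made-up comments over two days; fc_y = 1 marks a "fire coop" comment
toy <- tibble(
  date = as.Date(c("2019-12-01", "2019-12-01", "2019-12-01", "2019-12-01",
                   "2019-12-02", "2019-12-02")),
  fc_y = c(1, 0, 0, 0, 0, 0)
)

daily <- toy %>%
  group_by(date) %>%
  summarize(n_comments = n(),
            n_fc = sum(fc_y),
            perc_fc = n_fc / n_comments)
daily
```

Note that `perc_fc` is a proportion between 0 and 1 rather than a percentage, and that a quiet day with a single matching comment can produce a large value, so the daily comment count is worth keeping in mind when reading the plots.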

It looks like the calls for canning aren’t just coming from your grandmother during strawberry season, but they’ve also jumped way up, even since the season we missed the playoffs. Interestingly, it doesn’t look like the calls for firing Cooper have necessarily increased throughout this season. Perhaps we should look to see if there’s a secret “CAN COOP” clubhouse that keeps regurgitating the same calls.

e %>%
  mutate(date = as.Date(created_utc, tz = 'EST'),
         year = year(date)) %>%
  group_by(date) %>%
  count(author) %>%
  arrange(desc(n)) %>%
  filter(n > 1)

If we figure that, after a bad game, someone might call for firing Cooper in the Post-Game threads and then maybe in another thread, then we only have 2 users that have really gone above-and-beyond in a single day. u/BigSaveBigCat is really the most egregious offender for a single-day spam session, but let’s look at who has been most consistent.

e %>%
  mutate(date = as.Date(created_utc, tz = 'EST'),
         year = year(date)) %>%
  count(author) %>%
  arrange(desc(n)) %>%
  filter(n > 2)

I see u/Boltsfan91 in both of these lists, which brings one point to a head: people quoting other people are getting lumped in with the wrong crowd. For example, here are u/Boltsfan91’s 3 comments:

e %>% filter(author == 'Boltsfan91') %>% select(body)

The comments are pretty long, but the gist is that they were not calling for the end of Cooper’s tenure. I should probably try some sentiment analysis on these comments to try to weed out the pro-Cooper ones, but it’s the Sunday before Christmas and I don’t really feel like looking up that stuff right now.
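Short of real sentiment analysis, one cheap improvement might be to drop quoted lines before matching, since replies that quote a “fire coop” comment are the obvious false positives. This is only a sketch on made-up comments: `strip_quotes()` is a hypothetical helper, and it assumes quoted lines start with `>` or the HTML-escaped `&gt;`, which I have not checked against the actual data.

```r
library(stringr)

# Hypothetical helper: remove lines that look like Reddit quotes.
# Assumes quoted lines begin with ">" or "&gt;" (optionally indented).
strip_quotes <- function(body) {
  lines <- str_split(body, "\n")
  vapply(lines, function(l) {
    keep <- !str_detect(l, "^\\s*(>|&gt;)")
    paste(l[keep], collapse = "\n")
  }, character(1))
}

# Made-up comments: two replies quoting someone else, one original call
toy <- c(
  "> fire coop\nNo, that would be a mistake.",
  "&gt; FIRE COOP\n\nI disagree.",
  "Honestly, fire Coop."
)

own_words <- str_detect(tolower(strip_quotes(toy)), "fire coop")
own_words
```

This would not catch someone paraphrasing or arguing against the idea in their own words, so it narrows the problem rather than solving it.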

It looks like there isn’t just a brigade of users that are constantly calling for resignation, nor are they really spamming across multiple threads in a day. I feel like that might make this discussion even more concerning, as the increase in posts is driven by the everyman and not just a few fair-weather fans.

Regardless of sentiment, however, the point still stands that Cooper’s position is a hot topic for discussion. And, while the Lightning aren’t quite at the 2018 Blues’ level of performance, they aren’t doing nearly as well as we’d expected. I’d bet that most people are willing to agree that we have all the pieces to be successful, but the question is whether or not we have the directions to get to the Cup. Additionally, I don’t think anyone would be surprised if Cooper got fired. It seems like everyone has been calling every game “important” now, but the game against the Panthers on 12/23 seems like it will be a particularly important game for the Bolts. The Cats are currently teetering between a playoff spot and a wild card spot and, even though we are a game behind them, we’re still 4 points away from taking the playoff slot, and a loss to them will put us that much further behind.