Fire Coop: A Peek at the Prevalance of Regime Change Discussions on r/tampabaylightning

Gist available on my Github

First, we’ll start by installing Mike Kearney’s rreddit package. It’s kinda still in development, I guess, but it gets the job done. Some functions from it were either not found or broken, but that may be because I was too lazy to update my version of R from 3.5.0. If you have problems, just throw “mkearney %broken function%” into the ol’ Google and you’ll be good to go. Below, I have included fixed/usable functions that should fulfill everything needed by the rreddit package.

#remotes::install_github("mkearney/rreddit")
#remotes::install_github("mkearney/tbltools")
library(rreddit)
library(tidyverse)
library(tbltools)
library(lubridate)
library(hexbin)

as_tbl <- function(x, ..., validate = FALSE) {
    tibble::as_tibble(x, ..., validate = FALSE)
}

get_comment_reddit <- function(subreddit = "all", author = NULL, n = 1000, after = NULL) {
  n <- ceiling(n / 1000)
  x <- vector("list", n)
  for (i in seq_along(x)) {
    url <- "https://api.pushshift.io/reddit/search/comment/?size=1000"
    if (!identical(subreddit, "all")) {
      url <- paste0(url, "&subreddit=", subreddit)
    }
    if (!is.null(author)) {
      url <- paste0(url, "&author=", author)
    }
    if (!is.null(after)) {
      url <- paste0(url, "&before=", as.numeric(after))
    }
    r <- httr::GET(url)
    j <- httr::content(r, as = "text", encoding = "UTF-8")
    j <- jsonlite::fromJSON(j)
    x[[i]] <- as_tbl(non_recs(j$data))
    if (!"created_utc" %in% names(x[[i]])) break
    x[[i]] <- formate_createds(x[[i]])
    after <- x[[i]]$created_utc[nrow(x[[i]])]
    if (length(after) == 0) break
    #tfse::print_complete(
    #  "#", i, ": collected ", nrow(x[[i]]), " posts"
    #)
  }
  tryCatch(docall_rbind(x),
           error = function(e) x)
}

get_r_reddit <- function(subreddit = "all", n = 1000, after = NULL) {
  n <- ceiling(n / 1000)
  x <- vector("list", n)
  for (i in seq_along(x)) {
    url <- "https://api.pushshift.io/reddit/search/submission/?size=1000"
    if (!identical(subreddit, "all")) {
      url <- paste0(url, "&subreddit=", subreddit)
    }
    if (!is.null(after)) {
      url <- paste0(url, "&before=", as.numeric(after))
    }
    r <- httr::GET(url)
    j <- httr::content(r, as = "text", encoding = "UTF-8")
    j <- jsonlite::fromJSON(j)
    x[[i]] <- as_tbl(non_recs(j$data))
    if (!"created_utc" %in% names(x[[i]])) break
    x[[i]] <- formate_createds(x[[i]])
    after <- x[[i]]$created_utc[nrow(x[[i]])]
    if (length(after) == 0) break
    # tfse::print_complete(
    #   "#", i, ": collected ", nrow(x[[i]]), " posts"
    # )
  }
  tryCatch(docall_rbind(x),
    error = function(e) x)
}

non_recs <- function(x) {
  x[!sapply(x, is.recursive)]
}

formate_createds <- function(d) {
  if ("created" %in% names(d)) {
    d$created <- as.POSIXct(d$created, origin = "1970-01-01")
  }
  if ("created_utc" %in% names(d)) {
    d$created_utc <- as.POSIXct(d$created_utc, origin = "1970-01-01", tz = "UTC")
  }
  d
}

docall_rbind <- function(...) {
  dfs <- list(...)
  if (length(dfs) == 1L && is.list(dfs[[1]]) &&
      is.data.frame(dfs[[1]][[1]])) {
    dfs <- dfs[[1]]
  }
  nms <- unlist(lapply(dfs, names))
  nms <- table(nms)
  max_n <- max(nms, na.rm = TRUE)
  nms <- names(nms)[nms == max_n]
  dfs <- lapply(dfs, function(.x) .x[nms])
  dfs <- do.call("rbind", dfs, quote = TRUE)
  dfs <- dfs[!duplicated(dfs$id), ]
  dfs
}

Through some trial and error, I discovered that there’s 335,116 comments on r/tampabaylightning which date as far back as May 5th, 2010, which brings me to the first interesting point. It’s fairly common knowledge that Jon Cooper is the most tenured coach in the NHL, but it’s interesting that he’s been the coach for 2/3rds of the subreddit’s exisitence. It’s probably safe to say that Cooper is the only head coach that many of the the subreddit’s users may have seen in action.

Anyways, we’ll do a pull of all of the existing comments with the following… Actually, lets get the posts, too, while we’re at it (Trial and Error tells me that there were 27106 posts):

c <- get_comment_reddit("tampabaylightning", n = 335112)
d <- get_r_reddit("tampabaylightning", n = 27106)
# c <- get_comment_reddit("tampabaylightning", n = 5000)
# d <- get_r_reddit("tampabaylightning", n = 3000)

That took a few minutes, but the job got done. Let’s have a look at the general distribution of comments and posts over time.

ggplot() + geom_density(data = c, aes(x = created_utc, color = 'Comments')) + geom_density(data = d, aes(created_utc, color = 'Posts')) + geom_vline(aes(xintercept = as.POSIXct('2013-03-25'), color = 'Cooper Hired')) + 
  scale_x_datetime(breaks = c(as.POSIXct('2010-01-01'), as.POSIXct('2020-01-01')), date_breaks = '1 year', date_labels = "%Y")

Those peaks clearly mark the playoffs and really show the activity drop during the failure-to-launch of the 16-17 season, but the early-exit of the 18-19 season is masked by the activity behind the wildly-successful regular season.

e <- c %>% filter(str_detect(tolower(body), 'fire coop'))
f <- d %>% filter(str_detect(tolower(title), 'fire coop'))
c <- c %>% mutate(fc_y = case_when(str_detect(tolower(body), 'fire coop') == T ~ 1,
                                   TRUE ~ 0),
                  date = as.Date(created_utc, 'EST'),
                  year = year(date))

There is apparently only 266 comments and 15 posts that include the string “fire coop”, which I find crazy because it seems like every post-game thread has been absolutely spammed with that statement for the last year.

c %>% filter(date >= as.Date('2015-01-01')) %>% group_by(date) %>% summarize( perc_fc = sum(fc_y)/ n()) %>% ggplot() + geom_hex(aes(x = date, y = perc_fc)) + scale_x_date( breaks = c(as.Date('2015-01-01'), as.Date('2020-01-01')), date_breaks = '1 year', date_labels = "%Y")

c %>% filter(date >= as.Date('2015-01-01')) %>% group_by(date) %>% summarize( perc_fc = sum(fc_y)/ n()) %>% mutate(year = year(date)) %>% ggplot() + geom_hex(aes(x = date, y = perc_fc)) +  facet_wrap( ~ year, scales= 'free_x') +  theme(axis.text.x = element_text(angle = 90, hjust = 1))