Time series analysis in plot.ly

Here is a time-series graph I made from data I collected during automated neural network runs of Super Mario World.  Plot.ly makes this very easy by letting you quickly select the axes and add new variables.  Here, I discuss some of Stephen Few’s best practices for time-series analysis and how they apply to this graph:

  • Aggregating to Various Time Intervals

I’ve chosen to display this data with the time interval being each “generation” of the neural net.  Increasing the time resolution to each species and genome creates a huge amount of noise in the data, in which trends get lost.  Going by each generation makes it much, much easier to see progression.
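As a rough illustration of that aggregation, here is a minimal Python sketch. It assumes the raw data is one row per genome with a fitness score, and that each generation is summarized by its best score. The field layout, the sample numbers, and the choice of the maximum are all assumptions for the example, not a description of my actual data.

```python
def aggregate_by_generation(records):
    """Collapse per-genome rows into one (generation, best fitness) pair each.

    Each record is assumed to be (generation, species, genome, fitness).
    """
    best = {}
    for generation, _species, _genome, fitness in records:
        # Keep only the highest fitness seen so far for this generation.
        if generation not in best or fitness > best[generation]:
            best[generation] = fitness
    return sorted(best.items())


# Made-up sample rows: (generation, species, genome, fitness)
sample = [
    (0, 0, 0, 120.0), (0, 0, 1, 95.0), (0, 1, 0, 210.0),
    (1, 0, 0, 180.0), (1, 1, 0, 340.0), (1, 1, 1, 60.0),
]

print(aggregate_by_generation(sample))
```

Plotting the resulting pairs gives one point per generation instead of one per genome, which is what removes most of the noise.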

  • Viewing Time Periods in Context

Including the data from the start to the finish of each level allows us to see the entire progression of each generation of the neural network.  It’s easy to narrow down the time scale to make it look like there has been no progress made.  Plot.ly makes it very easy to zoom into smaller time periods, but also allows easy access to the full context of the graph.

  • Optimizing a Graph’s Aspect Ratio

The problem here might be how I have my website set up (I’m working on a redesign), but this graph is very cramped.  Viewing it on Plot.ly’s site is much more comfortable.  Yoshi’s Island 4 was completed relatively quickly, but that is hard to see on my site because of where its line sits in the cramped graph.

  • Stacking Line Graphs to Compare Multiple Variables

I’ve combined several levels of data to see how progression and the number of generations to completion vary.  Stacking the line graphs makes it much easier to read than having a set of individual graphs, though organizing it can create other readability troubles.

  • Expressing Time as 0-100% to Compare Asynchronous Processes

I did not use this in my graph, but I think it would be interesting to see.  Doing so could possibly uncover some level design patterns.  The Super Mario games are known for introducing specific level mechanics early on, generally in a safe way, and then ramping up their use.  I suspect that this would be evident when viewing each level’s progress as a percentage of completion.
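A minimal sketch of how that rescaling could work, assuming each level’s data is a list of (generation, fitness) pairs that ends at the generation where the level was completed. The function name and the sample values are hypothetical.

```python
def to_percent_of_completion(series):
    """Rescale the generation axis of one level to 0-100%.

    `series` is assumed to be a non-empty list of (generation, fitness)
    pairs, sorted by generation, ending at the completing generation.
    """
    first = series[0][0]
    span = series[-1][0] - first
    if span == 0:
        # A level finished in a single generation maps entirely to 100%.
        return [(100.0, fitness) for _, fitness in series]
    return [((gen - first) / span * 100.0, fitness) for gen, fitness in series]


# Made-up example: two levels that took different numbers of generations.
short_level = [(0, 10), (5, 20), (10, 30)]
long_level = [(0, 5), (20, 15), (40, 25)]

print(to_percent_of_completion(short_level))
print(to_percent_of_completion(long_level))
```

Once both levels share a 0-100% axis, their lines can be overlaid directly even though one took four times as many generations as the other.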

Module 2: Beginning with plot.ly

Here is my first visualization using plot.ly. I had also wanted to add a pie chart below each user’s name to show what percentage of their total followers came from Facebook and Twitter, but I’m not sure if plot.ly allows that. Importing the data was mostly easy, except that the headers in my xlsx files were not imported as headers. I solved this by using a csv version of my data instead.
Plot 8
I created the Total variable as the sum of the Facebook and Twitter values in order to help re-express the comparison between users. The data is sorted by Total followers, from highest to lowest, which makes it easier to read. The Y axis is scaled linearly so that the differences are clear and unambiguous. Each value is color coded, which highlights each grouping and makes it easier to follow values across users. Additionally, each axis is labeled and the legend reflects the color code of each value.
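For anyone who would rather prepare the data before importing it, the Total column and the sort order can be sketched in a few lines of Python. The user names and follower counts below are placeholders, not my real data, and the field names are assumptions.

```python
def add_total_and_sort(rows):
    """Add a 'total' field (facebook + twitter) and sort by it, highest first."""
    with_total = [
        dict(row, total=row["facebook"] + row["twitter"]) for row in rows
    ]
    return sorted(with_total, key=lambda row: row["total"], reverse=True)


# Placeholder rows standing in for the imported csv data.
followers = [
    {"user": "user_a", "facebook": 1200, "twitter": 800},
    {"user": "user_b", "facebook": 3000, "twitter": 4500},
    {"user": "user_c", "facebook": 500, "twitter": 2500},
]

for row in add_total_and_sort(followers):
    print(row["user"], row["facebook"], row["twitter"], row["total"])
```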

Module 1: Introduction to Visual Analytics

I quite enjoy infographics and I regularly see creative visual implementations.  The social media outlets I follow for visualizations are the subreddits r/dataisbeautiful, r/infographics, and r/dataisugly.  r/dataisugly is a great contrast to the other two and shows a ton of great (in the bad way) examples of how bad data visualization can be.  I also look at hockeyviz.com regularly, which does a fantastic job of showing advanced visualizations, for example the relationships between player pairings and success.  Another website that I follow is spaghettimodels.com, which shows visualizations for a huge variety of weather data.  The applications that I have actual experience with are R and Excel.  Others that I am aware of but have little or no experience with are plot.ly, Gapminder, Tableau, and SAS.

For R, I like how “in charge” of your visualizations you are.  Excel can often generate a plot or chart quickly, but it frequently won’t let you make specific changes to the visualization.  Plot.ly is nice because of its accessibility, but I don’t know much about its capability.  Gapminder looks great in Hans Rosling’s TED Talks, which are some of my favorite TED Talks, but I don’t know anything about it in terms of applicability, usability, or capability.