Visuals in R: Minute by Minute GPS Data

Having briefly covered why we would use R to create plots, we will now look at creating plots in R with the ggplot2 package. Let’s start with the data we created in this blog, where we broke down a single GPS export into minute-by-minute distance along with some speed-threshold-based data.

We will start with a simple bar chart using the following script:

  • ggplot(data=Min_by_min, aes(x=one_Min, y=TotalDist_Min))+
    geom_bar(stat = 'identity')
  • In the above we are mapping the data to the plot in the first line where data= assigns the dataframe we want to reference, then within the aes we set x= as the data for the x-axis and y= as the y-axis. If we were to plot just the first line we would have an empty plot as we haven’t defined how we want the data plotted yet.
  • geom_bar is the function for a barplot; however, we must use stat = "identity" in order to plot the actual values in the y column rather than the default count of rows.
[Figure: Product of the above script]
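
For anyone without the GPS export to hand, here is a minimal sketch of the same bar chart on made-up data. The distance values are invented purely for illustration; only the column names one_Min and TotalDist_Min come from the post. It also shows geom_col(), which is ggplot2's shorthand for geom_bar(stat = 'identity').

```r
library(ggplot2)

# Hypothetical stand-in for the Min_by_min dataframe (values are made up)
set.seed(1)
Min_by_min <- data.frame(
  one_Min       = 1:10,
  TotalDist_Min = round(runif(10, min = 40, max = 180), 1)
)

# First line only: the data is mapped but no layer is drawn, so the plot is empty
empty_plot <- ggplot(data = Min_by_min, aes(x = one_Min, y = TotalDist_Min))

# Adding geom_bar(stat = 'identity') draws the actual y values as bars
bar_plot <- empty_plot + geom_bar(stat = 'identity')

# geom_col() is shorthand for the same layer
col_plot <- empty_plot + geom_col()

print(bar_plot)
```

Printing empty_plot on its own shows just the axes and grey background, which is the empty plot described above.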

While the above is a quick way of plotting the data to give us an overview, in terms of visualising the data it’s lacking. Here are some of the areas we will look to improve upon:

  • Lack of colour
  • X and Y axis labels are hard to see and have large tick intervals
  • No indication of high or low values, through colour or labels
  • No plot title

Luckily ggplot comes with some built-in themes we can use to set some of the smaller details within a plot, with theme_bw and theme_minimal being my go-to choices.

  • ggplot(Min_by_min, aes(one_Min, TotalDist_Min))+geom_bar(stat = 'identity')+theme_minimal()
    • Adding a small piece to the end can make a large difference to the overall plot
    • Also, although data=, x= and y= were written out in the earlier script, we don’t usually need to type them: ggplot() takes the dataframe as its first argument and aes() takes x and y as its first two, so they can be matched by position.

Next we need some colour so we don’t have people looking at grey bars all day.

To add colour we have a choice of including it inside the aesthetic or outside it. Inside means we are mapping the colour to the data; outside means we are setting one fixed colour for the whole layer. Depending on the type of plot we can use either colour= (American or British spelling will both work) or fill=; for a barplot colour= affects the outline of the bars whereas fill= fills the bars with colour. When the column mapped to fill is numeric, as our Y-values are, ggplot will automatically apply a continuous colour gradient. For our plot we will do the following:

  • ggplot(Min_by_min,aes(one_Min,TotalDist_Min,fill=TotalDist_Min))+geom_bar(stat = 'identity')+
    theme_minimal()
[Figure: Fill added]
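
To make the inside/outside distinction concrete, here is a small sketch on made-up values. The data and the 'steelblue' and 'black' colour choices are assumptions for illustration only.

```r
library(ggplot2)

# Made-up values standing in for the real Min_by_min data
Min_by_min <- data.frame(
  one_Min       = 1:6,
  TotalDist_Min = c(95, 130, 60, 155, 110, 80)
)

# Outside aes(): one fixed fill and outline for every bar, no legend
fixed_fill <- ggplot(Min_by_min, aes(one_Min, TotalDist_Min)) +
  geom_bar(stat = 'identity', fill = 'steelblue', colour = 'black') +
  theme_minimal()

# Inside aes(): fill is mapped to the data, so each bar is shaded
# by its own value and a gradient legend is added
mapped_fill <- ggplot(Min_by_min, aes(one_Min, TotalDist_Min, fill = TotalDist_Min)) +
  geom_bar(stat = 'identity') +
  theme_minimal()

print(fixed_fill)
print(mapped_fill)
```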

I mentioned earlier that the X and Y axis labels didn’t look the best; however, the solution for the X-axis isn’t the most obvious. As the minute number column, one_Min, holds the minutes as numbers, ggplot reads it as a numerical variable, whereas from our perspective it is a factor variable, i.e. categorical in nature. We can either change the column in the dataframe itself to a factor variable through:

  • Min_by_min$one_Min <- factor(Min_by_min$one_Min)

Or we can carry this out within the script for the plot itself:

  • ggplot(Min_by_min,aes(factor(one_Min),TotalDist_Min,fill=TotalDist_Min))+geom_bar(stat = 'identity')+
    theme_minimal()
  • Although a bit crowded we now have the full minutes available to view on the x-axis
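
As a quick check of what factor() does to the column, the sketch below uses a made-up five-minute dataframe (the values are invented) and inspects the class before and after the conversion.

```r
# Made-up values standing in for the real Min_by_min data
Min_by_min <- data.frame(
  one_Min       = 1:5,
  TotalDist_Min = c(95, 130, 60, 155, 110)
)

# Before: the minute column is numeric (integer), so ggplot treats it as continuous
class_before <- class(Min_by_min$one_Min)    # "integer"

# After: each minute becomes its own category (level)
minute_factor <- factor(Min_by_min$one_Min)
class_after   <- class(minute_factor)        # "factor"
minute_levels <- levels(minute_factor)       # "1" "2" "3" "4" "5"

print(class_before)
print(class_after)
print(minute_levels)
```

Because every level gets its own tick, the x-axis shows each minute rather than a handful of evenly spaced numbers.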

To set the tick frequency on the y-axis we will use the scale_y_continuous function. Its breaks argument lets us choose where the first tick, the last tick and every tick in between are placed.

  • ggplot(Min_by_min, aes(factor(one_Min), TotalDist_Min, fill=TotalDist_Min))+geom_bar(stat = 'identity')+
    scale_y_continuous(breaks=seq(0,220,20))+
    theme_minimal()
  • In scale_y_continuous we are asking for ticks starting at zero and ending at 220, with a tick interval of 20
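
The seq() call is doing the work here, so it can help to look at it on its own. The sketch below uses made-up distance values; the limits argument is an optional extra not used in the post, included only to show that the axis range is set separately from the tick positions.

```r
library(ggplot2)

# seq(from, to, by) generates the tick positions passed to breaks=
tick_positions <- seq(0, 220, 20)
print(tick_positions)  # 0 20 40 60 80 100 120 140 160 180 200 220

# Made-up values standing in for the real Min_by_min data
Min_by_min <- data.frame(
  one_Min       = 1:6,
  TotalDist_Min = c(95, 130, 60, 155, 110, 80)
)

# breaks= places the ticks; limits= (optional, an addition here) fixes the axis range
tick_plot <- ggplot(Min_by_min, aes(factor(one_Min), TotalDist_Min, fill = TotalDist_Min)) +
  geom_bar(stat = 'identity') +
  scale_y_continuous(breaks = tick_positions, limits = c(0, 220)) +
  theme_minimal()

print(tick_plot)
```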

While we are at it, let’s add some better colours to the plot to really emphasise high/low points. As our colour is mapped through fill=, we can do this using scale_fill_gradient or, in our case, scale_fill_gradient2 (the scale_colour_ equivalents apply when colour= is mapped instead). While both perform very similar functions, the second allows us to set the midpoint of our colour scale, whereas the first only allows high and low points.

  • ggplot(Min_by_min, aes(factor(one_Min), TotalDist_Min, fill=TotalDist_Min))+
    geom_bar(stat = 'identity')+
    scale_y_continuous(breaks=seq(0,220,20))+
    scale_fill_gradient2(low='blue', mid='green', high='red', midpoint = 100, name='Meters Per Min')+
    theme_minimal()
  • Here we are using scale_fill_gradient2 to set what colour we would like the lowest, highest and mid points to be, while also setting which value counts as the midpoint and giving the scale legend a title
  • Note that what you set as the midpoint here will depend both on your data and on your view of what counts as high and low within it
[Figure: X and Y axis tick rate, X axis labelling and colour gradient added]
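
If you would rather not hard-code the midpoint, one option is to derive it from the data. The sketch below uses the mean of made-up values; this is an assumption for illustration rather than what the post does, and a median or a fixed threshold from your own monitoring may suit you better.

```r
library(ggplot2)

# Made-up values standing in for the real Min_by_min data
Min_by_min <- data.frame(
  one_Min       = 1:6,
  TotalDist_Min = c(95, 130, 60, 155, 110, 80)
)

# Derive the midpoint from the data instead of typing 100
mid_value <- mean(Min_by_min$TotalDist_Min)

gradient_plot <- ggplot(Min_by_min, aes(factor(one_Min), TotalDist_Min, fill = TotalDist_Min)) +
  geom_bar(stat = 'identity') +
  scale_fill_gradient2(low = 'blue', mid = 'green', high = 'red',
                       midpoint = mid_value, name = 'Meters Per Min') +
  theme_minimal()

print(gradient_plot)
```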

Finally we will add three last pieces to our plot: a main title along with x-axis and y-axis labels; a rotated x-axis scale so the minutes are readable; and some data labels to highlight points of potential interest.

labs will allow us to change the x and y axis titles along with the main title. For the x-axis scale we will alter the plot theme using theme, and we will use geom_text along with an ifelse statement to label certain data points in the plot.

  • ggplot(Min_by_min, aes(factor(one_Min), TotalDist_Min, fill=TotalDist_Min)) +
    geom_bar(stat = 'identity') +
    scale_y_continuous(breaks=seq(0,220,20)) +
    scale_fill_gradient2(low='blue', mid='green', high='red', midpoint = 100, name='Meters Per Min') +
    labs(y = "Distance Covered Per Min (M/min)",x = "Match Minute", title ="Minute by Minute Breakdown of Distance Covered") +
    geom_text(aes(label=ifelse(TotalDist_Min>100, round(TotalDist_Min,0),''))) +
    theme_minimal() +
    theme(
    axis.text.x = element_text(angle=90),plot.title = element_text(size=20, family = 'Garamond'))
  • While labs is fairly self-explanatory, geom_text and theme need some explaining
  • First the ifelse statement:
    • ifelse(TotalDist_Min>100, round(TotalDist_Min,0),'')
    • This is similar to an IF formula in Excel: we are asking it to look at the TotalDist_Min column and, if the value is over 100, return it rounded to zero decimal places; if it is not, return a blank.
  • The label within the aesthetics of geom_text is then set to the ifelse statement, which gives us labels only on the datapoints above 100 m/min.
  • While we have used theme_minimal to determine a number of visual settings within the plot, we can also use theme to go in and tweak them ourselves. Here we wanted to change the angle at which the x-axis scale is shown, using axis.text.x within theme itself (I also added a line to format the main title). Within that argument we changed the text angle to 90 degrees. I found that 65 degrees or higher prevents the labels overlapping, but feel free to play around with your own labels
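
To see exactly what geom_text receives, the ifelse statement can be run on its own. The distances below are made up for illustration.

```r
# Made-up per-minute distances
TotalDist_Min <- c(95.4, 130.6, 60.2, 155.8, 100)

# Values over 100 are rounded and kept; everything else becomes an empty string.
# Mixing numbers and '' means the result is a character vector.
point_labels <- ifelse(TotalDist_Min > 100, round(TotalDist_Min, 0), '')

print(point_labels)  # "" "131" "" "156" ""
```

Note that a value of exactly 100 is left blank, since the test is strictly greater than 100.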

So our final masterpiece looks like this (not a bad effort for about ten lines of code):

[Figure: Plot title, X and Y axis titles and conditional data labels added]

[Figure: Where we started to where we finished]

Today’s plot script:

[Screenshot: the full plot script, as built up in the final code listing above]

Hopefully this has given an indication of how we can add layers to our plots piece by piece, as well as carry out some data analysis within the plot itself. It also shows why using R to analyse data can be very efficient. With the plot added to the end of the raw data analysis, we can now import, analyse and plot a minute-by-minute breakdown of a game or training session in less than a minute. How long would that take in different software?

PS – Here are some frequent pitfalls when plotting in R:

  • Not closing brackets in the right places or leaving one of the closing brackets out
  • Leaving out the plus + sign between each layer so they are not connected
  • Having arguments which should be within the aes outside of it or vice-versa
  • Missing commas between arguments or missing quotation marks
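
Two of these pitfalls are easy to reproduce. The sketch below uses a made-up dataframe (the names df, minute and dist are hypothetical) and wraps the failing call in tryCatch so the script keeps running.

```r
library(ggplot2)

# Made-up data; df, minute and dist are hypothetical names
df <- data.frame(minute = 1:3, dist = c(90, 120, 75))

# Pitfall: a constant inside aes() is treated as data, so ggplot picks
# its own default colour and adds a legend with a single entry "blue"
inside <- ggplot(df, aes(minute, dist, fill = 'blue')) +
  geom_bar(stat = 'identity')

# Fix: constants belong outside aes()
outside <- ggplot(df, aes(minute, dist)) +
  geom_bar(stat = 'identity', fill = 'blue')

# Pitfall: a line that starts with + is run as its own statement,
# so the layer is not attached to any plot and R raises an error
plus_result <- tryCatch(
  + geom_bar(stat = 'identity'),
  error = function(e) e
)
print(inherits(plus_result, "error"))
```

Keeping the + at the end of each line, as in the scripts above, avoids the second problem.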
