As a lifelong Motorcycle Grand Prix (MotoGP) enthusiast, I was delighted to stumble across Ankur Vishwakarma’s Predicting MotoGP Race Finish Times using Linear Regression. It was a fantastic read, and I immediately realized that MotoGP data is now far more accessible than it once was.

As part of an undergraduate research project in 2008, I briefly tried to use similar datasets to explore the effects of rider attrition and optimal expected performance timeframes, but unlike baseball data, MotoGP data was very difficult to obtain. Fast forward to today, and there are official repositories and web scraping resources I could only have dreamt about in 2008.

Vishwakarma’s .csv dataset was scraped from the official MotoGP website. I have since forked his repository to my GitHub account. The following is his Data Dictionary:

Summary of Data

This file contains the following data points, for race sessions only, from 2005 through 2017 across all three racing classes.


Here is the .csv data table:

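library(pander)  # provides pandoc.table()

# Raw CSV hosted on GitHub
URL <- "https://raw.githubusercontent.com/nbugliar/motogp_regression/master/MotoGP_2005_2017.csv"
dat <- read.csv( URL, stringsAsFactors=FALSE )
# Preview the first five rows of columns 2 through 22
m <- dat[1:5, 2:22]
pandoc.table(m, style= "rmarkdown")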
## 
## 
## | Year | TRK |                       Track                        | Category |
## |:----:|:---:|:--------------------------------------------------:|:--------:|
## | 2017 | QAT | Grand Prix of Qatar - Losail International Circuit |  MotoGP  |
## | 2017 | QAT | Grand Prix of Qatar - Losail International Circuit |  MotoGP  |
## | 2017 | QAT | Grand Prix of Qatar - Losail International Circuit |  MotoGP  |
## | 2017 | QAT | Grand Prix of Qatar - Losail International Circuit |  MotoGP  |
## | 2017 | QAT | Grand Prix of Qatar - Losail International Circuit |  MotoGP  |
## 
## Table: Table continues below
## 
##  
## 
## | Session |    Date    | Track_Condition | Track_Temp | Air_Temp |
## |:-------:|:----------:|:---------------:|:----------:|:--------:|
## |   RAC   | 2017-03-26 |       Dry       |     22     |    21    |
## |   RAC   | 2017-03-26 |       Dry       |     22     |    21    |
## |   RAC   | 2017-03-26 |       Dry       |     22     |    21    |
## |   RAC   | 2017-03-26 |       Dry       |     22     |    21    |
## |   RAC   | 2017-03-26 |       Dry       |     22     |    21    |
## 
## Table: Table continues below
## 
##  
## 
## | Humidity | Position | Points | Rider_Number |    Rider_Name     |
## |:--------:|:--------:|:------:|:------------:|:-----------------:|
## |   0.96   |    1     |   25   |      25      | Maverick VIÑALES  |
## |   0.96   |    2     |   20   |      4       | Andrea DOVIZIOSO  |
## |   0.96   |    3     |   16   |      46      |  Valentino ROSSI  |
## |   0.96   |    4     |   13   |      93      |   Marc MARQUEZ    |
## |   0.96   |    5     |   11   |      26      |   Dani PEDROSA    |
## 
## Table: Table continues below
## 
##  
## 
## | Nationality |       Team_Name        |  Bike  | Avg_Speed |   Time    |
## |:-----------:|:----------------------:|:------:|:---------:|:---------:|
## |     SPA     | Movistar Yamaha MotoGP | Yamaha |   165.5   | 38'59.999 |
## |     ITA     |      Ducati Team       | Ducati |   165.5   |  +0.461   |
## |     ITA     | Movistar Yamaha MotoGP | Yamaha |   165.4   |  +1.928   |
## |     SPA     |   Repsol Honda Team    | Honda  |    165    |  +6.745   |
## |     SPA     |   Repsol Honda Team    | Honda  |    165    |  +7.128   |
## 
## Table: Table continues below
## 
##  
## 
## |        Finish_Time        |                 GP                 |
## |:-------------------------:|:----------------------------------:|
## | 0 days 00:38:59.999000000 | QAT - Losail International Circuit |
## | 0 days 00:39:00.460000000 | QAT - Losail International Circuit |
## | 0 days 00:39:01.927000000 | QAT - Losail International Circuit |
## | 0 days 00:39:06.744000000 | QAT - Losail International Circuit |
## | 0 days 00:39:07.127000000 | QAT - Losail International Circuit |
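
The Finish_Time strings in the table above are not directly numeric, so any modeling would probably need them converted first. The following is a minimal sketch of one way to do that, not necessarily how Vishwakarma handled it. It assumes every value follows the "D days HH:MM:SS.fraction" layout seen in the preview; `finish_to_seconds` is a hypothetical helper name, and the three sample values are copied from the table rather than read from the file.

```r
# Sample values copied from the Finish_Time column in the preview above.
finish_time <- c("0 days 00:38:59.999000000",
                 "0 days 00:39:00.460000000",
                 "0 days 00:39:01.927000000")

# Hypothetical helper: convert "D days HH:MM:SS.fraction" strings to seconds.
# Assumes the layout shown in the preview; anything else becomes NA.
finish_to_seconds <- function(x) {
  pattern <- "^([0-9]+) days ([0-9]+):([0-9]+):([0-9]+\\.?[0-9]*)$"
  parts <- regmatches(x, regexec(pattern, x))
  vapply(parts, function(p) {
    if (length(p) != 5) return(NA_real_)  # string did not match the layout
    v <- as.numeric(p[2:5])               # days, hours, minutes, seconds
    v[1] * 86400 + v[2] * 3600 + v[3] * 60 + v[4]
  }, numeric(1))
}

secs <- finish_to_seconds(finish_time)

# Gap to the fastest finisher among these rows (all from the same race).
gap_to_winner <- secs - min(secs, na.rm = TRUE)
```

For these sample rows the computed gaps line up with the Time column (+0.461 and +1.928). On the full dataset the minimum would have to be taken within each race rather than across all rows.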


There are many different avenues one could pursue with this information. I hope you find it as valuable as I have, and I welcome any and all feedback and recommendations on how best to use this dataset in the future.

Upon an initial inspection, I think that the following should take place:

Thoughts? my email