So far on our blog, we’ve covered general descriptive statistics, such as yearly attribute trends, points won distributions, and competitiveness. We build on those findings by creating a model capable of predicting ATP tennis match winners. Using historical data points, we achieve 81% accuracy in predicting match winners for the 2016 Australian Open. We delve into the development process and share our predictions below.
To predict which player will win a given match, we frame the problem as classification: a win is coded as 1 and a loss as 0. We train our model on matches from January 2000 up to, but not including, the 2016 Australian Open, which gives us broad exposure to different outcomes. We hold the 2016 Australian Open out of the training data so that we can use it to calculate our eventual prediction accuracy.
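As a minimal sketch of the win/loss coding and the holdout split described above, assuming hypothetical field names, placeholder players, and a cutoff date of 18 January 2016 for the start of the tournament (none of these come from our actual pipeline):

```python
from datetime import date

# Assumed cutoff: first day of 2016 Australian Open main-draw play.
HOLDOUT_START = date(2016, 1, 18)

# Toy rows with hypothetical field names; not the blog's real data.
matches = [
    {"date": date(2015, 8, 31), "tourney": "US Open",
     "player": "A", "opponent": "B", "player_won": True},
    {"date": date(2016, 1, 4), "tourney": "Brisbane",
     "player": "A", "opponent": "C", "player_won": False},
    {"date": date(2016, 1, 18), "tourney": "Australian Open",
     "player": "B", "opponent": "C", "player_won": True},
]


def label(match):
    """Encode the target: 1 for a win, 0 for a loss."""
    return 1 if match["player_won"] else 0


def split_by_date(rows, cutoff):
    """Rows before the cutoff train the model; the rest are held out."""
    train = [r for r in rows if r["date"] < cutoff]
    holdout = [r for r in rows if r["date"] >= cutoff]
    return train, holdout


train, holdout = split_by_date(matches, HOLDOUT_START)
train_labels = [label(m) for m in train]
```

Splitting by date rather than at random keeps every training match strictly earlier than the matches being scored, which mirrors how the model would be used on real, unplayed matches.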
One of the main features of our model input data is that it is “aggregated historically”. This means that for every match fed into the model, there are corresponding historical variables for each player. For example, we include the historical head-to-head record, same tournament performance last year, surface record, average aces on surface, average break points converted/saved, etc. We use historical numbers because the statistics of the match being predicted are not known in advance. We also add “current” stats, such as the differences in height, age, rank, and rank points between the players, as we know these going into any given match. In this way, we use past performance, along with current attributes, to predict future results.
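To illustrate what “aggregated historically” means in practice, here is a small sketch that computes one such variable, the head-to-head record, using only matches played before the match being predicted. The data layout, the function name `head_to_head_before`, and the placeholder players are assumptions for illustration.

```python
from datetime import date

# Toy match history: (date, winner, loser). Names are placeholders.
history = [
    (date(2014, 5, 1), "A", "B"),
    (date(2015, 3, 10), "B", "A"),
    (date(2015, 9, 2), "A", "B"),
    (date(2016, 1, 20), "A", "B"),  # the match we want to predict
]


def head_to_head_before(rows, player, opponent, as_of):
    """Count player's wins and losses vs. opponent strictly before as_of."""
    wins = losses = 0
    for played_on, winner, loser in rows:
        if played_on >= as_of:
            continue  # skip the current match and anything later
        if winner == player and loser == opponent:
            wins += 1
        elif winner == opponent and loser == player:
            losses += 1
    return wins, losses


h2h = head_to_head_before(history, "A", "B", as_of=date(2016, 1, 20))
```

The strict date filter is the important part: if the match being predicted leaked into its own head-to-head count, the model would be trained on information it could never have before a real match.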
This method differs slightly from that of other forecasters in this space. Upon examination of FiveThirtyEight’s 2016 US Open predictions, we note that they create a current “Tournament Elo”, or a power rating based on head-to-head matches between players. We examine their performance in the results section below.
Our model is created using a stepwise selection method that automatically picks the most statistically significant set of variables. This has programming and computational benefits, but it makes it harder to see the impact that individual variables have on match outcome: as the model grows more complex (more variables), the true coefficients of the input variables become obscured. Coefficient interpretation is slated for further research, as our current goal is to create the most accurate model possible. Still, it can be meaningful to examine a sample of the “significant” variables that ended up being picked by the selection method:
- Rank Point Difference and Rank Difference – The differentials in ranking points and ranking capture the overall performance gap between the player and his opponent.
- Player Career Wins and Losses on Surface – In this case, the wins and losses on hard courts, as that is the surface the Australian Open is played on.
- Recent Wins and Losses – Encompasses more recent player performance
- Head-to-Head Wins and Losses vs. Opponent
- Player and Opponent Entry Type – Qualifier, Lucky Loser, or Direct Admittance
- Career % of Points Won
- Aces, First Serve In %, First Serve Points Won, Second Serve Points Won in Current Tournament – stats from earlier rounds
- Player Age and Opponent Age
- Game Differences and Number of Sets Played in Current Tournament – Variables that represent the player’s previous margin of victory. This has the potential to favor a player who has been more dominant in the current tournament.
- Break Points Saved and Converted – Shows how well a player performs under pressure
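For readers unfamiliar with stepwise selection, the sketch below shows the general shape of a forward version: start with no variables and repeatedly add the one that helps most, stopping when nothing helps enough. It is an illustration only. Our selection is based on statistical significance, whereas this sketch uses a generic `score` function and a `min_gain` threshold, and both the toy variable names and `toy_score` are made up.

```python
def forward_stepwise(candidates, score, min_gain=1e-3):
    """Greedily add the variable that most improves score(selected)."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Score the model with each remaining variable added on its own.
        trial = {name: score(selected + [name]) for name in remaining}
        top = max(trial, key=trial.get)
        if trial[top] - best < min_gain:
            break  # no remaining variable helps enough
        selected.append(top)
        remaining.remove(top)
        best = trial[top]
    return selected


# Toy stand-in for "fit the model on these variables, return a fit score".
TOY_GAIN = {"rank_point_diff": 0.10, "surface_wins": 0.03, "noise": 0.0}


def toy_score(variables):
    return 0.5 + sum(TOY_GAIN[v] for v in variables)


chosen = forward_stepwise(TOY_GAIN, toy_score)
```

In this toy run the two informative variables are kept and the uninformative one is dropped. With real data, `score` would refit the model each time, which is why the procedure is convenient to automate but hard to interpret.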
With the above inputs (as well as other variables), our model outputs a number from 0 to 1, which is interpreted as a percent and represents the likelihood of a player winning his match.
Please see 2016 Australian Open Round of 32 matches and forward predictions below. For a full list, download the pdf here: 2016-Australian-Open-Predictions
We count a model prediction score above 50% as a predicted win for the player indicated, and a score below 50% as a predicted loss. It should be noted that, for each round, regardless of whether our model was correct, we use the actual winner when making predictions for the next round.
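The 50% rule can be written as a small function. This is a sketch with a hypothetical name, `predicted_outcome`; the handling of a score of exactly 50% is an assumption, since that case is not defined above.

```python
def predicted_outcome(score):
    """Map a model score in [0, 1] to a predicted win or loss."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score > 0.5:
        return "win"
    if score < 0.5:
        return "loss"
    return None  # exactly 50%: treated here as no pick (assumption)
```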
As you can see, our model does a pretty good job of predicting outcomes in the later stages of a tournament. There are two reasons for this:
- We created current tournament performance variables, such as Aces, First Serve Won %, and First Serve In %, which are only populated from the second round of a tournament onward.
- Players that reach the later stages of the tournament have been more successful on the tour and thus have more historical matches that our model can train on.
In earlier rounds, our model may experience match-ups that it has never encountered before, making variables like head-to-head meaningless.
We predicted 101 out of 125 matches correctly, or 81%. We were unable to create predictions for two first round matches due to insufficient data.
How does our model compare to just picking the higher seed every time?
The accuracy of picking the higher seed as the winner of every match is around 76%, which is 5 percentage points lower than our results. This is great news, as it means that our model taps into a signal above and beyond just picking the “better-ranked” player. The same holds for head-to-head, as our model can still pick the correct winner even if he has a losing head-to-head record against his opponent.
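The comparison above can be sketched as follows. The baseline here picks the better-ranked player, which is an assumption about how “higher seed” is applied to unseeded players, and the four toy matches are invented. Only the 101 of 125 and 76% figures come from this post.

```python
def pick_better_ranked(rank_a, rank_b):
    """Baseline: pick whichever player has the better (lower) ranking."""
    return "a" if rank_a < rank_b else "b"


def accuracy(picks, winners):
    """Share of matches where the pick equals the actual winner."""
    correct = sum(p == w for p, w in zip(picks, winners))
    return correct / len(winners)


# Toy matches: (rank of player a, rank of player b, actual winner).
toy = [(1, 40, "a"), (5, 47, "b"), (10, 12, "a"), (30, 8, "b")]
baseline_picks = [pick_better_ranked(a, b) for a, b, _ in toy]
baseline_acc = accuracy(baseline_picks, [w for _, _, w in toy])

# Figures reported in this post.
model_acc = 101 / 125                     # 0.808, reported as 81%
gap_points = round(model_acc * 100) - 76  # gap in percentage points
```

Scoring both approaches on the same set of matches is what makes the gap meaningful; a baseline measured on a different tournament would not be directly comparable.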
How does it compare to other models?
The 538 predictions for the 2016 US Open were about 75% accurate, which is similar to the accuracy we get from picking the higher seed every time. Because it is a different tournament, a direct comparison may not be fair, but the figure may be representative of a general accuracy rate for tennis predictions. It’s possible that there were more upsets at the US Open than at the Australian Open, such as Novak Djokovic losing to Stanislas Wawrinka in the final, even though that accounts for just one match.
538 uses the same data that we utilize for our blog (see our about page for more info). That data stops in January 2016, so it is unclear whether 538 built their predictions for the US Open, which begins in late August every year, without the 2016 season data. That’s about 2,000 matches that could sway a prediction model one way or another. With future data availability, we will look to make our own US Open predictions to see how this model performs for other tournaments.
Model Errors and Upsets
Any model will be wrong from time to time, or even most of the time. However, are there levels of wrongness? For example, when our model was wrong, it typically gave the eventual winner about a 30-50% chance of winning, meaning it saw a fairly close match.
Here are a few upsets along with some comments. The denoted chance of winning represents the model output for the eventual winner.
Milos Raonic defeated Wawrinka in the Round of 16 with a pre-match chance of winning of 37%. Raonic had lost each of their previous encounters, and is ranked lower than Wawrinka, so it’s not surprising that our model predicted Raonic to lose.
Roberto Bautista Agut defeated Marin Cilic in the Round of 32 with a 36% chance of winning. Cilic typically does not do well at the Australian Open but it’s still surprising that he lost early on. Even so, our model did not have him as a runaway favorite.
In the first round, the biggest upset of the tournament was Fernando Verdasco defeating Rafael Nadal, with only an 8% chance of winning. Clearly from the head-to-head history and rank difference, Nadal was the favorite.
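One way to make “levels of wrongness” concrete is to label each miss by how much chance the model gave the eventual winner. The sketch below uses the three probabilities quoted above; the function name `grade_miss`, the labels, and the use of 30% as the dividing line are illustrative assumptions.

```python
# Model output for the eventual winner in the three upsets described above.
upsets = {
    "Raonic d. Wawrinka": 0.37,
    "Bautista Agut d. Cilic": 0.36,
    "Verdasco d. Nadal": 0.08,
}


def grade_miss(winner_prob, close_floor=0.30):
    """Label a wrong pick by how much chance the model gave the winner."""
    if winner_prob >= 0.5:
        raise ValueError("not a miss: the model favored the winner")
    return "close call" if winner_prob >= close_floor else "big miss"


grades = {match: grade_miss(p) for match, p in upsets.items()}
```

Under this labeling, the Raonic and Bautista Agut results count as close calls, while Verdasco over Nadal stands out as the one match where the model was confidently wrong.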
All these upsets and close predictions actually increase our confidence in the model, as it would be just as hard for any human forecaster to predict these outcomes. The model does a good job of approximating the impactful variables that can predict matches, but is still ultimately limited by the human who designed it, made assumptions on what variables to create, and structured the data.
With this first foray into the world of prediction, we used our findings from previous posts to guide our model and variable creation. We examined some basic assumptions of our model, used statistical tools to create predictions, and looked back on our results. With time, we hope to hone our skills to forecast actual unplayed matches and share the results. Stay tuned.