Predicting The Starting Goalie
Predicting the Choice of Starter Goalie
Jack Davis
Sportlogiq, Simon Fraser University
Motivation
Predicting which goaltender will be selected in an NHL hockey game is more difficult than equivalent choices in other sports. In MLB baseball, teams use a ‘rotation’ of starting pitchers, typically five, for the sake of load management and injury prevention. If a pitcher starts in game 1 of a season, they are likely to pitch in games 6 and 11 as well. While there is some randomness and additional strategic consideration to this process, the rotation is predictable enough that sports media frequently and accurately report the ‘probable starters’ 2 or 3 games in advance.
In NHL hockey, despite teams having fewer goalies to choose from, the prediction task is much harder, and media typically waits until a few hours before a game (e.g. after the morning skate) to report the starting goalie.
More accurate predictions are valuable to media because they give them something to report early and build a narrative around. Predictions are valuable to opposing teams because they help inform roster choices, including their own starting goaltender.
Predictions are also valuable to gamblers and bookmakers, as the choice of goalie matters a lot. The average difference between a team’s ‘main’ goalie and their ‘secondary’ goalie amounts to 0.296 [95% CI: 0.214 – 0.378] goals. For comparison, the home team advantage in the last three seasons has been worth 0.288 [95% CI 0.209 – 0.367] goals.
Preliminaries
Teams bring, or ‘dress’ two goalies for each game, but typically only use one. The second goalie is there in case of injury or dramatic underperformance by the starting goalie. Currently, no team is adventurous enough to try changing goalies during the game for other purposes like load management.
Most teams have a designated ‘main’ goalie, who typically starts 65% of games, a ‘secondary’ or ‘backup’ goalie who starts 30% of the time, and one or more emergency goalies that start the remainder of games.
A few teams currently use a hybrid approach where goalie use is closer to 50-50 between two goalies, but this strategy is situational rather than widespread. No team relies on more than two goalies regularly.
To team outsiders, the identity of the starting goalie is never 100% certain until the warmup soon before the game starts. However, the ‘morning skate’, a common practice session done a few hours before the game, is usually pretty informative.
Data Description
The dataset is aggregated from Sportlogiq’s event data from the 2017-18, 2018-19, and 2019-20 seasons. The aggregated data includes raw counts of events like shot attempts, hits, passes, and loose puck recoveries. To get a sense of the ‘stake’ of a given game, the number of season points of both teams is included as well.
The dataset covers 3585 regular season games, or 7170 goalie starts, which we split into a training and test set. The training set consists of 5084 starts in 2542 games, covering all games in the 2017-18 and 2018-19 regular NHL seasons. The test set covers the 2086 starts in 1043 games in the shortened 2019-20 season. Table 1 shows the raw counts of the response variable (main goalie used or not) by season.
Table 1: Starts by season
Number of Starts | Main goalie | Other goalie |
2017-18 Season | 1612 (63.4%) | 930 (36.6%) |
2018-19 Season | 1536 (60.4%) | 1006 (39.6%) |
2019-20 Season | 1243 (59.6%) | 843 (40.4%) |
Despite the evidence of some season-to-season drift in the way that goalies are used, a model that gives more weight to recent seasons did not result in an improved model. Details are given in the ‘dead ends’ section.
For the ‘main’ and ‘secondary’ goalies for each team, the dataset includes the number of games that each goalie has started out of the last game, the last 10 games, the last 40 games, and all games in the season so far. For each of those periods, the dataset contains the mean shots against, goals against, and save percentage per game for the main and secondary goalies.
The dataset has detailed scheduling information including the number of days since the team played, the number of days until they are expected to play again, and the number of days since each of the goalies played.
The dataset also includes matchup-specific information, such as the team’s record against the current opponent in the last season and in the last three seasons.
Model description
(If this next part sounds like technobabble, feel free to skip or skim it. All you really need to know about Table 2 is that a positive log OR next to a variable means that the main goalie is more likely to play if that thing happens (or if there is more of that thing), and a negative log OR means the main goalie is less likely to play. The bigger the log OR, the bigger the effect of that thing. Example: There is a positive value next to “playing as the home team”, and it’s not very big compared to some other effects, which means that teams are a little bit more likely to put their main goalie in if that team is playing at home.)
For performance evaluation purposes, we designated the 2017-18 and 2018-19 seasons as the training set and the 2019-20 season as the test set.
The model is a simple logistic regression with the canonical logit link function. “Was the main goalie used” was the binary response variable. The explanatory variables and their associated coefficients are given in Table 2. Both the coefficients for the training set and the entire dataset together are included.
The comparison baseline is the team playing as the visitor, having played one night ago and expecting to play again in one night, with both the main and backup goalies having 0-2 nights of rest.
Table 2: Logistic model for whether the main goalie will start
Variable | Log OR (Training set) | Log OR (All data) | Standard Error (All data) |
Baseline / Intercept[1] | -2.131 | -2.145 | 0.250 |
Playing as the visiting team | Reference | Reference | Reference |
Playing as the home team | 0.174 | 0.150 | 0.057 |
Proportion of starts to main this season | 4.068 | 4.687 | 0.232 |
Team played 1 night ago | Reference | Reference | Reference |
Team played 2 nights ago | -0.337 | -0.541 | 0.126 |
Team played 3 nights ago | 0.116 | -0.041 | 0.163 |
Team played 4+ nights ago | 0.359 | 0.130 | 0.169 |
Team will play again in 1 night | Reference | Reference | Reference |
Team will play again in 2 nights | 0.289 | 0.288 | 0.077 |
Team will play again in 3 nights | 0.258 | 0.255 | 0.095 |
Team will play again in 4+ nights | 0.308 | 0.293 | 0.115 |
Main goalie has 0-2 nights of rest | Reference | Reference | Reference |
Main goalie has 3-5 nights of rest | -0.298 | -0.332 | 0.129 |
Main goalie has 6-8 nights of rest | -0.687 | -0.565 | 0.163 |
Main goalie has 9-11 nights of rest | -0.723 | -0.808 | 0.193 |
Main goalie has 12+ nights of rest | -0.557 | -0.600 | 0.178 |
Secondary goalie has 0-2 nights of rest | Reference | Reference | Reference |
Secondary goalie has 3-5 nights of rest | -0.524 | -0.576 | 0.123 |
Secondary goalie has 6-8 nights of rest | -0.410 | -0.408 | 0.131 |
Secondary goalie has 9-11 nights of rest | -0.499 | -0.561 | 0.144 |
Secondary goalie has 12+ nights of rest | -0.121 | -0.161 | 0.115 |
Proportion of season completed[2] | -0.469 | -0.401 | 0.101 |
Last Game Goal Differential[3] | -0.106 | -0.120 | 0.018 |
Main goalie last game? | 1.116 | 0.897 | 0.160 |
Last game (differential) X (main started) | 0.213 | 0.225 | 0.023 |
Last game (1 night ago) X (main started) | -2.516 | -2.704 | 0.163 |
Main goalie 2 games ago? | 1.173 | 1.021 | 0.146 |
Main goalie 3 games ago? | 0.562 | 0.483 | 0.129 |
Main goalie last game AND 2 ago? | -1.301 | -1.185 | 0.161 |
Main goalie last game AND 3 ago? | -0.382 | -0.404 | 0.125 |
Main goalie 2 ago AND 3 ago? | -0.364 | -0.296 | 0.129 |
Null deviance: 9574.2, 7169 DF
Model deviance: 7916.2, 7143 DF
Nagelkerke R-squared: 0.2802
[1] Measured in log-odds instead of log-odds ratio
[2] In the case of 2019-20, an 82-game season was expected by teams, so ‘number of games played divided by 82’ is used instead of the proportion of the actual season.
[3] “This team score” minus “opposing team score” at the end of overtime. Shootout goals are not counted.
The Home Team Matters
Over the last three seasons, in a vacuum, the home team has put their main goalie between the pipes 65% (2335 / 3585) of the time. The visiting team has done so 57% (2056 / 3585) of the time. In other terms, the raw odds of the home team starting with their main goalie are (2335 / 1250) / (2056 / 1529) = 1.39 times as high as they are for the visitors.
When accounting for everything else in the model (i.e. “all else being equal”), the odds of a team starting with their main are exp(0.150) = 1.16 times as high when they’re at home. That 1.16 multiplier is not an overwhelming effect, but it persists even after factoring in other variables like scheduling and recent performance.
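As a rough check, the log odds-ratio and standard error from Table 2 can be turned into an odds ratio with an approximate 95% interval. The sketch below assumes a normal (Wald) approximation for the coefficient, which is our assumption rather than something stated in the analysis; the function name `odds_ratio_with_ci` is illustrative only.

```python
import math

def odds_ratio_with_ci(log_or, std_err, z=1.96):
    """Convert a log odds-ratio and its standard error into an odds ratio
    with an approximate Wald interval (normal approximation assumed)."""
    point = math.exp(log_or)
    lower = math.exp(log_or - z * std_err)
    upper = math.exp(log_or + z * std_err)
    return point, lower, upper

# Home-team coefficient and standard error from Table 2 (all data).
home_or, home_lo, home_hi = odds_ratio_with_ci(0.150, 0.057)
print(f"OR = {home_or:.2f}, approx. 95% interval ({home_lo:.2f}, {home_hi:.2f})")
```

Under that approximation the interval stays above 1, which is in line with the home effect being small but persistent.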
Long Term Usage Matters
The ‘proportion of starts given to the main this season’ effect, while definitive in this model, is not as overwhelmingly large as it appears. The log-odds ratio of 4.687 given is a comparison between a team which never uses its main goalie and one which always uses its main goalie. In reality, this proportion is rarely outside (0.5, 0.8). A more useful interpretation is that the odds of a team starting with the main goalie are exp(4.687 * 0.3) = 4.080 times as high, all else being equal, if that team historically uses their main 80% of the time instead of 50% of the time.
In short, the more a team has used their main goalie that season, the more likely they are to keep using them.
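The rescaling argument can be sketched in a few lines. This uses only the all-data coefficient from Table 2; the function name `odds_multiplier` is a hypothetical helper for illustration.

```python
import math

PROPORTION_COEF = 4.687  # "Proportion of starts to main this season", Table 2 (all data)

def odds_multiplier(low_share, high_share, coef=PROPORTION_COEF):
    """Odds multiplier for moving the main goalie's start share
    from low_share to high_share, all else being equal."""
    return math.exp(coef * (high_share - low_share))

typical = odds_multiplier(0.5, 0.8)   # realistic range of usage
extreme = odds_multiplier(0.0, 1.0)   # never vs. always, what the raw coefficient describes
print(f"50% -> 80% usage: x{typical:.2f}; 0% -> 100% usage: x{extreme:.0f}")
```

The gap between the two numbers is the reason the raw coefficient overstates the practical size of the effect.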
One important thing we do here is ‘reset’ that proportion each season. A lot can happen between seasons, including trades, development, and aging. We tried models that carried a ‘main goalie usage level’ from season to season, but hitting the reset button every October worked way better.
The proportion we use is calculated by
(Games this season the main has started + 1) / (Games played by the team this season + 2)
Where ‘games’ only refers to games that have been played *before* the current one.
This value starts at 0.5 for each team in each season and usually converges to a constant. There are a few exceptions where the goalie that was designated to be the main for a team was either injured or traded onto the team early in the season. These patterns are shown in Figure 1. Vertical lines separate seasons.
Figure 1
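The proportion formula above can be written as a small function. This is a minimal sketch, assuming each team-season is available as an ordered list of booleans (True when the main goalie started); the function names are illustrative.

```python
def main_start_share(main_started_before):
    """Smoothed share of starts given to the main goalie, using only
    games already played this season (list of bools, in order)."""
    starts = sum(main_started_before)
    games = len(main_started_before)
    return (starts + 1) / (games + 2)

def share_before_each_game(main_started):
    """Value of the predictor going into each game of the season."""
    return [main_start_share(main_started[:i]) for i in range(len(main_started))]

# Hypothetical four-game stretch: main, main, backup, main.
shares = share_before_each_game([True, True, False, True])
print(shares)
```

Slicing with `main_started[:i]` keeps the current game out of its own predictor, and an empty history gives 1 / 2, which is the 0.5 starting value at the beginning of each season.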
Days of Rest Matter
The number of days of rest that a team has had matters in the starting goalie decision. All else being equal, a team that last played two nights ago is less likely to start with their main goalie (log OR = -0.541) than a team that played the previous night. If the team is especially well rested and last played three or four-plus nights ago, they are about as likely to start with their main goalie as a team that played the previous night (log OR = -0.041 and 0.130 respectively).
The days of rest that a team is expecting also matter. A team that will have 2, 3, or 4+ nights until the game after their current one is substantially more likely to use their main goalie than otherwise (log OR = 0.288, 0.255, and 0.293 respectively). Since teams never play three games in three consecutive nights, at least one of the ‘current rest’ and ‘expected rest’ effects will come into play for each game.
On top of all of this is the back-to-back game effect. If there was a game the previous night and the main goalie started it, then the log-odds of the main goalie starting again are 2.704 lower than otherwise. This effect is specific to the combination of those two conditions, and is in addition to the individual effects of having only one day of rest (reference coefficient) and having started in the previous game (log OR = 0.897).
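To see how these effects stack, the sketch below adds up a subset of the all-data coefficients from Table 2 for one hypothetical scenario and converts the result to a probability. The scenario is our own illustration: a visiting team with a 0.65 main-goalie share, halfway through the season, whose main goalie started the previous game (ending in a zero goal differential) but not the two games before that, whose secondary goalie has 3-5 nights of rest, and whose next game is two nights away.

```python
import math

# Subset of the "all data" coefficients from Table 2.
COEF = {
    "intercept": -2.145,
    "main_share": 4.687,                 # per unit of season start proportion
    "season_done": -0.401,               # per unit of proportion of season completed
    "plays_again_in_2_nights": 0.288,
    "secondary_rest_3_to_5": -0.576,
    "team_played_2_nights_ago": -0.541,
    "main_started_last_game": 0.897,
    "back_to_back_and_main_started": -2.704,
}

def prob_main_starts(back_to_back, main_share=0.65, season_done=0.5):
    """Probability that the main goalie starts in the hypothetical scenario.
    All terms not listed here sit at their reference level or at zero."""
    log_odds = (COEF["intercept"]
                + COEF["main_share"] * main_share
                + COEF["season_done"] * season_done
                + COEF["plays_again_in_2_nights"]
                + COEF["secondary_rest_3_to_5"]
                + COEF["main_started_last_game"])
    if back_to_back:
        log_odds += COEF["back_to_back_and_main_started"]  # played last night
    else:
        log_odds += COEF["team_played_2_nights_ago"]       # one extra night off
    return 1.0 / (1.0 + math.exp(-log_odds))

p_back_to_back = prob_main_starts(back_to_back=True)
p_one_night_off = prob_main_starts(back_to_back=False)
print(f"back-to-back: {p_back_to_back:.2f}, one night off: {p_one_night_off:.2f}")
```

In this made-up scenario the main goalie goes from a likely starter to an unlikely one purely because the previous game was the night before.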
Time of Year Matters
As the season gets later, the main goalie is less likely to start, all else being equal. The log-odds of the main goalie starting the last game of the season are 0.401 less than the log-odds of them starting the first game of the season. If load management is related to this effect, it could be an indication that the main goalie is being saved for more and more critical games as the season progresses. Alternatively, it could reflect the increasing cumulative chance of the main goalie being unavailable due to injury.
Finally, the performance of the goalie in the last game matters. For each goal by which the team won its last game (measured at the end of overtime, with a loss counting as a negative margin), the log-odds of the main goalie starting the next game are 0.120 lower if the secondary goalie was in net last game, and (-0.120 + 0.225 = 0.105) higher if the main goalie was in net last game.
In other words, if the team won their last game, they are more likely to continue using whatever goalie was in net for the win. If the team lost, they are more likely to change goalies between one game and the next. Furthermore, the strength of this tendency is proportional to the size of the win or loss. Here, a game that goes to shootout is treated as if it has a goal differential of zero.
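The two slopes can be combined into one small helper. This is a sketch using the all-data coefficients from Table 2; `differential_shift` is a hypothetical name, and the goal differential is taken at the end of overtime as in footnote [3].

```python
DIFF_COEF = -0.120         # "Last Game Goal Differential", Table 2 (all data)
DIFF_X_MAIN_COEF = 0.225   # "Last game (differential) X (main started)"

def differential_shift(goal_diff, main_started_last_game):
    """Change in the log-odds of the main goalie starting the next game
    that is attributable to the last game's goal differential."""
    slope = DIFF_COEF + (DIFF_X_MAIN_COEF if main_started_last_game else 0.0)
    return slope * goal_diff

win_with_main = differential_shift(3, main_started_last_game=True)      # 3-goal win
win_with_backup = differential_shift(3, main_started_last_game=False)   # 3-goal win
loss_with_main = differential_shift(-3, main_started_last_game=True)    # 3-goal loss
print(win_with_main, win_with_backup, loss_with_main)
```

A three-goal win pushes the decision toward whichever goalie was in net, and a three-goal loss pushes it away from that goalie.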
Model performance
Let’s start with some null model data:
As shown in Table 1, in the training set, 3148 of 5084 (61.9%) of starts were given to the main goalie. In the test set, this rate is a little lower (59.6%) due to season-to-season drift. As such, there are more false positives (predicting a main will start when someone else does) than false negatives. This figure, 59.6%, also gives us a null accuracy to compare to.
Using the full dataset, we find that the model reduces the deviance from 9574 (on 7169 degrees of freedom) to 7916 (on 7143 degrees of freedom), which translates to a Nagelkerke R2 of 0.2802. It doesn’t make sense to speak of variance when using a binary response like “will the main goalie start”, but Nagelkerke’s R2 is a close analogue to reduction in variance from the model. Here it suggests that with the model we can be about 28% less uncertain about the starting goalie than simply assigning the global mean probability.
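The Nagelkerke values reported under Table 2 and Table 5 can be reproduced from the deviances listed there. The sketch below assumes ungrouped binary data, so that each deviance equals -2 times the log-likelihood; that assumption is ours, and `nagelkerke_r2` is an illustrative helper rather than the code used for the analysis.

```python
import math

def nagelkerke_r2(null_deviance, model_deviance, n_obs):
    """Nagelkerke R-squared from deviances, assuming ungrouped binary data
    (deviance = -2 * log-likelihood)."""
    cox_snell = 1.0 - math.exp((model_deviance - null_deviance) / n_obs)
    max_cox_snell = 1.0 - math.exp(-null_deviance / n_obs)
    return cox_snell / max_cox_snell

N_STARTS = 7170  # goalie starts in the full dataset

r2_final = nagelkerke_r2(9574.2, 7916.2, N_STARTS)        # deviances under Table 2
r2_alternative = nagelkerke_r2(9574.2, 7961.3, N_STARTS)  # deviances under Table 5
print(f"final model: {r2_final:.4f}, alternative model: {r2_alternative:.4f}")
```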
Table 3 has the confusion matrix of the test set and, for assessing the degree of overfitting, the training set.
Table 3: Confusion matrices of the test and training sets
Test set | Predicted Main | Predicted Someone else |
Actual Main | 1093 (TP) | 150 (FN) |
Actual Someone else | 473 (FP) | 370 (TN) |
Training set | Predicted Main | Predicted Someone else |
Actual Main | 2736 (TP) | 412 (FN) |
Actual Someone else | 960 (FP) | 976 (TN) |
Test Set:
Sensitivity = TP / (TP + FN) = 1093 / 1243 = 0.879
Specificity = TN / (TN + FP) = 370 / 843 = 0.439
Accuracy = (TP + TN) / (TP + TN + FP + FN) = 1463 / 2086 = 0.701
Training set:
Sensitivity = 2736 / 3148 = 0.869
Specificity = 976 / 1936 = 0.504
Accuracy = 3712 / 5084 = 0.730
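The rates above follow directly from the counts in Table 3. As a quick check, here is a minimal sketch; the helper name `classification_metrics` is illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts,
    where 'positive' means the main goalie started."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Counts from Table 3.
test_metrics = classification_metrics(tp=1093, fn=150, fp=473, tn=370)
train_metrics = classification_metrics(tp=2736, fn=412, fp=960, tn=976)

for name, metrics in (("test", test_metrics), ("training", train_metrics)):
    print(name, {key: round(value, 3) for key, value in metrics.items()})
```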
From the test set and training set confusion matrices, we can say there are 26.1% (623 instead of 843) and 29.1% (1372 instead of 1936) fewer misclassifications respectively, as compared to a null model that always predicts the main goalie. Using the model is better than just guessing the main goalie every time, but it’s far from prophetic.
Breaking down the predictions by the fitted probability of the main being selected doesn’t improve the story. Figure 2 is a stacked histogram of the number of times each goalie was used, binned by the fitted probability from our model. Unfortunately, there is not a strong gradient in the actual results along the fitted probability, meaning that we cannot interpret a fitted value of “90% main” as there being an actual 90% chance of the main goalie being selected.
Figure 2
We hope future models improve on these results, but until then, the ‘dead ends’ section gives a detailed record of everything else that was tried that either didn’t improve the model, or was equally good as something already being done.
Dead ends
Weighting games by recency didn’t help
As with any aspect of the sport, the decisions surrounding goalies are evolving. In our three-season window, we see some evidence of this in the drifting proportion of main goalies used from 64% to 60%. Just in case this was a real effect and not a statistical artifact, we tried some weighted models where the games from each season counted for 5-40% less than the games from the next season. For each decay rate, the model either performed the same or worse, possibly owing to the loss in effective sample size.
If a window of more than three seasons were being used, then any long-term changes in strategy might be more apparent, and some long-term adjustment might be more useful then.
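The weighting scheme described above can be sketched as follows. The per-season start counts come from Table 1. The effective sample size uses Kish's approximation, which is one common definition and an assumption on our part, since the text does not say how the loss in effective sample size was measured.

```python
def season_weights(seasons, decay):
    """Weight 1 for the newest season; each earlier season counts
    `decay` (e.g. 0.20 for 20%) less than the season after it."""
    newest_first = sorted(seasons, reverse=True)
    return {season: (1.0 - decay) ** age for age, season in enumerate(newest_first)}

def effective_sample_size(weights, counts):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights[s] * counts[s] for s in counts)
    total_sq = sum(weights[s] ** 2 * counts[s] for s in counts)
    return total ** 2 / total_sq

STARTS = {"2017-18": 2542, "2018-19": 2542, "2019-20": 2086}  # Table 1 row totals

weights = season_weights(STARTS, decay=0.20)
ess = effective_sample_size(weights, STARTS)
print(weights, round(ess))
```

Under this definition a 20% decay costs only a few hundred effective starts out of 7170, so the loss grows mainly at the steeper end of the 5-40% range.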
Treating home and away starting goalies as correlated didn’t help
There is only limited evidence that one team’s starting goalie choice carries information about the other team’s choice. Over the last three seasons, the home team selects their main goalie to start 65.1% of the time. When the visiting team also selects their main, that increases to 66.8%. When the visiting team uses someone else, the home team goes with their main 62.9% of the time. Exact counts are in Table 4.
Alternatively, the visiting team uses their main 57.4% of all games, which breaks down to 58.8% and 54.6% when the home team uses their main or someone else, respectively. However, it makes more sense to describe the home team’s selection conditional on the visiting team’s selection because the home team has the privilege to make roster selections second.
Nevertheless, this conditional selection is barely statistically significant against independence (chi-squared Yates = 5.934, p = 0.015); it’s not a large enough difference to warrant modelling ‘home starter’ and ‘away starter’ as anything but independent variables.
Table 4: Starts by home-away situation
Number of games | Visitor starts main | Visitor starts someone else |
Home starts main | 1374 | 961 |
Home starts someone else | 682 | 568 |
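The independence test quoted above can be reproduced with a hand-rolled Yates-corrected chi-squared statistic. In this sketch the four counts are arranged so that they reproduce the conditional percentages given in the text (1374 / 2056 = 66.8% and 961 / 1529 = 62.9%); the function name is illustrative, and the p-value uses the one-degree-of-freedom identity P(X > x) = erfc(sqrt(x / 2)).

```python
import math

def yates_chi_squared(a, b, c, d):
    """Yates-corrected chi-squared test for the 2x2 table [[a, b], [c, d]].
    Returns the statistic and its p-value on 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    statistic = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: home starts main / someone else.
# Columns: visitor starts main / someone else.
chi2, p = yates_chi_squared(1374, 961, 682, 568)
print(f"chi-squared = {chi2:.3f}, p = {p:.3f}")
```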
More detailed goalie rest information didn’t help
An earlier version of this model used the exact number of days of rest of both the secondary and the main goalie as predictor variables, up to 10. For the purpose of fitting the data and making predictions, many of these variables became redundant, especially after the “last three games’ selections” variables were included. However, the model coefficients for these variables were still useful for describing a few trends in goalie selection, so the explanation of these coefficients is included here instead. The complete model is included in the appendix.
The amount of rest of the main and backup goalies has a separate set of effects from those of the team’s rest. For short periods, a main goalie with more than one day of rest is more likely to appear in the next game than one who played the previous night (log OR = 0.967, 0.940, 0.653, and 0.513 for 2, 3, 4, and 5 days of rest respectively), but for longer periods of time this effect disappears and then reverses (-0.394, -0.378, and -0.499 for 8, 9, and 10+ days of rest respectively), possibly because longer rests are an indication that the main goalie is injured or otherwise scratched from the roster.
For the secondary goalie, the story is simpler: whenever the secondary goalie has had more than 1 night of rest, the main is less likely to start. There doesn’t seem to be any clear pattern in the amount of rest the secondary has had, and the effect ranges from -0.580 for 2 days of rest on the secondary, to -1.133 for exactly 9 days of rest on the secondary.
Figure 3 shows the log odds-ratio of the main goalie being selected compared to a ‘one day rest each’ baseline.
More Detailed Goalie Performance didn’t help
The model uses the goal differential for the most recent game, but other than that, no other direct indicators of performance are used. Early versions of the model included moving averages of wins, shots against, goals against, and save percentage for both the main and secondary goalies at 5-, 10-, and 40-game windows. None of these variables explained a significant amount of variance that wasn’t already being explained by other variables.
Some common transformations were also considered, including the difference between the main and secondary goalies’ stats, as well as replacing the actual numbers with their quantiles.
Game Context didn’t help
We suspected that there could be some effects related to the context of the game, such as the relative importance of the game for playoff qualification, whether the opposing team was in the same division, and the relative position of the teams in the standings. None of these variables improved the model.
We also considered the possible effect of rivalries (e.g. “we want Schneider in net against Vancouver because they traded him away”) or specialty effects (e.g. “every time Quick is in net against Anaheim we win, so keep doing that”). We did this by including the proportion of games that the main goalie has played against an opponent, using the same formula as we used for the proportion of games played in a season; it didn’t improve the model.
The number of starts out of the last 10 games, and the last 40 games, were considered for the model, but the proportion of starts in a season explained more variation than both measures together.
Other Discussion / Future Work
Predictions 2-3 games ahead
The current model offers some predictive ability for the next game to be played by a team, but this information is only useful from the end of one game to the morning skate of the next game. For media, fan, and strategic purposes, it would be much more valuable to be able to predict the starting goalies multiple games into the future.
A simple approach would be to apply the current model to a Monte Carlo simulation of goalie selections and game outcomes. However, given the current model’s predictive strength, those predictions would quickly decay towards the global mean for the home and away teams respectively.
In order to offer something like the “likely starting pitchers” that the media routinely report for Major League Baseball, a much stronger model is required.
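The decay toward the mean can be illustrated with a toy Monte Carlo simulation. The sketch below is not the fitted model: it replaces it with a two-state chain in which the chance of the main goalie starting depends only on who started the previous game, and both transition probabilities are hypothetical placeholders. A fuller simulation would also have to draw game outcomes and follow the schedule.

```python
import random

# Hypothetical transition probabilities, NOT fitted values from the model.
P_MAIN_AFTER_MAIN = 0.70
P_MAIN_AFTER_OTHER = 0.55

def simulate_main_start_rates(main_started_last, games_ahead=5, n_sims=20000, seed=0):
    """Simulated share of runs in which the main goalie starts each of the
    next `games_ahead` games, given who started the most recent game."""
    rng = random.Random(seed)
    counts = [0] * games_ahead
    for _ in range(n_sims):
        last_was_main = main_started_last
        for game in range(games_ahead):
            p_main = P_MAIN_AFTER_MAIN if last_was_main else P_MAIN_AFTER_OTHER
            last_was_main = rng.random() < p_main
            counts[game] += last_was_main  # True counts as 1
    return [count / n_sims for count in counts]

after_main = simulate_main_start_rates(True, seed=0)
after_other = simulate_main_start_rates(False, seed=1)
gaps = [a - b for a, b in zip(after_main, after_other)]
print([round(gap, 3) for gap in gaps])
```

In this toy setting the information in the last game is worth about 15 percentage points for the next game and close to nothing a few games later, which is the kind of decay described above.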
More Seasons, Coaching Data
We only considered 3 seasons of information from the Sportlogiq event database because we wanted to take advantage of the most detailed information available. However, the only features that mattered in the end were those already available to the public. With some additional preparation, a database with more longitudinal depth and coaching information could be used instead.
With a dataset of 10 or even 20 seasons, we could better account for changes in goalie selection strategy over time, ages and level of experience of individual goalies, and effects from head coaches which do occasionally change teams. Some other features like rivalry effects which were too noisy or sparse to contribute to the model might become useful after all.
Game Theoretic Considerations
All of this speculation about improved models assumes that coaches and teams are indifferent to others’ knowledge about their starting goalie selection. If an opposing team could gain any strategic advantage, or even the perception of an advantage, from knowing the selection before the morning skate, then coaches may take additional steps to make their selection harder to predict.
Alternatively, we could assume that the value gained in making one’s goalie selection unpredictable is too small compared to the value lost from making an otherwise suboptimal decision.
There’s still a lot of work to be done in predicting goalie selections, but we hope this manuscript starts some conversations about strategy.
Appendix
The following is an alternative model that looks at main goalie and secondary goalie rest at a one-day resolution up to 10 days instead of a three-day resolution up to 12 days. It also doesn’t explicitly include the choices of goalie for two and three games before the upcoming one. This model performs worse than the final model (e.g. Nagelkerke R2 = 0.273 instead of 0.280), but provides additional insights into the effects of rest on goalie selection, as shown in Figure 3.
Table 5: Alternative goalie starts model with detailed rest
Variable | Log OR (Training set) | Log OR (All data) | Standard Error (All data) |
Baseline / Intercept[1] | -1.997 | -2.097 | 0.289 |
Playing as the visiting team | Reference | Reference | Reference |
Playing as the home team | 0.153 | 0.139 | 0.057 |
Proportion of starts to main this season | 4.130 | 4.669 | 0.223 |
Team played 1 night ago | Reference | Reference | Reference |
Team played 2 nights ago | -0.003 | -0.065 | 0.188 |
Team played 3 nights ago | 0.249 | 0.192 | 0.195 |
Team played 4+ nights ago | 0.868 | 0.725 | 0.201 |
Team will play again in 1 night | Reference | Reference | Reference |
Team will play again in 2 nights | 0.278 | 0.282 | 0.077 |
Team will play again in 3 nights | 0.245 | 0.239 | 0.095 |
Team will play again in 4+ nights | 0.373 | 0.326 | 0.115 |
Main goalie has 1 night of rest | Reference | Reference | Reference |
Main goalie has 2 nights of rest | 0.929 | 0.967 | 0.248 |
Main goalie has 3 nights of rest | 0.956 | 0.940 | 0.234 |
Main goalie has 4 nights of rest | 0.563 | 0.653 | 0.231 |
Main goalie has 5 nights of rest | 0.469 | 0.513 | 0.238 |
Main goalie has 6 nights of rest | -0.115 | 0.047 | 0.255 |
Main goalie has 7 nights of rest | 0.208 | 0.263 | 0.266 |
Main goalie has 8 nights of rest | -0.711 | -0.394 | 0.290 |
Main goalie has 9 nights of rest | -0.646 | -0.378 | 0.302 |
Main goalie has 10+ nights of rest | -0.537 | -0.499 | 0.228 |
Secondary goalie has 1 night of rest | Reference | Reference | Reference |
Secondary goalie has 2 nights of rest | -0.372 | -0.580 | 0.196 |
Secondary goalie has 3 nights of rest | -0.583 | -0.748 | 0.197 |
Secondary goalie has 4 nights of rest | -0.644 | -0.835 | 0.185 |
Secondary goalie has 5 nights of rest | -0.636 | -0.825 | 0.192 |
Secondary goalie has 6 nights of rest | -0.701 | -0.817 | 0.198 |
Secondary goalie has 7 nights of rest | -0.645 | -0.730 | 0.200 |
Secondary goalie has 8 nights of rest | -0.703 | -0.873 | 0.212 |
Secondary goalie has 9 nights of rest | -0.819 | -1.133 | 0.215 |
Secondary goalie has 10+ nights of rest | -0.540 | -0.693 | 0.167 |
Proportion of season completed[2] | -0.353 | -0.333 | 0.102 |
Last Game Goal Differential[3] | -0.104 | -0.116 | 0.018 |
Did the main goalie start last game? | -0.184 | -0.359 | 0.114 |
Last game (differential) X (main started) | 0.208 | 0.218 | 0.023 |
Last game (1 night ago) X (main started) | -1.396 | -1.390 | 0.298 |
Null deviance: 9574.2, 7169 DF
Model deviance: 7961.3, 7138 DF
Nagelkerke R-squared: 0.2733
Test Set:
Sensitivity = TP / (TP + FN) = 1101 / 1243 = 0.886
Specificity = TN / (TN + FP) = 350 / 843 = 0.415
Accuracy = (TP + TN) / (TP + TN +FP + FN) = 1451 / 2086 = 0.696
Training set:
Sensitivity = 2709 / 3148 = 0.861
Specificity = 979 / 1936 = 0.506
Accuracy = 3688 / 5084 = 0.725
[1] Measured in log-odds instead of log-odds ratio
[2] In the case of 2019-20, an 82-game season was expected, so ‘number of games played divided by 82’ is used.
[3] “This team score” minus “opposing team score” at the end of overtime. Shootout goals are not counted.