
Forecasting Success in the National Hockey League Using In-Game Statistics and Textual Data

In this thesis, we look at a number of methods to forecast success (winners and losers),
both of single games and playoff series (best-of-seven games) in the sport of ice hockey,
more specifically within the National Hockey League (NHL). Our findings indicate that
there is a theoretical upper bound on prediction accuracy, which seems to hold for all
sports, and that this bound makes prediction difficult.
In the first part of this thesis, we look at predicting success of individual games to
learn which of the two teams will win or lose. We use a number of traditional statistics
(published on the league’s website and used by the media) and performance metrics
(used by Internet hockey analysts; they are shown to have a much higher correlation with
success over the long term). Despite the demonstrated long term success of performance
metrics, it was the traditional statistics that had the most value to automatic game
prediction, allowing our model to achieve 59.8% accuracy.
We found it interesting that regardless of which features we used in our model, we
were not able to increase the accuracy much higher than 60%. We compared the observed
win% of teams in the NHL to many simulated leagues and found that there appears to
be a theoretical upper bound of approximately 62% for single game prediction in the
NHL.
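
The comparison between observed win% and simulated leagues can be illustrated with a minimal sketch. It is not the thesis's code: it assumes a toy model in which each game is decided by the stronger team with probability `skill_fraction` and by a coin flip otherwise, and the names `simulate_win_pct_spread`, `accuracy_ceiling`, `skill_fraction`, `n_teams` and `n_games` are hypothetical. Under that assumption, the spread of win% across a simulated league can be matched against the observed spread to pick a skill fraction, and the best achievable accuracy follows directly from it.

```python
import random
import statistics


def simulate_win_pct_spread(skill_fraction, n_teams=30, n_games=82, seed=0):
    """Standard deviation of team win% in one simulated season.

    Assumed toy model (not taken from the thesis): teams have a fixed
    strength order (lower index = stronger). Each game is decided by
    strength with probability `skill_fraction`, otherwise by a fair coin.
    """
    rng = random.Random(seed)
    wins = [0] * n_teams
    played = [0] * n_teams
    for _ in range(n_teams * n_games // 2):
        a, b = rng.sample(range(n_teams), 2)  # two distinct teams
        if rng.random() < skill_fraction:
            winner = min(a, b)                # the stronger team wins
        else:
            winner = rng.choice((a, b))       # pure luck
        wins[winner] += 1
        played[a] += 1
        played[b] += 1
    win_pcts = [w / g for w, g in zip(wins, played) if g > 0]
    return statistics.pstdev(win_pcts)


def accuracy_ceiling(skill_fraction):
    """Best possible accuracy under the toy model: always pick the stronger
    team, which is right in every skill-decided game and in half of the
    luck-decided games."""
    return skill_fraction + (1.0 - skill_fraction) * 0.5


if __name__ == "__main__":
    for s in (0.0, 0.25, 0.5, 1.0):
        spread = simulate_win_pct_spread(s)
        print(f"skill={s:.2f} spread={spread:.3f} ceiling={accuracy_ceiling(s):.3f}")
```

In this toy model a league with more luck has a narrower spread of win% and a ceiling closer to 50%, which is the qualitative behaviour the paragraph above describes.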
Because a single game is difficult to predict, with a maximum accuracy of about 62%, we
expected that predicting a longer series of games would be easier. We looked at predicting the winner of
the best-of-seven series between two teams using over 30 features, both traditional and
advanced statistics, and found that we were able to increase our prediction accuracy to
almost 75%.
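
A back-of-the-envelope calculation shows why a longer series should be easier to call than one game. The sketch below is an illustration only, not the thesis's feature-based model: it assumes the games are independent and that the better team wins each game with a fixed probability `p`, and the function name `series_win_probability` is hypothetical.

```python
from math import comb


def series_win_probability(p, wins_needed=4):
    """Probability that a team wins a first-to-`wins_needed` series.

    Assumes independent games, each won with the same probability `p`
    (a simplification; real series have home ice, injuries, and so on).
    The team wins the series by taking `wins_needed` games while losing
    k games first, for k = 0 .. wins_needed - 1.
    """
    q = 1.0 - p
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )


if __name__ == "__main__":
    for p in (0.50, 0.55, 0.62):
        print(f"single game {p:.2f} -> best-of-seven {series_win_probability(p):.3f}")
```

Under these assumptions any per-game edge above 50% is amplified over seven games, so the favourite in a series is more predictable than the favourite in a single game.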
We then re-explored predicting single games with the use of pre-game textual reports
written by hockey experts from http://www.NHL.com using Bag-of-Words features and
sentiment analysis. We combined these features with the numerical data in multi-layer
meta-classifiers and were able to increase the accuracy close to the upper bound.

Identifier: oai:union.ndltd.org:uottawa.ca/oai:ruor.uottawa.ca:10393/31553
Date: January 2014
Creators: Weissbock, Joshua
Contributors: Inkpen, Diana
Publisher: Université d'Ottawa / University of Ottawa
Source Sets: Université d’Ottawa
Language: English
Detected Language: English
Type: Thesis
