pip install mezzala
import mezzala
Fitting a Dixon-Coles team strength model:
First, we need some data:
import itertools
import json
import urllib.request
# Use 2016/17 Premier League data from the openfootball repo
url = 'https://raw.githubusercontent.com/openfootball/football.json/master/2016-17/en.1.json'
response = urllib.request.urlopen(url)
data_raw = json.loads(response.read())
# Reshape the data to just get the matches
data = list(itertools.chain(*[d['matches'] for d in data_raw['rounds']]))
data[0:3]
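To make the reshaping step concrete, here is a minimal sketch that runs without a network connection. The `sample_raw` dict is a hand-written stand-in for the downloaded JSON: the team names and scores are placeholders, and the only thing it assumes is the nesting used above (rounds, each holding a list of matches).

```python
import itertools

# Hypothetical stand-in for data_raw: same nesting as the real file
# (rounds -> matches), but with placeholder teams and scores.
sample_raw = {
    'rounds': [
        {'name': 'Round 1', 'matches': [
            {'team1': 'Team A', 'team2': 'Team B', 'score': {'ft': [2, 1]}},
            {'team1': 'Team C', 'team2': 'Team D', 'score': {'ft': [0, 0]}},
        ]},
        {'name': 'Round 2', 'matches': [
            {'team1': 'Team B', 'team2': 'Team C', 'score': {'ft': [1, 3]}},
        ]},
    ]
}

# chain.from_iterable flattens the per-round lists into one list of matches
sample = list(itertools.chain.from_iterable(
    r['matches'] for r in sample_raw['rounds']
))

print(len(sample))  # 3
print(sample[0]['team1'], sample[0]['score']['ft'])
```

The result is a flat list with one dict per match, which is the shape the rest of this walkthrough works with.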
To fit a model with mezzala, you need to create an "adapter". Adapters are used to connect a model to a data source.
Because our data is a list of dicts, we are going to use a KeyAdapter.
adapter = mezzala.KeyAdapter(       # `KeyAdapter` = datum['...']
    home_team='team1',
    away_team='team2',
    home_goals=['score', 'ft', 0],  # Get nested fields with lists of fields
    away_goals=['score', 'ft', 1],  # i.e. datum['score']['ft'][1]
)
# You'll never need to call the methods on an
# adapter directly, but just to show that it
# works as expected:
adapter.home_team(data[0])
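If the nested-field syntax is unclear, the lookup it describes can be sketched in plain Python. The `get_nested` helper below is hypothetical and written only for illustration; it is not mezzala's implementation, just the `datum['score']['ft'][1]` idea from the comments above, and the match dict uses placeholder values.

```python
from functools import reduce

def get_nested(datum, path):
    """Look up `path` in `datum`; a list path is followed one key at a time."""
    if isinstance(path, (list, tuple)):
        return reduce(lambda d, key: d[key], path, datum)
    return datum[path]

# Placeholder match in the same shape as the training data
match = {'team1': 'Team A', 'team2': 'Team B', 'score': {'ft': [2, 1]}}

print(get_nested(match, 'team1'))             # Team A
print(get_nested(match, ['score', 'ft', 0]))  # 2
print(get_nested(match, ['score', 'ft', 1]))  # 1
```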
Once we have an adapter for our specific data source, we can fit the model:
model = mezzala.DixonColes(adapter=adapter)
model.fit(data)
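For context on what is being fitted, here is a rough sketch of the scoreline probability in the standard Dixon-Coles formulation: two Poisson goal counts with a correction for low-scoring results. The function names and the parameter values are assumptions made for illustration, and mezzala's internal parameterisation may differ.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k goals under a Poisson distribution."""
    return math.exp(-rate) * rate ** k / math.factorial(k)

def tau(x, y, lam, mu, rho):
    """Low-score correction: only 0-0, 0-1, 1-0 and 1-1 are adjusted."""
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0

def scoreline_probability(x, y, lam, mu, rho):
    """P(home scores x, away scores y) given home rate lam and away rate mu."""
    return tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)

# Illustrative values only, not fitted parameters
lam, mu, rho = 1.6, 1.1, -0.1
grid = {
    (x, y): scoreline_probability(x, y, lam, mu, rho)
    for x in range(15)
    for y in range(15)
}
total = sum(grid.values())
print(round(total, 6))
```

The correction shifts probability between the four low-scoring results without changing the total, so the grid still sums to (approximately) one.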
By default, you only need to supply the home and away teams to get predictions. These should be supplied in the same format as the training data.
DixonColes has two methods for making predictions:
- predict_one: for predicting a single match
- predict: for predicting multiple matches
match_to_predict = {
    'team1': 'Manchester City FC',
    'team2': 'Swansea City FC',
}
scorelines = model.predict_one(match_to_predict)
scorelines[0:5]
Each of these methods returns predictions in the form of ScorelinePredictions.
- predict_one returns a list of ScorelinePredictions
- predict returns a list of ScorelinePredictions for each predicted match (i.e. a list of lists)
However, it can sometimes be more useful to have predictions in the form of match outcomes. Mezzala exposes the scorelines_to_outcomes function for this purpose:
mezzala.scorelines_to_outcomes(scorelines)
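Conceptually, converting scorelines to outcomes means summing the scoreline probabilities that fall under each result. The sketch below shows that idea with plain tuples rather than mezzala's own prediction objects; the `to_outcomes` helper, the outcome labels, and the probabilities are all made up for illustration.

```python
def to_outcomes(scoreline_probs):
    """Collapse {(home_goals, away_goals): probability} into three outcomes."""
    outcomes = {'home_win': 0.0, 'draw': 0.0, 'away_win': 0.0}
    for (home_goals, away_goals), prob in scoreline_probs.items():
        if home_goals > away_goals:
            outcomes['home_win'] += prob
        elif home_goals == away_goals:
            outcomes['draw'] += prob
        else:
            outcomes['away_win'] += prob
    return outcomes

# Placeholder probabilities (not model output), chosen to sum to 1
example = {
    (0, 0): 0.10,
    (1, 0): 0.25,
    (0, 1): 0.15,
    (1, 1): 0.20,
    (2, 1): 0.20,
    (1, 2): 0.10,
}

outcomes = to_outcomes(example)
print({name: round(prob, 2) for name, prob in outcomes.items()})
```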
Extending the model
It's possible to fit more sophisticated models with mezzala, using weights and model blocks.
Weights
You can weight individual data points by supplying a function (or callable) to the weight argument of DixonColes:
mezzala.DixonColes(
    adapter=adapter,
    # By default, all data points are weighted equally,
    # which is equivalent to:
    weight=lambda x: 1
)
Mezzala also provides an ExponentialWeight for the purpose of time-discounting:
mezzala.DixonColes(
    adapter=adapter,
    weight=mezzala.ExponentialWeight(
        epsilon=-0.0065,  # Decay rate
        key=lambda x: x['days_ago']
    )
)
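The key function above reads a days_ago field, which has to exist on each match before fitting. The sketch below shows one way to derive it and what an exponential discount does to older matches. It rests on two assumptions that should be checked against your data and the library: that each match has an ISO-formatted 'date' field, and that the weight takes the form exp(epsilon * days_ago). The helper names and the sample matches are hypothetical.

```python
import math
from datetime import date

def add_days_ago(matches, today):
    """Return copies of the matches with 'days_ago' derived from 'date'."""
    return [
        {**m, 'days_ago': (today - date.fromisoformat(m['date'])).days}
        for m in matches
    ]

def exponential_weight(days_ago, epsilon=-0.0065):
    """Assumed form of the discount: exp(epsilon * days_ago)."""
    return math.exp(epsilon * days_ago)

# Placeholder matches; only the 'date' field matters here
matches = [
    {'team1': 'Team A', 'team2': 'Team B', 'date': '2017-05-21'},
    {'team1': 'Team C', 'team2': 'Team D', 'date': '2016-08-13'},
]

reference_day = date(2017, 5, 21)
weighted = add_days_ago(matches, today=reference_day)

for m in weighted:
    print(m['date'], m['days_ago'], round(exponential_weight(m['days_ago']), 3))
```

With a negative epsilon, a match played on the reference day gets weight 1 and older matches get progressively smaller weights.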
Model blocks
mezzala.DixonColes(
    adapter=adapter,
    # By default, only team strength and home advantage
    # are estimated:
    blocks=[
        mezzala.blocks.HomeAdvantage(),
        mezzala.blocks.TeamStrength(),
        mezzala.blocks.BaseRate(),  # Adds "average goalscoring rate" as a distinct parameter
    ]
)
To add custom parameters (e.g. per-league home advantage), you need to add additional model blocks.