import json
import datetime as dt
import numpy as np
import pprint
Let's demo
with open('../data/premier-league-1516.json', 'r') as f:
    pl_1516 = json.load(f)
# Let's parse the dates, too
for match in pl_1516:
    match['date'] = dt.datetime.strptime(match['date'], '%Y-%m-%d')
pl_1516[0:3]
A model in mezzala is composed of 2 parts:
- Model blocks (see mezzala.blocks)
- An adapter (see mezzala.adapters)
The model blocks determine which terms your model estimates. In general, you will want
to estimate offensive and defensive strength for each team (TeamStrength) as well as
home advantage (HomeAdvantage).
The selected model blocks can be supplied to the model as a list:
blocks = [TeamStrength(), HomeAdvantage()]
An adapter connects your model to the data source. In other words, it tells the model how to find the information it needs to fit.
The information needed is determined by which model blocks are used. In our case:
- All models require home_goals and away_goals
- TeamStrength requires home_team and away_team
- HomeAdvantage doesn't require any information, since it assumes all matches have equal home-field advantage by default.
adapter = KeyAdapter(  # `KeyAdapter` = data['...']
    home_team='team1',
    away_team='team2',
    home_goals=['score', 'ft', 0],  # Get nested fields with lists of fields
    away_goals=['score', 'ft', 1],  # i.e. data['score']['ft'][1]
)
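To make the list-of-fields lookup concrete, here is a minimal sketch of the idea in plain Python. The helper `get_nested` and the sample match are hypothetical and are not part of mezzala; the actual `KeyAdapter` implementation may differ.

```python
from functools import reduce


def get_nested(data, path):
    """Follow `path` (a single key, or a list of keys/indices) into `data`."""
    if not isinstance(path, (list, tuple)):
        path = [path]
    # Apply each key in turn: data[path[0]][path[1]]...
    return reduce(lambda obj, key: obj[key], path, data)


# Made-up match record with the same shape as the fields used above
example_match = {
    'team1': 'Home FC',
    'team2': 'Away FC',
    'score': {'ft': [2, 1]},
}

home_team = get_nested(example_match, 'team1')
home_goals = get_nested(example_match, ['score', 'ft', 0])
away_goals = get_nested(example_match, ['score', 'ft', 1])
```

A plain string reaches a top-level field, while a list walks down one level per element, so the same lookup works for both flat and nested records.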
Pulling this together, we can construct a model from an adapter and blocks:
model = DixonColes(adapter=adapter, blocks=blocks)
model.fit(pl_1516)
# All estimates should be valid numbers
assert all(not np.isnan(x) for x in model.params.values())
# Home advantage should be positive
assert 1.0 < np.exp(model.params[HFA_KEY]) < 2.0
Let's inspect the parameters a bit. First, let's look at the boring (non-team) ones:
param_keys = model.params.keys()
param_key_len = max(len(str(k)) for k in param_keys)
for k in param_keys:
    if not isinstance(k, TeamParameterKey):
        key_str = str(k).ljust(param_key_len + 1)
        print(f'{key_str}: {np.exp(model.params[k]):0.2f}')
And the team ones. Let's look at each team's attacking quality:
teams = {k.label for k in param_keys if isinstance(k, TeamParameterKey)}
team_offence = [(t, np.exp(model.params[OffenceParameterKey(t)])) for t in teams]
for team, estimate in sorted(team_offence, key=lambda x: -x[1]):
    print(f'{team}: {estimate:0.2f}')
And each team's defensive quality:
team_defence = [(t, np.exp(model.params[DefenceParameterKey(t)])) for t in teams]
for team, estimate in sorted(team_defence, key=lambda x: x[1]):
    print(f'{team}: {estimate:0.2f}')
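One way to read these numbers, assuming the usual Dixon-Coles form in which the terms add on the log scale: a team's expected goals is the product of its exponentiated offence, the opponent's exponentiated defence and, for the home side, the exponentiated home advantage. The sketch below uses made-up values and hypothetical names. It is neither the fitted estimates nor mezzala's own API.

```python
import math

# Made-up log-scale parameters (hypothetical, not fitted values)
log_params = {
    ('offence', 'Team A'): 0.30,
    ('defence', 'Team A'): -0.20,
    ('offence', 'Team B'): -0.10,
    ('defence', 'Team B'): 0.15,
    'home_advantage': 0.25,
}


def expected_goals(params, home, away):
    """Return (home_rate, away_rate), assuming a log-linear scoring rate."""
    home_rate = math.exp(
        params[('offence', home)]
        + params[('defence', away)]
        + params['home_advantage']
    )
    away_rate = math.exp(
        params[('offence', away)] + params[('defence', home)]
    )
    return home_rate, away_rate


home_rate, away_rate = expected_goals(log_params, 'Team A', 'Team B')
print(f'Team A (home): {home_rate:0.2f}, Team B (away): {away_rate:0.2f}')
```

Under this reading, a lower defence value means fewer goals conceded, which would explain why the defence list is sorted in ascending order.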
Making predictions for a single match
scorelines = model.predict_one({
    'team1': 'Manchester City FC',
    'team2': 'Swansea City FC',
})
# Probabilities should sum to 1
assert np.isclose(
    sum(p.probability for p in scorelines),
    1.0
)
scorelines[0:5]
outcomes = scorelines_to_outcomes(scorelines)
# MCFC should have a better chance of beating Swansea
# at home than Swansea do of winning away
assert outcomes[Outcomes('Home win')].probability > outcomes[Outcomes('Away win')].probability
list(outcomes.values())
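Conceptually, collapsing scorelines into outcomes just sums the probability of every scoreline that falls under each result. Here is a minimal sketch with a hypothetical `Scoreline` tuple and made-up probabilities; mezzala's own `scorelines_to_outcomes` returns richer objects and may be implemented differently.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a predicted scoreline
Scoreline = namedtuple('Scoreline', ['home_goals', 'away_goals', 'probability'])


def to_outcomes(scorelines):
    """Sum scoreline probabilities into home win / draw / away win."""
    totals = defaultdict(float)
    for s in scorelines:
        if s.home_goals > s.away_goals:
            totals['Home win'] += s.probability
        elif s.home_goals < s.away_goals:
            totals['Away win'] += s.probability
        else:
            totals['Draw'] += s.probability
    return dict(totals)


# Made-up probabilities that sum to 1
toy_scorelines = [
    Scoreline(0, 0, 0.10),
    Scoreline(1, 0, 0.30),
    Scoreline(2, 1, 0.20),
    Scoreline(1, 1, 0.15),
    Scoreline(0, 1, 0.25),
]
toy_outcomes = to_outcomes(toy_scorelines)
```

Because every scoreline maps to exactly one outcome, the outcome probabilities sum to the same total as the scoreline probabilities.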
Or for multiple matches
many_scorelines = model.predict([
    {'team1': 'Manchester City FC',
     'team2': 'Swansea City FC'},
    {'team1': 'Manchester City FC',
     'team2': 'West Ham United FC'}
])
What about a model with a different weighting method?
By default, the DixonColes model weights all matches equally. However, it's more realistic
to give matches closer to the current date a bigger weight than those played a long time ago.
The original Dixon-Coles paper suggests using an exponential weight, and we can use the same:
season_end_date = max(match['date'] for match in pl_1516)
weight = ExponentialWeight(
    # Value of `epsilon` is taken from the original paper
    epsilon=-0.0065,
    key=lambda x: (season_end_date - x['date']).days
)
model_exp = DixonColes(
    adapter=adapter,
    blocks=blocks,
    weight=weight
)
model_exp.fit(pl_1516)
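To get a feel for what this weighting does, the sketch below assumes that a match's weight is `exp(epsilon * days_ago)`, the usual exponential-decay form. mezzala's internals are not shown here and may differ in detail, and `decay_weight` is a hypothetical helper.

```python
import math

EPSILON = -0.0065  # same value as passed to ExponentialWeight above


def decay_weight(days_ago, epsilon=EPSILON):
    """Weight of a match played `days_ago` days before the reference date."""
    return math.exp(epsilon * days_ago)


for days_ago in (0, 30, 90, 270):
    print(f'{days_ago:>3} days ago: weight {decay_weight(days_ago):0.2f}')

# Days after which a match counts half as much as one played on the reference date
half_life_days = math.log(0.5) / EPSILON
print(f'half-life: {half_life_days:0.1f} days')
```

Under that assumption, a match played on the final day of the season gets weight 1, and the weight halves roughly every 107 days.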
How much does that change the ratings at season-end?
for k in sorted(param_keys, key=lambda x: x.label):
    key_str = str(k).ljust(param_key_len + 1)
    model_param = np.exp(model.params[k])
    model_exp_param = np.exp(model_exp.params[k])
    print(f'{key_str}: {model_param:0.2f} -> {model_exp_param:0.2f} ({model_exp_param/model_param:0.2f})')