Team-strength models in Python
import json
import datetime as dt
import numpy as np
import pprint

Let's demo the models on results from the 2015/16 Premier League season.

with open('../data/premier-league-1516.json', 'r') as f:
    pl_1516 = json.load(f)

# Let's parse the dates, too
for match in pl_1516:
    match['date'] = dt.datetime.strptime(match['date'], '%Y-%m-%d')
    
pl_1516[0:3]
[{'date': datetime.datetime(2015, 8, 8, 0, 0),
  'team1': 'Manchester United FC',
  'team2': 'Tottenham Hotspur FC',
  'score': {'ft': [1, 0]}},
 {'date': datetime.datetime(2015, 8, 8, 0, 0),
  'team1': 'AFC Bournemouth',
  'team2': 'Aston Villa FC',
  'score': {'ft': [0, 1]}},
 {'date': datetime.datetime(2015, 8, 8, 0, 0),
  'team1': 'Leicester City FC',
  'team2': 'Sunderland AFC',
  'score': {'ft': [4, 2]}}]

A model in mezzala is composed of 2 parts:

  • Model blocks (see mezzala.blocks)
  • An adapter (see mezzala.adapters)

The model blocks determine which terms your model estimates. In general, you will want to estimate offensive and defensive strength for each team (TeamStrength) as well as home advantage (HomeAdvantage).

The selected model blocks can be supplied to the model as a list:

blocks = [TeamStrength(), HomeAdvantage()]

An adapter connects your model to the data source. In other words, it tells the model how to find the information it needs to fit.

The information needed is determined by which model blocks are used. In our case:

  • All models require home_goals and away_goals
  • TeamStrength requires home_team and away_team

HomeAdvantage doesn't require any information, since it assumes all matches have equal home-field advantage by default.

adapter = KeyAdapter(               # `KeyAdapter` = data['...']
    home_team='team1',
    away_team='team2',
    home_goals=['score', 'ft', 0],  # Get nested fields with lists of fields
    away_goals=['score', 'ft', 1],  # i.e. data['score']['ft'][1]
)
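
To make the nested-field lookup concrete, here is a minimal sketch of how a list of keys could be resolved against a match record. The `get_nested` helper is hypothetical and written only for illustration; it is an assumption about the behaviour the comments above describe, not mezzala's actual implementation.

```python
from functools import reduce
import operator


def get_nested(data, path):
    """Look up `path` in `data`; `path` is a single key or a list of keys."""
    if isinstance(path, (list, tuple)):
        # Apply each key in turn, i.e. data[path[0]][path[1]]...
        return reduce(operator.getitem, path, data)
    return data[path]


match = {
    'team1': 'Manchester United FC',
    'team2': 'Tottenham Hotspur FC',
    'score': {'ft': [1, 0]},
}

home_team = get_nested(match, 'team1')
home_goals = get_nested(match, ['score', 'ft', 0])
away_goals = get_nested(match, ['score', 'ft', 1])
```

A plain string is treated as a single top-level key, while a list walks down through dictionaries and list indices alike.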

Pulling this together, we can construct a model from an adapter and blocks:

model = DixonColes(adapter=adapter, blocks=blocks)
model.fit(pl_1516)

# All estimates should be valid numbers
assert all(not np.isnan(x) for x in model.params.values())

# Home advantage should be positive on the log scale (a multiplier between 1 and 2)
assert 1.0 < np.exp(model.params[HFA_KEY]) < 2.0

Let's inspect the parameters a bit. First, let's look at the boring (non-team) ones:

param_keys = model.params.keys()
param_key_len = max(len(str(k)) for k in param_keys)

for k in param_keys:
    if not isinstance(k, TeamParameterKey):
        key_str = str(k).ljust(param_key_len + 1)
        print(f'{key_str}: {np.exp(model.params[k]):0.2f}')
ParameterKey(label='Home-field advantage')           : 1.23
ParameterKey(label='Rho')                            : 0.94

And the team ones. Let's look at each team's attacking quality:

teams = {k.label for k in param_keys if isinstance(k, TeamParameterKey)}

team_offence = [(t, np.exp(model.params[OffenceParameterKey(t)])) for t in teams]
for team, estimate in sorted(team_offence, key=lambda x: -x[1]):
    print(f'{team}: {estimate:0.2f}')
Manchester City FC: 1.38
Tottenham Hotspur FC: 1.33
Leicester City FC: 1.31
West Ham United FC: 1.27
Arsenal FC: 1.25
Liverpool FC: 1.23
Everton FC: 1.16
Chelsea FC: 1.15
Southampton FC: 1.14
Manchester United FC: 0.94
Sunderland AFC: 0.94
AFC Bournemouth: 0.89
Newcastle United FC: 0.87
Swansea City FC: 0.82
Stoke City FC: 0.81
Watford FC: 0.78
Norwich City FC: 0.77
Crystal Palace FC: 0.76
West Bromwich Albion FC: 0.66
Aston Villa FC: 0.54

And each team's defensive quality (lower values indicate a stronger defence):

team_defence = [(t, np.exp(model.params[DefenceParameterKey(t)])) for t in teams]
for team, estimate in sorted(team_defence, key=lambda x: x[1]):
    print(f'{team}: {estimate:0.2f}')
Manchester United FC: 0.82
Tottenham Hotspur FC: 0.84
Leicester City FC: 0.86
Arsenal FC: 0.86
Southampton FC: 0.97
Manchester City FC: 0.99
West Bromwich Albion FC: 1.10
Watford FC: 1.17
Liverpool FC: 1.19
Crystal Palace FC: 1.19
Swansea City FC: 1.21
West Ham United FC: 1.22
Chelsea FC: 1.26
Stoke City FC: 1.28
Everton FC: 1.32
Sunderland AFC: 1.46
Newcastle United FC: 1.52
Norwich City FC: 1.55
AFC Bournemouth: 1.57
Aston Villa FC: 1.75
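
One way to read these numbers is sketched below. It assumes the standard Dixon-Coles parameterisation, in which a team's scoring rate is the product of its own offence, the opponent's defence and (for the home side) the home-field advantage. The rounded estimates printed above are hard-coded, so the results are approximate; this is a sketch of the arithmetic, not a call into mezzala, and `expected_goals` is a hypothetical helper.

```python
# Rounded estimates copied from the output above
home_advantage = 1.23
offence = {'Manchester City FC': 1.38, 'Swansea City FC': 0.82}
defence = {'Manchester City FC': 0.99, 'Swansea City FC': 1.21}


def expected_goals(home, away):
    """Expected goals (home, away) under the assumed multiplicative model."""
    home_rate = home_advantage * offence[home] * defence[away]
    away_rate = offence[away] * defence[home]
    return home_rate, away_rate


home_rate, away_rate = expected_goals('Manchester City FC', 'Swansea City FC')
print(f'{home_rate:0.2f} - {away_rate:0.2f}')  # 2.05 - 0.81
```

Under this reading, a defence estimate above 1 inflates the opponent's scoring rate, which is why the weakest defences appear at the bottom of the list.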

Making predictions for a single match

scorelines = model.predict_one({
    'team1': 'Manchester City FC',
    'team2': 'Swansea City FC',
})

# Probabilities should sum to 1
assert np.isclose(
    sum(p.probability for p in scorelines),
    1.0
)

scorelines[0:5]
[ScorelinePrediction(home_goals=0, away_goals=0, probability=0.0619999820129133),
 ScorelinePrediction(home_goals=0, away_goals=1, probability=0.03970300056443736),
 ScorelinePrediction(home_goals=0, away_goals=2, probability=0.018568356365315872),
 ScorelinePrediction(home_goals=0, away_goals=3, probability=0.005037154039480389),
 ScorelinePrediction(home_goals=0, away_goals=4, probability=0.0010248451849317163)]
outcomes = scorelines_to_outcomes(scorelines)

# MCFC should have a better chance of beating Swansea
# at home than Swansea do of winning away
assert outcomes[Outcomes('Home win')].probability > outcomes[Outcomes('Away win')].probability

list(outcomes.values())
[OutcomePrediction(outcome=Outcomes('Home win'), probability=0.658650484098139),
 OutcomePrediction(outcome=Outcomes('Draw'), probability=0.21019557218753862),
 OutcomePrediction(outcome=Outcomes('Away win'), probability=0.13115394371432296)]
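
For intuition, the sketch below shows how scoreline probabilities of this kind can be built and then collapsed into outcomes. It assumes the textbook Dixon-Coles form: independent Poisson goal counts with a low-score correction controlled by rho. The goal rates are approximations derived from the rounded estimates above, rho is taken as the log of the printed 0.94, and all function names are hypothetical. It is not mezzala's code and will only roughly match the figures shown.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k goals given a Poisson scoring rate."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def tau(x, y, home_rate, away_rate, rho):
    """Dixon-Coles adjustment for the four lowest-scoring results."""
    if x == 0 and y == 0:
        return 1 - home_rate * away_rate * rho
    if x == 0 and y == 1:
        return 1 + home_rate * rho
    if x == 1 and y == 0:
        return 1 + away_rate * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def scoreline_grid(home_rate, away_rate, rho, max_goals=15):
    """Probability of each (home_goals, away_goals) up to max_goals."""
    grid = {
        (x, y): tau(x, y, home_rate, away_rate, rho)
        * poisson_pmf(x, home_rate)
        * poisson_pmf(y, away_rate)
        for x in range(max_goals + 1)
        for y in range(max_goals + 1)
    }
    # Renormalise to account for truncating the grid at max_goals
    total = sum(grid.values())
    return {score: p / total for score, p in grid.items()}


def to_outcomes(grid):
    """Sum scoreline probabilities into home win / draw / away win."""
    result = {'Home win': 0.0, 'Draw': 0.0, 'Away win': 0.0}
    for (x, y), p in grid.items():
        key = 'Home win' if x > y else 'Draw' if x == y else 'Away win'
        result[key] += p
    return result


# Approximate inputs (assumptions, see the lead-in)
grid = scoreline_grid(home_rate=2.05, away_rate=0.81, rho=math.log(0.94))
sketch_outcomes = to_outcomes(grid)
```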

Or for multiple matches:

many_scorelines = model.predict([
    {'team1': 'Manchester City FC',
     'team2': 'Swansea City FC'},
    {'team1': 'Manchester City FC',
     'team2': 'West Ham United FC'}
])

What about a model with a different weighting method?

By default, the DixonColes model weights all matches equally. However, it is usually more realistic to weight recent matches more heavily than those played a long time ago.

The original Dixon-Coles paper suggests using an exponential weight, and we can use the same:

season_end_date = max(match['date'] for match in pl_1516)

weight = ExponentialWeight(
    # Magnitude of `epsilon` is taken from the original paper; note that the
    # paper measures time in half-weeks, whereas `key` below returns days
    epsilon=-0.0065,
    key=lambda x: (season_end_date - x['date']).days
)
model_exp = DixonColes(
    adapter=adapter,
    blocks=blocks,
    weight=weight
)
model_exp.fit(pl_1516)
DixonColes(adapter=KeyAdapter(home_goals=['score', 'ft', 0], away_goals=['score', 'ft', 1], home_team='team1', away_team='team2'), blocks=[TeamStrength(), HomeAdvantage()]), weight=ExponentialWeight(epsilon=-0.0065, key=<function <lambda> at 0x11eecd158>)
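
To get a feel for how quickly such a weight decays, here is a small sketch. It assumes the weight for a match is exp(epsilon * days_ago), which is consistent with the negative `epsilon` used above but is an assumption about the weighting rather than a confirmed description of ExponentialWeight; `exponential_weight` is a hypothetical stand-in.

```python
import math


def exponential_weight(days_ago, epsilon=-0.0065):
    """Weight of a match played `days_ago` days before the reference date."""
    return math.exp(epsilon * days_ago)


# Weights for matches played today, a month ago and roughly a season ago
weights = {days: exponential_weight(days) for days in (0, 30, 280)}

# Number of days after which a match counts half as much as one played today
half_life = math.log(0.5) / -0.0065

for days, w in weights.items():
    print(f'{days:>3} days ago: {w:0.3f}')
print(f'half-life: {half_life:0.1f} days')
```

With this value of epsilon and time measured in days, a match loses half its weight after a little over 100 days, so early-season results contribute much less to the season-end ratings.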

How much does that change the ratings at season-end?

for k in sorted(param_keys, key=lambda x: x.label):
    key_str = str(k).ljust(param_key_len + 1)
    model_param = np.exp(model.params[k])
    model_exp_param = np.exp(model_exp.params[k])
    print(f'{key_str}: {model_param:0.2f} -> {model_exp_param:0.2f} ({model_exp_param/model_param:0.2f})')
OffenceParameterKey(label='AFC Bournemouth')         : 0.89 -> 0.88 (0.99)
DefenceParameterKey(label='AFC Bournemouth')         : 1.57 -> 1.61 (1.02)
OffenceParameterKey(label='Arsenal FC')              : 1.25 -> 1.25 (1.00)
DefenceParameterKey(label='Arsenal FC')              : 0.86 -> 0.85 (0.98)
OffenceParameterKey(label='Aston Villa FC')          : 0.54 -> 0.49 (0.91)
DefenceParameterKey(label='Aston Villa FC')          : 1.75 -> 1.83 (1.04)
OffenceParameterKey(label='Chelsea FC')              : 1.15 -> 1.20 (1.04)
DefenceParameterKey(label='Chelsea FC')              : 1.26 -> 1.16 (0.92)
OffenceParameterKey(label='Crystal Palace FC')       : 0.76 -> 0.70 (0.92)
DefenceParameterKey(label='Crystal Palace FC')       : 1.19 -> 1.25 (1.05)
OffenceParameterKey(label='Everton FC')              : 1.16 -> 1.02 (0.88)
DefenceParameterKey(label='Everton FC')              : 1.32 -> 1.33 (1.01)
ParameterKey(label='Home-field advantage')           : 1.23 -> 1.30 (1.05)
OffenceParameterKey(label='Leicester City FC')       : 1.31 -> 1.25 (0.95)
DefenceParameterKey(label='Leicester City FC')       : 0.86 -> 0.68 (0.79)
OffenceParameterKey(label='Liverpool FC')            : 1.23 -> 1.33 (1.08)
DefenceParameterKey(label='Liverpool FC')            : 1.19 -> 1.18 (1.00)
OffenceParameterKey(label='Manchester City FC')      : 1.38 -> 1.36 (0.98)
DefenceParameterKey(label='Manchester City FC')      : 0.99 -> 1.00 (1.01)
OffenceParameterKey(label='Manchester United FC')    : 0.94 -> 0.92 (0.98)
DefenceParameterKey(label='Manchester United FC')    : 0.82 -> 0.83 (1.01)
OffenceParameterKey(label='Newcastle United FC')     : 0.87 -> 0.93 (1.08)
DefenceParameterKey(label='Newcastle United FC')     : 1.52 -> 1.37 (0.90)
OffenceParameterKey(label='Norwich City FC')         : 0.77 -> 0.69 (0.90)
DefenceParameterKey(label='Norwich City FC')         : 1.55 -> 1.51 (0.97)
ParameterKey(label='Rho')                            : 0.94 -> 0.91 (0.97)
OffenceParameterKey(label='Southampton FC')          : 1.14 -> 1.26 (1.11)
DefenceParameterKey(label='Southampton FC')          : 0.97 -> 0.95 (0.98)
OffenceParameterKey(label='Stoke City FC')           : 0.81 -> 0.82 (1.01)
DefenceParameterKey(label='Stoke City FC')           : 1.28 -> 1.42 (1.11)
OffenceParameterKey(label='Sunderland AFC')          : 0.94 -> 0.99 (1.05)
DefenceParameterKey(label='Sunderland AFC')          : 1.46 -> 1.22 (0.84)
OffenceParameterKey(label='Swansea City FC')         : 0.82 -> 0.88 (1.08)
DefenceParameterKey(label='Swansea City FC')         : 1.21 -> 1.18 (0.97)
OffenceParameterKey(label='Tottenham Hotspur FC')    : 1.33 -> 1.34 (1.01)
DefenceParameterKey(label='Tottenham Hotspur FC')    : 0.84 -> 0.95 (1.12)
OffenceParameterKey(label='Watford FC')              : 0.78 -> 0.77 (0.99)
DefenceParameterKey(label='Watford FC')              : 1.17 -> 1.33 (1.14)
OffenceParameterKey(label='West Bromwich Albion FC') : 0.66 -> 0.60 (0.91)
DefenceParameterKey(label='West Bromwich Albion FC') : 1.10 -> 1.04 (0.94)
OffenceParameterKey(label='West Ham United FC')      : 1.27 -> 1.33 (1.04)
DefenceParameterKey(label='West Ham United FC')      : 1.22 -> 1.33 (1.09)