Extensions to xThreat

stats and snakeoil

2022-02-06

Since Karun Singh’s original introduction, Expected Threat (abbreviated xThreat or xT) has gained an increasing amount of traction and mindshare in the public soccer analytics sphere.

I would attribute this to a few main factors:

An excellent introductory blog post that clearly explains the motivations, mechanisms, and applications of the model
The provision of fitted parameters by Karun Singh (linked in the blog post)
An excellent open-source implementation by the KU Leuven Machine Learning Research Group that can use the fitted parameters

Altogether, this makes it possible for hobbyists and practitioners alike to use xThreat relatively easily when compared with competing models.

Nonetheless, I think there are a few areas where xThreat could be tweaked and potentially improved without changing the core idea too much. (This post does assume some familiarity with the xThreat model, so if it's new to you, it's probably worth reading Karun's introduction first.)

I have attempted to credit prior work where relevant; if I’ve missed anything please let me know so I can add any links.

1. Using different pitch divisions…

In general, xThreat implementations divide the pitch into a uniform grid, as in this graphic from the socceraction documentation:

A pitch with a 12 by 8 grid overlaid

However, there’s nothing in the xThreat model that mandates this way of dividing up the pitch. Perhaps there are other divisions that could help…
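Mapping an event's coordinates to a zone does not require the bins to be equal. If zones are defined by lists of bin edges, a uniform grid is just one choice of edges. The sketch below assumes a 105 m by 68 m pitch with the team in possession attacking towards x = 105. The helper names and the non-uniform edges are hypothetical, chosen only to illustrate the point.

```python
from bisect import bisect_right

# Assumption: a 105 m x 68 m pitch, with the team in possession attacking
# towards x = 105. Neither value is mandated by xThreat itself.
PITCH_LENGTH = 105.0
PITCH_WIDTH = 68.0


def uniform_edges(length: float, n_bins: int) -> list[float]:
    """Evenly spaced bin edges from 0 to `length` (n_bins + 1 values)."""
    return [length * i / n_bins for i in range(n_bins + 1)]


def zone_of(x: float, y: float, x_edges: list[float], y_edges: list[float]) -> tuple[int, int]:
    """Return the (column, row) index of the zone containing (x, y).

    Any increasing list of edges works, so the uniform grid is only one
    special case. Points outside the edges are clamped into the nearest zone.
    """
    col = min(max(bisect_right(x_edges, x) - 1, 0), len(x_edges) - 2)
    row = min(max(bisect_right(y_edges, y) - 1, 0), len(y_edges) - 2)
    return col, row


# The 12 by 8 uniform grid from the graphic above.
ux = uniform_edges(PITCH_LENGTH, 12)
uy = uniform_edges(PITCH_WIDTH, 8)
print(zone_of(94.0, 34.0, ux, uy))  # penalty spot: (10, 4)
print(zone_of(88.0, 34.0, ux, uy))  # just outside the box: (10, 4), same zone

# Hypothetical non-uniform x edges: coarse in the first 70 m, finer near goal.
custom_x = [0.0, 35.0, 70.0, 88.5, 94.0, 99.5, 105.0]
print(zone_of(94.0, 34.0, custom_x, uy))  # (4, 4)
print(zone_of(88.0, 34.0, custom_x, uy))  # (2, 4)
```

With the uniform grid, a shot from the penalty spot and one from just outside the box land in the same zone. The hypothetical edges separate them without changing anything else in the lookup.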

… to better reflect player behaviour

Pitch markings, the laws of the game, and team tactics alter player behaviour around the pitch. The way these factors interact to impact player behaviour is not uniform across the pitch. For example, teams will often set their defensive line just above the penalty area. So should we reconsider splitting the pitch up uniformly in xThreat?

Perhaps the Juego de Posición pitch divisions could be a decent starting point?

More granular divisions in the attacking half of the pitch might be desirable. Which brings us to…

… to increase the resolution of xT estimates

xThreat includes a simple xG model that estimates the probability of scoring from each zone, and uses this embedded xG model to propagate value estimates to all zones on the pitch.

Therefore, the precision of the xThreat estimates depends on the precision of the xG model.
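To make the propagation concrete, here is a minimal value-iteration sketch of the xT recursion from Karun's post, xT(z) = s(z)·g(z) + m(z)·Σ T(z→w)·xT(w), where g is the embedded xG model. The function and parameter names are my own, and the two-zone probabilities are made-up toy numbers, not fitted values.

```python
def expected_threat(shoot, goal, move, transition, n_iter=1000, tol=1e-10):
    """Solve xT(z) = s(z) * g(z) + m(z) * sum_w T(z -> w) * xT(w) by iteration.

    shoot[z]          s(z): probability the next action from zone z is a shot
    goal[z]           g(z): probability a shot from z is scored (the embedded xG model)
    move[z]           m(z): probability the next action from z is a successful move
    transition[z][w]  T(z -> w): probability a move from z ends in zone w
    Zones can be any hashable labels, so non-uniform divisions work too.
    """
    xt = {z: 0.0 for z in shoot}
    for _ in range(n_iter):
        updated = {
            z: shoot[z] * goal[z]
            + move[z] * sum(p * xt[w] for w, p in transition[z].items())
            for z in shoot
        }
        change = max(abs(updated[z] - xt[z]) for z in xt)
        xt = updated
        if change < tol:
            break
    return xt


# Toy two-zone example; all probabilities are made up for illustration.
shoot = {"midfield": 0.0, "box": 0.5}
goal = {"midfield": 0.0, "box": 0.2}
move = {"midfield": 0.5, "box": 0.3}
transition = {"midfield": {"midfield": 0.6, "box": 0.4}, "box": {"box": 1.0}}

xt = expected_threat(shoot, goal, move, transition)
print(round(xt["box"], 3), round(xt["midfield"], 3))  # 0.143 0.041
```

Because xT is linear in `goal`, doubling every g(z) doubles every xT value. Any error in the embedded xG estimates for a zone is therefore passed on to every zone from which that zone can be reached.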

In the 12 by 8 pitch division, each zone in the attacking penalty area spans a wide range of xG values. A shot from the penalty spot tends to be much more valuable than one from just outside the box, but uniform bins can put both in the same zone, so xThreat treats them the same.

In general, we want to define the pitch zones so that the events within each zone are as similar to one another as possible. Could we increase the precision of the xT model by using more, smaller bins within high-value areas?

Maybe something like… this?

Granted, this is not the prettiest way to divide the pitch. But it would divide the area where most high-value shots come from more finely than uniform binning does.
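One way to make "as similar to one another as possible" measurable is the shot-weighted average variance of xG within each zone: a division that groups shots of similar quality scores lower. The sketch below compares the uniform 12 by 8 grid with a hypothetical set of finer x edges around the box. The function names are illustrative, and the four shots are invented toy values, not real data.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import pvariance


def zone_index(value, edges):
    """Index of the bin containing `value`, clamped to the valid range."""
    return min(max(bisect_right(edges, value) - 1, 0), len(edges) - 2)


def within_zone_xg_variance(shots, x_edges, y_edges):
    """Shot-weighted mean of the xG variance inside each zone.

    `shots` is a list of (x, y, xg) tuples. Lower is better: it means each
    zone groups together shots of similar quality.
    """
    by_zone = defaultdict(list)
    for x, y, xg in shots:
        by_zone[(zone_index(x, x_edges), zone_index(y, y_edges))].append(xg)
    n_shots = sum(len(xgs) for xgs in by_zone.values())
    return sum(len(xgs) * pvariance(xgs) for xgs in by_zone.values()) / n_shots


# Invented toy shots (x, y, xg) on an assumed 105 m x 68 m pitch:
# two near the penalty spot, two just outside the box.
shots = [(94.0, 34.0, 0.30), (95.0, 35.0, 0.28), (88.0, 36.0, 0.08), (88.2, 37.0, 0.07)]

uniform_x = [105.0 * i / 12 for i in range(13)]
uniform_y = [68.0 * i / 8 for i in range(9)]
custom_x = [0.0, 35.0, 70.0, 88.5, 94.0, 99.5, 105.0]  # hypothetical finer edges

print(within_zone_xg_variance(shots, uniform_x, uniform_y))  # ≈ 0.0116
print(within_zone_xg_variance(shots, custom_x, uniform_y))   # ≈ 6.25e-05
```

Splitting a zone can never increase this score, so on real data it would need to be balanced against keeping enough events in each zone to estimate the move and shot probabilities reliably.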

2. Use an xG model

As mentioned, xThreat embeds a simple xG model. But we already have sophisticated xG models that work well, so why not use an existing xG model to assign the initial value of shots from a given zone?

There are a few ways one could incorporate the information from xG models, but the simplest would be to take the average xG of all shots from a given zone as the value of a shot from that zone.
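A minimal sketch of the averaging approach: given shots already tagged with a zone and an xG value from whichever external model you trust, the zone's shot value g(z) is simply the mean xG. The function name, the (zone, xg) input format, and the toy values are illustrative assumptions.

```python
from collections import defaultdict


def zone_shot_values(shots):
    """Mean xG of the shots taken from each zone.

    `shots` is an iterable of (zone, xg) pairs, with `xg` taken from an
    external xG model. The result can stand in for g(z), the goals / shots
    ratio that basic xThreat uses as its embedded xG model.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for zone, xg in shots:
        totals[zone] += xg
        counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Toy input: three shots tagged with a (column, row) zone and a made-up xG.
shots = [((10, 4), 0.30), ((10, 4), 0.10), ((9, 4), 0.05)]
print(zone_shot_values(shots))  # {(10, 4): 0.2, (9, 4): 0.05}
```

Unlike the goals / shots ratio, this does not assign a zone a scoring probability of exactly 0 or 1 just because its few shots happened to miss or score.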

3. Introduce turnover states

The basic xThreat model estimates the probability of moving or shooting from each zone on the pitch. We could add a set of turnover states: in other words, we would also estimate the chance of giving the ball to the opposition in each pitch zone.

This allows us to calculate xT for and against. The xThreat against (xRisk?) in a given zone would be the probability of turning the ball over in that zone multiplied by the opposition's xThreat from the same spot.
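A minimal sketch of this simple version, assuming a uniform grid (so a zone can be mirrored onto the opposition's view of the same spot) and assuming both teams share one xT grid. The function names and toy numbers are hypothetical; the turnover probabilities would need to be estimated from event data alongside the move and shot probabilities.

```python
def mirror(zone, n_cols, n_rows):
    """Map a (col, row) zone to the same spot seen from the opposition's
    attacking direction, i.e. rotate the grid by 180 degrees."""
    col, row = zone
    return n_cols - 1 - col, n_rows - 1 - row


def xt_for_and_against(xt, turnover, n_cols, n_rows):
    """Return {zone: (xT for, xT against)} on a uniform n_cols x n_rows grid.

    xt[z]        xT for the team in possession in zone z
    turnover[z]  probability the team in possession gives the ball away in zone z
    xT against in z = turnover[z] * the opposition's xT from that same spot,
    which (assuming both teams share one xT grid) is xt at the mirrored zone.
    """
    return {
        z: (xt[z], turnover[z] * xt[mirror(z, n_cols, n_rows)])
        for z in xt
    }


# Toy 2 x 1 grid: zone (0, 0) is our own half, (1, 0) the opposition's half.
xt = {(0, 0): 0.01, (1, 0): 0.10}
turnover = {(0, 0): 0.2, (1, 0): 0.3}
for zone, (xt_for, xt_against) in xt_for_and_against(xt, turnover, 2, 1).items():
    print(zone, round(xt_for, 3), round(xt_against, 3))
# (0, 0) 0.01 0.02
# (1, 0) 0.1 0.003
```

Even in this toy case, losing the ball in our own half carries more threat against than xT for, which is exactly the information the basic model leaves out.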

Having xThreat values for and against would bring xT closer to possession value models like OBV, VAEP, and G+.

Rob Hickman introduced the idea of incorporating opposition threat to xThreat in his 2019 talk, “Considering Defensive Risk in Expected Threat Models”.

Taking each of these ideas to its logical conclusion ultimately leads towards more sophisticated possession-value models like VAEP, G+, or OBV. However, these models are significantly more complex, and more difficult to develop and maintain, than Expected Threat.

Small tweaks like the ones listed above may improve xThreat while keeping model complexity manageable and retaining the benefits of all the public work that has gone into xThreat so far.