Retirement Planning

I have constructed an Excel file that can be used to plan your personal company-run DC pension fund contributions, as well as other factors related to retirement planning (you can download the file above).

A DC (Defined Contribution) pension is the type of pension scheme that you are most likely to be a member of. It is essentially a pot of money that you (and often, your employer) will make contributions to in order to fund your retirement. These contributions are typically paid as a % of your monthly salary.

The government encourages you to invest in your pension in order to relieve the burden on the state pension. One of the measures in place to do this is that pension contributions are tax exempt, up to an age-related percentage limit (see below).

Age           Percentage Limit
Under 30      15%
30 – 39       20%
40 – 49       25%
50 – 54       30%
55 – 59       35%
60 or over    40%

This file allows you to specify the pension contributions made by you (the employee) and your employer at these age intervals. In many cases your employer will make a base rate contribution at each of these age intervals, and then match your contributions up to a certain limit. This will vary by employer.
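To make the table concrete, here is a minimal R sketch (R is the language used for the code later on this page) of looking up the age-related limit and applying a simple employer match. The limits are taken from the table above; the employer's 5% base rate and 5% match cap are hypothetical placeholders, since real schemes vary by employer, and any other rules that may apply to tax relief are ignored.

```r
# Age-related limits from the table above, as a share of salary
relief_limit <- function(age) {
  breaks <- c(30, 40, 50, 55, 60)                  # lower bound of each band after "Under 30"
  limits <- c(0.15, 0.20, 0.25, 0.30, 0.35, 0.40)
  limits[findInterval(age, breaks) + 1]
}

# Hypothetical employer scheme: a base contribution plus a match of the
# employee's own rate, capped at match_cap (assumed values, not real terms)
employer_rate <- function(employee_rate, base = 0.05, match_cap = 0.05) {
  base + pmin(employee_rate, match_cap)
}

# Part of the employee's own contribution rate that falls within the age limit
relieved_rate <- function(employee_rate, age) {
  pmin(employee_rate, relief_limit(age))
}

relief_limit(c(25, 30, 45, 57, 63))
employer_rate(0.08)
relieved_rate(0.20, 28)
```

findInterval() counts how many band boundaries an age has passed, so the lookup stays a single vectorised line even if the bands change.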

The current (as of 2021) qualifying age for state pensions is 67 and will increase to 68 in 2028. My file has set this age at 68, but depending on your age, this can be easily adjusted.

It is very common in retirement funds to gradually transition your investments away from equities, which can be more volatile, towards bonds, which offer a safer but lower return. In this file there is a table where you can specify the ages at which you would like to move each 25% of the fund's value from equities to bonds. You can also specify the average return you expect from your equity and bond investments, depending on your risk appetite.
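The lifestyling table can be mimicked with a short R sketch. The switch ages (50, 55, 60, 65) and the 5% equity / 1% bond real returns are hypothetical inputs rather than the file's defaults, and contributions are assumed to be added at the end of each year.

```r
# Hypothetical inputs: ages at which a further 25% of the fund moves
# from equities to bonds, and assumed real returns for each asset class
switch_ages   <- c(50, 55, 60, 65)
equity_return <- 0.05
bond_return   <- 0.01

# Share of the fund held in equities at a given age (100%, 75%, 50%, 25%, 0%)
equity_share <- function(age) 1 - 0.25 * findInterval(age, switch_ages)

# Return for the year, weighted by the equity/bond split at that age
blended_return <- function(age) {
  w <- equity_share(age)
  w * equity_return + (1 - w) * bond_return
}

# Project the fund year by year up to retirement
project_fund <- function(start_age, retire_age, annual_contribution, fund0 = 0) {
  if (retire_age <= start_age) return(fund0)
  fund <- fund0
  for (age in start_age:(retire_age - 1)) {
    fund <- fund * (1 + blended_return(age)) + annual_contribution
  }
  fund
}

equity_share(c(40, 52, 57, 62, 66))
project_fund(30, 68, annual_contribution = 5000)
```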

These return percentages are assumed to be real returns, i.e. returns above inflation. This way we can think about all of these figures in terms of today’s money.
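If you start from a nominal return and an inflation estimate, one common way to turn them into the real returns used here is the Fisher relation, sketched below in R. The 6% and 2% figures are illustrative only; simply subtracting inflation from the nominal return gives a slightly higher answer.

```r
# Convert a nominal annual return into a real (inflation-adjusted) return
real_return <- function(nominal, inflation) (1 + nominal) / (1 + inflation) - 1

# Express a future amount in today's money
todays_money <- function(amount, inflation, years) amount / (1 + inflation)^years

real_return(0.06, 0.02)        # about 0.0392, slightly below 0.06 - 0.02
todays_money(100000, 0.02, 30)
```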

The main outputs to keep an eye on when adjusting your inputs in this file are (1) the graph at the top, which provides a helpful indicator for how the variables change your pension fund value over time, and (2) the age at which your pension fund is expected to run dry. 

Estimating the age at which the fund runs dry means this file assumes your pension fund will act as an ARF (Approved Retirement Fund). This is a personal retirement fund where you can keep your money invested up to and after retirement. You could, however, ignore this age if you plan to use your pension fund to purchase an annuity at retirement. An annuity is a contract with a life insurance company that pays you a regular income for the rest of your life in return for a large lump sum paid at the start of the contract.
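Finding the age at which the fund runs dry is essentially a year-by-year loop. Here is a minimal R sketch, assuming a fixed yearly withdrawal in today’s money taken at the end of each year and a single real return on whatever remains (the file may handle the equity/bond mix differently). The €400,000 fund and €25,000 withdrawal are illustrative.

```r
# Age at which a drawdown (ARF-style) fund is exhausted.
# Assumptions: constant real return, fixed withdrawal at the end of each year.
age_fund_runs_dry <- function(fund, retire_age, withdrawal, real_return,
                              max_age = 120) {
  age <- retire_age
  while (fund > 0 && age < max_age) {
    fund <- fund * (1 + real_return) - withdrawal
    age <- age + 1
  }
  if (fund > 0) NA else age   # NA means the fund outlasts max_age
}

age_fund_runs_dry(fund = 400000, retire_age = 68, withdrawal = 25000,
                  real_return = 0.02)
```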

The hardest part of starting your pension is not knowing what contributions or other factors may be enough, too little, or too much. This file aims to help you to get a grasp of how each of these decisions that you have to make will affect your overall retirement fund. Hopefully this will help you to make these decisions with some confidence in how this will affect your future.

[Disclaimer: Don’t make financial decisions based solely on what you read here. Consult a financial advisor before making any big decisions in relation to your financial future.]

Simple Investment Modelling Algorithms

During my studies, certain modules required calculations related to financial investment instruments, for example pricing a forward contract. Some of these calculations involve implementing relatively complicated formulas and a lot of careful manipulation on paper. I found that many of these processes are very repetitive and could be automated with algorithms.

I’m going to share some of the code I developed for solving these questions and verifying my answers. This code is best suited to an educational context: the problems, and the questions they come from, are taken from a university course and may have limited value in real investing, where far more sophisticated algorithms are already in use. However, simple problems such as finding the present value of a bond appear in almost every business, accounting, finance or investing course, so while practising these questions I found it very useful to be able to verify my answers. In turn, this gave me a more intuitive understanding of how such values can change drastically when the parameters of a problem are altered slightly.

The problems I will show here are: finding the present value of a bond, the present value of a swaption and the present value of the expected loss over the lifetime of a corporate bond; pricing a forward contract; constructing a duration-immunising portfolio from two bonds; pricing a European call option contract; and extending the LIBOR curve and using the extended LIBOR rates to find future forward rates. I have also included a Google Drive link where you can find my written solutions to these questions, which may be useful if you are undertaking similar studies. My code is written in the statistical programming language R. The software to run it is free to download at https://www.r-project.org, and a cloud-based option (my preferred method) is also available for free at https://rstudio.cloud/. I will include my code below, but .txt files are also available in my Google Drive link.

 

1 – Finding the Present Value of a Bond

A typical question you can find asking for the present value of a bond could be:

A 3-year bond is issued with face value €1,000 that has coupon rate of 5% p.a., paid semi-annually. Assuming a flat yield curve of 4% p.a. for all maturities, find the cash value of the bond on that date, using continuous compounding.

The code sets up the discounting options (continuous, or compounding p times per year) as functions which can be called later. The “time” variable is created as a sequence of values representing the points in time when your coupons will be paid, given the parameters of the question. The value of each coupon is calculated below this. The code then initialises the value of the bond at time 0 (B0) to €0, and a for loop incrementally increases this value by each discounted coupon until every payment date has been accounted for. Finally, adding the discounted face value to the discounted coupons gives the present value of the bond.

Using the code below, all you need to enter is the term of the bond (3 years), the face value (1000), the coupon rate (0.05), the number of payments per year (2), the yield (0.04) and, below that, the type of compounding, in this case continuous (cont).

A written solution can be found here: Simple Investment Modelling Algorithms

# PV of a Bond

# Find the cash value of the bond
T  = 3     # term of the bond in years
FV = 1000  # face value
c  = 0.05  # annual coupon rate
p  = 2     # payments per year (annually = 1, semiannually = 2, etc)
y  = 0.04  # yield (flat yield curve)

# Setting up the discount options (i = time in years)
cont   = function(i){exp(-y*i)}          # continuous compounding
ptimes = function(i){(1 + y/p)^(-p*i)}   # compounding p times per year

# "cont" vs "ptimes" - select which type of compounding is being used
disc = cont

# Coupon payment dates (in years) and the size of each coupon
time = seq(1/p, T, by=1/p)
coupon = (FV*c)/p
coupon

# Add up the discounted coupons, then the face value discounted from maturity T
B0 = 0
for(i in time){
  B0 = B0 + coupon*disc(i)
}
B0 = B0 + FV*disc(T)
B0

# B0 is the present value of the bond
# (in this case, €1026.861)
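The loop above can also be written as a reusable, vectorised function. This is a sketch with my own argument names; it assumes the term is a whole number of coupon periods, and the "ptimes" option discounts each cash flow at (1 + y/p)^(-p*t), i.e. compounding p times per year with t measured in years.

```r
# Present value of a fixed-coupon bond without an explicit loop
bond_pv <- function(term, face, coupon_rate, p, y,
                    compounding = c("cont", "ptimes")) {
  compounding <- match.arg(compounding)
  times <- seq(1/p, term, by = 1/p)                  # coupon dates in years
  disc <- if (compounding == "cont") exp(-y * times) else (1 + y/p)^(-p * times)
  coupon <- face * coupon_rate / p
  sum(coupon * disc) + face * disc[length(disc)]     # face value paid at maturity
}

bond_pv(3, 1000, 0.05, 2, 0.04)                          # about 1026.861, as above
bond_pv(3, 1000, 0.05, 2, 0.05, compounding = "ptimes")  # coupon rate = yield
```

The second call is a handy sanity check: when the coupon rate equals the yield and both use the same compounding frequency, the bond should price at par.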

 

2 – Finding the Present Value of a Swaption

A typical question asking for the present value of a swaption could be:

Find the present value of a swaption that gives the holder the right to enter into a 5-year annual-pay swap in 1 year where a fixed rate of 1.4% is paid and LIBOR is received. The swap principal is €200,000. Assume the yield curve is flat at 1.2% per annum, the forward swap rate is 1.6% and its volatility is 30%.
Note: the present value of a swaption, in the usual notation, is:

LA\left[s_0\Phi(d_1) - s_K\Phi(d_2)\right], \newline
\text{where} \quad d_1 = \frac{\ln(s_0/s_K) + \sigma^2 T/2}{\sigma\sqrt{T}}, \quad d_2 = d_1 - \sigma\sqrt{T} \quad \text{and} \quad A = \sum_{i=1}^{5} P(0, T_i), \newline
\text{where } T_i \text{ are the swap payment dates.}

We can see that this type of question could cause a lot more problems when working out a solution on paper; however, the code required to solve it doesn’t get much trickier than the previous example. The formula for the present value of a swaption includes the value of an equivalent annuity, which is calculated in a similar way to the previous example and is denoted by the variable A. Calculating d_1 and d_2 is relatively simple once you know which values in the question correspond to s_0, s_K, etc. Using these d values, we can then use the pnorm() function, which calculates the cumulative distribution function of the standard normal distribution. Plugging these values into the final formula gives the present value of the swaption.

Again, using the code below, all you need to do is assign the values from the question to the variables defined in the first section of the code.

A written solution can be found here: Simple Investment Modelling Algorithms

# Present Value of a Swaption

t = 5             # length of the swap (years)
T = 1             # time until you enter into the swap (years)
sk = 0.014        # fixed rate to be paid (strike)
L = 200000        # principal
y = 0.012         # risk-free rate ("yield curve")
s0 = 0.016        # forward swap rate
vol = 0.30        # volatility of the forward swap rate


# Annuity factor: sum of discount factors for the swap payment dates T+1, ..., T+t
A = 0
time = seq(T+1, T+t, 1)
for(i in time){
  A = A + exp(-y*i)
}
A

d1 = (log(s0/sk) + 0.5*vol^2*T) / (vol*sqrt(T))
d2 = (log(s0/sk) - 0.5*vol^2*T) / (vol*sqrt(T))
d1
d2

# calculating the normal cdfs
phi1 = pnorm(d1)
phi2 = pnorm(d2)
phi1
phi2

PV = L*A*(s0*phi1 - sk*phi2)
PV

# PV = Present Value of the Swaption
# (in this case, €2822.835)
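The same calculation can be wrapped in a function. This is a sketch that assumes annual swap payments starting one year after the option expiry and continuous discounting at a flat rate, as in the question; the function and argument names are my own.

```r
# Black's model for a payer swaption (right to pay fixed sk and receive floating)
payer_swaption_pv <- function(L, s0, sk, vol, T, n_years, y) {
  pay_dates <- T + seq_len(n_years)          # swap payment dates in years
  A <- sum(exp(-y * pay_dates))              # annuity factor
  d1 <- (log(s0 / sk) + 0.5 * vol^2 * T) / (vol * sqrt(T))
  d2 <- d1 - vol * sqrt(T)
  L * A * (s0 * pnorm(d1) - sk * pnorm(d2))
}

# Should reproduce the result of the code above
payer_swaption_pv(L = 200000, s0 = 0.016, sk = 0.014, vol = 0.30,
                  T = 1, n_years = 5, y = 0.012)
```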

 

3 – Pricing a Forward Contract

A typical question asking you to find the price of a forward contract could be:

On May 7th, 2013, a 4-year bond is issued with face value €5,000 that has coupon rate of 2% p.a., paid annually. Assuming a flat yield curve of 1.2% p.a. for all maturities, find the cash value of the bond on that date.
Find the forward price in a long forward contract, with maturity three years later, i.e. a contract to buy the bond on May 7th, 2016.
Find the value of this forward contract two years later, on May 7th, 2015. Assume on that date the yield curve is flat at 0.8% p.a.
On that same day, the investor wants to unwind the contract, so she enters into an offsetting position by taking a short position in a new forward contract on offer to sell the bond on May 7th, 2016. Find the forward price of the new contract. Find the combined value of her position in both contracts on the maturity date of the contracts, May 7th, 2016.
Useful formula: the value of a forward contract during its lifetime is:

B(t) - I(t) - F_0 P(t, \tau )

To begin a question like this, we first find the present value of the bond at time 0 (2013) in the same way as in question 1. Then we find the forward price of the bond for a contract maturing in 3 years. Here we use an intuitive formula for the forward price of a bond (see written solution) that takes into account the coupons the buyer will miss out on in the first 3 years of the bond’s life. To find the value of the contract 2 years later (2015), we first find the value of the bond at time 2 and account for the one coupon we will miss out on between year 2 and year 3. Using these values, we can then find the value of the contract during its lifetime with the formula above. Finding the forward price of the new contract used to offset the first contract is relatively simple and uses the same intuitive formula. Finally, the combined value of the two contracts at maturity is simply the forward price of the new contract minus the forward price of the original contract, since she buys the bond at F_0 and sells it at the new forward price.

You are prompted in the code below when a new parameter is needed to solve this question.

A written solution can be found here: Simple Investment Modelling Algorithms

# Pricing a Forward Contract

# Find the cash value of the bond
T = 4      # T year bond
FV= 5000   # Face Value
c = 0.02   # coupon
p = 1      # payments per year (annually = 1, semiannually = 2)
y = 0.012  # yield

time = seq(1/p, T, by=1/p)
coupon = (FV*c)/p
coupon
B0 = 0
i = 0
for(i in time){
  B0 = B0 + coupon*exp(-y*i)
}
B0 = B0 + FV*exp(-y*T)
B0  

# Find the forward price in a long forward contract, with maturity **x** years later
x = 3 # number of "years later"

time = seq(1/p, x, by=1/p)
time
I = 0
i = 0
for(i in time){
  I = I + coupon*exp(-y*i)
}
I  # this is the value of the coupons you will miss in the "x" years
disc = exp(-y*x)  # discount factor for the "x" years
F0 = (B0 - I)/disc
F0

# Find the value of this forward contract **t** years later, on __ __th ____. 
# Assume on that date the yield curve is flat at **y2**%p.a.
t = 2
y2 = 0.008

# Bt = discounted coupons and FV starting at time t 
# (similar to the B0 calculation, but you're missing the coupons up to time t)
time = seq(1/p, T-t, by=1/p)
time
Bt = 0
for(i in time){
  Bt = Bt + coupon*exp(-y2*i)
}
Bt = Bt + FV*exp(-y2*(T-t))
Bt

# I will miss out on the coupons issued between now (time t) and when I buy it (time x)
# I will miss out on the coupons issued for this amount of time: **(x - t)** 
It = 0
time = seq(1/p, x-t, by=1/p)
time
for(i in time){
  It = It + coupon*exp(-y2*i)
}
It 

disc = exp(-y2*(x-t))
Value = Bt - It - F0*disc
Value

# Forward price of the new contract
Ft = (Bt - It)/disc
Ft

# Combined Value of her position 
CombValue = Ft - F0
CombValue

# (Combined Value = €20.19703 in this example)

# Optional check
# Value*exp(y2*(x-t)) - CombValue
# should = almost zero
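The steps above can be collected into a single function, sketched below. It assumes annual coupons, a flat yield curve and continuous compounding, as in the question, and (like the code above) treats every coupon up to and including the delivery date as missed by the buyer. The function names are my own.

```r
# PV at time 0 of cash flows paid at the given times, flat continuous rate y
pv_flat <- function(amounts, times, y) sum(amounts * exp(-y * times))

# Forward price for delivery of a bond x years from now.
# term: years left on the bond at the valuation date (annual coupons)
bond_forward_price <- function(term, face, coupon_rate, y, x) {
  coupon <- face * coupon_rate
  B <- pv_flat(c(rep(coupon, term - 1), coupon + face), seq_len(term), y)
  I <- pv_flat(rep(coupon, x), seq_len(x), y)   # coupons missed before delivery
  (B - I) / exp(-y * x)
}

F0 <- bond_forward_price(term = 4, face = 5000, coupon_rate = 0.02, y = 0.012, x = 3)
Ft <- bond_forward_price(term = 2, face = 5000, coupon_rate = 0.02, y = 0.008, x = 1)
Ft - F0   # combined value at maturity; should match CombValue above
```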

 

4 – Finding the Present Value of the Expected Loss over the Lifetime of a Corporate Bond

A typical question on this topic could be:

A 4-year corporate bond, with face value €100, provides a coupon of 1.6% per year, payable annually, and has a yield of 1%. The risk-free yield curve is flat at 0.7% p.a.
Find the present value of the corporate bond and the corresponding risk-free bond with the same payments; hence find the present value of the expected loss over the lifetime of the corporate bond.
Assume that defaults of the corporate bond can only occur at the end of each year, just prior to each coupon payment, and that the recovery amount is €35. Estimate the risk-neutral default probability q per year, assuming it is the same each year.

To find the present value of the corporate bond, calculate it in the usual way using 1% as your discount rate. Then calculate the present value of the risk-free bond in the same way, but using 0.7% as your discount rate. The present value of the expected loss is the PV of the riskless bond minus the PV of the corporate bond.
Taking this further, we then look at the loss that would occur at each possible default date over the lifetime of the bond. As stated in the question, we assume that defaults only occur at the end of each year, just prior to each coupon payment, and that the recovery amount is €35. For each default date, we find the value of the riskless bond just prior to the coupon payment; the loss if default occurs on that date is this value minus the recovery amount. My code brings you up to this point. You then enter these values into a table and, using discount factors, find the present value of the expected loss on each date, expressed as a multiple of q (the probability of default). Setting the total of these equal to the present value of the expected loss found earlier, we can solve for q, the probability of default each year.

A written solution can be found here: Simple Investment Modelling Algorithms

# Present Value of Expected Loss

# Find the cash value of the Corporate bond
T = 4       # T year bond
FV= 100     # Face Value
c = 0.016   # coupon
p = 1       # payments per year (annually = 1, semiannually = 2)
y = 0.01    # yield

time = seq(1/p, T, by=1/p)
coupon = (FV*c)/p
coupon
Bc0 = 0
i = 0
for(i in time){
  Bc0 = Bc0 + coupon*exp(-y*i)
}
Bc0 = Bc0 + FV*exp(-y*T)
Bc0  
# PV of Corporate Bond

# Find the cash value of the Risk Free bond
y = 0.007    # The risk-free yield

time = seq(1/p, T, by=1/p)
coupon = (FV*c)/p
coupon
Bf0 = 0
i = 0
for(i in time){
  Bf0 = Bf0 + coupon*exp(-y*i)
}
Bf0 = Bf0 + FV*exp(-y*T)
Bf0  
# PV of Risk Free Bond

# Present Value of Expected Loss
round(Bf0,2) - round(Bc0,2)

# We need to find the value of the RISKLESS BOND 
# just before these default dates
A = 35 # this is the "recovery amount"
y = 0.007     # yield
# This code calculates the value of the riskless bond just
# BEFORE the default date that you mark as **t**
t = 2  # change this number for each default date (t = 1, ..., T)
# On the last run (t = T) there are no later coupons,
# so Bft = coupon + FV

# Remaining payment dates after t (none left when t = T)
time = if (t < T) seq(1/p, T-t, by=1/p) else numeric(0)
time
coupon = (FV*c)/p
coupon
Bft = coupon  # the coupon due at t has not yet been paid
for(i in time){
  Bft = Bft + coupon*exp(-y*i)
}
Bft = Bft + FV*exp(-y*(T-t))
Bft
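The table step can also be done in code. Below is a sketch of the usual textbook calculation, assuming annual payments, defaults only just before each payment date, the same unconditional default probability q in each year, and a loss at default equal to the riskless bond value minus the recovery amount, discounted at the risk-free rate. It uses unrounded bond values, and the function name is my own; the written solution may differ in detail.

```r
# Estimate the constant yearly risk-neutral default probability q
default_prob <- function(term, face, coupon_rate, y_corp, y_free, recovery) {
  times <- seq_len(term)
  cf <- c(rep(face * coupon_rate, term - 1), face * (1 + coupon_rate))  # annual cash flows
  pv <- function(y) sum(cf * exp(-y * times))

  expected_loss <- pv(y_free) - pv(y_corp)   # riskless PV minus corporate PV

  # Riskless bond value just BEFORE the payment due at each default date t
  value_before <- sapply(times, function(t) {
    later <- times >= t
    sum(cf[later] * exp(-y_free * (times[later] - t)))
  })

  # PV of the loss if default happens on each date, per unit of q
  loss_pv <- (value_before - recovery) * exp(-y_free * times)
  expected_loss / sum(loss_pv)
}

q <- default_prob(term = 4, face = 100, coupon_rate = 0.016,
                  y_corp = 0.01, y_free = 0.007, recovery = 35)
q
```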

 

5 – Constructing a Duration-Immunising Portfolio from Two Bonds

A typical question asking you to construct a duration-immunising portfolio from two bonds could be:

Assume a yield curve for all maturities of 1.1% p.a. You have a liability of €100,000 to be paid in seven years’ time. You wish to immunise this debt using two available bonds: a 5-year €100 zero-coupon bond and a 10-year €100 zero-coupon bond. Construct a duration-immunising portfolio from these bonds (fractions of bonds are permitted). If after exactly two years, all rates fall by 0.3%, find the value of your bond holding (i) using a linear approximation in the rate change and (ii) exactly. Compare these with the new exact value of the liability.

Here you have a liability and a time when it has to be paid. You can create a portfolio consisting of a combination of the two available bonds such that the present value of your portfolio matches the present value of the liability, and the duration of the portfolio matches the duration of the liability (for a single payment, the time at which it has to be paid). To do this, set the present value of the liability equal to the sum of the present values of the bonds, each multiplied by an unknown constant denoting how many of that bond you will purchase. Similarly, set the duration of the liability equal to the duration of the portfolio; in the code this is done by matching each present value multiplied by its time to payment, again with each bond multiplied by these amounts. You can then solve these simultaneous equations to find how many of each bond must be purchased to immunise your liability. My code finds these amounts for you.
Market conditions are susceptible to change over this time span, so we then consider the case where, after two years, all rates drop by 0.3%. We will find the value of our portfolio in this case using two methods. The first is a linear approximation in the rate change, using the formula for the change in the value of a bond: the negative of the duration multiplied by the change in yield, multiplied by the value of the bond, denoted ΔB = -D(Δy)B.
The second finds the value of our portfolio exactly, by recalculating it using the new yield.
Both of these methods involve a lot of manipulation of equations; my code below solves for the value of the portfolio in both cases.

A written solution can be found here: Simple Investment Modelling Algorithms

# Duration Immunising Portfolio

y = 0.011    # risk free rate
# Liability of **L** to be paid in **T** years
L = 100000
T = 7

# 2 bonds available
# Bond 1:
t1 = 5    # length of bond 1
FV1 = 100
# Bond 2:
t2 = 10
FV2 = 100


# Present Value of Liability (then PV of Bond 1 and Bond 2)
PVL = L*exp(-y*T)
PVL
PVB1 = FV1*exp(-y*t1)
PVB2 = FV2*exp(-y*t2)
PVB1
PVB2

# PV-weighted time to payment (duration x PV) of the liability, then of Bond 1 and Bond 2
DL = L*T*exp(-y*T)
DL
DB1 = FV1*t1*exp(-y*t1)
DB2 = FV2*t2*exp(-y*t2)
DB1
DB2

# Then you should manipulate these simultaneous equations until 
# you find the number of Bond 1's and Bond 2's you need.
# This code will give you the answer

# Number of Bond 2's to buy
NoB2 = (DL - (DB1*PVL)/PVB1) / (DB2 - (DB1*PVB2)/PVB1)
NoB2
# Number of Bond 1's to buy
NoB1 = (PVL - PVB2*NoB2) / PVB1
NoB1




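# After exactly **x** years, all rates FALL by **y1**
x = 2
y1 = 0.003
newy = y - y1

# Value of the bond holding after x years, before the rate change
# (each bond now has t - x years left to maturity)
Vx = NoB1*FV1*exp(-y*(t1-x)) + NoB2*FV2*exp(-y*(t2-x))
Vx

# (i) Linear approximation: dB = -D*(dy)*B with dy = -y1.
# The portfolio duration still matches the liability's, now T - x years
rise = (T - x)*y1
rise  # proportional rise in value (multiply by 100 for a percentage)
Vx*(1 + rise)
# value of the bond holding using the linear approximation

# (ii) Exactly: revalue both bonds at the new rate
NoB1*FV1*exp(-newy*(t1-x)) + NoB2*FV2*exp(-newy*(t2-x))
# exact value of bond holding using the new rate

# New exact value of the liability, for comparison
L*exp(-newy*(T-x))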
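The two simultaneous equations above can also be written as a 2×2 linear system and handed to base R’s solve(). This is a sketch with the question’s numbers; the variable names are my own.

```r
# Duration matching as a linear system, solved with base R's solve()
y   <- 0.011                 # flat continuously compounded rate
L   <- 100000                # liability
T_L <- 7                     # years until the liability is paid
maturities <- c(5, 10)       # zero-coupon bond maturities
face       <- c(100, 100)

pv_bonds <- face * exp(-y * maturities)
pv_liab  <- L * exp(-y * T_L)

# Row 1 matches present values; row 2 matches PV-weighted times (i.e. durations)
M <- rbind(pv_bonds, maturities * pv_bonds)
holdings <- solve(M, c(pv_liab, T_L * pv_liab))
holdings   # number of 5-year and 10-year bonds to buy
```

For two zero-coupon bonds the answer has a simple shape: 60% of the present value goes into the 5-year bond and 40% into the 10-year bond, because 0.6 × 5 + 0.4 × 10 = 7.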

 

6 – Pricing a European Call Option Contract

A typical question asking you to price a European call option contract could be:

Consider a 2.5-year coupon bond with a face value of €100 and a coupon rate of 4% paid semi-annually. The zero curve is flat at 3.6%. Find the present value of the bond.
Consider a European call option on this bond with strike price €98 and expiration date in one year. The forward yield volatility is 60%. Find the present value of the interest paid during the life of the option and hence calculate the forward price of the bond in one year.
Find the duration of the bond option at maturity (i.e. in one year) and hence find the bond price volatility at that time.
Using Black’s model, calculate the present value of the call option.
Some relevant formulae are:

c = P(0,T)\left[F_B\Phi(d_1) - K\Phi(d_2)\right], \newline
\text{where} \quad d_1 = \frac{\ln(F_B/K) + \sigma_B^2 T/2}{\sigma_B\sqrt{T}} \newline
\text{and} \quad d_2 = d_1 - \sigma_B\sqrt{T}, \newline
\sigma_B = D y \sigma_y

We find the present value of the bond in the usual way. The present value of the interest paid during the life of the option is the present value of the coupons missed: we will miss two coupons in the first year of the bond’s life. We then use these two values to find the forward price of the bond in one year. To find the duration of the bond at the option’s maturity, we recognise that at that time there will still be 1.5 years left until the bond matures, and apply the duration calculation to the remaining cash flows (see written solution). We then plug this duration into the formula for the bond price volatility (σ_B = D y σ_y). Taking all of this together, we can input these values into Black’s model to find the present value of the call option c (see formula above).

A written solution can be found here: Simple Investment Modelling Algorithms

# Pricing a European Call Option Contract

# The bond
t = 2.5      # term of the bond (years to maturity)
c = 0.04     # coupon rate
FV = 100     # face value
y = 0.036    # risk free rate ("the zero curve is flat at __%")

# European call option on the bond
k = 98         # strike price
T = 1          # time to expiry of the option (years)
sigmay = 0.60  # forward yield volatility

# other variables
p = 2      # payments per year (annually = 1, semiannually = 2)


# PV of the Bond
time = seq(1/p, t, by=1/p)
coupon = (FV*c)/p
coupon
B0 = 0
i = 0
for(i in time){
  B0 = B0 + coupon*exp(-y*i)
}
B0 = B0 + FV*exp(-y*t)
B0  


# Find the present value of coupons paid during the life 
# of the option and hence find the forward price,
x = T # length of option

time = seq(1/p, x, by=1/p)
time
I = 0
i = 0
for(i in time){
  I = I + coupon*exp(-y*i)
}
I  # this is the value of the coupons you will miss in the "x" years
disc = exp(-y*x)  # discount factor for the "x" years
F0 = (B0 - I)/disc
F0 
# forward price


# Find the duration of the bond option at maturity
time = seq(1/p, t-T, by=1/p)
time
coupon = (FV*c)/p
coupon
top = 0
i = 0
for(i in time){
  top = top + i*coupon*exp(-y*i)
}
top = top + (t-T)*FV*exp(-y*(t-T))
top  
bottom = 0
i = 0
for(i in time){
  bottom = bottom + coupon*exp(-y*i)
}
bottom = bottom + FV*exp(-y*(t-T))
bottom  
Duration = top/bottom
Duration


# find the bond price volatility at this time
sigmab = Duration*y*sigmay
sigmab


# use Black's Model to find the present value of the call option
# (brackets around sigmab*sqrt(T) so that we divide by the whole term)
d1 = (log(F0/k) + 0.5 * sigmab^2 * T) / (sigmab*sqrt(T))
d2 = (log(F0/k) - 0.5 * sigmab^2 * T) / (sigmab*sqrt(T))
d1
d2

phi1 = pnorm(d1)
phi2 = pnorm(d2)

c = exp(-y*T)*(F0*phi1 - k*phi2)
c
# c = the present value of the call option
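A note on R’s operator precedence: written without brackets, `/ sigmab*sqrt(T)` divides by sigmab and then multiplies by sqrt(T), which only gives the right answer in this question because T = 1. Below is a sketch of Black’s model as standalone functions with the denominator bracketed, plus the matching put formula, checked against put-call parity. The function names are my own, and sigma here is the bond price volatility σ_B.

```r
# d1 and d2 for Black's model on a forward bond price
black_d <- function(fwd, K, sigma, T) {
  d1 <- (log(fwd / K) + 0.5 * sigma^2 * T) / (sigma * sqrt(T))
  list(d1 = d1, d2 = d1 - sigma * sqrt(T))
}

# European call: discounted expected payoff under Black's model
black_call <- function(fwd, K, sigma, T, r) {
  d <- black_d(fwd, K, sigma, T)
  exp(-r * T) * (fwd * pnorm(d$d1) - K * pnorm(d$d2))
}

# European put with the same inputs
black_put <- function(fwd, K, sigma, T, r) {
  d <- black_d(fwd, K, sigma, T)
  exp(-r * T) * (K * pnorm(-d$d2) - fwd * pnorm(-d$d1))
}

# Put-call parity: call - put = exp(-rT) * (F - K)
black_call(100, 98, 0.05, 1, 0.036) - black_put(100, 98, 0.05, 1, 0.036)
exp(-0.036) * (100 - 98)
```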

 

7 – Extending the LIBOR Curve and Using the Extended LIBOR Rates to Find Future Forward Rates

A typical question on this topic could be:

Use six-monthly compounding for all rates. The 6-month LIBOR spot rate is 0.4% and 12-month LIBOR spot rate is 0.6%. A bank trades swaps where a fixed rate of interest is exchanged for 6-month LIBOR, with payments being exchanged biannually. The 18-month and 24-month swap rates are 0.8% and 1.2% per annum. Estimate the 18-month and 24-month LIBOR rates. (Hint: construct par value bonds with coupon rates equal to the relevant swap rates). Using the extended LIBOR rates, find the 6-month forward rates starting in 6, 12 and 18 months.
Using these, show that the present value of 18-month swap is zero.

To find the LIBOR estimates, we construct par value bonds with coupon rates equal to the relevant swap rates. This involves some manipulation on paper (see written solutions), but my code below is configured to find these estimates. The 6-month forward rates for the three dates are found using the formula implemented in the code below. See the written solutions for how to show that the present value of the 18-month swap is zero.
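For reference, the par-bond equation behind the 18-month estimate, its rearrangement (as used in the code), and the forward-rate formula look like this, with six-monthly compounding and rates written as decimals. Here c = 100·S_18 is the annual coupon per €100; the 24-month estimate and the other forward rates follow the same pattern.

```latex
100 = \frac{c/2}{1 + L_6/2} + \frac{c/2}{(1 + L_{12}/2)^2} + \frac{100 + c/2}{(1 + L_{18}/2)^3}, \qquad c = 100\,S_{18}

L_{18} = 2\left[\left(\frac{100 + c/2}{100 - \frac{c/2}{1 + L_6/2} - \frac{c/2}{(1 + L_{12}/2)^2}}\right)^{1/3} - 1\right]

F_{12} = 2\left[\frac{(1 + L_{18}/2)^3}{(1 + L_{12}/2)^2} - 1\right]
```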

A written solution can be found here: Simple Investment Modelling Algorithms

# Extend the LIBOR Curve

L6  = 0.004  # 6 month LIBOR
L12 = 0.006  # 12 month LIBOR
S18 = 0.008  # 18 month Swap
S24 = 0.012  # 24 month Swap

c1 = S18*100
c1
c2 = S24*100
c2

# Construct par value bonds with coupons equal to the 
# relevant Swap rates

# 1.5 year LIBOR estimate (18 month LIBOR est)
L18=(((100+c1/2)/(100-(c1/2)/(1+L6/2)-(c1/2)/(1+L12/2)^2))^(1/3)-1)*2
L18 
# this is your 1.5 year LIBOR est 

# 2 year LIBOR estimate (24 month LIBOR est)
xx = 100-(c2/2)/(1+L6/2)-(c2/2)/(1+L12/2)^2-(c2/2)/(1+L18/2)^3
L24=(((100+c2/2)/xx)^(1/4)-1)*2
L24


# Using the extended LIBOR rates, find the 6-month 
# forward rates starting in 6, 12 and 18 months.
# Using these, show that the present value of 18-month swap is zero.

# 6-month forward rates starting in 6 Months
F6 = ( ( ((1+L12/2)^2) / ((1+L6/2)^1) ) -1 )*2
F6

# 6-month forward rates starting in 12 Months
F12 = ( ( ((1+L18/2)^3) / ((1+L12/2)^2) ) -1 )*2
F12

# 6-month forward rates starting in 18 Months
F18 = ( ( ((1+L24/2)^4) / ((1+L18/2)^3) ) -1 )*2
F18
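One way to check that the 18-month swap is worth zero, sketched in R under the question’s assumptions (a check, not the written solution): value the floating leg using the 6-month spot rate followed by the forward rates, value the fixed leg at the 0.8% swap rate, discount both with the LIBOR zero rates, and compare. The floating payments telescope to 100(1 − DF_18), which is exactly what the par-bond equation makes the fixed leg worth.

```r
# Rates from the question (six-monthly compounding, decimals)
L6  <- 0.004   # 6-month LIBOR
L12 <- 0.006   # 12-month LIBOR
S18 <- 0.008   # 18-month swap rate

# 18-month LIBOR estimate from the par-bond equation, as in the code above
c1  <- S18 * 100
L18 <- (((100 + c1/2) / (100 - (c1/2)/(1 + L6/2) - (c1/2)/(1 + L12/2)^2))^(1/3) - 1) * 2

# Discount factors for the payment dates at 6, 12 and 18 months
DF <- c(1/(1 + L6/2), 1/(1 + L12/2)^2, 1/(1 + L18/2)^3)

# Floating rate for each 6-month period: the spot rate, then the forward rates
F6  <- ((1 + L12/2)^2 / (1 + L6/2) - 1) * 2
F12 <- ((1 + L18/2)^3 / (1 + L12/2)^2 - 1) * 2
floating_rates <- c(L6, F6, F12)

# Per €100 notional: receive floating, pay the fixed 18-month swap rate
pv_floating <- sum(100 * floating_rates / 2 * DF)
pv_fixed    <- sum(100 * S18 / 2 * DF)
pv_floating - pv_fixed   # zero apart from floating-point rounding
```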

 

Mo Salah – Beneath The Headlines

Mohamed Salah, winner of the Golden Boot, set the record for the most goals scored in a 38-game Premier League season in 2017/18, with 32 goals. This is just 2 goals short of the all-time record.

With 32 goals in 36 games, Salah has a very impressive goal-to-game ratio of 0.89, the highest of the top 50 scorers. So his record clearly sets him apart as the best scorer, right?

Here are the top 10 Premier League goal-scorers that season:

Rank  Last Name  First Name  Team  Games Played  Games Started  Minutes Played  Goals  Assists  Shots on Goal  Total Shots
1     Salah      Mohamed     LIV   36            34             2921            32     10       67             143
2     Kane       Harry       TOT   37            35             3083            30     2        76             184
3     Aguero     Sergio      MCI   25            22             1969            21     6        42             95
4     Vardy      Jamie       LEI   37            37             3255            20     1        35             69
5     Sterling   Raheem      MCI   33            29             2592            18     11       36             89
6     Lukaku     Romelu      MUN   34            33             2869            16     7        43             86
7     Firmino    Roberto     LIV   37            32             2777            15     7        38             84
8     Lacazette  A           ARS   32            26             2212            14     4        35             68
9     Gabriel    Jesus       MCI   29            19             1671            13     3        31             53
10    Hazard     Eden        CHE   34            28             2433            12     4        37             71

In this analysis, I took raw data for the top 50 goal scorers of that season from Fox Sports. I cleaned the data, which is available in a notepad file here, and then imported it into SPSS for analysis.

 

Salah primarily plays on the right wing. This position allows him to cut into the centre, where he can shoot on goal using his speed and dribbling skills or play quick passes to his capable teammates. Salah has praised his manager Jürgen Klopp for letting him play in more central positions, saying “I play closer to the goal than any club before.” He has even been nicknamed the “Egyptian Messi” by the Italian media.

Salah’s accolades include:

  • PFA Players’ Player of the Year: 2017–18
  • FWA Footballer of the Year: 2017–18
  • Premier League Golden Boot: 2017–18
  • Premier League Player of the Season: 2017–18
  • PFA Team of the Year: 2017–18 Premier League
  • Liverpool F.C. Player of the Season: 2017–18
  • Liverpool F.C. Players’ Player of the Season: 2017–18

These are all great achievements and qualities and I don’t deny any of them. However I don’t feel that the data show a player as consistently remarkable as the headlines would suggest.

 

Simple Stats

For instance, his total number of shots (143) is considerably larger than that of most other players in the top 50 list. His goal-to-shot ratio is 0.22, meaning he scores 0.22 goals per shot taken on average. This puts him at #9 in this list of the top 50 goal scorers.

His ratio of goals to shots on goal is 0.48, so nearly half of his on-target shots end up as goals. This might seem impressive, but it places him only #12 on the list when this is the deciding variable. Even his shot accuracy (shots on goal divided by total shots) puts him at only #13.
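The ratios above are easy to recompute. Here is a small R sketch using only the ten players in the table above, so the ranks it produces are within the top 10; the #9, #12 and #13 positions quoted in the text are over the full top-50 data, which isn’t reproduced here.

```r
# Top-10 data from the table above
scorers <- data.frame(
  player = c("Salah", "Kane", "Aguero", "Vardy", "Sterling",
             "Lukaku", "Firmino", "Lacazette", "Gabriel Jesus", "Hazard"),
  goals  = c(32, 30, 21, 20, 18, 16, 15, 14, 13, 12),
  sog    = c(67, 76, 42, 35, 36, 43, 38, 35, 31, 37),     # shots on goal
  shots  = c(143, 184, 95, 69, 89, 86, 84, 68, 53, 71)    # total shots
)

scorers$goals_per_shot <- scorers$goals / scorers$shots
scorers$goals_per_sog  <- scorers$goals / scorers$sog
scorers$accuracy       <- scorers$sog / scorers$shots

# Rank within these ten players (1 = best; ties share the average rank)
scorers$rank_per_shot <- rank(-scorers$goals_per_shot)
scorers$rank_per_sog  <- rank(-scorers$goals_per_sog)
scorers[order(scorers$rank_per_sog), ]
```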

 

[Scatter plot: Pearson’s correlation = 0.776]

[Scatter plot: Pearson’s correlation = 0.856]

 

 

 

We can see from the scatter plots, which both show a strong positive correlation, that the records set by Kane and Salah, the extreme cases in these graphs, are simply a continuation of this trend.

It’s still hard to break through and create the opportunity to take these shots, but other players seem to be more efficient with their shots. I think this shows that many players display a more consistent ability to turn opportunities into goals.

It’s possible that Salah holds the title of top scorer more because of the number of scoring opportunities his team creates for him than because of superior talent.

I would argue that Salah is not an outlier, but rather an extreme value of a predictable trend.


A Deeper Look

I then used the ‘Goals vs Shots-on-Goal’ data to fit a linear regression model.

This gave the following formula for the expected number of goals scored, with shots-on-goal (SOG) as the independent variable:

E(Goals) = −0.239 + 0.389 × SOG

When layered on our graph we get this:

We can treat this model as appropriate provided the regression assumptions hold:

  • Normality – the residuals should be normally distributed.
  • Homogeneity of variance (homoscedasticity) – the error variance should be constant.
  • Independence – the errors associated with one observation are not correlated with the errors of any other observation.

 

[Plot: the regression residuals (the difference between the predicted and observed values) are normally distributed above and below the line.]

[Plot: the variance of the errors remains constant; it does not grow or shrink.]

[Plot: the errors are independent, showing a random pattern about the zero line.]

 

 

Looking back at the graph with the regression line, Salah, with 32 goals, is above the line. Therefore, he has scored more goals than the model predicts given the number of shots-on-goal he has taken. However, he is not the player furthest above the line.

Salah, having taken 67 SOG (shots-on-goal), would be expected to score 25.824 goals by this model. This means he scored 23.9% more goals than expected.

Sergio Aguero, with 21 goals and 42 SOG, was expected to score 16.1 goals, outperforming this model by 30.4%.

Wayne Rooney took only 17 shots-on-goal and amassed 10 goals, scoring 56.9% more than the expected 6.37 goals, more than double the margin by which Salah beat the model. This may seem like a sample with too few goals to be significant, so I give you:

Jamie Vardy, with 20 goals to his name from 35 SOG, out-scored his expected value of 13.38 goals by a very impressive 49.5%.
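The expected-goals figures above can be recomputed from the fitted line in a few lines of R. This sketch uses the rounded coefficients quoted in the text, so results may differ slightly from values SPSS would give with unrounded coefficients.

```r
# Fitted line from the regression above: E(Goals) = -0.239 + 0.389 * SOG
expected_goals <- function(sog) -0.239 + 0.389 * sog

players <- data.frame(
  player = c("Salah", "Aguero", "Rooney", "Vardy"),
  goals  = c(32, 21, 10, 20),
  sog    = c(67, 42, 17, 35)
)
players$expected <- expected_goals(players$sog)

# Percentage by which actual goals exceed the model's prediction
players$outperformance_pct <- 100 * (players$goals / players$expected - 1)
players
```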

 

 

I wanted to show you that the raw number of goals a player scores in a season is just one small measure of overall performance. It is a more accessible and easier-to-understand statistic, but there are many ways to quantify every aspect of how a player or team has played, and what I have done here is only a small subset of them.

I would urge you to do your own research and investigate statistical claims, because there is always much more beneath the surface headlines.