Well I'm glad you should ask. The website Rugby4Cast started formally in late 2017, for the Autumn Internationals that year, but had existed in some form or another for at least a few years before that.
Graeme - our Glorious Visionary and Founder, the First Sports4Caster - had started tracking International rugby results as early as 2014 to see how the rankings fluctuated over time and built a spreadsheet into which he could enter future imaginary results and see how the rankings would look should they come to pass. As a Scot, he particularly enjoyed entering imaginary Scottish Grand Slams and seeing the corresponding rise through the rankings. And every year he was always bitterly disappointed when, 3 minutes into the first match of the Six Nations, it became immediately clear that this year was not their year. Perhaps next year then ...
From this, however, data was collected, and before long some basic stats about histories and form could be generated. It was a short step from these to generating some rudimentary predictions, and Algorithm 0.0 was born. Crude, inaccurate and with a curious tendency for predicting Scottish wins, it nevertheless served its purpose and lit the fire and desire for improvement.
Over the course of the next few years, the predictions process was refined. Essentially it looked at a blended measure of average scoring patterns for both sides (home, away and in head-to-head matches) and then adjusted for differences in official World Rankings. This was all still done within Excel. It worked relatively well, and predictions were circulated locally amongst friends and family just for fun.
Then came the big breakthrough. After a night in the pub in 2017 discussing the upcoming Autumn Internationals it was suggested that this should be turned into a website, and the results and predictions circulated from there rather than via hundreds of WhatsApp and email messages. In the spirit of taking things too far, the website was launched the very next morning.
In those days, the website was just a blog, into which pieces were written and basic .jpg pictures of the predictions were uploaded each week. This was enjoyable, but fairly time consuming as the majority of the data and website updates had to be completed manually. As such, content was limited just to Tier 1 Internationals. But all that was to change ...
In Spring 2018 Graeme discovered Python. To be perfectly honest, he claims he can't remember how or why he came to it, but for some reason he downloaded it in around May 2018. He recalls:
To be perfectly honest I'm not sure how or why I've done this, but for some reason I've downloaded Python.
Graeme - around May 2018
It was a swift love affair. Data was uploaded and rudimentary predictions were produced shortly thereafter. All sorts of other possibilities opened up. Webscraping became possible, and results were scraped from the web to build the first Rugby4Cast database, including club matches as well as internationals.
By sheer good fortune Graeme's first son arrived around the same time, and he had planned to take 4 or so months of paternity leave from June 2018. This time was put to good use and over the course of those months Algorithm 2.0 was built - the first fully functioning program that scraped results, updated the databases, calculated all the associated metrics and variables that formed the basis of the machine learning features, and then produced predictions. This went live in around August 2018.
[N.B. Graeme would like to clarify that his son was also well cared for throughout this period, and was still alive and well at the launch of Algorithm 2.0.]
Alex joined the team around the same time in summer 2018. Graeme had been grooming him for several months and finally managed to convert just before the Rugby Championship 2018. Alex joined as the technical website consultant and with the initial task of converting the back-end of the website into a proper database that could be updated straight from output from the new Algorithm 2.0. Alex took to the task with gusto, and before long Algorithm 2.0 and Website 2.0 were chatting happily together, with the former updating the latter with new predictions and results each week.
It's worth mentioning I wasn't even first choice for the web role, but that he went AWOL for about 6 months and Graeme reluctantly had to ask me instead.
However, he convinced me to get involved over a kick-off meeting where we had a lovely pizza and beer at Pizzeria Mama Mia in Oxford - I had the Mama Mia special, which comes with a fried egg in the middle - and so here I am 2 years later.
Alex - reflecting fondly on his time at Rugby4Cast
In October 2018 the European Club leagues began, and due to the streamlined process perfected with Algorithm 2.0 of scraping results and producing predictions, these were accommodated without fuss.
This is not what happened. 5 days before the start of the Top 14 it became clear that all predictions for French matches were missing. It turned out the source had this year just decided not to cover the Top 14. In a blind panic, and with the energy that usually comes from such hysteria, overnight Graeme rewrote the Algorithm to scrape results from 2 other sources to ensure the databases were as complete as possible.
Then, and only then, the additional European leagues were accommodated without fuss.
Over the course of the next few months Algorithm 3.0 was written. This didn't make large scale changes to the predictions, but rather upgraded the process of running, storing and producing predictions and results. The whole process was much more robust and didn't involve Graeme and Alex pulling their hair out at 3AM on a Thursday night trying to get the predictions updated for the weekend. Well, not as often as they had been previously anyway.
Subject: Getting Involved...
Will keep this potentially very long and boring email where I brown nose you lot a fair bit to a short one.....I'm a big rugby fan and also I'm a big data fan (sounds like I cool down servers ...) and was wondering if you were either actively working on the models and I could get involved or if there was anything else I could partake in as like I say very up my street! I'm a data scientist by trade, currently working at a company called ... in London where we crowdsource data from about ... and produce user experience metrics (and a few others) about quality of ... across the globe.
Looking forward to hearing back,
A re-enactment of Alex and Graeme's reaction to Sam's email, and Sam's subsequent joining of the team, can be seen below.
And thus Sam joined the team.
I'm very excited by the prospect of Sam joining the team as he is actually a data scientist 'by profession', rather than 'by night', as Graeme is.
Given Graeme has little to no idea how to program - rather he just seems to have stumbled his way to something that works - personally I am delighted that someone is joining the team who might have an inkling just what the hell they are doing.
Alex - late 2019
I am terrified that I may have to show my code to him. I've forgotten how it all works.
Graeme - late 2019
Sam took one look at the code Graeme had written for Algorithm 3.0, politely kept his thoughts on it to himself, and decided he would just have to redo the whole damn thing himself. And thus, the prospect of Algorithm 4.0 was born.
Essentially, at this stage they were in too deep to do anything else. It was decided to design the website properly, and rebuild it from the ground up at the same time as developing Algorithm 4.0. Why the hell not? Algorithm 4.0 is new, sleek, shiny and, most importantly, runs, optimises and produces predictions automatically. These are then taken by the brand new Website and showcased in all their glory.
In theory, Graeme, Alex and Sam don't have to do anything ... although this is yet to be properly tested. Somehow they are skeptical.
We are very skeptical that this will run as expected.
Graeme, Alex and Sam - Summer 2020
And there you have it. That pretty much takes us to the present day. Plans are afoot for more, and we'll be informing you of those soon ... stay tuned.
Addendum from Alex: Just to clarify that Graeme did just set this up with a view to predicting Scotland wins for every match. In case you're wondering.
Alex - just now
Predictions were the first thing we started doing here at Rugby4Cast, way back in the mists of time, and it has remained steadfastly the most controversial topic throughout. Regardless of accuracy there are always a few trolls out there sharpening the knives.
So how is it done? Is it all just done by watching a couple of goldfish swimming around a bowl? Sometimes it feels like that might be better…
What started out as just a simple Excel spreadsheet that used average scores home and away and made some slight adjustments based on team rankings has now evolved into a full machine learning algorithm written in Python. Talk about using a sledgehammer to crack a walnut.
Previous results are scraped from various sources and thrown into the Algorithm every week (usually Monday if you want specifics). The Algorithm churns through this mish-mash of matches and players, updating our databases, calculating rankings for teams and sorting out a few other metrics which we use at various stages of the process.
The Algorithm uses these metrics from historical matches to work out its expected score for each match and the probabilities of each outcome using various regression tools.
Here's one we made earlier
More accurately speaking, this means that each predicted score can be thought of as a metric to describe the relative historical strength of the two teams, based on their performances over previous years. However, we think it is more fun to think of them as predictions. At the very least, it certainly serves to provoke those trolls lurking on Twitter (you know who you are – yes you!)
Currently, matches are predicted for the next 60 or so days in the Gallagher Premiership, Pro 14, Top 14, European Cups, Super Rugby and International matches. We could go further, but beyond that the predictions become steadily less accurate as the expected scores will change based on the results in between. But we’re working on it, don’t worry.
We are also looking to roll out similar predictions and stats for the various Southern Hemisphere Cups, along with Major League Rugby and the Japanese Top League in due course. Women’s rugby is also very definitely on the list, as soon as we can find an easily scrape-able resource for matches. If you have any ideas for sources, please point them out.
These predictions are uploaded into the website every Monday (give or take) so you have a full 5 or 6 days to check them out and get really riled up regarding how incorrect and biased you think they are. Please be sure to let us know just how angry you get on all the various social media platforms.
If you’d like you can also subscribe to our mailing list where you will get all the latest updates straight to your inbox, so you don’t need to worry about all the middlemen. Lovely job.
The results relative to the previous predictions are also uploaded weekly so you can see how the Algorithm has performed over time. A running total for how it is performing in each league is also shown in the interests of full disclosure. We aren’t hiding anything! Apart from the bodies of those who have bested us. Those are very well hidden.
This is always the first thing people ask about after hearing about the predictions.
Yes, we do bet based on these predictions. And yes, it does beat the bookies. Bonzer. Feel free to join in. See our advice for using the predictions for betting here.
What this also means is that we collect a huge amount of data. At the time of writing we have over 21,000 rugby matches in our database, and well over 700,000 players associated with those matches.
This is where all the information that we post on various social media platforms and articles comes from. Generally, it is all cross-referenced and checked against the other data sources to ensure that it is correct but it is entirely possible that something has slipped through the net, so if you see anything, please give us a shout to point it out. There’s only so much we can do.
If you have any questions about the data, the model or anything else, please don’t hesitate to get in contact. There’s nothing we’d like better than a good argument about the importance of home advantage in the outcome of a rugby match!
The bread and butter of Rugby4Cast is making predictions on the outcome of rugby matches. Ultimately that is probably why you came to the site in the first place, so this probably isn't news. We do this using a vast range of data, predominantly quantifying the historical performance of teams.
After making predictions we do a number of things with them; firstly we publicly expose them on our website as well as on our social media feeds and secondly we use them to inform betting decisions and guide how we place money on games. The way we decide what games to bet on, and indeed how much money to place per bet, largely revolves around finding value in the odds offered by bookies.
When a bookie offers odds on a match they are also offering a prediction of the likelihood of each team winning (plus a bit of an "overround" so that they always make a profit: remember the phrase "the house always wins"?). Value can be identified when there is some disparity between the implied percentage chance of a win from the odds and our own prediction - for whatever reason we think the bookies might be missing something and giving a team either too much or too little chance of winning.
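As a rough illustration of the idea above, the sketch below turns decimal odds into implied probabilities, measures the bookie's margin, and flags outcomes where a model's probability is higher than the one implied by the odds. The function names, the odds and the model probabilities are all made up for the example; this is not the code we actually run.

```python
def implied_probabilities(decimal_odds):
    """Raw implied probability of each outcome: 1 / decimal odds."""
    return {outcome: 1.0 / odds for outcome, odds in decimal_odds.items()}


def overround(decimal_odds):
    """Bookie's margin: how far the implied probabilities sum above 100%."""
    return sum(implied_probabilities(decimal_odds).values()) - 1.0


def find_value(decimal_odds, model_probabilities):
    """Outcomes where the model rates the chance higher than the odds imply.

    Returns a dict of outcome -> edge (model probability minus implied probability).
    """
    implied = implied_probabilities(decimal_odds)
    return {
        outcome: model_probabilities[outcome] - implied[outcome]
        for outcome in implied
        if model_probabilities[outcome] > implied[outcome]
    }


# Hypothetical odds and model output for a single match.
odds = {"home": 1.50, "draw": 21.0, "away": 2.80}
model = {"home": 0.60, "draw": 0.03, "away": 0.37}

print(f"Overround: {overround(odds):.1%}")
print("Value outcomes:", find_value(odds, model))
```

In this made-up match the implied probabilities add up to a little over 100% (that excess is the overround), and only the away win shows up as value, because the model gives it 37% against roughly 35.7% implied by the odds.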
To find value bets you need two things: firstly the odds a bookie is offering and secondly a prediction. Here at R4C we conveniently have both of those things! In fact we go one better and have the odds from a huge number of bookies. We use these to identify value according to our take on the Kelly Criterion. Please see here for how that was developed.
Yes and no. The above is the basis for a lot of what we do; however, we have done a lot of work to augment the simple value tactic with further data and optimisations specific to each league we predict. Likewise, we have odds data from a large number of bookies, so we can make sure we always get the best odds.
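For anyone unfamiliar with it, the textbook Kelly Criterion suggests what fraction of your bankroll to stake given your estimated win probability and the odds on offer. The sketch below is the standard formula only, not our own take on it; the function name and the `scale` parameter (for staking a cautious fraction of full Kelly) are assumptions for illustration.

```python
def kelly_fraction(probability, decimal_odds, scale=1.0):
    """Textbook Kelly stake as a fraction of bankroll.

    probability  -- our estimated chance that the bet wins (0 to 1)
    decimal_odds -- decimal odds offered by the bookie (e.g. 2.80)
    scale        -- 1.0 for full Kelly, 0.5 for "half Kelly", and so on
    """
    net_odds = decimal_odds - 1.0  # profit per unit staked if the bet wins
    if net_odds <= 0:
        raise ValueError("decimal odds must be greater than 1")
    lose_probability = 1.0 - probability
    fraction = (net_odds * probability - lose_probability) / net_odds
    # A negative fraction means there is no value, so stake nothing.
    return max(fraction, 0.0) * scale


# Hypothetical bet: we think the away side wins 37% of the time at odds of 2.80.
print(f"Suggested stake: {kelly_fraction(0.37, 2.80):.1%} of bankroll")
```

With those numbers the formula suggests staking about 2% of the bankroll. Drop the estimated probability to 30% and the suggested stake falls to zero, because the odds no longer offer value.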
Yes! You can see the success of this current approach here; overall and split across time, leagues and specific matches.
Here at Rugby4Cast you may find a few sets of rankings, depending on which events you are looking at. Below we’ve outlined them all so you know what’s what.
We at Rugby4Cast use the Elo rating system to maintain our own set of rankings which are used in our predictions algorithm, and to rate and compare club sides.
Elo is a method for calculating the relative skill levels of teams in matches and it is named after its creator Arpad Elo, a Hungarian-American physics professor, who invented it as a chess rating system.
The difference in the ratings between two teams serves as a predictor of the outcome of a match, with the difference in rating being the difference in expected score. Therefore, the expected score difference for a team rated 5 points higher than their opponent would be … you guessed it … 5 points.
After each game, points are transferred based on the actual outcome relative to the expectation. In the above scenario, if a team was expected to win by 5 but only won by 4, then the higher ranked team would lose points and the lower ranked team would gain the corresponding amount. The actual amount transferred depends on the k-factor (which can be thought of as a sort of ‘refresh rate’), so it takes a while for results to be fully reflected in the rankings, and a fuller picture of team form emerges.
We have analysed all the previous matches to find the optimal k-factor for each league, as well as analysing home advantage to see what extra edge the home team should be given. For each league’s specific home advantage please see the individual event page.
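The update described above can be sketched in a few lines. This is a minimal illustration of a margin-based Elo update as the text describes it, not our production code: the function name, the k-factor of 0.1 and the 3-point home advantage are placeholder assumptions, since the real values are tuned per league.

```python
def elo_update(home_rating, away_rating, home_score, away_score,
               k_factor=0.1, home_advantage=3.0):
    """Return the new (home, away) ratings after one match.

    The rating difference, plus the home advantage, is read directly as
    the expected winning margin for the home side. Points are then
    transferred in proportion to how far the actual margin missed it.
    """
    expected_margin = home_rating + home_advantage - away_rating
    actual_margin = home_score - away_score
    transfer = k_factor * (actual_margin - expected_margin)
    # Zero sum: whatever the home side gains, the away side loses.
    return home_rating + transfer, away_rating - transfer


# The scenario from the text: a side rated 5 points higher wins by only 4.
# Home advantage is switched off here to keep the numbers simple.
new_home, new_away = elo_update(85.0, 80.0, 24, 20, home_advantage=0.0)
print(round(new_home, 2), round(new_away, 2))
```

In that example the higher ranked side wins the match but still drops a tenth of a point to its opponent, because it fell one point short of the expected margin.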
Large discrepancies in expected results (say a team winning by 45 points, rather than 30, compared to a team winning by 6 rather than 2) can be quite difficult to normalise. Therefore certain results may lead to a larger rankings change than may actually be indicated by the match outcome.
World Rugby maintain a set of rankings to judge international sides and their respective strength. International nations are ranked based on match results, with the most successful teams being ranked highest.
Rankings are based on the match outcome, with more significant matches being more heavily weighted to help reflect the current competitive state of a team. The ranking system was introduced the month before the 2003 Rugby World Cup, with the first new rankings issued on 8 September 2003. England topped the first ever rankings table during their run to World Cup glory.
Unsurprisingly, New Zealand have been the most consistently ranked #1 team since the introduction of the World Rankings, having held the top spot for more than 85 percent of the time during this period. Only a handful of other sides, South Africa and England among them, have ever held the top spot.
As new results come in, past successes or losses fade and are superseded by more recent results and their points gains or losses. The intention is that the rankings give an accurate picture of the current strength and rank of each nation.
Team ranking is calculated using an exchange system (zero sum game), in which sides receive points from each other on the basis of the match result – whatever one side gains, the other loses.
The exchanges are based on the match result, the rankings of each team, and the margin of victory, with an allowance for home advantage.
A full explanation with worked example is available here on the official World Rugby website, but essentially a range of up to 2 points can be exchanged in a match, the exact amount depending on the rankings of the two sides before the match.
Should the winning team be more than 10 ranking points ahead of the losing team before the match begins, the winning team will gain no ranking points from victory. They are deemed to be sufficiently far ahead that no points exchange is required to update the rankings. If the winning team is 10 or more ranking points behind, they will gain the maximum of 2 ranking points from a victory.
Should the difference in team rankings be less than 10 points, then the points exchange is calculated by interpolating between 0 and 2 based on the difference in team rankings. A few scenarios are demonstrated here:
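The interpolation described above can be written as a short function. This is a simplified sketch of the rule as summarised in this article, not World Rugby's full calculation: it assumes the ratings passed in have already been adjusted for home advantage, it ignores draws and any weighting for margin of victory, and the function name is our own.

```python
def points_exchange(winner_rating, loser_rating, world_cup=False):
    """Ranking points the winner takes from the loser.

    Interpolates linearly from 0 points (winner 10 or more ahead before
    the match) to 2 points (winner 10 or more behind), doubled for
    World Cup matches.
    """
    gap = winner_rating - loser_rating
    gap = max(-10.0, min(10.0, gap))  # gaps beyond 10 points count as 10
    exchange = 1.0 - gap / 10.0
    if world_cup:
        exchange *= 2.0
    return exchange


# Hypothetical pre-match ratings: (winner, loser).
scenarios = {
    "Winner 12 ahead": (92.0, 80.0),
    "Winner 5 ahead": (85.0, 80.0),
    "Level": (80.0, 80.0),
    "Winner 5 behind": (80.0, 85.0),
    "Winner 12 behind": (80.0, 92.0),
}
for label, (winner, loser) in scenarios.items():
    print(f"{label}: {points_exchange(winner, loser):.2f} points exchanged")
```

Two evenly rated sides exchange 1 point, a favourite who is 5 points clear picks up just 0.5 for winning, and an underdog who is 5 points behind takes 1.5 for an upset.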
Whilst this is an excellent system for judging team strength, there are a few limitations (as there are in any ranking system).
It is impossible to gain points from losing a match which, whilst logical from one point of view, can be a little counterintuitive from another. Namely, it doesn’t account for the heroic 1 point loss by a much weaker team (heroic is a purely subjective measure, obviously).
For example, imagine Japan play New Zealand at Eden Park and perform incredibly, losing by just 1 point. They would gain no ranking points from that particular match, which might not be indicative of their current strength given that performance.
Under our alternative rankings for international sides, which use the Elo system, this limitation is accounted for.
Aside from the doubling of points exchanged for World Cup matches, there is no system to reflect game importance. Some matches played outside of the official international windows have much weaker teams and are little more than a warm up match, meaning the outcome may not be indicative of team strength, and points exchange is often meaningless.