G ..O.. O.. A.. A.. A.. L.. L.. L.. L.. - celebrating the statistics of the beautiful game
Release Date: 06/14/2018
Rosemary Pennington : Every four years sports fans around the world are glued to their TVs or barstools as the beautiful game is featured during World Cup play. Considered one of the most important tournaments in soccer, this year's World Cup contenders are gathered in Russia chasing after the golden trophy. Thirty-two teams are competing, and favorites to win the tourney include perennial powerhouses Germany, Brazil, Spain and France. Belgium is seen as a team poised to pull off an upset. While commentators will be focusing on the fleet footwork and deft dribbling of the sport's stars, analysts will be crunching numbers trying to figure out if the secret to a team's success lies somewhere in sports data.
The statistics of the beautiful game are the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Joining me in the studio are our regular panelists John Bailer, chair of Miami's statistics department, and Richard Campbell, chair of media, journalism and film. Our guest today is Luke Bornn, an assistant professor of statistics at Simon Fraser University and the vice president of strategy and analytics for the Sacramento Kings. He's also one of the authors of a piece in Significance magazine exploring the world of soccer stats. Thanks so much for being here today, Luke.
Luke Bornn : Happy to be here.
Pennington : I love soccer but I have a ton of friends who have no idea what's going on in the game. They love American football, they love baseball, they love basketball but they have a really hard time sort of following what's happening on the soccer pitch. When you're talking about soccer statistics, how are those different for the…from say, baseball stats or basketball stats?
Bornn : You know, one of the big differences is that with soccer there is so much less scoring. So, in basketball or baseball, with all the scoring that happens, all the home runs, all the three-point shots in basketball, you get a lot of outcomes that you can use to measure players and to sort of understand players that are good, teams that are good. In soccer, where you have a lot less scoring, there's a lot more randomness, and so as a result you have a lot more upsets and it becomes a lot harder to figure out who are the good players and who are the bad ones.
John Bailer : You know, one question that I have is what's the breadth of analytics in soccer? You certainly think about things like player assessment. But what else have you seen analytics used for in soccer?
Bornn : Yeah, I think it is used for a variety of things. As you mentioned, it's used for player valuation and recruitment, you know, figuring out who to bring into the team and who to let go. So that's things like which players are skilled in the right ways. But it is also used tactically, sort of like opponent scouting, where you're thinking about, OK, what are the types of tactics that my opponent is going to employ and how can we adjust our tactics to counter that. So that's a big piece of how it's used as well, and probably the third way is in performance, so health and wellness, where teams are using lots of data on their players to make sure that they're well rested, that they're fit, that they're not fatigued coming up to games.
Bailer : So you've spent some time working at a professional club.
Bornn : Yeah, that's right. I spent a little over a year at AS Roma, which is a club in Serie A that's probably on people's minds because they made it far in the Champions League this year. And yes, I was there last year and spent some time in Rome, which is just delightful.
Bailer : Can you talk a bit about what did you do for the club, what was sort of your responsibility, as someone who's a quantitative guy working for a soccer club?
Richard Campbell : Yeah and were you responsible for them doing well?
Bornn : You know, like most teams, there's a lot of people involved, and so I would like to think that I was, in some sense, a part of that, you know, building tools and infrastructure to help make those decisions. But ultimately, you know, there's lots of people involved, there's lots of coaches and performance people, and ultimately the players on the pitch are the ones that deliver. So what did I do there on a day-to-day basis? Basically help the organization be more objective and data driven in their decision making and then also provide all the tools and infrastructure to support those decisions. So lots of models and visualizations to help measure player value, understand opponent tactics and then also get at player fatigue and fitness and that kind of thing.
Campbell : So I read your piece in Significance, so I'm wondering how do you explain what you're doing to players, or do you have to do that? Or how do you explain it to coaches, so they know what you're doing?
Bornn : Yeah, that's a good question. For the most part in professional sports, communication with the players goes through the coaches, and this was definitely true when I was in soccer and it is true now in basketball as well. So for the most part you're dealing with coaches. You do occasionally have interaction with players as well, but that's usually a little bit of a different focus. With coaches, it really depends on the coach. Some of them are really into looking at analytical views of their opponents and their players. Others, the way that they think about the game is through video, and so sometimes, instead of giving this coach or this trainer a spreadsheet of numbers, we work with the video guys to produce a clip of the coming opponent that's going to be more objective and more data driven. So instead of having the video guy just go watch the upcoming opponent's last two or three matches, let's give him all the tendencies of that opponent, so that he can cut up a film clip for the coach which is going to be more accurate to what that team's actual tendencies are.
Pennington : You mentioned a couple times, the sort of defensive way that you can use analytics in soccer and I'm wondering, so if I'm Spain and I'm trying to make sure that Belgium isn't going to cause an upset in World Cup play, if I'm looking at this video, what am I looking for, as far as the sort of things to help me understand analytically how to sort of ensure that upset doesn't happen?
Bornn : Yeah, so I think there's a few things. First off, as you can imagine, it's fairly straightforward with the current data to understand which players are the sort of key drivers of offensive actions. By that I mean if you just look at the sequence of players that the ball passes through, maybe from goalkeeper all the way through to the striker who takes a shot, you can see tendencies in the team. Do they go off to the left, do they go to the right, is that their wing back, do they go through their central midfield? From that, teams can understand their opponent's tendencies in terms of where they bring the ball, who carries the ball or which players are more likely to create scoring opportunities, and from that they can adjust their defense to counterbalance that.
Bailer : When you started to provide input to the club, you said you were talking about maybe helping the organization to become more data driven. I'm curious, what ideas were embraced the fastest? What kind of input was really most welcomed and what kind of input or ideas did you find the most reluctance to consider?
Bornn : I think probably what I've seen universally is that the groups within teams that are most receptive are the performance staff. So these are strength and conditioning coaches, physiotherapists, doctors, etc., and that's largely because they've been driven by the world of sports science, which is, you know, very data focused and has been for decades. And so that was the sort of place, at Roma but also with the Kings, where we got a lot of adoption and were really able to create a lot of impact right away. There's other areas where it becomes a little bit harder. So if you're familiar with Moneyball, you can think about baseball and baseball's use of data. I tend to say that baseball is about ten years ahead of basketball in terms of the adoption and understanding of data within the sport, and then I would say soccer is probably another ten years behind basketball.
Pennington : Oh wow!
Bornn : So when you're talking with coaches and players this just isn't the language that they're used to speaking.
Bailer : OK.
Bornn : The use of data really has only been around the last few years and because of that coaches and players did not grow up with this level of statistical literacy around the game and so it's probably going to be another generation before the kids coming up learn to speak that way, where they start to think about expected goals and goal differentials and pass completion percentages and those kinds of things.
(Background music plays)
Pennington : You're listening to Stats and Stories and our discussion today focuses on statistics and sports. Luke, in the article for Significance, you and your co-authors are writing about the fact that there's going to be real-time player tracking available in the World Cup. How might that change the way teams can prepare and the kind of analytics they can sort of gather in their preparation for matches?
Bornn : Yes, I think the first thing that you might see is teams overreacting to the small sample sizes.
(Collective laughter heard)
Bornn : You have tracking data for one half of a match. It becomes really easy to just think that that's the way that team operates, when in fact there's a lot of variability within the game and even across games. But I do think where it'll come in is primarily on the performance side, as I mentioned earlier. It's really hard in the sort of sample size of the World Cup, where you have a handful of games, to really identify meaningful trends and patterns in a team or player's style of play, and so most likely what you're going to see is the teams using it really heavily for understanding the load the players are going through. So for example, if you have a player that's maybe playing as a central midfielder, and not sort of a box-to-box midfielder, I mean one who's, you know, not covering the field, not running a lot, we might then say, OK, he doesn't have a lot of load during that game, or he's not going through a lot of physical stress. Or maybe another player, like a left back who is going up and down the pitch over and over, has a lot more load. And as a result, in the days between games, the performance staff might actually adjust their training schedule to account for those different in-game loads.
Campbell : Could you talk a little bit about, because this is related, how wearable technology works? How does it actually work? How are you gathering data with that kind of technology?
Bornn : Yeah, that's interesting. So I think usually in practices teams will use this wearable technology, which is essentially a small device, about the size of a couple of AA batteries, that goes between the shoulder blades. It measures acceleration and different movement patterns, and it can also be GPS connected as well. But what you are going to see in the World Cup actually is an optical-based system. What they do there is they install several cameras around the perimeter of the stadiums, and these are essentially just a handful of high-definition security cameras. After that they use image processing techniques, essentially multi-agent target tracking, to produce the coordinates of the players every twenty-fifth of a second. So they're using fairly complicated machine learning techniques to extract the locations of the players from these video feeds, and they do things like optical character recognition on the backs of the players' jerseys to figure out which player is which. So that's the technology. People think that it's a hardware solution that players are wearing, but actually it's coming from image processing, from cameras.
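A minimal sketch of what coordinates sampled every twenty-fifth of a second make possible, such as the distance and speed figures discussed next. The function names, the metre units and the sample track are illustrative assumptions, not the format of any real tracking feed.

```python
import math

# One position every twenty-fifth of a second, per the description above.
FRAME_RATE_HZ = 25

def distance_covered(track):
    """Total path length for a list of (x, y) positions (assumed to be in metres)."""
    return sum(math.dist(p, q) for p, q in zip(track, track[1:]))

def frame_speeds(track, frame_rate_hz=FRAME_RATE_HZ):
    """Frame-to-frame speed in metres per second."""
    return [math.dist(p, q) * frame_rate_hz for p, q in zip(track, track[1:])]

# Hypothetical player moving 0.2 m per frame along the touchline for 4 frames.
track = [(0.0, 0.0), (0.2, 0.0), (0.4, 0.0), (0.6, 0.0), (0.8, 0.0)]
total_distance = distance_covered(track)   # roughly 0.8 m
peak_speed = max(frame_speeds(track))      # roughly 5 m/s
```

Real feeds are noisy, so raw frame-to-frame differences like these would normally be smoothed before anyone reads a top speed or acceleration off them.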
Campbell : Very interesting.
Bailer : That is that's really cool. So they're going to be saying who's been covering the most ground obviously, they're going to talk about the acceleration of these players, there is going to be all sorts of information that is being fed to their commentators?
Bornn : Yeah, and for the most part people who work in this tend to really downplay things like how far a player runs, or acceleration, but it's ultimately something that the media loves. So there's really sort of this gap between what people within teams use and what the media tends to focus on. The media for sure will talk about the players that ran the longest and had the sharpest acceleration, but the things that teams are looking at are more sort of the overall impact on the player's body, which are more complex metrics than that.
Bailer : So what from the data that's going to be collected in real time during the World Cup is the most interesting statistical information?
Bornn : I think it's hard to say, first off, what level of analytical sophistication each team will have. The thing with international squads is that they tend to be sort of off for long periods of the year, and then they come together and they meet, and a lot of them don't have dedicated analysts. They do have these people who are coming from another club and just working during the World Cup. So oftentimes they don't have the time and resources to do the same level of analysis that a top-tier club in Europe might, and so I think even though this data is coming to these teams in the World Cup, there will be a gap in the ability to turn this data into meaningful insights.
Pennington : You mentioned earlier that soccer has sort of been behind the curve when it comes to analytics and statistics for team prep. Why has that been?
Bornn : That's a good question. I think there could be a few reasons for that. One is that baseball really led the way and so whether it's due to soccer's sort of the thematic distance from baseball and by that I mean the sort of the style of the game is drastically different, baseball's a very discrete game and by that I mean it's sort of, you know, a pitcher-batter one at a time, whereas soccer is a much more free-flowing game so they're very different but also there's a geographical distance and so I think the fact that the sport has been based primarily in Europe has led to slower infiltration of data into the sport.
Pennington : So do you have to do any work, again, following up on the resistance to analytics? I mean, there are, you know this was part of what I remember about Moneyball the early resistance to using analytics that a lot of coaches feel like this is a feel game that analytics doesn't have much to do with it. So do you face much of that both in basketball and in soccer, given that they're a little far behind in terms of baseball?
Bornn : Yeah I think this is a common thing that happens when statisticians or other people who sort of work in a quantitative field go into industry from academia and so when I was in academia we would spend all our time building these really complex, really fancy models using tracking data to answer really cool questions and we presented and got lots of great feedback and you find that in contrast when you go into industry you spend a lot of your time and in fact the majority of your time communicating statistical information, sometimes quite simple statistical information and so that really is the biggest challenge. It's not coming up with the greatest metrics or the fanciest ways to measure players, it's the ability to communicate that information to coaches and others so that it can actually have an impact.
Campbell : Do you have an example of that? Something that you found that you had to explain, something that was more technical and break it down?
Bornn : There's this idea, and well, it's actually true in both soccer and basketball. There's this notion in soccer that's become quite prominent in the last couple of years of expected goals. The idea there is, you can imagine a player takes a shot and it goes in or it doesn't. Instead of just counting shots, so, you know, Player A got ten shots and Player B got one shot over the course of the tournament, you can instead look at the probability that each shot becomes a goal. So if someone takes a really, really great shot that has a fifty percent chance of going in, then they sort of get point five expected goals, and in contrast, if someone takes a really long shot from way outside the box, then it might be point one expected goals, for example. This is actually sort of the equivalent in basketball of field goal percentage, where we're looking at spatially referenced field goal percentage: you know, a player might shoot thirty-five percent from this location and forty-five percent from this other location. And so that's something that's really tricky to get across. It sounds relatively simple, but it's this idea that certain locations are much more valuable to get shots from, and in fact we can measure it. So in basketball the most frustrating thing as an analyst is when you see a player go to take a three-point shot but they take one step inside of the three-point line…
Bornn : …to a long two, and so you have to sort of get across this idea that, hey, the probability of making the shot when you take that one step forward basically stays the same. So there might be sort of a thirty-five percent shot for the three versus maybe a slight drop, like thirty-three percent, but with one of those you're multiplying by three points and with the other one you are multiplying by two points, and so there's a big gap. And the same thing holds true in soccer, where some players have these shot profiles where they take all these really long shots, and it's a really inefficient way to score goals. So it's much better to sort of try to get that extra pass. Even though in the end you get fewer shots, you're going to be better off because those are higher quality shots.
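The arithmetic behind expected goals and the long-two comparison can be sketched as follows. The probabilities are the ones quoted above; pairing ten 0.1 shots with Player A and a single 0.5 shot with Player B is an assumption made for illustration, as are the function names.

```python
def expected_goals(shot_probabilities):
    """Expected goals: the sum of each shot's probability of scoring."""
    return sum(shot_probabilities)

def expected_points(make_probability, points):
    """Expected points from one basketball shot attempt."""
    return make_probability * points

# Assumed shot profiles: Player A takes ten long shots (0.1 each),
# Player B takes one great chance (0.5).
player_a_xg = expected_goals([0.1] * 10)   # about 1.0 from ten shots
player_b_xg = expected_goals([0.5])        # 0.5 from a single shot

# Per-shot quality, which raw shot counts hide.
player_a_per_shot = player_a_xg / 10
player_b_per_shot = player_b_xg / 1

# The long two versus the three, using the percentages quoted above.
three_pointer = expected_points(0.35, 3)   # about 1.05 points per attempt
long_two = expected_points(0.33, 2)        # about 0.66 points per attempt
```

The gap does not depend on which of the two percentages belongs to which shot: with the numbers swapped, the three is still worth about 0.99 expected points against 0.70 for the long two.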
Pennington : That's also frustrating as a fan. (Voices overlap) (Background music plays) Today we're talking with Simon Fraser University's Luke Bornn about sport stats. You mentioned a little earlier about how the media are looking for different things and stats than the people who are working for teams, and so I'm wondering if there are things about the way sports reporters or news reporters cover sports statistics. Whether it's you know field goal percentage or shot percentage that you find frustrating and think they could do better.
Bornn : You know, that's a good question. I think for the most part media is not terribly adoptive of these ideas, and so, you know, I think people can remember Charles Barkley's rants against analytics and those kinds of things. So for someone who's in the sports analytics world, for the most part when the media uses quantitative information to present the game, I enjoy it almost regardless of whether it's done well or not. But I think the biggest thing that you will see now is a focus on what I would call volume. So I think there will be a lot of reporting of number of shots, number of passes, and really for the most part in sports you don't care too much about volume, you care a lot about efficiency. As an example, in basketball, if one player plays thirty minutes versus another who plays ten minutes, they might show the stats the same way for both players: you know, this player scored twenty points in thirty minutes and this other player scored fifteen points in ten minutes. Well, I would actually much rather have the guy who scored fifteen points in ten minutes than the guy who scored twenty in thirty minutes. But if you're just presenting the cumulative totals without accounting for, OK, how much usage did that player get, you're missing out on a lot of the nuance of the game.
Bailer : I'm looking forward to seeing the box score that you develop for soccer. You know, I think part of the interest in baseball is that people could go through the game and see the detailed performance of players, and you could look over the course of seasons and be able to track that. I'm trying to picture what the equivalent of a baseball card would be for professional soccer or for the World Cup.
Bornn : Yeah, I think it's only been in the last couple of years in basketball that teams have really thrown out the traditional box score and started to do something a lot more intelligent, so I think we are a ways away from something like that in soccer. But a very simple example is tackles and interceptions. These are defensive events: an interception is when a player intercepts a pass between opposing teammates, and a tackle is sort of when you steal the ball from another player. If you look at those counts, as an example, if you look at La Liga, you'll see Barcelona as having very few tackles and interceptions, so you might think, well, they're not very good defensively, but of course the other team almost never had the ball. It is hard to tackle the other team when they don't have the ball. So it's a very simple start. You know, I can imagine that in the media they're going to report tackles and interceptions, but they really need to be adjusting them for the amount of possession that each team has.
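The volume-versus-efficiency point comes down to dividing by playing time. Here is a small sketch using the two stat lines quoted above; scaling to 36 minutes is a common basketball convention assumed here for illustration, and the function name is hypothetical.

```python
def points_per(points, minutes_played, per_minutes=36):
    """Scoring rate scaled to a fixed number of minutes."""
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")
    return points * per_minutes / minutes_played

# The two stat lines from the example above.
starter = points_per(20, 30)   # 24.0 points per 36 minutes
reserve = points_per(15, 10)   # 54.0 points per 36 minutes
```

A rate like this is only a first step: ten minutes is a small sample, so the reserve's figure would carry far more uncertainty than the starter's.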
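One simple way to make the possession adjustment described above is to scale each team's defensive actions to what they would be if the opponent had the ball half the time. Both the formula and the numbers below are illustrative assumptions, not a metric taken from the conversation or from real match data.

```python
def possession_adjusted(defensive_actions, opponent_possession):
    """Scale tackles plus interceptions to a 50% opponent-possession baseline.

    opponent_possession is the share of possession the other team had (0 to 1).
    """
    if not 0 < opponent_possession <= 1:
        raise ValueError("opponent_possession must be in (0, 1]")
    return defensive_actions * 0.5 / opponent_possession

# Hypothetical match totals: (tackles + interceptions, opponent's possession share).
raw = {
    "Team X": (20, 0.35),   # dominates the ball, so few chances to tackle
    "Team Y": (30, 0.60),   # spends most of the match defending
}

adjusted = {team: possession_adjusted(n, share) for team, (n, share) in raw.items()}
# Team X: about 28.6 adjusted actions; Team Y: about 25.0
```

On raw counts Team Y looks like the busier defense, but once the opportunity to defend is accounted for, the ordering flips.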
Pennington : In Moneyball I remember one of the things that stuck was the way that analytics helped recruit not star players but role players. Is there a similar kind of thing that's applicable in basketball and soccer where you can study data to imagine that the team needs somebody that can do a specific thing but he's not a star?
Bornn : Yeah. I think the thing about star players is that they play so many minutes and they get so much attention that for the most part they're fairly well valued by the public. So I think, you know, you don't need analytics to tell you how great Messi is or how great Ronaldo is or any of these players, but it's the players that sort of fall by the wayside, that don't appear in the highlight clips, that you can get a lot more insight into from analytics. And so as an example of that, you know, some players might play fewer minutes or aren't the ones involved in goals, and they might never be in the highlight reels, but you'll see in the data that they're actually the ones that created those opportunities, maybe three or four or five passes back from the shot. So you can extract that information relatively straightforwardly from the data, whereas it can be hard as a fan, or even as a coach or as a scout. You tend to remember those shots and those key actions that lead to goals, and so oftentimes scouts are really good at remembering the goals and the scoring opportunities, so good at valuing sort of the scorers on the team, but maybe not so good at valuing those players that create those opportunities farther back in the pipeline.
Bailer : So one dimension of what you've written about in the Significance piece and other work is the idea of the value of space and location, and it makes me think about some of the old soccer coaching guidelines about preparing players to be, you know, the first attacker or second attacker or third attacker, or first defender, second defender, third defender, all those concepts of role and space and interaction with the player on the ball. So how do you start to quantify that? I know you've started to quantify this some, but I'm just trying to think about how we're going to read about this and report about, you know, why this player has such an important impact on space in the play or the flow of play.
Bornn : Yeah, measuring space and understanding its impact is a really hard thing to do, and in one of the first projects I did with a student at Harvard, Dan Cervone, who's now at the L.A. Dodgers, we used this analogy to real estate. So you can imagine that you're paying a thousand dollars a month for a small flat in Manhattan, and you then choose to pay the same amount for a flat that's twice as big in Brooklyn. Well, that tells you something about the relative price of real estate in each of those. In other words, even if you didn't know how much a person was paying, if you see that they swap from Manhattan to Brooklyn and you look at the relative size of those two apartments, you can get a notion of the relative value in each of those two places. And the same can be done in soccer and basketball, where you look at, OK, I am in this really wide open space and I have no defenders around me, I have lots of room, but if I choose to pass it to a teammate of mine who's maybe ahead of me and quite tightly guarded, that tells you that I value the space where he is much more than I value the space where I am. And so with every pass you can think of this dynamic of trading one sort of ownership space for another ownership space, and from this you can actually get a pretty good notion of the areas of value on a pitch, but also how different players and different teams value different areas of the space.
Bailer : Very cool. You know it's interesting to me that you've talked a lot about soccer and basketball and the analogy here and I keep thinking about soccer and hockey as maybe being more similar in terms of you know a lot of flow but not…you know without a lot of scoring.
Bornn : Yeah, hockey is a really interesting one, and one of the reasons that it hasn't caught up is largely due to the data gap. Definitely hockey and basketball and soccer are very much the three, I think, that can learn the most from each other, and only really in the last year or two has hockey caught up in terms of data. There's a company based out of Montreal called Sportlogiq that creates incredible data for hockey that's akin to this sort of tracking data. So I think you're going to see hockey catch up very rapidly.
Campbell : I hear you. I want to know, as a frustrated basketball fan, whether the data show that hitting three-point shots is so much harder on the road than it is at home. It's incredible in this particular playoff how badly the Cavaliers shoot three-pointers on the road. Do you have data on this that suggests there are other factors that affect players on the road?
Bornn : Yeah, I think the biggest thing is there's definitely a fatigue factor that comes into play. For the most part there's not a drastic impact on shooting. What you do see is there's a lot of difference in terms of the calls that the different teams receive. There definitely is a home field advantage, but it's not nearly as large in basketball as it is in soccer. In soccer, aside from the actual teams themselves, home field has a huge, huge impact. This is why, for example, in the Champions League they do home and away. They don't play single games, except for the final, which is at a neutral location. For the most part, all throughout, they play home and home, which is to say if you have the two teams playing each other, they play one game at one team's stadium and then a week later they play at the other team's stadium, and that's to balance out the really extreme home field advantage.
Campbell : But what causes that? You would think in a small indoor arena, like in basketball, the crowd noise would have an impact, but in a large soccer stadium, why is there more of a home field advantage?
Bornn : Yeah, there's been a lot of study along these lines, and some of it comes down to fatigue and definitely crowd influence. There's been studies around the officials' calls, especially in a sport like soccer where awarding a penalty can basically mean the difference between winning and losing a game. So there's a variety of factors that come into play.
Bailer : OK. How would someone prepare for a career like you have?
Bornn : You know, the first thing I would say is that the funnel is really narrow. When I started at the Kings, we hired three full-time staff, and for those three positions we had over a thousand applicants. So it's a really challenging field to get into, but, you know, there is a fair number of jobs and it's growing. So I would sort of start with that answer to check people's expectations: it is a very difficult field to get into. In terms of how to get in, there's a tough balance, because if you look at the people who got into teams maybe ten years ago, a lot of them have MBAs and they're really great with Excel spreadsheets and those kinds of tools, whereas if you look at the level that teams are hiring at now, they're asking for machine learning expertise and databases and scripting in Python or R. The level of technical sophistication has gone through the roof. And because of this it becomes even harder, because if you want to get into a team you now not only need to have this great technical sophistication but also these incredible communication skills, where you can, you know, build a complex hierarchical Bayesian model and then in the next second sit down with a coach, someone who has maybe high school mathematics, and communicate this rather complex hierarchical model to him. So it's this combination of the really high level of technical skills required to work with this tracking data, which is often hundreds of gigabytes, and at the same time being able to communicate it to a layperson really straightforwardly.
Pennington : Well, thank you so much, Luke, for being here today. That's all the time we have for this episode of Stats and Stories. Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film and the American Statistical Association. You can follow us on Twitter or iTunes. If you'd like to share your thoughts on the program, send your e-mail to statsandstories@MiamiOH.edu and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.
Bailer : Indeed!