It’s a great day in sports analytics! Sports and statistics… what do these two have in common? The first thing that springs to mind is… baseball! Isn’t it? With all those stats, numbers and fancy graphics superimposed on screen, tossed at you between each and every action. And this happens for a good reason too.
Let’s start by considering that the average baseball match has only about 18 minutes of real play, of action, so to speak.
Sports broadcasters have learned all too well that if they provide us with interesting data about the game, if they give us things to keep us engaged while the real action is at a standstill, it helps them keep us, the audience, glued to our TV screens… despite the fact that the only thing we’re looking at is maybe a guy in a striped shirt pulling up and adjusting the rim of his socks… Not too exciting, right? And this in turn buys sportscasters a bigger share of the audience, which helps push up the price of the advertising slots they can sell on the media market. But wait a second… Without even really putting our minds to it, we’re already knee-deep in statistics, aren’t we…?
Didn’t we mention a second ago “the average baseball match”…?
Yes, that’s right, you’re spot-on. Statistics is a much more everyday thing for us than we usually think.
And as you may discover by following us in this series of videos, statistics, the basis of all sports analytics, is easy. In fact, much easier than you may suspect. And when you apply statistics to sports, magical things can happen! Yes, even better things than preventing people from switching channels on the TV set, as Billy Beane and Paul DePodesta discovered during their 2002 Oakland Athletics season. They were among the first to systematically apply sports analytics to their managerial and coaching decisions, and in this way they managed to secure a 20-game winning streak for their team. As we will see in the coming episodes, many more coaches and managers have learned to leverage the same tools over the past quarter of a century.
But let’s proceed in order. We’ll look at how these tools are applied more in depth in the coming episodes.
We’ve been so bold as to say that statistics, the mother discipline of all sports analytics, is easy, right? Mmmh, I don’t know why, but I suspect you’re not fully sold on this statement yet. Are you?
With all those mathematics and funny symbols and Greek letters involved… How can Statistics ever be “easy”…?
Well, maybe I’ve exaggerated a bit… Maybe not everything we’ll talk about is really just plain “easy”. But I can promise you that if not “fourth-grade easy”, we can make statistics and the concepts involved, at least, “readily understandable” for you…
And all this becomes possible when we realize one important thing: statistics is not an esoteric science, available only to a handful of initiated priests… No sir, it’s not. Statistics is nothing more than another way we have devised to look at the world around us and describe, in a useful way, what’s going on in it… It is just another tool that we clever human beings, the animals that more than any others have learned to observe the world and speculate about it, have built to describe and better understand certain facts that catch our attention.
Okay, here’s the trick. The key to unlocking the mysteries of statistics is to realize that we’re talking about nothing more than another language, a language different from the verbal one we’re so accustomed to, and better suited to describing certain notable facts about the world around us and its phenomena. If we shift our attention just a bit and concentrate on the real thing, the world and its phenomena, and on what it really is that we are describing through statistics, then the language, with its conventions and intricacies, becomes much easier and more manageable.
Not convinced yet? Let’s start untangling the knot together, then.
—- end intro —-
Let’s go back to our “average baseball match” that we stated is only roughly 18 minutes long, when we consider just the action bits.
As we will see better in the coming episodes, the concept of “average”, or “mean” as it’s conventionally called in the jargon, is of central importance in statistics. So it’s a good place to start our journey, since it’s one of the pillars that hold the whole house up. And as promised, let’s start from the observation of the world and the things we want to describe and understand better.
Okay, let’s go. What is the “mean” or “average” in Statistics?
First, by watching match after match, we realize that our beloved game has lengthy moments without actual ball-play. And we note that this holds true for each and every game played, so it’s something worth our attention, worth investigating. Okay, but for how long, during a game, do the teams actually play the ball…? we ask.
We solve this problem in a clever way: with a stopwatch, we time every stretch in which the teams are actually busy playing the ball, and we add together all the measurements we collected. We end up with a final figure of 16 minutes and 48 seconds for the specific game we’re scrutinizing. Second, we ask ourselves: okay, but will every baseball game, when stripped down, come to this very same 16 minutes and 48 seconds of actual ball-play…? Probably not, we realize… So we get really curious and go about timing a few other games in the same way, in search of evidence. As expected, we get a different figure for each of the other three games we measure. One game was worth about 20 minutes of ball-play, another about 15 minutes, and the last one we timed about 23 minutes. We also decide that for the first game we can get away with rounding those 16 minutes and 48 seconds to 17 whole minutes. After all, an answer precise to the minute satisfies our curiosity well enough… We just don’t need to be more precise than that for our speculations.
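For readers who like to see the arithmetic, the rounding of the first game can be sketched in a few lines of Python. This is only an illustration, and the variable names are made up for the example.

```python
# Actual ball-play measured for the first game: 16 minutes and 48 seconds.
minutes, seconds = 16, 48

# Express the whole measurement in minutes as a decimal number (16.8).
exact_minutes = minutes + seconds / 60

# Round to the nearest whole minute, which is all the precision we decided we need.
rounded_minutes = round(exact_minutes)

print(rounded_minutes)  # 17
```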
So we’re left with four measurements to make sense of: 17, 20, 15 and 23 minutes. When we ask ourselves again, “okay, enlightened by my newly acquired observations, how long do teams actually play the ball in a baseball game, then…?”, we can notice two facts just by using common sense. First, games usually don’t have the same length of actual ball-play; it’s different for every game, as expected. Second, this length is about 20 minutes, more or less…
“It’s about 20 minutes…”, “…more or less…” Wow, did you notice what just happened? No…?
Let’s see together what we just did, and how easily we can get from there to the concept that statistics calls the “mean” or “average”.
First, more or less consciously, we tried to “generalize” our answer. That is, we tried to agree on a single figure that could represent, with acceptable approximation, each and every one of the four games we considered.
Second, we tried to mentally calculate a value that would be, as much as possible, “equally distant” from all the single measurements we started with, and which could therefore represent well enough each one of the games we are considering.
These two characteristics are exactly the defining characteristics of the statistical concept of “mean”. The first is generalization: providing a single figure that, when needed, can stand in place of each of the values we started with. The second is equidistance, which means “still landing in the right ballpark, with a meaningful figure” no matter which starting observation I substitute my “general value” for. The mean, by the way, is routinely written by putting a little horizontal bar above the name of the items we are averaging (for example x̄, read “x bar”)…
We can then ask ourselves: “Okay, but what is the best, the most precise way to calculate the mean… this equidistant, general value… instead of relying only on our common sense…?” We are clever enough not to reinvent the wheel every time, so we decide to borrow a bit of knowledge from our cousin languages, mathematics and geometry. After all, these two have already gone a long way in describing methods for working with quantities in meaningful and correct ways… And we find out that the way to calculate a number that is, as nearly as possible, at the same distance from each one of a bunch of given numbers is to sum all the starting values together and divide the total by how many numbers we started with.
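Applied to our four games, the recipe "sum everything, then divide by how many there are" can be sketched in Python like this. The variable names are only illustrative.

```python
# Minutes of actual ball-play in the four games we timed.
play_minutes = [17, 20, 15, 23]

# Step 1: sum all the starting values together.
total = sum(play_minutes)      # 75

# Step 2: count how many values we started with.
count = len(play_minutes)      # 4

# Step 3: divide the total by the count to get the mean.
mean_minutes = total / count

print(mean_minutes)  # 18.75
```

The result, 18.75 minutes, is not far from the "about 20 minutes, more or less" that common sense suggested. Python's standard library offers the same calculation ready-made as `statistics.mean`.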
If you want to understand why this mathematical trick works, you can simplify the method to the case where we’re considering only two values, pictured as two points on a line. Looking at the method “visually”, it should be easy enough to see why the procedure works…
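Here is a minimal sketch of that two-value case, using two of our measurements (15 and 23 minutes) as the two points on the line. The choice of these two values is just for illustration.

```python
# Two of our measurements, pictured as two points on a line.
a, b = 15, 23

# Sum the two values and divide by two: this is the midpoint of the segment.
midpoint = (a + b) / 2

# The midpoint is exactly as far from one end as it is from the other.
print(midpoint)       # 19.0
print(midpoint - a)   # 4.0
print(b - midpoint)   # 4.0
```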
Anyway, let me state clearly the approach we’ll use in these videos. We’re more interested in understanding how certain statistical tools and methods came into being, what these tools really describe, and what their actual meaning and usage are, than in laying out the detailed mathematics and the actual procedure to calculate them. We’ll look at these last two subjects only to the extent needed to understand the meaning and utility of each tool that we’re going to cover.
So, for this post, there are two important takeaways about the statistical concept of “mean” or “average”. The first is that when we use the mean, what we are really doing is attempting to generalize a bunch of other values, our observations, so we’re looking for a quick, synthetic way to represent them all… The second is that we accept paying a little price for this usefulness. The price is that our mean value won’t be exactly equal to any one of the values it represents, except in very special cases, but will only be good enough to approximate them all by being, as much as possible, equidistant from each one of them.
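Both takeaways can be checked on our four games with a short Python sketch. One way to read "as equidistant as possible" (this reading is our own illustration) is that the gaps below the mean and the gaps above it balance out exactly.

```python
# Minutes of actual ball-play in the four games we timed.
play_minutes = [17, 20, 15, 23]
mean_minutes = sum(play_minutes) / len(play_minutes)   # 18.75

# The price we pay: the mean is not equal to any of the games we timed.
print(mean_minutes in play_minutes)   # False

# How far each game sits from the mean (negative = below, positive = above).
gaps = [m - mean_minutes for m in play_minutes]
print(gaps)        # [-1.75, 1.25, -3.75, 4.25]

# The gaps below and above the mean cancel each other out.
print(sum(gaps))   # 0.0
```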