Wednesday, July 23, 2014

Setting A League-Wide Standard For Corsi

In my past posts, I've used a lot of numbers, and this post will be no different. I began explaining Corsi in my last post. Essentially, it is a possession statistic: a count of every Corsi event for and against while a player is on the ice. A Corsi event is any shot attempt directed at the net, whether it is a shot on goal, a missed shot, or a blocked shot. When the player's team directs an attempt at the opponent's goal, it counts as a Corsi For event. When the opponent directs an attempt at the player's goal, it counts as a Corsi Against event.
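As a minimal sketch of the arithmetic, assuming you already have a player's on-ice counts, Corsi For % is the attempts for divided by all attempts (for plus against) while he is on the ice. The function names and the counts below are hypothetical, for illustration only:

```python
def corsi_events(shots_on_goal: int, missed: int, blocked: int) -> int:
    """A team's Corsi events: every shot attempt directed at the net."""
    return shots_on_goal + missed + blocked


def corsi_for_pct(corsi_for: int, corsi_against: int) -> float:
    """Corsi For % = CF / (CF + CA) * 100."""
    total = corsi_for + corsi_against
    if total == 0:
        raise ValueError("no Corsi events recorded")
    return 100.0 * corsi_for / total


# Hypothetical on-ice totals for one player (not real data).
cf = corsi_events(shots_on_goal=300, missed=130, blocked=120)  # 550 for
ca = corsi_events(shots_on_goal=240, missed=110, blocked=100)  # 450 against
print(f"Corsi For % = {corsi_for_pct(cf, ca):.1f}")  # Corsi For % = 55.0
```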

As "advanced statistics" for hockey continue to develop, basic Corsi is just scratching the surface: there are already more refined ways of measuring it, including iCorsi, expected Corsi, a newer measure called delta Corsi, and zone-start-adjusted Corsi. For the sake of my own sanity, this post will focus only on the overall Corsi stat, known as Corsi For %.

Why is Corsi such an important statistic? As noted before, it is an indicator of a team's possession rate. According to Extra Skater, 12 of the top 13 possession teams in the NHL last season made the playoffs; the exception was the New Jersey Devils, who ranked 4th. Corsi is a way of gauging how good a player is with the puck (or without it) without taking point totals into account.

So what exactly is considered a "good" Corsi %? The conventional thinking is that any Corsi rating above 50% is above average. After all, that makes sense, right? If you throw the puck at your opponent's goal more times than they throw it at yours, you're in good shape. To test this, I compiled the individual (completely unadjusted) Corsi For% from Extra Skater for every player who played 41 or more games (half the season) in 2013-2014.


Above is the distribution I compiled for all 570 players, sorted from highest to lowest.

So what is pictured above? It's not exactly a normal distribution. The mean (49.79%) sits slightly below the median (50.3%), which points to a mild skew toward the low end: the lowest value is about 13 points below the mean, while the highest is about 11.4 points above it. The plot also includes a line of best fit, y = -0.0258x + 57.164 (x being a player's rank, y his Corsi For%), with an R squared value of 0.9487. As a side note, the closer R squared is to 1, the more closely the data follow the fitted line. Every player can be found somewhere on the plot above.
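For readers who want to reproduce a fit like this, here is a minimal ordinary least squares sketch. It assumes, as the plot suggests, that x is a player's rank (1 = highest Corsi For%) and y is his Corsi For%; the function name and sample values are hypothetical:

```python
def fit_ranked_line(values):
    """Fit y = slope * x + intercept by ordinary least squares, where x is the
    rank (1 = highest Corsi For%) and y is the Corsi For% at that rank.

    Returns (slope, intercept, r_squared).
    """
    ys = sorted(values, reverse=True)  # highest to lowest, as in the plot
    n = len(ys)
    xs = range(1, n + 1)               # ranks 1..n
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - SS_res / SS_tot: the share of the variation in y the line explains
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical values: evenly spaced once sorted, so the fit is exact.
print(fit_ranked_line([50.0, 54.0, 58.0, 52.0, 56.0]))  # (-2.0, 60.0, 1.0)
```

Because a least squares line always passes through (mean rank, mean value), the league-wide line gives a quick consistency check: at the middle rank of 285.5, 57.164 - 0.0258 × 285.5 ≈ 49.80, right at the 49.79% league average. Also, since x is a rank, the slope depends on how many players are in the group, so slopes from groups of different sizes aren't directly comparable.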

So let's get into some of the numbers of the distribution (names in parentheses are players at or near each value). The minimum is 36.8% (Luke Gazdic), the 25th percentile is 47% (Jamie McGinn, Karl Alzner, Jonas Brodin, Thomas Vanek), the 50th percentile, or median, is 50.3% (Nino Niederreiter, Ray Whitney, Ryan Ellis, Marty Havlat, David Desharnais, and Eric Brewer), the 75th percentile is 52.775% (Brian Campbell), and the maximum is 61.2% (Patrice Bergeron). The average of the entire distribution is 49.79% (Shawn Thornton, Cory Sarich), with a standard deviation of 4.36%.
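Here is a sketch of how these summary numbers can be computed with Python's standard library. It assumes the quartiles were found by linear interpolation between ranked values (the "inclusive" method, the same rule as Excel's PERCENTILE.INC); that is an assumption about how the figures above were produced. Interpolation is also how a figure like 52.775% can come out of values recorded to one decimal place: it sits three quarters of the way from 52.7% to 52.8%. The function name and sample values are hypothetical:

```python
import statistics


def summarize(values):
    """Five-number summary plus mean and standard deviation of Corsi For% values.

    Quartiles use the "inclusive" method (linear interpolation between ranked
    values, the same rule as Excel's PERCENTILE.INC).
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "25th": q1,
        "median": median,
        "75th": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        # Sample SD; swap in statistics.pstdev() for the population SD.
        "sd": statistics.stdev(values),
    }


# Hypothetical values, not the real 570-player list.
for stat, value in summarize([40.0, 45.0, 50.0, 55.0, 60.0]).items():
    print(f"{stat:>6}: {value:.2f}")
```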

There is a measurement in statistics called a confidence interval, which gives a range that is likely to contain the true population mean of a distribution. Using the formula for a 95% confidence interval, the margin of error works out to 0.36 percentage points, so the range for the true population mean is 49.43% to 50.15%. However, for argument's sake we will stick with the observed mean of 49.79%.
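A minimal sketch of that calculation, assuming the usual normal-approximation formula (mean ± 1.96 × SD / √n) and plugging in the rounded figures from this post (mean 49.79%, SD 4.36%, n = 570). With this many players, using a t multiplier instead of 1.96 makes no difference at two decimal places. The function name is hypothetical:

```python
import math


def mean_confidence_interval(mean, sd, n, z=1.96):
    """Normal-approximation confidence interval for a mean.

    z = 1.96 gives roughly 95% coverage. Returns (margin, low, high).
    """
    margin = z * sd / math.sqrt(n)  # z times the standard error of the mean
    return margin, mean - margin, mean + margin


margin, low, high = mean_confidence_interval(49.79, 4.36, 570)
print(f"margin = {margin:.2f}, interval = {low:.2f}% to {high:.2f}%")
# margin = 0.36, interval = 49.43% to 50.15%
```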

It is important to note that Corsi is just a single statistic and should be taken with a grain of salt; it does not paint the full picture of a player. Just like "standard" stats such as points, goals, and assists, a player's Corsi% can be inflated or deflated by his teammates, since hockey is a team sport. There is a measurement that captures how much a player's teammates help or hurt him. It's called Corsi Relative %, and it is essentially the difference between the team's Corsi For% with the player on the ice and with him off it. I will not delve further into this stat, but as an example, Mark Giordano, defenseman of the Calgary Flames, posted an overall Corsi For% of 53.3% last season, but a Corsi Relative of +10.3%. This means the team's Corsi For% was 10.3 percentage points better with him on the ice than without him. This is precisely why the Corsi stat must be read in context.
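The on/off idea can be sketched in a few lines. This assumes Corsi Relative is simply on-ice Corsi For % minus off-ice Corsi For %, as described above (sites may apply extra adjustments), and the counts are invented for illustration, not Giordano's actual totals. The function names are hypothetical:

```python
def corsi_for_pct(cf: int, ca: int) -> float:
    """Corsi For % = CF / (CF + CA) * 100."""
    return 100.0 * cf / (cf + ca)


def corsi_rel(on_cf: int, on_ca: int, off_cf: int, off_ca: int) -> float:
    """Corsi Relative, in percentage points: the team's Corsi For % with the
    player on the ice minus its Corsi For % with him off the ice."""
    return corsi_for_pct(on_cf, on_ca) - corsi_for_pct(off_cf, off_ca)


# Made-up counts chosen to mirror the Giordano example:
# 53.3% with him on the ice, 43.0% (53.3 - 10.3) with him off it.
print(f"On ice:  {corsi_for_pct(533, 467):.1f}%")
print(f"Off ice: {corsi_for_pct(430, 570):.1f}%")
print(f"Corsi Rel: {corsi_rel(533, 467, 430, 570):+.1f}")  # Corsi Rel: +10.3
```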

So what is considered good? What is considered above average? I will address "above average" first because it is the most straightforward: since the average is 49.79%, anybody with a higher Corsi For% would be considered above average. As for good, I would consider a "good" player to have a Corsi For% at least 1 standard deviation above the mean, and I would even go so far as to say that a player is "elite" if his Corsi For% is at least 2 standard deviations above the mean. That puts the "good" cutoff at 54.14% and the "elite" cutoff at 58.49%, which gives 13 "elite" Corsi players from last season: Patrice Bergeron (BOS), Jake Muzzin (LAK), Anze Kopitar (LAK), Justin Williams (LAK), Tyler Toffoli (LAK), Brad Marchand (BOS), Jonathan Toews (CHI), Jaromir Jagr (NJD), Michal Rozsival (CHI), Reilly Smith (BOS), Dwight King (LAK), Drew Doughty (LAK), and Travis Zajac (NJD).
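These tiers translate directly into a small classifier. This is a sketch assuming the rounded league mean and SD (49.79% and 4.36%); with those rounded inputs the cutoffs come out at 54.15% and 58.51%, a hair above the 54.14% and 58.49% quoted above, presumably because those were worked out from unrounded values. The gap only matters for a player sitting right at a cutoff, such as one at exactly 58.5%. The function name and the "not above average" label are hypothetical:

```python
def classify(cf_pct: float, mean: float, sd: float) -> str:
    """Label a Corsi For% using the cutoffs from this post:
    above the mean = "above average", mean + 1 SD or more = "good",
    mean + 2 SD or more = "elite"."""
    if cf_pct >= mean + 2 * sd:
        return "elite"
    if cf_pct >= mean + sd:
        return "good"
    if cf_pct > mean:
        return "above average"
    return "not above average"  # hypothetical label; this tier isn't named above


LEAGUE_MEAN, LEAGUE_SD = 49.79, 4.36  # rounded 2013-14 league-wide values
for value in (61.2, 55.0, 50.3, 47.0):
    print(value, classify(value, LEAGUE_MEAN, LEAGUE_SD))
```

The same function works for each position by passing that group's mean and SD instead of the league's.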

The above information was determined by the entire amount of players. Let's break down how the numbers may differ at each position.
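Splitting the league into positions is just a group-by followed by the same summary statistics. A minimal sketch, assuming each player comes as a (name, position, Corsi For%) record with a single listed position and at least two players per position; the records and the function name are hypothetical:

```python
import statistics
from collections import defaultdict


def summarize_by_position(players):
    """Group (name, position, cf_pct) records by position and return
    {position: (count, mean, sample SD)}."""
    groups = defaultdict(list)
    for _name, position, cf_pct in players:
        groups[position].append(cf_pct)
    return {
        pos: (len(vals), statistics.mean(vals), statistics.stdev(vals))
        for pos, vals in groups.items()
    }


# Hypothetical records; the real input would be the 570 qualifying players.
players = [
    ("Player A", "C", 52.0), ("Player B", "C", 48.0),
    ("Player C", "LW", 51.0), ("Player D", "LW", 47.0),
    ("Player E", "RW", 55.0), ("Player F", "RW", 49.0),
    ("Player G", "D", 54.0), ("Player H", "D", 50.0),
]
for pos, (count, mean, sd) in summarize_by_position(players).items():
    print(f"{pos:>2}: n={count}, mean={mean:.2f}, SD={sd:.2f}")
```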

Centers:


Above is the distribution of Corsi For% for centers, plotted with a line of best fit, y = -0.0915x + 57.028. The R squared value, 0.9353, is not as high as for the previous graph (the entire league), likely because of the smaller sample and a few outliers.

Let's get right to the gist. The minimum value is 37.5%, the 25th percentile is 46.7%, the median is 49.9%, the 75th percentile is 52.4%, and the maximum value is 61.2%. The mean is 49.61%, with a standard deviation of 4.397%. How does this compare to the entire league? In short, centers come in slightly lower than the league as a whole at the 25th percentile, median, 75th percentile, and mean.

A direct comparison of the numbers:
Value:            Entire League:   Centers:
Minimum           36.8             37.5
25th percentile   47               46.7
Median            50.3             49.9
75th percentile   52.775           52.4
Maximum           61.2             61.2
Mean              49.79            49.61
SD                4.36             4.397

While these are marginal differences, hockey is a game of inches. (The SD is slightly larger for centers, 4.397 vs. 4.36, a gap small enough to be explained by the smaller sample.) What does this mean for "above average", "good", and "elite" Corsi centers? Any center with a Corsi For% higher than 49.61% would be considered "above average". Staying consistent with the definitions of "good" and "elite" (1 and 2 standard deviations above the mean, respectively), a "good" Corsi center would have a value of 54% or greater, and an "elite" Corsi center would have a value of 58.4% or greater. That gives 5 elite Corsi centers: Patrice Bergeron, Anze Kopitar, Tyler Toffoli, Jonathan Toews, and Travis Zajac.

Let's look at Left Wings:


The first thing to note is the higher R squared value, 0.9602, meaning the 105 left wingers sit more tightly along their line of best fit, y = -0.1591x + 58.244. Let's get a better look at the numbers: the minimum value is 36.8%, the 25th percentile is 45.7%, the median is 50.95%, the 75th percentile is 53.15%, and the maximum value is 58.5%. The mean is 49.72%, with a standard deviation of 4.84%. How does this compare with the rest of the league?
Adding left wingers to the chart from before gives a direct comparison of the numbers:
Value:            Entire League:   Centers:   LW:
Minimum           36.8             37.5       36.8
25th percentile   47               46.7       45.7
Median            50.3             49.9       50.95
75th percentile   52.775           52.4       53.15
Maximum           61.2             61.2       58.5
Mean              49.79            49.61      49.72
SD                4.36             4.397      4.84

So, in general, left wingers have a higher Corsi For% than centers, except at the minimum, 25th percentile, and maximum. How does this play out for "above average", "good", and "elite" Corsi left wingers? Continuing with the definitions from before, a left winger with a Corsi For% above 49.72% would be considered above average, a "good" left winger would be at 54.56% or greater, and an "elite" left winger would be at 59.4% or greater. By those parameters, no left winger actually reaches the elite mark: the left-wing maximum is 58.5%, so even Brad Marchand, who made the league-wide elite list, falls just short.

Let's look at Right Wingers now:


As with the previous graph, the first thing I will note is the R squared value of 0.9437, slightly lower than the left wingers' but still a very close fit. The line of best fit for this distribution is y = -0.1337x + 57.458.
Let's take a closer look at some of the numbers:
The minimum value is 38.4%, the 25th percentile is 47.2%, the median is 50.3%, the 75th percentile is 52.3%, and the maximum value is 60.6%. The mean is 49.8%, with a standard deviation of 4.33%. Let's add them to the chart from before.

A direct comparison of the numbers:

Value:            Entire League:   Centers:   LW:     RW:
Minimum           36.8             37.5       36.8    38.4
25th percentile   47               46.7       45.7    47.2
Median            50.3             49.9       50.95   50.3
75th percentile   52.775           52.4       53.15   52.3
Maximum           61.2             61.2       58.5    60.6
Mean              49.79            49.61      49.72   49.8
SD                4.36             4.397      4.84    4.33

The mean Corsi For% is higher for right wingers than for any other group so far, but their median is lower than the left wingers'.

What does this mean for "above average", "good", and "elite" Corsi right wingers? Continuing the current trend, a right winger with a Corsi For% higher than 49.8% would be considered "above average", a "good" right winger would have a value of 54.13% or greater, and an "elite" right winger would have a value of 58.46% or greater. That gives 3 "elite" Corsi right wingers: Justin Williams, Jaromir Jagr, and Reilly Smith.

Finally, let's look at the defensemen:

The line of best fit is not as strong in this distribution plot: the R squared value is 0.9456, lower than that of the left wing graph. The formula for the line is y = -0.0693x + 56.711.
Let's look at some other numbers: The minimum value is 40.1%, the 25th percentile value is 47.65%, the median value is 50.2%, the 75th percentile value is 52.55%, and the maximum value is 61.1%. The mean is 49.92%, with a standard deviation of 4.01%.
Let's finalize the chart from before.

A direct comparison of the numbers:

Value:            Entire League:   Centers:   LW:     RW:     D:
Minimum           36.8             37.5       36.8    38.4    40.1
25th percentile   47               46.7       45.7    47.2    47.65
Median            50.3             49.9       50.95   50.3    50.2
75th percentile   52.775           52.4       53.15   52.3    52.55
Maximum           61.2             61.2       58.5    60.6    61.1
Mean              49.79            49.61      49.72   49.8    49.92
SD                4.36             4.397      4.84    4.33    4.01

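As one might have expected, the defensemen post some of the highest Corsi For values: they have the highest minimum, 25th percentile, and mean of any group, though left wingers lead at the median and 75th percentile, and the league maximum belongs to a center.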

Finally, for the "above average", "good", and "elite" Corsi defensemen ratings: a defenseman with a Corsi For% higher than 49.92% would be considered "above average", a "good" Corsi defenseman would have a value of 53.93% or greater, and an "elite" Corsi defenseman would have a value of 57.94% or greater. That gives 4 "elite" Corsi defensemen in the league: Jake Muzzin, Michal Rozsival, Drew Doughty, and Marc-Edouard Vlasic.

This is a lot to take in, so here is a chart of the Corsi For% cutoffs we have discussed:

Position   "Above Average"   "Good"   "Elite"
C          49.61             54       58.4
LW         49.72             54.56    59.4
RW         49.8              54.13    58.46
D          49.92             53.93    57.94
Overall    49.79             54.14    58.49

I will conclude by saying that, contrary to popular belief, an "above average" Corsi skater does not need a Corsi For% over 50%: as last season's possession statistics show, the mean sits just below 50% for the league as a whole and for every position.


Thanks for reading! I gathered all the information above from ExtraSkater.com and compiled it myself.

Please follow my Twitter account @DTJ_AHockeyBlog and give me some feedback!

1 comment:

  1. Why would defensemen have a higher Corsi For %? Isn't their main job defense, and not throwing the puck at the net?
