Category Archives: metrics

Mr. Consistency

Who are the most consistent scorers in the NBA? This is a question of some interest for those who participate in fantasy leagues, as consistency might be a virtue in determining the value of a player on your roster. For various reasons, a player might be worth more to you if they score 20 points every game, rather than alternate between 10 and 30 every other game. Further, some measure of consistency may highlight a player’s ability to impose their will on a game: a player able to get his scoring in, regardless of the opposition, could be said to be more of a game-defining player.

I’ve managed to estimate, for players since the 86-87 season, each individual’s mean points per 48 minutes, as well as the standard deviation of said statistic, and thus the coefficient of variation (sd/mean) and 95% confidence interval. Here’s a spreadsheet of the top (634) players in the league, by mean pts/48, sorted by coefficient of variation. Thus, the players at top could be said, in some way, to be more consistent scorers than those at the bottom.

Most consistent scorers, 1986-2008

Below is another way to view the same question. Using each player’s mean and standard deviation pts/48, along with the sample size, we can construct a 95% confidence interval for our estimate of their true mean. In the graphic linked below, each player is ranked by their mean pts/48, and the x-axis indicates how they fare under this measure of scoring. Each mean is surrounded by a line indicating the 95% confidence interval. This means, essentially, that we can be 95% sure that the player’s true mean lies within the span of their colored line. For players with smaller samples or greater variance, the error bars will be wider.
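As a rough sketch of the calculations described so far, the following computes per-48-minute scoring rates and then their mean, standard deviation, coefficient of variation, and 95% confidence interval. The game log is hypothetical and the function names are only illustrative. The interval uses a normal approximation (mean plus or minus 1.96 standard errors) applied to per-game rates, which is an assumption; the method behind the spreadsheet and graphics may differ.

```python
import math
import statistics

def per48(points, minutes):
    """Scale one game's points to a 48-minute rate."""
    return points * 48.0 / minutes

def summarize(rates):
    """Return n, mean, sd, coefficient of variation, and a 95% CI for the mean."""
    n = len(rates)
    mean = statistics.mean(rates)
    if n < 2:
        # A single observation gives no estimate of spread, hence no error bar.
        return {"n": n, "mean": mean, "sd": None, "cv": None, "ci": (mean, mean)}
    sd = statistics.stdev(rates)            # sample standard deviation
    half_width = 1.96 * sd / math.sqrt(n)   # normal approximation (assumed)
    cv = sd / mean if mean else None        # undefined when the mean is zero
    return {"n": n, "mean": mean, "sd": sd, "cv": cv,
            "ci": (mean - half_width, mean + half_width)}

# Hypothetical game log: (points, minutes) for each game.
games = [(20, 40), (25, 48), (15, 36), (22, 44), (18, 32)]
rates = [per48(p, m) for p, m in games]
stats = summarize(rates)
print(stats)
```

A player with a single game comes back with a zero-width interval, which corresponds to the players shown with no error bars at all.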

NBA Pts/48 min means with error bars

As you can see, some players have no error bars at all–this means that they only have one observation. Others’ error bars go down past zero. This means that we can be 95% sure that their mean pts/48 is in a range that includes zero, which doesn’t tell us very much. Anyway, here is the same graphic, for the 2007-08 season only:

Note that Carl Landry (#73) has a greater variance than most players around him, but he ranks as a better per-48 scorer than Shaquille O’Neal.

Finally, here’s a regular-season 2007-08 graphic for players’ MEV (or model-estimated value, using regression-derived weights like those seen here). Landry does even better here (18th), in terms of his mean, but his confidence interval is very large. This estimate suggests, though, that at worst, he’s about as good as Odom, Andre Miller, and Kirilenko; while at best, he is in rarefied air. Keep in mind that this is still just a 95% confidence interval, so statistically, there’s still a 1 in 20 chance the true mean isn’t even in this interval. All should be taken with a grain of salt. One of the things I like most about this presentation is that it’s a per-minute stat, which controls for playing time (although not pace), but still reminds us that estimates for those players with little playing time should be taken with large grains of salt, and might not really mean much of anything. Josh McRoberts, for example, is probably not the 406th, much less the 6th, most valuable player in the NBA, even though his simple arithmetic mean indicates as much–his confidence interval reminds us of this, while maintaining the simple ordering.

I suppose this is also the public debut of any sort of official MEV ordering for 2007-08. I’d be interested to hear what people thought about this… this is something similar to Berri’s estimates, but I think the weightings are a little more appropriate. Let me know in the comments if they seem, at least, per-minute, to be reasonable estimates and orderings of player value.

Credit where credit is due

Last night’s game was awesome–I admit a pro-Celtics bias on account of a pro-Kevin Garnett bias, and also an anti-Kobe Bryant bias. When claiming objectivity, it’s always a good thing to clear potential biases up front, to let the reader know who they’re dealing with… That said, Garnett had a pretty good game last night, and Bryant had a pretty bad game… Garnett would have been even better without that cold streak–the only player who missed more shots was Kobe, who missed 17! Using a huge, awesome linear regression, the results of which have not yet been made public (although its coefficients are very similar to those seen here), I derive the following from last night’s box scores:

Player min pts MEV PVC PtC Credit
Kevin Garnett 40.65 24 22.33 0.250 25.94 0.279
Paul Pierce 31.07 22 18.49 0.207 21.48 0.231
Pau Gasol 41.47 15 19.39 0.246 20.52 0.221
Derek Fisher 40.82 15 18.98 0.241 20.09 0.216
Ray Allen 43.95 19 16.85 0.189 19.57 0.210
Rajon Rondo 35.03 15 14.92 0.167 17.34 0.186
Lamar Odom 39.02 14 12.90 0.163 13.65 0.147
Kobe Bryant 41.87 24 11.06 0.140 11.71 0.126
Vladimir Radmanovic 17.05 5 9.86 0.125 10.43 0.112
Leon Powe 9.32 4 6.85 0.077 7.96 0.086
Sam Cassell 12.97 8 5.40 0.061 6.28 0.067
P.J. Brown 21.20 2 5.08 0.057 5.90 0.063
Sasha Vujacic 26.52 8 4.17 0.053 4.41 0.047
Ronny Turiaf 12.38 5 1.74 0.022 1.84 0.020
Jordan Farmar 7.18 2 0.86 0.011 0.91 0.010
Kendrick Perkins 23.02 1 0.63 0.007 0.73 0.008
Luke Walton 13.70 0 -0.05 -0.001 -0.06 -0.001
James Posey 22.80 3 -1.39 -0.016 -1.61 -0.017
Totals 480 186 168.05 2.000 187.08 2.012

MEV is the term for model-estimated value or point difference created, using only the regression weights. PVC is percent of valuable contributions, which is each player’s part of total team MEV. PtC is points created, which scales MEV values according to actual team and opponent scoring, to roughly account for those factors unmeasured by the box score, and Credit is, essentially, the amount of a win each player should be credited for. MEV and PtC are intended to account for both offensive and defensive contributions, that is, the player’s contribution to his own team’s scoring, and his defense preventing his opponent’s scoring. Boston’s total team Credit was 1.114, and LA’s was 0.898, based on the number of points each scored. It appears as though Boston was able to do to Kobe what they did in the regular season. Note that Posey, despite a timely three, actually hurt his team some: his two turnovers and three personal fouls effectively cancelled out his two steals, while his two defensive rebounds and three points could not compensate sufficiently for four missed shots. However, this is only based on box scores… he may have had tremendous unmeasured defense which I cannot capture, since his plus/minus was +3. It’s interesting to compare my metrics with plus/minus figures: Kobe was -13 for the game…
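The PVC and Credit columns can be reproduced from the table, at least approximately. The sketch below uses Boston's MEV values as listed and assumes two things that the post does not state outright: that team Credit is the ratio of team points to opponent points (98/88 is about 1.114, which matches the figure quoted above), and that a player's Credit is his PVC multiplied by team Credit. The PtC scaling is not described in enough detail to reproduce, so it is left out.

```python
# MEV values for Boston, copied from the table above.
boston_mev = {
    "Kevin Garnett": 22.33, "Paul Pierce": 18.49, "Ray Allen": 16.85,
    "Rajon Rondo": 14.92, "Leon Powe": 6.85, "Sam Cassell": 5.40,
    "P.J. Brown": 5.08, "Kendrick Perkins": 0.63, "James Posey": -1.39,
}
boston_pts, la_pts = 98, 88  # sums of the pts column for each team

team_mev = sum(boston_mev.values())

# Assumption: team Credit = team points / opponent points.
team_credit = boston_pts / la_pts

def pvc(player):
    """Percent of valuable contributions: the player's share of team MEV."""
    return boston_mev[player] / team_mev

def credit(player):
    """Assumed form: share of team MEV times team Credit."""
    return pvc(player) * team_credit

for name in boston_mev:
    print(f"{name:18s} PVC={pvc(name):.3f} Credit={credit(name):.3f}")
```

Under these assumptions the rounded values agree with the PVC and Credit columns for the Boston players, which suggests the reading is at least close to what was done.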

BoxScores: Player contributions to team success

Note: Since this post was published, the Winshares formula has undergone some revisions of some substantive import, as well as a renaming. To see the most current iteration and accurate tables and graphs, please see the BoxScores page.

This post is a lengthy discussion of the theory and methodology behind the Winshares player value metric. If you are already familiar enough with Winshares, or are impatient, read the “In brief” section just below, and then you might want to skip ahead to the payoff graphics at the very end of this post. As always, comments and criticisms are encouraged!

In brief

Winshares are a statistic developed to estimate a player’s value in terms of wins. Combining individual statistics with team performance, Winshares allocate credit for team wins according to each team member’s contributions to team total production. As of the end of the 2007-08 regular season, Winshares are calculated as follows:

winshr = (val / team val) * team wins

val = pts – fgx*0.5603802 – ftx*0.9345311 + as*0.7697530 + or*0.8709732 + dr*0.7111727 + st*0.9190908 + bk*0.9495596 – to*0.8473544 – pf*0.7729732

Motivation

Why create yet another statistic that attempts to reduce all of player value to one number? Especially when there are so many other good and widely accepted measures already in use? Because the theory is sound, the operationalization is elegant, and the results appear valid.

Why use boxscore stats, ignoring plus/minus and everything that modern science now knows about possessions and efficiency, especially since defense is so poorly captured and other statistics, like assists, are arbitrary? Because boxscore stats go back to the beginning of professional basketball. Plus/minus is extremely data-intensive to calculate, and we have no way of getting that kind of data for most historical games. I’m ignoring possessions, and not emphasizing defense, because it is my belief that comparing one player’s boxscore stats to those of his team gives a reasonable estimate of player contributions–sometimes overestimating, other times underestimating, but on average, getting it approximately right. Mostly, though, calculating Winshares is possible as long as the same stats are tracked for all players on a team, and we know how many times the team won–meaning it can be applied very generally.

Why even try to use statistics to measure player value? You can’t capture that with a number! There is much to be said on both sides of this issue. I am of the opinion that statistics ought to be considered within a larger context of other data, qualitative and quantitative. However, I do feel strongly that numbers have a lot to tell us–they allow us the hope of greater objectivity, and therefore possibly less subjective, more accurate assessments. When applied identically to all players, Winshares will adjudicate “fairly,” paying no attention to max contracts, shoe endorsements, nicknames, or “intangibles.” Intangibles are tricky–they may indeed be part of player value, but they are also, by definition, immeasurable, and may therefore expand to fill the role required of them. Was your favorite player not voted league MVP? Certainly they failed to consider his intangibles, which would have easily put him over the top…

Why are Winshares measured in that specific way? Don’t you know that linear weights are no good, or that assists are worth much more than you give them credit for? Read on…

Theory

Imagine a cooperative grocery store, owned by those who work there. At the end of one year, the store’s revenues exceed its expenditures by a large margin, and the workers are to be paid out of this surplus. One concept of fairness might dictate that a worker who worked p% of the total man-hours for that year ought to receive p% of the surplus. Arguably, he contributed p% of whatever effort determined whether or not the store would succeed, and should be rewarded accordingly. A worker working a large number of hours could be said to have contributed more to the store’s success or failure than another who only worked one shift a month–if the store profits by a large margin, that employee should receive a larger share of the windfall, just as if the store loses money, that employee should be held culpable for a larger share of the deficit.

Now imagine another similar store competing in the same market. Its surplus at the end of the year is twice that of the first store. Is it possible to compare the value, in terms of surplus, of employees from the two different stores? I would argue that it is possible: if pay is allocated in the same manner in both stores, with worker i in store j receiving payment in proportion to his labor contribution, the worker who receives the highest paycheck is the most valuable. That is, if pay is equal to worker man-hours over store total man-hours times store surplus, we can compare employees across any two firms in the same market.

But wait–what if some employees are more efficient workers than others? What if Alice can generate three times the revenue that Bob can generate in the same number of hours? Doesn’t our payment formula then overpay Bob and under-reward Alice, and doesn’t this complicate yet again the comparison across firms? Yes it does, and so we might try to find better measures of worker contributions to the surplus. Perhaps we could keep statistics on the number of cans shelved, or the number of transactions tendered, or the number of smiles flashed–if we could figure out even just the relative value of each of these things (that is, not necessarily how they each translate into surplus, but whether one smile is worth two cans shelved, etc.), then we are back on track. It doesn’t matter whether or not we can measure exactly how much revenue is brought in by each additional shelf stocked (although this would be interesting and useful), but if we know that it’s worth more (by some scalar factor) to clean the bathroom than it is to check receipts at the door, we can still estimate each worker’s contribution to the total amount of valuable work being done at the store.

This analogy carries over very well to sports, and specifically here, to basketball. A player who plays fully 1/5th of total team minutes played (that is 48 minutes per game for 82 games) ought to be credited with approximately 1/5th of his team’s success or failure–both of which can be measured in terms of wins. Using minutes to assess contributions runs into the same problem as in the stores above–they say nothing about efficiency–and as such, it is useful to find other statistics that more accurately estimate contributions to team success. The statistics employed in Winshares are boxscore stats, such as points, rebounds, assists, missed shots, etc. These are imperfect measures, but to the extent their relative value can be assessed, they may be useful in estimating each player’s contribution.

Calculation

Unfortunately, this relative evaluation is very difficult. It is often claimed by more “sophisticated” observers of the game that most fans fail to look past point-per-game numbers, giving infinitely more weight to scoring than to any other contributions. Yet, it is exceedingly difficult to identify just what the appropriate weights might be. Multiple regression analysis yields somewhat unsatisfactory results when applied in a straightforward manner–typically finding, for example, that offensive rebounds are actually detrimental to team success. Other work, including that done by Berri and Hollinger, is much more thorough, but leaves something to be desired (a topic which has been covered better elsewhere than can be possibly done by this author in this exposition).

As for Winshares, it would be disingenuous to claim that the ideal and true set of values has been found, but it is my belief that the reasoning is sound, and the results pass the “laugh test,” that is, given a subjective assessment of the sport, the relative importance of each boxscore statistic seems to be, at the very least, in the right order.

To identify the weights used, we may begin with a simple but strong assumption: the most valuable “good things” are those that opponents are most resistant to allowing, and thus are relatively rare, while the most detrimental “bad things” are those that a player is most trying to avoid, and thus are similarly relatively rare. With this in mind, I present counting sums for each of 10 boxscore counting stats from 1979-80 through 2007-08 (which I call the Modern era, characterized by the introduction of the three point shot to NBA play):

pts fgx* ftx* as or dr st bk to pf
6384067 2806562 417958 1469912 823716 1843893 516530 322015 974500 1449354

* field goals missed and free throws missed

Dividing each of these totals by the sum of the totals (17,008,507), we arrive at the following frequencies:

pts fgx ftx as or dr st bk to pf
0.37535 0.16501 0.0246 0.08642 0.0484 0.10841 0.0304 0.0189 0.0573 0.08521

Normalizing these frequencies to that of points, we get:

pts fgx ftx as or dr st bk to pf
1 0.43962 0.0655 0.23025 0.129 0.28883 0.0809 0.0504 0.1526 0.22703

Then, subtract each of the above from 1, so we are placing more weight on the rarer occurrences, and set the points coefficient to 1, because the ultimate aim of all defense is to prevent scoring, and the ultimate aim of all offense is to score:

pts fgx ftx as or dr st bk to pf
1 0.56038 0.9345 0.76975 0.871 0.71117 0.9191 0.9496 0.8474 0.77297
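The three steps above (frequencies, normalization to points, subtraction from 1) can be sketched in a few lines. The totals are copied from the first table; the variable names are illustrative only.

```python
# League-wide totals, 1979-80 through 2007-08, copied from the table above.
totals = {
    "pts": 6384067, "fgx": 2806562, "ftx": 417958, "as": 1469912,
    "or": 823716, "dr": 1843893, "st": 516530, "bk": 322015,
    "to": 974500, "pf": 1449354,
}

grand_total = sum(totals.values())  # 17,008,507

# Step 1: frequency of each stat among all counted events.
freq = {k: v / grand_total for k, v in totals.items()}

# Step 2: normalize each frequency to that of points.
relative = {k: f / freq["pts"] for k, f in freq.items()}

# Step 3: rarer events get more weight; the points coefficient is fixed at 1.
weights = {k: (1.0 if k == "pts" else 1.0 - r) for k, r in relative.items()}

for stat, w in weights.items():
    print(f"{stat:4s} {w:.7f}")
```

Because the grand total cancels in step 2, each weight other than points reduces to 1 minus the ratio of that stat's total to total points.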

Assign positivity and negativity according to whether each is helpful or deleterious to team success, and we arrive at a set of scalars for estimating valuable contributions (often abbreviated val):

val = pts – fgx*0.5603802 – ftx*0.9345311 + as*0.7697530 + or*0.8709732 + dr*0.7111727 + st*0.9190908 + bk*0.9495596 – to*0.8473544 – pf*0.7729732

Any player’s val less than zero is then set to zero, but val is rarely a large negative number. Compared to the difficulty of valuable contribution assessment, the final steps in Winshare calculation are extremely simple: merely find each player’s percent contribution to his team’s total sum of valuable contributions from all players, and multiply this by team wins:

winshr = (val / team val) * team wins
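Here is a minimal sketch of the two formulas, including the floor at zero. The roster and its season totals are hypothetical, and the function names are illustrative. It also assumes that team val is the sum of the floored player values, which the text implies but does not spell out.

```python
# Signed weights from the val formula above.
WEIGHTS = {
    "pts": 1.0, "fgx": -0.5603802, "ftx": -0.9345311, "as": 0.7697530,
    "or": 0.8709732, "dr": 0.7111727, "st": 0.9190908, "bk": 0.9495596,
    "to": -0.8473544, "pf": -0.7729732,
}

def val(stats):
    """Valuable contributions from season totals, floored at zero."""
    raw = sum(WEIGHTS[k] * stats.get(k, 0) for k in WEIGHTS)
    return max(raw, 0.0)

def winshares(roster, team_wins):
    """Split team wins among players in proportion to val."""
    vals = {name: val(s) for name, s in roster.items()}
    team_val = sum(vals.values())  # assumed: sum of the floored values
    if team_val == 0:
        return {name: 0.0 for name in roster}
    return {name: v / team_val * team_wins for name, v in vals.items()}

# Hypothetical three-player "team", for illustration only.
roster = {
    "Player X": {"pts": 1800, "fgx": 700, "ftx": 100, "as": 400, "or": 80,
                 "dr": 350, "st": 120, "bk": 40, "to": 220, "pf": 180},
    "Player Y": {"pts": 900, "fgx": 400, "ftx": 50, "as": 150, "or": 150,
                 "dr": 400, "st": 50, "bk": 80, "to": 100, "pf": 200},
    # Raw val is negative for this line, so it is set to zero.
    "Player Z": {"pts": 10, "fgx": 40, "ftx": 5, "to": 10, "pf": 20},
}
shares = winshares(roster, team_wins=50)
print(shares)
```

By construction the shares sum to the team's win total, so every win is credited to someone on the roster.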

We are left with an estimate of individual player value that combines individual contributions and team success, and allocates the most credit to those players who did the most to win the most. There is just one adjustment made to allow comparisons across all NBA seasons: for seasons prior to the official distinction between offensive and defensive rebounds, the formula is adjusted to incorporate total rebounds in their stead.

Discussion

The first thing to note is that as we apply the formula increasingly further back in time, we might become somewhat less certain of its absolute accuracy as the boxscore statistics on which it is based drop from the official record. Thus, for the very earliest years of the BAA, we might not be as confident in our estimate as for most years since, but the results are still very compelling, and seem to hold up to scrutiny despite the relative dearth of data. One of the merits of Winshares as a measure is that it is relatively flexible across a variety of situations, relying as it does on player percent contributions, which can almost always be measured in some manner.

Another caveat is to bear in mind that Winshares is a season-cumulative statistic, and so the ceiling varies by the number of games played in a season. Winshares for the lockout-shortened season of 1998-99 are much lower than in other contemporary seasons, due to the fact that all teams won fewer games than they normally would have. Adjustments can easily be made, however, by finding per-game or per-minute Winshare rates, and making comparisons at that level. This helps, too, in determining the impact of an injured player, given that he has played fewer games. However, the initial impetus for constructing Winshares was to estimate player value in terms of wins, and this is best done on a season-cumulative scale.

One thing done relatively poorly by Winshares in its current iteration is measurement of the value of players traded during the season. To do this completely accurately, it would be useful to isolate only the games the player appeared in for each of his several teams, looking at individual statistics and team wins within those sub-season units. However, this sort of analysis requires data not generally available in convenient form, and truly, the logical extension of this idea is fairly well captured by the plus/minus statistic. As it stands, Winshares still does a relatively good job (subjectively assessed) in measuring traded players’ value, but it is something worth noting.

Winshares in application

Often understanding is best achieved through application, and so I present

The Top 1,000 Winshare Seasons

covering the NBA, ABA, and BAA from 1946-2008. Keep in mind the above caveats about data availability, especially for seasons prior to 1951-52. In a similar vein, here is a list of

The Top 100 Winshare Careers

again, this is cumulative across the entirety of each player’s career, and so players with longevity are advantaged. I have included games played in this listing, to allow the reader to make his or her own adjustments.

Finally, every player, every team played for, 2007-08 season.

Geometric representation

One of the more useful ways to conceptualize Winshares is as player percent valuable contributions * team success. This has a particularly interesting expression in geometric terms, where Winshares can be thought of as the area of the rectangle created by multiplying valpct by team wins. The following series of visualizations depicts Winshares as a geometric comparison of player value. The color scheme is based on playing style–more detail on this classification may be found here.

2007-08 NBA: Chris Paul edges out Kobe Bryant as most valuable player according to Winshares, Kevin Garnett and Paul Pierce turn in stellar seasons for the Celtics, and LeBron James carries a huge load for his team, and is rewarded in terms of Winshares, if not in post-season success.

1986-87 NBA: A season featuring more all-time greats than perhaps any other (as noted here), we see Larry Bird and Magic Johnson at the height of their rivalry, Michael Jordan and Hakeem Olajuwon coming into their own, and too many other star players to even mention.

1971-72 NBA & ABA (combined): Classic Lakers and Celtics teams, a young Dr. J, Kareem’s greatest year, an almost-as-great year from Artis Gilmore, and countless other NBA past greats.

Sacramento Kings Franchise History: This storied franchise didn’t quite make the playoffs in a very competitive 2007-08 Western Conference, but its history is littered with greats such as Oscar Robertson and Chris Webber.

Choosing the MVP, geometrically

To begin, here is my pick/prediction for the 2008 NBA MVP award: Chris Paul of the New Orleans Hornets. Second most valuable is Kobe Bryant, followed by LeBron James and Paul Pierce. How did I decide this? Read on…

I have discussed the concept of Winshares previously in this space, and I believe that this measure is the most parsimonious and theoretically satisfying way to estimate player value. If you are unfamiliar with the construction, here is the formula:

  • valuable contributions = pts + as*2 + tr + st + bk – to
  • winshares = (valuable contributions / team valuable contributions) * team wins

The very simple motivating theory is that each player is responsible for some fraction of his team’s success (and here I define success as winning, plain and simple–value is a separate concept from quality or talent, and value in athletics is commonly gauged by game outcomes and the contribution of individuals thereto). The better the player doing the contributing, the more successful the team, and so contributions should be weighted by team success to reward those players whose efforts result in winning.

Picture a team with one player who contributes substantially more than his teammates (say, Minnesota with Al Jefferson, or Cleveland with LeBron James). It stands to reason that win or lose, that player deserves a large share of the credit for that team’s outcomes. Now picture a team for which valuable contributions are more evenly made (say, Chicago, Sacramento, or Boston). It similarly stands to reason that credit for the success of those teams ought to be more evenly attributed to the several players who contribute.

This means that a great player doing all the work for an otherwise very poor team should be worth about the same amount, in terms of wins, as a great player doing a smaller part of the work for an otherwise very good team. This makes sense, both are great players, so both should be able to generate similar levels of success. LeBron James should be approximately as valuable as Kevin Garnett, since although the quality of their teammates is different, so is the amount they are required to contribute to their teams’ success.

So this is how I arrived at my formulation of player value: essentially add up all the good things a player has done for his team, and divide that by the total number of good things his team did. Multiply this percentage by the number of team wins, and there you have it–a per-player number of Winshares.

Now, there are several downsides to this operationalization. It takes no account of intangibles, or anything besides basic boxscore statistics. Kevin Garnett’s incredible intensity and defensive leadership don’t count in this formulation (except as they are expressed in the boxscore–no doubt they contributed to team wins), so Paul Pierce comes through as slightly more valuable. Keep in mind, however, that this (Pierce for MVP) is what Garnett himself has told us all year long, and also keep in mind that this is not a per-minute or per-possession measure. Garnett played 2329 minutes to Pierce’s 2873, a substantial difference. Garnett had less time to add wins, even though he may have been more valuable per-minute than Pierce. However, for the MVP award, the focus ought to be on total value over the season, not player quality or efficiency. I am as big a Garnett fan as anyone, but no one would argue that injured Gilbert Arenas has been more valuable to the Wizards this year than Jamison or Butler, even if he is more valuable in some per-minute sense (though this is questionable).

The other problem with Winshares is that it does not take into account the specific possessions, minutes or games in which the valuable contributions came. I’m working on this, but in the meantime, you’ll want to use something like plus/minus figures if this is what you’re looking for. This disadvantage is most marked in attempting to measure the value of players traded during the season, but let’s face it–it is unlikely that an MVP-level player will be traded in the midst of an MVP-type season, and it’s even more unlikely that a player who was traded in the midst of the season would be in the running for MVP.

Any questions or critiques on this methodology are welcome, please feel free to leave a comment, but I submit that as far as elegance, parsimony, accessibility, and theoretical validity, Winshares as measured here are an optimal conceptualization of value.

After all that, here is the payoff: I’ve constructed a visualization depicting each player’s value in Winshares: their percent of valuable contributions is depicted on the vertical axis, and team success along the horizontal. Multiplying these two figures together results in Winshares, and each player is listed with their Winshare value and represented as a rectangle, the area of which is exactly proportional to his value. (Color is derived from my favorite way to capture playing type–the RGB scorer/perimeter/interior quasi-trichotomy.)
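To make the rectangle idea concrete, here is a small sketch using the simpler formula given earlier in this post (points, doubled assists, total rebounds, steals, and blocks, minus turnovers). The two-player roster and its totals are hypothetical, and the function names are illustrative; this is not the code behind the visualization.

```python
def valuable_contributions(s):
    """Simpler formula from this post: pts + as*2 + tr + st + bk - to."""
    return s["pts"] + 2 * s["as"] + s["tr"] + s["st"] + s["bk"] - s["to"]

def rectangles(roster, team_wins):
    """Map each player to (height, width, area).

    height = share of team valuable contributions (vertical axis)
    width  = team wins (horizontal axis)
    area   = Winshares
    """
    vals = {name: valuable_contributions(s) for name, s in roster.items()}
    team_val = sum(vals.values())
    out = {}
    for name, v in vals.items():
        share = v / team_val
        out[name] = (share, team_wins, share * team_wins)
    return out

# Hypothetical season totals for a two-player "team".
roster = {
    "Star": {"pts": 2000, "as": 500, "tr": 600, "st": 150, "bk": 50, "to": 300},
    "Role player": {"pts": 800, "as": 150, "tr": 350, "st": 50, "bk": 50, "to": 50},
}
rects = rectangles(roster, team_wins=40)
for name, (height, width, area) in rects.items():
    print(f"{name}: height={height:.2f} width={width} area={area:.1f}")
```

The heights on a team sum to 1 and every rectangle on that team shares the same width, so the areas add up to the team's win total.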

In a new twist, I’ve got it set up in a Google-Maps-style interface, so you can get as big a picture or as much detail as you’d like. Enjoy! (You’ll probably want to zoom in when the page first loads…)

Winshare Area Graph:

If that’s not the coolest, most straightforward way to envision basketball value, I don’t know what is!

Put together enough lines and shapes, and eventually they’ll point to a winner

By way of a NCAA tournament championship game preview, I present to you Memphis vs. Kansas, head-to-head, sparkline edition, courtesy of our friends at ESPN. If you like that, you may enjoy some of the designer’s other work at https://arbitrarian.wordpress.com.

Basking in adulation or drowning in hate?

Which better applies to your favorite team? Using NCAA basketball data collected by Facebook, I’ve thrown together a scatterplot of the teams which elicit most passion (measured by number of opinions expressed), contrasted with the favorability with which each team is viewed. Unsurprisingly, several of the larger state schools rank among the top in terms of number of opinions expressed, and just as obviously, Duke elicits the greatest number of opinions. Princeton, Yale and Harvard all rank toward the bottom in terms of favorability, although this is likely not due to their fearsome basketball reputations. I have to feel sorry for the Bethune-Cookman Wildcats, who appear to have a small, but hateful, following. The most beloved team appears to be the Wake Forest Demon Deacons, followed closely by St. John’s Red Storm. Between Wake, NC State (also well-liked), UNC and Duke, North Carolina is well represented at the extremes. Enough prologue; here’s the graphic:

NCAA Men’s Basketball Fans and Haters [pdf]

And, for those interested, here is a listing of teams by percent favorable opinions:

NCAA Men’s Basketball Favorability

I would love to see crosstabs for the fans/haters. If one wanted to operationalize “greatest rivalry” I think this would be an excellent way to do so.

The MVP debate, part II

We often think of players’ boxscore statistics in terms of cumulative sums or averages, but these statistics, while they tell us about prolificness and what one might expect from any player in a given game, tell us very little else about the player’s output. Consider three hypothetical players in an 82 game season: Player A scores 5 points in 41 of his games, and 35 points in the other 41 games, player B scores 20 points in each of the 82 games, and player C scores 19 points in 80 of his games, and scores 60 in each of the remaining two. Each of these players ends the season averaging 20 ppg, each scored a total of 1,640 points. However, there should be no question that they are very different players, even without considering non-scoring contributions. B is extremely consistent, C is pretty consistent but has rare scoring outbursts, and A is either a big threat or hardly a threat at all. (Please keep in mind that I could be doing the same with per-minute statistics–it’s just a little easier conceptually to discuss per-game stats, while making an equivalent point.) Opposing teams would need to plan differently when facing each of these three players, and their value to their own team is a function not only of their scoring average, but their entire scoring distribution. Since it is much easier to keep track of cumulative totals, and since the simple mean can be calculated by dividing total points (ast, reb, etc.) by total games, we have all been raised on means and sums–which are useful as far as they go, but don’t tell the whole story. So, into the plethora of other “modern” statistics, I would like to add several statistics that have been with us the entire time, but hidden behind season sums and means: the standard deviation, the geometric mean, and the distribution.

The standard deviation is a summary statistic like the mean, but it measures dispersion. Essentially, it attempts to capture the typical deviation from the mean of each data point. So, players whose per-game boxscore stats vary a lot from game-to-game will have a higher standard deviation than will players who are more consistently close to their own mean. Whether a high or low standard deviation is a good thing is a normative question, although I tend to think that consistency (indicated by a low standard deviation) is a good thing. Bear in mind also, that typically, the greater the mean, the more room there is for variance, and thus the more potential for a larger standard deviation. Thus, another statistic, the coefficient of variation, can be used to give an idea of variation while controlling for the magnitude of the mean.
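For the three hypothetical players described earlier, the standard deviation and coefficient of variation can be worked out directly. This sketch uses the population standard deviation, on the reasoning that the whole 82-game season is observed; that is a choice made for the example, not necessarily the one behind the graphics.

```python
import statistics

# Per-game scoring for the three hypothetical players described above.
player_a = [5] * 41 + [35] * 41
player_b = [20] * 82
player_c = [19] * 80 + [60] * 2

def dispersion(points):
    """Return (mean, population standard deviation, coefficient of variation)."""
    mean = statistics.mean(points)
    sd = statistics.pstdev(points)
    return mean, sd, sd / mean

for label, pts in [("A", player_a), ("B", player_b), ("C", player_c)]:
    mean, sd, cv = dispersion(pts)
    print(f"{label}: mean={mean:.2f} sd={sd:.2f} cv={cv:.3f}")
```

All three players average 20 points, but A's standard deviation is 15, B's is 0, and C's is about 6.3, so the dispersion measures separate players that the mean treats as identical.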

The geometric mean is similar to the arithmetic mean, in that it is a measure of centrality. However, it seems to emphasize consistency more than does a simple arithmetic mean. Where the arithmetic mean is the sum of the data divided by the number of data points, the geometric mean is the product of the data raised to the power of the inverse of the number of data points. Thus, in our above example, each player has the same mean (20 ppg), but B has a geometric mean of 20, C’s is 19.54, and A’s is 13.23. According to the geometric mean, then, player A is valued almost exactly the same as player D, who scores 13 points in each of 63 games, and 14 points in each of the remaining 19 games. Both of their g.means are around 13.23, but player A’s arithmetic mean is 20, while player D’s is 13.23. As such, the geometric mean, especially when presented alongside the arithmetic mean, may tell us even more about a player’s output.*
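The geometric means quoted above can be checked with a short sketch. It computes the n-th root of the product by averaging logarithms, which gives the same result while avoiding very large intermediate products; the function name is illustrative.

```python
import math

def geometric_mean(values):
    """n-th root of the product of the values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Per-game scoring for the four hypothetical players in the text.
player_a = [5] * 41 + [35] * 41
player_b = [20] * 82
player_c = [19] * 80 + [60] * 2
player_d = [13] * 63 + [14] * 19

for label, pts in [("A", player_a), ("B", player_b),
                   ("C", player_c), ("D", player_d)]:
    arithmetic = sum(pts) / len(pts)
    print(f"{label}: arithmetic={arithmetic:.2f} geometric={geometric_mean(pts):.2f}")
```

Note that this version fails on a game with zero points, since the logarithm of zero is undefined; the footnote below addresses that case.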

Finally, there is the entire distribution of per-game point totals. This encapsulates all of the information about a player’s production, because it is the player’s entire production. It’s not a numerical statistic, but can be represented as a graphic, or even (theoretically) an equation. The distribution represents essentially the same thing as does a histogram or bar chart of each statistic’s frequency at each level of output. In the graphic below, I display each of four players’ distributions on six different per-game statistics. This should give the viewer a very complete idea of each player’s production. I also include the summary statistics I’ve described, which individually give some information about the distribution, and taken together represent a partial but informative view of player production.

mvpdensities.png

This graphic presents the output of four potential MVP candidates through about 60 games of this season. Note that LeBron James tops Kobe Bryant in arithmetic means across every category, and seems to be a more consistent scorer (on a per-game level, at least)… I hope you find this depiction of production useful and informative–please don’t hesitate to participate in the ongoing MVP debate (see this post).

* A note about geometric means: since a player might have zero points, or assists or blocks, etc. in any given game, there is the potential that this zero would “wipe out” their geometric mean for that statistic, making it relatively uninformative. Thus, I have replaced each instance of 0 with 0.9 — which penalizes the player for having a low figure, but maintains valuable information. This is probably not a perfect solution, but I’ve applied it consistently, so it should at least be “fair” in some sense. Let me know in the comments if there is a better way of doing this.
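The zero-replacement rule in this note might be sketched as follows. The substitute value of 0.9 comes from the note itself; the function name, parameter name, and sample data are hypothetical.

```python
import math

def geometric_mean_with_floor(values, zero_substitute=0.9):
    """Geometric mean with each 0 replaced by zero_substitute before averaging logs."""
    adjusted = [zero_substitute if v == 0 else v for v in values]
    return math.exp(sum(math.log(v) for v in adjusted) / len(adjusted))

# Hypothetical block totals over five games, including two scoreless games.
blocks = [2, 0, 1, 3, 0]
result = geometric_mean_with_floor(blocks)
print(f"{result:.3f}")
```

Without the substitution the product of these five values would be zero, and so would the geometric mean, no matter what happened in the other three games.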