Category Archives: infovis

Assigning credit to Team USA

During the NBA Finals, I made an effort to estimate player contributions to the final score, using Model-Estimated Value (MEV) and a metric which I unimaginatively called “Credit.” MEV is a linear-weighting player productivity measure (read about it here), and Credit (which I’ve modified somewhat since the Finals coverage) attempts to divide credit (or blame) for a team’s success among individual players:

Player Credit = Player MEV / Team Total MEV * (Team Points / (Team Points + Opponent Points))
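As a minimal sketch of the formula above (the function name, the player labels, and the MEV numbers are all made up for illustration, and it assumes each team's total MEV is positive):

```python
def player_credit(player_mev, team_mev, team_pts, opp_pts):
    """Credit for one player in one game, per the formula above."""
    team_share = team_pts / (team_pts + opp_pts)  # the team's slice of the game
    return (player_mev / team_mev) * team_share   # the player's slice of that slice


# Hypothetical single game: per-player MEV for each side (not real data).
home = {"A": 20.0, "B": 10.0, "C": 10.0}
away = {"X": 15.0, "Y": 15.0}
home_pts, away_pts = 100, 90

credits = {}
for team, pts, opp in ((home, home_pts, away_pts), (away, away_pts, home_pts)):
    team_total = sum(team.values())
    for name, mev in team.items():
        credits[name] = player_credit(mev, team_total, pts, opp)

for name, credit in credits.items():
    print(f"{name}: {credit:.3f}")
```

In this made-up game the home side has 100/190 of the Credit to split and the away side 90/190, so the five values add up to one.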

This way, total Credit for all players on both teams sums to one in every game, and players on teams that win by a lot are allocated more Credit to divide amongst themselves, whereas in tight games, each team has closer to 0.50 Credits to attribute. In the spirit of the upcoming Olympic Games, I hope to return to semi-regular coverage, though not necessarily in the form of any actual posting. Rather, I will endeavor to update, after each game leading up to and played in Beijing, MEV and Credit statistics for each member of Team USA. Each game’s statistics, as well as cumulative stats can be found here:

http://bit.ly/teamusa

I hope you find it useful and insightful over the next few weeks.

Mr. Consistency

Who are the most consistent scorers in the NBA? This is a question of some interest for those who participate in fantasy leagues, as consistency might be a virtue in determining the value of a player on your roster. For various reasons, a player might be worth more to you if they score 20 points every game rather than alternating between 10 and 30. Further, some measure of consistency may highlight a player’s ability to impose their will on a game: a player able to get his scoring in, regardless of the opposition, could be said to be more of a game-defining player.

I’ve managed to estimate, for players since the 86-87 season, each individual’s mean points per 48 minutes, as well as the standard deviation of said statistic, and thus the coefficient of variation (sd/mean) and 95% confidence interval. Here’s a spreadsheet of the top (634) players in the league, by mean pts/48, sorted by coefficient of variation. Thus, the players at top could be said, in some way, to be more consistent scorers than those at the bottom.

Most consistent scorers, 1986-2008
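For readers who want to reproduce these summary numbers, here is a rough sketch. It assumes a normal-approximation interval (mean ± 1.96 standard errors); the spreadsheet may have been built with a different method, and the function name and sample game logs are hypothetical.

```python
import math
import statistics


def scoring_summary(pts_per_48, z=1.96):
    """Mean, sd, coefficient of variation, and approximate 95% CI of the mean."""
    n = len(pts_per_48)
    mean = statistics.fmean(pts_per_48)
    if n < 2:
        # One observation: no sd, so no interval can be drawn.
        return {"n": n, "mean": mean, "sd": None, "cv": None, "ci": None}
    sd = statistics.stdev(pts_per_48)        # sample standard deviation
    cv = sd / mean if mean else None         # coefficient of variation (sd/mean)
    half_width = z * sd / math.sqrt(n)       # z * standard error of the mean
    return {"n": n, "mean": mean, "sd": sd, "cv": cv,
            "ci": (mean - half_width, mean + half_width)}


# Two made-up players with the same mean but different consistency.
steady = scoring_summary([20, 20, 20, 20])
streaky = scoring_summary([10, 30, 10, 30])
print(steady["cv"], streaky["cv"])
```

Both hypothetical players average 20 pts/48, but only the second has a nonzero coefficient of variation and a wide interval around the mean.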

Below is another way to view the same question. Using each player’s mean and standard deviation of pts/48, along with the sample size, we can construct a 95% confidence interval for our estimate of their true mean. In the graphic linked below, each player is ranked by their mean pts/48, and the x-axis indicates how they fare under this measure of scoring. Each mean is surrounded by a line indicating the 95% confidence interval. This means, essentially, that we can be about 95% confident that the player’s true mean pts/48 lies within the span of their colored line. For players with smaller samples or greater variance, the error bars will be wider.

NBA Pts/48 min means with error bars

As you can see, some players have no error bars at all–this means that they only have one observation. Others’ error bars go down past zero. This means that we can be 95% sure that their mean pts/48 is in a range that includes zero, which doesn’t tell us very much. Anyway, here is the same graphic, for the 2007-08 season only:

Note that Carl Landry (#73) has a greater variance than most players around him, but he ranks as a better per-48 scorer than Shaquille O’Neal.

Finally, here’s a regular-season 2007-08 graphic for players’ MEV (or model-estimated value, using regression-derived weights like those seen here). Landry does even better here (18th), in terms of his mean, but his confidence interval is very large. This estimate suggests, though, that at worst, he’s about as good as Odom, Andre Miller, and Kirilenko, while at best, he is in rarefied air. Keep in mind that this is still just a 95% confidence interval, so statistically, there’s still a 1 in 20 chance the true mean isn’t even in this interval. All should be taken with a grain of salt. One of the things I like most about this presentation is that it’s a per-minute stat, which controls for playing time (although not pace), but still reminds us that estimates for those players with little playing time should be taken with large grains of salt, and might not really mean much of anything. Josh McRoberts, for example, is probably not the 406th, much less the 6th, most valuable player in the NBA, even though his simple arithmetic mean indicates as much–his confidence interval reminds us of this, while maintaining the simple ordering.

I suppose this is also the public debut of any sort of official MEV ordering for 2007-08. I’d be interested to hear what people thought about this… this is something similar to Berri’s estimates, but I think the weightings are a little more appropriate. Let me know in the comments if they seem, at least, per-minute, to be reasonable estimates and orderings of player value.

BoxScores: Player contributions to team success

Note: Since this post was published, the Winshares formula has undergone some revisions of some substantive import, as well as a renaming. To see the most current iteration and accurate tables and graphs, please see the BoxScores page.

This post is a lengthy discussion of the theory and methodology behind the Winshares player value metric. If you are already familiar enough with Winshares, or are impatient, read the “In brief” section just below, and then you might want to skip ahead to the payoff graphics at the very end of this post. As always, comments and criticisms are encouraged!

In brief

Winshares are a statistic developed to estimate a player’s value in terms of wins. Combining individual statistics with team performance, Winshares allocate credit for team wins according to each team member’s contributions to team total production. As of the end of the 2007-08 regular season, Winshares are calculated as follows:

winshr = (val / team val) * team wins

val = pts - fgx*0.5603802 - ftx*0.9345311 + as*0.7697530 + or*0.8709732 + dr*0.7111727 + st*0.9190908 + bk*0.9495596 - to*0.8473544 - pf*0.7729732

Motivation

Why create yet another statistic that attempts to reduce all of player value to one number? Especially when there are so many other good and widely accepted measures already in use? Because the theory is sound, the operationalization is elegant, and the results appear valid.

Why use boxscore stats, ignoring plus/minus and everything that modern science now knows about possessions and efficiency, especially since defense is so poorly captured and other statistics, like assists, are arbitrary? Because boxscore stats go back to the beginning of professional basketball. Plus/minus is extremely data-intensive to calculate, and we have no way of getting that kind of data for most historical games. I’m ignoring possessions, and not emphasizing defense, because it is my belief that comparing one player’s boxscore stats to those of his team gives a reasonable estimate of player contributions–sometimes overestimating, other times underestimating, but on average, getting it approximately right. Mostly, though, calculating Winshares is possible as long as the same stats are tracked for all players on a team, and we know how many times the team won–meaning it can be applied very generally.

Why even try to use statistics to measure player value? You can’t capture that with a number! There is much to be said on both sides of this issue. I am of the opinion that statistics ought to be considered within a larger context of other data, qualitative and quantitative. However, I do feel strongly that numbers have a lot to tell us–they allow us the hope of greater objectivity, and therefore possibly less subjective, more accurate assessments. When applied identically to all players, Winshares will adjudicate “fairly,” paying no attention to max contracts, shoe endorsements, nicknames, or “intangibles.” Intangibles are tricky–they may indeed be part of player value, but they are also, by definition, immeasurable, and may therefore expand to fill whatever role is required of them. Was your favorite player not voted league MVP? Certainly they failed to consider his intangibles, which would have easily put him over the top…

Why are Winshares measured in that specific way? Don’t you know that linear weights are no good, or that assists are worth much more than you give them credit for? Read on…

Theory

Imagine a cooperative grocery store, owned by those who work there. At the end of one year, the store’s revenues exceed its expenditures by a large margin, and the workers are to be paid out of this surplus. One concept of fairness might dictate that a worker who worked p% of the total man-hours for that year ought to receive p% of the surplus. Arguably, he contributed p% of whatever effort determined whether or not the store would succeed, and should be rewarded accordingly. A worker working a large number of hours could be said to have contributed more to the store’s success or failure than another who only worked one shift a month–if the store profits by a large margin, that employee should receive a larger share of the windfall, just as if the store loses money, that employee should be held culpable for a larger share of the deficit.

Now imagine another similar store competing in the same market. Its surplus at the end of the year is twice that of the first store. Is it possible to compare the value, in terms of surplus, of employees from the two different stores? I would argue that it is possible: if pay is allocated in the same manner in both stores, with worker i in store j receiving payment in proportion to his labor contribution, the worker who receives the highest paycheck is the most valuable. That is, if pay is equal to worker man-hours over store total man-hours times store surplus, we can compare employees across any two firms in the same market.

But wait–what if some employees are more efficient workers than others? What if Alice can generate three times the revenue that Bob can generate in the same number of hours? Doesn’t our payment formula then overpay Bob and under-reward Alice, and doesn’t this complicate yet again the comparison across firms? Yes it does, and so we might try to find better measures of worker contributions to the surplus. Perhaps we could keep statistics on the number of cans shelved, or the number of transactions tendered, or the number of smiles flashed–if we could figure out even just the relative value of each of these things (that is, not necessarily how they each translate into surplus, but whether one smile is worth two cans shelved, etc.), then we are back on track. It doesn’t matter whether or not we can measure exactly how much revenue is brought in by each additional shelf stocked (although this would be interesting and useful), but if we know that it’s worth more (by some scalar factor) to clean the bathroom than it is to check receipts at the door, we can still estimate each worker’s contribution to the total amount of valuable work being done at the store.
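To make the store analogy concrete, here is a small sketch of splitting a surplus in proportion to weighted work. The task names, the relative weights (one smile worth two cans shelved, as in the example above), the work logs, and the surplus figure are all invented for illustration.

```python
def surplus_shares(work_log, task_weights, surplus):
    """Split a store's surplus in proportion to each worker's weighted work."""
    contributions = {
        worker: sum(task_weights[task] * count for task, count in tasks.items())
        for worker, tasks in work_log.items()
    }
    total = sum(contributions.values())
    return {worker: surplus * c / total for worker, c in contributions.items()}


# Hypothetical relative values: only the ratios matter, not the units.
weights = {"can_shelved": 1.0, "smile": 2.0}
log = {
    "Alice": {"can_shelved": 300, "smile": 50},
    "Bob": {"can_shelved": 100, "smile": 0},
}
pay = surplus_shares(log, weights, surplus=10_000)
print(pay)
```

Doubling every weight leaves the payouts unchanged, which is why only the relative value of each kind of work needs to be known.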

This analogy carries over very well to sports, and specifically here, to basketball. A player who plays fully 1/5th of total team minutes played (that is 48 minutes per game for 82 games) ought to be credited with approximately 1/5th of his team’s success or failure–both of which can be measured in terms of wins. Using minutes to assess contributions runs into the same problem as in the stores above–they say nothing about efficiency–and as such, it is useful to find other statistics that more accurately estimate contributions to team success. The statistics employed in Winshares are boxscore stats, such as points, rebounds, assists, missed shots, etc. These are imperfect measures, but to the extent their relative value can be assessed, they may be useful in estimating each player’s contribution.

Calculation

Unfortunately, this relative evaluation is very difficult. It is often claimed by more “sophisticated” observers of the game that most fans fail to look past points-per-game numbers, giving infinitely more weight to scoring than to any other contributions. Yet, it is exceedingly difficult to identify just what the appropriate weights might be. Multiple regression analysis yields somewhat unsatisfactory results when applied in a straightforward manner–typically finding, for example, that offensive rebounds are actually detrimental to team success. Other work, including that done by Berri and Hollinger, is much more thorough, but leaves something to be desired (a topic which has been covered better elsewhere than can possibly be done by this author in this exposition).

As for Winshares, it would be disingenuous to claim that the ideal and true set of values has been found, but it is my belief that the reasoning is sound, and the results pass the “laugh test,” that is, given a subjective assessment of the sport, the relative importance of each boxscore statistic seems to be, at the very least, in the right order.

To identify the weights used, we may begin with a simple but strong assumption: the most valuable “good things” are those that opponents are most resistant to allowing, and thus are relatively rare, while the most detrimental “bad things” are those that a player is most trying to avoid, and thus are similarly relatively rare. With this in mind, I present counting sums for each of 10 boxscore counting stats from 1979-80 through 2007-08 (which I call the Modern era, characterized by the introduction of the three-point shot to NBA play):

pts      fgx*     ftx*     as       or       dr       st       bk       to       pf
6384067  2806562  417958   1469912  823716   1843893  516530   322015   974500   1449354

* fgx = field goals missed; ftx = free throws missed

Dividing each of these totals by the sum of the totals (17,008,507), we arrive at the following frequencies:

pts      fgx      ftx      as       or       dr       st       bk       to       pf
0.37535  0.16501  0.0246   0.08642  0.0484   0.10841  0.0304   0.0189   0.0573   0.08521

Normalizing these frequencies to that of points, we get:

pts      fgx      ftx      as       or       dr       st       bk       to       pf
1        0.43962  0.0655   0.23025  0.129    0.28883  0.0809   0.0504   0.1526   0.22703

Then, subtract each of the above from 1, so we are placing more weight on the rarer occurrences, and set the points coefficient to 1, because the ultimate aim of all defense is to prevent scoring, and the ultimate aim of all offense is to score:

pts      fgx      ftx      as       or       dr       st       bk       to       pf
1        0.56038  0.9345   0.76975  0.871    0.71117  0.9191   0.9496   0.8474   0.77297
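The steps above (totals, frequencies, normalizing to points, subtracting from 1) can be retraced in a few lines. This is only a sketch using the league totals quoted in this post; the variable names are my own for illustration.

```python
# League-wide totals, 1979-80 through 2007-08, as listed above.
TOTALS = {
    "pts": 6384067, "fgx": 2806562, "ftx": 417958, "as": 1469912,
    "or": 823716, "dr": 1843893, "st": 516530, "bk": 322015,
    "to": 974500, "pf": 1449354,
}

grand_total = sum(TOTALS.values())

# Step 1: frequency of each stat among all counted events.
freq = {stat: n / grand_total for stat, n in TOTALS.items()}

# Step 2: normalize each frequency to that of points.
relative = {stat: f / freq["pts"] for stat, f in freq.items()}

# Step 3: subtract from 1 so rarer events get more weight, then fix points at 1.
weights = {stat: 1 - r for stat, r in relative.items()}
weights["pts"] = 1.0

for stat, w in weights.items():
    print(f"{stat}: {w:.7f}")
```

Sign is not handled here: whether a stat adds to or subtracts from val is assigned afterwards.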

Assign positivity and negativity according to whether each is helpful or deleterious to team success, and we arrive at a set of scalars for estimating valuable contributions (often abbreviated val):

val = pts - fgx*0.5603802 - ftx*0.9345311 + as*0.7697530 + or*0.8709732 + dr*0.7111727 + st*0.9190908 + bk*0.9495596 - to*0.8473544 - pf*0.7729732

Any player’s val less than zero is then set to zero, but val is rarely a large negative number. Compared to the difficulty of valuable contribution assessment, the final steps in Winshare calculation are extremely simple: merely find each player’s percent contribution to his team’s total sum of valuable contributions from all players, and multiply this by team wins:

winshr = (val / team val) * team wins
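Put together, the calculation might look like the sketch below. It assumes that team val is the sum of the players’ vals after negative values have been set to zero, which is my reading of the description above; the function names and the three-player team are hypothetical.

```python
# Signed weights from the val formula above.
WEIGHTS = {
    "pts": 1.0, "fgx": -0.5603802, "ftx": -0.9345311, "as": 0.7697530,
    "or": 0.8709732, "dr": 0.7111727, "st": 0.9190908, "bk": 0.9495596,
    "to": -0.8473544, "pf": -0.7729732,
}


def val(box):
    """Valuable contributions for one player; negative totals are set to zero."""
    raw = sum(weight * box.get(stat, 0) for stat, weight in WEIGHTS.items())
    return max(raw, 0.0)


def winshares(team_boxes, team_wins):
    """Allocate team wins in proportion to each player's share of team val."""
    vals = {player: val(box) for player, box in team_boxes.items()}
    team_val = sum(vals.values())  # assumption: sum of the clamped values
    return {player: v / team_val * team_wins for player, v in vals.items()}


# Hypothetical season totals for a three-player "team" (not real data).
team = {
    "A": {"pts": 100},
    "B": {"pts": 50},
    "C": {"to": 10},  # raw val is negative, so it is clamped to zero
}
ws = winshares(team, team_wins=30)
print(ws)
```

Because every player’s share is a fraction of the same team val, the Winshares on a team always add up to that team’s win total.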

We are left with an estimate of individual player value that combines individual contributions and team success, and allocates the most credit to those players who did the most to win the most. There is just one adjustment made to allow comparisons across all NBA seasons: for seasons prior to the official distinction between offensive and defensive rebounds, the formula is adjusted to incorporate total rebounds in their stead.

Discussion

The first thing to note is that as we apply the formula increasingly further back in time, we might become somewhat less certain of its absolute accuracy as the boxscore statistics on which it is based drop from the official record. Thus, for the very earliest years of the BAA, we might not be as confident in our estimate as for most years since, but the results are still very compelling, and seem to hold up to scrutiny despite the relative dearth of data. One of the merits of Winshares as a measure is that it is relatively flexible across a variety of situations, relying as it does on player percent contributions, which can almost always be measured in some manner.

Another caveat is to bear in mind that Winshares is a season-cumulative statistic, and so the ceiling varies by the number of games played in a season. Winshares for the lockout-shortened season of 1998-99 are much lower than other contemporary seasons, due to the fact that all teams won fewer games than they normally would have. Adjustments can easily be made, however, by finding per-game or per-minute Winshare rates, and making comparisons at that level. This helps, too, in determining the impact of an injured player, given that he has played fewer games. However, the initial impetus for constructing Winshares was to estimate player value in terms of wins, and this is best done on a season-cumulative scale.

One thing done relatively poorly by Winshares in its current iteration is measurement of the value of players traded during the season. To do this completely accurately, it would be useful to isolate only the games the player appeared in for each of his several teams, looking at individual statistics and team wins within those sub-season units. However, this sort of analysis requires data not generally available in convenient form, and truly, the logical extension of this idea is fairly well captured by the plus/minus statistic. As it stands, Winshares still does a relatively good job (subjectively assessed) in measuring traded players’ value, but it is something worth noting.

Winshares in application

Often understanding is best achieved through application, and so I present

The Top 1,000 Winshare Seasons

covering the NBA, ABA, and BAA from 1946-2008. Keep in mind the above caveats about data availability, especially for seasons prior to 1951-52. In a similar vein, here is a list of

The Top 100 Winshare Careers

again, this is cumulative across the entirety of each player’s career, and so players with longevity are advantaged. I have included games played in this listing, to allow the reader to make his or her own adjustments.

Finally, every player, every team played for, 2007-08 season.

Geometric representation

One of the more useful ways to conceptualize Winshares is as player percent valuable contributions * team success. This has a particularly interesting expression in geometric terms, where Winshares can be thought of as the area of the rectangle created by multiplying valpct by team wins. The following series of visualizations depicts Winshares as a geometric comparison of player value. The color scheme is based on playing style–more detail on this classification may be found here.

2007-08 NBA: Chris Paul edges out Kobe Bryant as most valuable player according to Winshares, Kevin Garnett and Paul Pierce turn in stellar seasons for the Celtics, and LeBron James carries a huge load for his team, and is rewarded in terms of Winshares, if not in post-season success.

1986-87 NBA: A season featuring more all-time greats than perhaps any other (as noted here), we see Larry Bird and Magic Johnson at the height of their rivalry, Michael Jordan and Hakeem Olajuwon coming into their own, and too many other star players to even mention.

1971-72 NBA & ABA (combined): Classic Lakers and Celtics teams, a young Dr. J, Kareem’s greatest year, an almost-as-great year from Artis Gilmore, and countless other past NBA greats.

Sacramento Kings Franchise History: This storied franchise didn’t quite make the playoffs in a very competitive 2007-08 Western Conference, but its history is littered with greats such as Oscar Robertson and Chris Webber.

Choosing the MVP, geometrically

To begin, here is my pick/prediction for the 2008 NBA MVP award: Chris Paul of the New Orleans Hornets. Second most valuable is Kobe Bryant, followed by LeBron James and Paul Pierce. How did I decide this? Read on…

I have discussed the concept of Winshares previously in this space, and I believe that this measure is the most parsimonious and theoretically satisfying way to estimate player value. If you are unfamiliar with the construction, here is the formula:

  • valuable contributions = pts + as*2 + tr + st + bk - to
  • winshares = (valuable contributions / team valuable contributions) * team wins
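The two bullets above translate almost directly into code. The following is only a sketch of this simpler formulation, with hypothetical function names and made-up season totals for a three-player roster.

```python
def valuable_contributions(s):
    """pts + 2*as + tr + st + bk - to, per the first bullet above."""
    return s["pts"] + 2 * s["as"] + s["tr"] + s["st"] + s["bk"] - s["to"]


def simple_winshares(team_stats, team_wins):
    """Second bullet: share of team valuable contributions times team wins."""
    vc = {player: valuable_contributions(s) for player, s in team_stats.items()}
    team_vc = sum(vc.values())
    return {player: v / team_vc * team_wins for player, v in vc.items()}


# Invented season totals (not real players).
roster = {
    "star":  {"pts": 2000, "as": 500, "tr": 400, "st": 100, "bk": 50, "to": 250},
    "role":  {"pts": 1000, "as": 200, "tr": 500, "st": 50,  "bk": 50, "to": 100},
    "bench": {"pts": 500,  "as": 100, "tr": 200, "st": 30,  "bk": 20, "to": 50},
}
shares = simple_winshares(roster, team_wins=61)
print(shares)
```

With these invented totals the three players account for 3300, 1900, and 900 of the team’s 6100 valuable contributions, so a 61-win season is split 33, 19, and 9.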

The very simple motivating theory is that each player is responsible for some fraction of his team’s success (and here I define success as winning, plain and simple–value is a separate concept from quality or talent, and value in athletics is commonly gauged by game outcomes and the contribution of individuals thereto). The better the player doing the contributing, the more successful the team, and so contributions should be weighted by team success to reward those players whose efforts result in winning.

Picture a team with one player who contributes substantially more than his teammates (say, Minnesota with Al Jefferson, or Cleveland with LeBron James). It stands to reason that win or lose, that player deserves a large share of the credit for that team’s outcomes. Now picture a team for which valuable contributions are more evenly made (say, Chicago, Sacramento, or Boston). It similarly stands to reason that credit for the success of those teams ought to be more evenly attributed to the several players who contribute.

This means that a great player doing all the work for an otherwise very poor team should be worth about the same amount, in terms of wins, as a great player doing a smaller part of the work for an otherwise very good team. This makes sense: both are great players, so both should be able to generate similar levels of success. LeBron James should be approximately as valuable as Kevin Garnett, since although the quality of their teammates is different, so is the amount they are required to contribute to their teams’ success.

So this is how I arrived at my formulation of player value: essentially add up all the good things a player has done for his team, and divide that by the total number of good things his team did. Multiply this percentage by the number of team wins, and there you have it–a per-player number of Winshares.

Now, there are several downsides to this operationalization. It takes no account of intangibles, or anything besides basic boxscore statistics. Kevin Garnett’s incredible intensity and defensive leadership don’t count in this formulation (except as they are expressed in the boxscore–no doubt they contributed to team wins), so Paul Pierce comes through as slightly more valuable. Keep in mind, however, that this (Pierce for MVP) is what Garnett himself has told us all year long, and also keep in mind that this is not a per-minute or per-possession measure. Garnett played 2329 minutes to Pierce’s 2873, a substantial difference. Garnett had less time to add wins, even though he may have been more valuable per-minute than Pierce. However, for the MVP award, the focus ought to be on total value over the season, not player quality or efficiency. I am as big a Garnett fan as anyone, but no one would argue that injured Gilbert Arenas has been more valuable to the Wizards this year than Jamison or Butler, even if he is more valuable in some per-minute sense (though this is questionable).

The other problem with Winshares is that it does not take into account the specific possessions, minutes or games in which the valuable contributions came. I’m working on this, but in the meantime, you’ll want to use something like plus/minus figures if this is what you’re looking for. This disadvantage is most marked in attempting to measure the value of players traded during the season, but let’s face it–it is unlikely that an MVP-level player will be traded in the midst of an MVP-type season, and it’s even more unlikely that a player who was traded in the midst of the season would be in the running for MVP.

Any questions or critiques on this methodology are welcome, so please feel free to leave a comment. I submit, though, that as far as elegance, parsimony, accessibility, and theoretical validity go, Winshares as measured here are an optimal conceptualization of value.

After all that, here is the payoff: I’ve constructed a visualization depicting each player’s value in Winshares: their percent of valuable contributions is depicted on the vertical axis, and team success along the horizontal. Multiplying these two figures together results in Winshares, and each player is listed with their Winshare value and represented as a rectangle, the area of which is exactly proportional to his value. (Color is derived from my favorite way to capture playing type–the RGB scorer/perimeter/interior quasi-trichotomy.)

In a new twist, I’ve got it set up in a Google-Maps-style interface, so you can get as big a picture or as much detail as you’d like. Enjoy! (You’ll probably want to zoom in when the page first loads…)

Winshare Area Graph:

If that’s not the coolest, most straightforward way to envision basketball value, I don’t know what is!

The long and winding road… to a championship

Regarding tonight’s game: The road to the NCAA Championship, brought to you by ESPN. For the record, I have Kansas winning 71.07 to 70.74. I suppose this is just a statistical way of saying that it’s a toss-up, but I’m sticking to it. Enjoy the game!