BoxScores

BoxScores: Individual contributions to team success

In brief

BoxScores are a statistic developed to estimate a player’s value in terms of wins. Combining individual statistics with team performance, BoxScores allocate credit for team wins according to each team member’s contributions to team total production. As of the end of the 2007-08 regular season, BoxScores are calculated as follows:

BoxScores = (val / team val) * team wins

val = pts – fgx*0.994 – ftx*0.607 + as*1.130 + or*0.988 + dr*0.605 + st*1.651 + bk*0.911 – to*1.550 – pf*0.221

Motivation

Why create yet another statistic that attempts to reduce all of player value to one number, especially when there are so many other good and widely accepted measures already in use? Because the theory is sound, the operationalization is elegant, and the results appear valid.

Why use box score stats, ignoring plus/minus and everything that modern science now knows about possessions and efficiency, especially since defense is so poorly captured and other statistics, like assists, are arbitrary? Because box score stats go back to the beginning of professional basketball. Plus/minus is extremely data-intensive to calculate, and we have no way of getting that kind of data for most historical games. I’m ignoring possessions, and not emphasizing defense, because it is my belief that comparing one player’s box score stats to those of his team gives a reasonable estimate of player contributions–sometimes overestimating, other times underestimating, but on average, getting it approximately right. Mostly, though, calculating BoxScores is possible as long as the same stats are tracked for all players on a team, and we know how many times the team won–meaning it can be applied very generally.

Why even try to use statistics to measure player value? You can’t capture that with a number! There is much to be said on both sides of this issue. I am of the opinion that statistics ought to be considered within a larger context of other data, qualitative and quantitative. However, I do feel strongly that numbers have a lot to tell us–they allow us the hope of greater objectivity, and therefore possibly more accurate assessments. When applied identically to all players, BoxScores will adjudicate “fairly,” paying no attention to max contracts, shoe endorsements, nicknames, or “intangibles.” Intangibles are tricky–they may indeed be part of player value, but they are also, by definition, immeasurable, and may therefore expand to fill whatever role is required of them. Was your favorite player not voted league MVP? Certainly the voters failed to consider his intangibles, which would have easily put him over the top…

Why are BoxScores measured in that specific way? Don’t you know that linear weights are no good, or that assists are worth much more than you give them credit for? Read on…

Theory

Imagine a cooperative grocery store, owned by those who work there. At the end of one year, the store’s revenues exceed its expenditures by a large margin, and the workers are to be paid out of this surplus. One concept of fairness might dictate that a worker who worked p% of the total man-hours for that year ought to receive p% of the surplus. Arguably, he contributed p% of whatever effort determined whether or not the store would succeed, and should be rewarded accordingly. A worker working a large number of hours could be said to have contributed more to the store’s success or failure than another who only worked one shift a month–if the store profits by a large margin, that employee should receive a larger share of the windfall, just as if the store loses money, that employee should be held culpable for a larger share of the deficit.

Now imagine another similar store competing in the same market. Its surplus at the end of the year is twice that of the first store. Is it possible to compare the value, in terms of surplus, of employees from the two different stores? I would argue that it is possible: if pay is allocated in the same manner in both stores, with worker i in store j receiving payment in proportion to his labor contribution, the worker who receives the highest paycheck is the most valuable. That is, if pay is equal to worker man-hours over store total man-hours times store surplus, we can compare employees across any two firms in the same market.
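As a minimal sketch of the proportional rule just described, here is the payment formula in Python. The function name and all of the hours and surplus figures are invented purely for illustration.

```python
def pay_share(worker_hours, store_total_hours, store_surplus):
    """Pay = worker man-hours / store total man-hours * store surplus."""
    if store_total_hours <= 0:
        raise ValueError("store_total_hours must be positive")
    return worker_hours * store_surplus / store_total_hours


# Hypothetical figures, invented for illustration only.
# Store B's surplus is twice store A's, as in the example above.
worker_a = pay_share(worker_hours=2000, store_total_hours=10000, store_surplus=100000)
worker_b = pay_share(worker_hours=1500, store_total_hours=10000, store_surplus=200000)

print(worker_a)  # 20000.0
print(worker_b)  # 30000.0
```

With these made-up numbers, the worker in store B logs fewer hours but receives the larger check, because his store's surplus is larger. That is the sense in which the two workers become comparable across stores.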

But wait–what if some employees are more efficient workers than others? What if Alice can generate three times the revenue that Bob can generate in the same number of hours? Doesn’t our payment formula then overpay Bob and under-reward Alice, and doesn’t this complicate yet again the comparison across firms? Yes it does, and so we might try to find better measures of worker contributions to the surplus. Perhaps we could keep statistics on the number of cans shelved, or the number of transactions tendered, or the number of smiles flashed–if we could figure out even just the relative value of each of these things (that is, not necessarily how they each translate into surplus, but whether one smile is worth two cans shelved, etc.), then we are back on track. It doesn’t matter whether or not we can measure exactly how much revenue is brought in by each additional shelf stocked (although this would be interesting and useful), but if we know that it’s worth more (by some scalar factor) to clean the bathroom than it is to check receipts at the door, we can still estimate each worker’s contribution to the total amount of valuable work being done at the store.

This analogy carries over very well to sports, and specifically here, to basketball. A player who plays fully 1/5th of total team minutes played (that is, 48 minutes per game for 82 games) ought to be credited with approximately 1/5th of his team’s success or failure–both of which can be measured in terms of wins. Using minutes to assess contributions runs into the same problem as in the stores above–they say nothing about efficiency–and as such, it is useful to find other statistics that more accurately estimate contributions to team success. The statistics employed in BoxScores are box score stats, such as points, rebounds, assists, missed shots, etc. These are imperfect measures, but to the extent their relative value can be assessed, they may be useful in estimating each player’s contribution.

Calculation

Unfortunately, this relative evaluation is very difficult. It is often claimed by more “sophisticated” observers of the game that most fans fail to look past points-per-game numbers, giving infinitely more weight to scoring than to any other contributions. Yet, it is exceedingly difficult to identify just what the appropriate weights might be. Other work, including that done by Berri and Hollinger, is much more thorough, but leaves something to be desired (a topic which has been covered better elsewhere than can possibly be done by this author in this exposition).

As for BoxScores, it would be disingenuous to claim that the ideal and true set of values has been found, but it is my belief that the reasoning is sound, and the results pass the “laugh test,” that is, given a subjective assessment of the sport, the relative importance of each box score statistic seems to be, at the very least, in the right order.

Using a sample of individual game box scores from between 1986 and 2008, I ran a multiple regression analysis using point differential as the dependent variable, and all other possible statistics (non-blocked missed field goal attempts, missed free throws, assists, offensive and defensive rebounds, steals, blocked shots, non-stolen turnovers, and personal fouls) produced by both teams as the independent variables. All predictors were significant at the 0.05 level. Averaging the beneficial effect of a team’s own stats and the harmful effect of its opponent’s stats gives us the coefficients used in our value calculation. The exception is the coefficient for assists: the regression returned approximately 0.39, which appears to drastically undervalue the contribution of the players making the assist. I therefore ran another regression, with points per points possible, (fgm*2+f3m)/(fga*2+f3a), as the dependent variable and assists and unassisted field goals made as the predictors (no intercept). The estimated coefficient for assists was larger, as one might expect, and so the coefficient used in the BoxScores value calculation is the ratio of the assists coefficient to the unassisted field goals made coefficient, or about 1.130.
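To make the assist adjustment concrete, here is a rough Python sketch of a no-intercept least-squares fit with two predictors, followed by the ratio of the two coefficients. It is an illustration under stated assumptions, not the original analysis: the data are synthetic, the "true" coefficients (0.012 and 0.010) are invented, and the function name is hypothetical.

```python
def ols_two_predictors_no_intercept(x1, x2, y):
    """Least-squares fit of y = b1*x1 + b2*x2 (no intercept) via the normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * t for a, t in zip(x1, y))
    s2y = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Synthetic team-game rows, invented for illustration (not real box scores).
assists = [20, 25, 18, 30, 22]
unassisted_fgm = [15, 12, 20, 10, 17]
# Noise-free "points per points possible" built from invented coefficients,
# so the fit should recover them almost exactly.
efficiency = [0.012 * a + 0.010 * u for a, u in zip(assists, unassisted_fgm)]

b_ast, b_ufgm = ols_two_predictors_no_intercept(assists, unassisted_fgm, efficiency)

# The weight for assists is the ratio of the two fitted coefficients.
assist_weight = b_ast / b_ufgm
print(round(assist_weight, 3))
```

Because the synthetic data were generated with a ratio of 1.2, the printed weight is close to 1.2 rather than the 1.130 reported above. Only the mechanics carry over, not the numbers.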

Assigning a value of one to every point scored, and using coefficients from the various regression models, we arrive at a set of scalars for estimating valuable contributions (often abbreviated val):

val = pts – fgx*0.9940236 – ftx*0.6066705 + as*1.1298004 + or*0.9877381 + dr*0.6045051 + st*1.6514982 + bk*0.9109706 – to*1.5498633 – pf*0.2212825
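The formula above translates directly into code. The sketch below copies the weights from the formula. The reading of the abbreviations (fgx as missed field goals, ftx as missed free throws, and so on) is inferred from the surrounding text, and the sample stat line is invented.

```python
# Weights copied from the val formula; negative weights subtract value.
# Note that "as" and "or" are Python keywords, so stats are passed as dict keys.
VAL_WEIGHTS = {
    "pts": 1.0,          # points
    "fgx": -0.9940236,   # missed field goals
    "ftx": -0.6066705,   # missed free throws
    "as": 1.1298004,     # assists
    "or": 0.9877381,     # offensive rebounds
    "dr": 0.6045051,     # defensive rebounds
    "st": 1.6514982,     # steals
    "bk": 0.9109706,     # blocked shots
    "to": -1.5498633,    # turnovers
    "pf": -0.2212825,    # personal fouls
}


def val(stats):
    """Weighted sum of box score stats; stats missing from the dict count as zero."""
    return sum(weight * stats.get(name, 0) for name, weight in VAL_WEIGHTS.items())


# Hypothetical single-game stat line, invented for illustration.
example_line = {"pts": 20, "fgx": 8, "ftx": 2, "as": 5, "or": 1,
                "dr": 6, "st": 1, "bk": 1, "to": 3, "pf": 2}
print(round(val(example_line), 4))  # 18.5686
```

Keeping the weights in a single dictionary makes it easy to swap in a different set, for example for seasons in which some of these stats were not recorded.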

In terms of face validity, these coefficients appear to make sense. A missed field goal, offensive rebound, or block is valued approximately equal to one point, while missed free throws and defensive rebounds are worth about 3/5 of that–appropriate because not every missed free throw results in a turnover, and not as many defensive rebounds will result in scores as will offensive rebounds. Steals are worth somewhat more than the typical turnover, because while in both cases possession changes, steals often leave the team gaining possession in better position to score than, say, traveling violations.

Any player’s val less than zero is then set to zero, but val is rarely a large negative number. Compared to the difficulty of valuable contribution assessment, the final steps in BoxScores calculation are extremely simple: merely find each player’s percent contribution to his team’s total sum of valuable contributions from all players, and multiply this by team wins:

BoxScores = (val / team val) * team wins
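The allocation step can be sketched in a few lines of Python. This assumes the floor at zero is applied before the team total is summed, which is my reading of the order described above. The roster and val totals are invented.

```python
def box_scores(player_vals, team_wins):
    """Split team wins among players in proportion to their (floored) val."""
    # Negative val is set to zero before computing shares.
    floored = {name: max(v, 0.0) for name, v in player_vals.items()}
    team_val = sum(floored.values())
    if team_val == 0:
        # No positive contributions at all: nothing to allocate.
        return {name: 0.0 for name in floored}
    return {name: v / team_val * team_wins for name, v in floored.items()}


# Hypothetical season val totals for a four-man "team" with 60 wins.
season_vals = {"A": 1500.0, "B": 1000.0, "C": 500.0, "D": -20.0}
shares = box_scores(season_vals, team_wins=60)
for name, share in shares.items():
    print(name, round(share, 2))
```

By construction the shares sum to the team's win total, so a roster's BoxScores always add up to exactly as many wins as the team recorded.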

We are left with an estimate of individual player value that combines individual contributions and team success, and allocates the most credit to those players who did the most to win the most. There is just one adjustment made to allow comparisons across all NBA seasons: for seasons prior to the official distinction between offensive and defensive rebounds, the formula is adjusted to incorporate total rebounds in their stead.

Discussion

The first thing to note is that as we apply the formula increasingly further back in time, we might become somewhat less certain of its absolute accuracy as the box score statistics on which it is based drop from the official record. Thus, for the very earliest years of the BAA, we might not be as confident in our estimate as for most years since, but the results are still very compelling, and seem to hold up to scrutiny despite the relative dearth of data. One of the merits of BoxScores as a measure is that it is relatively flexible across a variety of situations, relying as it does on player percent contributions, which can almost always be measured in some manner.

Another caveat is to bear in mind that BoxScores is a season-cumulative statistic, and so the ceiling varies by the number of games played in a season. BoxScores for the lockout-shortened season of 1998-99 are much lower than those of other contemporary seasons, because all teams won fewer games than they normally would have. Adjustments can easily be made, however, by finding per-game or per-minute BoxScores rates, and making comparisons at that level. This helps, too, in determining the impact of an injured player, given that he has played fewer games. However, the initial impetus for constructing BoxScores was to estimate player value in terms of wins, and this is best done on a season-cumulative scale.
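A rate adjustment of this kind might look like the following sketch. The per-game rate follows the text directly. Prorating to an 82-game schedule is an additional assumption of mine rather than something described above, and the player totals are invented.

```python
def per_game_rate(box_score, games_played):
    """BoxScores earned per game the player appeared in."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    return box_score / games_played


def prorate(box_score, season_games, target_games=82):
    """Scale a season total to a target schedule length (an assumed adjustment)."""
    if season_games <= 0:
        raise ValueError("season_games must be positive")
    return box_score * target_games / season_games


# Hypothetical player: 10.0 BoxScores while playing every game of a
# 50-game season (the length of the shortened 1998-99 schedule).
rate = per_game_rate(10.0, games_played=50)
full_season_equivalent = prorate(10.0, season_games=50)
print(round(rate, 3), round(full_season_equivalent, 1))
```

Rates like these make short and long seasons comparable, at the cost of no longer being denominated in actual wins.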

One thing done relatively poorly by BoxScores in its current iteration is measurement of the value of players traded during the season. To do this completely accurately, it would be useful to isolate only the games the player appeared in for each of his several teams, looking at individual statistics and team wins within those sub-season units. However, this sort of analysis requires data not generally available in convenient form, and truly, the logical extension of this idea is fairly well captured by the plus/minus statistic. As it stands, BoxScores still does a relatively good job (subjectively assessed) in measuring traded players’ value, but it is something worth noting.

BoxScores in application

Often understanding is best achieved through application, and so I present

The Top 1,000 BoxScore Seasons

covering the NBA, ABA, and BAA from 1946-2008. Keep in mind the above caveats about data availability, especially for seasons prior to 1951-52. In a similar vein, here is a list of

The Top 100 BoxScores Careers

again, this is cumulative across the entirety of each player’s career, and so players with longevity are advantaged. I have included games played in this listing, to allow the reader to make his or her own adjustments.

Finally, every player, every team played for, 2007-08 season.

Geometric representation

One of the more useful ways to conceptualize BoxScores is as a player’s percentage of team valuable contributions multiplied by team success. This has a particularly interesting expression in geometric terms, where BoxScores can be thought of as the area of the rectangle whose sides are valpct (the player’s share of team val) and team wins. The following series of visualizations depicts BoxScores as a geometric comparison of player value. The color scheme is based on playing style–more detail on this classification may be found here.
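The rectangle picture can be illustrated with a small sketch. The two players and their numbers are invented, and were chosen so that the rectangles have different shapes but the same area.

```python
def rectangle(valpct, team_wins):
    """Return (width, height, area): share of team val, team wins, and their product."""
    if not 0.0 <= valpct <= 1.0:
        raise ValueError("valpct must be between 0 and 1")
    return valpct, team_wins, valpct * team_wins


# Hypothetical players, invented for illustration.
star_on_weaker_team = rectangle(0.25, 36)     # wide, short rectangle
role_player_on_great_team = rectangle(0.125, 72)  # narrow, tall rectangle

print(star_on_weaker_team[2], role_player_on_great_team[2])  # 9.0 9.0
```

Two very different kinds of season can therefore produce the same BoxScores total, and the shape of the rectangle shows how each player arrived at it.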

2007-08 NBA: Chris Paul edges out Kobe Bryant as most valuable player according to BoxScores, Kevin Garnett and Paul Pierce turn in stellar seasons for the Celtics, and LeBron James carries a huge load for his team, and is rewarded in terms of BoxScores, if not in post-season success.

1986-87 NBA: In a season featuring more all-time greats than perhaps any other (as noted here), we see Larry Bird and Magic Johnson at the height of their rivalry, Michael Jordan and Hakeem Olajuwon coming into their own, and too many other star players to even mention.

1971-72 NBA & ABA (combined): Classic Lakers and Celtics teams, a young Dr. J, Kareem’s greatest year, an almost-as-great year from Artis Gilmore, and countless other NBA past greats.

Sacramento Kings Franchise History: This storied franchise didn’t quite make the playoffs in a very competitive 2007-08 Western Conference, but its history is littered with greats such as Oscar Robertson and Chris Webber.

14 responses to “BoxScores”

  1. Pingback: Hardaway : Mourning : Brown :: Wade : O’Neal : Haslem « The Arbitrarian

  2. Pingback: The Road to the NBA Finals « The Arbitrarian

  3. Nice read, I was curious if you think pace needs to be in boxscores? It seems that players on high pace teams (relative to a lower pace team with the same amount of wins) will look better. Also, I’m not sure what you did with assists? My thought is that adding assists in such linear weight system double counts. If the passer gets x% credit for the assisted basket then the actual scorer should be docked the same amount. For sake of discussion: just view everything conceptually as points. So if Nash finds Bell for a 3, Nash should get 1 pt and Bell the other 2. In your system there are 4.4 pts to be spread around (3 + 1.3 for the AST).

  4. rapidadverbssuck

    Christopher: Pace is accounted for in BoxScores, because they are calculated as players’ percentage of team total contributions–thus a player doing 20% of the things on a slow team with 50 wins will be valued the same as a player doing 20% of the things on a fast team. With the assists, I did a little bit of trickery–in essence, I’m using assists to represent more than just the pass to the scorer–they’re representative, in some way, of controlling the ball for the team, and not turning it over, and getting it to a scorer–so it’s more than points… I know that’s a little nebulous, but this formula isn’t designed to model points, exactly, but rather “valuable contributions,” which will lead to points and defense and thus to wins.

  5. Pingback: Credit where credit is due « The Arbitrarian

  6. Pingback: Improving Brand’s Image « The Arbitrarian

  7. Another thought, and this ties into my pet peeve about WOW: What about strength of competition? I firmly believe that dropping 30-10-8 against Seattle should be worth much less than doing the same against, say, Detroit.

    Also, in your base regression did you use PTS on the rhs or just assume the coeff. is unity?

  8. Christopher: Strength of competition would be interesting to investigate, but in terms of productivity, strictly speaking, the opponent ought not matter. A win is a win is a win, regardless of the opponent. If you want to attempt to measure talent, or something else, it might be worth controlling for strength of competition, and it’s certainly an interesting idea.

    In the regression, I don’t have points on the right hand side, because points is on the lhs. And then, since I’m regressing on point differential: team points – opponent points, at a single-game level, I do go ahead and assume that every point scored increases point differential by one.

  9. Thanks. have you seen:

    http://www.draftexpress.com/article/2008-Win-Scores-NBA-Draft-Preview-2932

    That’s one of many reasons I think SOS matters. I would just love to be able to filter PER, WS, BoxScores by, say, quartiles of a quality distribution. Not to mention splits in the usual sense.

    Thanks for your patience for my questions: 2 more too 🙂

    How did you generate your graphs? Was that done in R too?

    And where did you get the data? I know it’s out there somewhere but is there some site where you can just download a zip file of box scores since the merger, for example?

  10. Pingback: Mr. Consistency « The Arbitrarian

  11. Christopher: Excellent article at draftexpress. I wish I had such data for college players. I agree, strength-of-schedule is crucial, although I think it’s less significant in the NBA than in the NCAA, where variance in team and player quality is greater. I might take a look at player performance versus opponent quality sometime when I get the time.

    I make all the graphics in R, mostly with custom code. If you’d like to see the code for something, just let me know, and I’ll (probably) post it.

    As for the data… there is unfortunately no such zip file, to my knowledge, and if you ask, all you’ll get is “it took me X hours and Y dollars to put this data together!!!”. All I can tell you, sadly, is ctrl-C, ctrl-V….

  12. Pingback: Predicting the future, by analogy « The Arbitrarian

  13. Pingback: NBA playing style spectrum « The Arbitrarian

  14. Pingback: The best of the WNBA, updated daily « The Arbitrarian
