Category Archives: nba

Assigning credit to Team USA

Posted on July 28, 2008 | 6 comments

During the NBA Finals, I made an effort to estimate player contributions to the final score using Model-Estimated Value (MEV) and a metric I unimaginatively called “Credit.” MEV is a linear-weighting player productivity measure (read about it here), and Credit (which I’ve modified somewhat since the Finals coverage) attempts to divide credit (or blame) for a team’s success among individual players:

Player Credit = Player MEV / Team Total MEV * (Team Points / (Team Points + Opponent Points))
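A minimal sketch of the Credit formula in Python. It assumes each team's per-game MEV values are already computed. The `credit` function and the player names and numbers in the usage lines are hypothetical.

```python
from typing import Dict


def credit(team_mev: Dict[str, float], team_pts: int, opp_pts: int) -> Dict[str, float]:
    """Player Credit = Player MEV / Team Total MEV * (Team Points / (Team Points + Opponent Points))."""
    total_mev = sum(team_mev.values())
    team_share = team_pts / (team_pts + opp_pts)  # this team's share of all points scored
    # Each player's slice of the team share is proportional to his share of team MEV.
    return {player: mev / total_mev * team_share for player, mev in team_mev.items()}


# Hypothetical 100-90 game: the winners split 100/190 of the Credit, the losers 90/190.
winners = credit({"Player A": 20.0, "Player B": 10.0, "Player C": 10.0}, 100, 90)
losers = credit({"Player D": 15.0, "Player E": 15.0}, 90, 100)
print(sum(winners.values()) + sum(losers.values()))  # total Credit for the game
```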

This way, total Credit for all players on both teams sums to one in every game. Players on teams that win by a lot have more Credit to divide amongst themselves, whereas in tight games each team has closer to 0.50 Credits to attribute.

In the spirit of the upcoming Olympic Games, I hope to return to semi-regular coverage, though not necessarily in the form of actual posts. Rather, after each game leading up to and played in Beijing, I will endeavor to update MEV and Credit statistics for each member of Team USA. Each game’s statistics, as well as cumulative stats, can be found here:

http://bit.ly/teamusa

I hope you find it useful and insightful over the next few weeks.

Posted in basketball, infovis, nba, statistics

Change of venue

Posted on July 17, 2008 | 2 comments

Folks, starting today, I’ll be writing a weekly post on Thursdays over at Hardwood Paroxysm. I’m likely to continue posting here, although the frequency might drop a little. If you would like to keep subscribing to The Arbitrarian’s RSS feed but would also like to read my posts at HP, I’ve put together a joint feed in Yahoo! Pipes that does just that. Thanks for all of your support, readership, and insight; having readers makes blogging fun.

Here’s the feed: Arbitrarian Everywhere

Posted in basketball, nba

Estimating team chemistry

Posted on July 14, 2008 | 2 comments

I posted recently to introduce a new method of characterizing basketball playing styles, which I call the SPI Style Trichotomy. Among other advantages, this methodology is an objective, performance-based means of characterizing player type that offers substantially more nuance and accuracy than the traditional position labels.

Well, today I’m going to take a step back from this seamless, continuous-spectrum perspective and impose some structure so that I can investigate the value of each playing style. Since the SPI characterizations themselves are independent of productivity and value, it may be interesting to see how much employing a player of a given style adds to team success.

My first step was to identify, for each player-season, which of seven arbitrary playing-style categories the player most closely matches. A quick look at the SPI Spectrum Graphic shows that I’ve already “named” six spokes: each of the pure SPI styles, plus their opposites. For this post (and possibly beyond), I will refer to these six spoke-categories, counter-clockwise from the 3 o’clock position, as Pure Scorer, Perimeter Scorer, Pure Perimeter, Scorer’s Opposite (though catchier, “Defender” is too bold, and inaccurate), Pure Interior, and Interior Scorer. I could have made any number of categories here, and one of the strengths of the SPI system is the absence of such arbitrary distinctions. Nevertheless, I’ve categorized players in order to run a regression. Each player’s SPI numbers were used to identify the spoke to which he is closest, and that is the category into which he is lumped for that season. To these six I added a seventh, “Mixed,” for players who were closer to the center of the diagram than to any of the six style archetypes. To give an idea of the results of the sorting, here is a table presenting the top 50 players for each archetype:

Exemplars of each SPI7 Style

The ranking was derived by summing each player’s BoxScores over the seasons during which he was classified under a given archetype. This isn’t necessarily a “best-ever” list, just a list of familiar players and their categorizations.
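One way to sketch the nearest-spoke sorting, under an assumption the post does not spell out. Each player-season is treated as an (x, y) point on the SPI Spectrum Graphic. The six archetypes sit at unit radius, 60 degrees apart, counter-clockwise from 3 o’clock. The post does not give the mapping from SPI numbers to that plane, so the coordinates and the `classify` helper below are hypothetical.

```python
import math

# Hypothetical layout: six archetype spokes at unit radius, counter-clockwise
# from the 3 o'clock position in 60-degree steps (standard math orientation).
ARCHETYPES = [
    "Pure Scorer", "Perimeter Scorer", "Pure Perimeter",
    "Scorer's Opposite", "Pure Interior", "Interior Scorer",
]
SPOKES = {
    name: (math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)))
    for k, name in enumerate(ARCHETYPES)
}


def classify(x: float, y: float) -> str:
    """Nearest archetype spoke, or "Mixed" if the center of the diagram is nearer."""
    best_name, best_dist = "Mixed", math.hypot(x, y)  # distance to the center
    for name, (sx, sy) in SPOKES.items():
        dist = math.hypot(x - sx, y - sy)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


print(classify(1.0, 0.0), classify(0.1, 0.0), classify(-0.5, 0.85))
```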

Following this categorization process, I calculated, for each team-season, the sum of minutes played by players fitting into each of the seven categories. For example, the 07-08 Blazers got 8,307 minutes of playing time from Interior Scorers. These came mostly from Aldridge, Outlaw and Webster, with contributions from James Jones and Von Wafer. The team sums are pretty interesting in and of themselves, so I added that table as a second sheet to the Google Doc linked above: SPI7 Team Sums.
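The team-season sums are a straightforward group-and-add. Below is a sketch in Python, assuming a hypothetical flat record of (team, season, archetype, minutes) per player-season. The team labels and minute totals in the example are made up.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Tuple

# Hypothetical record layout: (team, season, archetype, minutes) per player-season.
Record = Tuple[str, str, str, int]


def team_minute_sums(records: Iterable[Record]) -> DefaultDict[Tuple[str, str], DefaultDict[str, int]]:
    """Total minutes played by each archetype for every team-season."""
    sums: DefaultDict[Tuple[str, str], DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    for team, season, archetype, minutes in records:
        sums[(team, season)][archetype] += minutes
    return sums


records = [
    ("Team X", "07-08", "Interior Scorer", 2000),
    ("Team X", "07-08", "Interior Scorer", 1500),
    ("Team X", "07-08", "Mixed", 800),
    ("Team Y", "07-08", "Pure Scorer", 2400),
]
sums = team_minute_sums(records)
print(dict(sums[("Team X", "07-08")]))  # {'Interior Scorer': 3500, 'Mixed': 800}
```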

From here, I ran a very basic regression analysis, hoping to identify the (relative) value of a minute played by each archetype. I regressed team win totals on the team minute sums (from the 52-53 season onward, except 1999). This is a very simplistic analysis, but it yielded interesting results. In the variable names, SS is Pure Scorer, SP is Perimeter Scorer, and so on:

Residuals:
Min	1Q	Median	3Q	Max
-31.4014	-8.4507	0.7324	8.6521	33.1204

Coefficients:
Variable	Estimate	Std. Error	t value	Pr(>|t|)	Signif.
SSmin	0.0013482	0.0002170	6.213	7.2e-10	***
SPmin	0.0016794	0.0001653	10.161	< 2e-16	***
PPmin	0.0024116	0.0001986	12.142	< 2e-16	***
PImin	0.0027014	0.0002147	12.580	< 2e-16	***
IImin	0.0026275	0.0002129	12.344	< 2e-16	***
ISmin	0.0019478	0.0001317	14.785	< 2e-16	***
MMmin	0.0019719	0.0001493	13.204	< 2e-16	***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.92 on 1170 degrees of freedom (3 observations deleted due to missingness)
Multiple R-Squared: 0.9217, Adjusted R-squared: 0.9212
F-statistic: 1967 on 7 and 1170 DF, p-value: < 2.2e-16
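The output lists seven coefficients and no intercept. Its 1170 residual degrees of freedom equal the 1177 usable team-seasons minus seven parameters, so the fit appears to run through the origin: wins ≈ b_SS·SSmin + … + b_MM·MMmin. One side effect is that R-squared is then measured against zero rather than against mean wins, which tends to push it higher.

Below is a minimal, dependency-free sketch of a no-intercept least-squares fit via the normal equations. `ols_no_intercept` is a hypothetical helper, not the tool that produced the output above.

```python
from typing import List


def ols_no_intercept(X: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients for y ≈ X·b with no intercept column."""
    k = len(X[0])
    # Augmented normal equations [X'X | X'y], one row per coefficient.
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Toy data generated exactly as y = 2*x1 + 3*x2 - 1*x3, so the fit should recover [2, 3, -1].
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 1, 0]]
y = [2, 3, -1, 4, 7]
print(ols_no_intercept(X, y))
```

For real data with minute totals in the thousands, a QR- or SVD-based solver such as `numpy.linalg.lstsq` is numerically safer than forming X'X directly.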

Each coefficient is significant, and we may gain some insight by comparing their magnitudes. The least valuable archetype is the Pure Scorer, who adds 0.0013 wins per additional minute played. (This is an extremely superficial analysis that should be taken with several hundred grains of salt.) The most valuable (surprisingly?) are the Scorer’s Opposite types (Kevin Garnett, Shane Battier, Kirilenko, etc.), who add roughly double that, about 0.0027 wins per minute played. The rest you can read off the regression output.

As a biased observer with my own subjective preferences, I like these results a lot. One-dimensional scorers, adored by casual fans but disdained by me, are identified as less valuable than the glue guys and lockdown defenders who focus on things other than scoring (although, as Garnett and Barkley show, they can score, too). Keep in mind that this output was somewhat hastily produced and only somewhat less hastily thought through, but the results are certainly interesting.
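To put the per-minute coefficients on a more familiar scale, multiply them by a season’s workload. The coefficients below are copied from the regression output above. The 2,500-minute workload is a hypothetical round number, and the result is a naive linear extrapolation that ignores the standard errors.

```python
# Per-minute win coefficients from the seven-archetype regression output.
COEF = {
    "Pure Scorer (SS)": 0.0013482,
    "Perimeter Scorer (SP)": 0.0016794,
    "Pure Perimeter (PP)": 0.0024116,
    "Scorer's Opposite (PI)": 0.0027014,
    "Pure Interior (II)": 0.0026275,
    "Interior Scorer (IS)": 0.0019478,
    "Mixed (MM)": 0.0019719,
}

MINUTES = 2500  # hypothetical full-time season workload


def wins_added(archetype: str, minutes: float = MINUTES) -> float:
    """Fitted wins attributed to `minutes` played by one archetype (no intercept, no error bars)."""
    return COEF[archetype] * minutes


for name in sorted(COEF, key=COEF.get):
    print(f"{name:24s} {wins_added(name):5.2f} wins")
```

On this scale a Pure Scorer comes to roughly 3.4 wins and a Scorer’s Opposite to roughly 6.8, which is the “roughly double” gap.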

Another question about these playing types concerns which combinations of types are most effective. Such an investigation might prove useful from a team-building standpoint, when considering the draft, trades, or free-agent acquisitions. Using the same data as above, I ran another regression, this time using only the interactions of each team’s minutes-by-type sums. Instead of seven independent variables, there are now 21, one for each pair of archetypes: Pure Perimeter/Interior Scorer, Scorer’s Opposite/Pure Scorer, etc. Each interaction term is the product of the two categories’ minute totals, and that product is the value included in the regression. The output is as follows:

Coefficients:
Variable	Estimate	Std. Error	t value	Pr(>|t|)	Signif.
SSmin:SPmin	-1.956e-07	9.312e-08	-2.100	0.035909	*
SSmin:PPmin	-5.331e-08	1.227e-07	-0.434	0.664067	
SSmin:PImin	5.307e-07	1.203e-07	4.413	1.11e-05	***
SSmin:IImin	5.360e-07	7.759e-08	6.908	8.12e-12	***
SSmin:ISmin	3.934e-07	5.915e-08	6.652	4.46e-11	***
SSmin:MMmin	1.263e-07	1.066e-07	1.184	0.236560	
SPmin:PPmin	-1.104e-08	9.102e-08	-0.121	0.903440	
SPmin:PImin	5.573e-07	5.492e-08	10.146	< 2e-16	***
SPmin:IImin	4.618e-07	4.015e-08	11.499	< 2e-16	***
SPmin:ISmin	3.647e-07	3.967e-08	9.195	< 2e-16	***
SPmin:MMmin	2.264e-07	6.264e-08	3.614	0.000314	***
PPmin:PImin	6.224e-07	1.213e-07	5.132	3.37e-07	***
PPmin:IImin	4.720e-07	9.427e-08	5.007	6.39e-07	***
PPmin:ISmin	2.489e-07	8.016e-08	3.105	0.001948	**
PPmin:MMmin	4.288e-07	6.213e-08	6.902	8.43e-12	***
PImin:IImin	-3.740e-07	1.114e-07	-3.358	0.000811	***
PImin:ISmin	1.617e-08	9.902e-08	0.163	0.870291	
PImin:MMmin	1.937e-07	9.729e-08	1.991	0.046721	*
IImin:ISmin	1.150e-07	7.674e-08	1.498	0.134291	
IImin:MMmin	2.654e-07	8.736e-08	3.038	0.002432	**
ISmin:MMmin	2.622e-07	6.672e-08	3.930	9.00e-05	***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.15 on 1156 degrees of freedom (3 observations deleted due to missingness)
Multiple R-Squared: 0.9196, Adjusted R-squared: 0.9182
F-statistic: 630 on 21 and 1156 DF, p-value: < 2.2e-16
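Building the 21 interaction terms amounts to taking every pair of the seven minute totals (7 choose 2 = 21) and multiplying them. Each product is minutes times minutes, on the order of millions, so the fitted coefficients land around 1e-7. That is part of why they are hard to read directly. Below is a sketch in Python; `interaction_features` and the minute totals in the example are hypothetical.

```python
from itertools import combinations
from typing import Dict

# Archetype codes in the same order as the regression output.
CATEGORIES = ["SS", "SP", "PP", "PI", "II", "IS", "MM"]


def interaction_features(minutes: Dict[str, float]) -> Dict[str, float]:
    """Pairwise products of one team-season's minutes-by-archetype sums (no main effects)."""
    return {
        f"{a}min:{b}min": minutes[a] * minutes[b]
        for a, b in combinations(CATEGORIES, 2)
    }


# Hypothetical team-season minute totals.
example = {"SS": 1000, "SP": 3000, "PP": 2500, "PI": 2000, "II": 3500, "IS": 4000, "MM": 3700}
features = interaction_features(example)
print(len(features))            # 21 interaction terms
print(features["PPmin:PImin"])  # 2500 * 2000
```

`itertools.combinations` emits the pairs in the same order as the regression output, from SSmin:SPmin through ISmin:MMmin.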

Note: if you thought the regression above was methodologically shaky, this one is even worse! Nevertheless, it’s interesting to look at. Here the coefficients are much harder to interpret, so I would focus mainly on whether they are significant (indicated by *s) and on their sign. PP/PI combinations appear especially fruitful, while PI/II looks like a deadly combination… Anyway, that’s more than enough for one post, but please feel free to add your own insights, and especially your criticisms!

Posted in basketball, nba, statistics

Mr. Consistency

Posted on June 24, 2008 | 3 comments

Who are the most consistent scorers in the NBA? The question matters to those who participate in fantasy leagues, since consistency might be a virtue when valuing a player on your roster. For various reasons, a player might be worth more to you if he scores 20 points every game rather than alternating between 10 and 30. Further, some measure of consistency may highlight a player’s ability to impose his will on a game. A player able to get his scoring in regardless of the opposition could be said to be more of a game-defining player.

I’ve estimated, for players since the 86-87 season, each individual’s mean points per 48 minutes and the standard deviation of that statistic. From these I derived the coefficient of variation (sd/mean) and a 95% confidence interval. Here’s a spreadsheet of the top 634 players in the league by mean pts/48, sorted by coefficient of variation. The players at the top could thus be said, in some sense, to be more consistent scorers than those at the bottom.

Most consistent scorers, 1986-2008
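The consistency measures are simple to compute from a player’s game-by-game pts/48. Below is a sketch using Python’s standard `statistics` module. It assumes pts/48 is computed per game as points / minutes × 48 and that the sample (n − 1) standard deviation is used; the post does not say which variant it used. The helper names are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def pts_per48(games: List[Tuple[int, float]]) -> List[float]:
    """Convert (points, minutes) pairs to per-game points per 48, skipping games with no minutes."""
    return [pts / mins * 48 for pts, mins in games if mins > 0]


def consistency(values: List[float]) -> Dict[str, float]:
    """Mean, sample standard deviation, and coefficient of variation (sd / mean)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # requires at least two games
    return {"mean": mean, "sd": sd, "cv": sd / mean}


# Two hypothetical players: a steady 20 per game vs. alternating 10 and 30.
steady = consistency([20.0, 20.0, 20.0, 20.0])
streaky = consistency([10.0, 30.0, 10.0, 30.0])
print(steady["cv"], round(streaky["cv"], 3))
```

Both players average 20, but only the streaky one has a nonzero coefficient of variation, so he would sort lower on the list.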

Below is another way to view the same question. Using each player’s mean and standard deviation of pts/48, along with the sample size, we can construct a 95% confidence interval for our estimate of his true mean. In the graphic linked below, players are ranked by mean pts/48, and the x-axis shows that mean. Each mean is surrounded by a line indicating its 95% confidence interval. Loosely speaking, we can be 95% confident that the player’s true mean lies within the span of his colored line. For players with smaller samples or greater variance, the error bars will be wider.

NBA Pts/48 min means with error bars
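A sketch of the error bars, assuming a normal-approximation interval (mean ± 1.96 × sd / √n). With small samples, a t-based critical value would give somewhat wider bars, and the post does not say which was used. `ci95` is a hypothetical helper.

```python
import math
import statistics
from typing import List, Optional, Tuple


def ci95(values: List[float]) -> Optional[Tuple[float, float]]:
    """Approximate 95% confidence interval for the mean of per-game values."""
    n = len(values)
    if n < 2:
        return None  # one observation: no spread to estimate, so no error bar
    mean = statistics.mean(values)
    half_width = 1.96 * statistics.stdev(values) / math.sqrt(n)
    return (mean - half_width, mean + half_width)


print(ci95([22.0]))                 # None, like the players drawn without bars
print(ci95([0.0, 0.0, 0.0, 40.0]))  # the lower bound falls below zero
```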

As you can see, some players have no error bars at all, which means they have only one observation. Other players’ error bars extend below zero. For them, the range we can be 95% confident contains their mean pts/48 includes zero, which doesn’t tell us very much. Anyway, here is the same graphic for the 2007-08 season only:

Note that Carl Landry (#73) has a greater variance than most players around him, but he ranks as a better per-48 scorer than Shaquille O’Neal.

Finally, here’s a 2007-08 regular-season graphic for players’ MEV (model-estimated value, using regression-derived weights like those seen here). Landry does even better here (18th) in terms of his mean, but his confidence interval is very large. The estimate suggests, though, that at worst he’s about as good as Odom, Andre Miller, and Kirilenko, while at best he is in rarefied air. Keep in mind that this is still just a 95% confidence interval: roughly 1 time in 20, an interval built this way will miss the true mean altogether. All of this should be taken with a grain of salt.

One of the things I like most about this presentation is that it’s a per-minute stat. That controls for playing time (although not pace), while the error bars still remind us that estimates for players with little playing time might not mean much of anything. Josh McRoberts, for example, is probably not the 406th, much less the 6th, most valuable player in the NBA, even though his simple arithmetic mean indicates as much. His confidence interval reminds us of this while maintaining the simple ordering.

I suppose this is also the public debut of any sort of official MEV ordering for 2007-08. I’d be interested to hear what people think of it. It is similar to Berri’s estimates, but I think the weightings are a little more appropriate. Let me know in the comments whether, at least on a per-minute basis, these seem to be reasonable estimates and orderings of player value.

Posted in analysis, basketball, graphics, infovis, metrics, nba, sports, statistics, Uncategorized

Anything’s possible.

Posted on June 18, 2008 | 4 comments

Huge game. I’d like to point out that, despite seemingly everyone picking the Lakers to win this series, I had the Celtics winning the championship (not that this was a particularly bold pick).

At the end of the series, this is how each player’s cumulative performance looks:

Player	PTS	MEV	PVC	PtC	Credit	GoB
Ray Allen	122	116.58	0.194	125.90	1.313	3.00
Kobe Bryant	154	109.97	0.195	122.87	1.291	1.90
Kevin Garnett	109	111.13	0.185	119.34	1.206	2.22
Paul Pierce	131	108.78	0.181	119.60	1.184	2.17
Pau Gasol	88	95.94	0.170	105.69	1.069	2.61
Lamar Odom	81	80.58	0.143	90.23	0.912	2.27
Rajon Rondo	56	77.40	0.129	79.18	0.788	2.39
Derek Fisher	65	60.57	0.108	67.49	0.690	2.33
James Posey	52	57.54	0.096	60.86	0.621	3.30
Jordan Farmar	42	37.59	0.067	43.97	0.453	2.47
Vladimir Radmanovic	44	41.23	0.073	44.66	0.449	2.14
Sasha Vujacic	50	29.68	0.053	36.41	0.399	1.75
Leon Powe	37	31.97	0.053	31.97	0.306	2.74
Kendrick Perkins	20	24.68	0.041	26.69	0.299	2.20
Eddie House	32	27.21	0.045	29.00	0.295	2.20
P.J. Brown	24	26.02	0.043	26.34	0.250	2.16
Trevor Ariza	13	14.86	0.026	16.69	0.185	2.86
Luke Walton	15	10.49	0.019	14.48	0.137	1.55
Sam Cassell	19	10.35	0.017	12.57	0.130	1.54
Ronny Turiaf	11	4.85	0.009	5.75	0.054	1.44
sum	1165	1077.41	1.846	1179.68	12.030	2.26

So, I have Ray Allen as MVP. I’m willing to concede that, due to the lack of defensive box-score statistics, I may underestimate defensive contributions to some extent. As such, Pierce’s relative lockdown of Bryant in several games might push him to the top, at least statistically. Subjectively, though, I like Pierce as MVP anyway. It’s also worth mentioning that, for the series, Kobe Bryant scored 154 points and missed 78 field goals. That is 27.3% of his team’s points, but 31.2% of his team’s missed field goals. Bryant is a very good, sometimes dominant player, but he takes an awful lot of shots, and missed shots hurt your team.

I have to say, though, that I am impressed by the degree to which the Celtics dominated the Lakers throughout the series, especially in the clinching game. Everyone stepped up. According to MEV (model-estimated value), Garnett had the best individual game of the series, with 35. Here is a spreadsheet listing each individual game performance, sorted by Credit (that is, players’ contributions to their teams’ relative success):

http://spreadsheets.google.com/ccc?key=pjtolzxemBV74inofaEcgQQ&hl=en

I hope you’ve found the Arbitrarian’s Finals coverage interesting and valid… can’t wait ’til next season.

Posted in basketball, nba, sports, statistics