NBA playing style spectrum

Many conversations about sports revolve around comparisons of quality — team A is better than team B, player X is the best of all time, this draftee will help his team more than that one, etc. For this type of discussion, many metrics exist, both qualitative and quantitative, one of which is BoxScores, developed here at the Arbitrarian. Other conversations center around similarity–team C plays like team D did in the 1980s, player Y is a taller, faster player Z, etc. The Arbitrarian has spent substantial time investigating this type of comparison as well, using statistical proximity and network diagrams. Yet another characterization, somewhat more general than direct similarity comparisons, is that of type, or style. While playing style has been discussed here, and style markers can be seen everywhere in my work in the form of various colorations, I would like to develop the idea a little more fully, and present a novel graphical visualization of the concept applied to NBA players.

Very rudimentary factor and cluster analysis I performed a long time ago indicated that there are distinctions in the data between players who tend to try to score a lot, those who play a “smaller” game, and those who play like “big men.” In terms of the NBA’s tracked counting statistics, this translates to a differentiation between those who specialize in points and field goal attempts, steals and assists, and rebounds and blocks. I have chosen to call these three tendencies Scorer, Perimeter, and Interior, respectively, and collectively they form the SPI Style Trichotomy.

Calculation

Identifying each player’s style is conceptually simple, but computationally somewhat more complex. Essentially, one sums each player’s fga + tr + bk + as + st (field goal attempts, total rebounds, blocks, assists, and steals), and determines what percentage of the total each SPI factor constitutes:

• Scorer percentage = fga / (fga + tr + bk + as + st)
• Perimeter percentage = (as + st) / (fga + tr + bk + as + st)
• Interior percentage = (tr + bk) / (fga + tr + bk + as + st)
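The three percentages above can be sketched in a few lines of Python. This is only an illustration: the function name and the season totals are made up, and `as_` stands in for assists because `as` is a reserved word in Python.

```python
def spi_percentages(fga, tr, bk, as_, st):
    """Return (scorer, perimeter, interior) shares of a player's counted events."""
    total = fga + tr + bk + as_ + st
    if total == 0:
        raise ValueError("player has no recorded events")
    scorer = fga / total            # field goal attempts
    perimeter = (as_ + st) / total  # assists + steals
    interior = (tr + bk) / total    # total rebounds + blocks
    return scorer, perimeter, interior


# Made-up season totals, for illustration only.
s, p, i = spi_percentages(fga=1000, tr=500, bk=100, as_=300, st=100)
print(s, p, i)
```

By construction the three shares always sum to 1, so any two of them determine the third.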

These numbers are interesting on their own, but for the calculation of an index of style, they require further manipulation. In the league as a whole, the Scorer percentage is around 50%, the Perimeter percentage around 20%, and Interior 30%. Thus, if using these percentages, the vast majority of players would appear to be very scoring-centered. My concern here, in constructing a useful index, is to identify player propensities relative to other players, and for that, I calculate the percentile of each player’s percentages.

• Scorer index = percentile(Scorer percentage)
• Perimeter index = percentile(Perimeter percentage)
• Interior index = percentile(Interior percentage)

Thus, even though the maximum Scorer percentage in a season might be close to 75% while the maximum Perimeter percentage is closer to 25%, the players with the highest percentages in the sample under consideration will be assigned an index value of 1. Players with median values on a percentage will have an index value of 0.5, and so on. The percentilization normalizes across style tendencies and player subpopulations, and has the added virtue of scaling from 0 to 1.
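Here is a minimal sketch of the percentile step. The exact percentile definition is not spelled out above, so this version is an assumption: it scores each player by the fraction of other players in the sample with a lower percentage, counting ties as half. That choice gives the lowest value 0, the median 0.5, and the highest value 1, as described.

```python
def percentile_index(values):
    """Map each value to a 0-1 percentile rank within the sample.

    ASSUMPTION: rank = (players below + half of the ties) / (n - 1).
    Other percentile definitions would give slightly different numbers.
    """
    n = len(values)
    if n == 1:
        return [0.5]  # a lone player is trivially the median
    ranks = []
    for v in values:
        below = sum(1 for other in values if other < v)
        tied = sum(1 for other in values if other == v) - 1  # exclude self
        ranks.append((below + 0.5 * tied) / (n - 1))
    return ranks


# Hypothetical Scorer percentages for three players.
print(percentile_index([0.30, 0.50, 0.75]))
```

The same function would be applied separately to the Scorer, Perimeter, and Interior percentages of the sample.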

Interpretation

Thus we have a set of three numbers for each player which can be used to characterize his playing style. The numbers easily translate to more qualitative descriptions. A player with an SPI triplet of (0.8, 0.2, 0.7) is an interior scorer, without much perimeter production. A player with the triplet (0.1, 0.7, 0.75) is anything but a scorer, sometimes called a “glue” guy. Someone at (0.5, 0.5, 0.5) produces the league median of each type, which is different from a player whose percentages are 33%, 33% and 33%. Such a player would have a relatively lower Scorer index, for example.

Since each individual is characterized by three variables, their SPI type can be plotted in three dimensions. Unfortunately, three dimensions are difficult to convey on a computer screen, so here is a plot which depicts Perimeter indices along the X-axis, Interior indices on the vertical axis, and Scoring indices as the size of the point.


Historical application note: Since steals and blocks have not been kept for the entirety of the history of professional basketball, players from earlier eras may have slightly skewed SPI values. While percentages and indices can still be calculated based only on fga, tr, and as, it is not difficult to see that leaving out blocks and steals, in comparison to eras in which those defensive statistics are included, will tend to skew players from an earlier era more toward the Scoring type. Unfortunately, without substantial era-specific correction, this effect is unavoidable. However, the sorting still manages to work well, especially if this detail is kept in mind when making certain cross-temporal comparisons.

Presentation

One of the advantages of using three sub-indices to construct the overall SPI Trichotomy is the convenient translation of index values to color. The three primary colors of light are Red, Green and Blue, and when combined in certain proportions, it is possible to generate infinite gradations of color (see Wikipedia). This means that each SPI triplet for each player can be represented as a single color. This aids understanding and comparison, as it is much easier to keep in mind that a certain player is a deep red than that his SPI triplet is (0.9, 0.1, 0.2), or that a player is a medium grey than that his triplet is (0.45, 0.53, 0.55). Further, a greenish-blue player is easily paired with another greenish-blue player, without having to specifically compare each of the players’ three index values. The human eye is capable of extremely high-resolution discernment, and using a single color to represent three numerical values takes advantage of this.
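As a rough sketch of the translation, the function below turns an SPI triplet into a standard hex color. It assumes Scorer maps to red, Perimeter to green, and Interior to blue, which is the assignment the example triplets imply; the function name is made up for illustration.

```python
def spi_to_hex(scorer, perimeter, interior):
    """Convert three 0-1 SPI indices to an '#rrggbb' color string.

    ASSUMPTION: Scorer -> red, Perimeter -> green, Interior -> blue.
    """
    channels = []
    for index in (scorer, perimeter, interior):
        if not 0.0 <= index <= 1.0:
            raise ValueError("SPI indices must lie between 0 and 1")
        channels.append(round(index * 255))  # scale to an 8-bit channel
    return "#{:02x}{:02x}{:02x}".format(*channels)


print(spi_to_hex(1.0, 0.0, 0.0))  # a pure scorer
print(spi_to_hex(0.0, 1.0, 1.0))  # the scorer's opposite
```

A pure scorer comes out fully red, and a player high on both Perimeter and Interior but low on Scorer comes out cyan.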

Here is the above plot, with color added according to RGB values derived from each player’s SPI indices. As you can see, “blueness” increases from bottom to top, “greenness” from left to right, and “redness” varies with the size of the point. The top-right corner is aqua or cyan, while the bottom left is mostly reddish, due to an absence of green and blue.


Unfortunately, this presentational format leaves a lot to be desired. Since each player can be represented by just one color, can we do better than a pseudo-3-dimensional plot? The answer is yes and no: No, because to ensure that the hue, saturation, and value of each color are captured, we still require three variables (see Wikipedia); yes, because most of what we are interested in here is hue–the underlying color for each player, red, yellow, green, aquamarine, vivid tangerine, indigo, etc. The other two components of HSV color space, saturation and value, allow us to see how “pure” the hue is, which in our basketball application, translates to how “pure” an individual’s playing style is.

The advantage of a conversion from RGB to HSV is that, by combining the S and V, we can represent the entire playing style spectrum in a format resembling a color wheel. This is the most straightforward and useful format for presentation, and this graphic is the big payoff:

Click to view in a Google Maps format. Also available at the easy-to-remember http://bit.ly/spi

As you can see, each of these NBA greats is aligned at a certain angle and distance from the center (I used polar coordinates as a basis for this plot), and this allows us to identify relatively similar players, players’ “opposites,” clusters, and other interesting observations. In this graphic, players with greater MEV (Model Estimated Value) are larger, and this allows comparisons along radii, as in X plays in a similar manner to Y, but is more valuable/productive.
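For readers who want to experiment, here is a hedged sketch of how an SPI triplet could be placed on such a wheel using Python's standard `colorsys` module. The hue becomes the angle. How S and V were actually combined for the graphic is not stated, so the radius used here (saturation times value) is purely an assumption, as are the function names.

```python
import colorsys
import math


def spi_to_polar(scorer, perimeter, interior):
    """Return (angle in degrees, radius) for an SPI triplet treated as RGB."""
    h, s, v = colorsys.rgb_to_hsv(scorer, perimeter, interior)
    angle_deg = h * 360.0  # colorsys reports hue on a 0-1 scale
    radius = s * v         # ASSUMPTION: one possible way to combine S and V
    return angle_deg, radius


def polar_to_xy(angle_deg, radius):
    """Convert polar coordinates to x/y for plotting."""
    theta = math.radians(angle_deg)
    return radius * math.cos(theta), radius * math.sin(theta)


print(spi_to_polar(0.5, 0.5, 0.5))  # a perfectly balanced player
print(spi_to_polar(1.0, 0.0, 0.0))  # a pure scorer
```

Under this assumption a perfectly balanced (grey) player sits at the center of the wheel, and a fully specialized player sits on the rim.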

Vocabulary

There are several ways our statistical vocabulary can be expanded via the SPI Style Trichotomy. The first is that we can characterize the degree to which a given player is Scoring/Interior/Perimeter-focused by reporting their index value. The second is that we can describe the player’s color: “He’s a deep blue defensive center,” or “Shane Battier is not a chucker! He’s a cyan-colored scorer’s opposite type.” Finally, we can approximate a vector. I suggest the convention of overlaying the hours of a clock over the spectrum diagram (to indicate vector direction), with 12 o’clock at the very top, 1:00 at the interior’s opposite position, 3:00 at the scorer’s position, 4 o’clock bisecting the scorer and perimeter’s opposite position, etc. Distance from the center of the diagram (the length of the vector) indicates the degree to which a player fits exactly into their playing style: those players whose games are more balanced are closer to the center, while players whose games are more specialized or narrow are further from the center. So, one could say, for example, “Dikembe Mutombo is a pure 7 o’clock,” or, “Michael Jordan was between about a two and three for most of his career, but in 88-89, he shifted closer to a pure 11.” And so forth.
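The clock convention can be written down as a small conversion from hue angle to clock hour. The mapping below is inferred from the examples in the text (scorer, i.e. red, at 3:00; the interior's opposite, yellow, at 1:00; a pure interior player at 7 o'clock), which is consistent with hue running counter-clockwise from 3 o'clock at 30 degrees per hour. Treat it as a sketch under that assumption; the function name is hypothetical.

```python
def hue_to_clock(hue_deg):
    """Convert a hue angle in degrees to a clock position (1-12, may be fractional).

    ASSUMPTION: hue 0 (red, pure scorer) sits at 3 o'clock and hue
    increases counter-clockwise, 30 degrees per hour.
    """
    hour = (3.0 - hue_deg / 30.0) % 12.0
    return 12.0 if hour == 0.0 else hour


print(hue_to_clock(0))    # red: scorer
print(hue_to_clock(60))   # yellow: interior's opposite
print(hue_to_clock(240))  # blue: interior
```

Fractional results fall between hours, so a value of 2.5 would read as "between a two and a three."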

Feedback

I would be very interested to hear any and all comments: does the trichotomy make sense? Is it useful? Are the S/P/I typologies a reasonable first division? Is it helpful to have a more continuous, yet still quantifiable, tool to describe player type than just position labels? Do you like the idea of equating, for example, a shoot-first point guard with a yellow-green 12 o’clock player? Does the graphic offer any new insight, confirm your subjective observations, or conflict with your opinions? I will be following this post up with many more using this methodology and this type of display, so I hope you will come back often, or possibly subscribe.

Predicting the future, by analogy

Many times before, I’ve posted network diagrams which I suggest highlight objective similarities between athletes, according only to their statistical production. I’ve also noted that one of the most common discussions, especially around the draft and its aftermath, is that which attempts to identify which current or past professional player is most similar to which draftee. This is done, I believe, to convey some idea of playing style, but also, I think, to convey some idea of an individual’s potential. If a collegiate or recent draft pick gets compared to Michael Jordan instead of Zan Tabak, it means that the comparer thinks the rookie is more of a scoring wing player than a non-scoring center type, and that he has the potential to be a very good player in the NBA, rather than a very good player in Europe.

Thus, I thought it would be useful to do this same sort of comparison, but statistically, rather than subjectively. The main problem I encountered is that one cannot just add a college player’s statistics to a database of pros, match them, and expect the results to be valid. A player who scores 28 ppg in college could turn out to be a prolific scorer in the NBA, but he may also turn out to be Adam Morrison. Even comparisons of two players’ statistics across NCAA teams, I would submit, are shaky, given that college teams are so variable in terms of playing styles and abilities. Nevertheless, that is what I have chosen to do: Compare the collegiate statistical profiles of some of this year’s draftees to those of other recent draftees, and suggest the inference, by analogy, that their professional careers will be similar to those whom their college careers match. I understand that this is fraught with tenuous and weak connections, but given my personal data limitations and relative lack of patience and time, this is what I’ve come up with:

Statistical Proximity of Selected NCAA Basketball Players [pdf]

Incidentally, player vertices are scaled according to their per-game MEV (Model-Estimated Value, similar to the calculation for BoxScores), and colors are according to the Playing Style Trichotomy outlined here. I find it interesting that the algorithm matches Michael Beasley with Kevin Durant, who just had a ROY season. Derrick Rose isn’t directly connected to anyone spectacular, though he is only two degrees of separation from Chris Paul, which is good company. OJ Mayo is tied to Ben Gordon, who is off to a promising start in the NBA, and Rodney Stuckey is most closely matched to Dwyane Wade (perhaps the Pistons used similar methodology in making their pick). Anyway, I’m sure many of you will gain greater insight from the graphic than from my own descriptions, so please fill me in with a comment.

Mr. Consistency

Who are the most consistent scorers in the NBA? This is a question of some interest for those who participate in fantasy leagues, as consistency might be a virtue in determining the value of a player on your roster. For various reasons, a player might be worth more to you if they score 20 points every game, rather than alternate between 10 and 30 every other game. Further, some measure of consistency may highlight a player’s ability to impose their will on a game: a player able to get his scoring in, regardless of the opposition, could be said to be more of a game-defining player.

I’ve managed to estimate, for players since the 86-87 season, each individual’s mean points per 48 minutes, as well as the standard deviation of said statistic, and thus the coefficient of variation (sd/mean) and 95% confidence interval. Here’s a spreadsheet of the top (634) players in the league, by mean pts/48, sorted by coefficient of variation. Thus, the players at top could be said, in some way, to be more consistent scorers than those at the bottom.

Most consistent scorers, 1986-2008

Below is another way to view the same question. Using each player’s mean and standard deviation pts/48, along with the sample size, we can construct a 95% confidence interval for our estimate of their true mean. In the graphic linked below, each player is ranked by their mean pts/48, and the x-axis indicates how they fare under this measure of scoring. Each mean is surrounded by a line indicating the 95% confidence interval. This means, essentially, that we can be 95% sure that the player’s true mean lies within the span of their colored line. For players with smaller samples or greater variance, the error bars will be wider.
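For anyone who wants to reproduce this kind of summary, here is a minimal sketch using only the Python standard library. The exact interval formula behind the graphics is not stated, so the normal-approximation interval (mean plus or minus 1.96 standard errors) is an assumption, and the game logs are invented to mirror the 20-every-night versus 10-and-30 example.

```python
import math
import statistics

Z_95 = 1.96  # ASSUMPTION: normal approximation for a 95% interval


def scoring_consistency(pts_per_48):
    """Summarize a list of per-game pts/48 values."""
    n = len(pts_per_48)
    mean = statistics.mean(pts_per_48)
    if n < 2:
        # One observation: no spread can be estimated, hence no error bars.
        return {"mean": mean, "sd": None, "cv": None, "ci": None}
    sd = statistics.stdev(pts_per_48)         # sample standard deviation
    cv = sd / mean if mean != 0 else None     # coefficient of variation
    half_width = Z_95 * sd / math.sqrt(n)     # 1.96 standard errors
    return {"mean": mean, "sd": sd, "cv": cv,
            "ci": (mean - half_width, mean + half_width)}


steady = scoring_consistency([20, 20, 20, 20])   # invented game log
streaky = scoring_consistency([10, 30, 10, 30])  # invented game log
print(steady["cv"], steady["ci"])
print(streaky["cv"], streaky["ci"])
```

Both invented players average 20, but the steady one has a coefficient of variation of zero and a zero-width interval, while the streaky one gets a wide interval around the same mean.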

NBA Pts/48 min means with error bars

As you can see, some players have no error bars at all–this means that they only have one observation. Others’ error bars go down past zero. This means that we can be 95% sure that their mean pts/48 is in a range that includes zero, which doesn’t tell us very much. Anyway, here is the same graphic, for the 2007-08 season only:

Note that Carl Landry (#73) has a greater variance than most players around him, but he ranks as a better per-48 scorer than Shaquille O’Neal.

Finally, here’s a regular-season 2007-08 graphic for players’ MEV (or model-estimated value, using regression-derived weights like those seen here). Landry does even better here (18th), in terms of his mean, but his confidence interval is very large. This estimate suggests, though, that at worst, he’s about as good as Odom, Andre Miller, and Kirilenko; while at best, he is in rarefied air. Keep in mind that this is still just a 95% confidence interval, so statistically, there’s still a 1 in 20 chance the true mean isn’t even in this interval. All should be taken with a grain of salt. One of the things I like most about this presentation is that it’s a per-minute stat, which controls for playing time (although not pace), but still reminds us that estimates for those players with little playing time should be taken with large grains of salt, and might not really mean much of anything. Josh McRoberts, for example, is probably not the 406th, much less the 6th, most valuable player in the NBA, even though his simple arithmetic mean indicates as much; his confidence interval reminds us of this, while maintaining the simple ordering.

I suppose this is also the public debut of any sort of official MEV ordering for 2007-08. I’d be interested to hear what people thought about this… this is something similar to Berri’s estimates, but I think the weightings are a little more appropriate. Let me know in the comments if they seem, at least, per-minute, to be reasonable estimates and orderings of player value.

Dennis, Eddie, Frank, Gus, Joe, Kevin, Linton and Neil

All Johnsons, all Phoenix Suns players. In fact, some of the greatest Johnsons to ever play the game played some of their best seasons for the Suns. Looking at Winshares, over the history of professional basketball, approximately 2.4 percent of all wins can be attributed to players with the Johnson surname. For the Suns franchise, however, that number jumps to 7.8 percent. A look at the Suns’ Winshare franchise history gives a sense of just how pivotal these Johnsons have been:

Barkley had the all-time most valuable season for a Sun in 1992-93, but it certainly looks like Stoudemire has the potential to take that title away. Amare had a huge rookie year in terms of Winshares, and was duly recognized for the Rookie of the Year award. Since then, he has essentially doubled his win production, and his best years are likely still ahead of him.

Another pattern of interest in this visualization is the recent history of all-star quality point guards. Kevin Johnson, Jason Kidd, Stephon Marbury (when he was a productive player), and Steve Nash all played large roles in their teams’ success. However, it’s equally interesting to note that they played very different types of games. Just looking at their playing type spectrum coloration (see this post for more detail), it is possible to see that KJ and Nash are much purer perimeter players, while Kidd, as evidenced by his slightly bluish tinge, was more of a rebounder, and mustard-colored Marbury shows evidence of a proclivity toward scoring along with his perimeter play, at least more so than the other three.

What other trends do you notice in this history? Is it possible that Nash hasn’t ever been the Suns’ most valuable player, even in his MVP years? Can any of you basketball historians comment on the Westphal, Davis and Nance years?

Note: Since this post was published, the Winshares formula has undergone some revisions of some substantive import. To see the most current iteration and accurate tables and graphs, please see the Winshares page.

Anything’s possible.

Huge game. I’d like to point out that, despite seemingly everyone picking the Lakers to win this series, I had the Celtics winning the championship (not that this was a particularly bold pick):

At the end of the series, this is how each player’s cumulative performance looks:

Player                PTS      MEV    PVC      PtC  Credit   GoB
Ray Allen             122   116.58  0.194   125.90   1.313  3.00
Kobe Bryant           154   109.97  0.195   122.87   1.291  1.90
Kevin Garnett         109   111.13  0.185   119.34   1.206  2.22
Paul Pierce           131   108.78  0.181   119.60   1.184  2.17
Pau Gasol              88    95.94  0.170   105.69   1.069  2.61
Lamar Odom             81    80.58  0.143    90.23   0.912  2.27
Rajon Rondo            56    77.40  0.129    79.18   0.788  2.39
Derek Fisher           65    60.57  0.108    67.49   0.690  2.33
James Posey            52    57.54  0.096    60.86   0.621  3.30
Jordan Farmar          42    37.59  0.067    43.97   0.453  2.47
Vladimir Radmanovic    44    41.23  0.073    44.66   0.449  2.14
Sasha Vujacic          50    29.68  0.053    36.41   0.399  1.75
Leon Powe              37    31.97  0.053    31.97   0.306  2.74
Kendrick Perkins       20    24.68  0.041    26.69   0.299  2.20
Eddie House            32    27.21  0.045    29.00   0.295  2.20
P.J. Brown             24    26.02  0.043    26.34   0.250  2.16
Trevor Ariza           13    14.86  0.026    16.69   0.185  2.86
Luke Walton            15    10.49  0.019    14.48   0.137  1.55
Sam Cassell            19    10.35  0.017    12.57   0.130  1.54
Ronny Turiaf           11     4.85  0.009     5.75   0.054  1.44
sum                  1165  1077.41  1.846  1179.68  12.030  2.26

So, I have Ray Allen as MVP. I’m willing to concede that due to the lack of defensive box score statistics, I may underestimate defensive contributions to some extent, and as such, Pierce’s relative lockdown on Bryant in several games might push him to the top, at least statistically. Subjectively, though, I like Pierce as MVP anyway. It’s also worth mentioning that Kobe Bryant, for the series, scored 154 points and missed 78 field goals. This means he scored 27.3% of his team’s points, and accounted for 31.2% of his team’s misses. Bryant is a very good, sometimes dominant player, but he takes an awful lot of shots, and missed shots hurt your team.

I have to say, though, I am impressed with the degree to which the Celtics dominated the Lakers throughout the series, and especially in the clinching game. Everyone stepped up–according to MEV (model-estimated value) Garnett had the best individual game of the series, with 35. Here is a spreadsheet listing each individual game performance, sorted by Credit–that is, players’ contributions to their teams’ relative success:

I hope you’ve found the Arbitrarian’s Finals coverage interesting and valid… can’t wait ’til next season.

Improving Brand’s Image

If the BoxScores methodology and the Scorer-Perimeter-Interior playing style trichotomy are new to you, you can read about them here and here.

For the latest in my series of BoxScore team histories, this week we turn to the LA Clippers–a storied franchise whose latest chapters have not been the most gripping. First things first, though, check out 38 illustrious seasons of Clippers franchise history:

Clippers Franchise History

The first thing to attend to upon first appraisal is just how productive Elton Brand is. Perhaps it is because he is oft-injured, but it seems as though Brand gets very little attention from the national sports media. Given the mostly mediocre level of talent with which he is surrounded, his productivity is impressive. Though Chris Kaman and Maggette are solid players, Brand has been adding 8-12 wins in his healthier seasons. Especially as Thornton develops into a more well-rounded and more productive player (his very pink color in the graphic indicates that his rookie season focused much more on shooting than anything else), a healthy Elton Brand poises the Clippers for a return to the form of 2005-06. Though you would not know it from the amount of media coverage he receives, Brand is 29th among active players in BoxScores per 82 games, and his 05-06 was the second most productive season in Clippers history. One has to go back to Bob McAdoo in 1974-75 to find a more valuable Clippers season.

Other observations that stand out from a cursory overview of this graphical Clippers history: Interesting how Danny Manning went from a pretty well-rounded game, with somewhat of a passing/stealing bent (1990, 91, 92; note his fairly neutral greenish color, not that unlike Quinton Ross this past season), to focusing much more on scoring (1993 especially, his pinkish hue is much more like Quentin Richardson’s style). Is Manning’s transformation from Quinton-like to Quentin-like entirely a function of Mark Jackson’s strong season of perimeter play? I find it interesting to see how players’ styles evolve over the years, not only as they age, but in response to what their team needs them to do.

Please let me know in the comments whether this portrayal of Clippers history meshes with your more subjective memories of it. Are the players ranked correctly within seasons? (e.g. in 2008, was Maggette more valuable in terms of wins than Kaman, followed by Mobley, Thornton, and Thomas?) Was Brand’s best year really 2005-06? What do you pick up by looking at the colors that I might have missed?

Paul Pierce: An MVP performance in a losing effort

Last night, Paul Pierce made a strong case to be finals MVP; unfortunately for him, the Celtics could not close the game out. You may be interested to find that Pau, not Paul, had the best game in terms of Model-estimated value. However, once team performance is accounted for, Pierce comes out ahead, credited with 37.8 Points Created, and 0.376 wins. Nevertheless, this was Gasol’s best game yet:

tm  Player                  MP  PTS     MEV     PVC     PtC  Credit   G/B
bos Paul Pierce          47.97   38   29.85   0.395   37.80   0.376  2.34
lal Pau Gasol            42.12   19   30.97   0.325   34.36   0.342  5.79
lal Lamar Odom           41.23   20   24.34   0.256   27.01   0.269  3.88
lal Kobe Bryant          44.33   25   18.68   0.196   20.73   0.206  1.78
bos Kevin Garnett        33.10   13   13.28   0.176   16.82   0.167  1.97
lal Jordan Farmar        22.30   11   10.47   0.110   11.61   0.116  3.21
lal Derek Fisher         35.23   15    9.79   0.103   10.86   0.108  1.80
bos Tony Allen           10.98    6    7.70   0.102    9.74   0.097  8.56
bos Ray Allen            39.48   16    6.86   0.091    8.69   0.086  1.51
bos Sam Cassell          18.02    9    6.23   0.082    7.88   0.078  2.13
bos James Posey          32.42    3    5.87   0.078    7.43   0.074  2.29
lal Vladimir Radmanovic  19.22    7    5.60   0.059    6.21   0.062  1.72
bos Eddie House          13.63    6    2.37   0.031    2.99   0.030  1.37
bos P.J. Brown           24.88    4    1.89   0.025    2.40   0.024  1.46
lal Luke Walton          10.43    2    2.10   0.022    2.33   0.023  1.99
bos Leon Powe             4.98    0    0.81   0.011    1.03   0.010  4.68
bos Rajon Rondo          14.53    3    0.66   0.009    0.83   0.008  1.07
lal Trevor Ariza          1.25    0   -0.22  -0.002   -0.25  -0.002  0.00
lal Ronny Turiaf          1.40    0   -1.66  -0.017   -1.84  -0.018  0.00
lal Sasha Vujacic        19.72    4   -1.96  -0.021   -2.18  -0.022  0.77
lal Chris Mihm            2.77    0   -2.90  -0.030   -3.22  -0.032  0.00
    Totals                 480  201  170.72   2.000  201.25   2.002  2.07

In other surprises, Tony Allen out-produced Ray Allen, whom I had just proclaimed potential finals MVP, and the Lakers’ big three dominated that of the Celtics. The only sign of hope for Boston is the last four players on the list, all of whom are Lakers, and all of whom actually detracted from their team’s final outcome. Incidentally, in terms of Credit for wins cumulative over the course of the series (on which I would base the MVP decision), Bryant leads with 1.222, followed by Ray Allen at 1.059, and Paul Pierce at 1.013. My suspicion is that if Boston wins the series, the award will go to the player who has the most impressive final game. If neither PP nor RA distinguishes himself, the award goes to Pierce, who has more points, and many more assists than Allen.