
New Blog

Though this blog is no longer being updated, I am now writing on political science and information visualization at dsparks.wordpress.com.

For the record

Point projection: Obama’s percent of the nationwide two-party vote:

52.47%

Even more quiet

See here. It’s going to be even more quiet around here than usual. Thanks to everyone for reading and contributing your ideas.

The best of the WNBA, updated daily

Borrowing from the extremely useful DougStats, and making use of Google Docs, I present a (more-or-less) daily-updated list of the 100 most valuable WNBA players, using the BoxScores methodology.

WNBA Top 100 BoxScores

easy URL: http://bit.ly/wboxscores

As I said, this ought to be updated more-or-less daily, and should serve as an easy, quick reference to see the state of the WNBA. For the uninitiated, BoxScores attempt to estimate the value of each individual player in terms of contributions to team success, and the unit being estimated is wins. Thus, as of this posting, we can see that rookie Candace Parker leads the league with a BoxScore of 3.76 wins created, followed by Lindsay Whalen with 3.34. It is odd that Whalen failed to make the Olympic team, given that she is the second most valuable player in the WNBA right now.

Note: Since the regression coefficients employed in Model-Estimated Value (MEV) were fitted for the NBA, it is unclear whether their values translate identically to WNBA play; that is, a steal in the WNBA may be worth more than in the NBA, or less, etc. However, based on the work of others, I’m assuming there is relatively little difference between the leagues on this front, and since the formula is applied evenly across all WNBA players, I will assume that the differences even out on average. Let me know if you find this list useful, and whether or not the ranking seems to mesh with your own subjective perceptions.

NBA playing style spectrum

Many conversations about sports revolve around comparisons of quality — team A is better than team B, player X is the best of all time, this draftee will help his team more than that one, etc. For this type of discussion, many metrics exist, both qualitative and quantitative, one of which is BoxScores, developed here at the Arbitrarian. Other conversations center around similarity–team C plays like team D did in the 1980s, player Y is a taller, faster player Z, etc. The Arbitrarian has spent substantial time investigating this type of comparison as well, using statistical proximity and network diagrams. Yet another characterization, somewhat more general than direct similarity comparisons, is that of type, or style. While playing style has been discussed here, and style markers can be seen everywhere in my work in the form of various colorations, I would like to develop the idea a little more fully, and present a novel graphical visualization of the concept applied to NBA players.

Very rudimentary factor and cluster analysis I performed a long time ago indicated that there are distinctions in the data between players who tend to try to score a lot, those who play a “smaller” game, and those who play like “big men.” In terms of the NBA’s tracked counting statistics, this translates to a differentiation between those who specialize in points and field goal attempts, steals and assists, and rebounds and blocks. I have chosen to call these three tendencies Scorer, Perimeter, and Interior, respectively, and collectively they form the SPI Style Trichotomy.

Calculation

Identifying each player’s style is conceptually simple, but computationally somewhat more involved. Essentially, one sums each player’s field goal attempts (fga), total rebounds (tr), blocks (bk), assists (as), and steals (st), and determines what percentage of the total each SPI factor constitutes:

  • Scorer percentage = fga / (fga + tr + bk + as + st)
  • Perimeter percentage = (as + st) / (fga + tr + bk + as + st)
  • Interior percentage = (tr + bk) / (fga + tr + bk + as + st)
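The three ratios above can be sketched in a few lines of Python. This is only an illustration: the function name and the season totals are hypothetical, and assists are passed as `ast` because `as` is a reserved word in Python.

```python
def spi_percentages(fga, tr, bk, ast, st):
    """Return the (Scorer, Perimeter, Interior) shares for one player.

    fga = field goal attempts, tr = total rebounds, bk = blocks,
    ast = assists ("as" in the text), st = steals.
    """
    total = fga + tr + bk + ast + st
    if total == 0:
        raise ValueError("player has no recorded fga, tr, bk, as, or st")
    scorer = fga / total
    perimeter = (ast + st) / total
    interior = (tr + bk) / total
    return scorer, perimeter, interior


# Hypothetical season totals, invented purely for illustration.
example = spi_percentages(fga=1000, tr=400, bk=100, ast=300, st=200)
```

By construction the three shares always sum to 1, which is why the raw percentages alone say little about how a player compares to his peers.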

These numbers are interesting on their own, but for the calculation of an index of style, they require further manipulation. In the league as a whole, the Scorer percentage is around 50%, the Perimeter percentage around 20%, and Interior 30%. Thus, if using these percentages, the vast majority of players would appear to be very scoring-centered. My concern here, in constructing a useful index, is to identify player propensities relative to other players, and for that, I calculate the percentile of each player’s percentages.

  • Scorer index = percentile(Scorer percentage)
  • Perimeter index = percentile(Perimeter percentage)
  • Interior index = percentile(Interior percentage)

Thus, even though the maximum Scorer percentage in a season might be close to 75% while the maximum Perimeter percentage is closer to 25%, the players with the highest percentages in the sample under consideration will be assigned an index value of 1. Players with median values on a percentage will have an index value of 0.5, and so on. The percentilization normalizes across style tendencies and player subpopulations, and has the added virtue of scaling from 0 to 1.
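Here is a minimal sketch of the percentile step. The post does not say which percentile convention was used, so the one below is an assumption: each value is ranked against the rest of the sample and scaled by n - 1, so that the lowest value maps to 0, the median to 0.5, and the highest to 1. The sample percentages are made up.

```python
def percentile_index(values):
    """Map each value to a percentile rank between 0 and 1 within the sample.

    Assumed convention (not confirmed by the post):
    (count of strictly smaller values + half of the other tied values) / (n - 1).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two players to rank")
    indices = []
    for v in values:
        below = sum(1 for other in values if other < v)
        ties = sum(1 for other in values if other == v) - 1  # exclude the player himself
        indices.append((below + 0.5 * ties) / (n - 1))
    return indices


# Hypothetical Scorer percentages for five players.
scorer_pcts = [0.75, 0.50, 0.40, 0.55, 0.30]
scorer_idx = percentile_index(scorer_pcts)
```

The same function would be applied separately to the Perimeter and Interior percentages, giving each player his three index values.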

Interpretation

Thus we have a set of three numbers for each player which can be used to characterize his playing style. The numbers translate easily to more qualitative descriptions. A player with an SPI triplet of (0.8, 0.2, 0.7) is an interior scorer, without much perimeter production. A player with the triplet (0.1, 0.7, 0.75) is anything but a scorer, sometimes called a “glue” guy. Someone at (0.5, 0.5, 0.5) produces the league median of each type, which is different from a player whose raw percentages are 33%, 33% and 33%. Because the league-wide Scorer percentage is around 50%, such a player would have a relatively low Scorer index, for example.

Since each individual is characterized by three variables, their SPI type can be plotted in three dimensions. Unfortunately, three dimensions are difficult to convey on a computer screen, so here is a plot which depicts Perimeter indices along the X-axis, Interior indices on the vertical axis, and Scorer indices as the size of the point.


Historical application note: Since steals and blocks have not been kept for the entirety of the history of professional basketball, players from earlier eras may have slightly skewed SPI values. While percentages and indices can still be calculated based only on fga, tr, and as, it is not difficult to see that leaving out blocks and steals, in comparison to eras in which those defensive statistics are included, will tend to skew players from an earlier era more toward the Scoring type. Unfortunately, without substantial era-specific correction, this effect is unavoidable. However, the sorting still manages to work well, especially if this detail is kept in mind when making certain cross-temporal comparisons.

Presentation

One of the advantages of using three sub-indices to construct the overall SPI Trichotomy is the convenient translation of index values to color. The three primary colors of light are Red, Green and Blue, and when combined in certain proportions, it is possible to generate infinite gradations of color (see Wikipedia). This means that each SPI triplet for each player can be represented as a single color. This aids understanding and comparison, as it is much easier to keep in mind that a certain player is a deep red than that his SPI triplet is (0.9, 0.1, 0.2), or that a player is a medium grey than that his triplet is (0.45, 0.53, 0.55). Further, a greenish-blue player is easily paired with another greenish-blue player, without having to specifically compare each of the players’ three index values. The human eye is capable of extremely high-resolution discernment, and using a single color to represent three numerical values takes advantage of this.
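As a rough sketch of the translation to color, the snippet below maps Scorer to red, Perimeter to green, and Interior to blue, which is the assignment used in the plots in this post. Scaling each index to the 0 to 255 range and writing the result as an HTML-style hex string are assumptions made for illustration, as is the function name.

```python
def spi_to_hex(scorer, perimeter, interior):
    """Convert an SPI triplet (each index between 0 and 1) to a hex color string.

    Scorer -> red channel, Perimeter -> green channel, Interior -> blue channel.
    """
    channels = []
    for index in (scorer, perimeter, interior):
        if not 0.0 <= index <= 1.0:
            raise ValueError("index values must lie between 0 and 1")
        channels.append(round(index * 255))  # assumed 8-bit scaling per channel
    return "#{:02x}{:02x}{:02x}".format(*channels)


# The two triplets mentioned in the paragraph above.
deep_red = spi_to_hex(0.9, 0.1, 0.2)
medium_grey = spi_to_hex(0.45, 0.53, 0.55)
```

Any plotting tool that accepts hex colors could then use these strings directly as point colors.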

Here is the above plot, with color added according to RGB values derived from each player’s SPI indices. As you can see, “blueness” increases from bottom to top, “greenness” from left to right, and “redness” varies with the size of the point. The top-right corner is aqua or cyan, while the bottom left is mostly reddish, due to an absence of green and blue.


Unfortunately, this presentational format leaves a lot to be desired. Since each player can be represented by just one color, can we do better than a pseudo-3-dimensional plot? The answer is yes and no: No, because to ensure that the hue, saturation, and value of each color are captured, we still require three variables (see Wikipedia); yes, because most of what we are interested in here is hue–the underlying color for each player, red, yellow, green, aquamarine, vivid tangerine, indigo, etc. The other two components of HSV color space, saturation and value, allow us to see how “pure” the hue is, which in our basketball application, translates to how “pure” an individual’s playing style is.

The advantage of a conversion from RGB to HSV is that, by combining the S and V, we can represent the entire playing style spectrum in a format resembling a color wheel. This is the most straightforward and useful format for presentation, and this graphic is the big payoff:

Click to view in a Google Maps format. Also available at the easy-to-remember http://bit.ly/spi

As you can see, each of these NBA greats is aligned at a certain angle and distance from the center (I used polar coordinates as a basis for this plot), and this allows us to identify relatively similar players, players’ “opposites,” clusters, and other interesting observations. In this graphic, players with greater MEV (Model Estimated Value) are larger, and this allows comparisons along radii, as in “X plays in a similar manner to Y, but is more valuable/productive.”
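For the curious, here is one way the color-wheel placement could be computed, using Python's standard `colorsys` module for the RGB to HSV conversion. Two things here are assumptions rather than the method actually used for the graphic: the post does not say how S and V were combined, so the sketch simply multiplies them to get the distance from the center, and hue 0 (pure red) is placed at the 3 o'clock position with angles increasing counterclockwise.

```python
import colorsys
import math


def spi_to_polar(scorer, perimeter, interior):
    """Return (angle in degrees, radius) for an SPI triplet treated as RGB."""
    hue, saturation, value = colorsys.rgb_to_hsv(scorer, perimeter, interior)
    angle_deg = hue * 360.0        # colorsys reports hue as a fraction of a full turn
    radius = saturation * value    # ASSUMPTION: one simple way to combine S and V
    return angle_deg, radius


def polar_to_xy(angle_deg, radius):
    """Convert polar coordinates to plot coordinates (hue 0 along the positive x-axis)."""
    theta = math.radians(angle_deg)
    return radius * math.cos(theta), radius * math.sin(theta)


# A pure scorer sits on the rim of the wheel; a perfectly balanced player at the center.
pure_scorer = spi_to_polar(1.0, 0.0, 0.0)
balanced = spi_to_polar(0.5, 0.5, 0.5)
```

Under these assumptions, the more lopsided a player's triplet, the further from the center he lands, and a player with three equal indices lands exactly at the center.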

Vocabulary

There are several ways our statistical vocabulary can be expanded via the SPI Style Trichotomy. The first is that we can characterize the degree to which a given player is Scorer/Perimeter/Interior-focused by reporting their index value. The second is that we can describe the player’s color: “He’s a deep blue defensive center,” or “Shane Battier is not a chucker! He’s a cyan-colored scorer’s opposite type.” Finally, we can approximate a vector. I suggest the convention of overlaying the hours of a clock over the spectrum diagram (to indicate vector direction), with 12 o’clock at the very top, 1:00 at the interior’s opposite position, 3:00 at the scorer’s position, 4 o’clock bisecting the scorer and perimeter’s opposite position, etc. Distance from the center of the diagram (the length of the vector) indicates the degree to which a player fits exactly into their playing style: players whose games are more balanced are closer to the center, while players whose games are more specialized or narrow are further from the center. So, one could say, for example, “Dikembe Mutombo is a pure 7 o’clock,” or, “Michael Jordan was between about a two and three for most of his career, but in 88-89, he shifted closer to a pure 11.” And so forth.
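The clock convention can be written as a small conversion from hue angle to hour. The formula below is inferred from the positions named above, assuming the standard hue angles of 0 degrees for red (Scorer), 120 for green (Perimeter), and 240 for blue (Interior), with red at 3 o'clock. The function name is hypothetical.

```python
def hue_to_clock(angle_deg):
    """Convert a hue angle in degrees to a clock-face hour between just above 0 and 12.

    Assumes hue 0 (red, the pure scorer) sits at 3 o'clock and that hue
    increases counterclockwise, so each clock hour spans 30 degrees of hue.
    """
    hour = (3.0 - angle_deg / 30.0) % 12.0
    return 12.0 if hour == 0.0 else hour


scorer_hour = hue_to_clock(0.0)               # red
interior_opposite_hour = hue_to_clock(60.0)   # yellow: red plus green, no blue
perimeter_hour = hue_to_clock(120.0)          # green
interior_hour = hue_to_clock(240.0)           # blue
```

With these assumptions the pure Interior position falls at 7 o'clock and the pure Perimeter position at 11 o'clock, matching the two "pure" examples quoted above.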

Feedback

I would be very interested to hear any and all comments: does the trichotomy make sense? Is it useful? Are the S/P/I typologies a reasonable first division? Is it helpful to have a more continuous, yet still quantifiable, tool to describe player type than just position labels? Do you like the idea of equating, for example, a shoot-first point guard with a yellow-green 12 o’clock player? Does the graphic offer any new insight, confirm your subjective observations, or conflict with your opinions? I will be following this post up with many more using this methodology and this type of display, so I hope you will come back often, or possibly subscribe.