The dataset I’ve been using doesn’t have player position data in it, so the other day I was playing around with cluster and factor analysis, which I don’t really know how to do yet, and trying to come up with a way to estimate players’ positions from the data. I did come up with a pretty novel method, which works about 70% of the time (for another post, someday), but I also had the following idea:
How would one arbitrarily divide basketball skills a priori? In the course of my cluster and factor analyses, a few things fell out (I know, not exactly a priori): there are distinctions between people who take a lot of shots, small men (guard-types), and big men (centers/PFs). I came up with the following rough estimators of the degree to which every player aligns with each invented archetype:
“shooteR” = fga/(fga+tr+as+st+bk)
“Guard” = (as+st)/(fga+tr+as+st+bk)
“Big” = (tr+bk)/(fga+tr+as+st+bk)
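As a minimal sketch of the three formulas above (the function name and the stat line are made up for illustration, and `as_` stands in for assists because `as` is a reserved word in Python):

```python
def archetype_shares(fga, tr, as_, st, bk):
    """Return the (shooteR, Guard, Big) shares of a player's stat sum."""
    total = fga + tr + as_ + st + bk
    if total == 0:
        # A player with no recorded stats has no meaningful distribution.
        return (0.0, 0.0, 0.0)
    shooter = fga / total
    guard = (as_ + st) / total
    big = (tr + bk) / total
    return (shooter, guard, big)


# Hypothetical season totals, not a real player.
r, g, b = archetype_shares(fga=1000, tr=300, as_=400, st=100, bk=200)
print(r, g, b)
```

Because the three numerators together make up the denominator, the three shares always add to one for any player with a nonzero stat sum.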
This way, the individual with the highest shooteR rating is the one for whom field goal attempts comprise the highest percent of his stat sum. These ratings don’t really mean very much, except to roughly suggest certain tendencies, but when you generate percentiles for each player, you get a nicely ordered rating (which is something I’ll use several times in future posts) on each aspect that falls between 0 and 1. This is useful, because the means of each aspect in the population as a whole are very different:
stat : mean
R : 0.504
G : 0.189
B : 0.307
So, the “percentalization” sends a player with an average distribution to (0.5, 0.5, 0.5).
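The post doesn't say exactly how the percentiles are computed, so the following is only a sketch under one assumption: a mid-rank percentile, where a player's score is the fraction of players below him plus half of those tied with him. The function name and sample values are hypothetical.

```python
def percentile_ranks(values):
    """Map each value to a mid-rank percentile between 0 and 1."""
    n = len(values)
    ranks = []
    for v in values:
        below = sum(1 for other in values if other < v)
        ties = sum(1 for other in values if other == v)  # includes v itself
        ranks.append((below + 0.5 * ties) / n)
    return ranks


# Made-up Guard shares for five players.
guard_shares = [0.20, 0.50, 0.70, 0.40, 0.90]
guard_pcts = percentile_ranks(guard_shares)
print(guard_pcts)
```

With this definition the middle player of the five lands on 0.5 regardless of how large or small the raw shares are, which is the point of the conversion: it puts R, G, and B on a common scale even though their raw means differ.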
The great thing about these three aspects is that they really lend themselves to comparison: in 2007, for example, the top players in each category were as follows:
R: Willie Green (0.725), Michael Redd (0.723), Adam Morrison (0.689)
G: Steve Nash (0.430), Brevin Knight (0.430), Eric Snow (0.422)
B: Tyson Chandler (0.644), Jeff Foster (0.630), Erick Dampier (0.617)
The numbers in parentheses are proportions of the player’s total stats, not their percentiles. Notice how the highest Guard values are much lower than the highest shooteR and Big values. Converting these to percentiles adjusts for this somewhat. It also means that, while the R, G and B percentages must add to one for each player, the sum of a player’s three percentiles may be greater than 1.5.
Rather than just listing some players with their values here, I have created a visualization in which each player’s percentiles for R, G, and B have been converted to RGB color values. For the axes, I have used y = points/gp and x = other/gp, or (tr+as+st+bk-to)/gp. “Other” per game is interesting to see and at least somewhat useful as a measure of a player’s nonscoring contributions, but it works best for now because it is simple, and I am only trying to get the players spaced out in two dimensions for display. Without further ado, here is the scatterplot (you’re going to want to click on it, which will open a 2048 x 1536 .png, perfect for your desktop background).
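For anyone who wants to try something similar, here is a rough sketch of the two steps just described. It assumes each percentile is simply scaled to a 0-255 color channel (the actual conversion used for the plot isn't specified), and the function names and stat line are invented for the example.

```python
def percentiles_to_hex(r_pct, g_pct, b_pct):
    """Scale three 0-1 percentiles to 0-255 channels and format as a hex color."""
    channels = [min(255, max(0, round(p * 255))) for p in (r_pct, g_pct, b_pct)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


def plot_coordinates(pts, tr, as_, st, bk, to, gp):
    """Return (x, y) = ("other" per game, points per game)."""
    x = (tr + as_ + st + bk - to) / gp
    y = pts / gp
    return x, y


# Hypothetical player: a pure shooter by percentile, with made-up totals.
color = percentiles_to_hex(1.0, 0.0, 0.0)
x, y = plot_coordinates(pts=1600, tr=400, as_=300, st=100, bk=50, to=150, gp=80)
print(color, x, y)
```

Under this scaling, a player at the 50th percentile on all three aspects comes out a neutral gray, while high percentiles on two aspects at once blend into yellow, cyan, or magenta.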
I have already listed the reddest, greenest and bluest players, but here’s another list of high achievers:
Yellowest (R&G): Allen Iverson, Earl Boykins, Tyronn Lue, Leandro Barbosa
Cyanmost (G&B): Ben Wallace, Andrei Kirilenko, Marcus Camby, Jason Kidd
Magentalikes (B&R): Eddy Curry, Andrea Bargnani, Rasual Butler, Andres Nocioni
Closest to white (R&G&B): LeBron James, Hedo Turkoglu, Bobby Jackson, Cuttino Mobley
Let me know if you notice anything interesting in the plot (for example, Carlos Boozer, Pau Gasol and Elton Brand appear to be pretty similar players), and enjoy!