Toward a basketball taxonomy

I hope this isn’t getting repetitive, because I’ve got a diagram that will blow your mind: it’s like the entire NBA in a petri dish, with all the different phyla and genera of player types represented. I used the same methodology I’ve been using (with the per-minute statistics rather than the ratio statistics), but generated the graph with fewer connections per player (just the single closest match). As a result, there are a whole lot of isolated clusters instead of one completely interconnected network. Also, I went ahead and did 1,000 players at once, instead of the standard 250. The result astounded me: the clusters look like microorganisms swimming around on the microscope slide that is the NBA. I apologize for the tiny font (if you zoom in to 125%, it should be readable), but had I made the names any larger, they would have overlapped to an illegible degree.

The NBA “petri dish” diagram [pdf]
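For anyone curious why a single closest match per player produces isolated clusters, here is a minimal Python sketch of the idea. It is not the code used for the diagram: the function name `nearest_neighbor_clusters`, the player labels, and the distances are all made up for illustration, and it assumes a symmetric distance matrix in which smaller means more similar.

```python
from collections import defaultdict


def nearest_neighbor_clusters(names, dist):
    """Link each player to its single closest match, then return the
    connected components of the resulting undirected graph."""
    n = len(names)
    adjacency = defaultdict(set)
    for i in range(n):
        # Closest other player; ties go to the lowest index.
        j = min((k for k in range(n) if k != i), key=lambda k: dist[i][k])
        adjacency[i].add(j)
        adjacency[j].add(i)

    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first walk to collect everyone reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(names[node])
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


# Toy data: hypothetical players "A".."E" with invented distances.
names = ["A", "B", "C", "D", "E"]
dist = [
    [0.0, 1.0, 9.0, 8.0, 2.0],
    [1.0, 0.0, 9.0, 8.0, 3.0],
    [9.0, 9.0, 0.0, 1.5, 9.0],
    [8.0, 8.0, 1.5, 0.0, 9.0],
    [2.0, 3.0, 9.0, 9.0, 0.0],
]
clusters = nearest_neighbor_clusters(names, dist)
print(clusters)
```

In the toy data, A, B, and E end up in one cluster and C and D in another, because nobody in the first group has a closest match in the second. Allowing more connections per player would eventually bridge the groups, which is why the earlier diagrams came out as one interconnected network.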

I would be very interested in collectively coming up with a sort of “baller’s taxonomy,” wherein we try to identify the different clusters using some more subjective terms. I think we could come up with a better vocabulary to describe players and define playing styles. If you have any ideas, please put them in the comments, and if there is sufficient interest, I may come up with a more formalized process, in the hopes of putting together a follow-up diagram with labels.


Since I had already run the algorithm anyway (it takes a lot of cycles to do 1,000 players), I went ahead and made a completely connected version of the 1,000-player diagram. Warning: this one is pretty hard to parse.

1,000-player network diagram [pdf]

Keep in mind that the search function (ctrl-f) will be really useful for these.


5 responses to “Toward a basketball taxonomy”

  1. I just registered for an APBR account; when I get that, I’ll respond more fully. I saw your post over there as well, and I thought I’d add some stuff.

    I’ve looked at your top 500 players distance matrix (p.s., the player names are not in the same format for the rows and columns; a little Perl fixed that), and I decided to try some advanced clustering techniques.

    I used multi-dimensional scaling to reduce the 500-dimensional matrix to something more manageable, and the first 12 dimensions account for 99% of the variance. The first eigenvector (53.4% of the variance) basically corresponds to size, with little guys at one end and big men at the other.

    The 2nd eigenvector (15.8%) is more troublesome as players like Shaq, AI, Moses Malone, World B Free, and David Thompson are at one end, and the other end has Bruce Bowen, Nate McMillan, Robert Horry, Chris Duhon, and Popeye Jones. I’ll post more on this later.

    I also used self-organizing maps to cluster the matrix (using correlations between the distances), and it looks like a 4×2 model fits the data pretty well. For example, it breaks up the forwards into two classes, with Karl Malone, Kevin Garnett, Charles Barkley, Dirk Nowitzki, and Antawn Jamison in one group, and Shawn Marion, Lamar Odom, Rasheed Wallace, Josh Howard, and Shane Battier in another. There are some pretty interesting results from it. Anyway, I’ll post more later.

  2. rapidadverbssuck

    That sounds awesome. I’m glad to see that you seem to have picked up where my knowledge runs a little thin, but to the extent that I can fully follow your process, it seems to be very much in line with my original intent in starting this project–autoclassification of player types. If you’re interested, I could provide you with a .csv of the Top 1000 distances matrix as well.

    It sounds to me like you have a position/size eigenvector (position and size correlate anyway) and an offensive/defensive alignment eigenvector. Shaq, AI, Malone, et al. all show up as reddish in my simple color scheme classification, while Bowen, McMillan, Horry, et al. are more known for their “grit” or defense. I would love to see more of your results, and I hope your APBR clearance comes through soon.

  3. Pingback: NBA similarity networks « The Arbitrarian

  4. Are you able to publish the code you used for this? I’m assuming you used R and the SNA package. I guess what I’m really after is seeing code examples that show the SNA package in use.

  5. I am a ‘Moneyballer’ and have often wondered when the SABR approach would be applied more to other sports.
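The multi-dimensional scaling step described in the first response can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes classical (Torgerson) MDS, which may not be the variant the commenter used, and it swaps in plain power iteration so the example needs nothing beyond the standard library. The helper names `double_center` and `top_eigenpairs`, the four toy points, and the iteration count are all hypothetical.

```python
import math


def double_center(dist):
    """Turn a symmetric distance matrix into the centered matrix B
    whose eigenvectors give the classical MDS axes."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [
        [-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
         for j in range(n)]
        for i in range(n)
    ]


def top_eigenpairs(matrix, k, iterations=500):
    """Leading k eigenpairs by power iteration with deflation.
    Assumes a symmetric matrix whose largest eigenvalues are positive."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.1 * i for i in range(n)]  # fixed, non-constant start
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            if norm == 0.0:
                break
            vec = [x / norm for x in nxt]
        # Rayleigh quotient gives the eigenvalue for the unit vector.
        value = sum(
            vec[i] * sum(work[i][j] * vec[j] for j in range(n))
            for i in range(n)
        )
        pairs.append((value, vec))
        # Deflate: remove the component just found.
        for i in range(n):
            for j in range(n):
                work[i][j] -= value * vec[i] * vec[j]
    return pairs


# Toy data: four invented "players" at the corners of a 4-by-2 rectangle.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 2.0), (4.0, 2.0)]
dist = [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in points] for a in points]

b_matrix = double_center(dist)
pairs = top_eigenpairs(b_matrix, k=2)
total = sum(b_matrix[i][i] for i in range(len(points)))  # trace = sum of eigenvalues
shares = [value / total for value, _ in pairs]
print([round(s, 3) for s in shares])
```

Each share is that axis’s fraction of the total variance, the same kind of figure as the 53.4% and 15.8% quoted in the first response. Because the toy rectangle is twice as wide as it is tall, the first axis carries most of the variance here; with real player distances, interpreting what each axis means is the hard part.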

