Having covered my operationalization of statistical similarity and offered some evidence of its usefulness, I’d like to share what I consider the best part of the whole endeavor: the pictures. Using R and the sna package, along with the distances I’d previously computed [zip], I’ve put together a network diagram of player similarity. Each player has two or three arrows coming out of him, pointing to the players most similar to him. Then, using some brilliant algorithm I don’t fully grasp, the players are positioned so that similar players cluster together in groups.

I’ve colored each player/node according to the usual formula, meaning that each color reflects how that player’s contributions are distributed. Past analysis has indicated that propensity to take shots, post-area stats, and perimeter-area stats (to apply somewhat arbitrary characterizations) are a good basis for determining colors. See other posts for more on this.

I have two versions each of two different networks: .png and .pdf versions of the Top 250 players of the modern era, and .png and .pdf versions of players 251-500, the second tier. (I recommend looking at the .pdfs first, because they’re higher-resolution and easier to scroll around. Note that the .png and .pdf versions look different because of the way the plotting algorithm works: it’s the same data, shown in a somewhat different way.) I hope you find this interesting and/or useful, and please feel free to comment on the validity of this approach.
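For readers who want to see the mechanics, here is a minimal sketch of the two steps described above: draw arrows from each player to his nearest neighbors, then position the nodes so that linked players end up close together. It is written in Python rather than R, and it is not the code behind the diagrams. The player names and stat profiles are invented, and the layout is a bare-bones force-directed routine in the Fruchterman-Reingold style, which is only an assumption about the kind of algorithm the plotting routine uses.

```python
import math
import random

# Hypothetical stat profiles: (shot propensity, post stats, perimeter stats).
# These players and numbers are invented purely for illustration.
profiles = {
    "Big A": (0.3, 0.9, 0.1),
    "Big B": (0.4, 0.8, 0.2),
    "Big C": (0.3, 0.8, 0.1),
    "Guard A": (0.7, 0.1, 0.9),
    "Guard B": (0.6, 0.2, 0.8),
    "Guard C": (0.7, 0.2, 0.9),
}
names = list(profiles)

# Pairwise distance matrix (plain Euclidean distance, an assumption).
dist = [[math.dist(profiles[a], profiles[b]) for b in names] for a in names]


def nearest_neighbor_edges(dist, k=2):
    """Return directed edges (i, j) where j is one of i's k closest players."""
    edges = []
    for i, row in enumerate(dist):
        others = sorted((j for j in range(len(row)) if j != i), key=lambda j: row[j])
        edges.extend((i, j) for j in others[:k])
    return edges


def force_layout(n, edges, iterations=200, seed=0):
    """Simple force-directed layout: all nodes repel, linked nodes attract."""
    rng = random.Random(seed)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    k = 1.0 / math.sqrt(n)  # rough ideal distance between linked nodes
    for step in range(iterations):
        temp = 0.1 * (1.0 - step / iterations)  # maximum move, shrinks each step
        disp = [[0.0, 0.0] for _ in range(n)]
        # Repulsion between every pair of nodes.
        for a in range(n):
            for b in range(a + 1, n):
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                push = k * k / d
                disp[a][0] += dx / d * push
                disp[a][1] += dy / d * push
                disp[b][0] -= dx / d * push
                disp[b][1] -= dy / d * push
        # Attraction along each arrow.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            pull = d * d / k
            disp[a][0] -= dx / d * pull
            disp[a][1] -= dy / d * pull
            disp[b][0] += dx / d * pull
            disp[b][1] += dy / d * pull
        # Move each node along its net force, capped by the temperature.
        for i in range(n):
            length = math.hypot(disp[i][0], disp[i][1]) or 1e-9
            move = min(length, temp)
            pos[i][0] += disp[i][0] / length * move
            pos[i][1] += disp[i][1] / length * move
    return [tuple(p) for p in pos]


edges = nearest_neighbor_edges(dist, k=2)
pos = force_layout(len(names), edges, seed=0)
for name, (x, y) in zip(names, pos):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Because the layout starts from random positions, a different `seed` gives a different-looking picture of the same data, which may be part of why two renderings of one network do not match exactly.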

Tier One [pdf] [png]

Tier Two [pdf] [png]

**Update**: If you like those graphs, you will *really, really* like these:


“R” in the house, baby! ha, love R–cheaper than S-plus, and quite powerful, once you take the time to print out the huge help files and find some sample code.

It’s interesting to see Larry Bird and Kevin Garnett linked together. You wouldn’t immediately make that connection just based on how you remember Bird played and how you’ve seen Garnett play, but when you look at the numbers, they were both 20-10-5 guys.

I don’t know how you would implement this, but I think factoring in where a player takes his shots might provide an even better picture. Maybe starting with just 3’s vs. 2’s for modern players…

For what it’s worth, three-pointers vs. total field goals is already included in the algorithm. Given that Bird took substantially more threes than Garnett does, and given that this was weighted equally with all other factors, one can conclude that their playing styles must be VERY similar in all other respects for the pair of them to still come out as close matches. One thing of interest that is not evident from the diagram is that if you look at each player’s minimum distance to another player, Bird’s and Garnett’s minimums are substantially higher than most players’; that is, they are fairly unique. I plan on having a post in the future along these lines.

Pingback: Quarterback network diagram by statistical proximity « The Arbitrarian

Pingback: The microcosmic NBA petri dish « The Arbitrarian

Pingback: Links/Articles Tagged Between February 21st and February 27th

This is VERY cool I must say. Great work.

Pingback: MLB Batter network diagram by statistical proximity « The Arbitrarian

Pingback: Pensieri sparsi » Crossover

Pingback: Cornell Info 204 - Networks » Blog Archive » Networking Player Similarities

Pingback: NBA playing style spectrum « The Arbitrarian

Pingback: ThunderNumbers: Visualizing Success – Team Similarity Diagrams | Daily Thunder.com