Having covered my operationalization of statistical similarity and offered some evidence of its usefulness, I’d like to share what I consider the best part of the whole endeavor: the pictures. Using R and the sna package, along with the distances I’d previously computed [zip], I’ve put together a network diagram of player similarity. Each player has two or three arrows coming out of him, pointing to the players most similar to him. Then a layout algorithm I don’t fully grasp positions the players so that similar ones cluster together in groups.
I’ve colored each player/node according to the usual formula, meaning that a node’s color reflects how that player’s contributions are distributed. Past analysis has indicated that propensity to take shots, post-area stats, and perimeter-area stats (to apply somewhat arbitrary characterizations) are a good basis for determining colors. See other posts for more on this.
I have two versions each of two different networks: .png and .pdf versions of the Top 250 players of the modern era, and .png and .pdf versions of players 251-500, the second tier. (I recommend looking at the .pdfs first, because they’re higher-resolution and easier to scroll around. Note that the .png and .pdf versions look different because of the way the plotting algorithm works: it’s the same data, laid out somewhat differently.) I hope you find this interesting and/or useful, and please feel free to comment on the validity of this approach.
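For readers who want to see the arrow-drawing step concretely, here is a minimal sketch of how "each player points to his two or three most similar players" could be computed from a distance matrix. It is written in Python rather than the R/sna code actually used for the diagrams, and the function name `nearest_neighbor_edges`, the player labels, and the distances are all made up for illustration. It does not cover the layout or coloring steps.

```python
def nearest_neighbor_edges(names, dist, k=2):
    """Return directed edges (source, target) linking each player
    to the k players at the smallest distance from him."""
    edges = []
    for i, source in enumerate(names):
        # Rank every other player by distance to player i;
        # ties are broken by position in the list.
        others = sorted(
            (j for j in range(len(names)) if j != i),
            key=lambda j: (dist[i][j], j),
        )
        for j in others[:k]:
            edges.append((source, names[j]))
    return edges


# Hypothetical players and a made-up symmetric distance matrix.
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 2.0, 6.0],
    [4.0, 2.0, 0.0, 3.0],
    [5.0, 6.0, 3.0, 0.0],
]

edges = nearest_neighbor_edges(names, dist, k=2)
for source, target in edges:
    print(source, "->", target)
```

Note that the arrows need not be mutual even when the distances are symmetric: in this toy data D points to A, but A's two closest players are B and C, so A does not point back.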
Update: If you like those graphs, you will really really like these:
10 responses so far ↓
mcbias // February 22, 2008 at 7:58 pm
“R” in the house, baby! ha, love R–cheaper than S-plus, and quite powerful, once you take the time to print out the huge help files and find some sample code.
succotashi // February 23, 2008 at 2:56 pm
It’s interesting to see Larry Bird and Kevin Garnett linked together. You wouldn’t immediately make that connection just based on how you remember Bird played and how you’ve seen Garnett play, but when you look at the numbers, they were both 20-10-5 guys.
I don’t know how you would implement this, but I think factoring in where a player takes his shots might provide an even better picture. Maybe starting with just 3’s vs. 2’s for modern players…
rapidadverbssuck // February 23, 2008 at 9:04 pm
For what it’s worth, three-pointers vs. total field goals is already included in the algorithm. Given that Bird took substantially more threes than Garnett does, and given that this factor was weighted equally with all the others, one can conclude that their playing styles must be VERY similar in all other respects for the pair to still come out as close matches. One thing of interest that is not evident from the diagram: if you look at each player’s minimum distance to another player, Bird’s and Garnett’s minimums are substantially higher than most players’. That is, they are fairly unique. I plan on having a post in the future along these lines.
Quarterback network diagram by statistical proximity « The Arbitrarian // February 25, 2008 at 6:00 am
[...] 25, 2008 by rapidadverbssuck For more information, just refer back to my last post, which dealt with this same methodology applied to the NBA. I’ve done the same thing here [...]
The microcosmic NBA petri dish « The Arbitrarian // February 25, 2008 at 7:43 pm
[...] all different phyla and genera of player types represented. I used the same methodology I’ve been using (with the per-minute, rather than ratio statistics), but generated the graph with fewer connections [...]
Links/Articles Tagged Between February 21st and February 27th // February 27, 2008 at 5:11 pm
[...] NBA similarity networks « The Arbitrarian :: [Tags: basketball maps stats ] [...]
Shayan // February 28, 2008 at 2:35 pm
This is VERY cool I must say. Great work.
MLB Batter network diagram by statistical proximity « The Arbitrarian // March 2, 2008 at 8:23 pm
[...] 2, 2008 by rapidadverbssuck The next in a series consists of batters in the MLB from 1955-2007 (because the modern set of statistics has not changed [...]
Pensieri sparsi » Crossover // March 4, 2008 at 5:19 pm
[...] has set about making a diagram that highlights the similarities between NBA players, if you have [...]
Cornell Info 204 - Networks » Blog Archive » Networking Player Similarities // March 6, 2008 at 2:46 pm
[...] debates about athletic prowess, one blogger has developed statistical similarity networks for NBA players and Major League Baseball batters. By finding each player’s ratio for all statistical [...]