Having covered my operationalization of statistical similarity, and offered some evidence of its usefulness, I’d like to share what I consider the best part of the whole endeavor: the pictures. Using R and the sna package, along with the distances I’d previously computed [zip], I’ve put together a network diagram of player similarity.

Basically, each player has two or three arrows coming out of him, pointing to the players most similar to him. Then, using some brilliant algorithm I don’t fully grasp, the players are positioned so that similar players cluster together in groups. I’ve then colored each player (node) according to the usual formula, meaning that each player’s color reflects, roughly, how his contributions are distributed. Past analysis has indicated that propensity to take shots, post-area stats, and perimeter-area stats (to apply somewhat arbitrary characterizations) are a good way of determining colors; see other posts for more on this.

Anyway, I have two versions each of two different networks: .png and .pdf versions of the Top 250 players of the modern era, and .png and .pdf versions of players 251-500, the second tier. (I recommend looking at the .pdfs first, because they’re higher-resolution and easier to scroll around. Note that the .png and .pdf versions look different because of the way the plotting algorithm works: it’s the same data, laid out in a somewhat different way.) I hope you find this interesting and/or useful, and please feel free to comment on the validity of this approach.
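The arrow-drawing step amounts to a directed k-nearest-neighbor graph built from the distance matrix. The original was done in R with sna and its code isn't shown, so what follows is only a minimal Python sketch, assuming a symmetric matrix of player-to-player distances where smaller means more similar, and a fixed k (the post doesn't say how it chose between two and three arrows). The function names `knn_edges` and `adjacency` and the toy players A through D are hypothetical. The resulting 0/1 adjacency matrix is the kind of input sna's `gplot` accepts; `gplot` defaults to a Fruchterman-Reingold force-directed layout, which is plausibly the clustering algorithm mentioned above.

```python
from typing import List, Sequence, Tuple


def knn_edges(
    names: Sequence[str],
    dist: Sequence[Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, str]]:
    """Directed edges (player -> similar player), keeping each player's k nearest.

    Assumes dist[i][j] is the distance between players i and j (smaller = more similar).
    """
    n = len(names)
    edges: List[Tuple[str, str]] = []
    for i in range(n):
        # Rank every other player by distance from player i; equal distances
        # are broken by index order, so the result is deterministic.
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        for _, j in others[:k]:
            edges.append((names[i], names[j]))
    return edges


def adjacency(names: Sequence[str], edges: List[Tuple[str, str]]) -> List[List[int]]:
    """0/1 adjacency matrix with rows as arrow sources and columns as targets."""
    index = {name: i for i, name in enumerate(names)}
    mat = [[0] * len(names) for _ in names]
    for src, dst in edges:
        mat[index[src]][index[dst]] = 1
    return mat


# Hypothetical toy data: four players and a symmetric distance matrix.
names = ["A", "B", "C", "D"]
dist = [
    [0, 1, 4, 5],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [5, 6, 3, 0],
]

edges = knn_edges(names, dist, k=2)
print(edges)
print(adjacency(names, edges))
```

Because similarity arrows need not be mutual (here A points to C, but C points to B and D instead), the graph is directed, which matches the arrows in the diagrams.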
Update: If you like those graphs, you will really really like these: