It has been suggested that I look at players’ statistics from only the primes of their careers. This is a good idea, given that both very inexperienced and very old players will “regress to the mean” in terms of their performance and, possibly, playing style. As such, I generated a sum of each player’s boxscore statistics during the modern era across only their best seasons. My definition of “best” was simple: not their worst. For each player, I found the mean and the standard deviation of their seasonal winshr. Any season in which a player’s winshr was greater than their mean less one standard deviation was included in this analysis. This way, I excluded seasons in which a player was injured or relatively underused, whether because of age or because of a minor role on their team. Chris Webber’s current and previous seasons, for example, would not be included. In this way, I hope to get at the “pure” essence of each player for an even better comparison. You will probably not be surprised to see that the diagram looks very similar to the non-peak-performance versions:
NBA players at peak performance [pdf]
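The season filter described above can be sketched in Python. This is a minimal illustration, not the code behind the diagram: the `(player, winshr, stats)` record layout and the `peak_totals` name are hypothetical, and using the sample standard deviation and letting players with a single season (or no spread in winshr) keep every season are assumptions.

```python
from collections import Counter, defaultdict
from statistics import mean, stdev


def peak_totals(seasons):
    """Sum each player's boxscore stats over his non-worst seasons.

    `seasons` is an iterable of hypothetical (player, winshr, stats) tuples,
    where `stats` maps a stat name (e.g. "pts") to that season's total.
    """
    by_player = defaultdict(list)
    for player, winshr, stats in seasons:
        by_player[player].append((winshr, stats))

    totals = {}
    for player, rows in by_player.items():
        shares = [w for w, _ in rows]
        # Assumption: with fewer than two seasons, or zero spread, the
        # cutoff would exclude everything, so every season is kept.
        if len(shares) < 2:
            cutoff = float("-inf")
        else:
            sd = stdev(shares)  # sample SD (assumption)
            cutoff = mean(shares) - sd if sd > 0 else float("-inf")
        summed = Counter()
        for w, stats in rows:
            if w > cutoff:  # keep seasons above mean minus one SD
                summed.update(stats)  # adds each stat to the running total
        totals[player] = dict(summed)
    return totals
```

For instance, a player with winshr seasons of 10, 12, and 2 has a cutoff of about 2.7, so the 2-winshr season (say, an injury year) drops out of his totals.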
A few interesting things to note, however: at their peak, Michael Jordan and Larry Bird are now among each other’s closest matches. Also, taking a macro view of the whole network, it is now easy to identify several distinct clusters: in bluish purple at top left are defensive-minded, “dirty work” bigs, while at the bottom in blue are more scoring-oriented bigs. To their right is a reddish group of primarily scorers, and going north from there in green we see “pure point guards” and then more scoring point guards. And so on. Let me know in the comments if you notice any other interesting connections or clusters.
Using roll call votes from the 110th Senate through the end of last year, I have constructed a network diagram based on maximum similarities between Senators’ voting records. Distances were calculated by coding each yes vote as 1 and each no vote as 0, taking the absolute difference between each pair of Senators on every roll call, and summing those differences. Thus two Senators who vote identically have a distance of 0, while two Senators who vote in opposite ways on every roll call have a distance equal to the total number of roll calls. Based on these distances, I constructed a network diagram linking each Senator to their two most similarly voting counterparts. I also colored each vertex according to how similar each Senator is to “all Republicans” and “all Democrats” collectively. The result reveals the highly polarized nature of the Senate: only a single strand links Republicans to Democrats:
110th Senate Roll Call Network Diagram [pdf]
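A minimal Python sketch of the distance and two-nearest-neighbor linking described above follows. It is illustrative, not the code used for the diagram. It assumes each Senator’s record is a list of votes in roll-call order; the post does not say how missed votes were handled, so skipping a roll call when either Senator has `None` is an assumption, as is alphabetical tie-breaking. The function names are hypothetical.

```python
def vote_distance(a, b):
    """Count roll calls on which two Senators voted differently.

    Votes are 1 (yes), 0 (no), or None (did not vote). Assumption: a roll
    call is skipped for a pair if either Senator has None.
    """
    return sum(abs(x - y) for x, y in zip(a, b)
               if x is not None and y is not None)


def nearest_neighbor_edges(votes, k=2):
    """Link each Senator to the k Senators at the smallest distance.

    `votes` maps a Senator's name to a list of votes in roll-call order.
    Returns undirected edges as sorted (name, name) tuples. Ties are
    broken alphabetically (assumption).
    """
    edges = set()
    for name, record in votes.items():
        # Sort the other Senators by (distance, name).
        others = sorted(
            (vote_distance(record, votes[other]), other)
            for other in votes if other != name
        )
        for _, other in others[:k]:
            edges.add(tuple(sorted((name, other))))
    return edges
```

For example, with hypothetical records A = [1, 1, 0, 1], B = [1, 1, 0, 0], C = [0, 0, 1, 0], and D = [0, 0, 1, 1], A and C disagree on every roll call (distance 4), and `k=1` yields two isolated pairs, (A, B) and (C, D).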
I then decided to reduce the connections to only the single closest match for each Senator, and found something interesting that you will rarely hear from the media: Senators Clinton and Obama are each other’s closest match, at least based on roll call votes in the 110th Senate through the end of 2007. This would seem to indicate that the wide disparities between them, as perceived by the media and the public, have little to do with actual policy or ideological divides; rather, personality and framing (and possibly demographics) appear to make up the bulk of voting preferences in many Americans’ minds.
110th Senate Roll Call Isolated Networks [pdf]
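Reducing the network to each Senator’s single closest match, finding pairs who are each other’s closest match, and splitting the result into isolated networks can be sketched as below. This is an illustrative Python version, not the original code, and it assumes complete records coded 1 = yes and 0 = no, with ties broken alphabetically. The function names are hypothetical.

```python
def closest_match(votes):
    """Map each Senator to the Senator with the most similar record.

    Distance = number of roll calls with differing votes (1 = yes, 0 = no).
    Ties are broken alphabetically (assumption).
    """
    def distance(a, b):
        return sum(abs(x - y) for x, y in zip(a, b))

    return {
        name: min((distance(rec, votes[o]), o) for o in votes if o != name)[1]
        for name, rec in votes.items()
    }


def mutual_pairs(match):
    """Return pairs of Senators who are each other's closest match."""
    return {tuple(sorted((a, b))) for a, b in match.items() if match[b] == a}


def components(match):
    """Group Senators into the isolated networks formed by closest-match links."""
    neighbors = {name: set() for name in match}
    for a, b in match.items():  # treat each link as undirected
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first search from `start`
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups
```

A Senator can belong to an isolated network without being in a mutual pair: if E’s closest match is D but D’s closest match is C, then C, D, and E form one network while only (C, D) is a mutual pair.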
I was aware, to some extent, that the differences between the two Democratic competitors were more constructed than actual, but seeing the roll call evidence fall out so starkly was surprising.
The next in the series covers MLB batters from 1955 to 2007 (because the modern set of statistics has not changed since the 1955 season). I think these statistics lend themselves less well to this sort of analysis, but the results may interest the baseball enthusiasts out there.
Batter Statistical Proximity [pdf]
I’ve had requests for my data and for the code I used to make these plots, so, in the spirit of openness, I’m posting them. If you would like to use them, please adhere to the Creative Commons license I’ve chosen, and let me know what you come up with. The .csv contains the top 1000 careers over the last quarter century or so, as determined by a playing-time-based statistic. The .R file runs in R and requires you to install the sna package. The sna package is awesome; it makes network diagramming essentially idiot-proof. Note that the code currently writes its output to a PDF, and it cannot write to that PDF if a file with the same name is open. Also, remember to change the .csv file path in the R code to wherever you saved the file, or it won’t work. Please let me know if it’s not working for you, or if you know of a more efficient way of doing the same thing.
1000 Top Careers [csv]
Network Diagram Example [R]
In response to some questions at the APBRmetrics forum, I’ve put together a new NBA similarities network (Top 250 players version) that uses per-minute statistics instead of my “patented” ratios method, just to see how it looks. In a lot of ways, this looks as good as or even better than the ratios version. I’m still somewhat torn, though: the ratios method, by ignoring time statistics completely, attempts to match players who, given a possession (or given an opponent with a possession), will do similar things with it, while the per-minute method does a better job of representing “substitutability.” I suppose I will let history be the judge, but I don’t think anyone loses when more pretty graphs are made:
NBA player similarities [pdf]
Another version with Extremely High Contrast Labels for Easy Reading: [pdf]
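For comparison, here is a minimal Python sketch of the per-minute approach. The post does not describe how the per-minute stats were scaled or compared, so standardizing each stat across players (z-scores) and using Euclidean distance are assumptions, and the function names are hypothetical; the ratios method is not reproduced here.

```python
from math import sqrt
from statistics import mean, pstdev


def per_minute(totals, minutes):
    """Convert one player's counting stats to per-minute rates."""
    return {stat: value / minutes for stat, value in totals.items()}


def standardize(profiles):
    """Z-score each stat across players so no single stat dominates.

    Assumption: population standard deviations; a stat that is constant
    across players is set to 0 for everyone.
    """
    stats = next(iter(profiles.values())).keys()
    out = {name: {} for name in profiles}
    for stat in stats:
        column = [p[stat] for p in profiles.values()]
        mu, sd = mean(column), pstdev(column)
        for name, p in profiles.items():
            out[name][stat] = (p[stat] - mu) / sd if sd else 0.0
    return out


def closest(profiles, name):
    """Return the player whose standardized per-minute profile is nearest."""
    def dist(a, b):
        return sqrt(sum((a[s] - b[s]) ** 2 for s in a))
    return min((dist(profiles[name], profiles[o]), o)
               for o in profiles if o != name)[1]
```

Under this approach, two players with identical per-minute production are a perfect match (distance 0) even if one played half as many minutes, which is the sense in which per-minute stats capture substitutability.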