Tag Archives: proximity

Plotting the colors

Here is my contribution to a growing “literature” on Mechanical Turk color-naming: Dolores Labs paid MechaTurks to apply labels to 10,000 color swatches, and offers a cool color explorer (hat tip to Infosthetics). They generously made the data available, and some public-minded soul cleaned it up. A Mr. Wattenberg took the first shot at aesthetic presentation, and as FlowingData has noted, his version is an improvement. Neoformix produced an even more kicked-up version. Thus concludes my lit review…

I took the cleaned-up data and narrowed it down to those color names applied at least twice. For each unique label, I found the mean RGB values and computed a distance matrix based on each color name’s characteristics. I then fed this distance matrix into the two newest ways of visualizing the Dolores Labs Mechanical Turk Color Data (DLMTCD): network and cluster diagrams!
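For anyone who wants to try the same steps, here is a minimal sketch in Python. It is not the code behind the diagrams: the (label, r, g, b) row format, the function names, and the choice of Euclidean distance between mean RGB values are all assumptions made for illustration, and the sample rows are toy values rather than the real data.

```python
from collections import defaultdict
from math import dist


def mean_rgb_by_label(rows, min_count=2):
    """Average the RGB values per label, keeping labels applied at least min_count times.

    `rows` is assumed to be an iterable of (label, r, g, b) tuples.
    """
    groups = defaultdict(list)
    for label, r, g, b in rows:
        groups[label].append((r, g, b))

    means, counts = {}, {}
    for label, rgbs in groups.items():
        n = len(rgbs)
        if n >= min_count:
            means[label] = tuple(sum(rgb[i] for rgb in rgbs) / n for i in range(3))
            counts[label] = n  # could later drive vertex size
    return means, counts


def distance_matrix(means):
    """Pairwise Euclidean distance between mean colors (an assumed metric)."""
    labels = sorted(means)
    matrix = [[dist(means[a], means[b]) for b in labels] for a in labels]
    return labels, matrix


# Toy rows, not taken from the real data set.
rows = [
    ("red", 250, 10, 10),
    ("red", 230, 30, 30),
    ("blue", 10, 10, 250),
    ("blue", 30, 30, 230),
    ("teal", 0, 128, 128),  # applied only once, so it is dropped
]
means, counts = mean_rgb_by_label(rows)
labels, matrix = distance_matrix(means)
```

The resulting matrix is symmetric with zeros on the diagonal, which is the shape most clustering and graph-layout tools expect as input.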

DLMTCD cluster diagram [pdf]

Network with white background, different algorithm [pdf]

DLMTCD network diagram [pdf]

The size of each vertex is a function of the number of times that color label was applied. Note that the network diagram employs transparency for easy reading, so the color you see is not exactly the color presented to the Mechanical Turk workers. The nice thing about the pdf format is that you can zoom in and out and pan around as much as you want, and ctrl-f allows you to find any term in the network. Let me know what you think in the comments, and many thanks to Dolores Labs.

MLB Batter network diagram by statistical proximity

The next in this series covers MLB batters from 1955 to 2007 (the modern set of statistics has not changed since the 1955 season). I think these statistics lend themselves less well to this sort of analysis, but it may be interesting to you baseball enthusiasts out there.

Batter Statistical Proximity [pdf]

NBA season network diagram

It was suggested that I compare players on single-season data, rather than career-sum data, both as a validity test and to gain other insight. It goes without saying that players’ styles change over their careers: often, scorers become less effective and try to do other things well. Sometimes (as with Jordan, for example), we see players add dimensions to their game over time. So, I present yet another network diagram, one which illustrates the changing nature of each player. A few notes: this set is somewhat scorer-heavy, because of the way I generated the list of best seasons (using a Euclidean distance metric). Also, when looking at this, it helps to keep in mind that this is a two-dimensional rendering of a high-dimensional network. Unless players are actually connected, visual proximity doesn’t necessarily mean anything, although it may not mean nothing. It would appear, given the degree to which players’ seasons cluster together, that the proximity algorithm functions fairly well.

NBA Seasons Proximity Network [PDF]

Toward a basketball taxonomy

I hope this isn’t getting repetitive, because I’ve got a diagram that will blow your mind: it’s like the entire NBA in a petri dish, with all different phyla and genera of player types represented. I used the same methodology I’ve been using (with the per-minute statistics rather than the ratio statistics), but generated the graph with fewer connections (just the single closest match) per player. As a result, there are a whole lot of isolated clusters instead of one completely interconnected network. Also, I went ahead and did 1,000 players at once, instead of the standard 250. What I got astounded me: they look like microorganisms swimming around on the microscope slide that is the NBA. I apologize for the tiny font. If you zoom in to 125%, it should be readable, but had I made the names any larger, they would have overlapped to an illegible degree.

The NBA “petri dish” diagram [pdf]
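To see why one connection per player breaks the graph into separate clusters, here is a rough sketch in Python. It is an illustration under assumptions, not the code that produced the diagram: the function names are hypothetical, Euclidean distance stands in for whatever proximity measure you prefer, and the points are abstract toy vectors rather than real player statistics.

```python
from math import dist


def nearest_neighbor_edges(points):
    """Link each item to its single closest other item (Euclidean distance)."""
    edges = set()
    for name, vec in points.items():
        closest = min(
            (other for other in points if other != name),
            key=lambda other: dist(vec, points[other]),
        )
        # frozenset makes A-B and B-A the same undirected edge
        edges.add(frozenset((name, closest)))
    return edges


def connected_components(nodes, edges):
    """Group nodes into clusters with a small union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for edge in edges:
        a, b = tuple(edge)
        parent[find(a)] = find(b)

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), set()).add(n)
    return sorted(sorted(c) for c in clusters.values())


# Toy 2-D vectors standing in for per-minute stat lines.
points = {"A": (0, 0), "B": (1, 0), "C": (10, 10), "D": (11, 10), "E": (10, 12)}
edges = nearest_neighbor_edges(points)
clusters = connected_components(points, edges)
```

With these toy points, A and B pair off with each other while C, D, and E form a second group, so no edge ever bridges the two clusters. Adding more connections per item is what eventually merges everything into one network.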

I would be very interested in collectively coming up with a sort of “baller’s taxonomy,” wherein we try to identify the different clusters using some more subjective terms. I think we could come up with a better vocabulary to describe players and define playing styles. If you have any ideas, please put them in the comments, and if there is sufficient interest, I may come up with a more formalized process, in the hopes of putting together a follow-up diagram with labels.


Since I had already run the algorithm anyway (it takes a lot of cycles to do 1,000 players), I went ahead and made a completely connected version of the 1,000 player diagram. Warning: this one is pretty hard to parse.

1000 player network diagram [pdf]

Keep in mind that the search function (ctrl-f) will be really useful for these.