March Madness is officially upon us, and this year I have a plan to dominate my pool using big data. I hope. I’m going to balance my gut feelings and desires with statistics and … I’m going to hope everyone else doesn’t do the same. A bunch of lawyers should have better things to do with their time, right?
The problem with picking, of course, is that there are so many variables to consider (and I know so little about the teams). Looking at the Final Four alone, my gut tells me to pick chalk—top seeds across the board—because I watch just enough ESPN to know that Kentucky, Syracuse, North Carolina, and Michigan State are really good. But I’m from Wisconsin, so my heart is with the Wisconsin Badgers. I also live in Las Vegas, so I have to root for UNLV, right? And there has to be a sleeper team that creeps into the semifinal round …
Skipping right to the good stuff
My Final Four: No. 1 Kentucky, No. 1 Michigan State, No. 2 Ohio State, and No. 3 Georgetown. My championship game: Kentucky vs. Ohio State (No. 1 vs. No. 2 is the most likely possibility). My winner: Kentucky.
To get there, I worked backward from BracketOdds, a tool created by University of Illinois computer science professor Sheldon Jacobson that gives the probability of any combination of seeds actually making it to any given round of the tournament. For the Final Four, the most likely seed combination is 1, 1, 2, 3, with odds against that outcome of only 16.08 to 1. The odds against all four No. 1 seeds making it are 48.7 to 1, so I’ll take my chances on the scenario that’s nearly three times as likely to happen. I guess a No. 3 seed is my sleeper.
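If you want to check the “three times” math yourself, odds against of X to 1 translate to a probability of 1 / (X + 1). Here’s a minimal Python sketch, assuming the BracketOdds figures quoted above are odds against in that usual X-to-1 sense; the function name is my own invention, not anything from BracketOdds.

```python
def odds_against_to_probability(odds_against: float) -> float:
    """Convert odds against of X to 1 into a probability, 1 / (X + 1)."""
    return 1.0 / (odds_against + 1.0)

# Odds against, as quoted from BracketOdds for the Final Four.
p_1123 = odds_against_to_probability(16.08)  # seeds 1, 1, 2, 3
p_1111 = odds_against_to_probability(48.7)   # all four No. 1 seeds

# How many times more likely the 1, 1, 2, 3 combination is.
ratio = p_1123 / p_1111

print(f"P(1,1,2,3) = {p_1123:.4f}")  # P(1,1,2,3) = 0.0585
print(f"P(1,1,1,1) = {p_1111:.4f}")  # P(1,1,1,1) = 0.0201
print(f"ratio = {ratio:.2f}")        # ratio = 2.91
```

The ratio comes out to about 2.9, which is where the rough “three times” figure comes from. Note that you can’t just divide 48.7 by 16.08 (that gives about 3.03); the “+ 1” in each conversion matters.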
This is where the gut and the heart, plus more data, come into play (I like to think it’s similar in theory to this study, only far less scientific). Three separate statistical rankings of NCAA basketball teams (Georgia Tech’s LRMC Basketball Rankings, the 2012 Pomeroy College Basketball Rankings, and USA Today’s Jeff Sagarin NCAA Basketball Ratings) all place Kentucky, Kansas, Michigan State, and Ohio State, in some order, as the four best teams in the country. The only problem: their NCAA tournament seeds are 1, 2, 1, and 2, respectively, and the 1, 1, 2, 3 combination has room for only one No. 2 seed. One of them has to go.
Kentucky and Michigan State make my Final Four because they are the only No. 1 seeds that rank in the statistical top four. (If you’re wondering why the NCAA Tournament selection committee’s seedings aren’t too reliable, check out this article from Nate Silver of the New York Times.) Choosing between the two No. 2 seeds, Kansas and Ohio State, is a bit trickier. My heart wants me to pick a Big Ten school (Ohio State), but my gut tells me Kansas, which won the national championship as recently as 2008, has the stuff to make a run.
So what do the data say? I haven’t found anyone who has run a prediction model on this year’s tournament yet, so I’ll go with the ratings again (and hope my logic holds up). The two more-accurate rating systems (LRMC and Pomeroy, albeit by LRMC’s own measure) rate Midwest region No. 1 seed North Carolina as a better team than East region No. 1 Syracuse (and even Sagarin has them within one place of each other). Kansas, in the Midwest, would likely have to get past the stronger North Carolina, while Ohio State, in the East, would face the weaker Syracuse, so Ohio State looks like the No. 2 seed with the better chance of making it out of its region. Of course, that also means a No. 3 seed, Georgetown, might have to defeat both No. 2 seed Kansas and No. 1 seed North Carolina to make it out of the Midwest region and into the Final Four.
But I’m fine with taking that risk. Georgetown is ranked higher than East region No. 3 seed Florida State across the board and is at least within spitting distance of being one of the best teams. In this case, the data back up my heart’s desire rather than my gut feeling: Ohio State is my No. 2 seed, which leaves Georgetown as my No. 3.
My earlier-round picks still lean on the ratings and, to a lesser degree, on the probability of upsets (the University of Illinois’s Jacobson points out that No. 12 seeds and No. 11 seeds are equally likely to pull stunners in the first round), but they leave a little more room for desire to play a role. I have my adopted hometown UNLV Runnin’ Rebels losing in the second round to Baylor, while my home-state Wisconsin Badgers make an Elite Eight appearance in my bracket. The No. 12 seed I have running rampant in the Midwest region is California, which ranks fairly high in the ratings and which I assume will win its play-in game on Wednesday.
Feel free to chime in and let me know how foolish (or wise) these picks are. I’m neither a basketball expert nor a statistician, but I’m feeling better than ever about my chances.