Heritage Health Prize

My current side project is the Heritage Health Prize, a data mining competition with a $3 million first prize. I’ve teamed up with a much smarter friend and started playing around with new algorithms, in particular Random Forests.

I think we’re in with a chance. I believe that the difference between mathematicians and programmers is that mathematicians try to model the world with equations, whereas programmers use look-up tables, and this competition seems to favour the programmers (like me).

To prevent individual patients from being identified, the dataset has been massaged to conceal rarely-occurring values. So instead of providing the patient’s age as a number, they give us a range, accurate to the nearest decade. Instead of a continuous variable, which could be plugged into an equation, we have nine discrete values, which favours a look-up table.
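To make the look-up table idea concrete, here is a minimal sketch in Python. It is not our competition code: the bucket labels, the target values and the function names are all made up for illustration. It simply averages a target value per age bucket and falls back to the overall average for a bucket it has never seen.

```python
from collections import defaultdict


def build_lookup(rows):
    """Build a table mapping each age bucket to the mean target seen for it.

    `rows` is an iterable of (age_bucket, target) pairs. Also returns the
    overall mean, used as a fallback for buckets missing from the table.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for bucket, target in rows:
        totals[bucket] += target
        counts[bucket] += 1
    table = {bucket: totals[bucket] / counts[bucket] for bucket in totals}
    overall = sum(totals.values()) / sum(counts.values())
    return table, overall


def predict(table, overall, bucket):
    """Look up the prediction for a bucket, falling back to the overall mean."""
    return table.get(bucket, overall)


# Hypothetical rows: (age bucket, target value). Not real competition data.
rows = [("20-29", 0.0), ("20-29", 1.0), ("70-79", 3.0), ("70-79", 5.0)]
table, overall = build_lookup(rows)
```

With only nine possible age values the table has at most nine entries, so no equation needs to be fitted for this variable.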

For the time being we’re limiting our efforts to building generic algorithms rather than focusing on the actual data. There are some inconsistencies in the dataset, and I wouldn’t be surprised if there are changes before the release of the rest of the data on May 4th.

But it looks like a fun competition, and I’m picking up some useful skills along the way.


