Higgs Boson Project
Overview
The Higgs Boson is a recently verified fundamental particle. The goal of this challenge is to improve the accuracy of recognizing the "tau tau decay" of the Higgs Boson. A labeled dataset is provided, so no knowledge of the physics of Higgs decay is needed. You can find out more about the challenge, and the dataset itself, here.
Questions
This is a well-defined data challenge, and the project will be an exploration of machine learning methods. Nevertheless, there are some interesting questions to consider:
* how important is featurization vs. choice of learning approach? Can featurization be automated?
* can you derive a generative model for the decay process? Does that improve accuracy relative to discriminative methods?
* how much can you improve accuracy with ensemble methods (boosting and bagging)?
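As a starting point for the ensemble question, here is a minimal sketch of bagging in plain Python. It is a toy under stated assumptions, not a recipe for the challenge: the data is synthetic (two Gaussian features, with the label set by the sign of their sum), the base learner is a decision stump chosen purely for brevity, and every name in it (`make_data`, `fit_stump`, `fit_bagged_stumps`, and so on) is hypothetical. It only shows the mechanics of fitting each learner on a bootstrap resample and combining them by majority vote, next to a single stump fitted on the same data.

```python
import random

def make_data(rng, n):
    """Synthetic stand-in for labeled events: label is 1 when x0 + x1 > 0."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]
    y = [1 if row[0] + row[1] > 0 else 0 for row in X]
    return X, y

def stump_predict(stump, row):
    """A stump thresholds one feature; polarity flips which side is class 1."""
    feature, threshold, polarity = stump
    return 1 if polarity * (row[feature] - threshold) > 0 else 0

def fit_stump(X, y):
    """Exhaustively pick the (feature, threshold, polarity) with fewest training errors."""
    best_errors, best_stump = None, None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                errors = sum(stump_predict(stump, r) != t for r, t in zip(X, y))
                if best_errors is None or errors < best_errors:
                    best_errors, best_stump = errors, stump
    return best_stump

def fit_bagged_stumps(X, y, n_estimators, rng):
    """Bagging: fit each stump on a bootstrap resample (drawn with replacement)."""
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def bagged_predict(stumps, row):
    """Majority vote over the ensemble (an odd count avoids ties)."""
    votes = sum(stump_predict(s, row) for s in stumps)
    return 1 if 2 * votes > len(stumps) else 0

def accuracy(predict, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

# Fixed seed so the comparison is repeatable.
rng = random.Random(0)
X_train, y_train = make_data(rng, 100)
X_test, y_test = make_data(rng, 300)

single = fit_stump(X_train, y_train)
ensemble = fit_bagged_stumps(X_train, y_train, n_estimators=11, rng=rng)

single_acc = accuracy(lambda r: stump_predict(single, r), X_test, y_test)
bagged_acc = accuracy(lambda r: bagged_predict(ensemble, r), X_test, y_test)
print(f"single stump: {single_acc:.3f}  bagged stumps: {bagged_acc:.3f}")
```

Whether bagging helps depends on the base learner and the data, so treat the printed numbers as a sanity check of the plumbing rather than as evidence either way. On the real dataset you would swap in a stronger base learner and held-out evaluation.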
The Dataset
The dataset is approximately 195.5 MB, so it fits in memory and can easily be handled on one machine.
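Because the data fits in memory, it can be read in one pass with no streaming or chunking. The sketch below assumes the data is a CSV file with a header row, numeric feature columns, and one label column. The column names (`EventId`, `feat_a`, `feat_b`, `Label`) and the helper `load_events` are hypothetical, so check the real header before relying on them.

```python
import csv
import io

def load_events(handle, label_column="Label"):
    """Read every row from an open CSV text handle into memory.

    Returns (features, labels): a list of {column: float} dicts and a
    parallel list of label strings. `label_column` is an assumed name.
    """
    features, labels = [], []
    for row in csv.DictReader(handle):
        labels.append(row.pop(label_column))
        features.append({name: float(value) for name, value in row.items()})
    return features, labels

# Tiny in-memory stand-in for the real file, with made-up columns and values.
sample = io.StringIO(
    "EventId,feat_a,feat_b,Label\n"
    "1,0.5,1.5,s\n"
    "2,-0.2,0.3,b\n"
)
features, labels = load_events(sample)
print(len(features), "rows loaded; labels:", labels)
```

For a file on disk, pass `open(path, newline="")` as the handle.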
References
This link contains background information as well as a link to the dataset.
Tools
This dataset is small, and it should be possible to process it on most platforms. However, getting competitive performance will probably require a powerful platform.