The challenge is over (it ended on December 12, 2003), but the website is now open again for people who want to benchmark their systems against the challenge entries.
The aim of this feature selection challenge is to find feature selection algorithms that significantly outperform methods using all features on ALL five benchmark datasets. To make it easy to enter results for all five datasets, every task is a two-class classification problem. The datasets are listed in the table below:
| Dataset | Size | Type | Features | Training Examples | Validation Examples | Test Examples |
|---|---|---|---|---|---|---|
| Arcene | 8.7 MB | Dense | 10000 | 100 | 100 | 700 |
| Gisette | 22.5 MB | Dense | 5000 | 6000 | 1000 | 6500 |
| Dexter | 0.9 MB | Sparse integer | 20000 | 300 | 300 | 2000 |
| Dorothea | 4.7 MB | Sparse binary | 100000 | 800 | 350 | 800 |
| Madelon | 2.9 MB | Dense | 500 | 2000 | 600 | 1800 |
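A common baseline for the aim stated above is a univariate filter: score each feature on its own, keep the top k, and compare a classifier trained on the kept features with one trained on all of them. The sketch below uses the absolute Pearson correlation with the labels as the score. This scoring choice, the function names, and the assumption that labels are coded as +1/-1 are illustrative only; it is not the challenge's prescribed method.

```python
import math


def rank_features_by_correlation(X, y):
    """Return feature indices sorted by decreasing |Pearson correlation| with y.

    X is a list of dense rows (one pattern per row); y is a list of labels,
    assumed here to be coded as +1/-1. Constant features score 0.
    """
    n = len(X)
    n_features = len(X[0])
    y_mean = sum(y) / n
    y_dev = [v - y_mean for v in y]
    y_norm = math.sqrt(sum(d * d for d in y_dev))
    scores = []
    for j in range(n_features):
        col = [row[j] for row in X]
        x_mean = sum(col) / n
        x_dev = [v - x_mean for v in col]
        x_norm = math.sqrt(sum(d * d for d in x_dev))
        if x_norm == 0 or y_norm == 0:
            scores.append(0.0)  # no variance, so no linear relation to y
            continue
        cov = sum(a * b for a, b in zip(x_dev, y_dev))
        scores.append(abs(cov) / (x_norm * y_norm))
    # sorted() is stable, so features with equal scores keep their original order
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def select_columns(X, keep):
    """Project every row onto the feature indices in `keep`."""
    return [[row[j] for j in keep] for row in X]


# Toy usage: feature 0 tracks the labels, feature 1 is constant.
X = [[1.0, 5.0, 0.0], [2.0, 5.0, 1.0], [3.0, 5.0, 0.0], [4.0, 5.0, 1.0]]
y = [-1, -1, 1, 1]
ranking = rank_features_by_correlation(X, y)
X_top1 = select_columns(X, ranking[:1])
```

Pure Python keeps the sketch dependency-free, but for Dorothea (100000 features) a vectorized NumPy version working on sparse rows would be far more practical.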
Now that the challenge is over, you may also download the validation labels.
Information about the datasets that was revealed at the challenge workshop is now also available. (The original participants did not have this information.)
All the datasets share the same format and consist of 5 ASCII files.
* The validation labels are now available and may be used to train systems if desired. They were not available to challenge participants for the Dec 1 results, but they were released for the Dec 8 results.
In all the matrix data formats used, each line represents one pattern.
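The table names three matrix types: dense, sparse integer, and sparse binary. As a hedged illustration, a line-by-line reader could look like the sketch below. The exact layouts are assumptions, not taken from this page: dense rows as whitespace-separated numbers, sparse integer rows as `index:value` pairs, and sparse binary rows as the indices of the non-zero features, with 1-based indices in both sparse cases. All function names are hypothetical.

```python
def parse_dense(line):
    """Dense row (assumed layout): whitespace-separated numeric values."""
    return [float(tok) for tok in line.split()]


def parse_sparse_integer(line):
    """Sparse integer row (assumed layout): 'index:value' pairs, 1-based indices.

    Returns a dict mapping 0-based feature index -> integer value.
    """
    row = {}
    for tok in line.split():
        idx, val = tok.split(":")
        row[int(idx) - 1] = int(val)
    return row


def parse_sparse_binary(line):
    """Sparse binary row (assumed layout): 1-based indices of features equal to 1.

    Returns a dict mapping 0-based feature index -> 1.
    """
    return {int(tok) - 1: 1 for tok in line.split()}


PARSERS = {
    "dense": parse_dense,
    "sparse_integer": parse_sparse_integer,
    "sparse_binary": parse_sparse_binary,
}


def read_matrix(path, kind):
    """Read a whole matrix file, one pattern per non-empty line."""
    parse = PARSERS[kind]
    with open(path) as f:
        return [parse(line) for line in f if line.strip()]


def to_dense(row, n_features):
    """Expand a sparse dict row into a list of length n_features."""
    dense = [0] * n_features
    for j, v in row.items():
        dense[j] = v
    return dense
```

Keeping sparse rows as dicts avoids materializing, for example, 800 x 100000 values for Dorothea; `to_dense` is only meant for small slices.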
If you are a Matlab user, you can download some sample code to read and check the data.
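For readers not using Matlab, a rough Python analogue of such a check could compare each parsed matrix against the counts in the dataset table. This is a hypothetical sketch, not the distributed sample code. It assumes dense rows are lists and sparse rows are dicts mapping 0-based feature index to value.

```python
# Counts copied from the dataset table: (features, train, valid, test).
EXPECTED = {
    "arcene": (10000, 100, 100, 700),
    "gisette": (5000, 6000, 1000, 6500),
    "dexter": (20000, 300, 300, 2000),
    "dorothea": (100000, 800, 350, 800),
    "madelon": (500, 2000, 600, 1800),
}
SPLIT_COLUMN = {"train": 1, "valid": 2, "test": 3}


def check_matrix(name, split, rows):
    """Return a list of problems; an empty list means the matrix looks consistent."""
    n_features = EXPECTED[name][0]
    n_rows = EXPECTED[name][SPLIT_COLUMN[split]]
    problems = []
    if len(rows) != n_rows:
        problems.append(f"{name} {split}: expected {n_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        if isinstance(row, dict):
            # Sparse row: every stored feature index must fall inside the feature range.
            bad = [j for j in row if not 0 <= j < n_features]
            if bad:
                problems.append(f"row {i}: feature index out of range: {bad[0]}")
        elif len(row) != n_features:
            # Dense row: must hold exactly one value per feature.
            problems.append(f"row {i}: expected {n_features} values, got {len(row)}")
    return problems
```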