Feature Selection Challenge
The challenge ended on December 12, 2003, but the web site is now open again for people who want to benchmark their systems against the challenge entries.

The Challenge

The aim of the challenge is to find feature selection algorithms that significantly outperform methods using all features on ALL five benchmark datasets. To make it easier to enter results for all five datasets, all tasks are two-class classification problems. You can download the datasets from the table below:

Dataset   Size     Type            Features  Training Examples  Validation Examples  Test Examples
Arcene    8.7 MB   Dense           10000     100                100                  700
Gisette   22.5 MB  Dense           5000      6000               1000                 6500
Dexter    0.9 MB   Sparse integer  20000     300                300                  2000
Dorothea  4.7 MB   Sparse binary   100000    800                350                  800
Madelon   2.9 MB   Dense           500       2000               600                  1800

Now that the challenge is over, you may also download the validation labels.

Information about the datasets, which was revealed at the challenge workshop, is now available. (The original participants did not have this information.)

Dataset Formats

All the datasets use the same format, and each includes 5 files in ASCII format:

* These labels are now made available and can be used to train systems if desired. They were not initially available to challenge participants to produce the Dec 1 results, but they were made available to produce the Dec 8 results.

The matrix data formats used are (in all cases, each line represents a pattern):

If you are a MATLAB user, you can download some sample code to read and check the data.
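
For readers who do not use MATLAB, the following is a minimal Python sketch of how the three matrix types listed in the table (dense, sparse binary, sparse integer) might be read, with each line representing one pattern. The exact layouts are assumptions, since they are not spelled out here: dense lines are taken to be whitespace-separated feature values, sparse binary lines to be 1-based indices of the nonzero features, and sparse integer lines to be "index:value" pairs with 1-based indices. The function names are hypothetical and this is not the official sample code.

```python
import numpy as np


def read_lines(path):
    # Read a text file into a list of lines, one pattern per line.
    with open(path) as handle:
        return handle.read().splitlines()


def parse_dense(lines):
    # Assumed dense layout: whitespace-separated feature values per line.
    return np.array([[float(v) for v in line.split()] for line in lines])


def parse_sparse_binary(lines, num_features):
    # Assumed sparse binary layout: 1-based indices of the nonzero features.
    X = np.zeros((len(lines), num_features), dtype=np.int8)
    for row, line in enumerate(lines):
        for token in line.split():
            X[row, int(token) - 1] = 1
    return X


def parse_sparse_integer(lines, num_features):
    # Assumed sparse integer layout: "index:value" pairs, 1-based indices.
    X = np.zeros((len(lines), num_features), dtype=np.int64)
    for row, line in enumerate(lines):
        for token in line.split():
            index, value = token.split(":")
            X[row, int(index) - 1] = int(value)
    return X
```

The sparse parsers return dense arrays for simplicity. For a dataset with as many features as Dorothea, a sparse matrix type may be preferable. A quick sanity check after loading is to compare the array shape against the example and feature counts in the table above.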