The challenge is over: it ended on December 12, 2003. The website is now open again for people who want to benchmark their systems against the challenge entries.
The results are evaluated according to the following performance measures.
The results for a classifier can be represented in a confusion matrix, where a, b, c, and d are the numbers of examples falling into each possible outcome:
| Truth \ Prediction | Class -1 | Class +1 |
|---|---|---|
| Class -1 | a | b |
| Class +1 | c | d |
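As an illustration, the four counts can be tallied from true and predicted labels. This is a minimal sketch that assumes labels are given as Python lists of -1 and +1; the function name `confusion_counts` is hypothetical and not part of the challenge's scoring software.

```python
def confusion_counts(y_true, y_pred):
    """Count (a, b, c, d) for labels in {-1, +1}.

    a: truth -1, predicted -1    b: truth -1, predicted +1
    c: truth +1, predicted -1    d: truth +1, predicted +1
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {(-1, -1): 0, (-1, 1): 0, (1, -1): 0, (1, 1): 0}
    for truth, pred in zip(y_true, y_pred):
        # A label outside {-1, +1} raises KeyError here.
        counts[(truth, pred)] += 1
    return counts[(-1, -1)], counts[(-1, 1)], counts[(1, -1)], counts[(1, 1)]


y_true = [-1, -1, -1, 1, 1]
y_pred = [-1, 1, -1, 1, -1]
print(confusion_counts(y_true, y_pred))  # (2, 1, 1, 1)
```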
The balanced error rate (BER) is the average of the error rates on the two classes: BER = 0.5*(b/(a+b) + c/(c+d)).
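The formula translates directly into code. The sketch below (function names are illustrative, not taken from the challenge software) also computes the ordinary error rate so the two can be compared on an imbalanced example.

```python
def balanced_error_rate(a, b, c, d):
    """BER = 0.5 * (b/(a+b) + c/(c+d)), using the confusion-matrix counts."""
    error_neg = b / (a + b)  # fraction of class -1 examples predicted +1
    error_pos = c / (c + d)  # fraction of class +1 examples predicted -1
    return 0.5 * (error_neg + error_pos)


def plain_error_rate(a, b, c, d):
    """Ordinary error rate, for comparison: misclassified / total."""
    return (b + c) / (a + b + c + d)


# Imbalanced example: 100 examples of class -1, 20 of class +1.
print(balanced_error_rate(95, 5, 10, 10))
print(plain_error_rate(95, 5, 10, 10))
```

Here BER = 0.5*(5/100 + 10/20) = 0.275, while the plain error rate is only 15/120 = 0.125: averaging per-class errors keeps the large, well-classified class -1 from hiding the poor performance on class +1.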
The area under the curve (AUC) is defined as the area under the ROC curve. Equivalently, it is the area under the curve obtained by plotting a/(a+b) against d/(c+d) as the threshold on the confidence values varies, starting at (0,1) and ending at (1,0). This area is computed with the trapezoid method. When no confidence values are supplied with the classification, the curve reduces to the three points {(0,1), (d/(c+d), a/(a+b)), (1,0)}; its trapezoid area is 0.5*(a/(a+b) + d/(c+d)), so AUC = 1 - BER.
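The curve construction and trapezoid rule can be sketched as follows. This is an illustration under stated assumptions, not the challenge's official scorer: labels are -1/+1, a larger confidence value means "more likely +1", examples with tied scores cross the threshold together, and the name `auc_trapezoid` is hypothetical.

```python
def auc_trapezoid(y_true, scores):
    """Area under the curve of a/(a+b) (y-axis) against d/(c+d) (x-axis).

    Labels are in {-1, +1}; a larger score means "more confident of +1".
    Both classes must be present.
    """
    n_neg = sum(1 for t in y_true if t == -1)  # a + b
    n_pos = sum(1 for t in y_true if t == 1)   # c + d
    # Sweep the threshold from above the highest score downward; each step
    # moves examples from the -1 prediction to the +1 prediction.
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points = [(0.0, 1.0)]  # everything predicted -1: d = 0, b = 0
    d = b = 0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                d += 1
            else:
                b += 1
            i += 1
        # x = d/(c+d), y = a/(a+b) = 1 - b/(a+b)
        points.append((d / n_pos, 1.0 - b / n_neg))
    # Trapezoid rule; the last point is (1, 0), everything predicted +1.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


print(auc_trapezoid([-1, -1, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # 1.0
# Without confidence values, the +/-1 predictions act as the scores.
print(auc_trapezoid([-1, -1, -1, 1, 1], [-1, 1, -1, 1, -1]))
```

The second call yields 7/12 (about 0.583). Its confusion counts are a=2, b=1, c=1, d=1, giving BER = 5/12, so the result matches AUC = 1 - BER for predictions without confidence values.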
The fraction of features is simply the ratio of the number of features used by the classifier to the total number of features in the dataset.
Additional features, termed probes, were added to the original datasets; their distributions are similar to those of the original features. The fraction of probes is simply the ratio of the number of probes used by a classifier to the total number of features it uses.
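Both feature-based measures can be computed from the set of features a classifier uses. This sketch assumes the selection is given as a list of column indices and that the probe columns are known by index; both input formats and the name `feature_fractions` are hypothetical. Note that the fraction of probes is divided by the number of features used, not by the number of probes.

```python
def feature_fractions(selected, n_features_total, probe_indices):
    """Return (fraction of features, fraction of probes) for one classifier.

    selected:      indices of the features the classifier uses
    probe_indices: indices of the columns that are probes
    """
    used = set(selected)
    if not used:
        raise ValueError("the fraction of probes is undefined when no features are used")
    fraction_features = len(used) / n_features_total
    # Denominator is the number of features *used*, not the number of probes.
    fraction_probes = len(used & set(probe_indices)) / len(used)
    return fraction_features, fraction_probes


# Hypothetical dataset: 10 columns, of which columns 6-9 are probes.
print(feature_fractions([0, 1, 2, 7, 9], 10, range(6, 10)))  # (0.5, 0.4)
```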