Datasets for benchmarking and comparative studies
An archive of data set archives may be found on the PRInfo pages.
TC1 intends to collect pointers to existing public data sets that may be
useful for benchmarking and comparative studies. During the MCS 2001 workshop
on combining classifiers, the participants agreed to focus in particular
on data sets and protocols suitable for comparing multiple classifier systems.
Pointers to suggested material will be made available here later.
Some references on benchmarking:
- Z. Zheng, A benchmark for classifier learning. Proceedings of the 6th
Australian Joint Conference on Artificial Intelligence, Singapore: World
Scientific, 281-286, 1993. [ai93-benchmark.ps.gz, 54kB]
- Lutz Prechelt. PROBEN1 -- A Set of Benchmarks and Benchmarking Rules for
Neural Network Training Algorithms. Technical Report 21/94, 38 pages,
Fakultät für Informatik, Universität Karlsruhe, September 1994.
[1994-21.ps.gz] (see also the CiteSeer page)
- D.J. Hand, Construction and Assessment of Classification Rules, Wiley, 1997.
- Michie, D., Spiegelhalter, D.J., and Taylor, C.C., Machine Learning, Neural
and Statistical Classification, Ellis Horwood, New York, 1994.
- R.P.W. Duin, A note on comparing classifiers, Pattern Recognition Letters,
vol. 17, no. 5, 1996, 529-536.
- D.A. Rachkovskij and E.M. Kussul, Datagen - a generator of datasets for
evaluation of classification algorithms, Pattern Recognition Letters, vol. 19, no. 7, 1998, 537-544.
We received the following contributions from MCS participants:
- Daniel Keysers uses the USPS dataset in a number of his papers.
- Paul C. Smits reports experiments with a multispectral data set published
as the electronic annexes of:
  - Giorgio Giacinto, Fabio Roli and Lorenzo Bruzzone, "Combination of
  neural and statistical algorithms for supervised classification of
  remote-sensing images", Pattern Recognition Letters, 21 (5) (2000)
  pp. 385-397.
as well as with hyperspectral remote sensing data, which can be downloaded
from the MultiSpec website maintained by Purdue University, in:
  - P.C. Smits, "Combining Supervised Remote Sensing Image Classifiers
  Based on Individual Class Performances," in: Proceedings of the
  International Workshop on Multiple Classifier Systems (MCS 01), Cambridge,
  UK, July 2-4, 2001. Lecture Notes in Computer Science, vol. 2096,
  Springer, 2001.
- Robert P.W. Duin uses a multi-feature digit dataset in:
  - R.P.W. Duin and D.M.J. Tax, Experiments with Classifier Combining Rules,
  in: J. Kittler, F. Roli (eds.), Multiple Classifier Systems (Proc. First
  International Workshop, MCS 2000, Cagliari, Italy, June 2000), Lecture
  Notes in Computer Science, vol. 1857, Springer, Berlin, 2000, 16-29.
  - A.K. Jain, R.P.W. Duin, and J. Mao, Statistical Pattern Recognition:
  A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence,
  vol. 22, no. 1, 2000, 4-37.
- Emilio Benfenati reports the data used in the EC project COMET, containing
chemicals, descriptors (inputs) and toxicity values (output).
Files: data, 263kB (zipped Excel file); description, 25kB (Word file).