Datasets for benchmarking and comparative studies


An archive of data set archives may be found on the PRInfo pages.

TC1 intends to collect pointers to existing public data sets that may be useful for benchmarking and comparative studies. At MCS 2001, the workshop on combining classifiers, the participants agreed to focus in particular on data sets and protocols suitable for comparing multiple classifier systems. Pointers to suggested material will be made available here later.

Some references on benchmarking:
  • Z. Zheng, A benchmark for classifier learning. Proceedings of the 6th Australian Joint Conference on Artificial Intelligence, Singapore: World Scientific, 281-286, 1993. [ai93-benchmark.ps.gz, 54kB]
  • Lutz Prechelt. PROBEN1 -- A Set of Benchmarks and Benchmarking Rules for Neural Network Training Algorithms. [1994-21.ps.gz]. Technical Report 21/94, 38 pages, Fakultät für Informatik, Universität Karlsruhe, September 1994. (see also the CiteSeer page)
  • D.J. Hand, Construction and Assessment of Classification Rules, Wiley, 1997.
  • Michie, D., Spiegelhalter, D.J., and Taylor, C.C., Machine Learning, Neural and Statistical Classification, Ellis Horwood, New York, 1994.
  • R.P.W. Duin, A note on comparing classifiers, Pattern Recognition Letters, vol. 17, no. 5, 1996, 529-536.
  • D.A. Rachkovskij and E.M. Kussul, Datagen - a generator of datasets for evaluation of classification algorithms, Pattern Recognition Letters, vol. 19, no. 7, 1998, 537-544.
We received the following contributions from MCS participants:
  • Daniel Keysers uses the USPS dataset in a number of his papers.
  • Paul C. Smits reports experiments with a multispectral data set published as the electronic annexes of:
    • Giorgio Giacinto, Fabio Roli and Lorenzo Bruzzone, "Combination of neural and statistical algorithms for supervised classification of remote-sensing images", Pattern Recognition Letters, 21 (5) (2000) pp. 385-397.

    as well as with hyperspectral remote sensing data, which can be downloaded from the MultiSpec website maintained by Purdue University, in:

    • P.C. Smits, "Combining Supervised Remote Sensing Image Classifiers Based on Individual Class Performances," in: Proceedings of the International Workshop on Multiple Classifier Systems (MCS 01), Cambridge, UK, July 2-4, 2001. Lecture Notes in Computer Science, vol. 2096, Springer, 2001.

  • Robert P.W. Duin uses a multi-feature digit dataset in:
    • R.P.W. Duin and D.M.J. Tax, Experiments with Classifier Combining Rules, in: J. Kittler, F. Roli (eds.), Multiple Classifier Systems (Proc. First International Workshop, MCS 2000, Cagliari, Italy, June 2000), Lecture Notes in Computer Science, vol. 1857, Springer, Berlin, 2000, 16-29.
    • A.K. Jain, R.P.W. Duin, and J. Mao, Statistical Pattern Recognition: A Review, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, 2000, 4-37.

  • Emilio Benfenati reports the data used in the EC project COMET, containing chemicals, descriptors (inputs) and toxicity values (output): data, 263kB (zipped Excel file); description, 25kB (Word file).