Decision Combining



Dynamic Weighting Factors for Decision Combining


Terry Windeatt

Reza Ghaderi

Centre for Vision, Speech and Signal Processing,
School of Electronic Eng, Information Tech. and Comp. Science
Univ. of Surrey, Guildford, Surrey

email: T.Windeatt@surrey.ac.uk

email: R.Ghaderi@surrey.ac.uk


Problem: There are five classes of overlapping two-dimensional vectors with Gaussian distributions whose parameters are given below, and we want to classify a set with an equal number of samples from each class.

class      C1      C2      C3      C4      C5
mean       [0,0]   [0,2]   [2,0]   [4,0]   [0,4]
variance   1       4       9       16      25

The classification rate of the Bayes classifier is 55.75%.
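The Bayes rate quoted above can be estimated by Monte Carlo. The minimal sketch that follows assumes each class is an isotropic Gaussian with covariance (variance) × I, since the table gives only one scalar variance per class, and equal priors. With equal priors the Bayes rule picks the class with the largest log-density, which in 2D is -||x - m||² / (2v) - log v up to a shared constant. The names `sample_classes` and `bayes_predict` are hypothetical.

```python
import numpy as np

# Class parameters from the table; ASSUMPTION: isotropic covariance var * I.
MEANS = np.array([[0, 0], [0, 2], [2, 0], [4, 0], [0, 4]], dtype=float)
VARS = np.array([1, 4, 9, 16, 25], dtype=float)


def sample_classes(n_per_class, rng):
    """Draw an equal number of 2D samples from each of the five classes."""
    X = np.concatenate([
        rng.normal(loc=m, scale=np.sqrt(v), size=(n_per_class, 2))
        for m, v in zip(MEANS, VARS)
    ])
    y = np.repeat(np.arange(len(MEANS)), n_per_class)
    return X, y


def bayes_predict(X):
    """Equal priors: return the index of the class with the largest log-density."""
    d2 = ((X[:, None, :] - MEANS[None, :, :]) ** 2).sum(axis=2)  # shape (n, 5)
    # log N(x; m, v I) in 2D = -d2 / (2v) - log(v) - log(2*pi); the constant is dropped.
    scores = -d2 / (2 * VARS) - np.log(VARS)
    return scores.argmax(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = sample_classes(20000, rng)
    print(f"Monte Carlo estimate of the Bayes rate: {np.mean(bayes_predict(X) == y):.4f}")
```

If the isotropic assumption matches the original set-up, the estimate should land near the quoted 55.75%; if the original used a different covariance structure, it will not.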


Achieving optimal performance for a pattern recognition system is not necessarily consistent with obtaining the best performance from a single classifier. Under appropriate assumptions, combining multiple classifiers may lead to improved generalisation performance compared with any one of the constituent classifiers. However, certain conditions need to be satisfied to realise this improvement, in particular that the individual classifiers not be too highly correlated.
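A standard variance argument (a general result for averaging, not taken from this work) illustrates why correlation matters. Suppose M experts produce soft outputs f_1, ..., f_M for the same input, each with variance σ² and common pairwise correlation ρ. The variance of their simple average is then

$$
\operatorname{Var}\Big(\frac{1}{M}\sum_{i=1}^{M} f_i\Big)
= \frac{1}{M^2}\Big(M\sigma^2 + M(M-1)\rho\sigma^2\Big)
= \rho\sigma^2 + \frac{(1-\rho)\sigma^2}{M}.
$$

As M grows the second term vanishes but the first does not, so adding experts helps little when ρ is close to 1.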

Various methods have been devised to reduce the correlation between classifiers before combining. For unstable classifiers, such as neural networks and decision trees, a small perturbation in the training set may lead to a significant change in the constructed classifier, even if identical feature representations are used. Methods based on perturbing the training set prior to combining, such as Bagging and Boosting, have been shown to be effective for these unstable predictors. In Bagging, the training set is perturbed by bootstrapping (random sampling with replacement) and the results are combined by majority vote. Boosting also operates on random subsets but is more complex: it constructs a filtered sequence of classifiers in which the probability of selecting previously misclassified patterns is increased, and the classifiers are then combined by weighted vote. Boosting, in particular, has been shown to be very effective at turning a series of weak learners into a strong learner.
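In Bagging each expert is trained on one bootstrap replicate, and the experts' hard labels are combined by majority vote. The minimal sketch that follows uses a 1-nearest-neighbour base learner purely as an assumed stand-in for the unstable learners discussed here (it is not the BackProp network used in this work); the function names are hypothetical.

```python
import numpy as np


def bootstrap_indices(n, rng):
    """Draw n indices uniformly with replacement: one bootstrap replicate."""
    return rng.integers(0, n, size=n)


def one_nn_predict(X_train, y_train, X_test):
    """Stand-in base learner: label of the nearest training point."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d2.argmin(axis=1)]


def majority_vote(predictions, n_classes):
    """Combine hard labels; predictions has shape (n_experts, n_samples).

    Ties go to the lowest class index, because argmax returns the first maximum.
    """
    n_samples = predictions.shape[1]
    counts = np.zeros((n_classes, n_samples), dtype=int)
    for row in predictions:
        counts[row, np.arange(n_samples)] += 1
    return counts.argmax(axis=0)


def bagging_predict(X, y, X_test, n_experts, n_classes, seed=0):
    """Train each expert on its own bootstrap replicate, then take a majority vote."""
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_experts):
        idx = bootstrap_indices(len(X), rng)
        preds.append(one_nn_predict(X[idx], y[idx], X_test))
    return majority_vote(np.array(preds), n_classes)
```

Using an odd number of experts avoids ties in two-class problems; with more classes, ties fall back to the lowest class index.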

In our work we investigate the behaviour of multiple classifiers trained on perturbed training sets as the optimal (Bayes) rate is approached. We propose simple 2D overlapping Gaussian experiments and visualise the decision boundaries to understand how the Bayes boundary can be approximated. Reporting generalisation performance relative to the Bayes classification rate also supports our conclusions. Boosting and Bagging are both voting schemes and, for some problems, achieve impressive results by combining hard-level classifications (i.e. after each expert has taken its final decision). The decision boundaries that we present here are instead formed by combining soft-level classifications (i.e. before the final decision is taken) using a linear weighting scheme whose weights are learned by a neural network. A longer-term objective of the research is to characterise the types of problem and the conditions under which hard-level and soft-level combination schemes are appropriate.
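Soft-level linear combination can be written as a weighted sum of the experts' class-probability outputs, followed by an argmax. The minimal sketch that follows assumes one scalar weight per expert and learns those weights with a single-layer softmax network trained by batch gradient descent on cross-entropy. This is an illustrative choice; the network, loss and training details used in this work are not specified here, and the function names are hypothetical.

```python
import numpy as np


def combine_soft(P, w):
    """Linear soft-level combination.

    P: expert outputs, shape (n_experts, n_samples, n_classes).
    w: one weight per expert, shape (n_experts,).
    Returns combined class scores, shape (n_samples, n_classes).
    """
    return np.tensordot(w, P, axes=1)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def learn_weights(P, y, n_iter=500, lr=0.5):
    """Fit expert weights by gradient descent on softmax cross-entropy."""
    n_experts, n_samples, n_classes = P.shape
    Y = np.eye(n_classes)[y]                 # one-hot targets
    w = np.full(n_experts, 1.0 / n_experts)  # start from a plain average
    for _ in range(n_iter):
        S = softmax(combine_soft(P, w))
        G = (S - Y) / n_samples              # d(mean loss) / d(scores)
        grad = np.einsum("sk,esk->e", G, P)  # chain rule through the weighted sum
        w -= lr * grad
    return w
```

A hard-level scheme would instead take `P.argmax(axis=2)` for each expert before combining, discarding the confidence information that the weighted sum uses.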

For the difficulties of producing a sound theoretical framework that incorporates weighting factors for combining multiple classifiers when a sequential scheme such as Boosting is used, see [1]. Until we are able to provide a more coherent framework for combining with sequential construction methods, we resort to heuristic techniques for training the output classifier weights. The outputs of the individual classifiers are treated as new features, and a sequential partitioning technique similar to Boosting is employed to construct the experts. So far, we have restricted the base classifiers to BackPropagation networks.
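The sequential construction can be sketched as follows. The exact filtering rule used in this work is not given here, so this minimal sketch assumes a simple Boosting-like heuristic: after each expert is built, the sampling probability of every misclassified pattern is multiplied by a hypothetical `factor`, and the next training subset is drawn from the updated distribution. A 1-nearest-neighbour learner again stands in for the BackProp networks; all names are hypothetical.

```python
import numpy as np


def one_nn_predict(X_train, y_train, X_test):
    """Stand-in base learner (1-nearest neighbour), NOT a BackProp network."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    return y_train[d2.argmin(axis=1)]


def update_sampling_weights(weights, misclassified, factor=2.0):
    """Raise the selection probability of misclassified patterns, then renormalise."""
    w = weights * np.where(misclassified, factor, 1.0)
    return w / w.sum()


def sequential_subsets(X, y, n_experts, subset_size, factor=2.0, seed=0):
    """Build experts one at a time on subsets drawn from a shifting distribution."""
    rng = np.random.default_rng(seed)
    weights = np.full(len(X), 1.0 / len(X))  # start uniform over all patterns
    experts = []
    for _ in range(n_experts):
        idx = rng.choice(len(X), size=subset_size, replace=True, p=weights)
        expert = (X[idx], y[idx])  # a 1-NN "expert" is just its training subset
        experts.append(expert)
        wrong = one_nn_predict(*expert, X) != y  # errors on the full training set
        weights = update_sampling_weights(weights, wrong, factor)
    return experts, weights
```

The soft outputs of the resulting experts would then be the new features on which the combining weights are trained.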

The main advantage of visualisation in these simple 2D experiments is that we can clearly see the effects of changing design parameters in a multiple classifier system. Not only does this increase our understanding of the system, but in our neural network (gradient descent) experiments it also helps in developing strategies for reaching local minima that give good generalisation performance.
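Decision boundaries in 2D can be visualised by labelling a dense grid and marking cells whose label differs from a neighbour. The minimal sketch that follows assumes any classifier exposed as a function from an (n, 2) array to integer labels; it also measures how often two classifiers (for example, the Bayes rule and a combined system) disagree over the grid. The function names are hypothetical, and the resulting arrays could be rendered with a plotting library such as matplotlib.

```python
import numpy as np


def grid_labels(predict, xlim, ylim, n=200):
    """Label an n-by-n grid; rows follow y and columns follow x."""
    xs = np.linspace(*xlim, n)
    ys = np.linspace(*ylim, n)
    gx, gy = np.meshgrid(xs, ys)
    pts = np.column_stack([gx.ravel(), gy.ravel()])
    return predict(pts).reshape(n, n)


def boundary_mask(labels):
    """True where a grid cell's label differs from its right or lower neighbour."""
    mask = np.zeros(labels.shape, dtype=bool)
    mask[:, :-1] |= labels[:, :-1] != labels[:, 1:]
    mask[:-1, :] |= labels[:-1, :] != labels[1:, :]
    return mask


def disagreement(predict_a, predict_b, xlim, ylim, n=200):
    """Fraction of grid points on which two classifiers assign different labels."""
    la = grid_labels(predict_a, xlim, ylim, n)
    lb = grid_labels(predict_b, xlim, ylim, n)
    return np.mean(la != lb)
```

Note that grid disagreement weights all regions equally, unlike a classification rate, which weights regions by the data density.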




  

  

Figure: Comparison of the Bayes boundary with the decision boundary obtained from a weighted combination of multiple neural network classifiers trained on sequentially filtered random subsets of the training data.


A set of Matlab files is available (from R.Ghaderi@surrey.ac.uk) for viewing decision boundaries as the following parameters are varied:

  Number of BackProp hidden nodes for both the individual experts and the Oracle

  BackProp termination criteria

  Training subset sizes for constructing the sequential experts
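A sweep over these parameters can be enumerated as a Cartesian product. The minimal sketch that follows is hypothetical throughout: the field names are invented, a maximum epoch count is assumed as one possible BackProp termination criterion, and the Oracle is assumed to be the combining network. It does not describe the interface of the MATLAB files.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class SweepConfig:
    hidden_expert: int  # BackProp hidden nodes in each individual expert
    hidden_oracle: int  # BackProp hidden nodes in the Oracle (assumed combiner)
    max_epochs: int     # ASSUMED termination criterion: stop after this many epochs
    subset_size: int    # training subset size for each sequential expert


def sweep(hidden_expert, hidden_oracle, max_epochs, subset_size):
    """Return one configuration per combination of the supplied parameter values."""
    return [
        SweepConfig(*combo)
        for combo in product(hidden_expert, hidden_oracle, max_epochs, subset_size)
    ]
```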