Decision Combining
Centre for Vision, Speech and Signal Processing,
School of Electronic Eng, Information Tech. and Comp. Science
Univ. of Surrey, Guildford, Surrey
email: T.Windeatt@surrey.ac.uk
email: R.Ghaderi@surrey.ac.uk
Problem: There are five overlapping classes of two-dimensional vectors, each Gaussian distributed with the parameters below, and we want to classify a set containing an equal number of samples from each class.
| class | C1    | C2    | C3    | C4    | C5    |
| mean  | [0,0] | [0,2] | [2,0] | [4,0] | [0,4] |
| Var.  | 1     | 4     | 9     | 16    | 25    |
The classification rate of the Bayes classifier on this problem is 55.75%.
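The problem above can be reproduced approximately with a short Monte Carlo sketch. It assumes that "Var." denotes an isotropic per-axis variance (covariance equal to the variance times the identity) and that the classes have equal priors; neither is stated explicitly in the text, and the function names are illustrative only. Under those assumptions the Bayes decision is simply the class with the highest density at the point.

```python
import math
import random

# Class parameters from the table. Treating "Var." as an isotropic
# per-axis variance (covariance = var * I) is an assumption.
MEANS = [(0, 0), (0, 2), (2, 0), (4, 0), (0, 4)]
VARS = [1, 4, 9, 16, 25]


def sample(rng, n_per_class):
    """Draw n_per_class points from each class; returns (points, labels)."""
    points, labels = [], []
    for k, ((mx, my), var) in enumerate(zip(MEANS, VARS)):
        sd = math.sqrt(var)
        for _ in range(n_per_class):
            points.append((rng.gauss(mx, sd), rng.gauss(my, sd)))
            labels.append(k)
    return points, labels


def log_density(point, mean, var):
    """Log of the 2D isotropic Gaussian density, without the shared 2*pi term."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return -math.log(var) - (dx * dx + dy * dy) / (2.0 * var)


def bayes_predict(point):
    """Equal priors assumed, so pick the class of highest density."""
    scores = [log_density(point, m, v) for m, v in zip(MEANS, VARS)]
    return scores.index(max(scores))


def bayes_rate(n_per_class=4000, seed=0):
    """Monte Carlo estimate of the Bayes classification rate."""
    rng = random.Random(seed)
    points, labels = sample(rng, n_per_class)
    correct = sum(bayes_predict(p) == y for p, y in zip(points, labels))
    return correct / len(labels)


if __name__ == "__main__":
    print(f"estimated Bayes rate: {bayes_rate():.4f}")
```

The estimate depends on the sample size and seed, and it will only approach the figure quoted above if the covariance assumption matches the one used in the original experiments.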
Achieving optimal performance for a pattern
recognition system is not necessarily consistent with obtaining the best
performance for a single classifier. Under appropriate assumptions,
combining multiple classifiers may lead to improved generalisation
performance when compared with any one of the constituent classifiers.
However, certain conditions need to be satisfied to realise the performance
improvement, in particular that the individual classifiers are not too highly
correlated.
Various methods have been devised to reduce the correlation between
classifiers before combining. For unstable classifiers, such as neural
networks and decision trees, a small perturbation in the training set may
lead to a significant change in the constructed classifier, even if
identical feature representations are used. Methods based on perturbing the
training set prior to combining, such as Bagging and Boosting, have been
shown to be effective for these unstable predictors. In Bagging, the
training set is perturbed by bootstrapping (random sampling with
replacement) and the results combined with a majority vote. Boosting also
operates on resampled subsets but is more complex: it constructs a filtered
sequence of classifiers in which previously misclassified patterns become
more likely to be selected, and combines the results with a weighted vote.
Boosting, in particular, has been shown to be very effective for turning a
series of weak learners into a strong learner.
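The Bagging procedure described above (bootstrap the training set, train one classifier per resample, combine by majority vote) can be sketched as follows. The nearest-mean base learner is a hypothetical stand-in chosen only to keep the example short; it is a stable classifier, whereas the text is concerned with unstable ones such as neural networks and decision trees.

```python
import random
from collections import Counter


def bootstrap(data, rng):
    """Random sampling with replacement, same size as the original set."""
    return [rng.choice(data) for _ in data]


def fit_nearest_mean(data):
    """Stand-in base learner: store the mean point of each class."""
    sums = {}
    for (x, y), label in data:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


def predict_nearest_mean(model, point):
    """Return the class whose stored mean is closest to the point."""
    x, y = point
    return min(model, key=lambda c: (x - model[c][0]) ** 2 + (y - model[c][1]) ** 2)


def bagging_fit(data, n_models=11, seed=0):
    """Train one base learner per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    return [fit_nearest_mean(bootstrap(data, rng)) for _ in range(n_models)]


def bagging_predict(models, point):
    """Hard-level combination: each model votes and the majority wins."""
    votes = Counter(predict_nearest_mean(m, point) for m in models)
    return votes.most_common(1)[0][0]
```

Here `data` is a list of `((x, y), label)` pairs. An odd number of models avoids ties in the two-class case.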
In our work we investigate the behaviour of multiple classifiers using
perturbed training sets as the optimal (Bayes) rate is approached. We
propose simple 2D overlapping Gaussian experiments and visualise the
decision boundaries to understand how the Bayes boundary can be
approximated. Reporting generalisation as a comparison with the Bayes
classification rate also supports our conclusions. Boosting and Bagging are
both voting schemes and for some problems achieve impressive results by
combining hard-level classifications (i.e. after the final decision of each
expert is taken). The decision boundaries that we present here are formed by
combining soft-level classifications (i.e. before the final decision is
taken) using a linear weighting scheme, whose weights are learned by a
neural network. A longer-term objective of the research is to characterise
the type of problems and conditions under which hard and soft-level
combination schemes are appropriate.
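The difference between hard-level and soft-level combination can be seen in a small worked example. The expert outputs and the equal weights below are invented for illustration; in the scheme described above the weights are learned by a neural network, not fixed.

```python
def hard_vote(expert_outputs):
    """Majority vote over each expert's own final decision (argmax).

    Ties go to the lowest class index.
    """
    n_classes = len(expert_outputs[0])
    counts = [0] * n_classes
    for out in expert_outputs:
        counts[out.index(max(out))] += 1
    return counts.index(max(counts))


def soft_combine(expert_outputs, weights):
    """Linearly weight the experts' soft outputs, then take one final decision."""
    n_classes = len(expert_outputs[0])
    combined = [
        sum(w * out[c] for w, out in zip(weights, expert_outputs))
        for c in range(n_classes)
    ]
    return combined.index(max(combined)), combined


# Illustrative soft outputs of three experts for one pattern (two classes).
outputs = [
    [0.55, 0.45],  # expert 1 narrowly prefers class 0
    [0.60, 0.40],  # expert 2 narrowly prefers class 0
    [0.05, 0.95],  # expert 3 strongly prefers class 1
]
weights = [1 / 3, 1 / 3, 1 / 3]  # fixed here for illustration only

print(hard_vote(outputs))                  # decision from the majority vote
print(soft_combine(outputs, weights)[0])   # decision from the weighted soft outputs
```

In this example the majority vote selects class 0, because two of the three experts narrowly prefer it, while the soft combination selects class 1, because the third expert's confidence outweighs the other two (combined scores 0.4 against 0.6).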
For a discussion of the difficulties of producing a sound theoretical
framework that incorporates weighting factors for combining multiple
classifiers when a sequential scheme like Boosting is used, see [1]. Until we are able
to provide a more coherent framework for combining with sequential
construction methods, we resort to heuristic techniques for training the
output classifier weights. Outputs of individual classifiers are treated as
new features, and a sequential partitioning technique similar to Boosting is
employed for constructing the experts. So far, in our work we have
restricted the base classifiers to be BackPropagation networks.
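Treating expert outputs as new features for an output classifier can be sketched as below. This covers only the feature-stacking step, not the sequential partitioning used to construct the experts. The single-layer softmax combiner trained by batch gradient descent is an assumption made for brevity and is not necessarily the network used in this work; all names and hyperparameters are illustrative.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def stack_features(expert_outputs):
    """Concatenate the experts' soft outputs into one feature vector."""
    return [v for out in expert_outputs for v in out]


def scores(W, b, x):
    """Linear score of each class for feature vector x."""
    return [sum(w * v for w, v in zip(W[c], x)) + b[c] for c in range(len(b))]


def train_combiner(features, labels, n_classes, lr=0.5, epochs=200):
    """Fit the linear combiner weights by minimising cross-entropy."""
    n_feat = len(features[0])
    W = [[0.0] * n_feat for _ in range(n_classes)]
    b = [0.0] * n_classes
    n = len(features)
    for _ in range(epochs):
        gW = [[0.0] * n_feat for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for x, y in zip(features, labels):
            p = softmax(scores(W, b, x))
            for c in range(n_classes):
                err = p[c] - (1.0 if c == y else 0.0)  # cross-entropy gradient
                gb[c] += err
                for j in range(n_feat):
                    gW[c][j] += err * x[j]
        for c in range(n_classes):
            b[c] -= lr * gb[c] / n
            for j in range(n_feat):
                W[c][j] -= lr * gW[c][j] / n
    return W, b


def combine(W, b, x):
    """Final decision of the combiner for one stacked feature vector."""
    s = scores(W, b, x)
    return s.index(max(s))
```

A typical use is to call `stack_features` on the outputs of all experts for each training pattern, pass the resulting vectors and the true labels to `train_combiner`, and then classify new patterns with `combine`.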
The main advantage of the visualisation for these simple 2D experiments is
that we can see clearly the effects of design parameter changes in a
multiple classifier system. Not only does this increase our understanding of
the system, but in our neural network (gradient descent) experiments it also
aids in developing strategies for achieving local minima with good
generalisation performance.


Design parameters examined:
Number of BackProp hidden nodes, both for individual experts and Oracle
BackProp termination criteria
Training subset sizes for constructing sequential experts
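The kind of decision-region visualisation discussed above can be approximated in text form. The sketch below prints the Bayes decision regions for the five-class problem on a coarse grid, again assuming isotropic per-axis variances and equal priors. The grid limits and resolution are arbitrary choices, and it does not reproduce the boundaries formed by the combined classifiers.

```python
import math

MEANS = [(0, 0), (0, 2), (2, 0), (4, 0), (0, 4)]
VARS = [1, 4, 9, 16, 25]  # assumed isotropic per-axis variances


def bayes_class(x, y):
    """Index of the class with the highest Gaussian density at (x, y)."""
    scores = [
        -math.log(v) - ((x - mx) ** 2 + (y - my) ** 2) / (2.0 * v)
        for (mx, my), v in zip(MEANS, VARS)
    ]
    return scores.index(max(scores))


def region_map(lo=-8.0, hi=12.0, steps=41):
    """Rows of digits 1-5 marking the Bayes decision region at each grid point."""
    step = (hi - lo) / (steps - 1)
    rows = []
    for i in range(steps):
        y = hi - i * step  # top row corresponds to the largest y
        rows.append(
            "".join(str(bayes_class(lo + j * step, y) + 1) for j in range(steps))
        )
    return rows


if __name__ == "__main__":
    print("\n".join(region_map()))
```

Replacing `bayes_class` with the decision function of a trained combiner would give a like-for-like picture of how closely its boundaries follow the Bayes boundary.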