Stochastic Discrimination Utilities

This page will ultimately contain links to various things that I have made available to the public (usually under the GPL), but right now it contains only my implementation of Dr. Eugene Kleinberg's Stochastic Discrimination algorithm ("SDUtils"), available here.

From Dr. Kleinberg's abstract to the paper linked below:

Stochastic discrimination is a general methodology for constructing classifiers appropriate for pattern recognition. It is based on combining arbitrary numbers of very weak components, which are usually generated by some pseudorandom process, and it has the property that the very complex and accurate classifiers produced in this way retain the ability, characteristic of their weak component pieces, to generalize to new data. In fact, it is often observed, in practice, that classifier performance on test sets continues to rise as more weak components are added, even after performance on training sets seems to have reached a maximum. This is predicted by the underlying theory, for even though the formal error rate on the training set may have reached a minimum, more sophisticated measures intrinsic to this method indicate that classifier performance on both training and test sets continues to improve as complexity increases.
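To give a rough feel for what "combining arbitrary numbers of very weak components" can look like, here is a toy two-class sketch in Python. It is not taken from SDUtils and is not a faithful rendering of the algorithm in Dr. Kleinberg's papers. The weak components are assumed to be random axis-aligned boxes in the unit cube, and every name in it (`random_box`, `train`, `score`, `classify`, `min_gap`, `max_tries`) is made up for illustration. It keeps only boxes that cover the two training classes at noticeably different rates, and it ignores the uniformity condition that the theory relies on, so individual points (especially ones that few boxes happen to cover) can still be misclassified.

```python
import random

def random_box(dim, rng):
    """A weak component: a random axis-aligned box inside the unit cube."""
    box = []
    for _ in range(dim):
        lo, hi = sorted((rng.random(), rng.random()))
        box.append((lo, hi))
    return box

def covers(box, point):
    """True if the point lies inside the box."""
    return all(lo <= x <= hi for (lo, hi), x in zip(box, point))

def coverage(box, points):
    """Fraction of the given points that fall inside the box."""
    return sum(covers(box, p) for p in points) / len(points)

def train(class1, class2, n_models=2000, min_gap=0.05, seed=0, max_tries=1_000_000):
    """Collect random boxes whose coverage of the two classes differs by at least min_gap."""
    rng = random.Random(seed)
    dim = len(class1[0])
    models = []
    for _ in range(max_tries):
        if len(models) >= n_models:
            break
        box = random_box(dim, rng)
        r1 = coverage(box, class1)
        r2 = coverage(box, class2)
        if abs(r1 - r2) >= min_gap:  # keep only boxes that tell the classes apart a little
            models.append((box, r1, r2))
    return models

def score(models, point):
    """Average of the normalized votes (1_S(q) - r2) / (r1 - r2) over all kept boxes.

    Averaged over the class 1 training points this is 1, and over the
    class 2 training points it is 0, whatever boxes were kept.
    """
    total = 0.0
    for box, r1, r2 in models:
        total += (covers(box, point) - r2) / (r1 - r2)
    return total / len(models)

def classify(models, point):
    """Assign class 1 if the averaged vote is above 0.5, otherwise class 2."""
    return 1 if score(models, point) > 0.5 else 2
```

Calling `train(class1, class2)` on two lists of points with coordinates in [0, 1] and then `classify(models, q)` on a new point `q` is the whole workflow. Each box on its own is a very poor classifier; whatever discriminating power there is comes from averaging many of them.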

The documentation makes no attempt to describe the algorithm. Interested parties should start with the papers available on Dr. Kleinberg's own stochastic discrimination site.

This is the paper that got me started.

If you wind up using this implementation for something cool, I'd be delighted to hear about it. You can find my email address in the documentation.

Good luck.

- David C. Lambert

11 September 2002