Semi-supervised learning
Semi-supervised learning is a class of machine learning techniques that train on both labeled and unlabeled data, typically a small amount of labeled data together with a large amount of unlabeled data. Many machine learning researchers have found that unlabeled data, when used in conjunction with a small amount of labeled data, can produce considerable improvement in learning accuracy. Acquiring labeled data for a learning problem often requires a skilled human agent to classify training examples manually. The cost of labeling may therefore make a fully labeled training set infeasible, whereas unlabeled data is relatively inexpensive to acquire. In such situations, semi-supervised learning can be of great practical value.
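One example of a semi-supervised learning technique is co-training, in which two or more learners are each trained on a set of examples, but with each learner using a different, and ideally independent, set of features (a "view") for each example. Each learner's most confident predictions on unlabeled examples are then added to the training set, so that the learners effectively teach one another.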
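The co-training loop can be illustrated with a minimal sketch. It is a simplified variant, not the exact procedure of Blum and Mitchell: it assumes each example has exactly two views, each a single number, and it uses a nearest-centroid rule as the learner. The names `fit_centroids`, `predict`, `co_train`, and the toy data are hypothetical and chosen only for illustration.

```python
def fit_centroids(xs, ys):
    """Per-class mean of one scalar feature (a deliberately tiny learner)."""
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means


def predict(means, x):
    """Return (label, margin): nearest centroid and its lead over the runner-up.

    Assumes at least two classes are present in `means`.
    """
    ranked = sorted(means, key=lambda c: abs(x - means[c]))
    best, runner_up = ranked[0], ranked[1]
    return best, abs(x - means[runner_up]) - abs(x - means[best])


def co_train(labeled, unlabeled, rounds=3, per_round=1):
    """labeled: list of ((view_a, view_b), label); unlabeled: list of (view_a, view_b)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        for view in (0, 1):
            if not pool:
                break
            # Train this view's learner on the shared labeled set.
            model = fit_centroids([x[view] for x, _ in labeled],
                                  [y for _, y in labeled])
            # Move the examples this learner is most confident about into the
            # labeled set, where the other view's learner will also see them.
            pool.sort(key=lambda x: predict(model, x[view])[1], reverse=True)
            confident, pool = pool[:per_round], pool[per_round:]
            labeled += [(x, predict(model, x[view])[0]) for x in confident]
    # Final models: one per view, trained on true labels plus pseudo-labels.
    models = [fit_centroids([x[v] for x, _ in labeled], [y for _, y in labeled])
              for v in (0, 1)]
    return models, labeled


# Toy data (made up): two labeled examples and six unlabeled ones.
labeled = [((0.0, 0.2), 0), ((4.0, 3.8), 1)]
unlabeled = [(0.5, 0.1), (3.5, 4.2), (1.0, -0.3), (4.5, 3.0), (0.2, 1.2), (3.9, 2.9)]
models, grown = co_train(labeled, unlabeled)
print(len(grown), "training examples after co-training")
```

In this sketch both learners share one growing labeled set, so an example that one view labels confidently becomes training data for the other view. The scheme is only expected to help when the two views are each informative on their own and are not strongly redundant.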
An alternative approach is to model the joint probability distribution of the features and the labels. For the unlabeled data, the labels can then be treated as missing data. It is common to use the expectation-maximization (EM) algorithm to maximize the likelihood of the model.
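As a rough illustration of the generative approach, the sketch below fits one Gaussian per class to a single numeric feature with EM. The Gaussian model, the function names (`gaussian_pdf`, `m_step`, `posterior`, `semi_supervised_em`), the variance floor of `1e-6`, and the toy data are all assumptions made for this example. Labeled points keep their known class throughout, while unlabeled points receive soft class memberships that are re-estimated on every iteration. The sketch assumes every class has at least one labeled example.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def m_step(xs, resp, n_classes):
    """Re-estimate (prior, mean, variance) per class from weighted examples."""
    params = []
    for k in range(n_classes):
        weights = [r[k] for r in resp]
        total = sum(weights)
        mean = sum(w * x for w, x in zip(weights, xs)) / total
        # Small floor keeps the variance strictly positive.
        var = sum(w * (x - mean) ** 2 for w, x in zip(weights, xs)) / total + 1e-6
        params.append((total / len(xs), mean, var))
    return params


def posterior(x, params):
    """P(class | x) under the current model (the E-step for one point)."""
    joint = [prior * gaussian_pdf(x, mean, var) for prior, mean, var in params]
    z = sum(joint)
    return [j / z for j in joint]


def semi_supervised_em(labeled, unlabeled, n_classes=2, iters=50):
    """labeled: list of (x, class_index); unlabeled: list of x."""
    xs = [x for x, _ in labeled] + list(unlabeled)
    # Known labels are fixed one-hot memberships; missing labels start uniform.
    known = [[1.0 if y == k else 0.0 for k in range(n_classes)] for _, y in labeled]
    soft = [[1.0 / n_classes] * n_classes for _ in unlabeled]
    for _ in range(iters):
        params = m_step(xs, known + soft, n_classes)      # M-step on all data
        soft = [posterior(x, params) for x in unlabeled]  # E-step on missing labels
    return m_step(xs, known + soft, n_classes)


# Toy data (made up): one labeled point per class, eight unlabeled points.
labeled = [(0.0, 0), (5.0, 1)]
unlabeled = [0.5, -0.5, 0.2, 4.5, 5.5, 5.2, 4.8, -0.2]
params = semi_supervised_em(labeled, unlabeled)
for k, (prior, mean, var) in enumerate(params):
    print(f"class {k}: prior={prior:.2f} mean={mean:.2f} var={var:.3f}")
```

With only one labeled point per class, the class variances could not be estimated from labeled data alone. In this sketch the unlabeled points supply that information, which is the sense in which treating labels as missing data lets the model use them.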
References
Blum, A., Mitchell, T. Combining labeled and unlabeled data with co-training. COLT: Proceedings of the Workshop on Computational Learning Theory, Morgan Kaufmann, 1998, pp. 92-100.
Chapelle, O., Schölkopf, B., Zien, A. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006 (in press).
Huang, T.-M., Kecman, V., Kopriva, I. Kernel Based Algorithms for Mining Huge Data Sets, Supervised, Semisupervised and Unsupervised Learning. Springer-Verlag, Berlin, Heidelberg, 2006, 260 pp., ISBN 3-540-31681-7.