In the era of big data, acquiring samples is rarely a problem in recommendation and advertising, so pseudo-labelling techniques for small-sample learning may seem to be fading from view. In finance, medical imaging and security, however, where labelled samples are precious, pseudo-labelling remains a sharp dagger: simple and effective.
What is pseudo-labelling?
The definition of pseudo-labelling comes from semi-supervised learning, the core idea of which is to improve the performance of a model in a supervised process by drawing on unlabelled data.
As a simple example of semi-supervised learning: I wanted to train a model to diagnose breast cancer from chest radiographs, but the expert charged for every radiograph he labelled. I emptied my wallet and asked him to label 10 radiographs for me. Those 10 images still had to be split into a training set and a test set, so the model was bound to overfit the training set.
I asked the expert whether he also charged for unlabelled chest films. The expert was stunned: no charge, feel free to take them (ignore the issue of patient privacy here; this is just an example). So I took out 1 labelled chest film, exchanged it for 10 unlabelled ones, and slipped out before the specialist could catch his breath.
Back at home, I began the semi-supervised learning process shown in the diagram ~
Roughly speaking, pseudolabeling is a process of using a model trained on the labeled data to make predictions on the unlabeled data, filtering the samples based on the predictions, and feeding them back into the model for training.
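The filtering step can be sketched in a few lines. This is only an illustration: the function name select_confident, the 0.9 threshold and the probability values are assumptions made for the example, not part of any particular library.

```python
def select_confident(probs, threshold=0.9):
    """Return (row_index, pseudo_label) pairs for predictions whose
    highest class probability is at least `threshold`."""
    selected = []
    for i, row in enumerate(probs):
        # The pseudo-label is the class with the highest predicted probability.
        label = max(range(len(row)), key=row.__getitem__)
        if row[label] >= threshold:
            selected.append((i, label))
    return selected


# Illustrative model outputs for three unlabelled samples (two classes).
probs = [[0.97, 0.03], [0.55, 0.45], [0.08, 0.92]]
picked = select_confident(probs, threshold=0.9)
print(picked)
```

The second sample is dropped because the model is undecided about it; only the first and third would be fed back into training with their predicted classes as labels.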
In practice, however, the application of pseudo-labelling is much less straightforward than it sounds, so let’s take a look at how pseudo-labelling is practised.
Specific uses of pseudo-labelling
There is a great deal of freedom in how pseudo-labelling is used; here we introduce the three most common and effective approaches. Specific scenarios may call for fancier methods, and we hope these three serve as a starting point for your own ideas.
Method 1 (single pass):
- Train a supervised model M using labelled data
- Use the supervised model M to make predictions on unlabelled data and obtain the prediction probability P
- Filter high-confidence samples by the prediction probability P
- Train a new model M’ using labelled data as well as pseudo-labelled data
Method 2 (iterative):
- Train a supervised model M using labelled data
- Use the supervised model M to make predictions on unlabelled data and obtain the prediction probability P
- Filter high-confidence samples by the prediction probability P
- Train a new model M’ using labelled data as well as pseudo-labelled data
- Replace M with M’ and repeat the above steps until there is no improvement in model performance
Method 3 (weighted loss):
- Train a supervised model M using labelled data
- Use the supervised model M to make predictions on unlabelled data and obtain the prediction probability P
- Change the model loss function to Loss = loss(labeled_data) + alpha*loss(unlabeled_data)
- Train the new model M’ using both labelled data and pseudo-labelled data
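The first two procedures can be sketched as a single loop. Everything here is an assumption made for illustration: CentroidClassifier is a toy stand-in for a real model, self_train and accuracy are hypothetical helper names, and "no improvement in model performance" is interpreted as no gain in accuracy on a small held-out validation set. With max_rounds=1 the loop reduces to a single pass.

```python
import math


class CentroidClassifier:
    """Toy stand-in for a real model: softmax over negative squared
    distances to each class centroid."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = []
        for c in self.classes_:
            rows = [x for x, label in zip(X, y) if label == c]
            self.centroids_.append([sum(col) / len(rows) for col in zip(*rows)])
        return self

    def predict_proba(self, X):
        out = []
        for x in X:
            scores = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in self.centroids_]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            out.append([e / sum(exps) for e in exps])
        return out


def accuracy(model, X, y):
    preds = [max(zip(p, model.classes_))[1] for p in model.predict_proba(X)]
    return sum(a == b for a, b in zip(preds, y)) / len(y)


def self_train(X_lab, y_lab, X_unlab, X_val, y_val, threshold=0.9, max_rounds=5):
    model = CentroidClassifier().fit(X_lab, y_lab)      # train M on labelled data
    best = accuracy(model, X_val, y_val)
    for _ in range(max_rounds):
        X_pseudo, y_pseudo = [], []
        for x, p in zip(X_unlab, model.predict_proba(X_unlab)):  # probability P
            conf, label = max(zip(p, model.classes_))
            if conf >= threshold:                       # keep high-confidence samples
                X_pseudo.append(x)
                y_pseudo.append(label)
        candidate = CentroidClassifier().fit(X_lab + X_pseudo, y_lab + y_pseudo)  # M'
        score = accuracy(candidate, X_val, y_val)
        if score <= best:                               # no improvement: stop
            break
        model, best = candidate, score                  # replace M with M'
    return model


# Made-up one-dimensional data: two labelled points, four unlabelled ones.
X_lab, y_lab = [[0.0], [2.0]], [0, 1]
X_unlab = [[0.1], [3.0], [3.2], [3.4]]
X_val, y_val = [[0.3], [1.2], [2.5]], [0, 0, 1]

baseline = CentroidClassifier().fit(X_lab, y_lab)
final = self_train(X_lab, y_lab, X_unlab, X_val, y_val)
print(accuracy(baseline, X_val, y_val), accuracy(final, X_val, y_val))
```

On this made-up data the unlabelled points pull the class-1 centroid to the right, which moves the decision boundary and fixes the one validation sample the baseline gets wrong. Real data will rarely behave this neatly, and a validation set this small is only good for demonstration.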
These are the three most common methods of pseudo-labeled learning.
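For the loss-based variant, the combined loss Loss = loss(labeled_data) + alpha*loss(unlabeled_data) might look like the sketch below. It assumes cross-entropy for both terms and takes the pseudo-label to be the class with the highest predicted probability. The function names, the linear ramp for alpha and all numeric values are illustrative assumptions, not prescribed by the text above.

```python
import math


def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the given labels."""
    eps = 1e-12  # avoids log(0)
    return -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / len(labels)


def combined_loss(probs_lab, y_lab, probs_unlab, alpha):
    """Loss = loss(labeled_data) + alpha * loss(unlabeled_data)."""
    # Pseudo-label = class with the highest predicted probability.
    pseudo = [max(range(len(p)), key=p.__getitem__) for p in probs_unlab]
    return cross_entropy(probs_lab, y_lab) + alpha * cross_entropy(probs_unlab, pseudo)


def alpha_ramp(step, start=100, end=600, alpha_max=3.0):
    """Hypothetical schedule: 0 before `start`, linear ramp, alpha_max after `end`."""
    if step < start:
        return 0.0
    if step >= end:
        return alpha_max
    return alpha_max * (step - start) / (end - start)


# Illustrative predicted probabilities for two labelled and two unlabelled samples.
probs_lab = [[0.9, 0.1], [0.2, 0.8]]
y_lab = [0, 1]
probs_unlab = [[0.7, 0.3], [0.4, 0.6]]

loss = combined_loss(probs_lab, y_lab, probs_unlab, alpha=alpha_ramp(350))
print(loss)
```

Keeping alpha small early in training limits the damage from pseudo-labels produced by a model that is still poor; how fast to raise it is a tuning decision.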
In the spirit of knowing not only that something works but also why, the following explains why pseudo-labelling works. Once we understand that, we can identify the scenarios in which it suits semi-supervised learning.
Why pseudo-labels work
The paper Pseudo-Label: The Simple and Efficient Semi-Supervised Learning Method for Deep Neural Networks explains why pseudo-label learning is effective. Its effectiveness can be viewed from two angles, as follows.
Low-Density Separation between Classes
“The goal of semi-supervised learning is to improve generalization performance using unlabeled data. The cluster assumption states that the decision boundary should lie in low-density regions to improve generalization performance (Chapelle et al., 2005). Recently proposed methods of training neural networks using manifold learning such as Semi-Supervised Embedding and Manifold Tangent Classifier utilize this assumption. Semi-Supervised Embedding (Weston et al., 2008) uses embedding-based regularizer to improve the generalization [performance]. Because neighbors of a data sample have similar activations with the sample by embedding-based penalty term, it’s more likely that data samples [in a high-density region have the same label]. Manifold Tangent Classifier (Rifai et al., 2011b) encourages the network output to be insensitive to variations in the directions of low-dimensional manifold. So the same purpose is achieved.”
“Entropy Regularization (Grandvalet et al., 2006) is a means to benefit from unlabeled data in the framework of maximum a posteriori estimation. This scheme favors low density separation between classes without any modeling of the density by minimizing the conditional entropy of class probabilities for unlabeled data.”
According to the cluster assumption, points predicted with high probability usually lie in a high-density region and are more likely to belong to the same class as their neighbours, so their pseudo-labels are highly plausible.
Entropy regularisation is a way of extracting information from unlabelled data within a maximum a posteriori estimation framework. By minimising the conditional entropy of the class probabilities on unlabelled data, it favours low-density separation between classes without any modelling of the density. Entropy regularisation and pseudo-labelling therefore have the same effect: both exploit information about how much the class distributions overlap on the unlabelled data.
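To make the connection concrete, the sketch below computes two quantities on the same predicted probabilities: the average conditional entropy, and the cross-entropy against the argmax pseudo-label. The function names and the probability values are assumptions chosen for illustration.

```python
import math


def mean_entropy(probs):
    """Average conditional entropy, -sum_k p_k log p_k, over samples."""
    return sum(-sum(p * math.log(p) for p in row if p > 0) for row in probs) / len(probs)


def mean_pseudo_label_loss(probs):
    """Average cross-entropy against the argmax pseudo-label, i.e. -log max_k p_k."""
    return sum(-math.log(max(row)) for row in probs) / len(probs)


# Predictions near the decision boundary versus predictions far from it.
ambiguous = [[0.5, 0.5], [0.6, 0.4]]
confident = [[0.95, 0.05], [0.1, 0.9]]

for name, probs in [("ambiguous", ambiguous), ("confident", confident)]:
    print(name, round(mean_entropy(probs), 3), round(mean_pseudo_label_loss(probs), 3))
```

Both quantities shrink as the predictions on unlabelled data become more confident, so minimising either one pushes the decision boundary away from regions where unlabelled samples are dense.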
Theory aside, the first impression pseudo-labelling gives is that it uses high-confidence samples to improve the model's fit. The cluster assumption and entropy regularisation both agree with that intuition, which is what makes the technique feel natural to use.