Jakob Manz

Semester Work
Supervisors: Christoph Mayer and Dr. Radu Timofte

Cross Domain Semi-Supervised Learning

A common assumption in semi-supervised learning is that the labeled and unlabeled data sets share a similar distribution, e.g. that the labeled set is sampled from the unlabeled set. In practice, however, this is often not the case, and mismatches may exist between the two sets: labeled and unlabeled data can come from different domains, and/or the label categories present in the two sets may differ. In this thesis, we evaluate how well the state-of-the-art Virtual Adversarial Training (VAT) method operates in the presence of such mismatches. Using a Wide ResNet, we observed a decrease in performance when different digit data sets were combined, e.g. labeled samples from SVHN with unlabeled samples from MNIST. We implemented three modifications of the original method to overcome these limitations: Separate VAT, Split Batch Normalization, and Domain Classification. Whereas Split Batch Normalization failed to improve the test accuracy compared to VAT, Separate VAT reached a similar test error rate (12.8 % vs. 12.7 %). Adding a Domain Classifier to VAT outperformed plain VAT by 1 % in test accuracy in one experiment using unlabeled samples from Syn Numbers and labeled samples from SVHN. Hence, Domain Classification may fail on dissimilar data sets, but it can help to overcome the domain mismatch between labeled and unlabeled data sets with rather similar distributions.
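To make the VAT component concrete, the following is a minimal NumPy sketch of computing the virtual adversarial perturbation with one power-iteration step, as in the original VAT formulation: find the small input perturbation that most increases the KL divergence between the model's prediction at x and at x + r. The toy linear softmax classifier, the parameter names, and the hyperparameter values here are illustrative assumptions for exposition only, not the thesis implementation (which uses a Wide ResNet).

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over a logit vector
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def vat_perturbation(x, W, xi=1e-6, eps=1.0, n_iter=1, seed=0):
    """Approximate the virtual adversarial perturbation for a toy
    linear softmax model p(y|x) = softmax(W @ x).

    xi     : small step used to probe the KL divergence (assumed value)
    eps    : norm of the returned perturbation (assumed value)
    n_iter : number of power-iteration steps (1 is the common choice)
    """
    p = softmax(W @ x)                      # prediction at the clean input
    rng = np.random.default_rng(seed)
    d = rng.standard_normal(x.shape)        # random initial direction
    d /= np.linalg.norm(d)
    for _ in range(n_iter):
        q = softmax(W @ (x + xi * d))       # prediction at the probed point
        # analytic gradient of KL(p || q) w.r.t. the perturbation:
        # grad wrt logits is (q - p), chain rule through W gives W.T @ (q - p)
        g = W.T @ (q - p)
        n = np.linalg.norm(g)
        if n > 0:
            d = g / n                       # power iteration: renormalize
    return eps * d                          # adversarial direction, scaled to eps
```

In the full VAT objective, this perturbation would then be used in a consistency loss KL(p(y|x) || p(y|x + r)) on unlabeled samples; the sketch above only isolates the perturbation step.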