Challenge Results

Rank  Team name   Top-5 Accuracy (%)
1     Vibranium   79.25
2     Overfit     75.30
3     ACRV_ANU    69.56
4     EBD_birds   69.44
5     INFIMIND    68.74
6     CMIC        61.14

Team Information

Team name Team member Method description
Vibranium Jianfeng Zhu, Lele Cheng, Ying Su, Lin Ye, Pengcheng Yuan, Shumin Han, Jia Li


Beihang University.
Our entry to the large-scale web image classification challenge is based on ResNeXt, DPN, SENet, and related architectures. The key improvements are as follows:
1. Class-weighted loss. To handle the class-imbalance problem, we use a class-weighted loss in which each class weight is inversely proportional to the class frequency in the training dataset.
2. Cluster-weighted sampling. Balanced sampling is used to adapt to the imbalanced challenge dataset. To tackle uneven image quality, we further design a cluster-weighted sampling strategy: features of training instances are extracted and clustered, and within every cluster each instance is assigned a sampling weight according to the cluster's average image quality. Images with higher sampling weights are sampled more often than images with lower weights.
3. Instance-weighted sampling. Besides the clustering method, we also trained a model to assign a sampling weight to each image. The model learns the correlation between query word vectors and image features using lower-ranked Google images. We find that good-quality images have higher correlation scores while noisy images do not, so we use the correlation score as the sampling weight of each training instance.
4. Multi-instance learning (MIL). By formulating noisy image classification as a MIL problem, we present a bag-plus-instance loss function given bag-level and instance-level supervision.
5. Ensemble. Multi-scale and multi-crop testing are used, and we design probability-based and non-probability-based class-level voting strategies to ensemble our 29 models.
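The class-weighted loss in point 1 can be illustrated with a small sketch. This is not the team's code; it is a minimal NumPy illustration, assuming softmax outputs are already available, of weights set inversely proportional to class frequency and applied inside a cross-entropy loss:

```python
import numpy as np

def class_weights(class_counts):
    """Inverse-frequency class weights, normalized so the mean weight is 1.
    Rare classes get weights above 1, frequent classes below 1."""
    counts = np.asarray(class_counts, dtype=float)
    w = 1.0 / counts
    return w / w.mean()

def weighted_cross_entropy(probs, labels, weights):
    """Mean class-weighted cross-entropy over a batch of softmax outputs."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    per_sample = -np.log(probs[np.arange(len(labels)), labels])
    return float(np.mean(weights[labels] * per_sample))
```

With two classes of 100 and 10 training images, the rare class's loss is weighted ten times more heavily than the frequent class's, counteracting the imbalance.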
Overfit Shengju Qian (1), Qing Lian (1), Wayne Wu (2), Fumin Shen (1), Chen Qian (2), Heng Tao Shen (1)

(1) University of Electronic Science and Technology of China (UESTC)
(2) SenseTime Research
Our method is mainly based on the squeeze-and-excitation structure on ResNeXt-101, and we have also explored the performance of various Inception-based and ResNet-based models, including Inception-v3, DenseNet-161, DenseNet-169, ResNet-101, and so on. The final result ensembles the five selected models above. We use a center crop during inference. The training phase consists of three stages. In the first stage, we use all noisily labeled data to train a 'coarse' network. In the second stage, samples are selected as clean when the confidence value the network produces exceeds a manually set threshold, and only those 'clean' samples are used to train a 'fine' network. In the third stage, we again use all images to train the network, which can introduce some useful noise that makes the network more robust. Different learning rates for the convolutional layers and the FC layers are used, which apparently improves performance.
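The second-stage selection step can be sketched as follows. This is an illustrative NumPy snippet, not the team's code; the threshold value 0.9 is a hypothetical placeholder for the manually set threshold mentioned above:

```python
import numpy as np

def select_clean_samples(probs, noisy_labels, threshold=0.9):
    """Return indices of samples the 'coarse' network is confident about:
    its softmax probability for the given (possibly noisy) label must
    exceed the threshold. Only these samples train the 'fine' network."""
    probs = np.asarray(probs, dtype=float)
    noisy_labels = np.asarray(noisy_labels)
    confidence = probs[np.arange(len(noisy_labels)), noisy_labels]
    return np.flatnonzero(confidence > threshold)
```

Samples whose assigned web label disagrees with the coarse network's prediction tend to receive low confidence and are filtered out at this stage.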
ACRV_ANU Rodrigo Santa Cruz and Stephen Gould

Australian National University (ANU) and Australian Centre of Excellence for Robotic Vision (ACRV)
Our method is based on DenseNet-121 trained with standard stochastic gradient descent and scheduled learning-rate decay. We optimize the combination of softmax outputs and cross-entropy loss. To handle the imbalanced data, we sample images according to the inverse of their class frequency. We focus on exploring self-supervised pretraining as a way to promote robustness to label noise. We pretrain our model on a pretext task using the WebVision training data: more specifically, we discard the labels from the training data and train our model on the visual permutation learning task proposed by Santa Cruz et al. in the paper "DeepPermNet: Visual Permutation Learning" (CVPR 2017). At test time, we average our predictions over different image-cropping schemes.
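The inverse-class-frequency sampling described above can be sketched in a few lines. This is a minimal illustration (not the team's code) of turning a list of image labels into per-image sampling probabilities so that every class is drawn with equal total probability:

```python
from collections import Counter

import numpy as np

def balanced_sampling_weights(labels):
    """Per-image sampling probability proportional to the inverse of the
    image's class frequency. Summed over a class, the probabilities are
    equal for every class, so rare classes are not drowned out."""
    freq = Counter(labels)
    w = np.array([1.0 / freq[y] for y in labels], dtype=float)
    return w / w.sum()
```

In practice such weights would be fed to a weighted random sampler in the training data loader.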
Entry Description:
Entry 1: DenseNet + 10 crops
Entry 2: DenseNet + 10 crops
Entry 3: DenseNet + pretraining + 10 crops
Entry 4: DenseNet + pretraining + 10 crops
Entry 5: DenseNet + pretraining + 10 crops
EBD_birds Chen Yang, Yao Chen, Changsheng Li, Lixin Duan

UESTC, and YouEData
Our method is based on GoogLeNet-BN and ResNet-50, with multi-crop testing to improve prediction accuracy.
Entry Description:
Entry 1: single model GoogLeNet-BN A
Entry 2: model GoogLeNet-BN A with five-crop testing
Entry 3: model GoogLeNet-BN A with ten-crop testing
Entry 4: model GoogLeNet-BN B with five-crop testing
Entry 5: model GoogLeNet-BN B with ten-crop testing
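Multi-crop testing, used by several teams above, can be sketched as follows. This is a generic NumPy illustration, not any team's code: the five-crop scheme takes the four corner crops plus the center crop (ten-crop adds their horizontal flips), and the per-crop softmax outputs are averaged into a single prediction:

```python
import numpy as np

def five_crop_coords(h, w, size):
    """Top-left corners of the four corner crops plus the center crop
    of a size x size window inside an h x w image."""
    return [(0, 0), (0, w - size), (h - size, 0), (h - size, w - size),
            ((h - size) // 2, (w - size) // 2)]

def multicrop_average(crop_probs):
    """Average softmax outputs over crops to get one prediction per image."""
    return np.mean(np.asarray(crop_probs, dtype=float), axis=0)
```

Averaging over crops smooths out prediction variance from object position and typically adds a small but consistent accuracy gain at test time.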
INFIMIND Wu Bin, Xu Zhen, Li Yongbin

Entry Description:
Entry 1: incepv4_train_original_data
Entry 2: incepv4_train_original_data_more_time
Entry 3: incepv4_train_original_data_and_the_same_label_distribute_like_val
Entry 4: incepv4_train_more_time
Entry 5: incepv4_train_more_time
CMIC Hao Wu, Jiangchao Yao, Jiajie Wang

Our method is based on ResNet with a contrastive-additive network.

Entry Description:
Entry 1: test entry
Entry 2: test entry
Entry 3: test entry
Entry 4: test entry
Entry 5: test entry