Challenge Results

Rank  Team name    Top-5 accuracy (%)  Top-1 accuracy (%)
1     Smart Image  82.97               61.17
2     fISHpAM      82.01               59.76
3     PCI-AI       79.88               57.38
4     AntVision    77.37               53.93

Team Information

Team name: Smart Image
Team members: Lingxi Xie, Xiaopeng Zhang, Bingcheng Liu, Zhao Yang, Zewei Du, Hang Chen, Longhui Wei, Yaxiong Chi
Affiliation: Huawei Cloud & Huawei 2012 Labs
Method description:
Our work is implemented on the Huawei ModelArts platform [1], which slightly improves accuracy while greatly speeding up training. Algorithmically, the main idea is to leverage the area under the margin and knowledge distillation to handle noisy labels, combined with an algorithm for learning an ensemble model.
The details are as follows:
a. We use several state-of-the-art network architectures, including ResNeXt, ResNeSt, SENet, and SE-ResNeXt;
b. We use the Area Under the Margin (AUM) algorithm [2] and knowledge distillation [3] to handle noisy labels (see the sketches below);
c. A curriculum learning strategy is used to refine the network multiple times;
d. Training models at higher input resolution improves performance;
e. During testing, we apply multi-scale and multi-crop augmentation to each test image;
f. We also ensemble different models using different strategies.
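For item b, here is a minimal sketch of the AUM bookkeeping described in [2], assuming a standard PyTorch classification loop; AUMTracker, the sample-id plumbing, and the final thresholding are illustrative, not the team's implementation.

```python
import torch

class AUMTracker:
    """Accumulates each training sample's margin over epochs.

    The margin is the logit of the assigned label minus the largest
    logit among the other classes; samples that accumulate low (often
    negative) margins are likely mislabeled [2].
    """
    def __init__(self, num_samples):
        self.margin_sum = torch.zeros(num_samples)
        self.num_updates = torch.zeros(num_samples)

    @torch.no_grad()
    def update(self, sample_ids, logits, labels):
        # Logit assigned to each sample's (possibly noisy) label.
        assigned = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
        # Largest logit among the remaining classes.
        masked = logits.clone()
        masked.scatter_(1, labels.unsqueeze(1), float("-inf"))
        other_max = masked.max(dim=1).values
        ids = sample_ids.cpu()
        self.margin_sum[ids] += (assigned - other_max).cpu()
        self.num_updates[ids] += 1

    def aum(self):
        return self.margin_sum / self.num_updates.clamp(min=1)
```

Usage: call `tracker.update(ids, model(x), y)` at every training step; after training, treat the lowest-AUM samples as likely noisy and remove or relabel them.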
[1] What Is ModelArts? https://support.huaweicloud.com/en-us/productdesc-modelarts/modelarts_01_0001.html
[2] Identifying Mislabeled Data using the Area Under the Margin Ranking. arXiv:2001.10528 (2020).
[3] Learning from Noisy Labels with Distillation. ICCV 2017.
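Item b above and entries 3 and 4 below also rely on distillation. The following is a minimal sketch of a generic soft-label distillation loss; the temperature and mixing weight are illustrative, and the noisy-label scheme of [3] is more elaborate than this.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend hard-label cross-entropy with a KL term toward the teacher."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so soft-target gradients match the hard loss
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard
```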

Entry Description:
Entry 1: model ensemble with weighted average
Entry 2: model ensemble with different weights
Entry 3: model distillation + model ensemble with weighted average
Entry 4: model distillation + model ensemble with different weights
Entry 5: model ensemble (heuristic algorithm)
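As a concrete reading of the weighted-average ensembling in entries 1-4, here is a minimal sketch; the model list and per-model weights are placeholders (the weights could, for instance, be tuned on a validation split).

```python
import torch

@torch.no_grad()
def ensemble_predict(models, images, weights=None):
    """Weighted average of per-model softmax probabilities."""
    if weights is None:  # plain average, i.e. a uniform ensemble
        weights = [1.0 / len(models)] * len(models)
    probs = 0.0
    for model, w in zip(models, weights):
        model.eval()
        probs = probs + w * torch.softmax(model(images), dim=1)
    return probs  # argmax over dim=1 gives top-1; topk(5) gives top-5
```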

Team name: fISHpAM
Team members: Canxiang Yan, Cheng Niu, Jie Zhou
Affiliation: Pattern Recognition Center, WeChat AI, Tencent Inc., China
Method description:
We use pretraining and ensembling techniques to improve performance. Using WordNet, each image can be mapped to several word tags (e.g., nouns and adjectives). Base models are then pretrained on these multi-label images with different network architectures; in total, 43 models are learned. For ensembling, we use the XGBoost tool to exploit the strengths of the learned models using part of the training set. Other techniques include large-scale finetuning, hard sampling, and class-balanced sampling.
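One plausible reading of the XGBoost step is stacking: the base models' class probabilities on a held-out split become features for a gradient-boosted meta-classifier. A minimal sketch follows; the feature layout and hyperparameters are guesses, not the team's configuration.

```python
import numpy as np
import xgboost as xgb

def fit_stacker(base_probs: np.ndarray, labels: np.ndarray):
    """base_probs: (num_models, num_samples, num_classes) softmax outputs
    of the base models on a held-out split; labels: (num_samples,)."""
    n_models, n_samples, n_classes = base_probs.shape
    # One feature vector per sample: all models' class probabilities.
    features = base_probs.transpose(1, 0, 2).reshape(n_samples, -1)
    clf = xgb.XGBClassifier(n_estimators=200, max_depth=6)  # illustrative
    clf.fit(features, labels)
    return clf
```

With many classes the concatenated feature vector becomes very wide, so the team's actual feature layout may well differ (e.g., per-class stacking or reduced features).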

Entry Description:
Entry 1: ensemble of all base models
Entry 2: ensemble of all base models with large-scale finetuning
Entry 3: ensemble of all base models with hard sampling
Entry 4: ensemble of all base models with class-balanced sampling (see the sketch below)
Entry 5: ensemble of all entries
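Entry 4's class-balanced sampling can be approximated with PyTorch's WeightedRandomSampler; a minimal sketch, assuming integer labels are available up front (the helper name and batch size are illustrative).

```python
from collections import Counter

import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def balanced_loader(dataset, labels, batch_size=256):
    counts = Counter(labels)
    # Draw each sample with probability inversely proportional to its
    # class frequency, so all classes appear roughly equally often.
    weights = torch.tensor([1.0 / counts[y] for y in labels],
                           dtype=torch.double)
    sampler = WeightedRandomSampler(weights, num_samples=len(labels),
                                    replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```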

Team name: PCI-AI
Team members: Zhiwei Wu, Shuwen Sun, Kunmin Li, Rui Zhang, Zhenjie Huang, Yanyi Feng
Affiliation: pcitech (https://www.pcitech.com/)
Method description:
Our method is based on ResNet and its variants: ResNet101 and ResNet152 [1], ResNeXt101 [2], and ResNeSt101 [3]. Due to limited resources, we use fp16 training, a subset of the training samples, and fewer training epochs to speed up training (see the mixed-precision sketch after the references). In total we trained 8 models. At test time, we use multi-scale, multi-crop, and multi-model fusion (see the test-time augmentation sketch after the entry list below).
[1] Kaiming He, Xiangyu Zhang, et al. Deep Residual Learning for Image Recognition. CVPR 2016.
[2] Saining Xie, Ross Girshick, et al. Aggregated Residual Transformations for Deep Neural Networks. CVPR 2017.
[3] Hang Zhang, et al. ResNeSt: Split-Attention Networks. arXiv:2004.08955 (2020).
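A minimal sketch of the fp16 speed-up using PyTorch automatic mixed precision; the model, loader, optimizer, and criterion are assumed to exist, and this is one common way to realize fp16 training, not necessarily the team's setup.

```python
import torch

scaler = torch.cuda.amp.GradScaler()

def train_one_epoch(model, loader, optimizer, criterion, device="cuda"):
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():  # forward pass runs largely in fp16
            loss = criterion(model(images), labels)
        scaler.scale(loss).backward()    # loss scaling avoids fp16 underflow
        scaler.step(optimizer)
        scaler.update()
```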


Entry Description:
Entry 1: fusion of the 7 models with the highest validation accuracy
Entry 2: fusion of 7 models with different weights
Entry 3: fusion of all models
Entry 4: fusion of all models with different weights
Entry 5: fusion of randomly selected models
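A minimal sketch of the multi-scale, multi-crop test-time pipeline referenced in the method description; the scale list and crop size are illustrative, not the team's settings, and the input is assumed to be a normalized CHW tensor.

```python
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def tta_predict(model, image, scales=(224, 256, 288), crop=224):
    """image: normalized CHW tensor; returns averaged class probabilities."""
    model.eval()
    probs, n = 0.0, 0
    for s in scales:
        resized = TF.resize(image, s)         # shorter side resized to s
        for c in TF.ten_crop(resized, crop):  # 4 corners + center, + flips
            probs = probs + torch.softmax(model(c.unsqueeze(0)), dim=1)
            n += 1
    return probs / n
```

Averaging these per-image probabilities across the 7 or 8 trained models (optionally with per-model weights) then yields the multi-model fusion the entries describe.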