# Challenge

The goal of this challenge is to advance research on learning knowledge and representations from web data. Web data contains not only a huge number of images but also rich meta information about those images, which can be exploited to learn good representations and models. We organize two tasks to evaluate the learned knowledge and representations: (1) the WebVision Image Classification Task, and (2) the Pascal VOC Transfer Learning Task. The second task builds upon the first. Researchers can participate in the first task only, or in both tasks.

News: A \$10,000 cash award will be given to the winners of the challenge!

The WebVision dataset is composed of training, validation, and test sets. The training set is downloaded from the Web without any human annotation. The validation and test sets are human annotated; the labels of the validation data are provided, but the labels of the test data are withheld. To imitate the setting of learning from web data, participants are required to train their models solely on the training set and submit classification results on the test set. The validation set may only be used to evaluate algorithms during development (see details in the Honor Code).

Each submission must provide a list of 5 labels for each image, in descending order of confidence. Recognition accuracy is evaluated based on the label that best matches the ground truth label for the image. Specifically, an algorithm produces a label list $$c_i$$, $$i=1,...,5$$ for each image, and the ground truth labels of the image are $$y_j$$, $$j = 1,..., n$$, where $$n$$ is the number of ground truth class labels. The error of this prediction is defined as: $$E = \frac{1}{n} \sum_{j=1}^n \min_{i} d(c_i, y_j).$$ Here $$d(c_i,y_j)$$ is 0 if $$c_i=y_j$$ and 1 otherwise. The final error of the algorithm is the average of this error across all test images. For this version of the challenge, there is only one ground truth label for each image (i.e., $$n=1$$), so the per-image error is 0 if the ground truth label appears among the 5 submitted labels and 1 otherwise.
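
The error definition above can be illustrated with a minimal Python sketch. This is not the official evaluation script; the function names `image_error` and `challenge_error`, the parameter `k`, and the use of plain lists of labels are assumptions made for illustration only.

```python
from typing import Hashable, Sequence


def image_error(predicted: Sequence[Hashable],
                truth: Sequence[Hashable],
                k: int = 5) -> float:
    """Per-image error E = (1/n) * sum_j min_i d(c_i, y_j).

    `predicted` holds the submitted labels c_i (only the first k are used),
    `truth` holds the ground truth labels y_j. Both names are hypothetical.
    """
    if not truth:
        raise ValueError("each image needs at least one ground truth label")
    top_k = set(predicted[:k])
    # min_i d(c_i, y_j) is 0 when y_j is among the submitted labels, else 1.
    return sum(0 if y in top_k else 1 for y in truth) / len(truth)


def challenge_error(all_predicted: Sequence[Sequence[Hashable]],
                    all_truth: Sequence[Sequence[Hashable]],
                    k: int = 5) -> float:
    """Average of the per-image error across all images."""
    if len(all_predicted) != len(all_truth):
        raise ValueError("predictions and ground truth must align one-to-one")
    if not all_truth:
        raise ValueError("no images to evaluate")
    total = sum(image_error(p, t, k) for p, t in zip(all_predicted, all_truth))
    return total / len(all_truth)


# Two images with one ground truth label each (n = 1), using made-up labels:
# the first image is a hit, the second a miss, so the average error is 0.5.
example_predictions = [[3, 7, 1, 9, 4], [2, 5, 8, 0, 6]]
example_truth = [[1], [11]]
example_error = challenge_error(example_predictions, example_truth)
```

With $$n=1$$ this reduces to the familiar top-5 error: the fraction of images whose ground truth label is missing from the 5 submitted labels.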