Mar 22, 2022

A Major Achievement: Super Deep Learning Model by JD Explore Academy


by Doris Liu

ViTAEv2, a super deep learning model jointly proposed by JD Explore Academy and the University of Sydney, Australia, recently ranked first in image classification accuracy on ImageNet ReaL without using extra training data.

With larger scale, better results and better adaptability to various vision tasks, the 600-million-parameter model set a world record of 91.2 percent image classification accuracy in February without using extra training data.

The ViTAEv2 model follows the “pre-training and fine-tuning” paradigm and advances both the model architecture and the training approach. It exploits the effectiveness of inductive bias in large-scale models, together with pre-training and transfer learning algorithms tailored to the model’s structure, to achieve these results.

JD Explore Academy also explored the few-shot learning capability of the large-scale ViTAEv2 model by fine-tuning it with 1, 10 and 100 percent of the data, respectively. The results showed that, when fine-tuned with only 10 percent of the data, the large-scale model significantly outperformed the smaller-scale model fine-tuned on all of the data. “It further confirmed that the large-scale model has a strong ability in few-shot learning, which indicates that the super deep model has strong representation ability, learning ability and sample efficiency,” said a deep learning scientist from JD Explore Academy.
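For readers who want to set up a similar comparison, fine-tuning subsets at 1, 10 and 100 percent can be drawn as in the short Python sketch below. The article does not say how the subsets were selected, so the per-class sampling, the fixed random seed and the helper name `subsample_per_class` are assumptions made for illustration only, not the method used by JD Explore Academy.

```python
import random
from collections import defaultdict


def subsample_per_class(labels, fraction, seed=0):
    """Return sorted indices keeping roughly `fraction` of each class.

    Hypothetical helper: per-class (stratified) sampling is an assumption,
    chosen so that small subsets still contain every class.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept = []
    for label in sorted(by_class):
        indices = by_class[label]
        # Keep at least one example per class, even at very small fractions.
        n_keep = max(1, round(len(indices) * fraction))
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)


# Toy stand-in for a labeled training set: 10 classes, 100 examples each.
labels = [i % 10 for i in range(1000)]
subset_sizes = {f: len(subsample_per_class(labels, f)) for f in (0.01, 0.10, 1.0)}
print(subset_sizes)
```

Each subset of indices would then be used to fine-tune the same pre-trained model, so that any difference in accuracy reflects only the amount of labeled data.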

The results validated the ViTAEv2 model’s ability to help solve challenging tasks with low or even zero resources, as well as its potential to reduce data annotation costs, accelerate algorithm development cycles, simplify model deployment, and facilitate the R&D and implementation of next-generation automated learning technologies.

As the largest public dataset for image classification, the ImageNet dataset has attracted the attention and participation of top international technology companies such as Google, Microsoft and Facebook, as well as leading research universities such as Stanford University, Massachusetts Institute of Technology and National University of Singapore, especially on its accuracy ranking list. Results on the dataset are widely recognized as an important criterion for measuring the level of computer vision technology.

The performance of the ViTAEv2 model has taken JD Explore Academy’s computer vision models to a new level and is expected to keep advancing a range of vision tasks, such as semantic segmentation, object detection, pose estimation, and video object segmentation.