Awesome Computer Vision Models
A curated list of popular classification, segmentation and detection models with corresponding evaluation metrics from papers.
Contents
- Classification models (#classification-models)
- Segmentation models (#segmentation-models)
- Detection models (#detection-models)
Classification models
│ Model │Number of parameters│ FLOPs │Top-1 Error│Top-5 Error│Year│
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────┼──────────┼───────────┼───────────┼────┤
│ AlexNet ('One weird trick for parallelizing convolutional neural networks' (https://arxiv.org/abs/1404.5997)) │ 62.3M │1,132.33M │ 40.96 │ 18.24 │2014│
│ VGG-16 ('Very Deep Convolutional Networks for Large-Scale Image Recognition' (https://arxiv.org/abs/1409.1556)) │ 138.3M │ ? │ 26.78 │ 8.69 │2014│
│ ResNet-10 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) │ 5.5M │ 894.04M │ 34.69 │ 14.36 │2015│
│ ResNet-18 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) │ 11.7M │1,820.41M │ 28.53 │ 9.82 │2015│
│ ResNet-34 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) │ 21.8M │3,672.68M │ 24.84 │ 7.80 │2015│
│ ResNet-50 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) │ 25.5M │3,877.95M │ 22.28 │ 6.33 │2015│
│ InceptionV3 ('Rethinking the Inception Architecture for Computer Vision' (https://arxiv.org/abs/1512.00567)) │ 23.8M │ ? │ 21.2 │ 5.6 │2015│
│ PreResNet-18 ('Identity Mappings in Deep Residual Networks' (https://arxiv.org/abs/1603.05027)) │ 11.7M │1,820.56M │ 28.43 │ 9.72 │2016│
│ PreResNet-34 ('Identity Mappings in Deep Residual Networks' (https://arxiv.org/abs/1603.05027)) │ 21.8M │3,672.83M │ 24.89 │ 7.74 │2016│
│ PreResNet-50 ('Identity Mappings in Deep Residual Networks' (https://arxiv.org/abs/1603.05027)) │ 25.6M │3,875.44M │ 22.40 │ 6.47 │2016│
│ DenseNet-121 ('Densely Connected Convolutional Networks' (https://arxiv.org/abs/1608.06993)) │ 8.0M │2,872.13M │ 23.48 │ 7.04 │2016│
│ DenseNet-161 ('Densely Connected Convolutional Networks' (https://arxiv.org/abs/1608.06993)) │ 28.7M │7,793.16M │ 22.86 │ 6.44 │2016│
│ PyramidNet-101 ('Deep Pyramidal Residual Networks' (https://arxiv.org/abs/1610.02915)) │ 42.5M │8,743.54M │ 21.98 │ 6.20 │2016│
│ ResNeXt-14(32x4d) ('Aggregated Residual Transformations for Deep Neural Networks' (http://arxiv.org/abs/1611.05431)) │ 9.5M │1,603.46M │ 30.32 │ 11.46 │2016│
│ ResNeXt-26(32x4d) ('Aggregated Residual Transformations for Deep Neural Networks' (http://arxiv.org/abs/1611.05431)) │ 15.4M │2,488.07M │ 24.14 │ 7.46 │2016│
│ WRN-50-2 ('Wide Residual Networks' (https://arxiv.org/abs/1605.07146)) │ 68.9M │11,405.42M│ 22.53 │ 6.41 │2016│
│ Xception ('Xception: Deep Learning with Depthwise Separable Convolutions' (https://arxiv.org/abs/1610.02357)) │ 22,855,952 │8,403.63M │ 20.97 │ 5.49 │2016│
│ InceptionV4 ('Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning' (https://arxiv.org/abs/1602.07261)) │ 42,679,816 │12,304.93M│ 20.64 │ 5.29 │2016│
│ InceptionResNetV2 ('Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning' (https://arxiv.org/abs/1602.07261)) │ 55,843,464 │13,188.64M│ 19.93 │ 4.90 │2016│
│ PolyNet ('PolyNet: A Pursuit of Structural Diversity in Very Deep Networks' (https://arxiv.org/abs/1611.05725)) │ 95,366,600 │34,821.34M│ 19.10 │ 4.52 │2016│
│ DarkNet Ref ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) │ 7,319,416 │ 367.59M │ 38.58 │ 17.18 │2016│
│ DarkNet Tiny ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) │ 1,042,104 │ 500.85M │ 40.74 │ 17.84 │2016│
│ DarkNet 53 ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) │ 41,609,928 │7,133.86M │ 21.75 │ 5.64 │2016│
│ SqueezeResNet1.1 ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' (https://arxiv.org/abs/1602.07360)) │ 1,235,496 │ 352.02M │ 40.09 │ 18.21 │2016│
│ SqueezeNet1.1 ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' (https://arxiv.org/abs/1602.07360)) │ 1,235,496 │ 352.02M │ 39.31 │ 17.72 │2016│
│ ResAttNet-92 ('Residual Attention Network for Image Classification' (https://arxiv.org/abs/1704.06904)) │ 51.3M │ ? │ 19.5 │ 4.8 │2017│
│ CondenseNet (G=C=8) ('CondenseNet: An Efficient DenseNet using Learned Group Convolutions' (https://arxiv.org/abs/1711.09224)) │ 4.8M │ ? │ 26.2 │ 8.3 │2017│
│ DPN-68 ('Dual Path Networks' (https://arxiv.org/abs/1707.01629)) │ 12,611,602 │2,351.84M │ 23.24 │ 6.79 │2017│
│ ShuffleNet x1.0 (g=1) ('ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices' (https://arxiv.org/abs/1707.01083)) │ 1,531,936 │ 148.13M │ 34.93 │ 13.89 │2017│
│ DiracNetV2-18 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections' (https://arxiv.org/abs/1706.00388)) │ 11,511,784 │1,796.62M │ 31.47 │ 11.70 │2017│
│ DiracNetV2-34 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections' (https://arxiv.org/abs/1706.00388)) │ 21,616,232 │3,646.93M │ 28.75 │ 9.93 │2017│
│ SENet-16 ('Squeeze-and-Excitation Networks' (https://arxiv.org/abs/1709.01507)) │ 31,366,168 │5,081.30M │ 25.65 │ 8.20 │2017│
│ SENet-154 ('Squeeze-and-Excitation Networks' (https://arxiv.org/abs/1709.01507)) │ 115,088,984 │20,745.78M│ 18.62 │ 4.61 │2017│
│ MobileNet ('MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications' (https://arxiv.org/abs/1704.04861)) │ 4,231,976 │ 579.80M │ 26.61 │ 8.95 │2017│
│ NASNet-A 4@1056 ('Learning Transferable Architectures for Scalable Image Recognition' (https://arxiv.org/abs/1707.07012)) │ 5,289,978 │ 584.90M │ 25.68 │ 8.16 │2017│
│ NASNet-A 6@4032 ('Learning Transferable Architectures for Scalable Image Recognition' (https://arxiv.org/abs/1707.07012)) │ 88,753,150 │23,976.44M│ 18.14 │ 4.21 │2017│
│ DLA-34 ('Deep Layer Aggregation' (https://arxiv.org/abs/1707.06484)) │ 15,742,104 │3,071.37M │ 25.36 │ 7.94 │2017│
│ AirNet50-1x64d (r=2) ('Attention Inspiring Receptive-Fields Network for Learning Invariant Representations' (https://ieeexplore.ieee.org/document/8510896)) │ 27.43M │ ? │ 22.48 │ 6.21 │2018│
│ BAM-ResNet-50 ('BAM: Bottleneck Attention Module' (https://arxiv.org/abs/1807.06514)) │ 25.92M │ ? │ 23.68 │ 6.96 │2018│
│ CBAM-ResNet-50 ('CBAM: Convolutional Block Attention Module' (https://arxiv.org/abs/1807.06521)) │ 28.1M │ ? │ 23.02 │ 6.38 │2018│
│ 1.0-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) │ 921,816 │ 285.82M │ 40.77 │ 17.85 │2018│
│ 1.5-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) │ 1,953,616 │ 550.97M │ 33.81 │ 13.01 │2018│
│ 2.0-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) │ 3,366,344 │ 897.60M │ 29.63 │ 10.66 │2018│
│ ShuffleNetV2 ('ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design' (https://arxiv.org/abs/1807.11164)) │ 2,278,604 │ 149.72M │ 31.44 │ 11.63 │2018│
│ 456-MENet-24×1(g=3) ('Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications' (https://arxiv.org/abs/1803.09127)) │ 5.3M │ ? │ 28.4 │ 9.8 │2018│
│ FD-MobileNet ('FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy' (https://arxiv.org/abs/1802.03750)) │ 2,901,288 │ 147.46M │ 34.23 │ 13.38 │2018│
│ MobileNetV2 ('MobileNetV2: Inverted Residuals and Linear Bottlenecks' (https://arxiv.org/abs/1801.04381)) │ 3,504,960 │ 329.36M │ 26.97 │ 8.87 │2018│
│ IGCV3 ('IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks' (https://arxiv.org/abs/1806.00178)) │ 3.5M │ ? │ 28.22 │ 9.54 │2018│
│ DARTS ('DARTS: Differentiable Architecture Search' (https://arxiv.org/abs/1806.09055)) │ 4.9M │ ? │ 26.9 │ 9.0 │2018│
│ PNASNet-5 ('Progressive Neural Architecture Search' (https://arxiv.org/abs/1712.00559)) │ 5.1M │ ? │ 25.8 │ 8.1 │2018│
│ AmoebaNet-C ('Regularized Evolution for Image Classifier Architecture Search' (https://arxiv.org/abs/1802.01548)) │ 5.1M │ ? │ 24.3 │ 7.6 │2018│
│ MnasNet ('MnasNet: Platform-Aware Neural Architecture Search for Mobile' (https://arxiv.org/abs/1807.11626)) │ 4,308,816 │ 317.67M │ 31.58 │ 11.74 │2018│
│ IBN-Net50-a ('Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net' (https://arxiv.org/abs/1807.09441)) │ ? │ ? │ 22.54 │ 6.32 │2018│
│ MarginNet ('Large Margin Deep Networks for Classification' (http://papers.nips.cc/paper/7364-large-margin-deep-networks-for-classification.pdf)) │ ? │ ? │ 22.0 │ ? │2018│
│ A^2 Net ('A^2-Nets: Double Attention Networks' (http://papers.nips.cc/paper/7318-a2-nets-double-attention-networks.pdf)) │ ? │ ? │ 23.0 │ 6.5 │2018│
│ FishNeXt-150 ('FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction' │ 26.2M │ ? │ 21.5 │ ? │2018│
│ (http://papers.nips.cc/paper/7356-fishnet-a-versatile-backbone-for-image-region-and-pixel-level-prediction.pdf)) │ │ │ │ │ │
│ Shape-ResNet ('IMAGENET-TRAINED CNNS ARE BIASED TOWARDS TEXTURE; INCREASING SHAPE BIAS IMPROVES ACCURACY AND ROBUSTNESS' (https://arxiv.org/pdf/1811.12231v2.pdf)) │ 25.5M │ ? │ 23.28 │ 6.72 │2019│
│ SimCNN(k=3 train) ('Greedy Layerwise Learning Can Scale to ImageNet' (https://arxiv.org/pdf/1812.11446.pdf)) │ ? │ ? │ 28.4 │ 10.2 │2019│
│ SKNet-50 ('Selective Kernel Networks' (https://arxiv.org/pdf/1903.06586.pdf)) │ 27.5M │ ? │ 20.79 │ ? │2019│
│ SRM-ResNet-50 ('SRM : A Style-based Recalibration Module for Convolutional Neural Networks' (https://arxiv.org/pdf/1903.10829.pdf)) │ 25.62M │ ? │ 22.87 │ 6.49 │2019│
│ EfficientNet-B0 ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' (http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) │ 5,288,548 │ 414.31M │ 24.77 │ 7.52 │2019│
│ EfficientNet-B7b ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' (http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) │ 66,347,960 │39,010.98M│ 15.94 │ 3.22 │2019│
│ ProxylessNAS ('PROXYLESSNAS: DIRECT NEURAL ARCHITECTURE SEARCH ON TARGET TASK AND HARDWARE' (https://arxiv.org/pdf/1812.00332.pdf)) │ ? │ ? │ 24.9 │ 7.5 │2019│
│ MixNet-L ('MixNet: Mixed Depthwise Convolutional Kernels' (https://arxiv.org/abs/1907.09595)) │ 7.3M │ ? │ 21.1 │ 5.8 │2019│
│ ECA-Net50 ('ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' (https://arxiv.org/pdf/1910.03151v1.pdf)) │ 24.37M │ 3.86G │ 22.52 │ 6.32 │2019│
│ ECA-Net101 ('ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' (https://arxiv.org/pdf/1910.03151v1.pdf)) │ 42.49M │ 7.35G │ 21.35 │ 5.66 │2019│
│ ACNet-Densenet121 ('ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks' (https://arxiv.org/abs/1908.03930)) │ ? │ ? │ 24.18 │ 7.23 │2019│
│ LIP-ResNet-50 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) │ 23.9M │ 5.33G │ 21.81 │ 6.04 │2019│
│ LIP-ResNet-101 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) │ 42.9M │ 9.06G │ 20.67 │ 5.40 │2019│
│ LIP-DenseNet-BC-121 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) │ 8.7M │ 4.13G │ 23.36 │ 6.84 │2019│
│ MuffNet_1.0 ('MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' │ 2.3M │ 146M │ 30.1 │ ? │2019│
│ (http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) │ │ │ │ │ │
│ MuffNet_1.5 ('MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' │ 3.4M │ 300M │ 26.9 │ ? │2019│
│ (http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) │ │ │ │ │ │
│ ResNet-34-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) │ 21.8M │3,672.68M │ 25.80 │ ? │2019│
│ ResNet-50-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) │ 25.5M │3,877.95M │ 22.96 │ ? │2019│
│ MobileNetV2-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) │ 3,504,960 │ 329.36M │ 27.50 │ ? │2019│
│ FixRes ResNeXt101 WSL ('Fixing the train-test resolution discrepancy' (https://arxiv.org/abs/1906.06423)) │ 829M │ ? │ 13.6 │ 2.0 │2019│
│ Noisy Student(L2) ('Self-training with Noisy Student improves ImageNet classification' (https://arxiv.org/abs/1911.04252)) │ 480M │ ? │ 12.6 │ 1.8 │2019│
│ TResNet-M ('TResNet: High Performance GPU-Dedicated Architecture' (https://arxiv.org/abs/2003.13630)) │ 29.4M │ 5.5G │ 19.3 │ ? │2020│
│ DA-NAS-C ('DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search' (https://arxiv.org/abs/2003.12563v1)) │ ? │ 467M │ 23.8 │ ? │2020│
│ ResNeSt-50 ('ResNeSt: Split-Attention Networks' (https://arxiv.org/abs/2004.08955)) │ 27.5M │ 5.39G │ 18.87 │ ? │2020│
│ ResNeSt-101 ('ResNeSt: Split-Attention Networks' (https://arxiv.org/abs/2004.08955)) │ 48.3M │ 10.2G │ 17.73 │ ? │2020│
│ ResNet-50-FReLU ('Funnel Activation for Visual Recognition' (https://arxiv.org/abs/2007.11824v2)) │ 25.5M │ 3.87G │ 22.40 │ ? │2020│
│ ResNet-101-FReLU ('Funnel Activation for Visual Recognition' (https://arxiv.org/abs/2007.11824v2)) │ 44.5M │ 7.6G │ 22.10 │ ? │2020│
│ ResNet-50-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 25.6M │ ? │ 19.33 │ 4.91 │2020│
│ ResNet-50-MEALv2 + CutMix ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 25.6M │ ? │ 19.02 │ 4.65 │2020│
│ MobileNet V3-Large-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 5.48M │ ? │ 23.08 │ 6.68 │2020│
│ EfficientNet-B0-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 5.29M │ ? │ 21.71 │ 6.05 │2020│
│ T2T-ViT-7 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) │ 4.2M │ 0.6G │ 28.8 │ ? │2021│
│ T2T-ViT-14 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) │ 19.4M │ 4.8G │ 19.4 │ ? │2021│
│ T2T-ViT-19 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) │ 39.0M │ 8.0G │ 18.8 │ ? │2021│
│ NFNet-F0 ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) │ 71.5M │ 12.38G │ 16.4 │ 3.2 │2021│
│ NFNet-F1 ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) │ 132.6M │ 35.54G │ 15.4 │ 2.9 │2021│
│ NFNet-F6+SAM ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) │ 438.4M │ 377.28G │ 13.5 │ 2.1 │2021│
│ EfficientNetV2-S ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 24M │ 8.8G │ 16.1 │ ? │2021│
│ EfficientNetV2-M ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 55M │ 24G │ 14.9 │ ? │2021│
│ EfficientNetV2-L ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 121M │ 53G │ 14.3 │ ? │2021│
│ EfficientNetV2-S (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 24M │ 8.8G │ 15.0 │ ? │2021│
│ EfficientNetV2-M (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 55M │ 24G │ 13.9 │ ? │2021│
│ EfficientNetV2-L (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 121M │ 53G │ 13.2 │ ? │2021│
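The Top-1 and Top-5 Error columns follow the usual convention of the cited papers (evaluated on ImageNet): the percentage of validation images whose ground-truth label is not the single highest-scoring class (Top-1) or is not among the five highest-scoring classes (Top-5). Error is 100 minus accuracy, so ResNet-50's Top-1 Error of 22.28 corresponds to 77.72% Top-1 accuracy. The minimal sketch below computes top-k error from raw class scores. The function name `top_k_error` and its list-of-lists input layout are illustrative assumptions, not code from any listed paper, and it does not model evaluation details such as crop size or test-time augmentation, which differ between papers.

```python
from typing import Sequence


def top_k_error(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Top-k error in percent.

    scores[i][c] is the model's score for class c on sample i, and
    labels[i] is the ground-truth class index of sample i.
    """
    if not labels or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and of equal length")
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return 100.0 * misses / len(labels)


if __name__ == "__main__":
    scores = [
        [0.1, 0.7, 0.2],  # label 1 is the top prediction
        [0.5, 0.3, 0.2],  # label 1 is only the second-best prediction
        [0.2, 0.3, 0.5],  # label 0 is the lowest-scoring class
    ]
    labels = [1, 1, 0]
    print(round(top_k_error(scores, labels, k=1), 2))  # 66.67
    print(round(top_k_error(scores, labels, k=2), 2))  # 33.33
```

Raising k can only lower the error, which is why every Top-5 entry in the table is smaller than the Top-1 entry on the same row.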
Segmentation models
│ Model │Year│PASCAL-Context│Cityscapes (mIOU)│PASCAL VOC 2012 (mIOU)│COCO Stuff│ADE20K VAL (mIOU)│
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────┼──────────────┼─────────────────┼──────────────────────┼──────────┼─────────────────┤
│ U-Net ('U-Net: Convolutional Networks for Biomedical Image Segmentation' (https://arxiv.org/pdf/1505.04597.pdf)) │2015│ ? │ ? │ ? │ ? │ ? │
│ DeconvNet ('Learning Deconvolution Network for Semantic Segmentation' (https://arxiv.org/pdf/1505.04366.pdf)) │2015│ ? │ ? │ 72.5 │ ? │ ? │
│ ParseNet ('ParseNet: Looking Wider to See Better' (https://arxiv.org/abs/1506.04579)) │2015│ 40.4 │ ? │ 69.8 │ ? │ ? │
│ Piecewise ('Efficient piecewise training of deep structured models for semantic segmentation' (https://arxiv.org/abs/1504.01013)) │2015│ 43.3 │ 71.6 │ 78.0 │ ? │ ? │
│ SegNet ('SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation' (https://arxiv.org/pdf/1511.00561.pdf)) │2016│ ? │ 56.1 │ ? │ ? │ ? │
│ FCN ('Fully Convolutional Networks for Semantic Segmentation' (https://arxiv.org/pdf/1605.06211.pdf)) │2016│ 37.8 │ 65.3 │ 62.2 │ 22.7 │ 29.39 │
│ ENet ('ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation' (https://arxiv.org/pdf/1606.02147.pdf)) │2016│ ? │ 58.3 │ ? │ ? │ ? │
│ DilatedNet ('MULTI-SCALE CONTEXT AGGREGATION BY DILATED CONVOLUTIONS' (https://arxiv.org/pdf/1511.07122.pdf)) │2016│ ? │ ? │ 67.6 │ ? │ 32.31 │
│ PixelNet ('PixelNet: Towards a General Pixel-Level Architecture' (https://arxiv.org/pdf/1609.06694.pdf)) │2016│ ? │ ? │ 69.8 │ ? │ ? │
│ RefineNet ('RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation' (https://arxiv.org/pdf/1611.06612.pdf)) │2016│ 47.3 │ 73.6 │ 83.4 │ 33.6 │ 40.70 │
│ LRR ('Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation' (https://arxiv.org/pdf/1605.02264.pdf)) │2016│ ? │ 71.8 │ 79.3 │ ? │ ? │
│ FRRN ('Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes' (https://arxiv.org/pdf/1611.08323.pdf)) │2016│ ? │ 71.8 │ ? │ ? │ ? │
│ MultiNet ('MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving' (https://arxiv.org/pdf/1612.07695.pdf)) │2016│ ? │ ? │ ? │ ? │ ? │
│ DeepLab ('DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs' │2017│ 45.7 │ 64.8 │ 79.7 │ ? │ ? │
│ (https://arxiv.org/pdf/1606.00915.pdf)) │ │ │ │ │ │ │
│ LinkNet ('LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation' (https://arxiv.org/pdf/1707.03718.pdf)) │2017│ ? │ ? │ ? │ ? │ ? │
│ Tiramisu ('The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation' (https://arxiv.org/pdf/1611.09326.pdf)) │2017│ ? │ ? │ ? │ ? │ ? │
│ ICNet ('ICNet for Real-Time Semantic Segmentation on High-Resolution Images' (https://arxiv.org/pdf/1704.08545.pdf)) │2017│ ? │ 70.6 │ ? │ ? │ ? │
│ ERFNet ('Efficient ConvNet for Real-time Semantic Segmentation' (http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf)) │2017│ ? │ 68.0 │ ? │ ? │ ? │
│ PSPNet ('Pyramid Scene Parsing Network' (https://arxiv.org/pdf/1612.01105.pdf)) │2017│ 47.8 │ 80.2 │ 85.4 │ ? │ 44.94 │
│ GCN ('Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network' (https://arxiv.org/pdf/1703.02719.pdf)) │2017│ ? │ 76.9 │ 82.2 │ ? │ ? │
│ Segaware ('Segmentation-Aware Convolutional Networks Using Local Attention Masks' (https://arxiv.org/pdf/1708.04607.pdf)) │2017│ ? │ ? │ 69.0 │ ? │ ? │
│ PixelDCN ('PIXEL DECONVOLUTIONAL NETWORKS' (https://arxiv.org/pdf/1705.06820.pdf)) │2017│ ? │ ? │ 73.0 │ ? │ ? │
│ DeepLabv3 ('Rethinking Atrous Convolution for Semantic Image Segmentation' (https://arxiv.org/pdf/1706.05587.pdf)) │2017│ ? │ ? │ 85.7 │ ? │ ? │
│ DUC, HDC ('Understanding Convolution for Semantic Segmentation' (https://arxiv.org/pdf/1702.08502.pdf)) │2018│ ? │ 77.1 │ ? │ ? │ ? │
│ ShuffleSeg ('SHUFFLESEG: REAL-TIME SEMANTIC SEGMENTATION NETWORK' (https://arxiv.org/pdf/1803.03816.pdf)) │2018│ ? │ 59.3 │ ? │ ? │ ? │
│ AdaptSegNet ('Learning to Adapt Structured Output Space for Semantic Segmentation' (https://arxiv.org/pdf/1802.10349.pdf)) │2018│ ? │ 46.7 │ ? │ ? │ ? │
│ TuSimple-DUC ('Understanding Convolution for Semantic Segmentation' (https://arxiv.org/pdf/1702.08502.pdf)) │2018│ ? │ 80.1 │ 83.1 │ ? │ ? │
│R2U-Net ('Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation' (https://arxiv.org/pdf/1802.06955.pdf))│2018│ ? │ ? │ ? │ ? │ ? │
│ Attention U-Net ('Attention U-Net: Learning Where to Look for the Pancreas' (https://arxiv.org/pdf/1804.03999.pdf)) │2018│ ? │ ? │ ? │ ? │ ? │
│ DANet ('Dual Attention Network for Scene Segmentation' (https://arxiv.org/pdf/1809.02983.pdf)) │2018│ 52.6 │ 81.5 │ ? │ 39.7 │ ? │
│ ENCNet ('Context Encoding for Semantic Segmentation' (https://arxiv.org/abs/1803.08904)) │2018│ 51.7 │ 75.8 │ 85.9 │ ? │ 44.65 │
│ ShelfNet ('ShelfNet for Real-time Semantic Segmentation' (https://arxiv.org/pdf/1811.11254.pdf)) │2018│ 48.4 │ 75.8 │ 84.2 │ ? │ ? │
│ LadderNet ('LADDERNET: MULTI-PATH NETWORKS BASED ON U-NET FOR MEDICAL IMAGE SEGMENTATION' (https://arxiv.org/pdf/1810.07810.pdf)) │2018│ ? │ ? │ ? │ ? │ ? │
│ CCC-ERFnet ('Concentrated-Comprehensive Convolutions for lightweight semantic segmentation' (https://arxiv.org/pdf/1812.04920v1.pdf)) │2018│ ? │ 69.01 │ ? │ ? │ ? │
│ DifNet-101 ('DifNet: Semantic Segmentation by Diffusion Networks' │2018│ 45.1 │ ? │ 73.2 │ ? │ ? │
│ (http://papers.nips.cc/paper/7435-difnet-semantic-segmentation-by-diffusion-networks.pdf)) │ │ │ │ │ │ │
│ BiSeNet(Res18) ('BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation' (https://arxiv.org/pdf/1808.00897.pdf)) │2018│ ? │ 74.7 │ ? │ 28.1 │ ? │
│ ESPNet ('ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation' (https://arxiv.org/pdf/1803.06815.pdf)) │2018│ ? │ ? │ 63.01 │ ? │ ? │
│ SPADE ('Semantic Image Synthesis with Spatially-Adaptive Normalization' (https://arxiv.org/pdf/1903.07291.pdf)) │2019│ ? │ 62.3 │ ? │ 37.4 │ 38.5 │
│ SeamlessSeg ('Seamless Scene Segmentation' (https://arxiv.org/pdf/1905.01220v1.pdf)) │2019│ ? │ 77.5 │ ? │ ? │ ? │
│ EMANet ('Expectation-Maximization Attention Networks for Semantic Segmentation' (https://arxiv.org/pdf/1907.13426.pdf)) │2019│ ? │ ? │ 88.2 │ 39.9 │ ? │
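mIOU in the segmentation table is mean intersection over union: for each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c) counted over the evaluated pixels, and the reported number is the mean of IoU_c over classes, in percent. The minimal sketch below accumulates a confusion matrix from flattened per-pixel labels and then averages the per-class IoU. The function names, the flat-list input format and the `ignore_index=255` value for unlabeled pixels are assumptions for illustration (255 is a common "void" label, for example in PASCAL VOC annotations); each benchmark's official tooling defines its own class list and ignore rules.

```python
from typing import List, Sequence


def confusion_matrix(pred: Sequence[int], target: Sequence[int],
                     num_classes: int, ignore_index: int = 255) -> List[List[int]]:
    """Count pixels: rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled ("void") pixels do not enter the metric
        cm[t][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Mean over classes of TP / (TP + FP + FN), in percent.

    Classes absent from both prediction and ground truth are skipped.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # truly c, predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, truly something else
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
    if not ious:
        raise ValueError("confusion matrix is empty")
    return 100.0 * sum(ious) / len(ious)


if __name__ == "__main__":
    # Six flattened pixels; the last one is unlabeled (255).
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    cm = confusion_matrix(pred, target, num_classes=3)
    print(cm)                      # [[1, 0, 0], [1, 2, 0], [0, 0, 1]]
    print(round(mean_iou(cm), 2))  # 72.22
```

In the usual protocol the confusion matrix is summed over every image in the evaluation set before the per-class IoU is taken, so the metric is not an average of per-image scores.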
Detection models
│ Model │Year│VOC07 (mAP@IoU=0.5)│VOC12 (mAP@IoU=0.5)│COCO (mAP)│
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────┼───────────────────┼───────────────────┼──────────┤
│ R-CNN ('Rich feature hierarchies for accurate object detection and semantic segmentation' (https://arxiv.org/pdf/1311.2524.pdf)) │2014│ 58.5 │ ? │ ? │
│ OverFeat ('OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks' (https://arxiv.org/pdf/1312.6229.pdf)) │2014│ ? │ ? │ ? │
│ MultiBox ('Scalable Object Detection using Deep Neural Networks' (https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Erhan_Scalable_Object_Detection_2014_CVPR_paper.pdf)) │2014│ 29.0 │ ? │ ? │
│ SPP-Net ('Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition' (https://arxiv.org/pdf/1406.4729.pdf)) │2014│ 59.2 │ ? │ ? │
│ MR-CNN ('Object detection via a multi-region & semantic segmentation-aware CNN model' │2015│ 78.2 │ 73.9 │ ? │
│ (https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Gidaris_Object_Detection_via_ICCV_2015_paper.pdf)) │ │ │ │ │
│ AttentionNet ('AttentionNet: Aggregating Weak Directions for Accurate Object Detection' (https://arxiv.org/pdf/1506.07704.pdf)) │2015│ ? │ ? │ ? │
│ Fast R-CNN ('Fast R-CNN' (https://arxiv.org/pdf/1504.08083.pdf)) │2015│ 70.0 │ 68.4 │ ? │
│ Faster R-CNN ('Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks' │2015│ 73.2 │ 70.4 │ 36.8 │
│ (https://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf)) │ │ │ │ │
│ YOLO v1 ('You Only Look Once: Unified, Real-Time Object Detection' (https://arxiv.org/pdf/1506.02640.pdf)) │2016│ 66.4 │ 57.9 │ ? │
│ G-CNN ('G-CNN: an Iterative Grid Based Object Detector' (https://arxiv.org/pdf/1512.07729.pdf)) │2016│ 66.8 │ 66.4 │ ? │
│ AZNet ('Adaptive Object Detection Using Adjacency and Zoom Prediction' (https://arxiv.org/pdf/1512.07711.pdf)) │2016│ 70.4 │ ? │ 22.3 │
│ ION ('Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks' (https://arxiv.org/pdf/1512.04143.pdf)) │2016│ 80.1 │ 77.9 │ 33.1 │
│ HyperNet ('HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection' (https://arxiv.org/pdf/1604.00600.pdf)) │2016│ 76.3 │ 71.4 │ ? │
│ OHEM ('Training Region-based Object Detectors with Online Hard Example Mining' (https://arxiv.org/pdf/1604.03540.pdf)) │2016│ 78.9 │ 76.3 │ 22.4 │
│ MPN ('A MultiPath Network for Object Detection' (https://arxiv.org/pdf/1604.02135.pdf)) │2016│ ? │ ? │ 33.2 │
│ SSD ('SSD: Single Shot MultiBox Detector' (https://arxiv.org/pdf/1512.02325.pdf)) │2016│ 76.8 │ 74.9 │ 31.2 │
│ GBDNet ('Crafting GBD-Net for Object Detection' (https://arxiv.org/pdf/1610.02579.pdf)) │2016│ 77.2 │ ? │ 27.0 │
│ CPF ('Contextual Priming and Feedback for Faster R-CNN' (https://pdfs.semanticscholar.org/40e7/4473cb82231559cbaeaa44989e9bbfe7ec3f.pdf)) │2016│ 76.4 │ 72.6 │ ? │
│ MS-CNN ('A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection' (https://arxiv.org/pdf/1607.07155.pdf)) │2016│ ? │ ? │ ? │
│ R-FCN ('R-FCN: Object Detection via Region-based Fully Convolutional Networks' (https://arxiv.org/pdf/1605.06409.pdf)) │2016│ 79.5 │ 77.6 │ 29.9 │
│ PVANET ('PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection' (https://arxiv.org/pdf/1608.08021.pdf)) │2016│ ? │ ? │ ? │
│ DeepID-Net ('DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection' (https://arxiv.org/pdf/1412.5661.pdf)) │2016│ 69.0 │ ? │ ? │
│ NoC ('Object Detection Networks on Convolutional Feature Maps' (https://arxiv.org/pdf/1504.06066.pdf)) │2016│ 71.6 │ 68.8 │ 27.2 │
│ DSSD ('DSSD : Deconvolutional Single Shot Detector' (https://arxiv.org/pdf/1701.06659.pdf)) │2017│ 81.5 │ 80.0 │ ? │
│ TDM ('Beyond Skip Connections: Top-Down Modulation for Object Detection' (https://arxiv.org/pdf/1612.06851.pdf)) │2017│ ? │ ? │ 37.3 │
│ FPN ('Feature Pyramid Networks for Object Detection' (http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_Feature_Pyramid_Networks_CVPR_2017_paper.pdf)) │2017│ ? │ ? │ 36.2 │
│ YOLO v2 ('YOLO9000: Better, Faster, Stronger' (https://arxiv.org/pdf/1612.08242.pdf)) │2017│ 78.6 │ 73.4 │ 21.6 │
│ RON ('RON: Reverse Connection with Objectness Prior Networks for Object Detection' (https://arxiv.org/pdf/1707.01691.pdf)) │2017│ 77.6 │ 75.4 │ ? │
│ DCN ('Deformable Convolutional Networks' (http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Deformable_Convolutional_Networks_ICCV_2017_paper.pdf)) │2017│ ? │ ? │ ? │
│ DeNet ('DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling' (https://arxiv.org/pdf/1703.10295.pdf)) │2017│ 77.1 │ 73.9 │ 33.8 │
│ CoupleNet ('CoupleNet: Coupling Global Structure with Local Parts for Object Detection' (https://arxiv.org/pdf/1708.02863.pdf)) │2017│ 82.7 │ 80.4 │ 34.4 │
│ RetinaNet ('Focal Loss for Dense Object Detection' (https://arxiv.org/pdf/1708.02002.pdf)) │2017│ ? │ ? │ 39.1 │
│ Mask R-CNN ('Mask R-CNN' (http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf)) │2017│ ? │ ? │ 39.8 │
│ DSOD ('DSOD: Learning Deeply Supervised Object Detectors from Scratch' (https://arxiv.org/pdf/1708.01241.pdf)) │2017│ 77.7 │ 76.3 │ ? │
│ SMN ('Spatial Memory for Context Reasoning in Object Detection' (http://openaccess.thecvf.com/content_ICCV_2017/papers/Chen_Spatial_Memory_for_ICCV_2017_paper.pdf)) │2017│ 70.0 │ ? │ ? │
│ YOLO v3 ('YOLOv3: An Incremental Improvement' (https://pjreddie.com/media/files/papers/YOLOv3.pdf)) │2018│ ? │ ? │ 33.0 │
│ SIN ('Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships' │2018│ 76.0 │ 73.1 │ 23.2 │
│ (http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Structure_Inference_Net_CVPR_2018_paper.pdf)) │ │ │ │ │
│ STDN ('Scale-Transferrable Object Detection' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Scale-Transferrable_Object_Detection_CVPR_2018_paper.pdf)) │2018│ 80.9 │ ? │ ? │
│ RefineDet ('Single-Shot Refinement Neural Network for Object Detection' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Single-Shot_Refinement_Neural_CVPR_2018_paper.pdf)) │2018│ 83.8 │ 83.5 │ 41.8 │
│ MegDet ('MegDet: A Large Mini-Batch Object Detector' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Peng_MegDet_A_Large_CVPR_2018_paper.pdf)) │2018│ ? │ ? │ ? │
│ RFBNet ('Receptive Field Block Net for Accurate and Fast Object Detection' (https://arxiv.org/pdf/1711.07767.pdf)) │2018│ 82.2 │ ? │ ? │
│ CornerNet ('CornerNet: Detecting Objects as Paired Keypoints' (https://arxiv.org/pdf/1808.01244.pdf)) │2018│ ? │ ? │ 42.1 │
│ LibraRetinaNet ('Libra R-CNN: Towards Balanced Learning for Object Detection' (https://arxiv.org/pdf/1904.02701v1.pdf)) │2019│ ? │ ? │ 43.0 │
│ YOLACT-700 ('YOLACT: Real-time Instance Segmentation' (https://arxiv.org/pdf/1904.02689v1.pdf)) │2019│ ? │ ? │ 31.2 │
│ DetNASNet(3.8) ('DetNAS: Backbone Search for Object Detection' (https://arxiv.org/pdf/1903.10979v2.pdf)) │2019│ ? │ ? │ 42.0 │
│ YOLOv4 ('YOLOv4: Optimal Speed and Accuracy of Object Detection' (https://arxiv.org/pdf/2004.10934.pdf)) │2020│ ? │ ? │ 46.7 │
│ SOLO ('SOLO: Segmenting Objects by Locations' (https://arxiv.org/pdf/1912.04488v3.pdf)) │2020│ ? │ ? │ 37.8 │
│ D-SOLO ('SOLO: Segmenting Objects by Locations' (https://arxiv.org/pdf/1912.04488v3.pdf)) │2020│ ? │ ? │ 40.5 │
│ SNIPER ('Scale Normalized Image Pyramids with AutoFocus for Object Detection' (https://arxiv.org/pdf/2102.05646v1.pdf)) │2021│ 86.6 │ ? │ 47.9 │
│ AutoFocus ('Scale Normalized Image Pyramids with AutoFocus for Object Detection' (https://arxiv.org/pdf/2102.05646v1.pdf)) │2021│ 85.8 │ ? │ 47.9 │
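The detection columns report mean average precision (mAP): the AP of each object class, averaged over classes. For VOC07 and VOC12, a detection is a true positive when its best-overlapping ground-truth box of the same class has an intersection over union (IoU) of at least 0.5 and has not already been claimed by a higher-scoring detection. The COCO column follows the COCO protocol, which additionally averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The minimal sketch below computes single-class AP at one IoU threshold with all-point interpolation of the precision-recall curve, the scheme VOC adopted from 2010 onward; the official VOC2007 metric uses 11-point interpolation instead, and the sketch ignores VOC's "difficult" flags. The function names and input format are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: Sequence[Tuple[str, float, Box]],
                      ground_truth: Dict[str, List[Box]],
                      iou_threshold: float = 0.5) -> float:
    """Single-class AP in percent, with all-point interpolation.

    detections: (image_id, confidence, box) tuples for one class.
    ground_truth: image_id -> list of ground-truth boxes of that class.
    """
    num_gt = sum(len(boxes) for boxes in ground_truth.values())
    if num_gt == 0:
        raise ValueError("no ground-truth boxes")
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp: List[int] = []
    fp: List[int] = []
    # Greedy matching, from the most to the least confident detection.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_j, best_iou = -1, 0.0
        for j, gt in enumerate(ground_truth.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0 and best_iou >= iou_threshold and not matched[img][best_j]:
            matched[img][best_j] = True
            tp.append(1)
            fp.append(0)
        else:
            # Too little overlap, or a duplicate of an already matched box.
            tp.append(0)
            fp.append(1)

    # Precision and recall after each detection in ranked order.
    recalls, precisions = [], []
    cum_tp = cum_fp = 0
    for t, f in zip(tp, fp):
        cum_tp += t
        cum_fp += f
        recalls.append(cum_tp / num_gt)
        precisions.append(cum_tp / (cum_tp + cum_fp))

    # Make precision non-increasing in recall, then sum the area under the steps.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    ap = sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))
    return 100.0 * ap


if __name__ == "__main__":
    gt = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
    dets = [
        ("img1", 0.9, (0, 0, 10, 10)),    # exact match: true positive
        ("img1", 0.8, (0, 0, 10, 10)),    # same box again: false positive
        ("img1", 0.7, (21, 21, 31, 31)),  # IoU about 0.68 with the second box: true positive
    ]
    print(round(average_precision(dets, gt), 2))  # 83.33
```

A dataset-level mAP would call `average_precision` once per class and average the results; a COCO-style number would also average over the ten IoU thresholds.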
GitHub: https://github.com/nerox8664/awesome-computer-vision-models
│ DarkNet Ref ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) │ 7,319,416 │ 367.59M │ 38.58 │ 17.18 │2016│
│ DarkNet Tiny ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) │ 1,042,104 │ 500.85M │ 40.74 │ 17.84 │2016│
│ DarkNet 53 ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) │ 41,609,928 │7,133.86M │ 21.75 │ 5.64 │2016│
│ SqueezeResNet1.1 ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' (https://arxiv.org/abs/1602.07360)) │ 1,235,496 │ 352.02M │ 40.09 │ 18.21 │2016│
│ SqueezeNet1.1 ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' (https://arxiv.org/abs/1602.07360)) │ 1,235,496 │ 352.02M │ 39.31 │ 17.72 │2016│
│ ResAttNet-92 ('Residual Attention Network for Image Classification' (https://arxiv.org/abs/1704.06904)) │ 51.3M │ ? │ 19.5 │ 4.8 │2017│
│ CondenseNet (G=C=8) ('CondenseNet: An Efficient DenseNet using Learned Group Convolutions' (https://arxiv.org/abs/1711.09224)) │ 4.8M │ ? │ 26.2 │ 8.3 │2017│
│ DPN-68 ('Dual Path Networks' (https://arxiv.org/abs/1707.01629)) │ 12,611,602 │2,351.84M │ 23.24 │ 6.79 │2017│
│ ShuffleNet x1.0 (g=1) ('ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices' (https://arxiv.org/abs/1707.01083)) │ 1,531,936 │ 148.13M │ 34.93 │ 13.89 │2017│
│ DiracNetV2-18 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections' (https://arxiv.org/abs/1706.00388)) │ 11,511,784 │1,796.62M │ 31.47 │ 11.70 │2017│
│ DiracNetV2-34 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections' (https://arxiv.org/abs/1706.00388)) │ 21,616,232 │3,646.93M │ 28.75 │ 9.93 │2017│
│ SENet-16 ('Squeeze-and-Excitation Networks' (https://arxiv.org/abs/1709.01507)) │ 31,366,168 │5,081.30M │ 25.65 │ 8.20 │2017│
│ SENet-154 ('Squeeze-and-Excitation Networks' (https://arxiv.org/abs/1709.01507)) │ 115,088,984 │20,745.78M│ 18.62 │ 4.61 │2017│
│ MobileNet ('MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications' (https://arxiv.org/abs/1704.04861)) │ 4,231,976 │ 579.80M │ 26.61 │ 8.95 │2017│
│ NASNet-A 4@1056 ('Learning Transferable Architectures for Scalable Image Recognition' (https://arxiv.org/abs/1707.07012)) │ 5,289,978 │ 584.90M │ 25.68 │ 8.16 │2017│
│ NASNet-A 6@4032('Learning Transferable Architectures for Scalable Image Recognition' (https://arxiv.org/abs/1707.07012)) │ 88,753,150 │23,976.44M│ 18.14 │ 4.21 │2017│
│ DLA-34 ('Deep Layer Aggregation' (https://arxiv.org/abs/1707.06484)) │ 15,742,104 │3,071.37M │ 25.36 │ 7.94 │2017│
│ AirNet50-1x64d (r=2) ('Attention Inspiring Receptive-Fields Network for Learning Invariant Representations' (https://ieeexplore.ieee.org/document/8510896)) │ 27.43M │ ? │ 22.48 │ 6.21 │2018│
│ BAM-ResNet-50 ('BAM: Bottleneck Attention Module' (https://arxiv.org/abs/1807.06514)) │ 25.92M │ ? │ 23.68 │ 6.96 │2018│
│ CBAM-ResNet-50 ('CBAM: Convolutional Block Attention Module' (https://arxiv.org/abs/1807.06521)) │ 28.1M │ ? │ 23.02 │ 6.38 │2018│
│ 1.0-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) │ 921,816 │ 285.82M │ 40.77 │ 17.85 │2018│
│ 1.5-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) │ 1,953,616 │ 550.97M │ 33.81 │ 13.01 │2018│
│ 2.0-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) │ 3,366,344 │ 897.60M │ 29.63 │ 10.66 │2018│
│ ShuffleNetV2 ('ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design' (https://arxiv.org/abs/1807.11164)) │ 2,278,604 │ 149.72M │ 31.44 │ 11.63 │2018│
│ 456-MENet-24×1(g=3) ('Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications' (https://arxiv.org/abs/1803.09127)) │ 5.3M │ ? │ 28.4 │ 9.8 │2018│
│ FD-MobileNet ('FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy' (https://arxiv.org/abs/1802.03750)) │ 2,901,288 │ 147.46M │ 34.23 │ 13.38 │2018│
│ MobileNetV2 ('MobileNetV2: Inverted Residuals and Linear Bottlenecks' (https://arxiv.org/abs/1801.04381)) │ 3,504,960 │ 329.36M │ 26.97 │ 8.87 │2018│
│ IGCV3 ('IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks' (https://arxiv.org/abs/1806.00178)) │ 3.5M │ ? │ 28.22 │ 9.54 │2018│
│ DARTS ('DARTS: Differentiable Architecture Search' (https://arxiv.org/abs/1806.09055)) │ 4.9M │ ? │ 26.9 │ 9.0 │2018│
│ PNASNet-5 ('Progressive Neural Architecture Search' (https://arxiv.org/abs/1712.00559)) │ 5.1M │ ? │ 25.8 │ 8.1 │2018│
│ AmoebaNet-C ('Regularized Evolution for Image Classifier Architecture Search' (https://arxiv.org/abs/1802.01548)) │ 5.1M │ ? │ 24.3 │ 7.6 │2018│
│ MnasNet ('MnasNet: Platform-Aware Neural Architecture Search for Mobile' (https://arxiv.org/abs/1807.11626)) │ 4,308,816 │ 317.67M │ 31.58 │ 11.74 │2018│
│ IBN-Net50-a ('Two at Once: Enhancing Learning andGeneralization Capacities via IBN-Net' (https://arxiv.org/abs/1807.09441)) │ ? │ ? │ 22.54 │ 6.32 │2018│
│ MarginNet ('Large Margin Deep Networks for Classification' (http://papers.nips.cc/paper/7364-large-margin-deep-networks-for-classification.pdf)) │ ? │ ? │ 22.0 │ ? │2018│
│ A^2 Net ('A^2-Nets: Double Attention Networks' (http://papers.nips.cc/paper/7318-a2-nets-double-attention-networks.pdf)) │ ? │ ? │ 23.0 │ 6.5 │2018│
│ FishNeXt-150 ('FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction' │ 26.2M │ ? │ 21.5 │ ? │2018│
│ (http://papers.nips.cc/paper/7356-fishnet-a-versatile-backbone-for-image-region-and-pixel-level-prediction.pdf)) │ │ │ │ │ │
│ Shape-ResNet ('IMAGENET-TRAINED CNNS ARE BIASED TOWARDS TEXTURE; INCREASING SHAPE BIAS IMPROVES ACCURACY AND ROBUSTNESS' (https://arxiv.org/pdf/1811.12231v2.pdf)) │ 25.5M │ ? │ 23.28 │ 6.72 │2019│
│ SimCNN(k=3 train) ('Greedy Layerwise Learning Can Scale to ImageNet' (https://arxiv.org/pdf/1812.11446.pdf)) │ ? │ ? │ 28.4 │ 10.2 │2019│
│ SKNet-50 ('Selective Kernel Networks' (https://arxiv.org/pdf/1903.06586.pdf)) │ 27.5M │ ? │ 20.79 │ ? │2019│
│ SRM-ResNet-50 ('SRM : A Style-based Recalibration Module for Convolutional Neural Networks' (https://arxiv.org/pdf/1903.10829.pdf)) │ 25.62M │ ? │ 22.87 │ 6.49 │2019│
│ EfficientNet-B0 ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' (http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) │ 5,288,548 │ 414.31M │ 24.77 │ 7.52 │2019│
│ EfficientNet-B7b ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' (http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) │ 66,347,960 │39,010.98M│ 15.94 │ 3.22 │2019│
│ ProxylessNAS ('PROXYLESSNAS: DIRECT NEURAL ARCHITECTURE SEARCH ON TARGET TASK AND HARDWARE' (https://arxiv.org/pdf/1812.00332.pdf)) │ ? │ ? │ 24.9 │ 7.5 │2019│
│ MixNet-L ('MixNet: Mixed Depthwise Convolutional Kernels' ( https://arxiv.org/abs/1907.09595)) │ 7.3M │ ? │ 21.1 │ 5.8 │2019│
│ ECA-Net50 ('ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' (https://arxiv.org/pdf/1910.03151v1.pdf)) │ 24.37M │ 3.86G │ 22.52 │ 6.32 │2019│
│ ECA-Net101 ('ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' (https://arxiv.org/pdf/1910.03151v1.pdf)) │ 7.3M │ 7.35G │ 21.35 │ 5.66 │2019│
│ ACNet-Densenet121 ('ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks' (https://arxiv.org/abs/1908.03930)) │ ? │ ? │ 24.18 │ 7.23 │2019│
│ LIP-ResNet-50 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) │ 23.9M │ 5.33G │ 21.81 │ 6.04 │2019│
│ LIP-ResNet-101 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) │ 42.9M │ 9.06G │ 20.67 │ 5.40 │2019│
│ LIP-DenseNet-BC-121 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) │ 8.7M │ 4.13G │ 23.36 │ 6.84 │2019│
│ MuffNet_1.0 ('MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' │ 2.3M │ 146M │ 30.1 │ ? │2019│
│ (http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) │ │ │ │ │ │
│ MuffNet_1.5 ('MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' │ 3.4M │ 300M │ 26.9 │ ? │2019│
│ (http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) │ │ │ │ │ │
│ ResNet-34-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) │ 21.8M │3,672.68M │ 25.80 │ ? │2019│
│ ResNet-50-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) │ 25.5M │3,877.95M │ 22.96 │ ? │2019│
│ MobileNetV2-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) │ 3,504,960 │ 329.36M │ 27.50 │ ? │2019│
│ FixRes ResNeXt101 WSL ('Fixing the train-test resolution discrepancy' (https://arxiv.org/abs/1906.06423)) │ 829M │ ? │ 13.6 │ 2.0 │2019│
│ Noisy Student(L2) ('Self-training with Noisy Student improves ImageNet classification' (https://arxiv.org/abs/1911.04252)) │ 480M │ ? │ 12.6 │ 1.8 │2019│
│ TResNet-M ('TResNet: High Performance GPU-Dedicated Architecture' (https://arxiv.org/abs/2003.13630)) │ 29.4M │ 5.5G │ 19.3 │ ? │2020│
│ DA-NAS-C ('DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search' (https://arxiv.org/abs/2003.12563v1)) │ ? │ 467M │ 23.8 │ ? │2020│
│ ResNeSt-50 ('ResNeSt: Split-Attention Networks' (https://arxiv.org/abs/2004.08955)) │ 27.5M │ 5.39G │ 18.87 │ ? │2020│
│ ResNeSt-101 ('ResNeSt: Split-Attention Networks' (https://arxiv.org/abs/2004.08955)) │ 48.3M │ 10.2G │ 17.73 │ ? │2020│
│ ResNet-50-FReLU ('Funnel Activation for Visual Recognition' (https://arxiv.org/abs/2007.11824v2)) │ 25.5M │ 3.87G │ 22.40 │ ? │2020│
│ ResNet-101-FReLU ('Funnel Activation for Visual Recognition' (https://arxiv.org/abs/2007.11824v2)) │ 44.5M │ 7.6G │ 22.10 │ ? │2020│
│ ResNet-50-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 25.6M │ ? │ 19.33 │ 4.91 │2020│
│ ResNet-50-MEALv2 + CutMix ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 25.6M │ ? │ 19.02 │ 4.65 │2020│
│ MobileNet V3-Large-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 5.48M │ ? │ 23.08 │ 6.68 │2020│
│ EfficientNet-B0-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 5.29M │ ? │ 21.71 │ 6.05 │2020│
│ T2T-ViT-7 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) │ 4.2M │ 0.6G │ 28.8 │ ? │2021│
│ T2T-ViT-14 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) │ 19.4M │ 4.8G │ 19.4 │ ? │2021│
│ T2T-ViT-19 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) │ 39.0M │ 8.0G │ 18.8 │ ? │2021│
│ NFNet-F0 ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) │ 71.5M │ 12.38G │ 16.4 │ 3.2 │2021│
│ NFNet-F1 ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) │ 132.6M │ 35.54G │ 15.4 │ 2.9 │2021│
│ NFNet-F6+SAM ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) │ 438.4M │ 377.28G │ 13.5 │ 2.1 │2021│
│ EfficientNetV2-S ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 24M │ 8.8G │ 16.1 │ ? │2021│
│ EfficientNetV2-M ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 55M │ 24G │ 14.9 │ ? │2021│
│ EfficientNetV2-L ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 121M │ 53G │ 14.3 │ ? │2021│
│ EfficientNetV2-S (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 24M │ 8.8G │ 15.0 │ ? │2021│
│ EfficientNetV2-M (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 55M │ 24G │ 13.9 │ ? │2021│
│ EfficientNetV2-L (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 121M │ 53G │ 13.2 │ ? │2021│
Segmentation models
│ Model │Year│PASCAL-Context│Cityscapes (mIOU)│PASCAL VOC 2012 (mIOU)│COCO Stuff│ADE20K VAL (mIOU)│
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────┼──────────────┼─────────────────┼──────────────────────┼──────────┼─────────────────┤
│ U-Net ('U-Net: Convolutional Networks for Biomedical Image Segmentation' (https://arxiv.org/pdf/1505.04597.pdf)) │2015│ ? │ ? │ ? │ ? │ ? │
│ DeconvNet ('Learning Deconvolution Network for Semantic Segmentation' (https://arxiv.org/pdf/1505.04366.pdf)) │2015│ ? │ ? │ 72.5 │ ? │ ? │
│ ParseNet ('ParseNet: Looking Wider to See Better' (https://arxiv.org/abs/1506.04579)) │2015│ 40.4 │ ? │ 69.8 │ ? │ ? │
│ Piecewise ('Efficient piecewise training of deep structured models for semantic segmentation' (https://arxiv.org/abs/1504.01013)) │2015│ 43.3 │ 71.6 │ 78.0 │ ? │ ? │
│ SegNet ('SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation' (https://arxiv.org/pdf/1511.00561.pdf)) │2016│ ? │ 56.1 │ ? │ ? │ ? │
│ FCN ('Fully Convolutional Networks for Semantic Segmentation' (https://arxiv.org/pdf/1605.06211.pdf)) │2016│ 37.8 │ 65.3 │ 62.2 │ 22.7 │ 29.39 │
│ ENet ('ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation' (https://arxiv.org/pdf/1606.02147.pdf)) │2016│ ? │ 58.3 │ ? │ ? │ ? │
│ DilatedNet ('MULTI-SCALE CONTEXT AGGREGATION BY DILATED CONVOLUTIONS' (https://arxiv.org/pdf/1511.07122.pdf)) │2016│ ? │ ? │ 67.6 │ ? │ 32.31 │
│ PixelNet ('PixelNet: Towards a General Pixel-Level Architecture' (https://arxiv.org/pdf/1609.06694.pdf)) │2016│ ? │ ? │ 69.8 │ ? │ ? │
│ RefineNet ('RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation' (https://arxiv.org/pdf/1611.06612.pdf)) │2016│ 47.3 │ 73.6 │ 83.4 │ 33.6 │ 40.70 │
│ LRR ('Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation' (https://arxiv.org/pdf/1605.02264.pdf)) │2016│ ? │ 71.8 │ 79.3 │ ? │ ? │
│ FRRN ('Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes' (https://arxiv.org/pdf/1611.08323.pdf)) │2016│ ? │ 71.8 │ ? │ ? │ ? │
│ MultiNet ('MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving' (https://arxiv.org/pdf/1612.07695.pdf)) │2016│ ? │ ? │ ? │ ? │ ? │
│ DeepLab ('DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs' │2017│ 45.7 │ 64.8 │ 79.7 │ ? │ ? │
│ (https://arxiv.org/pdf/1606.00915.pdf)) │ │ │ │ │ │ │
│ LinkNet ('LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation' (https://arxiv.org/pdf/1707.03718.pdf)) │2017│ ? │ ? │ ? │ ? │ ? │
│ Tiramisu ('The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation' (https://arxiv.org/pdf/1611.09326.pdf)) │2017│ ? │ ? │ ? │ ? │ ? │
│ ICNet ('ICNet for Real-Time Semantic Segmentation on High-Resolution Images' (https://arxiv.org/pdf/1704.08545.pdf)) │2017│ ? │ 70.6 │ ? │ ? │ ? │
│ ERFNet ('Efficient ConvNet for Real-time Semantic Segmentation' (http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf)) │2017│ ? │ 68.0 │ ? │ ? │ ? │
│ PSPNet ('Pyramid Scene Parsing Network' (https://arxiv.org/pdf/1612.01105.pdf)) │2017│ 47.8 │ 80.2 │ 85.4 │ ? │ 44.94 │
│ GCN ('Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network' (https://arxiv.org/pdf/1703.02719.pdf)) │2017│ ? │ 76.9 │ 82.2 │ ? │ ? │
│ Segaware ('Segmentation-Aware Convolutional Networks Using Local Attention Masks' (https://arxiv.org/pdf/1708.04607.pdf)) │2017│ ? │ ? │ 69.0 │ ? │ ? │
│ PixelDCN ('PIXEL DECONVOLUTIONAL NETWORKS' (https://arxiv.org/pdf/1705.06820.pdf)) │2017│ ? │ ? │ 73.0 │ ? │ ? │
│ DeepLabv3 ('Rethinking Atrous Convolution for Semantic Image Segmentation' (https://arxiv.org/pdf/1706.05587.pdf)) │2017│ ? │ ? │ 85.7 │ ? │ ? │
│ DUC, HDC ('Understanding Convolution for Semantic Segmentation' (https://arxiv.org/pdf/1702.08502.pdf)) │2018│ ? │ 77.1 │ ? │ ? │ ? │
│ ShuffleSeg ('SHUFFLESEG: REAL-TIME SEMANTIC SEGMENTATION NETWORK' (https://arxiv.org/pdf/1803.03816.pdf)) │2018│ ? │ 59.3 │ ? │ ? │ ? │
│ AdaptSegNet ('Learning to Adapt Structured Output Space for Semantic Segmentation' (https://arxiv.org/pdf/1802.10349.pdf)) │2018│ ? │ 46.7 │ ? │ ? │ ? │
│ TuSimple-DUC ('Understanding Convolution for Semantic Segmentation' (https://arxiv.org/pdf/1702.08502.pdf)) │2018│ 80.1 │ ? │ 83.1 │ ? │ ? │
│R2U-Net ('Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation' (https://arxiv.org/pdf/1802.06955.pdf))│2018│ ? │ ? │ ? │ ? │ ? │
│ Attention U-Net ('Attention U-Net: Learning Where to Look for the Pancreas' (https://arxiv.org/pdf/1804.03999.pdf)) │2018│ ? │ ? │ ? │ ? │ ? │
│ DANet ('Dual Attention Network for Scene Segmentation' (https://arxiv.org/pdf/1809.02983.pdf)) │2018│ 52.6 │ 81.5 │ ? │ 39.7 │ ? │
│ ENCNet ('Context Encoding for Semantic Segmentation' (https://arxiv.org/abs/1803.08904)) │2018│ 51.7 │ 75.8 │ 85.9 │ ? │ 44.65 │
│ ShelfNet ('ShelfNet for Real-time Semantic Segmentation' (https://arxiv.org/pdf/1811.11254.pdf)) │2018│ 48.4 │ 75.8 │ 84.2 │ ? │ ? │
│ LadderNet ('LADDERNET: MULTI-PATH NETWORKS BASED ON U-NET FOR MEDICAL IMAGE SEGMENTATION' (https://arxiv.org/pdf/1810.07810.pdf)) │2018│ ? │ ? │ ? │ ? │ ? │
│ CCC-ERFnet ('Concentrated-Comprehensive Convolutions for lightweight semantic segmentation' (https://arxiv.org/pdf/1812.04920v1.pdf)) │2018│ ? │ 69.01 │ ? │ ? │ ? │
│ DifNet-101 ('DifNet: Semantic Segmentation by Diffusion Networks' │2018│ 45.1 │ ? │ 73.2 │ ? │ ? │
│ (http://papers.nips.cc/paper/7435-difnet-semantic-segmentation-by-diffusion-networks.pdf)) │ │ │ │ │ │ │
│ BiSeNet(Res18) ('BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation' (https://arxiv.org/pdf/1808.00897.pdf)) │2018│ ? │ ? │ 74.7 │ 28.1 │ ? │
│ ESPNet ('ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation' (https://arxiv.org/pdf/1803.06815.pdf)) │2018│ ? │ ? │ 63.01 │ ? │ ? │
│ SPADE ('Semantic Image Synthesis with Spatially-Adaptive Normalization' (https://arxiv.org/pdf/1903.07291.pdf)) │2019│ ? │ 62.3 │ ? │ 37.4 │ 38.5 │
│ SeamlessSeg ('Seamless Scene Segmentation' (https://arxiv.org/pdf/1905.01220v1.pdf)) │2019│ ? │ 77.5 │ ? │ ? │ ? │
│ EMANet ('Expectation-Maximization Attention Networks for Semantic Segmentation' (https://arxiv.org/pdf/1907.13426.pdf)) │2019│ ? │ ? │ 88.2 │ 39.9 │ ? │
Detection models
│ Model │Year│VOC07 (mAP@IoU=0.5)│VOC12 (mAP@IoU=0.5)│COCO (mAP)│
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────┼───────────────────┼───────────────────┼──────────┤
│ R-CNN ('Rich feature hierarchies for accurate object detection and semantic segmentation' (https://arxiv.org/pdf/1311.2524.pdf)) │2014│ 58.5 │ ? │ ? │
│ OverFeat ('OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks' (https://arxiv.org/pdf/1312.6229.pdf)) │2014│ ? │ ? │ ? │
│ MultiBox ('Scalable Object Detection using Deep Neural Networks' (https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Erhan_Scalable_Object_Detection_2014_CVPR_paper.pdf)) │2014│ 29.0 │ ? │ ? │
│ SPP-Net ('Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition' (https://arxiv.org/pdf/1406.4729.pdf)) │2014│ 59.2 │ ? │ ? │
│ MR-CNN ('Object detection via a multi-region & semantic segmentation-aware CNN model' │2015│ 78.2 │ 73.9 │ ? │
│ (https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Gidaris_Object_Detection_via_ICCV_2015_paper.pdf)) │ │ │ │ │
│ AttentionNet ('AttentionNet: Aggregating Weak Directions for Accurate Object Detection' (https://arxiv.org/pdf/1506.07704.pdf)) │2015│ ? │ ? │ ? │
│ Fast R-CNN ('Fast R-CNN' (https://arxiv.org/pdf/1504.08083.pdf)) │2015│ 70.0 │ 68.4 │ ? │
│ Fast R-CNN ('Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks' │2015│ 73.2 │ 70.4 │ 36.8 │
│ (https://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf)) │ │ │ │ │
│ YOLO v1 ('You Only Look Once: Unified, Real-Time Object Detection' (https://arxiv.org/pdf/1506.02640.pdf)) │2016│ 66.4 │ 57.9 │ ? │
│ G-CNN ('G-CNN: an Iterative Grid Based Object Detector' (https://arxiv.org/pdf/1512.07729.pdf)) │2016│ 66.8 │ 66.4 │ ? │
│ AZNet ('Adaptive Object Detection Using Adjacency and Zoom Prediction' (https://arxiv.org/pdf/1512.07711.pdf)) │2016│ 70.4 │ ? │ 22.3 │
│ ION ('Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks' (https://arxiv.org/pdf/1512.04143.pdf)) │2016│ 80.1 │ 77.9 │ 33.1 │
│ HyperNet ('HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection' (https://arxiv.org/pdf/1604.00600.pdf)) │2016│ 76.3 │ 71.4 │ ? │
│ OHEM ('Training Region-based Object Detectors with Online Hard Example Mining' (https://arxiv.org/pdf/1604.03540.pdf)) │2016│ 78.9 │ 76.3 │ 22.4 │
│ MPN ('A MultiPath Network for Object Detection' (https://arxiv.org/pdf/1604.02135.pdf)) │2016│ ? │ ? │ 33.2 │
│ SSD ('SSD: Single Shot MultiBox Detector' (https://arxiv.org/pdf/1512.02325.pdf)) │2016│ 76.8 │ 74.9 │ 31.2 │
│ GBDNet ('Crafting GBD-Net for Object Detection' (https://arxiv.org/pdf/1610.02579.pdf)) │2016│ 77.2 │ ? │ 27.0 │
│ CPF ('Contextual Priming and Feedback for Faster R-CNN' (https://pdfs.semanticscholar.org/40e7/4473cb82231559cbaeaa44989e9bbfe7ec3f.pdf)) │2016│ 76.4 │ 72.6 │ ? │
│ MS-CNN ('A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection' (https://arxiv.org/pdf/1607.07155.pdf)) │2016│ ? │ ? │ ? │
│ R-FCN ('R-FCN: Object Detection via Region-based Fully Convolutional Networks' (https://arxiv.org/pdf/1605.06409.pdf)) │2016│ 79.5 │ 77.6 │ 29.9 │
│ PVANET ('PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection' (https://arxiv.org/pdf/1608.08021.pdf)) │2016│ ? │ ? │ ? │
│ DeepID-Net ('DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection' (https://arxiv.org/pdf/1412.5661.pdf)) │2016│ 69.0 │ ? │ ? │
│ NoC ('Object Detection Networks on Convolutional Feature Maps' (https://arxiv.org/pdf/1504.06066.pdf)) │2016│ 71.6 │ 68.8 │ 27.2 │
│ DSSD ('DSSD : Deconvolutional Single Shot Detector' (https://arxiv.org/pdf/1701.06659.pdf)) │2017│ 81.5 │ 80.0 │ ? │
│ TDM ('Beyond Skip Connections: Top-Down Modulation for Object Detection' (https://arxiv.org/pdf/1612.06851.pdf)) │2017│ ? │ ? │ 37.3 │
│ FPN ('Feature Pyramid Networks for Object Detection' (http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_Feature_Pyramid_Networks_CVPR_2017_paper.pdf)) │2017│ ? │ ? │ 36.2 │
│ YOLO v2 ('YOLO9000: Better, Faster, Stronger' (https://arxiv.org/pdf/1612.08242.pdf)) │2017│ 78.6 │ 73.4 │ 21.6 │
│ RON ('RON: Reverse Connection with Objectness Prior Networks for Object Detection' (https://arxiv.org/pdf/1707.01691.pdf)) │2017│ 77.6 │ 75.4 │ ? │
│ DCN ('Deformable Convolutional Networks' (http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Deformable_Convolutional_Networks_ICCV_2017_paper.pdf)) │2017│ ? │ ? │ ? │
│ DeNet ('DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling' (https://arxiv.org/pdf/1703.10295.pdf)) │2017│ 77.1 │ 73.9 │ 33.8 │
│ CoupleNet ('CoupleNet: Coupling Global Structure with Local Parts for Object Detection' (https://arxiv.org/pdf/1708.02863.pdf)) │2017│ 82.7 │ 80.4 │ 34.4 │
│ RetinaNet ('Focal Loss for Dense Object Detection' (https://arxiv.org/pdf/1708.02002.pdf)) │2017│ ? │ ? │ 39.1 │
│ Mask R-CNN ('Mask R-CNN' (http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf)) │2017│ ? │ ? │ 39.8 │
│ DSOD ('DSOD: Learning Deeply Supervised Object Detectors from Scratch' (https://arxiv.org/pdf/1708.01241.pdf)) │2017│ 77.7 │ 76.3 │ ? │
│ SMN ('Spatial Memory for Context Reasoning in Object Detection' (http://openaccess.thecvf.com/content_ICCV_2017/papers/Chen_Spatial_Memory_for_ICCV_2017_paper.pdf)) │2017│ 70.0 │ ? │ ? │
│ YOLO v3 ('YOLOv3: An Incremental Improvement' (https://pjreddie.com/media/files/papers/YOLOv3.pdf)) │2018│ ? │ ? │ 33.0 │
│ SIN ('Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships' │2018│ 76.0 │ 73.1 │ 23.2 │
│ (http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Structure_Inference_Net_CVPR_2018_paper.pdf)) │ │ │ │ │
│ STDN ('Scale-Transferrable Object Detection' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Scale-Transferrable_Object_Detection_CVPR_2018_paper.pdf)) │2018│ 80.9 │ ? │ ? │
│ RefineDet ('Single-Shot Refinement Neural Network for Object Detection' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Single-Shot_Refinement_Neural_CVPR_2018_paper.pdf)) │2018│ 83.8 │ 83.5 │ 41.8 │
│ MegDet ('MegDet: A Large Mini-Batch Object Detector' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Peng_MegDet_A_Large_CVPR_2018_paper.pdf)) │2018│ ? │ ? │ ? │
│ RFBNet ('Receptive Field Block Net for Accurate and Fast Object Detection' (https://arxiv.org/pdf/1711.07767.pdf)) │2018│ 82.2 │ ? │ ? │
│ CornerNet ('CornerNet: Detecting Objects as Paired Keypoints' (https://arxiv.org/pdf/1808.01244.pdf)) │2018│ ? │ ? │ 42.1 │
│ LibraRetinaNet ('Libra R-CNN: Towards Balanced Learning for Object Detection' (https://arxiv.org/pdf/1904.02701v1.pdf)) │2019│ ? │ ? │ 43.0 │
│ YOLACT-700 ('YOLACT Real-time Instance Segmentation' (https://arxiv.org/pdf/1904.02689v1.pdf)) │2019│ ? │ ? │ 31.2 │
│ DetNASNet(3.8) ('DetNAS: Backbone Search for Object Detection' (https://arxiv.org/pdf/1903.10979v2.pdf)) │2019│ ? │ ? │ 42.0 │
│ YOLOv4 ('YOLOv4: Optimal Speed and Accuracy of Object Detection' (https://arxiv.org/pdf/2004.10934.pdf)) │2020│ ? │ ? │ 46.7 │
│ SOLO ('SOLO: Segmenting Objects by Locations' (https://arxiv.org/pdf/1912.04488v3.pdf)) │2020│ ? │ ? │ 37.8 │
│ D-SOLO ('SOLO: Segmenting Objects by Locations' (https://arxiv.org/pdf/1912.04488v3.pdf)) │2020│ ? │ ? │ 40.5 │
│ SNIPER ('Scale Normalized Image Pyramids with AutoFocus for Object Detection' (https://arxiv.org/pdf/2102.05646v1.pdf)) │2021│ 86.6 │ ? │ 47.9 │
│ AutoFocus ('Scale Normalized Image Pyramids with AutoFocus for Object Detection' (https://arxiv.org/pdf/2102.05646v1.pdf)) │2021│ 85.8 │ ? │ 47.9 │
computervisionmodels Github: https://github.com/nerox8664/awesome-computer-vision-models