Awesome Computer Vision Models

A curated list of popular classification, segmentation and detection models with corresponding evaluation metrics from papers.

Contents

- Classification models (#classification-models)
- Segmentation models (#segmentation-models)
- Detection models (#detection-models)

Classification models

│ Model │ Number of parameters │ FLOPs │ Top-1 Error │ Top-5 Error │ Year │
├───┼───┼───┼───┼───┼───┤
│ AlexNet ('One weird trick for parallelizing convolutional neural networks' (https://arxiv.org/abs/1404.5997)) │ 62.3M │ 1,132.33M │ 40.96 │ 18.24 │ 2014 │
│ VGG-16 ('Very Deep Convolutional Networks for Large-Scale Image Recognition' (https://arxiv.org/abs/1409.1556)) │ 138.3M │ ? │ 26.78 │ 8.69 │ 2014 │
│ ResNet-10 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) │ 5.5M │ 894.04M │ 34.69 │ 14.36 │ 2015 │
│ ResNet-18 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) │ 11.7M │ 1,820.41M │ 28.53 │ 9.82 │ 2015 │
│ ResNet-34 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) │ 21.8M │ 3,672.68M │ 24.84 │ 7.80 │ 2015 │
│ ResNet-50 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) │ 25.5M │ 3,877.95M │ 22.28 │ 6.33 │ 2015 │
│ InceptionV3 ('Rethinking the Inception Architecture for Computer Vision' (https://arxiv.org/abs/1512.00567)) │ 23.8M │ ? │ 21.2 │ 5.6 │ 2015 │
│ PreResNet-18 ('Identity Mappings in Deep Residual Networks' (https://arxiv.org/abs/1603.05027)) │ 11.7M │ 1,820.56M │ 28.43 │ 9.72 │ 2016 │
│ PreResNet-34 ('Identity Mappings in Deep Residual Networks' (https://arxiv.org/abs/1603.05027)) │ 21.8M │ 3,672.83M │ 24.89 │ 7.74 │ 2016 │
│ PreResNet-50 ('Identity Mappings in Deep Residual Networks' (https://arxiv.org/abs/1603.05027)) │ 25.6M │ 3,875.44M │ 22.40 │ 6.47 │ 2016 │
│ DenseNet-121 ('Densely Connected Convolutional Networks' (https://arxiv.org/abs/1608.06993)) │ 8.0M │ 2,872.13M │ 23.48 │ 7.04 │ 2016 │
│ DenseNet-161 ('Densely Connected Convolutional Networks' (https://arxiv.org/abs/1608.06993)) │ 28.7M │ 7,793.16M │ 22.86 │ 6.44 │ 2016 │
│ PyramidNet-101 ('Deep Pyramidal Residual Networks' (https://arxiv.org/abs/1610.02915)) │ 42.5M │ 8,743.54M │ 21.98 │ 6.20 │ 2016 │
│ ResNeXt-14(32x4d) ('Aggregated Residual Transformations for Deep Neural Networks' (http://arxiv.org/abs/1611.05431)) │ 9.5M │ 1,603.46M │ 30.32 │ 11.46 │ 2016 │
│ ResNeXt-26(32x4d) ('Aggregated Residual Transformations for Deep Neural Networks' (http://arxiv.org/abs/1611.05431)) │ 15.4M │ 2,488.07M │ 24.14 │ 7.46 │ 2016 │
│ WRN-50-2 ('Wide Residual Networks' (https://arxiv.org/abs/1605.07146)) │ 68.9M │ 11,405.42M │ 22.53 │ 6.41 │ 2016 │
│ Xception ('Xception: Deep Learning with Depthwise Separable Convolutions' (https://arxiv.org/abs/1610.02357)) │ 22,855,952 │ 8,403.63M │ 20.97 │ 5.49 │ 2016 │
│ InceptionV4 ('Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning' (https://arxiv.org/abs/1602.07261)) │ 42,679,816 │ 12,304.93M │ 20.64 │ 5.29 │ 2016 │
│ InceptionResNetV2 ('Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning' (https://arxiv.org/abs/1602.07261)) │ 55,843,464 │ 13,188.64M │ 19.93 │ 4.90 │ 2016 │
│ PolyNet ('PolyNet: A Pursuit of Structural Diversity in Very Deep Networks' (https://arxiv.org/abs/1611.05725)) │ 95,366,600 │ 34,821.34M │ 19.10 │ 4.52 │ 2016 │
│ DarkNet Ref ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) │ 7,319,416 │ 367.59M │ 38.58 │ 17.18 │ 2016 │
│ DarkNet Tiny ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) │ 1,042,104 │ 500.85M │ 40.74 │ 17.84 │ 2016 │
│ DarkNet 53 ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) │ 41,609,928 │ 7,133.86M │ 21.75 │ 5.64 │ 2016 │
│ SqueezeResNet1.1 ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' (https://arxiv.org/abs/1602.07360)) │ 1,235,496 │ 352.02M │ 40.09 │ 18.21 │ 2016 │
│ SqueezeNet1.1 ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' (https://arxiv.org/abs/1602.07360)) │ 1,235,496 │ 352.02M │ 39.31 │ 17.72 │ 2016 │
│ ResAttNet-92 ('Residual Attention Network for Image Classification' (https://arxiv.org/abs/1704.06904)) │ 51.3M │ ? │ 19.5 │ 4.8 │ 2017 │
│ CondenseNet (G=C=8) ('CondenseNet: An Efficient DenseNet using Learned Group Convolutions' (https://arxiv.org/abs/1711.09224)) │ 4.8M │ ? │ 26.2 │ 8.3 │ 2017 │
│ DPN-68 ('Dual Path Networks' (https://arxiv.org/abs/1707.01629)) │ 12,611,602 │ 2,351.84M │ 23.24 │ 6.79 │ 2017 │
│ ShuffleNet x1.0 (g=1) ('ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices' (https://arxiv.org/abs/1707.01083)) │ 1,531,936 │ 148.13M │ 34.93 │ 13.89 │ 2017 │
│ DiracNetV2-18 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections' (https://arxiv.org/abs/1706.00388)) │ 11,511,784 │ 1,796.62M │ 31.47 │ 11.70 │ 2017 │
│ DiracNetV2-34 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections' (https://arxiv.org/abs/1706.00388)) │ 21,616,232 │ 3,646.93M │ 28.75 │ 9.93 │ 2017 │
│ SENet-16 ('Squeeze-and-Excitation Networks' (https://arxiv.org/abs/1709.01507)) │ 31,366,168 │ 5,081.30M │ 25.65 │ 8.20 │ 2017 │
│ SENet-154 ('Squeeze-and-Excitation Networks' (https://arxiv.org/abs/1709.01507)) │ 115,088,984 │ 20,745.78M │ 18.62 │ 4.61 │ 2017 │
│ MobileNet ('MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications' (https://arxiv.org/abs/1704.04861)) │ 4,231,976 │ 579.80M │ 26.61 │ 8.95 │ 2017 │
│ NASNet-A 4@1056 ('Learning Transferable Architectures for Scalable Image Recognition' (https://arxiv.org/abs/1707.07012)) │ 5,289,978 │ 584.90M │ 25.68 │ 8.16 │ 2017 │
│ NASNet-A 6@4032 ('Learning Transferable Architectures for Scalable Image Recognition' (https://arxiv.org/abs/1707.07012)) │ 88,753,150 │ 23,976.44M │ 18.14 │ 4.21 │ 2017 │
│ DLA-34 ('Deep Layer Aggregation' (https://arxiv.org/abs/1707.06484)) │ 15,742,104 │ 3,071.37M │ 25.36 │ 7.94 │ 2017 │
│ AirNet50-1x64d (r=2) ('Attention Inspiring Receptive-Fields Network for Learning Invariant Representations' (https://ieeexplore.ieee.org/document/8510896)) │ 27.43M │ ? │ 22.48 │ 6.21 │ 2018 │
│ BAM-ResNet-50 ('BAM: Bottleneck Attention Module' (https://arxiv.org/abs/1807.06514)) │ 25.92M │ ? │ 23.68 │ 6.96 │ 2018 │
│ CBAM-ResNet-50 ('CBAM: Convolutional Block Attention Module' (https://arxiv.org/abs/1807.06521)) │ 28.1M │ ? │ 23.02 │ 6.38 │ 2018 │
│ 1.0-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) │ 921,816 │ 285.82M │ 40.77 │ 17.85 │ 2018 │
│ 1.5-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) │ 1,953,616 │ 550.97M │ 33.81 │ 13.01 │ 2018 │
│ 2.0-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) │ 3,366,344 │ 897.60M │ 29.63 │ 10.66 │ 2018 │
│ ShuffleNetV2 ('ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design' (https://arxiv.org/abs/1807.11164)) │ 2,278,604 │ 149.72M │ 31.44 │ 11.63 │ 2018 │
│ 456-MENet-24×1(g=3) ('Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications' (https://arxiv.org/abs/1803.09127)) │ 5.3M │ ? │ 28.4 │ 9.8 │ 2018 │
│ FD-MobileNet ('FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy' (https://arxiv.org/abs/1802.03750)) │ 2,901,288 │ 147.46M │ 34.23 │ 13.38 │ 2018 │
│ MobileNetV2 ('MobileNetV2: Inverted Residuals and Linear Bottlenecks' (https://arxiv.org/abs/1801.04381)) │ 3,504,960 │ 329.36M │ 26.97 │ 8.87 │ 2018 │
│ IGCV3 ('IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks' (https://arxiv.org/abs/1806.00178)) │ 3.5M │ ? │ 28.22 │ 9.54 │ 2018 │
│ DARTS ('DARTS: Differentiable Architecture Search' (https://arxiv.org/abs/1806.09055)) │ 4.9M │ ? │ 26.9 │ 9.0 │ 2018 │
│ PNASNet-5 ('Progressive Neural Architecture Search' (https://arxiv.org/abs/1712.00559)) │ 5.1M │ ? │ 25.8 │ 8.1 │ 2018 │
│ AmoebaNet-C ('Regularized Evolution for Image Classifier Architecture Search' (https://arxiv.org/abs/1802.01548)) │ 5.1M │ ? │ 24.3 │ 7.6 │ 2018 │
│ MnasNet ('MnasNet: Platform-Aware Neural Architecture Search for Mobile' (https://arxiv.org/abs/1807.11626)) │ 4,308,816 │ 317.67M │ 31.58 │ 11.74 │ 2018 │
│ IBN-Net50-a ('Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net' (https://arxiv.org/abs/1807.09441)) │ ? │ ? │ 22.54 │ 6.32 │ 2018 │
│ MarginNet ('Large Margin Deep Networks for Classification' (http://papers.nips.cc/paper/7364-large-margin-deep-networks-for-classification.pdf)) │ ? │ ? │ 22.0 │ ? │ 2018 │
│ A^2 Net ('A^2-Nets: Double Attention Networks' (http://papers.nips.cc/paper/7318-a2-nets-double-attention-networks.pdf)) │ ? │ ? │ 23.0 │ 6.5 │ 2018 │
│ FishNeXt-150 ('FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction' (http://papers.nips.cc/paper/7356-fishnet-a-versatile-backbone-for-image-region-and-pixel-level-prediction.pdf)) │ 26.2M │ ? │ 21.5 │ ? │ 2018 │
│ Shape-ResNet ('IMAGENET-TRAINED CNNS ARE BIASED TOWARDS TEXTURE; INCREASING SHAPE BIAS IMPROVES ACCURACY AND ROBUSTNESS' (https://arxiv.org/pdf/1811.12231v2.pdf)) │ 25.5M │ ? │ 23.28 │ 6.72 │ 2019 │
│ SimCNN(k=3 train) ('Greedy Layerwise Learning Can Scale to ImageNet' (https://arxiv.org/pdf/1812.11446.pdf)) │ ? │ ? │ 28.4 │ 10.2 │ 2019 │
│ SKNet-50 ('Selective Kernel Networks' (https://arxiv.org/pdf/1903.06586.pdf)) │ 27.5M │ ? │ 20.79 │ ? │ 2019 │
│ SRM-ResNet-50 ('SRM : A Style-based Recalibration Module for Convolutional Neural Networks' (https://arxiv.org/pdf/1903.10829.pdf)) │ 25.62M │ ? │ 22.87 │ 6.49 │ 2019 │
│ EfficientNet-B0 ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' (http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) │ 5,288,548 │ 414.31M │ 24.77 │ 7.52 │ 2019 │
│ EfficientNet-B7b ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' (http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) │ 66,347,960 │ 39,010.98M │ 15.94 │ 3.22 │ 2019 │
│ ProxylessNAS ('PROXYLESSNAS: DIRECT NEURAL ARCHITECTURE SEARCH ON TARGET TASK AND HARDWARE' (https://arxiv.org/pdf/1812.00332.pdf)) │ ? │ ? │ 24.9 │ 7.5 │ 2019 │
│ MixNet-L ('MixNet: Mixed Depthwise Convolutional Kernels' (https://arxiv.org/abs/1907.09595)) │ 7.3M │ ? │ 21.1 │ 5.8 │ 2019 │
│ ECA-Net50 ('ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' (https://arxiv.org/pdf/1910.03151v1.pdf)) │ 24.37M │ 3.86G │ 22.52 │ 6.32 │ 2019 │
│ ECA-Net101 ('ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' (https://arxiv.org/pdf/1910.03151v1.pdf)) │ 42.49M │ 7.35G │ 21.35 │ 5.66 │ 2019 │
│ ACNet-Densenet121 ('ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks' (https://arxiv.org/abs/1908.03930)) │ ? │ ? │ 24.18 │ 7.23 │ 2019 │
│ LIP-ResNet-50 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) │ 23.9M │ 5.33G │ 21.81 │ 6.04 │ 2019 │
│ LIP-ResNet-101 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) │ 42.9M │ 9.06G │ 20.67 │ 5.40 │ 2019 │
│ LIP-DenseNet-BC-121 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) │ 8.7M │ 4.13G │ 23.36 │ 6.84 │ 2019 │
│ MuffNet_1.0 ('MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' (http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) │ 2.3M │ 146M │ 30.1 │ ? │ 2019 │
│ MuffNet_1.5 ('MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' (http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) │ 3.4M │ 300M │ 26.9 │ ? │ 2019 │
│ ResNet-34-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) │ 21.8M │ 3,672.68M │ 25.80 │ ? │ 2019 │
│ ResNet-50-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) │ 25.5M │ 3,877.95M │ 22.96 │ ? │ 2019 │
│ MobileNetV2-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) │ 3,504,960 │ 329.36M │ 27.50 │ ? │ 2019 │
│ FixRes ResNeXt101 WSL ('Fixing the train-test resolution discrepancy' (https://arxiv.org/abs/1906.06423)) │ 829M │ ? │ 13.6 │ 2.0 │ 2019 │
│ Noisy Student(L2) ('Self-training with Noisy Student improves ImageNet classification' (https://arxiv.org/abs/1911.04252)) │ 480M │ ? │ 12.6 │ 1.8 │ 2019 │
│ TResNet-M ('TResNet: High Performance GPU-Dedicated Architecture' (https://arxiv.org/abs/2003.13630)) │ 29.4M │ 5.5G │ 19.3 │ ? │ 2020 │
│ DA-NAS-C ('DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search' (https://arxiv.org/abs/2003.12563v1)) │ ? │ 467M │ 23.8 │ ? │ 2020 │
│ ResNeSt-50 ('ResNeSt: Split-Attention Networks' (https://arxiv.org/abs/2004.08955)) │ 27.5M │ 5.39G │ 18.87 │ ? │ 2020 │
│ ResNeSt-101 ('ResNeSt: Split-Attention Networks' (https://arxiv.org/abs/2004.08955)) │ 48.3M │ 10.2G │ 17.73 │ ? │ 2020 │
│ ResNet-50-FReLU ('Funnel Activation for Visual Recognition' (https://arxiv.org/abs/2007.11824v2)) │ 25.5M │ 3.87G │ 22.40 │ ? │ 2020 │
│ ResNet-101-FReLU ('Funnel Activation for Visual Recognition' (https://arxiv.org/abs/2007.11824v2)) │ 44.5M │ 7.6G │ 22.10 │ ? │ 2020 │
│ ResNet-50-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 25.6M │ ? │ 19.33 │ 4.91 │ 2020 │
│ ResNet-50-MEALv2 + CutMix ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 25.6M │ ? │ 19.02 │ 4.65 │ 2020 │
│ MobileNet V3-Large-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 5.48M │ ? │ 23.08 │ 6.68 │ 2020 │
│ EfficientNet-B0-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) │ 5.29M │ ? │ 21.71 │ 6.05 │ 2020 │
│ T2T-ViT-7 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) │ 4.2M │ 0.6G │ 28.8 │ ? │ 2021 │
│ T2T-ViT-14 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) │ 19.4M │ 4.8G │ 19.4 │ ? │ 2021 │
│ T2T-ViT-19 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) │ 39.0M │ 8.0G │ 18.8 │ ? │ 2021 │
│ NFNet-F0 ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) │ 71.5M │ 12.38G │ 16.4 │ 3.2 │ 2021 │
│ NFNet-F1 ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) │ 132.6M │ 35.54G │ 15.4 │ 2.9 │ 2021 │
│ NFNet-F6+SAM ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) │ 438.4M │ 377.28G │ 13.5 │ 2.1 │ 2021 │
│ EfficientNetV2-S ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 24M │ 8.8G │ 16.1 │ ? │ 2021 │
│ EfficientNetV2-M ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 55M │ 24G │ 14.9 │ ? │ 2021 │
│ EfficientNetV2-L ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 121M │ 53G │ 14.3 │ ? │ 2021 │
│ EfficientNetV2-S (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 24M │ 8.8G │ 15.0 │ ? │ 2021 │
│ EfficientNetV2-M (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 55M │ 24G │ 13.9 │ ? │ 2021 │
│ EfficientNetV2-L (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) │ 121M │ 53G │ 13.2 │ ? │ 2021 │

Segmentation models

│ Model │ Year │ PASCAL-Context │ Cityscapes (mIOU) │ PASCAL VOC 2012 (mIOU) │ COCO Stuff │ ADE20K VAL (mIOU) │
├───┼───┼───┼───┼───┼───┼───┤
│ U-Net ('U-Net: Convolutional Networks for Biomedical Image Segmentation' (https://arxiv.org/pdf/1505.04597.pdf)) │ 2015 │ ? │ ? │ ? │ ? │ ? │
│ DeconvNet ('Learning Deconvolution Network for Semantic Segmentation' (https://arxiv.org/pdf/1505.04366.pdf)) │ 2015 │ ? │ ? │ 72.5 │ ? │ ? │
│ ParseNet ('ParseNet: Looking Wider to See Better' (https://arxiv.org/abs/1506.04579)) │ 2015 │ 40.4 │ ? │ 69.8 │ ? │ ? │
│ Piecewise ('Efficient piecewise training of deep structured models for semantic segmentation' (https://arxiv.org/abs/1504.01013)) │ 2015 │ 43.3 │ 71.6 │ 78.0 │ ? │ ? │
│ SegNet ('SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation' (https://arxiv.org/pdf/1511.00561.pdf)) │ 2016 │ ? │ 56.1 │ ? │ ? │ ? │
│ FCN ('Fully Convolutional Networks for Semantic Segmentation' (https://arxiv.org/pdf/1605.06211.pdf)) │ 2016 │ 37.8 │ 65.3 │ 62.2 │ 22.7 │ 29.39 │
│ ENet ('ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation' (https://arxiv.org/pdf/1606.02147.pdf)) │ 2016 │ ? │ 58.3 │ ? │ ? │ ? │
│ DilatedNet ('MULTI-SCALE CONTEXT AGGREGATION BY DILATED CONVOLUTIONS' (https://arxiv.org/pdf/1511.07122.pdf)) │ 2016 │ ? │ ? │ 67.6 │ ? │ 32.31 │
│ PixelNet ('PixelNet: Towards a General Pixel-Level Architecture' (https://arxiv.org/pdf/1609.06694.pdf)) │ 2016 │ ? │ ? │ 69.8 │ ? │ ? │
│ RefineNet ('RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation' (https://arxiv.org/pdf/1611.06612.pdf)) │ 2016 │ 47.3 │ 73.6 │ 83.4 │ 33.6 │ 40.70 │
│ LRR ('Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation' (https://arxiv.org/pdf/1605.02264.pdf)) │ 2016 │ ? │ 71.8 │ 79.3 │ ? │ ? │
│ FRRN ('Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes' (https://arxiv.org/pdf/1611.08323.pdf)) │ 2016 │ ? │ 71.8 │ ? │ ? │ ? │
│ MultiNet ('MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving' (https://arxiv.org/pdf/1612.07695.pdf)) │ 2016 │ ? │ ? │ ? │ ? │ ? │
│ DeepLab ('DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs' (https://arxiv.org/pdf/1606.00915.pdf)) │ 2017 │ 45.7 │ 64.8 │ 79.7 │ ? │ ? │
│ LinkNet ('LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation' (https://arxiv.org/pdf/1707.03718.pdf)) │ 2017 │ ? │ ? │ ? │ ? │ ? │
│ Tiramisu ('The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation' (https://arxiv.org/pdf/1611.09326.pdf)) │ 2017 │ ? │ ? │ ? │ ? │ ? │
│ ICNet ('ICNet for Real-Time Semantic Segmentation on High-Resolution Images' (https://arxiv.org/pdf/1704.08545.pdf)) │ 2017 │ ? │ 70.6 │ ? │ ? │ ? │
│ ERFNet ('Efficient ConvNet for Real-time Semantic Segmentation' (http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf)) │ 2017 │ ? │ 68.0 │ ? │ ? │ ? │
│ PSPNet ('Pyramid Scene Parsing Network' (https://arxiv.org/pdf/1612.01105.pdf)) │ 2017 │ 47.8 │ 80.2 │ 85.4 │ ? │ 44.94 │
│ GCN ('Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network' (https://arxiv.org/pdf/1703.02719.pdf)) │ 2017 │ ? │ 76.9 │ 82.2 │ ? │ ? │
│ Segaware ('Segmentation-Aware Convolutional Networks Using Local Attention Masks' (https://arxiv.org/pdf/1708.04607.pdf)) │ 2017 │ ? │ ? │ 69.0 │ ? │ ? │
│ PixelDCN ('PIXEL DECONVOLUTIONAL NETWORKS' (https://arxiv.org/pdf/1705.06820.pdf)) │ 2017 │ ? │ ? │ 73.0 │ ? │ ? │
│ DeepLabv3 ('Rethinking Atrous Convolution for Semantic Image Segmentation' (https://arxiv.org/pdf/1706.05587.pdf)) │ 2017 │ ? │ ? │ 85.7 │ ? │ ? │
│ DUC, HDC ('Understanding Convolution for Semantic Segmentation' (https://arxiv.org/pdf/1702.08502.pdf)) │ 2018 │ ? │ 77.1 │ ? │ ? │ ? │
│ ShuffleSeg ('SHUFFLESEG: REAL-TIME SEMANTIC SEGMENTATION NETWORK' (https://arxiv.org/pdf/1803.03816.pdf)) │ 2018 │ ? │ 59.3 │ ? │ ? │ ? │
│ AdaptSegNet ('Learning to Adapt Structured Output Space for Semantic Segmentation' (https://arxiv.org/pdf/1802.10349.pdf)) │ 2018 │ ? │ 46.7 │ ? │ ? │ ? │
│ TuSimple-DUC ('Understanding Convolution for Semantic Segmentation' (https://arxiv.org/pdf/1702.08502.pdf)) │ 2018 │ 80.1 │ ? │ 83.1 │ ? │ ? │
│ R2U-Net ('Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation' (https://arxiv.org/pdf/1802.06955.pdf)) │ 2018 │ ? │ ? │ ? │ ? │ ? │
│ Attention U-Net ('Attention U-Net: Learning Where to Look for the Pancreas' (https://arxiv.org/pdf/1804.03999.pdf)) │ 2018 │ ? │ ? │ ? │ ? │ ? │
│ DANet ('Dual Attention Network for Scene Segmentation' (https://arxiv.org/pdf/1809.02983.pdf)) │ 2018 │ 52.6 │ 81.5 │ ? │ 39.7 │ ? │
│ ENCNet ('Context Encoding for Semantic Segmentation' (https://arxiv.org/abs/1803.08904)) │ 2018 │ 51.7 │ 75.8 │ 85.9 │ ? │ 44.65 │
│ ShelfNet ('ShelfNet for Real-time Semantic Segmentation' (https://arxiv.org/pdf/1811.11254.pdf)) │ 2018 │ 48.4 │ 75.8 │ 84.2 │ ? │ ? │
│ LadderNet ('LADDERNET: MULTI-PATH NETWORKS BASED ON U-NET FOR MEDICAL IMAGE SEGMENTATION' (https://arxiv.org/pdf/1810.07810.pdf)) │ 2018 │ ? │ ? │ ? │ ? │ ? │
│ CCC-ERFnet ('Concentrated-Comprehensive Convolutions for lightweight semantic segmentation' (https://arxiv.org/pdf/1812.04920v1.pdf)) │ 2018 │ ? │ 69.01 │ ? │ ? │ ? │
│ DifNet-101 ('DifNet: Semantic Segmentation by Diffusion Networks' (http://papers.nips.cc/paper/7435-difnet-semantic-segmentation-by-diffusion-networks.pdf)) │ 2018 │ 45.1 │ ? │ 73.2 │ ? │ ? │
│ BiSeNet(Res18) ('BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation' (https://arxiv.org/pdf/1808.00897.pdf)) │ 2018 │ ? │ ? │ 74.7 │ 28.1 │ ? │
│ ESPNet ('ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation' (https://arxiv.org/pdf/1803.06815.pdf)) │ 2018 │ ? │ ? │ 63.01 │ ? │ ? │
│ SPADE ('Semantic Image Synthesis with Spatially-Adaptive Normalization' (https://arxiv.org/pdf/1903.07291.pdf)) │ 2019 │ ? │ 62.3 │ ? │ 37.4 │ 38.5 │
│ SeamlessSeg ('Seamless Scene Segmentation' (https://arxiv.org/pdf/1905.01220v1.pdf)) │ 2019 │ ? │ 77.5 │ ? │ ? │ ? │
│ EMANet ('Expectation-Maximization Attention Networks for Semantic Segmentation' (https://arxiv.org/pdf/1907.13426.pdf)) │ 2019 │ ? │ ? │ 88.2 │ 39.9 │ ? │

Detection models

│ Model │ Year │ VOC07 (mAP@IoU=0.5) │ VOC12 (mAP@IoU=0.5) │ COCO (mAP) │
├───┼───┼───┼───┼───┤
│ R-CNN ('Rich feature hierarchies for accurate object detection and semantic segmentation' (https://arxiv.org/pdf/1311.2524.pdf)) │ 2014 │ 58.5 │ ? │ ? │
│ OverFeat ('OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks' (https://arxiv.org/pdf/1312.6229.pdf)) │ 2014 │ ? │ ? │ ? │
│ MultiBox ('Scalable Object Detection using Deep Neural Networks' (https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Erhan_Scalable_Object_Detection_2014_CVPR_paper.pdf)) │ 2014 │ 29.0 │ ? │ ? │
│ SPP-Net ('Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition' (https://arxiv.org/pdf/1406.4729.pdf)) │ 2014 │ 59.2 │ ? │ ? │
│ MR-CNN ('Object detection via a multi-region & semantic segmentation-aware CNN model' (https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Gidaris_Object_Detection_via_ICCV_2015_paper.pdf)) │ 2015 │ 78.2 │ 73.9 │ ? │
│ AttentionNet ('AttentionNet: Aggregating Weak Directions for Accurate Object Detection' (https://arxiv.org/pdf/1506.07704.pdf)) │ 2015 │ ? │ ? │ ? │
│ Fast R-CNN ('Fast R-CNN' (https://arxiv.org/pdf/1504.08083.pdf)) │ 2015 │ 70.0 │ 68.4 │ ? │
│ Faster R-CNN ('Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks' (https://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf)) │ 2015 │ 73.2 │ 70.4 │ 36.8 │
│ YOLO v1 ('You Only Look Once: Unified, Real-Time Object Detection' (https://arxiv.org/pdf/1506.02640.pdf)) │ 2016 │ 66.4 │ 57.9 │ ? │
│ G-CNN ('G-CNN: an Iterative Grid Based Object Detector' (https://arxiv.org/pdf/1512.07729.pdf)) │ 2016 │ 66.8 │ 66.4 │ ? │
│ AZNet ('Adaptive Object Detection Using Adjacency and Zoom Prediction' (https://arxiv.org/pdf/1512.07711.pdf)) │ 2016 │ 70.4 │ ? │ 22.3 │
│ ION ('Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks' (https://arxiv.org/pdf/1512.04143.pdf)) │ 2016 │ 80.1 │ 77.9 │ 33.1 │
│ HyperNet ('HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection' (https://arxiv.org/pdf/1604.00600.pdf)) │ 2016 │ 76.3 │ 71.4 │ ? │
│ OHEM ('Training Region-based Object Detectors with Online Hard Example Mining' (https://arxiv.org/pdf/1604.03540.pdf)) │ 2016 │ 78.9 │ 76.3 │ 22.4 │
│ MPN ('A MultiPath Network for Object Detection' (https://arxiv.org/pdf/1604.02135.pdf)) │ 2016 │ ? │ ? │ 33.2 │
│ SSD ('SSD: Single Shot MultiBox Detector' (https://arxiv.org/pdf/1512.02325.pdf)) │ 2016 │ 76.8 │ 74.9 │ 31.2 │
│ GBDNet ('Crafting GBD-Net for Object Detection' (https://arxiv.org/pdf/1610.02579.pdf)) │ 2016 │ 77.2 │ ? │ 27.0 │
│ CPF ('Contextual Priming and Feedback for Faster R-CNN' (https://pdfs.semanticscholar.org/40e7/4473cb82231559cbaeaa44989e9bbfe7ec3f.pdf)) │ 2016 │ 76.4 │ 72.6 │ ? │
│ MS-CNN ('A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection' (https://arxiv.org/pdf/1607.07155.pdf)) │ 2016 │ ? │ ? │ ? │
│ R-FCN ('R-FCN: Object Detection via Region-based Fully Convolutional Networks' (https://arxiv.org/pdf/1605.06409.pdf)) │ 2016 │ 79.5 │ 77.6 │ 29.9 │
│ PVANET ('PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection' (https://arxiv.org/pdf/1608.08021.pdf)) │ 2016 │ ? │ ? │ ? │
│ DeepID-Net ('DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection' (https://arxiv.org/pdf/1412.5661.pdf)) │ 2016 │ 69.0 │ ? │ ? │
│ NoC ('Object Detection Networks on Convolutional Feature Maps' (https://arxiv.org/pdf/1504.06066.pdf)) │ 2016 │ 71.6 │ 68.8 │ 27.2 │
│ DSSD ('DSSD : Deconvolutional Single Shot Detector' (https://arxiv.org/pdf/1701.06659.pdf)) │ 2017 │ 81.5 │ 80.0 │ ? │
│ TDM ('Beyond Skip Connections: Top-Down Modulation for Object Detection' (https://arxiv.org/pdf/1612.06851.pdf)) │ 2017 │ ? │ ? │ 37.3 │
│ FPN ('Feature Pyramid Networks for Object Detection' (http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_Feature_Pyramid_Networks_CVPR_2017_paper.pdf)) │ 2017 │ ? │ ? │ 36.2 │
│ YOLO v2 ('YOLO9000: Better, Faster, Stronger' (https://arxiv.org/pdf/1612.08242.pdf)) │ 2017 │ 78.6 │ 73.4 │ 21.6 │
│ RON ('RON: Reverse Connection with Objectness Prior Networks for Object Detection' (https://arxiv.org/pdf/1707.01691.pdf)) │ 2017 │ 77.6 │ 75.4 │ ? │
│ DCN ('Deformable Convolutional Networks' (http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Deformable_Convolutional_Networks_ICCV_2017_paper.pdf)) │ 2017 │ ? │ ? │ ? │
│ DeNet ('DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling' (https://arxiv.org/pdf/1703.10295.pdf)) │ 2017 │ 77.1 │ 73.9 │ 33.8 │
│ CoupleNet ('CoupleNet: Coupling Global Structure with Local Parts for Object Detection' (https://arxiv.org/pdf/1708.02863.pdf)) │ 2017 │ 82.7 │ 80.4 │ 34.4 │
│ RetinaNet ('Focal Loss for Dense Object Detection' (https://arxiv.org/pdf/1708.02002.pdf)) │ 2017 │ ? │ ? │ 39.1 │
│ Mask R-CNN ('Mask R-CNN' (http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf)) │ 2017 │ ? │ ? │ 39.8 │
│ DSOD ('DSOD: Learning Deeply Supervised Object Detectors from Scratch' (https://arxiv.org/pdf/1708.01241.pdf)) │ 2017 │ 77.7 │ 76.3 │ ? │
│ SMN ('Spatial Memory for Context Reasoning in Object Detection' (http://openaccess.thecvf.com/content_ICCV_2017/papers/Chen_Spatial_Memory_for_ICCV_2017_paper.pdf)) │ 2017 │ 70.0 │ ? │ ? │
│ YOLO v3 ('YOLOv3: An Incremental Improvement' (https://pjreddie.com/media/files/papers/YOLOv3.pdf)) │ 2018 │ ? │ ? │ 33.0 │
│ SIN ('Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Structure_Inference_Net_CVPR_2018_paper.pdf)) │ 2018 │ 76.0 │ 73.1 │ 23.2 │
│ STDN ('Scale-Transferrable Object Detection' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Scale-Transferrable_Object_Detection_CVPR_2018_paper.pdf)) │ 2018 │ 80.9 │ ? │ ? │
│ RefineDet ('Single-Shot Refinement Neural Network for Object Detection' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Single-Shot_Refinement_Neural_CVPR_2018_paper.pdf)) │ 2018 │ 83.8 │ 83.5 │ 41.8 │
│ MegDet ('MegDet: A Large Mini-Batch Object Detector' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Peng_MegDet_A_Large_CVPR_2018_paper.pdf)) │ 2018 │ ? │ ? │ ? │
│ RFBNet ('Receptive Field Block Net for Accurate and Fast Object Detection' (https://arxiv.org/pdf/1711.07767.pdf)) │ 2018 │ 82.2 │ ? │ ? │
│ CornerNet ('CornerNet: Detecting Objects as Paired Keypoints' (https://arxiv.org/pdf/1808.01244.pdf)) │ 2018 │ ? │ ? │ 42.1 │
│ LibraRetinaNet ('Libra R-CNN: Towards Balanced Learning for Object Detection' (https://arxiv.org/pdf/1904.02701v1.pdf)) │ 2019 │ ? │ ? │ 43.0 │
│ YOLACT-700 ('YOLACT Real-time Instance Segmentation' (https://arxiv.org/pdf/1904.02689v1.pdf)) │ 2019 │ ? │ ? │ 31.2 │
│ DetNASNet(3.8) ('DetNAS: Backbone Search for Object Detection' (https://arxiv.org/pdf/1903.10979v2.pdf)) │ 2019 │ ? │ ? │ 42.0 │
│ YOLOv4 ('YOLOv4: Optimal Speed and Accuracy of Object Detection' (https://arxiv.org/pdf/2004.10934.pdf)) │ 2020 │ ? │ ? │ 46.7 │
│ SOLO ('SOLO: Segmenting Objects by Locations' (https://arxiv.org/pdf/1912.04488v3.pdf)) │ 2020 │ ? │ ? │ 37.8 │
│ D-SOLO ('SOLO: Segmenting Objects by Locations' (https://arxiv.org/pdf/1912.04488v3.pdf)) │ 2020 │ ? │ ? │ 40.5 │
│ SNIPER ('Scale Normalized Image Pyramids with AutoFocus for Object Detection' (https://arxiv.org/pdf/2102.05646v1.pdf)) │ 2021 │ 86.6 │ ? │ 47.9 │
│ AutoFocus ('Scale Normalized Image Pyramids with AutoFocus for Object Detection' (https://arxiv.org/pdf/2102.05646v1.pdf)) │ 2021 │ 85.8 │ ? │ 47.9 │