# Awesome Computer Vision Models [![Awesome](https://awesome.re/badge-flat.svg)](https://awesome.re)
A curated list of popular classification, segmentation, and detection models, with evaluation metrics as reported in the corresponding papers.
## Contents
- [Classification models](#classification-models)
- [Segmentation models](#segmentation-models)
- [Detection models](#detection-models)
## Classification models
| Model | Number of parameters | FLOPs | Top-1 Error (%) | Top-5 Error (%) | Year |
|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------:|:---------------:|:----------------:|:--------------:|:-----:|
| AlexNet (['One weird trick for parallelizing convolutional neural networks'](https://arxiv.org/abs/1404.5997)) | 62.3M | 1,132.33M | 40.96 | 18.24 | 2014 |
| VGG-16 (['Very Deep Convolutional Networks for Large-Scale Image Recognition'](https://arxiv.org/abs/1409.1556)) | 138.3M | ? | 26.78 | 8.69 | 2014 |
| ResNet-10 (['Deep Residual Learning for Image Recognition'](https://arxiv.org/abs/1512.03385)) | 5.5M | 894.04M | 34.69 | 14.36 | 2015 |
| ResNet-18 (['Deep Residual Learning for Image Recognition'](https://arxiv.org/abs/1512.03385)) | 11.7M | 1,820.41M | 28.53 | 9.82 | 2015 |
| ResNet-34 (['Deep Residual Learning for Image Recognition'](https://arxiv.org/abs/1512.03385)) | 21.8M | 3,672.68M | 24.84 | 7.80 | 2015 |
| ResNet-50 (['Deep Residual Learning for Image Recognition'](https://arxiv.org/abs/1512.03385)) | 25.5M | 3,877.95M | 22.28 | 6.33 | 2015 |
| InceptionV3 (['Rethinking the Inception Architecture for Computer Vision'](https://arxiv.org/abs/1512.00567)) | 23.8M | ? | 21.2 | 5.6 | 2015 |
| PreResNet-18 (['Identity Mappings in Deep Residual Networks'](https://arxiv.org/abs/1603.05027)) | 11.7M | 1,820.56M | 28.43 | 9.72 | 2016 |
| PreResNet-34 (['Identity Mappings in Deep Residual Networks'](https://arxiv.org/abs/1603.05027)) | 21.8M | 3,672.83M | 24.89 | 7.74 | 2016 |
| PreResNet-50 (['Identity Mappings in Deep Residual Networks'](https://arxiv.org/abs/1603.05027)) | 25.6M | 3,875.44M | 22.40 | 6.47 | 2016 |
| DenseNet-121 (['Densely Connected Convolutional Networks'](https://arxiv.org/abs/1608.06993)) | 8.0M | 2,872.13M | 23.48 | 7.04 | 2016 |
| DenseNet-161 (['Densely Connected Convolutional Networks'](https://arxiv.org/abs/1608.06993)) | 28.7M | 7,793.16M | 22.86 | 6.44 | 2016 |
| PyramidNet-101 (['Deep Pyramidal Residual Networks'](https://arxiv.org/abs/1610.02915)) | 42.5M | 8,743.54M | 21.98 | 6.20 | 2016 |
| ResNeXt-14(32x4d) (['Aggregated Residual Transformations for Deep Neural Networks'](http://arxiv.org/abs/1611.05431)) | 9.5M | 1,603.46M | 30.32 | 11.46 | 2016 |
| ResNeXt-26(32x4d) (['Aggregated Residual Transformations for Deep Neural Networks'](http://arxiv.org/abs/1611.05431)) | 15.4M | 2,488.07M | 24.14 | 7.46 | 2016 |
| WRN-50-2 (['Wide Residual Networks'](https://arxiv.org/abs/1605.07146)) | 68.9M | 11,405.42M | 22.53 | 6.41 | 2016 |
| Xception (['Xception: Deep Learning with Depthwise Separable Convolutions'](https://arxiv.org/abs/1610.02357)) | 22,855,952 | 8,403.63M | 20.97 | 5.49 | 2016 |
| InceptionV4 (['Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning'](https://arxiv.org/abs/1602.07261)) | 42,679,816 | 12,304.93M | 20.64 | 5.29 | 2016 |
| InceptionResNetV2 (['Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning'](https://arxiv.org/abs/1602.07261)) | 55,843,464 | 13,188.64M | 19.93 | 4.90 | 2016 |
| PolyNet (['PolyNet: A Pursuit of Structural Diversity in Very Deep Networks'](https://arxiv.org/abs/1611.05725)) | 95,366,600 | 34,821.34M | 19.10 | 4.52 | 2016 |
| DarkNet Ref (['Darknet: Open source neural networks in C'](https://github.com/pjreddie/darknet)) | 7,319,416 | 367.59M | 38.58 | 17.18 | 2016 |
| DarkNet Tiny (['Darknet: Open source neural networks in C'](https://github.com/pjreddie/darknet)) | 1,042,104 | 500.85M | 40.74 | 17.84 | 2016 |
| DarkNet 53 (['Darknet: Open source neural networks in C'](https://github.com/pjreddie/darknet)) | 41,609,928 | 7,133.86M | 21.75 | 5.64 | 2016 |
| SqueezeResNet1.1 (['SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size'](https://arxiv.org/abs/1602.07360)) | 1,235,496 | 352.02M | 40.09 | 18.21 | 2016 |
| SqueezeNet1.1 (['SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size'](https://arxiv.org/abs/1602.07360)) | 1,235,496 | 352.02M | 39.31 | 17.72 | 2016 |
| ResAttNet-92 (['Residual Attention Network for Image Classification'](https://arxiv.org/abs/1704.06904)) | 51.3M | ? | 19.5 | 4.8 | 2017 |
| CondenseNet (G=C=8) (['CondenseNet: An Efficient DenseNet using Learned Group Convolutions'](https://arxiv.org/abs/1711.09224)) | 4.8M | ? | 26.2 | 8.3 | 2017 |
| DPN-68 (['Dual Path Networks'](https://arxiv.org/abs/1707.01629)) | 12,611,602 | 2,351.84M | 23.24 | 6.79 | 2017 |
| ShuffleNet x1.0 (g=1) (['ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices'](https://arxiv.org/abs/1707.01083)) | 1,531,936 | 148.13M | 34.93 | 13.89 | 2017 |
| DiracNetV2-18 (['DiracNets: Training Very Deep Neural Networks Without Skip-Connections'](https://arxiv.org/abs/1706.00388)) | 11,511,784 | 1,796.62M | 31.47 | 11.70 | 2017 |
| DiracNetV2-34 (['DiracNets: Training Very Deep Neural Networks Without Skip-Connections'](https://arxiv.org/abs/1706.00388)) | 21,616,232 | 3,646.93M | 28.75 | 9.93 | 2017 |
| SENet-16 (['Squeeze-and-Excitation Networks'](https://arxiv.org/abs/1709.01507)) | 31,366,168 | 5,081.30M | 25.65 | 8.20 | 2017 |
| SENet-154 (['Squeeze-and-Excitation Networks'](https://arxiv.org/abs/1709.01507)) | 115,088,984 | 20,745.78M | 18.62 | 4.61 | 2017 |
| MobileNet (['MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications'](https://arxiv.org/abs/1704.04861)) | 4,231,976 | 579.80M | 26.61 | 8.95 | 2017 |
| NASNet-A 4@1056 (['Learning Transferable Architectures for Scalable Image Recognition'](https://arxiv.org/abs/1707.07012)) | 5,289,978 | 584.90M | 25.68 | 8.16 | 2017 |
| NASNet-A 6@4032 (['Learning Transferable Architectures for Scalable Image Recognition'](https://arxiv.org/abs/1707.07012)) | 88,753,150 | 23,976.44M | 18.14 | 4.21 | 2017 |
| DLA-34 (['Deep Layer Aggregation'](https://arxiv.org/abs/1707.06484)) | 15,742,104 | 3,071.37M | 25.36 | 7.94 | 2017 |
| AirNet50-1x64d (r=2) (['Attention Inspiring Receptive-Fields Network for Learning Invariant Representations'](https://ieeexplore.ieee.org/document/8510896)) | 27.43M | ? | 22.48 | 6.21 | 2018 |
| BAM-ResNet-50 (['BAM: Bottleneck Attention Module'](https://arxiv.org/abs/1807.06514)) | 25.92M | ? | 23.68 | 6.96 | 2018 |
| CBAM-ResNet-50 (['CBAM: Convolutional Block Attention Module'](https://arxiv.org/abs/1807.06521)) | 28.1M | ? | 23.02 | 6.38 | 2018 |
| 1.0-SqNxt-23v5 (['SqueezeNext: Hardware-Aware Neural Network Design'](https://arxiv.org/abs/1803.10615)) | 921,816 | 285.82M | 40.77 | 17.85 | 2018 |
| 1.5-SqNxt-23v5 (['SqueezeNext: Hardware-Aware Neural Network Design'](https://arxiv.org/abs/1803.10615)) | 1,953,616 | 550.97M | 33.81 | 13.01 | 2018 |
| 2.0-SqNxt-23v5 (['SqueezeNext: Hardware-Aware Neural Network Design'](https://arxiv.org/abs/1803.10615)) | 3,366,344 | 897.60M | 29.63 | 10.66 | 2018 |
| ShuffleNetV2 (['ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design'](https://arxiv.org/abs/1807.11164)) | 2,278,604 | 149.72M | 31.44 | 11.63 | 2018 |
| 456-MENet-24x1 (g=3) (['Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications'](https://arxiv.org/abs/1803.09127)) | 5.3M | ? | 28.4 | 9.8 | 2018 |
| FD-MobileNet (['FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy'](https://arxiv.org/abs/1802.03750)) | 2,901,288 | 147.46M | 34.23 | 13.38 | 2018 |
| MobileNetV2 (['MobileNetV2: Inverted Residuals and Linear Bottlenecks'](https://arxiv.org/abs/1801.04381)) | 3,504,960 | 329.36M | 26.97 | 8.87 | 2018 |
| IGCV3 (['IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks'](https://arxiv.org/abs/1806.00178)) | 3.5M | ? | 28.22 | 9.54 | 2018 |
| DARTS (['DARTS: Differentiable Architecture Search'](https://arxiv.org/abs/1806.09055)) | 4.9M | ? | 26.9 | 9.0 | 2018 |
| PNASNet-5 (['Progressive Neural Architecture Search'](https://arxiv.org/abs/1712.00559)) | 5.1M | ? | 25.8 | 8.1 | 2018 |
| AmoebaNet-C (['Regularized Evolution for Image Classifier Architecture Search'](https://arxiv.org/abs/1802.01548)) | 5.1M | ? | 24.3 | 7.6 | 2018 |
| MnasNet (['MnasNet: Platform-Aware Neural Architecture Search for Mobile'](https://arxiv.org/abs/1807.11626)) | 4,308,816 | 317.67M | 31.58 | 11.74 | 2018 |
| IBN-Net50-a (['Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net'](https://arxiv.org/abs/1807.09441)) | ? | ? | 22.54 | 6.32 | 2018 |
| MarginNet (['Large Margin Deep Networks for Classification'](http://papers.nips.cc/paper/7364-large-margin-deep-networks-for-classification.pdf)) | ? | ? | 22.0 | ? | 2018 |
| A^2 Net (['A^2-Nets: Double Attention Networks'](http://papers.nips.cc/paper/7318-a2-nets-double-attention-networks.pdf)) | ? | ? | 23.0 | 6.5 | 2018 |
| FishNeXt-150 (['FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction'](http://papers.nips.cc/paper/7356-fishnet-a-versatile-backbone-for-image-region-and-pixel-level-prediction.pdf)) | 26.2M | ? | 21.5 | ? | 2018 |
| Shape-ResNet (['ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness'](https://arxiv.org/pdf/1811.12231v2.pdf)) | 25.5M | ? | 23.28 | 6.72 | 2019 |
| SimCNN(k=3 train) (['Greedy Layerwise Learning Can Scale to ImageNet'](https://arxiv.org/pdf/1812.11446.pdf)) | ? | ? | 28.4 | 10.2 | 2019 |
| SKNet-50 (['Selective Kernel Networks'](https://arxiv.org/pdf/1903.06586.pdf)) | 27.5M | ? | 20.79 | ? | 2019 |
| SRM-ResNet-50 (['SRM : A Style-based Recalibration Module for Convolutional Neural Networks'](https://arxiv.org/pdf/1903.10829.pdf)) | 25.62M | ? | 22.87 | 6.49 | 2019 |
| EfficientNet-B0 (['EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks'](http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) | 5,288,548 | 414.31M | 24.77 | 7.52 | 2019 |
| EfficientNet-B7b (['EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks'](http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) | 66,347,960 | 39,010.98M | 15.94 | 3.22 | 2019 |
| ProxylessNAS (['ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware'](https://arxiv.org/pdf/1812.00332.pdf)) | ? | ? | 24.9 | 7.5 | 2019 |
| MixNet-L (['MixNet: Mixed Depthwise Convolutional Kernels'](https://arxiv.org/abs/1907.09595)) | 7.3M | ? | 21.1 | 5.8 | 2019 |
| ECA-Net50 (['ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks'](https://arxiv.org/pdf/1910.03151v1.pdf)) | 24.37M | 3.86G | 22.52 | 6.32 | 2019 |
| ECA-Net101 (['ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks'](https://arxiv.org/pdf/1910.03151v1.pdf)) | 42.49M | 7.35G | 21.35 | 5.66 | 2019 |
| ACNet-Densenet121 (['ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks'](https://arxiv.org/abs/1908.03930)) | ? | ? | 24.18 | 7.23 | 2019 |
| LIP-ResNet-50 (['LIP: Local Importance-based Pooling'](https://arxiv.org/abs/1908.04156)) | 23.9M | 5.33G | 21.81 | 6.04 | 2019 |
| LIP-ResNet-101 (['LIP: Local Importance-based Pooling'](https://arxiv.org/abs/1908.04156)) | 42.9M | 9.06G | 20.67 | 5.40 | 2019 |
| LIP-DenseNet-BC-121 (['LIP: Local Importance-based Pooling'](https://arxiv.org/abs/1908.04156)) | 8.7M | 4.13G | 23.36 | 6.84 | 2019 |
| MuffNet_1.0 (['MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning'](http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) | 2.3M | 146M | 30.1 | ? | 2019 |
| MuffNet_1.5 (['MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning'](http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) | 3.4M | 300M | 26.9 | ? | 2019 |
| ResNet-34-Bin-5 (['Making Convolutional Networks Shift-Invariant Again'](https://arxiv.org/abs/1904.11486)) | 21.8M | 3,672.68M | 25.80 | ? | 2019 |
| ResNet-50-Bin-5 (['Making Convolutional Networks Shift-Invariant Again'](https://arxiv.org/abs/1904.11486)) | 25.5M | 3,877.95M | 22.96 | ? | 2019 |
| MobileNetV2-Bin-5 (['Making Convolutional Networks Shift-Invariant Again'](https://arxiv.org/abs/1904.11486)) | 3,504,960 | 329.36M | 27.50 | ? | 2019 |
| FixRes ResNeXt101 WSL (['Fixing the train-test resolution discrepancy'](https://arxiv.org/abs/1906.06423)) | 829M | ? | 13.6 | 2.0 | 2019 |
| Noisy Student* (L2) (['Self-training with Noisy Student improves ImageNet classification'](https://arxiv.org/abs/1911.04252)) | 480M | ? | 12.6 | 1.8 | 2019 |
| TResNet-M (['TResNet: High Performance GPU-Dedicated Architecture'](https://arxiv.org/abs/2003.13630)) | 29.4M | 5.5G | 19.3 | ? | 2020 |
| DA-NAS-C (['DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search'](https://arxiv.org/abs/2003.12563v1)) | ? | 467M | 23.8 | ? | 2020 |
| ResNeSt-50 (['ResNeSt: Split-Attention Networks'](https://arxiv.org/abs/2004.08955)) | 27.5M | 5.39G | 18.87 | ? | 2020 |
| ResNeSt-101 (['ResNeSt: Split-Attention Networks'](https://arxiv.org/abs/2004.08955)) | 48.3M | 10.2G | 17.73 | ? | 2020 |
| ResNet-50-FReLU (['Funnel Activation for Visual Recognition'](https://arxiv.org/abs/2007.11824v2)) | 25.5M | 3.87G | 22.40 | ? | 2020 |
| ResNet-101-FReLU (['Funnel Activation for Visual Recognition'](https://arxiv.org/abs/2007.11824v2)) | 44.5M | 7.6G | 22.10 | ? | 2020 |
| ResNet-50-MEALv2 (['MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'](https://arxiv.org/abs/2009.08453v1)) | 25.6M | ? | 19.33 | 4.91 | 2020 |
| ResNet-50-MEALv2 + CutMix (['MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'](https://arxiv.org/abs/2009.08453v1)) | 25.6M | ? | 19.02 | 4.65 | 2020 |
| MobileNet V3-Large-MEALv2 (['MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'](https://arxiv.org/abs/2009.08453v1)) | 5.48M | ? | 23.08 | 6.68 | 2020 |
| EfficientNet-B0-MEALv2 (['MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'](https://arxiv.org/abs/2009.08453v1)) | 5.29M | ? | 21.71 | 6.05 | 2020 |
| T2T-ViT-7 (['Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet'](https://arxiv.org/abs/2101.11986v1)) | 4.2M | 0.6G | 28.8 | ? | 2021 |
| T2T-ViT-14 (['Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet'](https://arxiv.org/abs/2101.11986v1)) | 19.4M | 4.8G | 19.4 | ? | 2021 |
| T2T-ViT-19 (['Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet'](https://arxiv.org/abs/2101.11986v1)) | 39.0M | 8.0G | 18.8 | ? | 2021 |
| NFNet-F0 (['High-Performance Large-Scale Image Recognition Without Normalization'](https://arxiv.org/abs/2102.06171)) | 71.5M | 12.38G | 16.4 | 3.2 | 2021 |
| NFNet-F1 (['High-Performance Large-Scale Image Recognition Without Normalization'](https://arxiv.org/abs/2102.06171)) | 132.6M | 35.54G | 15.4 | 2.9 | 2021 |
| NFNet-F6+SAM (['High-Performance Large-Scale Image Recognition Without Normalization'](https://arxiv.org/abs/2102.06171)) | 438.4M | 377.28G | 13.5 | 2.1 | 2021 |
| EfficientNetV2-S (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 24M | 8.8G | 16.1 | ? | 2021 |
| EfficientNetV2-M (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 55M | 24G | 14.9 | ? | 2021 |
| EfficientNetV2-L (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 121M | 53G | 14.3 | ? | 2021 |
| EfficientNetV2-S (21k) (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 24M | 8.8G | 15.0 | ? | 2021 |
| EfficientNetV2-M (21k) (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 55M | 24G | 13.9 | ? | 2021 |
| EfficientNetV2-L (21k) (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 121M | 53G | 13.2 | ? | 2021 |
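
The Top-1/Top-5 error columns above follow the standard single-crop ImageNet validation protocol. As a minimal sketch of how such numbers are obtained (not taken from any of the listed papers), the snippet below scores a torchvision ResNet-50 on an ImageNet validation folder; the dataset path, batch size, and the torchvision >= 0.13 weights API are assumptions, and exact figures depend on preprocessing details, so small deviations from the paper values are expected.

```python
# Sketch: single-crop Top-1/Top-5 error on an ImageNet-style validation folder.
import torch
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical path: an ImageNet val split laid out for ImageFolder.
val_set = datasets.ImageFolder("/path/to/imagenet/val", transform=preprocess)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=64, num_workers=4)

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()

top1_err, top5_err, total = 0, 0, 0
with torch.no_grad():
    for images, targets in val_loader:
        logits = model(images)
        top5 = logits.topk(5, dim=1).indices               # (N, 5) predicted classes
        correct1 = top5[:, 0] == targets                    # best prediction matches
        correct5 = (top5 == targets.unsqueeze(1)).any(dim=1)  # target in top 5
        top1_err += (~correct1).sum().item()
        top5_err += (~correct5).sum().item()
        total += targets.size(0)

print(f"Top-1 error: {100 * top1_err / total:.2f}%")
print(f"Top-5 error: {100 * top5_err / total:.2f}%")
```
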
## Segmentation models
| Model | Year | PASCAL Context (mIoU) | Cityscapes (mIoU) | PASCAL VOC 2012 (mIoU) | COCO Stuff (mIoU) | ADE20K val (mIoU) |
|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----:|:-------------------:|:-------------------:|:----------------------:|:------------:|:----------------------:|
| U-Net (['U-Net: Convolutional Networks for Biomedical Image Segmentation'](https://arxiv.org/pdf/1505.04597.pdf)) | 2015 | ? | ? | ? | ? | ? |
| DeconvNet (['Learning Deconvolution Network for Semantic Segmentation'](https://arxiv.org/pdf/1505.04366.pdf)) | 2015 | ? | ? | 72.5 | ? | ? |
| ParseNet (['ParseNet: Looking Wider to See Better'](https://arxiv.org/abs/1506.04579)) | 2015 | 40.4 | ? | 69.8 | ? | ? |
| Piecewise (['Efficient piecewise training of deep structured models for semantic segmentation'](https://arxiv.org/abs/1504.01013)) | 2015 | 43.3 | 71.6 | 78.0 | ? | ? |
| SegNet (['SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation'](https://arxiv.org/pdf/1511.00561.pdf)) | 2016 | ? | 56.1 | ? | ? | ? |
| FCN (['Fully Convolutional Networks for Semantic Segmentation'](https://arxiv.org/pdf/1605.06211.pdf)) | 2016 | 37.8 | 65.3 | 62.2 | 22.7 | 29.39 |
| ENet (['ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation'](https://arxiv.org/pdf/1606.02147.pdf)) | 2016 | ? | 58.3 | ? | ? | ? |
| DilatedNet (['Multi-Scale Context Aggregation by Dilated Convolutions'](https://arxiv.org/pdf/1511.07122.pdf)) | 2016 | ? | ? | 67.6 | ? | 32.31 |
| PixelNet (['PixelNet: Towards a General Pixel-Level Architecture'](https://arxiv.org/pdf/1609.06694.pdf)) | 2016 | ? | ? | 69.8 | ? | ? |
| RefineNet (['RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation'](https://arxiv.org/pdf/1611.06612.pdf)) | 2016 | 47.3 | 73.6 | 83.4 | 33.6 | 40.70 |
| LRR (['Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation'](https://arxiv.org/pdf/1605.02264.pdf)) | 2016 | ? | 71.8 | 79.3 | ? | ? |
| FRRN (['Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes'](https://arxiv.org/pdf/1611.08323.pdf)) | 2016 | ? | 71.8 | ? | ? | ? |
| MultiNet (['MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving'](https://arxiv.org/pdf/1612.07695.pdf)) | 2016 | ? | ? | ? | ? | ? |
| DeepLab (['DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs'](https://arxiv.org/pdf/1606.00915.pdf)) | 2017 | 45.7 | 64.8 | 79.7 | ? | ? |
| LinkNet (['LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation'](https://arxiv.org/pdf/1707.03718.pdf)) | 2017 | ? | ? | ? | ? | ? |
| Tiramisu (['The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation'](https://arxiv.org/pdf/1611.09326.pdf)) | 2017 | ? | ? | ? | ? | ? |
| ICNet (['ICNet for Real-Time Semantic Segmentation on High-Resolution Images'](https://arxiv.org/pdf/1704.08545.pdf)) | 2017 | ? | 70.6 | ? | ? | ? |
| ERFNet (['Efficient ConvNet for Real-time Semantic Segmentation'](http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf)) | 2017 | ? | 68.0 | ? | ? | ? |
| PSPNet (['Pyramid Scene Parsing Network'](https://arxiv.org/pdf/1612.01105.pdf)) | 2017 | 47.8 | 80.2 | 85.4 | ? | 44.94 |
| GCN (['Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network'](https://arxiv.org/pdf/1703.02719.pdf)) | 2017 | ? | 76.9 | 82.2 | ? | ? |
| Segaware (['Segmentation-Aware Convolutional Networks Using Local Attention Masks'](https://arxiv.org/pdf/1708.04607.pdf)) | 2017 | ? | ? | 69.0 | ? | ? |
| PixelDCN (['Pixel Deconvolutional Networks'](https://arxiv.org/pdf/1705.06820.pdf)) | 2017 | ? | ? | 73.0 | ? | ? |
| DeepLabv3 (['Rethinking Atrous Convolution for Semantic Image Segmentation'](https://arxiv.org/pdf/1706.05587.pdf)) | 2017 | ? | ? | 85.7 | ? | ? |
| DUC, HDC (['Understanding Convolution for Semantic Segmentation'](https://arxiv.org/pdf/1702.08502.pdf)) | 2018 | ? | 77.1 | ? | ? | ? |
| ShuffleSeg (['ShuffleSeg: Real-Time Semantic Segmentation Network'](https://arxiv.org/pdf/1803.03816.pdf)) | 2018 | ? | 59.3 | ? | ? | ? |
| AdaptSegNet (['Learning to Adapt Structured Output Space for Semantic Segmentation'](https://arxiv.org/pdf/1802.10349.pdf)) | 2018 | ? | 46.7 | ? | ? | ? |
| TuSimple-DUC (['Understanding Convolution for Semantic Segmentation'](https://arxiv.org/pdf/1702.08502.pdf)) | 2018 | 80.1 | ? | 83.1 | ? | ? |
| R2U-Net (['Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation'](https://arxiv.org/pdf/1802.06955.pdf)) | 2018 | ? | ? | ? | ? | ? |
| Attention U-Net (['Attention U-Net: Learning Where to Look for the Pancreas'](https://arxiv.org/pdf/1804.03999.pdf)) | 2018 | ? | ? | ? | ? | ? |
| DANet (['Dual Attention Network for Scene Segmentation'](https://arxiv.org/pdf/1809.02983.pdf)) | 2018 | 52.6 | 81.5 | ? | 39.7 | ? |
| ENCNet (['Context Encoding for Semantic Segmentation'](https://arxiv.org/abs/1803.08904)) | 2018 | 51.7 | 75.8 | 85.9 | ? | 44.65 |
| ShelfNet (['ShelfNet for Real-time Semantic Segmentation'](https://arxiv.org/pdf/1811.11254.pdf)) | 2018 | 48.4 | 75.8 | 84.2 | ? | ? |
| LadderNet (['LadderNet: Multi-Path Networks Based on U-Net for Medical Image Segmentation'](https://arxiv.org/pdf/1810.07810.pdf)) | 2018 | ? | ? | ? | ? | ? |
| CCC-ERFnet (['Concentrated-Comprehensive Convolutions for lightweight semantic segmentation'](https://arxiv.org/pdf/1812.04920v1.pdf)) | 2018 | ? | 69.01 | ? | ? | ? |
| DifNet-101 (['DifNet: Semantic Segmentation by Diffusion Networks'](http://papers.nips.cc/paper/7435-difnet-semantic-segmentation-by-diffusion-networks.pdf)) | 2018 | 45.1 | ? | 73.2 | ? | ? |
| BiSeNet(Res18) (['BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation'](https://arxiv.org/pdf/1808.00897.pdf)) | 2018 | ? | ? | 74.7 | 28.1 | ? |
| ESPNet (['ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation'](https://arxiv.org/pdf/1803.06815.pdf)) | 2018 | ? | ? | 63.01 | ? | ? |
| SPADE (['Semantic Image Synthesis with Spatially-Adaptive Normalization'](https://arxiv.org/pdf/1903.07291.pdf)) | 2019 | ? | 62.3 | ? | 37.4 | 38.5 |
| SeamlessSeg (['Seamless Scene Segmentation'](https://arxiv.org/pdf/1905.01220v1.pdf)) | 2019 | ? | 77.5 | ? | ? | ? |
| EMANet (['Expectation-Maximization Attention Networks for Semantic Segmentation'](https://arxiv.org/pdf/1907.13426.pdf)) | 2019 | ? | ? | 88.2 | 39.9 | ? |
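
All segmentation columns report mean Intersection-over-Union (mIoU). As a rough, framework-agnostic sketch of the metric (class count, array shapes, and the ignore label of 255 are assumptions, chosen to mirror PASCAL VOC/Cityscapes conventions), mIoU can be accumulated from a confusion matrix over predicted and ground-truth label maps:

```python
# Sketch: confusion-matrix-based mIoU for semantic segmentation.
import numpy as np

def update_confusion(conf, pred, gt, num_classes, ignore_label=255):
    """Add one image's predictions to a (num_classes x num_classes) confusion matrix."""
    mask = gt != ignore_label
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return conf

def mean_iou(conf):
    """Per-class IoU = TP / (TP + FP + FN); mIoU is the mean over classes with support."""
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)
    return np.nanmean(iou)

# Toy usage with random label maps (21 classes, as in PASCAL VOC 2012).
num_classes = 21
conf = np.zeros((num_classes, num_classes), dtype=np.int64)
pred = np.random.randint(0, num_classes, size=(512, 512))
gt = np.random.randint(0, num_classes, size=(512, 512))
conf = update_confusion(conf, pred, gt, num_classes)
print(f"mIoU: {mean_iou(conf):.4f}")
```
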
## Detection models
| Model | Year | VOC07 (mAP@IoU=0.5) | VOC12 (mAP@IoU=0.5) | COCO (mAP) |
|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----:|:-------------------:|:-------------------:|:----------:|
| R-CNN (['Rich feature hierarchies for accurate object detection and semantic segmentation'](https://arxiv.org/pdf/1311.2524.pdf)) | 2014 | 58.5 | ? | ? |
| OverFeat (['OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks'](https://arxiv.org/pdf/1312.6229.pdf)) | 2014 | ? | ? | ? |
| MultiBox (['Scalable Object Detection using Deep Neural Networks'](https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Erhan_Scalable_Object_Detection_2014_CVPR_paper.pdf)) | 2014 | 29.0 | ? | ? |
| SPP-Net (['Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition'](https://arxiv.org/pdf/1406.4729.pdf)) | 2014 | 59.2 | ? | ? |
| MR-CNN (['Object detection via a multi-region & semantic segmentation-aware CNN model'](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Gidaris_Object_Detection_via_ICCV_2015_paper.pdf)) | 2015 | 78.2 | 73.9 | ? |
| AttentionNet (['AttentionNet: Aggregating Weak Directions for Accurate Object Detection'](https://arxiv.org/pdf/1506.07704.pdf)) | 2015 | ? | ? | ? |
| Fast R-CNN (['Fast R-CNN'](https://arxiv.org/pdf/1504.08083.pdf)) | 2015 | 70.0 | 68.4 | ? |
| Faster R-CNN (['Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks'](https://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf)) | 2015 | 73.2 | 70.4 | 36.8 |
| YOLO v1 (['You Only Look Once: Unified, Real-Time Object Detection'](https://arxiv.org/pdf/1506.02640.pdf)) | 2016 | 66.4 | 57.9 | ? |
| G-CNN (['G-CNN: an Iterative Grid Based Object Detector'](https://arxiv.org/pdf/1512.07729.pdf)) | 2016 | 66.8 | 66.4 | ? |
| AZNet (['Adaptive Object Detection Using Adjacency and Zoom Prediction'](https://arxiv.org/pdf/1512.07711.pdf)) | 2016 | 70.4 | ? | 22.3 |
| ION (['Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks'](https://arxiv.org/pdf/1512.04143.pdf)) | 2016 | 80.1 | 77.9 | 33.1 |
| HyperNet (['HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection'](https://arxiv.org/pdf/1604.00600.pdf)) | 2016 | 76.3 | 71.4 | ? |
| OHEM (['Training Region-based Object Detectors with Online Hard Example Mining'](https://arxiv.org/pdf/1604.03540.pdf)) | 2016 | 78.9 | 76.3 | 22.4 |
| MPN (['A MultiPath Network for Object Detection'](https://arxiv.org/pdf/1604.02135.pdf)) | 2016 | ? | ? | 33.2 |
| SSD (['SSD: Single Shot MultiBox Detector'](https://arxiv.org/pdf/1512.02325.pdf)) | 2016 | 76.8 | 74.9 | 31.2 |
| GBDNet (['Crafting GBD-Net for Object Detection'](https://arxiv.org/pdf/1610.02579.pdf)) | 2016 | 77.2 | ? | 27.0 |
| CPF (['Contextual Priming and Feedback for Faster R-CNN'](https://pdfs.semanticscholar.org/40e7/4473cb82231559cbaeaa44989e9bbfe7ec3f.pdf)) | 2016 | 76.4 | 72.6 | ? |
| MS-CNN (['A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection'](https://arxiv.org/pdf/1607.07155.pdf)) | 2016 | ? | ? | ? |
| R-FCN (['R-FCN: Object Detection via Region-based Fully Convolutional Networks'](https://arxiv.org/pdf/1605.06409.pdf)) | 2016 | 79.5 | 77.6 | 29.9 |
| PVANET (['PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection'](https://arxiv.org/pdf/1608.08021.pdf)) | 2016 | ? | ? | ? |
| DeepID-Net (['DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection'](https://arxiv.org/pdf/1412.5661.pdf)) | 2016 | 69.0 | ? | ? |
| NoC (['Object Detection Networks on Convolutional Feature Maps'](https://arxiv.org/pdf/1504.06066.pdf)) | 2016 | 71.6 | 68.8 | 27.2 |
| DSSD (['DSSD : Deconvolutional Single Shot Detector'](https://arxiv.org/pdf/1701.06659.pdf)) | 2017 | 81.5 | 80.0 | ? |
| TDM (['Beyond Skip Connections: Top-Down Modulation for Object Detection'](https://arxiv.org/pdf/1612.06851.pdf)) | 2017 | ? | ? | 37.3 |
| FPN (['Feature Pyramid Networks for Object Detection'](http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_Feature_Pyramid_Networks_CVPR_2017_paper.pdf)) | 2017 | ? | ? | 36.2 |
| YOLO v2 (['YOLO9000: Better, Faster, Stronger'](https://arxiv.org/pdf/1612.08242.pdf)) | 2017 | 78.6 | 73.4 | 21.6 |
| RON (['RON: Reverse Connection with Objectness Prior Networks for Object Detection'](https://arxiv.org/pdf/1707.01691.pdf)) | 2017 | 77.6 | 75.4 | ? |
| DCN (['Deformable Convolutional Networks'](http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Deformable_Convolutional_Networks_ICCV_2017_paper.pdf)) | 2017 | ? | ? | ? |
| DeNet (['DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling'](https://arxiv.org/pdf/1703.10295.pdf)) | 2017 | 77.1 | 73.9 | 33.8 |
| CoupleNet (['CoupleNet: Coupling Global Structure with Local Parts for Object Detection'](https://arxiv.org/pdf/1708.02863.pdf)) | 2017 | 82.7 | 80.4 | 34.4 |
| RetinaNet (['Focal Loss for Dense Object Detection'](https://arxiv.org/pdf/1708.02002.pdf)) | 2017 | ? | ? | 39.1 |
| Mask R-CNN (['Mask R-CNN'](http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf)) | 2017 | ? | ? | 39.8 |
| DSOD (['DSOD: Learning Deeply Supervised Object Detectors from Scratch'](https://arxiv.org/pdf/1708.01241.pdf)) | 2017 | 77.7 | 76.3 | ? |
| SMN (['Spatial Memory for Context Reasoning in Object Detection'](http://openaccess.thecvf.com/content_ICCV_2017/papers/Chen_Spatial_Memory_for_ICCV_2017_paper.pdf)) | 2017 | 70.0 | ? | ? |
| YOLO v3 (['YOLOv3: An Incremental Improvement'](https://pjreddie.com/media/files/papers/YOLOv3.pdf)) | 2018 | ? | ? | 33.0 |
| SIN (['Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships'](http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Structure_Inference_Net_CVPR_2018_paper.pdf)) | 2018 | 76.0 | 73.1 | 23.2 |
| STDN (['Scale-Transferrable Object Detection'](http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Scale-Transferrable_Object_Detection_CVPR_2018_paper.pdf)) | 2018 | 80.9 | ? | ? |
| RefineDet (['Single-Shot Refinement Neural Network for Object Detection'](http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Single-Shot_Refinement_Neural_CVPR_2018_paper.pdf)) | 2018 | 83.8 | 83.5 | 41.8 |
| MegDet (['MegDet: A Large Mini-Batch Object Detector'](http://openaccess.thecvf.com/content_cvpr_2018/papers/Peng_MegDet_A_Large_CVPR_2018_paper.pdf)) | 2018 | ? | ? | ? |
| RFBNet (['Receptive Field Block Net for Accurate and Fast Object Detection'](https://arxiv.org/pdf/1711.07767.pdf)) | 2018 | 82.2 | ? | ? |
| CornerNet (['CornerNet: Detecting Objects as Paired Keypoints'](https://arxiv.org/pdf/1808.01244.pdf)) | 2018 | ? | ? | 42.1 |
| LibraRetinaNet (['Libra R-CNN: Towards Balanced Learning for Object Detection'](https://arxiv.org/pdf/1904.02701v1.pdf)) | 2019 | ? | ? | 43.0 |
| YOLACT-700 (['YOLACT Real-time Instance Segmentation'](https://arxiv.org/pdf/1904.02689v1.pdf)) | 2019 | ? | ? | 31.2 |
| DetNASNet(3.8) (['DetNAS: Backbone Search for Object Detection'](https://arxiv.org/pdf/1903.10979v2.pdf)) | 2019 | ? | ? | 42.0 |
| YOLOv4 (['YOLOv4: Optimal Speed and Accuracy of Object Detection'](https://arxiv.org/pdf/2004.10934.pdf)) | 2020 | ? | ? | 46.7 |
| SOLO (['SOLO: Segmenting Objects by Locations'](https://arxiv.org/pdf/1912.04488v3.pdf)) | 2020 | ? | ? | 37.8 |
| D-SOLO (['SOLO: Segmenting Objects by Locations'](https://arxiv.org/pdf/1912.04488v3.pdf)) | 2020 | ? | ? | 40.5 |
| SNIPER (['Scale Normalized Image Pyramids with AutoFocus for Object Detection'](https://arxiv.org/pdf/2102.05646v1.pdf)) | 2021 | 86.6 | ? | 47.9 |
| AutoFocus (['Scale Normalized Image Pyramids with AutoFocus for Object Detection'](https://arxiv.org/pdf/2102.05646v1.pdf)) | 2021 | 85.8 | ? | 47.9 |
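
The COCO (mAP) column is conventionally produced with the official COCOeval tooling, which averages AP over IoU thresholds 0.50:0.95 (the VOC columns instead use a single IoU threshold of 0.5). As a minimal sketch, assuming pycocotools is installed and that the detector's outputs have been exported to a COCO-format results JSON (both file names below are placeholders):

```python
# Sketch: standard COCO bbox evaluation via pycocotools.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("annotations/instances_val2017.json")   # ground-truth annotations
coco_dt = coco_gt.loadRes("detections_val2017.json")   # detector outputs (hypothetical file)

coco_eval = COCOeval(coco_gt, coco_dt, iouType="bbox")
coco_eval.evaluate()
coco_eval.accumulate()
coco_eval.summarize()  # first printed line is AP@[IoU=0.50:0.95], i.e. the COCO mAP
```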