# Awesome Computer Vision Models [![Awesome](https://awesome.re/badge.svg)](https://awesome.re)
A curated list of popular classification, segmentation, and detection models, with the evaluation metrics reported in the corresponding papers. A short code sketch of how each table's headline metric is computed appears after that table.
## Contents
- [Classification models](#classification-models)
- [Segmentation models](#segmentation-models)
- [Detection models](#detection-models)
## Classification models
| Model | Number of parameters | FLOPs | Top-1 Error (%) | Top-5 Error (%) | Year |
|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:--------------------:|:---------------:|:----------------:|:--------------:|:-----:|
| AlexNet (['One weird trick for parallelizing convolutional neural networks'](https://arxiv.org/abs/1404.5997)) | 62.3M | 1,132.33M | 40.96 | 18.24 | 2014 |
| VGG-16 (['Very Deep Convolutional Networks for Large-Scale Image Recognition'](https://arxiv.org/abs/1409.1556)) | 138.3M | ? | 26.78 | 8.69 | 2014 |
| ResNet-10 (['Deep Residual Learning for Image Recognition'](https://arxiv.org/abs/1512.03385)) | 5.5M | 894.04M | 34.69 | 14.36 | 2015 |
| ResNet-18 (['Deep Residual Learning for Image Recognition'](https://arxiv.org/abs/1512.03385)) | 11.7M | 1,820.41M | 28.53 | 9.82 | 2015 |
| ResNet-34 (['Deep Residual Learning for Image Recognition'](https://arxiv.org/abs/1512.03385)) | 21.8M | 3,672.68M | 24.84 | 7.80 | 2015 |
| ResNet-50 (['Deep Residual Learning for Image Recognition'](https://arxiv.org/abs/1512.03385)) | 25.5M | 3,877.95M | 22.28 | 6.33 | 2015 |
| InceptionV3 (['Rethinking the Inception Architecture for Computer Vision'](https://arxiv.org/abs/1512.00567)) | 23.8M | ? | 21.2 | 5.6 | 2015 |
| PreResNet-18 (['Identity Mappings in Deep Residual Networks'](https://arxiv.org/abs/1603.05027)) | 11.7M | 1,820.56M | 28.43 | 9.72 | 2016 |
| PreResNet-34 (['Identity Mappings in Deep Residual Networks'](https://arxiv.org/abs/1603.05027)) | 21.8M | 3,672.83M | 24.89 | 7.74 | 2016 |
| PreResNet-50 (['Identity Mappings in Deep Residual Networks'](https://arxiv.org/abs/1603.05027)) | 25.6M | 3,875.44M | 22.40 | 6.47 | 2016 |
| DenseNet-121 (['Densely Connected Convolutional Networks'](https://arxiv.org/abs/1608.06993)) | 8.0M | 2,872.13M | 23.48 | 7.04 | 2016 |
| DenseNet-161 (['Densely Connected Convolutional Networks'](https://arxiv.org/abs/1608.06993)) | 28.7M | 7,793.16M | 22.86 | 6.44 | 2016 |
| PyramidNet-101 (['Deep Pyramidal Residual Networks'](https://arxiv.org/abs/1610.02915)) | 42.5M | 8,743.54M | 21.98 | 6.20 | 2016 |
| ResNeXt-14(32x4d) (['Aggregated Residual Transformations for Deep Neural Networks'](http://arxiv.org/abs/1611.05431)) | 9.5M | 1,603.46M | 30.32 | 11.46 | 2016 |
| ResNeXt-26(32x4d) (['Aggregated Residual Transformations for Deep Neural Networks'](http://arxiv.org/abs/1611.05431)) | 15.4M | 2,488.07M | 24.14 | 7.46 | 2016 |
| WRN-50-2 (['Wide Residual Networks'](https://arxiv.org/abs/1605.07146)) | 68.9M | 11,405.42M | 22.53 | 6.41 | 2016 |
| Xception (['Xception: Deep Learning with Depthwise Separable Convolutions'](https://arxiv.org/abs/1610.02357)) | 22,855,952 | 8,403.63M | 20.97 | 5.49 | 2016 |
| InceptionV4 (['Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning'](https://arxiv.org/abs/1602.07261)) | 42,679,816 | 12,304.93M | 20.64 | 5.29 | 2016 |
| InceptionResNetV2 (['Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning'](https://arxiv.org/abs/1602.07261)) | 55,843,464 | 13,188.64M | 19.93 | 4.90 | 2016 |
| PolyNet (['PolyNet: A Pursuit of Structural Diversity in Very Deep Networks'](https://arxiv.org/abs/1611.05725)) | 95,366,600 | 34,821.34M | 19.10 | 4.52 | 2016 |
| DarkNet Ref (['Darknet: Open source neural networks in C'](https://github.com/pjreddie/darknet)) | 7,319,416 | 367.59M | 38.58 | 17.18 | 2016 |
| DarkNet Tiny (['Darknet: Open source neural networks in C'](https://github.com/pjreddie/darknet)) | 1,042,104 | 500.85M | 40.74 | 17.84 | 2016 |
| DarkNet 53 (['Darknet: Open source neural networks in C'](https://github.com/pjreddie/darknet)) | 41,609,928 | 7,133.86M | 21.75 | 5.64 | 2016 |
| SqueezeResNet1.1 (['SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size'](https://arxiv.org/abs/1602.07360)) | 1,235,496 | 352.02M | 40.09 | 18.21 | 2016 |
| SqueezeNet1.1 (['SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size'](https://arxiv.org/abs/1602.07360)) | 1,235,496 | 352.02M | 39.31 | 17.72 | 2016 |
| ResAttNet-92 (['Residual Attention Network for Image Classification'](https://arxiv.org/abs/1704.06904)) | 51.3M | ? | 19.5 | 4.8 | 2017 |
| CondenseNet (G=C=8) (['CondenseNet: An Efficient DenseNet using Learned Group Convolutions'](https://arxiv.org/abs/1711.09224)) | 4.8M | ? | 26.2 | 8.3 | 2017 |
| DPN-68 (['Dual Path Networks'](https://arxiv.org/abs/1707.01629)) | 12,611,602 | 2,351.84M | 23.24 | 6.79 | 2017 |
| ShuffleNet x1.0 (g=1) (['ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices'](https://arxiv.org/abs/1707.01083)) | 1,531,936 | 148.13M | 34.93 | 13.89 | 2017 |
| DiracNetV2-18 (['DiracNets: Training Very Deep Neural Networks Without Skip-Connections'](https://arxiv.org/abs/1706.00388)) | 11,511,784 | 1,796.62M | 31.47 | 11.70 | 2017 |
| DiracNetV2-34 (['DiracNets: Training Very Deep Neural Networks Without Skip-Connections'](https://arxiv.org/abs/1706.00388)) | 21,616,232 | 3,646.93M | 28.75 | 9.93 | 2017 |
| SENet-16 (['Squeeze-and-Excitation Networks'](https://arxiv.org/abs/1709.01507)) | 31,366,168 | 5,081.30M | 25.65 | 8.20 | 2017 |
| SENet-154 (['Squeeze-and-Excitation Networks'](https://arxiv.org/abs/1709.01507)) | 115,088,984 | 20,745.78M | 18.62 | 4.61 | 2017 |
| MobileNet (['MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications'](https://arxiv.org/abs/1704.04861)) | 4,231,976 | 579.80M | 26.61 | 8.95 | 2017 |
| NASNet-A 4@1056 (['Learning Transferable Architectures for Scalable Image Recognition'](https://arxiv.org/abs/1707.07012)) | 5,289,978 | 584.90M | 25.68 | 8.16 | 2017 |
| NASNet-A 6@4032 (['Learning Transferable Architectures for Scalable Image Recognition'](https://arxiv.org/abs/1707.07012)) | 88,753,150 | 23,976.44M | 18.14 | 4.21 | 2017 |
| DLA-34 (['Deep Layer Aggregation'](https://arxiv.org/abs/1707.06484)) | 15,742,104 | 3,071.37M | 25.36 | 7.94 | 2017 |
| AirNet50-1x64d (r=2) (['Attention Inspiring Receptive-Fields Network for Learning Invariant Representations'](https://ieeexplore.ieee.org/document/8510896)) | 27.43M | ? | 22.48 | 6.21 | 2018 |
| BAM-ResNet-50 (['BAM: Bottleneck Attention Module'](https://arxiv.org/abs/1807.06514)) | 25.92M | ? | 23.68 | 6.96 | 2018 |
| CBAM-ResNet-50 (['CBAM: Convolutional Block Attention Module'](https://arxiv.org/abs/1807.06521)) | 28.1M | ? | 23.02 | 6.38 | 2018 |
| 1.0-SqNxt-23v5 (['SqueezeNext: Hardware-Aware Neural Network Design'](https://arxiv.org/abs/1803.10615)) | 921,816 | 285.82M | 40.77 | 17.85 | 2018 |
| 1.5-SqNxt-23v5 (['SqueezeNext: Hardware-Aware Neural Network Design'](https://arxiv.org/abs/1803.10615)) | 1,953,616 | 550.97M | 33.81 | 13.01 | 2018 |
| 2.0-SqNxt-23v5 (['SqueezeNext: Hardware-Aware Neural Network Design'](https://arxiv.org/abs/1803.10615)) | 3,366,344 | 897.60M | 29.63 | 10.66 | 2018 |
| ShuffleNetV2 (['ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design'](https://arxiv.org/abs/1807.11164)) | 2,278,604 | 149.72M | 31.44 | 11.63 | 2018 |
| 456-MENet-24×1 (g=3) (['Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications'](https://arxiv.org/abs/1803.09127)) | 5.3M | ? | 28.4 | 9.8 | 2018 |
| FD-MobileNet (['FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy'](https://arxiv.org/abs/1802.03750)) | 2,901,288 | 147.46M | 34.23 | 13.38 | 2018 |
| MobileNetV2 (['MobileNetV2: Inverted Residuals and Linear Bottlenecks'](https://arxiv.org/abs/1801.04381)) | 3,504,960 | 329.36M | 26.97 | 8.87 | 2018 |
| IGCV3 (['IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks'](https://arxiv.org/abs/1806.00178)) | 3.5M | ? | 28.22 | 9.54 | 2018 |
| DARTS (['DARTS: Differentiable Architecture Search'](https://arxiv.org/abs/1806.09055)) | 4.9M | ? | 26.9 | 9.0 | 2018 |
| PNASNet-5 (['Progressive Neural Architecture Search'](https://arxiv.org/abs/1712.00559)) | 5.1M | ? | 25.8 | 8.1 | 2018 |
| AmoebaNet-C (['Regularized Evolution for Image Classifier Architecture Search'](https://arxiv.org/abs/1802.01548)) | 5.1M | ? | 24.3 | 7.6 | 2018 |
| MnasNet (['MnasNet: Platform-Aware Neural Architecture Search for Mobile'](https://arxiv.org/abs/1807.11626)) | 4,308,816 | 317.67M | 31.58 | 11.74 | 2018 |
| IBN-Net50-a (['Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net'](https://arxiv.org/abs/1807.09441)) | ? | ? | 22.54 | 6.32 | 2018 |
| MarginNet (['Large Margin Deep Networks for Classification'](http://papers.nips.cc/paper/7364-large-margin-deep-networks-for-classification.pdf)) | ? | ? | 22.0 | ? | 2018 |
| A^2 Net (['A^2-Nets: Double Attention Networks'](http://papers.nips.cc/paper/7318-a2-nets-double-attention-networks.pdf)) | ? | ? | 23.0 | 6.5 | 2018 |
| FishNeXt-150 (['FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction'](http://papers.nips.cc/paper/7356-fishnet-a-versatile-backbone-for-image-region-and-pixel-level-prediction.pdf)) | 26.2M | ? | 21.5 | ? | 2018 |
| Shape-ResNet (['ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness'](https://arxiv.org/pdf/1811.12231v2.pdf)) | 25.5M | ? | 23.28 | 6.72 | 2019 |
| SimCNN(k=3 train) (['Greedy Layerwise Learning Can Scale to ImageNet'](https://arxiv.org/pdf/1812.11446.pdf)) | ? | ? | 28.4 | 10.2 | 2019 |
| SKNet-50 (['Selective Kernel Networks'](https://arxiv.org/pdf/1903.06586.pdf)) | 27.5M | ? | 20.79 | ? | 2019 |
| SRM-ResNet-50 (['SRM : A Style-based Recalibration Module for Convolutional Neural Networks'](https://arxiv.org/pdf/1903.10829.pdf)) | 25.62M | ? | 22.87 | 6.49 | 2019 |
| EfficientNet-B0 (['EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks'](http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) | 5,288,548 | 414.31M | 24.77 | 7.52 | 2019 |
| EfficientNet-B7b (['EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks'](http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) | 66,347,960 | 39,010.98M | 15.94 | 3.22 | 2019 |
| ProxylessNAS (['ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware'](https://arxiv.org/pdf/1812.00332.pdf)) | ? | ? | 24.9 | 7.5 | 2019 |
| MixNet-L (['MixNet: Mixed Depthwise Convolutional Kernels'](https://arxiv.org/abs/1907.09595)) | 7.3M | ? | 21.1 | 5.8 | 2019 |
| ECA-Net50 (['ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks'](https://arxiv.org/pdf/1910.03151v1.pdf)) | 24.37M | 3.86G | 22.52 | 6.32 | 2019 |
| ECA-Net101 (['ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks'](https://arxiv.org/pdf/1910.03151v1.pdf)) | 42.49M | 7.35G | 21.35 | 5.66 | 2019 |
| ACNet-Densenet121 (['ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks'](https://arxiv.org/abs/1908.03930)) | ? | ? | 24.18 | 7.23 | 2019 |
| LIP-ResNet-50 (['LIP: Local Importance-based Pooling'](https://arxiv.org/abs/1908.04156)) | 23.9M | 5.33G | 21.81 | 6.04 | 2019 |
| LIP-ResNet-101 (['LIP: Local Importance-based Pooling'](https://arxiv.org/abs/1908.04156)) | 42.9M | 9.06G | 20.67 | 5.40 | 2019 |
| LIP-DenseNet-BC-121 (['LIP: Local Importance-based Pooling'](https://arxiv.org/abs/1908.04156)) | 8.7M | 4.13G | 23.36 | 6.84 | 2019 |
| MuffNet_1.0 (['MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning'](http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) | 2.3M | 146M | 30.1 | ? | 2019 |
| MuffNet_1.5 (['MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning'](http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) | 3.4M | 300M | 26.9 | ? | 2019 |
| ResNet-34-Bin-5 (['Making Convolutional Networks Shift-Invariant Again'](https://arxiv.org/abs/1904.11486)) | 21.8M | 3,672.68M | 25.80 | ? | 2019 |
| ResNet-50-Bin-5 (['Making Convolutional Networks Shift-Invariant Again'](https://arxiv.org/abs/1904.11486)) | 25.5M | 3,877.95M | 22.96 | ? | 2019 |
| MobileNetV2-Bin-5 (['Making Convolutional Networks Shift-Invariant Again'](https://arxiv.org/abs/1904.11486)) | 3,504,960 | 329.36M | 27.50 | ? | 2019 |
| FixRes ResNeXt101 WSL (['Fixing the train-test resolution discrepancy'](https://arxiv.org/abs/1906.06423)) | 829M | ? | 13.6 | 2.0 | 2019 |
| Noisy Student (EfficientNet-L2) (['Self-training with Noisy Student improves ImageNet classification'](https://arxiv.org/abs/1911.04252)) | 480M | ? | 12.6 | 1.8 | 2019 |
| TResNet-M (['TResNet: High Performance GPU-Dedicated Architecture'](https://arxiv.org/abs/2003.13630)) | 29.4M | 5.5G | 19.3 | ? | 2020 |
| DA-NAS-C (['DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search'](https://arxiv.org/abs/2003.12563v1)) | ? | 467M | 23.8 | ? | 2020 |
| ResNeSt-50 (['ResNeSt: Split-Attention Networks'](https://arxiv.org/abs/2004.08955)) | 27.5M | 5.39G | 18.87 | ? | 2020 |
| ResNeSt-101 (['ResNeSt: Split-Attention Networks'](https://arxiv.org/abs/2004.08955)) | 48.3M | 10.2G | 17.73 | ? | 2020 |
| ResNet-50-FReLU (['Funnel Activation for Visual Recognition'](https://arxiv.org/abs/2007.11824v2)) | 25.5M | 3.87G | 22.40 | ? | 2020 |
| ResNet-101-FReLU (['Funnel Activation for Visual Recognition'](https://arxiv.org/abs/2007.11824v2)) | 44.5M | 7.6G | 22.10 | ? | 2020 |
| ResNet-50-MEALv2 (['MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'](https://arxiv.org/abs/2009.08453v1)) | 25.6M | ? | 19.33 | 4.91 | 2020 |
| ResNet-50-MEALv2 + CutMix (['MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'](https://arxiv.org/abs/2009.08453v1)) | 25.6M | ? | 19.02 | 4.65 | 2020 |
| MobileNet V3-Large-MEALv2 (['MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'](https://arxiv.org/abs/2009.08453v1)) | 5.48M | ? | 23.08 | 6.68 | 2020 |
| EfficientNet-B0-MEALv2 (['MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks'](https://arxiv.org/abs/2009.08453v1)) | 5.29M | ? | 21.71 | 6.05 | 2020 |
| T2T-ViT-7 (['Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet'](https://arxiv.org/abs/2101.11986v1)) | 4.2M | 0.6G | 28.8 | ? | 2021 |
| T2T-ViT-14 (['Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet'](https://arxiv.org/abs/2101.11986v1)) | 19.4M | 4.8G | 19.4 | ? | 2021 |
| T2T-ViT-19 (['Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet'](https://arxiv.org/abs/2101.11986v1)) | 39.0M | 8.0G | 18.8 | ? | 2021 |
| NFNet-F0 (['High-Performance Large-Scale Image Recognition Without Normalization'](https://arxiv.org/abs/2102.06171)) | 71.5M | 12.38G | 16.4 | 3.2 | 2021 |
| NFNet-F1 (['High-Performance Large-Scale Image Recognition Without Normalization'](https://arxiv.org/abs/2102.06171)) | 132.6M | 35.54G | 15.4 | 2.9 | 2021 |
| NFNet-F6+SAM (['High-Performance Large-Scale Image Recognition Without Normalization'](https://arxiv.org/abs/2102.06171)) | 438.4M | 377.28G | 13.5 | 2.1 | 2021 |
| EfficientNetV2-S (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 24M | 8.8G | 16.1 | ? | 2021 |
| EfficientNetV2-M (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 55M | 24G | 14.9 | ? | 2021 |
| EfficientNetV2-L (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 121M | 53G | 14.3 | ? | 2021 |
| EfficientNetV2-S (21k) (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 24M | 8.8G | 15.0 | ? | 2021 |
| EfficientNetV2-M (21k) (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 55M | 24G | 13.9 | ? | 2021 |
| EfficientNetV2-L (21k) (['EfficientNetV2: Smaller Models and Faster Training'](https://arxiv.org/abs/2104.00298)) | 121M | 53G | 13.2 | ? | 2021 |
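
The Top-1 / Top-5 error columns are the standard ImageNet classification metrics: the percentage of validation images whose ground-truth label is not the single highest-scoring class (Top-1) or not among the five highest-scoring classes (Top-5). A minimal NumPy sketch of the computation (the toy logits and labels below are made up for illustration):

```python
import numpy as np

def top_k_error(logits, labels, k=1):
    """Percentage of samples whose true label is not among the k highest-scoring classes."""
    top_k = np.argpartition(logits, -k, axis=1)[:, -k:]   # indices of the k largest scores per row
    hits = (top_k == labels[:, None]).any(axis=1)          # true label found among them?
    return 100.0 * (1.0 - hits.mean())

# toy batch: 4 samples, 5 classes (scores are made up)
logits = np.array([[0.1, 0.3, 0.2, 0.9, 0.0],
                   [0.8, 0.1, 0.05, 0.02, 0.03],
                   [0.2, 0.2, 0.5, 0.05, 0.05],
                   [0.1, 0.1, 0.1, 0.1, 0.6]])
labels = np.array([3, 2, 2, 0])
print(top_k_error(logits, labels, k=1))  # 50.0: two of the four predictions miss
print(top_k_error(logits, labels, k=5))  # 0.0: with only 5 classes, the top-5 always contains the label
```
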
## Segmentation models
| Model | Year | PASCAL-Context (mIoU) | Cityscapes (mIoU) | PASCAL VOC 2012 (mIoU) | COCO Stuff (mIoU) | ADE20K val (mIoU) |
|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----:|:-------------------:|:-------------------:|:----------------------:|:------------:|:----------------------:|
| U-Net (['U-Net: Convolutional Networks for Biomedical Image Segmentation'](https://arxiv.org/pdf/1505.04597.pdf)) | 2015 | ? | ? | ? | ? | ? |
| DeconvNet (['Learning Deconvolution Network for Semantic Segmentation'](https://arxiv.org/pdf/1505.04366.pdf)) | 2015 | ? | ? | 72.5 | ? | ? |
| ParseNet (['ParseNet: Looking Wider to See Better'](https://arxiv.org/abs/1506.04579)) | 2015 | 40.4 | ? | 69.8 | ? | ? |
| Piecewise (['Efficient piecewise training of deep structured models for semantic segmentation'](https://arxiv.org/abs/1504.01013)) | 2015 | 43.3 | 71.6 | 78.0 | ? | ? |
| SegNet (['SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation'](https://arxiv.org/pdf/1511.00561.pdf)) | 2016 | ? | 56.1 | ? | ? | ? |
| FCN (['Fully Convolutional Networks for Semantic Segmentation'](https://arxiv.org/pdf/1605.06211.pdf)) | 2016 | 37.8 | 65.3 | 62.2 | 22.7 | 29.39 |
| ENet (['ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation'](https://arxiv.org/pdf/1606.02147.pdf)) | 2016 | ? | 58.3 | ? | ? | ? |
| DilatedNet (['Multi-Scale Context Aggregation by Dilated Convolutions'](https://arxiv.org/pdf/1511.07122.pdf)) | 2016 | ? | ? | 67.6 | ? | 32.31 |
| PixelNet (['PixelNet: Towards a General Pixel-Level Architecture'](https://arxiv.org/pdf/1609.06694.pdf)) | 2016 | ? | ? | 69.8 | ? | ? |
| RefineNet (['RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation'](https://arxiv.org/pdf/1611.06612.pdf)) | 2016 | 47.3 | 73.6 | 83.4 | 33.6 | 40.70 |
| LRR (['Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation'](https://arxiv.org/pdf/1605.02264.pdf)) | 2016 | ? | 71.8 | 79.3 | ? | ? |
| FRRN (['Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes'](https://arxiv.org/pdf/1611.08323.pdf)) | 2016 | ? | 71.8 | ? | ? | ? |
| MultiNet (['MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving'](https://arxiv.org/pdf/1612.07695.pdf)) | 2016 | ? | ? | ? | ? | ? |
| DeepLab (['DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs'](https://arxiv.org/pdf/1606.00915.pdf)) | 2017 | 45.7 | 64.8 | 79.7 | ? | ? |
| LinkNet (['LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation'](https://arxiv.org/pdf/1707.03718.pdf)) | 2017 | ? | ? | ? | ? | ? |
| Tiramisu (['The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation'](https://arxiv.org/pdf/1611.09326.pdf)) | 2017 | ? | ? | ? | ? | ? |
| ICNet (['ICNet for Real-Time Semantic Segmentation on High-Resolution Images'](https://arxiv.org/pdf/1704.08545.pdf)) | 2017 | ? | 70.6 | ? | ? | ? |
| ERFNet (['Efficient ConvNet for Real-time Semantic Segmentation'](http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf)) | 2017 | ? | 68.0 | ? | ? | ? |
| PSPNet (['Pyramid Scene Parsing Network'](https://arxiv.org/pdf/1612.01105.pdf)) | 2017 | 47.8 | 80.2 | 85.4 | ? | 44.94 |
| GCN (['Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network'](https://arxiv.org/pdf/1703.02719.pdf)) | 2017 | ? | 76.9 | 82.2 | ? | ? |
| Segaware (['Segmentation-Aware Convolutional Networks Using Local Attention Masks'](https://arxiv.org/pdf/1708.04607.pdf)) | 2017 | ? | ? | 69.0 | ? | ? |
| PixelDCN (['PIXEL DECONVOLUTIONAL NETWORKS'](https://arxiv.org/pdf/1705.06820.pdf)) | 2017 | ? | ? | 73.0 | ? | ? |
| DeepLabv3 (['Rethinking Atrous Convolution for Semantic Image Segmentation'](https://arxiv.org/pdf/1706.05587.pdf)) | 2017 | ? | ? | 85.7 | ? | ? |
| DUC, HDC (['Understanding Convolution for Semantic Segmentation'](https://arxiv.org/pdf/1702.08502.pdf)) | 2018 | ? | 77.1 | ? | ? | ? |
| ShuffleSeg (['ShuffleSeg: Real-time Semantic Segmentation Network'](https://arxiv.org/pdf/1803.03816.pdf)) | 2018 | ? | 59.3 | ? | ? | ? |
| AdaptSegNet (['Learning to Adapt Structured Output Space for Semantic Segmentation'](https://arxiv.org/pdf/1802.10349.pdf)) | 2018 | ? | 46.7 | ? | ? | ? |
| TuSimple-DUC (['Understanding Convolution for Semantic Segmentation'](https://arxiv.org/pdf/1702.08502.pdf)) | 2018 | 80.1 | ? | 83.1 | ? | ? |
| R2U-Net (['Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation'](https://arxiv.org/pdf/1802.06955.pdf)) | 2018 | ? | ? | ? | ? | ? |
| Attention U-Net (['Attention U-Net: Learning Where to Look for the Pancreas'](https://arxiv.org/pdf/1804.03999.pdf)) | 2018 | ? | ? | ? | ? | ? |
| DANet (['Dual Attention Network for Scene Segmentation'](https://arxiv.org/pdf/1809.02983.pdf)) | 2018 | 52.6 | 81.5 | ? | 39.7 | ? |
| ENCNet (['Context Encoding for Semantic Segmentation'](https://arxiv.org/abs/1803.08904)) | 2018 | 51.7 | 75.8 | 85.9 | ? | 44.65 |
| ShelfNet (['ShelfNet for Real-time Semantic Segmentation'](https://arxiv.org/pdf/1811.11254.pdf)) | 2018 | 48.4 | 75.8 | 84.2 | ? | ? |
| LadderNet (['LadderNet: Multi-path Networks Based on U-Net for Medical Image Segmentation'](https://arxiv.org/pdf/1810.07810.pdf)) | 2018 | ? | ? | ? | ? | ? |
| CCC-ERFnet (['Concentrated-Comprehensive Convolutions for lightweight semantic segmentation'](https://arxiv.org/pdf/1812.04920v1.pdf)) | 2018 | ? | 69.01 | ? | ? | ? |
| DifNet-101 (['DifNet: Semantic Segmentation by Diffusion Networks'](http://papers.nips.cc/paper/7435-difnet-semantic-segmentation-by-diffusion-networks.pdf)) | 2018 | 45.1 | ? | 73.2 | ? | ? |
| BiSeNet (Res18) (['BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation'](https://arxiv.org/pdf/1808.00897.pdf)) | 2018 | ? | ? | 74.7 | 28.1 | ? |
| ESPNet (['ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation'](https://arxiv.org/pdf/1803.06815.pdf)) | 2018 | ? | ? | 63.01 | ? | ? |
| SPADE (['Semantic Image Synthesis with Spatially-Adaptive Normalization'](https://arxiv.org/pdf/1903.07291.pdf)) | 2019 | ? | 62.3 | ? | 37.4 | 38.5 |
| SeamlessSeg (['Seamless Scene Segmentation'](https://arxiv.org/pdf/1905.01220v1.pdf)) | 2019 | ? | 77.5 | ? | ? | ? |
| EMANet (['Expectation-Maximization Attention Networks for Semantic Segmentation'](https://arxiv.org/pdf/1907.13426.pdf)) | 2019 | ? | ? | 88.2 | 39.9 | ? |
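
Most segmentation numbers above are mean intersection-over-union (mIoU): the per-class IoU between predicted and ground-truth label maps, averaged over classes. A minimal NumPy sketch, assuming dense integer label maps (the toy maps below are made up; benchmark toolkits additionally handle ignore labels and accumulate a confusion matrix over the whole dataset):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes, from dense integer label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                      # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return 100.0 * float(np.mean(ious))

# toy 2x4 label maps with 3 classes (values are made up)
pred   = np.array([[0, 0, 1, 1],
                   [2, 2, 1, 0]])
target = np.array([[0, 0, 1, 1],
                   [2, 1, 1, 0]])
print(mean_iou(pred, target, num_classes=3))  # 75.0: per-class IoUs are 1.0, 0.75 and 0.5
```
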
## Detection models
| Model | Year | VOC07 (mAP@IoU=0.5) | VOC12 (mAP@IoU=0.5) | COCO (mAP) |
|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|:-----:|:-------------------:|:-------------------:|:----------:|
| R-CNN (['Rich feature hierarchies for accurate object detection and semantic segmentation'](https://arxiv.org/pdf/1311.2524.pdf)) | 2014 | 58.5 | ? | ? |
| OverFeat (['OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks'](https://arxiv.org/pdf/1312.6229.pdf)) | 2014 | ? | ? | ? |
| MultiBox (['Scalable Object Detection using Deep Neural Networks'](https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Erhan_Scalable_Object_Detection_2014_CVPR_paper.pdf)) | 2014 | 29.0 | ? | ? |
| SPP-Net (['Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition'](https://arxiv.org/pdf/1406.4729.pdf)) | 2014 | 59.2 | ? | ? |
| MR-CNN (['Object detection via a multi-region & semantic segmentation-aware CNN model'](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Gidaris_Object_Detection_via_ICCV_2015_paper.pdf)) | 2015 | 78.2 | 73.9 | ? |
| AttentionNet (['AttentionNet: Aggregating Weak Directions for Accurate Object Detection'](https://arxiv.org/pdf/1506.07704.pdf)) | 2015 | ? | ? | ? |
| Fast R-CNN (['Fast R-CNN'](https://arxiv.org/pdf/1504.08083.pdf)) | 2015 | 70.0 | 68.4 | ? |
| Faster R-CNN (['Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks'](https://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf)) | 2015 | 73.2 | 70.4 | 36.8 |
| YOLO v1 (['You Only Look Once: Unified, Real-Time Object Detection'](https://arxiv.org/pdf/1506.02640.pdf)) | 2016 | 66.4 | 57.9 | ? |
| G-CNN (['G-CNN: an Iterative Grid Based Object Detector'](https://arxiv.org/pdf/1512.07729.pdf)) | 2016 | 66.8 | 66.4 | ? |
| AZNet (['Adaptive Object Detection Using Adjacency and Zoom Prediction'](https://arxiv.org/pdf/1512.07711.pdf)) | 2016 | 70.4 | ? | 22.3 |
| ION (['Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks'](https://arxiv.org/pdf/1512.04143.pdf)) | 2016 | 80.1 | 77.9 | 33.1 |
| HyperNet (['HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection'](https://arxiv.org/pdf/1604.00600.pdf)) | 2016 | 76.3 | 71.4 | ? |
| OHEM (['Training Region-based Object Detectors with Online Hard Example Mining'](https://arxiv.org/pdf/1604.03540.pdf)) | 2016 | 78.9 | 76.3 | 22.4 |
| MPN (['A MultiPath Network for Object Detection'](https://arxiv.org/pdf/1604.02135.pdf)) | 2016 | ? | ? | 33.2 |
| SSD (['SSD: Single Shot MultiBox Detector'](https://arxiv.org/pdf/1512.02325.pdf)) | 2016 | 76.8 | 74.9 | 31.2 |
| GBDNet (['Crafting GBD-Net for Object Detection'](https://arxiv.org/pdf/1610.02579.pdf)) | 2016 | 77.2 | ? | 27.0 |
| CPF (['Contextual Priming and Feedback for Faster R-CNN'](https://pdfs.semanticscholar.org/40e7/4473cb82231559cbaeaa44989e9bbfe7ec3f.pdf)) | 2016 | 76.4 | 72.6 | ? |
| MS-CNN (['A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection'](https://arxiv.org/pdf/1607.07155.pdf)) | 2016 | ? | ? | ? |
| R-FCN (['R-FCN: Object Detection via Region-based Fully Convolutional Networks'](https://arxiv.org/pdf/1605.06409.pdf)) | 2016 | 79.5 | 77.6 | 29.9 |
| PVANET (['PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection'](https://arxiv.org/pdf/1608.08021.pdf)) | 2016 | ? | ? | ? |
| DeepID-Net (['DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection'](https://arxiv.org/pdf/1412.5661.pdf)) | 2016 | 69.0 | ? | ? |
| NoC (['Object Detection Networks on Convolutional Feature Maps'](https://arxiv.org/pdf/1504.06066.pdf)) | 2016 | 71.6 | 68.8 | 27.2 |
| DSSD (['DSSD : Deconvolutional Single Shot Detector'](https://arxiv.org/pdf/1701.06659.pdf)) | 2017 | 81.5 | 80.0 | ? |
| TDM (['Beyond Skip Connections: Top-Down Modulation for Object Detection'](https://arxiv.org/pdf/1612.06851.pdf)) | 2017 | ? | ? | 37.3 |
| FPN (['Feature Pyramid Networks for Object Detection'](http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_Feature_Pyramid_Networks_CVPR_2017_paper.pdf)) | 2017 | ? | ? | 36.2 |
| YOLO v2 (['YOLO9000: Better, Faster, Stronger'](https://arxiv.org/pdf/1612.08242.pdf)) | 2017 | 78.6 | 73.4 | 21.6 |
| RON (['RON: Reverse Connection with Objectness Prior Networks for Object Detection'](https://arxiv.org/pdf/1707.01691.pdf)) | 2017 | 77.6 | 75.4 | ? |
| DCN (['Deformable Convolutional Networks'](http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Deformable_Convolutional_Networks_ICCV_2017_paper.pdf)) | 2017 | ? | ? | ? |
| DeNet (['DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling'](https://arxiv.org/pdf/1703.10295.pdf)) | 2017 | 77.1 | 73.9 | 33.8 |
| CoupleNet (['CoupleNet: Coupling Global Structure with Local Parts for Object Detection'](https://arxiv.org/pdf/1708.02863.pdf)) | 2017 | 82.7 | 80.4 | 34.4 |
| RetinaNet (['Focal Loss for Dense Object Detection'](https://arxiv.org/pdf/1708.02002.pdf)) | 2017 | ? | ? | 39.1 |
| Mask R-CNN (['Mask R-CNN'](http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf)) | 2017 | ? | ? | 39.8 |
| DSOD (['DSOD: Learning Deeply Supervised Object Detectors from Scratch'](https://arxiv.org/pdf/1708.01241.pdf)) | 2017 | 77.7 | 76.3 | ? |
| SMN (['Spatial Memory for Context Reasoning in Object Detection'](http://openaccess.thecvf.com/content_ICCV_2017/papers/Chen_Spatial_Memory_for_ICCV_2017_paper.pdf)) | 2017 | 70.0 | ? | ? |
| YOLO v3 (['YOLOv3: An Incremental Improvement'](https://pjreddie.com/media/files/papers/YOLOv3.pdf)) | 2018 | ? | ? | 33.0 |
| SIN (['Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships'](http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Structure_Inference_Net_CVPR_2018_paper.pdf)) | 2018 | 76.0 | 73.1 | 23.2 |
| STDN (['Scale-Transferrable Object Detection'](http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Scale-Transferrable_Object_Detection_CVPR_2018_paper.pdf)) | 2018 | 80.9 | ? | ? |
| RefineDet (['Single-Shot Refinement Neural Network for Object Detection'](http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Single-Shot_Refinement_Neural_CVPR_2018_paper.pdf)) | 2018 | 83.8 | 83.5 | 41.8 |
| MegDet (['MegDet: A Large Mini-Batch Object Detector'](http://openaccess.thecvf.com/content_cvpr_2018/papers/Peng_MegDet_A_Large_CVPR_2018_paper.pdf)) | 2018 | ? | ? | ? |
| RFBNet (['Receptive Field Block Net for Accurate and Fast Object Detection'](https://arxiv.org/pdf/1711.07767.pdf)) | 2018 | 82.2 | ? | ? |
| CornerNet (['CornerNet: Detecting Objects as Paired Keypoints'](https://arxiv.org/pdf/1808.01244.pdf)) | 2018 | ? | ? | 42.1 |
| LibraRetinaNet (['Libra R-CNN: Towards Balanced Learning for Object Detection'](https://arxiv.org/pdf/1904.02701v1.pdf)) | 2019 | ? | ? | 43.0 |
| YOLACT-700 (['YOLACT Real-time Instance Segmentation'](https://arxiv.org/pdf/1904.02689v1.pdf)) | 2019 | ? | ? | 31.2 |
| DetNASNet (3.8) (['DetNAS: Backbone Search for Object Detection'](https://arxiv.org/pdf/1903.10979v2.pdf)) | 2019 | ? | ? | 42.0 |
| YOLOv4 (['YOLOv4: Optimal Speed and Accuracy of Object Detection'](https://arxiv.org/pdf/2004.10934.pdf)) | 2020 | ? | ? | 46.7 |
| SOLO (['SOLO: Segmenting Objects by Locations'](https://arxiv.org/pdf/1912.04488v3.pdf)) | 2020 | ? | ? | 37.8 |
| D-SOLO (['SOLO: Segmenting Objects by Locations'](https://arxiv.org/pdf/1912.04488v3.pdf)) | 2020 | ? | ? | 40.5 |
| SNIPER (['Scale Normalized Image Pyramids with AutoFocus for Object Detection'](https://arxiv.org/pdf/2102.05646v1.pdf)) | 2021 | 86.6 | ? | 47.9 |
| AutoFocus (['Scale Normalized Image Pyramids with AutoFocus for Object Detection'](https://arxiv.org/pdf/2102.05646v1.pdf)) | 2021 | 85.8 | ? | 47.9 |
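
The detection columns report mean average precision (mAP): per-class AP is the area under the precision-recall curve obtained by matching score-ranked detections to ground-truth boxes at an IoU threshold (0.5 for the VOC columns; COCO averages AP over thresholds from 0.5 to 0.95). A minimal single-class, single-image sketch with VOC-style all-point interpolation (the function names and toy boxes are illustrative; the official VOC and COCO toolkits also handle difficult/crowd boxes and aggregate over all images and classes):

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def average_precision(dets, gts, iou_thr=0.5):
    """dets: list of (score, box); gts: list of boxes. Single class, single image."""
    dets = sorted(dets, key=lambda d: -d[0])               # rank detections by confidence
    matched = set()
    tp, fp = np.zeros(len(dets)), np.zeros(len(dets))
    for i, (_, box) in enumerate(dets):
        ious = [box_iou(box, g) for g in gts]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] >= iou_thr and j not in matched:
            tp[i] = 1.0                                    # first sufficient match is a true positive
            matched.add(j)
        else:
            fp[i] = 1.0                                    # duplicates and low-IoU hits are false positives
    recall = np.cumsum(tp) / max(len(gts), 1)
    precision = np.cumsum(tp) / np.maximum(np.cumsum(tp) + np.cumsum(fp), 1e-9)
    # VOC-style all-point interpolation of the precision-recall curve
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])                # precision envelope
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# toy example: three detections against two ground-truth boxes (coordinates are made up)
dets = [(0.9, [0, 0, 10, 10]), (0.8, [50, 50, 60, 60]), (0.3, [1, 1, 11, 11])]
gts = [[0, 0, 10, 10], [48, 48, 60, 60]]
print(round(average_precision(dets, gts), 2))  # 1.0: both objects are recovered before the false positive
```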
|