2024-04-23 15:17:38 +02:00


Awesome Computer Vision Models !Awesome (https://awesome.re/badge-flat.svg) (https://awesome.re)
 
A curated list of popular classification, segmentation and detection models with corresponding evaluation metrics from papers.
 
 
Contents
 
- Classification models (#classification-models)
- Segmentation models (#segmentation-models)
- Detection models (#detection-models)
 
 
Classification models
 
Model  Number of parameters  FLOPs  Top-1 Error (%)  Top-5 Error (%)  Year
├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────────────────┼──────────┼───────────┼───────────┼────┤
AlexNet ('One weird trick for parallelizing convolutional neural networks' (https://arxiv.org/abs/1404.5997)) 62.3M 1,132.33M 40.96 18.24 2014
VGG-16 ('Very Deep Convolutional Networks for Large-Scale Image Recognition' (https://arxiv.org/abs/1409.1556)) 138.3M ? 26.78 8.69 2014
ResNet-10 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) 5.5M 894.04M 34.69 14.36 2015
ResNet-18 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) 11.7M 1,820.41M 28.53 9.82 2015
ResNet-34 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) 21.8M 3,672.68M 24.84 7.80 2015
ResNet-50 ('Deep Residual Learning for Image Recognition' (https://arxiv.org/abs/1512.03385)) 25.5M 3,877.95M 22.28 6.33 2015
InceptionV3 ('Rethinking the Inception Architecture for Computer Vision' (https://arxiv.org/abs/1512.00567)) 23.8M ? 21.2 5.6 2015
PreResNet-18 ('Identity Mappings in Deep Residual Networks' (https://arxiv.org/abs/1603.05027)) 11.7M 1,820.56M 28.43 9.72 2016
PreResNet-34 ('Identity Mappings in Deep Residual Networks' (https://arxiv.org/abs/1603.05027)) 21.8M 3,672.83M 24.89 7.74 2016
PreResNet-50 ('Identity Mappings in Deep Residual Networks' (https://arxiv.org/abs/1603.05027)) 25.6M 3,875.44M 22.40 6.47 2016
DenseNet-121 ('Densely Connected Convolutional Networks' (https://arxiv.org/abs/1608.06993)) 8.0M 2,872.13M 23.48 7.04 2016
DenseNet-161 ('Densely Connected Convolutional Networks' (https://arxiv.org/abs/1608.06993)) 28.7M 7,793.16M 22.86 6.44 2016
PyramidNet-101 ('Deep Pyramidal Residual Networks' (https://arxiv.org/abs/1610.02915)) 42.5M 8,743.54M 21.98 6.20 2016
ResNeXt-14(32x4d) ('Aggregated Residual Transformations for Deep Neural Networks' (http://arxiv.org/abs/1611.05431)) 9.5M 1,603.46M 30.32 11.46 2016
ResNeXt-26(32x4d) ('Aggregated Residual Transformations for Deep Neural Networks' (http://arxiv.org/abs/1611.05431)) 15.4M 2,488.07M 24.14 7.46 2016
WRN-50-2 ('Wide Residual Networks' (https://arxiv.org/abs/1605.07146)) 68.9M 11,405.42M 22.53 6.41 2016
Xception ('Xception: Deep Learning with Depthwise Separable Convolutions' (https://arxiv.org/abs/1610.02357)) 22,855,952 8,403.63M 20.97 5.49 2016
InceptionV4 ('Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning' (https://arxiv.org/abs/1602.07261)) 42,679,816 12,304.93M 20.64 5.29 2016
InceptionResNetV2 ('Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning' (https://arxiv.org/abs/1602.07261)) 55,843,464 13,188.64M 19.93 4.90 2016
PolyNet ('PolyNet: A Pursuit of Structural Diversity in Very Deep Networks' (https://arxiv.org/abs/1611.05725)) 95,366,600 34,821.34M 19.10 4.52 2016
DarkNet Ref ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) 7,319,416 367.59M 38.58 17.18 2016
DarkNet Tiny ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) 1,042,104 500.85M 40.74 17.84 2016
DarkNet 53 ('Darknet: Open source neural networks in C' (https://github.com/pjreddie/darknet)) 41,609,928 7,133.86M 21.75 5.64 2016
SqueezeResNet1.1 ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' (https://arxiv.org/abs/1602.07360)) 1,235,496 352.02M 40.09 18.21 2016
SqueezeNet1.1 ('SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size' (https://arxiv.org/abs/1602.07360)) 1,235,496 352.02M 39.31 17.72 2016
ResAttNet-92 ('Residual Attention Network for Image Classification' (https://arxiv.org/abs/1704.06904)) 51.3M ? 19.5 4.8 2017
CondenseNet (G=C=8) ('CondenseNet: An Efficient DenseNet using Learned Group Convolutions' (https://arxiv.org/abs/1711.09224)) 4.8M ? 26.2 8.3 2017
DPN-68 ('Dual Path Networks' (https://arxiv.org/abs/1707.01629)) 12,611,602 2,351.84M 23.24 6.79 2017
ShuffleNet x1.0 (g=1) ('ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices' (https://arxiv.org/abs/1707.01083)) 1,531,936 148.13M 34.93 13.89 2017
DiracNetV2-18 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections' (https://arxiv.org/abs/1706.00388)) 11,511,784 1,796.62M 31.47 11.70 2017
DiracNetV2-34 ('DiracNets: Training Very Deep Neural Networks Without Skip-Connections' (https://arxiv.org/abs/1706.00388)) 21,616,232 3,646.93M 28.75 9.93 2017
SENet-16 ('Squeeze-and-Excitation Networks' (https://arxiv.org/abs/1709.01507)) 31,366,168 5,081.30M 25.65 8.20 2017
SENet-154 ('Squeeze-and-Excitation Networks' (https://arxiv.org/abs/1709.01507)) 115,088,984 20,745.78M 18.62 4.61 2017
MobileNet ('MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications' (https://arxiv.org/abs/1704.04861)) 4,231,976 579.80M 26.61 8.95 2017
NASNet-A 4@1056 ('Learning Transferable Architectures for Scalable Image Recognition' (https://arxiv.org/abs/1707.07012)) 5,289,978 584.90M 25.68 8.16 2017
NASNet-A 6@4032('Learning Transferable Architectures for Scalable Image Recognition' (https://arxiv.org/abs/1707.07012)) 88,753,150 23,976.44M 18.14 4.21 2017
DLA-34 ('Deep Layer Aggregation' (https://arxiv.org/abs/1707.06484)) 15,742,104 3,071.37M 25.36 7.94 2017
AirNet50-1x64d (r=2) ('Attention Inspiring Receptive-Fields Network for Learning Invariant Representations' (https://ieeexplore.ieee.org/document/8510896)) 27.43M ? 22.48 6.21 2018
BAM-ResNet-50 ('BAM: Bottleneck Attention Module' (https://arxiv.org/abs/1807.06514)) 25.92M ? 23.68 6.96 2018
CBAM-ResNet-50 ('CBAM: Convolutional Block Attention Module' (https://arxiv.org/abs/1807.06521)) 28.1M ? 23.02 6.38 2018
1.0-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) 921,816 285.82M 40.77 17.85 2018
1.5-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) 1,953,616 550.97M 33.81 13.01 2018
2.0-SqNxt-23v5 ('SqueezeNext: Hardware-Aware Neural Network Design' (https://arxiv.org/abs/1803.10615)) 3,366,344 897.60M 29.63 10.66 2018
ShuffleNetV2 ('ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design' (https://arxiv.org/abs/1807.11164)) 2,278,604 149.72M 31.44 11.63 2018
456-MENet-24×1(g=3) ('Merging and Evolution: Improving Convolutional Neural Networks for Mobile Applications' (https://arxiv.org/abs/1803.09127)) 5.3M ? 28.4 9.8 2018
FD-MobileNet ('FD-MobileNet: Improved MobileNet with A Fast Downsampling Strategy' (https://arxiv.org/abs/1802.03750)) 2,901,288 147.46M 34.23 13.38 2018
MobileNetV2 ('MobileNetV2: Inverted Residuals and Linear Bottlenecks' (https://arxiv.org/abs/1801.04381)) 3,504,960 329.36M 26.97 8.87 2018
IGCV3 ('IGCV3: Interleaved Low-Rank Group Convolutions for Efficient Deep Neural Networks' (https://arxiv.org/abs/1806.00178)) 3.5M ? 28.22 9.54 2018
DARTS ('DARTS: Differentiable Architecture Search' (https://arxiv.org/abs/1806.09055)) 4.9M ? 26.9 9.0 2018
PNASNet-5 ('Progressive Neural Architecture Search' (https://arxiv.org/abs/1712.00559)) 5.1M ? 25.8 8.1 2018
AmoebaNet-C ('Regularized Evolution for Image Classifier Architecture Search' (https://arxiv.org/abs/1802.01548)) 5.1M ? 24.3 7.6 2018
MnasNet ('MnasNet: Platform-Aware Neural Architecture Search for Mobile' (https://arxiv.org/abs/1807.11626)) 4,308,816 317.67M 31.58 11.74 2018
IBN-Net50-a ('Two at Once: Enhancing Learning and Generalization Capacities via IBN-Net' (https://arxiv.org/abs/1807.09441)) ? ? 22.54 6.32 2018
MarginNet ('Large Margin Deep Networks for Classification' (http://papers.nips.cc/paper/7364-large-margin-deep-networks-for-classification.pdf)) ? ? 22.0 ? 2018
A^2 Net ('A^2-Nets: Double Attention Networks' (http://papers.nips.cc/paper/7318-a2-nets-double-attention-networks.pdf)) ? ? 23.0 6.5 2018
FishNeXt-150 ('FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction' (http://papers.nips.cc/paper/7356-fishnet-a-versatile-backbone-for-image-region-and-pixel-level-prediction.pdf)) 26.2M ? 21.5 ? 2018
Shape-ResNet ('ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness' (https://arxiv.org/pdf/1811.12231v2.pdf)) 25.5M ? 23.28 6.72 2019
SimCNN(k=3 train) ('Greedy Layerwise Learning Can Scale to ImageNet' (https://arxiv.org/pdf/1812.11446.pdf)) ? ? 28.4 10.2 2019
SKNet-50 ('Selective Kernel Networks' (https://arxiv.org/pdf/1903.06586.pdf)) 27.5M ? 20.79 ? 2019
SRM-ResNet-50 ('SRM : A Style-based Recalibration Module for Convolutional Neural Networks' (https://arxiv.org/pdf/1903.10829.pdf)) 25.62M ? 22.87 6.49 2019
EfficientNet-B0 ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' (http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) 5,288,548 414.31M 24.77 7.52 2019
EfficientNet-B7b ('EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks' (http://proceedings.mlr.press/v97/tan19a/tan19a.pdf)) 66,347,960 39,010.98M 15.94 3.22 2019
ProxylessNAS ('ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware' (https://arxiv.org/pdf/1812.00332.pdf)) ? ? 24.9 7.5 2019
MixNet-L ('MixNet: Mixed Depthwise Convolutional Kernels' (https://arxiv.org/abs/1907.09595)) 7.3M ? 21.1 5.8 2019
ECA-Net50 ('ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' (https://arxiv.org/pdf/1910.03151v1.pdf)) 24.37M 3.86G 22.52 6.32 2019
ECA-Net101 ('ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks' (https://arxiv.org/pdf/1910.03151v1.pdf)) 42.49M 7.35G 21.35 5.66 2019
ACNet-Densenet121 ('ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks' (https://arxiv.org/abs/1908.03930)) ? ? 24.18 7.23 2019
LIP-ResNet-50 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) 23.9M 5.33G 21.81 6.04 2019
LIP-ResNet-101 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) 42.9M 9.06G 20.67 5.40 2019
LIP-DenseNet-BC-121 ('LIP: Local Importance-based Pooling' (https://arxiv.org/abs/1908.04156)) 8.7M 4.13G 23.36 6.84 2019
MuffNet_1.0 ('MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' (http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) 2.3M 146M 30.1 ? 2019
MuffNet_1.5 ('MuffNet: Multi-Layer Feature Federation for Mobile Deep Learning' (http://openaccess.thecvf.com/content_ICCVW_2019/papers/CEFRL/Chen_MuffNet_Multi-Layer_Feature_Federation_for_Mobile_Deep_Learning_ICCVW_2019_paper.pdf)) 3.4M 300M 26.9 ? 2019
ResNet-34-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) 21.8M 3,672.68M 25.80 ? 2019
ResNet-50-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) 25.5M 3,877.95M 22.96 ? 2019
MobileNetV2-Bin-5 ('Making Convolutional Networks Shift-Invariant Again' (https://arxiv.org/abs/1904.11486)) 3,504,960 329.36M 27.50 ? 2019
FixRes ResNeXt101 WSL ('Fixing the train-test resolution discrepancy' (https://arxiv.org/abs/1906.06423)) 829M ? 13.6 2.0 2019
Noisy Student(L2) ('Self-training with Noisy Student improves ImageNet classification' (https://arxiv.org/abs/1911.04252)) 480M ? 12.6 1.8 2019
TResNet-M ('TResNet: High Performance GPU-Dedicated Architecture' (https://arxiv.org/abs/2003.13630)) 29.4M 5.5G 19.3 ? 2020
DA-NAS-C ('DA-NAS: Data Adapted Pruning for Efficient Neural Architecture Search' (https://arxiv.org/abs/2003.12563v1)) ? 467M 23.8 ? 2020
ResNeSt-50 ('ResNeSt: Split-Attention Networks' (https://arxiv.org/abs/2004.08955)) 27.5M 5.39G 18.87 ? 2020
ResNeSt-101 ('ResNeSt: Split-Attention Networks' (https://arxiv.org/abs/2004.08955)) 48.3M 10.2G 17.73 ? 2020
ResNet-50-FReLU ('Funnel Activation for Visual Recognition' (https://arxiv.org/abs/2007.11824v2)) 25.5M 3.87G 22.40 ? 2020
ResNet-101-FReLU ('Funnel Activation for Visual Recognition' (https://arxiv.org/abs/2007.11824v2)) 44.5M 7.6G 22.10 ? 2020
ResNet-50-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) 25.6M ? 19.33 4.91 2020
ResNet-50-MEALv2 + CutMix ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) 25.6M ? 19.02 4.65 2020
MobileNet V3-Large-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) 5.48M ? 23.08 6.68 2020
EfficientNet-B0-MEALv2 ('MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks' (https://arxiv.org/abs/2009.08453v1)) 5.29M ? 21.71 6.05 2020
T2T-ViT-7 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) 4.2M 0.6G 28.8 ? 2021
T2T-ViT-14 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) 19.4M 4.8G 19.4 ? 2021
T2T-ViT-19 ('Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet' (https://arxiv.org/abs/2101.11986v1)) 39.0M 8.0G 18.8 ? 2021
NFNet-F0 ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) 71.5M 12.38G 16.4 3.2 2021
NFNet-F1 ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) 132.6M 35.54G 15.4 2.9 2021
NFNet-F6+SAM ('High-Performance Large-Scale Image Recognition Without Normalization' (https://arxiv.org/abs/2102.06171)) 438.4M 377.28G 13.5 2.1 2021
EfficientNetV2-S ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) 24M 8.8G 16.1 ? 2021
EfficientNetV2-M ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) 55M 24G 14.9 ? 2021
EfficientNetV2-L ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) 121M 53G 14.3 ? 2021
EfficientNetV2-S (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) 24M 8.8G 15.0 ? 2021
EfficientNetV2-M (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) 55M 24G 13.9 ? 2021
EfficientNetV2-L (21k) ('EfficientNetV2: Smaller Models and Faster Training' (https://arxiv.org/abs/2104.00298)) 121M 53G 13.2 ? 2021
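The error columns above are percentages, i.e. 100 minus the reported accuracy: ResNet-50's top-1 error of 22.28 corresponds to 77.72% top-1 accuracy. A sample counts as a top-k error when its true label is not among the k highest-scoring classes. The following is a minimal, illustrative plain-Python sketch of that rule; the helper name `top_k_error` is hypothetical and it assumes one list of class scores per sample. Published numbers also depend on evaluation details (input resolution, cropping, single vs. multi-crop) that it does not model.

```python
def top_k_error(scores, labels, k=1):
    """Return the top-k error rate in percent (hypothetical helper).

    scores: one list of class scores per sample (higher means more confident).
    labels: one integer ground-truth class index per sample.
    """
    if not labels or len(scores) != len(labels):
        raise ValueError("need one score list per label and at least one sample")
    misses = 0
    for sample_scores, label in zip(scores, labels):
        # Indices of the k highest-scoring classes for this sample.
        top_k = sorted(range(len(sample_scores)),
                       key=lambda c: sample_scores[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return 100.0 * misses / len(labels)


if __name__ == "__main__":
    scores = [
        [0.1, 0.7, 0.2],  # top-1: class 1
        [0.5, 0.3, 0.2],  # top-1: class 0, top-2 adds class 1
        [0.2, 0.3, 0.5],  # top-1: class 2
        [0.6, 0.3, 0.1],  # top-1: class 0, top-2 adds class 1
    ]
    labels = [1, 2, 2, 1]
    print(top_k_error(scores, labels, k=1))  # 50.0
    print(top_k_error(scores, labels, k=2))  # 25.0
```

The corresponding accuracy is `100 - top_k_error(...)`; widening k from 1 to 2 recovers the fourth sample, whose label is the second-ranked class.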
 
 
Segmentation models
 
Model  Year  PASCAL-Context (mIoU)  Cityscapes (mIoU)  PASCAL VOC 2012 (mIoU)  COCO-Stuff (mIoU)  ADE20K val (mIoU)
├────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────┼──────────────┼─────────────────┼──────────────────────┼──────────┼─────────────────┤
U-Net ('U-Net: Convolutional Networks for Biomedical Image Segmentation' (https://arxiv.org/pdf/1505.04597.pdf)) 2015 ? ? ? ? ?
DeconvNet ('Learning Deconvolution Network for Semantic Segmentation' (https://arxiv.org/pdf/1505.04366.pdf)) 2015 ? ? 72.5 ? ?
ParseNet ('ParseNet: Looking Wider to See Better' (https://arxiv.org/abs/1506.04579)) 2015 40.4 ? 69.8 ? ?
Piecewise ('Efficient piecewise training of deep structured models for semantic segmentation' (https://arxiv.org/abs/1504.01013)) 2015 43.3 71.6 78.0 ? ?
SegNet ('SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation' (https://arxiv.org/pdf/1511.00561.pdf)) 2016 ? 56.1 ? ? ?
FCN ('Fully Convolutional Networks for Semantic Segmentation' (https://arxiv.org/pdf/1605.06211.pdf)) 2016 37.8 65.3 62.2 22.7 29.39
ENet ('ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation' (https://arxiv.org/pdf/1606.02147.pdf)) 2016 ? 58.3 ? ? ?
DilatedNet ('Multi-Scale Context Aggregation by Dilated Convolutions' (https://arxiv.org/pdf/1511.07122.pdf)) 2016 ? ? 67.6 ? 32.31
PixelNet ('PixelNet: Towards a General Pixel-Level Architecture' (https://arxiv.org/pdf/1609.06694.pdf)) 2016 ? ? 69.8 ? ?
RefineNet ('RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation' (https://arxiv.org/pdf/1611.06612.pdf)) 2016 47.3 73.6 83.4 33.6 40.70
LRR ('Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation' (https://arxiv.org/pdf/1605.02264.pdf)) 2016 ? 71.8 79.3 ? ?
FRRN ('Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes' (https://arxiv.org/pdf/1611.08323.pdf)) 2016 ? 71.8 ? ? ?
MultiNet ('MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving' (https://arxiv.org/pdf/1612.07695.pdf)) 2016 ? ? ? ? ?
DeepLab ('DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs' (https://arxiv.org/pdf/1606.00915.pdf)) 2017 45.7 64.8 79.7 ? ?
LinkNet ('LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation' (https://arxiv.org/pdf/1707.03718.pdf)) 2017 ? ? ? ? ?
Tiramisu ('The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation' (https://arxiv.org/pdf/1611.09326.pdf)) 2017 ? ? ? ? ?
ICNet ('ICNet for Real-Time Semantic Segmentation on High-Resolution Images' (https://arxiv.org/pdf/1704.08545.pdf)) 2017 ? 70.6 ? ? ?
ERFNet ('Efficient ConvNet for Real-time Semantic Segmentation' (http://www.robesafe.uah.es/personal/eduardo.romera/pdfs/Romera17iv.pdf)) 2017 ? 68.0 ? ? ?
PSPNet ('Pyramid Scene Parsing Network' (https://arxiv.org/pdf/1612.01105.pdf)) 2017 47.8 80.2 85.4 ? 44.94
GCN ('Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network' (https://arxiv.org/pdf/1703.02719.pdf)) 2017 ? 76.9 82.2 ? ?
Segaware ('Segmentation-Aware Convolutional Networks Using Local Attention Masks' (https://arxiv.org/pdf/1708.04607.pdf)) 2017 ? ? 69.0 ? ?
PixelDCN ('Pixel Deconvolutional Networks' (https://arxiv.org/pdf/1705.06820.pdf)) 2017 ? ? 73.0 ? ?
DeepLabv3 ('Rethinking Atrous Convolution for Semantic Image Segmentation' (https://arxiv.org/pdf/1706.05587.pdf)) 2017 ? ? 85.7 ? ?
DUC, HDC ('Understanding Convolution for Semantic Segmentation' (https://arxiv.org/pdf/1702.08502.pdf)) 2018 ? 77.1 ? ? ?
ShuffleSeg ('ShuffleSeg: Real-time Semantic Segmentation Network' (https://arxiv.org/pdf/1803.03816.pdf)) 2018 ? 59.3 ? ? ?
AdaptSegNet ('Learning to Adapt Structured Output Space for Semantic Segmentation' (https://arxiv.org/pdf/1802.10349.pdf)) 2018 ? 46.7 ? ? ?
TuSimple-DUC ('Understanding Convolution for Semantic Segmentation' (https://arxiv.org/pdf/1702.08502.pdf)) 2018 ? 80.1 83.1 ? ?
R2U-Net ('Recurrent Residual Convolutional Neural Network based on U-Net (R2U-Net) for Medical Image Segmentation' (https://arxiv.org/pdf/1802.06955.pdf)) 2018 ? ? ? ? ?
Attention U-Net ('Attention U-Net: Learning Where to Look for the Pancreas' (https://arxiv.org/pdf/1804.03999.pdf)) 2018 ? ? ? ? ?
DANet ('Dual Attention Network for Scene Segmentation' (https://arxiv.org/pdf/1809.02983.pdf)) 2018 52.6 81.5 ? 39.7 ?
ENCNet ('Context Encoding for Semantic Segmentation' (https://arxiv.org/abs/1803.08904)) 2018 51.7 75.8 85.9 ? 44.65
ShelfNet ('ShelfNet for Real-time Semantic Segmentation' (https://arxiv.org/pdf/1811.11254.pdf)) 2018 48.4 75.8 84.2 ? ?
LadderNet ('LadderNet: Multi-Path Networks Based on U-Net for Medical Image Segmentation' (https://arxiv.org/pdf/1810.07810.pdf)) 2018 ? ? ? ? ?
CCC-ERFnet ('Concentrated-Comprehensive Convolutions for lightweight semantic segmentation' (https://arxiv.org/pdf/1812.04920v1.pdf)) 2018 ? 69.01 ? ? ?
DifNet-101 ('DifNet: Semantic Segmentation by Diffusion Networks' (http://papers.nips.cc/paper/7435-difnet-semantic-segmentation-by-diffusion-networks.pdf)) 2018 45.1 ? 73.2 ? ?
BiSeNet(Res18) ('BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation' (https://arxiv.org/pdf/1808.00897.pdf)) 2018 ? ? 74.7 28.1 ?
ESPNet ('ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation' (https://arxiv.org/pdf/1803.06815.pdf)) 2018 ? ? 63.01 ? ?
SPADE ('Semantic Image Synthesis with Spatially-Adaptive Normalization' (https://arxiv.org/pdf/1903.07291.pdf)) 2019 ? 62.3 ? 37.4 38.5
SeamlessSeg ('Seamless Scene Segmentation' (https://arxiv.org/pdf/1905.01220v1.pdf)) 2019 ? 77.5 ? ? ?
EMANet ('Expectation-Maximization Attention Networks for Semantic Segmentation' (https://arxiv.org/pdf/1907.13426.pdf)) 2019 ? ? 88.2 39.9 ?
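The segmentation columns report mean Intersection-over-Union (mIoU) in percent: for each class, IoU = TP / (TP + FP + FN) accumulated over all pixels of the evaluation set, then averaged over classes. The sketch below is a minimal plain-Python illustration built on a confusion matrix. The helper name `mean_iou` is hypothetical, it assumes flat per-pixel integer labels, and the ignore label 255 is a common convention assumed here rather than something this list specifies; official benchmark scripts differ in details such as how classes absent from both prediction and ground truth are treated.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Return mean IoU in percent (hypothetical helper).

    pred, target: equal-length flat sequences of per-pixel class labels.
    Pixels whose target equals ignore_index are skipped (assumed convention).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        conf[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp  # class-c pixels predicted as another class
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # others predicted as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and target
            ious.append(tp / denom)
    if not ious:
        raise ValueError("no evaluable pixels")
    return 100.0 * sum(ious) / len(ious)


if __name__ == "__main__":
    target = [0, 0, 1, 1, 2, 2]
    pred = [0, 1, 1, 1, 2, 0]
    print(round(mean_iou(pred, target, num_classes=3), 2))  # 50.0
```

In the example, class 0 scores 1/3, class 1 scores 2/3 and class 2 scores 1/2, so the mean is 50.0. Accumulating the confusion matrix over the whole dataset before dividing (rather than averaging per-image IoUs) matches how such benchmark numbers are usually computed.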
 
Detection models
 
Model  Year  VOC07 (mAP@IoU=0.5)  VOC12 (mAP@IoU=0.5)  COCO (mAP)
├──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼────┼───────────────────┼───────────────────┼──────────┤
R-CNN ('Rich feature hierarchies for accurate object detection and semantic segmentation' (https://arxiv.org/pdf/1311.2524.pdf)) 2014 58.5 ? ?
OverFeat ('OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks' (https://arxiv.org/pdf/1312.6229.pdf)) 2014 ? ? ?
MultiBox ('Scalable Object Detection using Deep Neural Networks' (https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Erhan_Scalable_Object_Detection_2014_CVPR_paper.pdf)) 2014 29.0 ? ?
SPP-Net ('Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition' (https://arxiv.org/pdf/1406.4729.pdf)) 2014 59.2 ? ?
MR-CNN ('Object detection via a multi-region & semantic segmentation-aware CNN model' (https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Gidaris_Object_Detection_via_ICCV_2015_paper.pdf)) 2015 78.2 73.9 ?
AttentionNet ('AttentionNet: Aggregating Weak Directions for Accurate Object Detection' (https://arxiv.org/pdf/1506.07704.pdf)) 2015 ? ? ?
Fast R-CNN ('Fast R-CNN' (https://arxiv.org/pdf/1504.08083.pdf)) 2015 70.0 68.4 ?
Faster R-CNN ('Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks' (https://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf)) 2015 73.2 70.4 36.8
YOLO v1 ('You Only Look Once: Unified, Real-Time Object Detection' (https://arxiv.org/pdf/1506.02640.pdf)) 2016 66.4 57.9 ?
G-CNN ('G-CNN: an Iterative Grid Based Object Detector' (https://arxiv.org/pdf/1512.07729.pdf)) 2016 66.8 66.4 ?
AZNet ('Adaptive Object Detection Using Adjacency and Zoom Prediction' (https://arxiv.org/pdf/1512.07711.pdf)) 2016 70.4 ? 22.3
ION ('Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks' (https://arxiv.org/pdf/1512.04143.pdf)) 2016 80.1 77.9 33.1
HyperNet ('HyperNet: Towards Accurate Region Proposal Generation and Joint Object Detection' (https://arxiv.org/pdf/1604.00600.pdf)) 2016 76.3 71.4 ?
OHEM ('Training Region-based Object Detectors with Online Hard Example Mining' (https://arxiv.org/pdf/1604.03540.pdf)) 2016 78.9 76.3 22.4
MPN ('A MultiPath Network for Object Detection' (https://arxiv.org/pdf/1604.02135.pdf)) 2016 ? ? 33.2
SSD ('SSD: Single Shot MultiBox Detector' (https://arxiv.org/pdf/1512.02325.pdf)) 2016 76.8 74.9 31.2
GBDNet ('Crafting GBD-Net for Object Detection' (https://arxiv.org/pdf/1610.02579.pdf)) 2016 77.2 ? 27.0
CPF ('Contextual Priming and Feedback for Faster R-CNN' (https://pdfs.semanticscholar.org/40e7/4473cb82231559cbaeaa44989e9bbfe7ec3f.pdf)) 2016 76.4 72.6 ?
MS-CNN ('A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection' (https://arxiv.org/pdf/1607.07155.pdf)) 2016 ? ? ?
R-FCN ('R-FCN: Object Detection via Region-based Fully Convolutional Networks' (https://arxiv.org/pdf/1605.06409.pdf)) 2016 79.5 77.6 29.9
PVANET ('PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection' (https://arxiv.org/pdf/1608.08021.pdf)) 2016 ? ? ?
DeepID-Net ('DeepID-Net: Deformable Deep Convolutional Neural Networks for Object Detection' (https://arxiv.org/pdf/1412.5661.pdf)) 2016 69.0 ? ?
NoC ('Object Detection Networks on Convolutional Feature Maps' (https://arxiv.org/pdf/1504.06066.pdf)) 2016 71.6 68.8 27.2
DSSD ('DSSD : Deconvolutional Single Shot Detector' (https://arxiv.org/pdf/1701.06659.pdf)) 2017 81.5 80.0 ?
TDM ('Beyond Skip Connections: Top-Down Modulation for Object Detection' (https://arxiv.org/pdf/1612.06851.pdf)) 2017 ? ? 37.3
FPN ('Feature Pyramid Networks for Object Detection' (http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_Feature_Pyramid_Networks_CVPR_2017_paper.pdf)) 2017 ? ? 36.2
YOLO v2 ('YOLO9000: Better, Faster, Stronger' (https://arxiv.org/pdf/1612.08242.pdf)) 2017 78.6 73.4 21.6
RON ('RON: Reverse Connection with Objectness Prior Networks for Object Detection' (https://arxiv.org/pdf/1707.01691.pdf)) 2017 77.6 75.4 ?
DCN ('Deformable Convolutional Networks' (http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Deformable_Convolutional_Networks_ICCV_2017_paper.pdf)) 2017 ? ? ?
DeNet ('DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling' (https://arxiv.org/pdf/1703.10295.pdf)) 2017 77.1 73.9 33.8
CoupleNet ('CoupleNet: Coupling Global Structure with Local Parts for Object Detection' (https://arxiv.org/pdf/1708.02863.pdf)) 2017 82.7 80.4 34.4
RetinaNet ('Focal Loss for Dense Object Detection' (https://arxiv.org/pdf/1708.02002.pdf)) 2017 ? ? 39.1
Mask R-CNN ('Mask R-CNN' (http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf)) 2017 ? ? 39.8
DSOD ('DSOD: Learning Deeply Supervised Object Detectors from Scratch' (https://arxiv.org/pdf/1708.01241.pdf)) 2017 77.7 76.3 ?
SMN ('Spatial Memory for Context Reasoning in Object Detection' (http://openaccess.thecvf.com/content_ICCV_2017/papers/Chen_Spatial_Memory_for_ICCV_2017_paper.pdf)) 2017 70.0 ? ?
YOLO v3 ('YOLOv3: An Incremental Improvement' (https://pjreddie.com/media/files/papers/YOLOv3.pdf)) 2018 ? ? 33.0
SIN ('Structure Inference Net: Object Detection Using Scene-Level Context and Instance-Level Relationships' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Liu_Structure_Inference_Net_CVPR_2018_paper.pdf)) 2018 76.0 73.1 23.2
STDN ('Scale-Transferrable Object Detection' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Scale-Transferrable_Object_Detection_CVPR_2018_paper.pdf)) 2018 80.9 ? ?
RefineDet ('Single-Shot Refinement Neural Network for Object Detection' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Single-Shot_Refinement_Neural_CVPR_2018_paper.pdf)) 2018 83.8 83.5 41.8
MegDet ('MegDet: A Large Mini-Batch Object Detector' (http://openaccess.thecvf.com/content_cvpr_2018/papers/Peng_MegDet_A_Large_CVPR_2018_paper.pdf)) 2018 ? ? ?
RFBNet ('Receptive Field Block Net for Accurate and Fast Object Detection' (https://arxiv.org/pdf/1711.07767.pdf)) 2018 82.2 ? ?
CornerNet ('CornerNet: Detecting Objects as Paired Keypoints' (https://arxiv.org/pdf/1808.01244.pdf)) 2018 ? ? 42.1
LibraRetinaNet ('Libra R-CNN: Towards Balanced Learning for Object Detection' (https://arxiv.org/pdf/1904.02701v1.pdf)) 2019 ? ? 43.0
YOLACT-700 ('YOLACT: Real-time Instance Segmentation' (https://arxiv.org/pdf/1904.02689v1.pdf)) 2019 ? ? 31.2
DetNASNet(3.8) ('DetNAS: Backbone Search for Object Detection' (https://arxiv.org/pdf/1903.10979v2.pdf)) 2019 ? ? 42.0
YOLOv4 ('YOLOv4: Optimal Speed and Accuracy of Object Detection' (https://arxiv.org/pdf/2004.10934.pdf)) 2020 ? ? 46.7
SOLO ('SOLO: Segmenting Objects by Locations' (https://arxiv.org/pdf/1912.04488v3.pdf)) 2020 ? ? 37.8
D-SOLO ('SOLO: Segmenting Objects by Locations' (https://arxiv.org/pdf/1912.04488v3.pdf)) 2020 ? ? 40.5
SNIPER ('Scale Normalized Image Pyramids with AutoFocus for Object Detection' (https://arxiv.org/pdf/2102.05646v1.pdf)) 2021 86.6 ? 47.9
AutoFocus ('Scale Normalized Image Pyramids with AutoFocus for Object Detection' (https://arxiv.org/pdf/2102.05646v1.pdf)) 2021 85.8 ? 47.9
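The VOC columns report mean Average Precision where a detection counts as correct if its IoU with a not-yet-matched ground-truth box of the same class is at least 0.5; mAP is the mean of the per-class APs. The sketch below illustrates per-class AP under that rule in plain Python. It is not the official devkit: the helper names `box_iou` and `average_precision` are hypothetical, it uses all-point interpolation (the VOC 2007 devkit used 11-point interpolation), it ignores the "difficult" flag, and it computes box areas without the devkit's +1 pixel convention. COCO mAP additionally averages AP over IoU thresholds from 0.50 to 0.95 in steps of 0.05.

```python
def box_iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2), each assumed to have positive area."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def average_precision(detections, gt_boxes, iou_thr=0.5):
    """All-point interpolated AP for one class (hypothetical helper).

    detections: list of (image_id, score, box).
    gt_boxes: dict mapping image_id -> list of ground-truth boxes.
    """
    num_gt = sum(len(v) for v in gt_boxes.values())
    if num_gt == 0:
        raise ValueError("no ground-truth boxes")
    matched = {img: [False] * len(v) for img, v in gt_boxes.items()}
    tps = []
    # Greedy matching: highest-scoring detections claim ground truth first.
    for img, _score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes.get(img, [])):
            iou = box_iou(box, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tps.append(1)
        else:
            tps.append(0)  # no overlap, too little overlap, or a duplicate

    precisions, recalls, tp_cum = [], [], 0
    for i, tp in enumerate(tps, start=1):
        tp_cum += tp
        precisions.append(tp_cum / i)
        recalls.append(tp_cum / num_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the envelope: precision times each recall increment.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    gt = {"img1": [(0, 0, 10, 10)], "img2": [(0, 0, 10, 10)]}
    dets = [("img1", 0.9, (0, 0, 10, 10)),   # true positive, IoU 1.0
            ("img2", 0.8, (20, 20, 30, 30)),  # false positive, no overlap
            ("img2", 0.7, (1, 1, 10, 10))]    # true positive, IoU 0.81
    print(round(average_precision(dets, gt), 4))  # 0.8333
```

Here the ranked detections give precisions 1, 1/2, 2/3 at recalls 0.5, 0.5, 1.0, so AP = 0.5 × 1 + 0.5 × 2/3 = 5/6; the tables express such values as percentages.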