<h1 id="awesome-deep-vision-awesome">Awesome Deep Vision <a href="https://github.com/sindresorhus/awesome"><img src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg" alt="Awesome" /></a></h1>
<p>A curated list of deep learning resources for computer vision, inspired by <a href="https://github.com/ziadoz/awesome-php">awesome-php</a> and <a href="https://github.com/jbhuang0604/awesome-computer-vision">awesome-computer-vision</a>.</p>
<p>Maintainers - <a href="https://github.com/kjw0612">Jiwon Kim</a>, <a href="https://github.com/hmyeong">Heesoo Myeong</a>, <a href="https://github.com/myungsub">Myungsub Choi</a>, <a href="https://github.com/deruci">Jung Kwon Lee</a>, <a href="https://github.com/jazzsaxmafia">Taeksoo Kim</a></p>
<p>The project is not actively maintained.</p>
<h2 id="contributing">Contributing</h2>
|
||
<p>Please feel free to <a
|
||
href="https://github.com/kjw0612/awesome-deep-vision/pulls">pull
|
||
requests</a> to add papers.</p>
|
||
<p><a
|
||
href="https://gitter.im/kjw0612/awesome-deep-vision?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge"><img
|
||
src="https://badges.gitter.im/Join%20Chat.svg"
|
||
alt="Join the chat at https://gitter.im/kjw0612/awesome-deep-vision" /></a></p>
|
||
<h2 id="sharing">Sharing</h2>
|
||
<ul>
|
||
<li><a
|
||
href="http://twitter.com/home?status=http://jiwonkim.org/awesome-deep-vision%0ADeep%20Learning%20Resources%20for%20Computer%20Vision">Share
|
||
on Twitter</a></li>
|
||
<li><a
|
||
href="http://www.facebook.com/sharer/sharer.php?u=https://jiwonkim.org/awesome-deep-vision">Share
|
||
on Facebook</a></li>
|
||
<li><a
|
||
href="http://plus.google.com/share?url=https://jiwonkim.org/awesome-deep-vision">Share
|
||
on Google Plus</a></li>
|
||
<li><a
|
||
href="http://www.linkedin.com/shareArticle?mini=true&url=https://jiwonkim.org/awesome-deep-vision&title=Awesome%20Deep%20Vision&summary=&source=">Share
|
||
on LinkedIn</a></li>
|
||
</ul>
|
||
<h2 id="table-of-contents">Table of Contents</h2>
|
||
<ul>
|
||
<li><a href="#papers">Papers</a>
|
||
<ul>
|
||
<li><a href="#imagenet-classification">ImageNet Classification</a></li>
|
||
<li><a href="#object-detection">Object Detection</a></li>
|
||
<li><a href="#object-tracking">Object Tracking</a></li>
|
||
<li><a href="#low-level-vision">Low-Level Vision</a>
|
||
<ul>
|
||
<li><a href="#super-resolution">Super-Resolution</a></li>
|
||
<li><a href="#other-applications">Other Applications</a></li>
|
||
</ul></li>
|
||
<li><a href="#edge-detection">Edge Detection</a></li>
|
||
<li><a href="#semantic-segmentation">Semantic Segmentation</a></li>
|
||
<li><a href="#visual-attention-and-saliency">Visual Attention and
|
||
Saliency</a></li>
|
||
<li><a href="#object-recognition">Object Recognition</a></li>
|
||
<li><a href="#human-pose-estimation">Human Pose Estimation</a></li>
|
||
<li><a href="#understanding-cnn">Understanding CNN</a></li>
|
||
<li><a href="#image-and-language">Image and Language</a>
|
||
<ul>
|
||
<li><a href="#image-captioning">Image Captioning</a></li>
|
||
<li><a href="#video-captioning">Video Captioning</a></li>
|
||
<li><a href="#question-answering">Question Answering</a></li>
|
||
</ul></li>
|
||
<li><a href="#image-generation">Image Generation</a></li>
|
||
<li><a href="#other-topics">Other Topics</a></li>
|
||
</ul></li>
|
||
<li><a href="#courses">Courses</a></li>
|
||
<li><a href="#books">Books</a></li>
|
||
<li><a href="#videos">Videos</a></li>
|
||
<li><a href="#software">Software</a>
|
||
<ul>
|
||
<li><a href="#framework">Framework</a></li>
|
||
<li><a href="#applications">Applications</a></li>
|
||
</ul></li>
|
||
<li><a href="#tutorials">Tutorials</a></li>
|
||
<li><a href="#blogs">Blogs</a></li>
|
||
</ul>
|
||
<h2 id="papers">Papers</h2>
|
||
<h3 id="imagenet-classification">ImageNet Classification</h3>
|
||
<p><img src="https://cloud.githubusercontent.com/assets/5226447/8451949/327b9566-2022-11e5-8b34-53b4a64c13ad.PNG" alt="classification" /> (from Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012.)</p>
<ul>
<li>Microsoft (Deep Residual Learning) [<a href="http://arxiv.org/pdf/1512.03385v1.pdf">Paper</a>][<a href="http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf">Slide</a>]
<ul>
<li>Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, arXiv:1512.03385.</li>
</ul></li>
<li>Microsoft (PReLU/Weight Initialization) <a href="http://arxiv.org/pdf/1502.01852">[Paper]</a>
<ul>
<li>Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, arXiv:1502.01852.</li>
</ul></li>
<li>Batch Normalization <a href="http://arxiv.org/pdf/1502.03167">[Paper]</a> (a minimal sketch of the forward pass follows this list)
<ul>
<li>Sergey Ioffe, Christian Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, arXiv:1502.03167.</li>
</ul></li>
<li>GoogLeNet <a href="http://arxiv.org/pdf/1409.4842">[Paper]</a>
<ul>
<li>Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going Deeper with Convolutions, CVPR, 2015.</li>
</ul></li>
<li>VGG-Net <a href="http://www.robots.ox.ac.uk/~vgg/research/very_deep/">[Web]</a> <a href="http://arxiv.org/pdf/1409.1556">[Paper]</a>
<ul>
<li>Karen Simonyan and Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR, 2015.</li>
</ul></li>
<li>AlexNet <a href="http://papers.nips.cc/book/advances-in-neural-information-processing-systems-25-2012">[Paper]</a>
<ul>
<li>Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012.</li>
</ul></li>
</ul>
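<p>Two recurring building blocks in the list above are batch normalization (Ioffe &amp; Szegedy, arXiv:1502.03167) and the residual shortcut (He et al., arXiv:1512.03385). Below is a minimal NumPy sketch of both forward passes, written for illustration only: it is not code from either paper, and the dense layers stand in for the papers' convolutions.</p>
<pre><code class="language-python">import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each feature over the batch dimension, then scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

def residual_block(x, w1, w2):
    # y = F(x) + x: the identity shortcut of deep residual learning,
    # here with two dense layers and ReLUs for brevity.
    h = np.maximum(x @ w1, 0.0)           # ReLU
    return np.maximum(h @ w2 + x, 0.0)    # add the shortcut, then ReLU

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))              # batch of 8 examples, 16 features
gamma, beta = np.ones(16), np.zeros(16)
w1 = rng.normal(size=(16, 16)) * 0.1
w2 = rng.normal(size=(16, 16)) * 0.1
out = residual_block(batch_norm(x, gamma, beta), w1, w2)
print(out.shape)  # (8, 16)
</code></pre>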
<h3 id="object-detection">Object Detection</h3>
|
||
<p><img
|
||
src="https://cloud.githubusercontent.com/assets/5226447/8452063/f76ba500-2022-11e5-8db1-2cd5d490e3b3.PNG"
|
||
alt="object_detection" /> (from Shaoqing Ren, Kaiming He, Ross Girshick,
|
||
Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region
|
||
Proposal Networks, arXiv:1506.01497.)</p>
|
||
<ul>
|
||
<li>PVANET <a href="https://arxiv.org/pdf/1608.08021">[Paper]</a> <a
|
||
href="https://github.com/sanghoon/pva-faster-rcnn">[Code]</a>
|
||
<ul>
|
||
<li>Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje
|
||
Park, PVANET: Deep but Lightweight Neural Networks for Real-time Object
|
||
Detection, arXiv:1608.08021</li>
|
||
</ul></li>
|
||
<li>OverFeat, NYU <a
|
||
href="http://arxiv.org/pdf/1312.6229.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>OverFeat: Integrated Recognition, Localization and Detection using
|
||
Convolutional Networks, ICLR, 2014.</li>
|
||
</ul></li>
|
||
<li>R-CNN, UC Berkeley <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf">[Paper-CVPR14]</a>
|
||
<a href="http://arxiv.org/pdf/1311.2524">[Paper-arXiv14]</a>
|
||
<ul>
|
||
<li>Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich
|
||
feature hierarchies for accurate object detection and semantic
|
||
segmentation, CVPR, 2014.</li>
|
||
</ul></li>
|
||
<li>SPP, Microsoft Research <a
|
||
href="http://arxiv.org/pdf/1406.4729">[Paper]</a>
|
||
<ul>
|
||
<li>Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Spatial Pyramid
|
||
Pooling in Deep Convolutional Networks for Visual Recognition, ECCV,
|
||
2014.</li>
|
||
</ul></li>
|
||
<li>Fast R-CNN, Microsoft Research <a
|
||
href="http://arxiv.org/pdf/1504.08083">[Paper]</a>
|
||
<ul>
|
||
<li>Ross Girshick, Fast R-CNN, arXiv:1504.08083.</li>
|
||
</ul></li>
|
||
<li>Faster R-CNN, Microsoft Research <a
|
||
href="http://arxiv.org/pdf/1506.01497">[Paper]</a>
|
||
<ul>
|
||
<li>Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN:
|
||
Towards Real-Time Object Detection with Region Proposal Networks,
|
||
arXiv:1506.01497.</li>
|
||
</ul></li>
|
||
<li>R-CNN minus R, Oxford <a
|
||
href="http://arxiv.org/pdf/1506.06981">[Paper]</a>
|
||
<ul>
|
||
<li>Karel Lenc, Andrea Vedaldi, R-CNN minus R, arXiv:1506.06981.</li>
|
||
</ul></li>
|
||
<li>End-to-end people detection in crowded scenes <a
|
||
href="http://arxiv.org/abs/1506.04878">[Paper]</a>
|
||
<ul>
|
||
<li>Russell Stewart, Mykhaylo Andriluka, End-to-end people detection in
|
||
crowded scenes, arXiv:1506.04878.</li>
|
||
</ul></li>
|
||
<li>You Only Look Once: Unified, Real-Time Object Detection <a
|
||
href="http://arxiv.org/abs/1506.02640">[Paper]</a>, <a
|
||
href="https://arxiv.org/abs/1612.08242">[Paper Version 2]</a>, <a
|
||
href="https://github.com/pjreddie/darknet">[C Code]</a>, <a
|
||
href="https://github.com/thtrieu/darkflow">[Tensorflow Code]</a>
|
||
<ul>
|
||
<li>Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, You Only
|
||
Look Once: Unified, Real-Time Object Detection, arXiv:1506.02640</li>
|
||
<li>Joseph Redmon, Ali Farhadi (Version 2)</li>
|
||
</ul></li>
|
||
<li>Inside-Outside Net <a
|
||
href="http://arxiv.org/abs/1512.04143">[Paper]</a>
|
||
<ul>
|
||
<li>Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick,
|
||
Inside-Outside Net: Detecting Objects in Context with Skip Pooling and
|
||
Recurrent Neural Networks</li>
|
||
</ul></li>
|
||
<li>Deep Residual Network (Current State-of-the-Art) <a
|
||
href="http://arxiv.org/abs/1512.03385">[Paper]</a>
|
||
<ul>
|
||
<li>Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual
|
||
Learning for Image Recognition</li>
|
||
</ul></li>
|
||
<li>Weakly Supervised Object Localization with Multi-fold Multiple
|
||
Instance Learning [<a
|
||
href="http://arxiv.org/pdf/1503.00949.pdf">Paper</a>]</li>
|
||
<li>R-FCN <a href="https://arxiv.org/abs/1605.06409">[Paper]</a> <a
|
||
href="https://github.com/daijifeng001/R-FCN">[Code]</a>
|
||
<ul>
|
||
<li>Jifeng Dai, Yi Li, Kaiming He, Jian Sun, R-FCN: Object Detection via
|
||
Region-based Fully Convolutional Networks</li>
|
||
</ul></li>
|
||
<li>SSD <a href="https://arxiv.org/pdf/1512.02325v2.pdf">[Paper]</a> <a
|
||
href="https://github.com/weiliu89/caffe/tree/ssd">[Code]</a>
|
||
<ul>
|
||
<li>Wei Liu1, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott
|
||
Reed, Cheng-Yang Fu, Alexander C. Berg, SSD: Single Shot MultiBox
|
||
Detector, arXiv:1512.02325</li>
|
||
</ul></li>
|
||
<li>Speed/accuracy trade-offs for modern convolutional object detectors
|
||
<a href="https://arxiv.org/pdf/1611.10012v1.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop
|
||
Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song,
|
||
Sergio Guadarrama, Kevin Murphy, Google Research, arXiv:1611.10012</li>
|
||
</ul></li>
|
||
</ul>
|
||
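<p>A step shared by most detectors above, from R-CNN through SSD and YOLO, is scoring many overlapping box proposals and then suppressing near-duplicates. Below is a minimal NumPy sketch of intersection-over-union and greedy non-maximum suppression; it illustrates the common post-processing step and is not code taken from any of the linked repositories.</p>
<pre><code class="language-python">import numpy as np

def iou(box, boxes):
    # Intersection-over-union of one box against an array of boxes,
    # all in (x1, y1, x2, y2) corner format.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    # Greedy non-maximum suppression: keep the top-scoring box, drop
    # boxes overlapping it by more than thresh, and repeat on the rest.
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size:
        best, rest = order[0], order[1:]
        keep.append(best)
        order = rest[np.less_equal(iou(boxes[best], boxes[rest]), thresh)]
    return keep
</code></pre>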
<h3 id="video-classification">Video Classification</h3>
|
||
<ul>
|
||
<li>Nicolas Ballas, Li Yao, Pal Chris, Aaron Courville, “Delving Deeper
|
||
into Convolutional Networks for Learning Video Representations”, ICLR
|
||
2016. [<a href="http://arxiv.org/pdf/1511.06432v4.pdf">Paper</a>]</li>
|
||
<li>Michael Mathieu, camille couprie, Yann Lecun, “Deep Multi Scale
|
||
Video Prediction Beyond Mean Square Error”, ICLR 2016. [<a
|
||
href="http://arxiv.org/pdf/1511.05440v6.pdf">Paper</a>]</li>
|
||
</ul>
|
||
<h3 id="object-tracking">Object Tracking</h3>
|
||
<ul>
|
||
<li>Seunghoon Hong, Tackgeun You, Suha Kwak, Bohyung Han, Online
|
||
Tracking by Learning Discriminative Saliency Map with Convolutional
|
||
Neural Network, arXiv:1502.06796. <a
|
||
href="http://arxiv.org/pdf/1502.06796">[Paper]</a></li>
|
||
<li>Hanxi Li, Yi Li and Fatih Porikli, DeepTrack: Learning
|
||
Discriminative Feature Representations by Convolutional Neural Networks
|
||
for Visual Tracking, BMVC, 2014. <a
|
||
href="http://www.bmva.org/bmvc/2014/files/paper028.pdf">[Paper]</a></li>
|
||
<li>N Wang, DY Yeung, Learning a Deep Compact Image Representation for
|
||
Visual Tracking, NIPS, 2013. <a
|
||
href="http://winsty.net/papers/dlt.pdf">[Paper]</a></li>
|
||
<li>Chao Ma, Jia-Bin Huang, Xiaokang Yang and Ming-Hsuan Yang,
|
||
Hierarchical Convolutional Features for Visual Tracking, ICCV 2015 [<a
|
||
href="http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Ma_Hierarchical_Convolutional_Features_ICCV_2015_paper.pdf">Paper</a>]
|
||
[<a href="https://github.com/jbhuang0604/CF2">Code</a>]</li>
|
||
<li>Lijun Wang, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu, Visual
|
||
Tracking with fully Convolutional Networks, ICCV 2015 [<a
|
||
href="http://202.118.75.4/lu/Paper/ICCV2015/iccv15_lijun.pdf">Paper</a>]
|
||
[<a href="https://github.com/scott89/FCNT">Code</a>]</li>
|
||
<li>Hyeonseob Namand Bohyung Han, Learning Multi-Domain Convolutional
|
||
Neural Networks for Visual Tracking, [<a
|
||
href="http://arxiv.org/pdf/1510.07945.pdf">Paper</a>] [<a
|
||
href="https://github.com/HyeonseobNam/MDNet">Code</a>] [<a
|
||
href="http://cvlab.postech.ac.kr/research/mdnet/">Project Page</a>]</li>
|
||
</ul>
|
||
<h3 id="low-level-vision">Low-Level Vision</h3>
|
||
<h4 id="super-resolution">Super-Resolution</h4>
|
||
<ul>
|
||
<li>Iterative Image Reconstruction
|
||
<ul>
|
||
<li>Sven Behnke: Learning Iterative Image Reconstruction. IJCAI, 2001.
|
||
<a
|
||
href="http://www.ais.uni-bonn.de/behnke/papers/ijcai01.pdf">[Paper]</a></li>
|
||
<li>Sven Behnke: Learning Iterative Image Reconstruction in the Neural
|
||
Abstraction Pyramid. International Journal of Computational Intelligence
|
||
and Applications, vol. 1, no. 4, pp. 427-438, 2001. <a
|
||
href="http://www.ais.uni-bonn.de/behnke/papers/ijcia01.pdf">[Paper]</a></li>
|
||
</ul></li>
|
||
<li>Super-Resolution (SRCNN) <a
|
||
href="http://mmlab.ie.cuhk.edu.hk/projects/SRCNN.html">[Web]</a> <a
|
||
href="http://personal.ie.cuhk.edu.hk/~ccloy/files/eccv_2014_deepresolution.pdf">[Paper-ECCV14]</a>
|
||
<a href="http://arxiv.org/pdf/1501.00092.pdf">[Paper-arXiv15]</a>
|
||
<ul>
|
||
<li>Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Learning a Deep
|
||
Convolutional Network for Image Super-Resolution, ECCV, 2014.</li>
|
||
<li>Chao Dong, Chen Change Loy, Kaiming He, Xiaoou Tang, Image
|
||
Super-Resolution Using Deep Convolutional Networks,
|
||
arXiv:1501.00092.</li>
|
||
</ul></li>
|
||
<li>Very Deep Super-Resolution
|
||
<ul>
|
||
<li>Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Accurate Image
|
||
Super-Resolution Using Very Deep Convolutional Networks,
|
||
arXiv:1511.04587, 2015. <a
|
||
href="http://arxiv.org/abs/1511.04587">[Paper]</a></li>
|
||
</ul></li>
|
||
<li>Deeply-Recursive Convolutional Network
|
||
<ul>
|
||
<li>Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, Deeply-Recursive
|
||
Convolutional Network for Image Super-Resolution, arXiv:1511.04491,
|
||
2015. <a href="http://arxiv.org/abs/1511.04491">[Paper]</a></li>
|
||
</ul></li>
|
||
<li>Casade-Sparse-Coding-Network
|
||
<ul>
|
||
<li>Zhaowen Wang, Ding Liu, Wei Han, Jianchao Yang and Thomas S. Huang,
|
||
Deep Networks for Image Super-Resolution with Sparse Prior. ICCV, 2015.
|
||
<a
|
||
href="http://www.ifp.illinois.edu/~dingliu2/iccv15/iccv15.pdf">[Paper]</a>
|
||
<a href="http://www.ifp.illinois.edu/~dingliu2/iccv15/">[Code]</a></li>
|
||
</ul></li>
|
||
<li>Perceptual Losses for Super-Resolution
|
||
<ul>
|
||
<li>Justin Johnson, Alexandre Alahi, Li Fei-Fei, Perceptual Losses for
|
||
Real-Time Style Transfer and Super-Resolution, arXiv:1603.08155, 2016.
|
||
<a href="http://arxiv.org/abs/1603.08155">[Paper]</a> <a
|
||
href="http://cs.stanford.edu/people/jcjohns/papers/fast-style/fast-style-supp.pdf">[Supplementary]</a></li>
|
||
</ul></li>
|
||
<li>SRGAN
|
||
<ul>
|
||
<li>Christian Ledig, Lucas Theis, Ferenc Huszar, Jose Caballero, Andrew
|
||
Cunningham, Alejandro Acosta, Andrew Aitken, Alykhan Tejani, Johannes
|
||
Totz, Zehan Wang, Wenzhe Shi, Photo-Realistic Single Image
|
||
Super-Resolution Using a Generative Adversarial Network,
|
||
arXiv:1609.04802v3, 2016. <a
|
||
href="https://arxiv.org/pdf/1609.04802v3.pdf">[Paper]</a></li>
|
||
</ul></li>
|
||
<li>Others
|
||
<ul>
|
||
<li>Osendorfer, Christian, Hubert Soyer, and Patrick van der Smagt,
|
||
Image Super-Resolution with Fast Approximate Convolutional Sparse
|
||
Coding, ICONIP, 2014. <a
|
||
href="http://brml.org/uploads/tx_sibibtex/281.pdf">[Paper
|
||
ICONIP-2014]</a></li>
|
||
</ul></li>
|
||
</ul>
|
||
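<p>The super-resolution papers above report PSNR against a ground-truth image as their headline number. Here is a small self-contained helper, assuming 8-bit images (peak value 255); this is the standard definition of the metric, not any paper's evaluation script.</p>
<pre><code class="language-python">import numpy as np

def psnr(reference, reconstruction, peak=255.0):
    # Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE).
    # Higher is better; identical images give infinity.
    ref = np.asarray(reference, dtype=np.float64)
    rec = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)
</code></pre>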
<h4 id="other-applications">Other Applications</h4>
|
||
<ul>
|
||
<li>Optical Flow (FlowNet) <a
|
||
href="http://arxiv.org/pdf/1504.06852">[Paper]</a>
|
||
<ul>
|
||
<li>Philipp Fischer, Alexey Dosovitskiy, Eddy Ilg, Philip Häusser, Caner
|
||
Hazırbaş, Vladimir Golkov, Patrick van der Smagt, Daniel Cremers, Thomas
|
||
Brox, FlowNet: Learning Optical Flow with Convolutional Networks,
|
||
arXiv:1504.06852.</li>
|
||
</ul></li>
|
||
<li>Compression Artifacts Reduction <a
|
||
href="http://arxiv.org/pdf/1504.06993">[Paper-arXiv15]</a>
|
||
<ul>
|
||
<li>Chao Dong, Yubin Deng, Chen Change Loy, Xiaoou Tang, Compression
|
||
Artifacts Reduction by a Deep Convolutional Network,
|
||
arXiv:1504.06993.</li>
|
||
</ul></li>
|
||
<li>Blur Removal
|
||
<ul>
|
||
<li>Christian J. Schuler, Michael Hirsch, Stefan Harmeling, Bernhard
|
||
Schölkopf, Learning to Deblur, arXiv:1406.7444 <a
|
||
href="http://arxiv.org/pdf/1406.7444.pdf">[Paper]</a></li>
|
||
<li>Jian Sun, Wenfei Cao, Zongben Xu, Jean Ponce, Learning a
|
||
Convolutional Neural Network for Non-uniform Motion Blur Removal, CVPR,
|
||
2015 <a href="http://arxiv.org/pdf/1503.00593">[Paper]</a></li>
|
||
</ul></li>
|
||
<li>Image Deconvolution <a href="http://lxu.me/projects/dcnn/">[Web]</a>
|
||
<a href="http://lxu.me/mypapers/dcnn_nips14.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Li Xu, Jimmy SJ. Ren, Ce Liu, Jiaya Jia, Deep Convolutional Neural
|
||
Network for Image Deconvolution, NIPS, 2014.</li>
|
||
</ul></li>
|
||
<li>Deep Edge-Aware Filter <a
|
||
href="http://jmlr.org/proceedings/papers/v37/xub15.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Li Xu, Jimmy SJ. Ren, Qiong Yan, Renjie Liao, Jiaya Jia, Deep
|
||
Edge-Aware Filters, ICML, 2015.</li>
|
||
</ul></li>
|
||
<li>Computing the Stereo Matching Cost with a Convolutional Neural
|
||
Network <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Zbontar_Computing_the_Stereo_2015_CVPR_paper.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Jure Žbontar, Yann LeCun, Computing the Stereo Matching Cost with a
|
||
Convolutional Neural Network, CVPR, 2015.</li>
|
||
</ul></li>
|
||
<li>Colorful Image Colorization Richard Zhang, Phillip Isola, Alexei A.
|
||
Efros, ECCV, 2016 <a
|
||
href="http://arxiv.org/pdf/1603.08511.pdf">[Paper]</a>, <a
|
||
href="https://github.com/richzhang/colorization">[Code]</a></li>
|
||
<li>Ryan Dahl, <a href="http://tinyclouds.org/colorize/">[Blog]</a></li>
|
||
<li>Feature Learning by Inpainting<a
|
||
href="https://arxiv.org/pdf/1604.07379v1.pdf">[Paper]</a><a
|
||
href="https://github.com/pathak22/context-encoder">[Code]</a>
|
||
<ul>
|
||
<li>Deepak Pathak, Philipp Krahenbuhl, Jeff Donahue, Trevor Darrell,
|
||
Alexei A. Efros, Context Encoders: Feature Learning by Inpainting, CVPR,
|
||
2016</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h3 id="edge-detection">Edge Detection</h3>
|
||
<p><img
|
||
src="https://cloud.githubusercontent.com/assets/5226447/8452371/93ca6f7e-2025-11e5-90f2-d428fd5ff7ac.PNG"
|
||
alt="edge_detection" /> (from Gedas Bertasius, Jianbo Shi, Lorenzo
|
||
Torresani, DeepEdge: A Multi-Scale Bifurcated Deep Network for Top-Down
|
||
Contour Detection, CVPR, 2015.)</p>
|
||
<ul>
|
||
<li>Holistically-Nested Edge Detection <a
|
||
href="http://arxiv.org/pdf/1504.06375">[Paper]</a> <a
|
||
href="https://github.com/s9xie/hed">[Code]</a>
|
||
<ul>
|
||
<li>Saining Xie, Zhuowen Tu, Holistically-Nested Edge Detection,
|
||
arXiv:1504.06375.</li>
|
||
</ul></li>
|
||
<li>DeepEdge <a href="http://arxiv.org/pdf/1412.1123">[Paper]</a>
|
||
<ul>
|
||
<li>Gedas Bertasius, Jianbo Shi, Lorenzo Torresani, DeepEdge: A
|
||
Multi-Scale Bifurcated Deep Network for Top-Down Contour Detection,
|
||
CVPR, 2015.</li>
|
||
</ul></li>
|
||
<li>DeepContour <a
|
||
href="http://mc.eistar.net/UpLoadFiles/Papers/DeepContour_cvpr15.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Wei Shen, Xinggang Wang, Yan Wang, Xiang Bai, Zhijiang Zhang,
|
||
DeepContour: A Deep Convolutional Feature Learned by Positive-Sharing
|
||
Loss for Contour Detection, CVPR, 2015.</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h3 id="semantic-segmentation">Semantic Segmentation</h3>
|
||
<p><img
|
||
src="https://cloud.githubusercontent.com/assets/5226447/8452076/0ba8340c-2023-11e5-88bc-bebf4509b6bb.PNG"
|
||
alt="semantic_segmantation" /> (from Jifeng Dai, Kaiming He, Jian Sun,
|
||
BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks
|
||
for Semantic Segmentation, arXiv:1503.01640.) * PASCAL VOC2012 Challenge
|
||
Leaderboard (01 Sep. 2016) <img
|
||
src="https://cloud.githubusercontent.com/assets/3803777/18164608/c3678488-7038-11e6-9ec1-74a1542dce13.png"
|
||
alt="VOC2012_top_rankings" /> (from PASCAL VOC2012 <a
|
||
href="http://host.robots.ox.ac.uk:8080/leaderboard/displaylb.php?challengeid=11&compid=6">leaderboards</a>)
|
||
* SEC: Seed, Expand and Constrain * Alexander Kolesnikov, Christoph
|
||
Lampert, Seed, Expand and Constrain: Three Principles for
|
||
Weakly-Supervised Image Segmentation, ECCV, 2016. <a
|
||
href="http://pub.ist.ac.at/~akolesnikov/files/ECCV2016/main.pdf">[Paper]</a>
|
||
<a href="https://github.com/kolesman/SEC">[Code]</a> * Adelaide *
|
||
Guosheng Lin, Chunhua Shen, Ian Reid, Anton van dan Hengel, Efficient
|
||
piecewise training of deep structured models for semantic segmentation,
|
||
arXiv:1504.01013. <a href="http://arxiv.org/pdf/1504.01013">[Paper]</a>
|
||
(1st ranked in VOC2012) * Guosheng Lin, Chunhua Shen, Ian Reid, Anton
|
||
van den Hengel, Deeply Learning the Messages in Message Passing
|
||
Inference, arXiv:1508.02108. <a
|
||
href="http://arxiv.org/pdf/1506.02108">[Paper]</a> (4th ranked in
|
||
VOC2012) * Deep Parsing Network (DPN) * Ziwei Liu, Xiaoxiao Li, Ping
|
||
Luo, Chen Change Loy, Xiaoou Tang, Semantic Image Segmentation via Deep
|
||
Parsing Network, arXiv:1509.02634 / ICCV 2015 <a
|
||
href="http://arxiv.org/pdf/1509.02634.pdf">[Paper]</a> (2nd ranked in
|
||
VOC 2012) * CentraleSuperBoundaries, INRIA <a
|
||
href="http://arxiv.org/pdf/1511.07386">[Paper]</a> * Iasonas Kokkinos,
|
||
Surpassing Humans in Boundary Detection using Deep Learning,
|
||
arXiv:1411.07386 (4th ranked in VOC 2012) * BoxSup <a
|
||
href="http://arxiv.org/pdf/1503.01640">[Paper]</a> * Jifeng Dai, Kaiming
|
||
He, Jian Sun, BoxSup: Exploiting Bounding Boxes to Supervise
|
||
Convolutional Networks for Semantic Segmentation, arXiv:1503.01640. (6th
|
||
ranked in VOC2012) * POSTECH * Hyeonwoo Noh, Seunghoon Hong, Bohyung
|
||
Han, Learning Deconvolution Network for Semantic Segmentation,
|
||
arXiv:1505.04366. <a href="http://arxiv.org/pdf/1505.04366">[Paper]</a>
|
||
(7th ranked in VOC2012) * Seunghoon Hong, Hyeonwoo Noh, Bohyung Han,
|
||
Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation,
|
||
arXiv:1506.04924. <a href="http://arxiv.org/pdf/1506.04924">[Paper]</a>
|
||
* Seunghoon Hong,Junhyuk Oh, Bohyung Han, and Honglak Lee, Learning
|
||
Transferrable Knowledge for Semantic Segmentation with Deep
|
||
Convolutional Neural Network, arXiv:1512.07928 [<a
|
||
href="http://arxiv.org/pdf/1512.07928.pdf">Paper</a>] [<a
|
||
href="http://cvlab.postech.ac.kr/research/transfernet/">Project
|
||
Page</a>] * Conditional Random Fields as Recurrent Neural Networks <a
|
||
href="http://arxiv.org/pdf/1502.03240">[Paper]</a> * Shuai Zheng, Sadeep
|
||
Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su,
|
||
Dalong Du, Chang Huang, Philip H. S. Torr, Conditional Random Fields as
|
||
Recurrent Neural Networks, arXiv:1502.03240. (8th ranked in VOC2012) *
|
||
DeepLab * Liang-Chieh Chen, George Papandreou, Kevin Murphy, Alan L.
|
||
Yuille, Weakly-and semi-supervised learning of a DCNN for semantic image
|
||
segmentation, arXiv:1502.02734. <a
|
||
href="http://arxiv.org/pdf/1502.02734">[Paper]</a> (9th ranked in
|
||
VOC2012) * Zoom-out <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mostajabi_Feedforward_Semantic_Segmentation_2015_CVPR_paper.pdf">[Paper]</a>
|
||
* Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich,
|
||
Feedforward Semantic Segmentation With Zoom-Out Features, CVPR, 2015 *
|
||
Joint Calibration <a href="http://arxiv.org/pdf/1507.01581">[Paper]</a>
|
||
* Holger Caesar, Jasper Uijlings, Vittorio Ferrari, Joint Calibration
|
||
for Semantic Segmentation, arXiv:1507.01581. * Fully Convolutional
|
||
Networks for Semantic Segmentation <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf">[Paper-CVPR15]</a>
|
||
<a href="http://arxiv.org/pdf/1411.4038">[Paper-arXiv15]</a> * Jonathan
|
||
Long, Evan Shelhamer, Trevor Darrell, Fully Convolutional Networks for
|
||
Semantic Segmentation, CVPR, 2015. * Hypercolumn <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Hariharan_Hypercolumns_for_Object_2015_CVPR_paper.pdf">[Paper]</a>
|
||
* Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik,
|
||
Hypercolumns for Object Segmentation and Fine-Grained Localization,
|
||
CVPR, 2015. * Deep Hierarchical Parsing * Abhishek Sharma, Oncel Tuzel,
|
||
David W. Jacobs, Deep Hierarchical Parsing for Semantic Segmentation,
|
||
CVPR, 2015. <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Sharma_Deep_Hierarchical_Parsing_2015_CVPR_paper.pdf">[Paper]</a>
|
||
* Learning Hierarchical Features for Scene Labeling <a
|
||
href="http://yann.lecun.com/exdb/publis/pdf/farabet-icml-12.pdf">[Paper-ICML12]</a>
|
||
<a
|
||
href="http://yann.lecun.com/exdb/publis/pdf/farabet-pami-13.pdf">[Paper-PAMI13]</a>
|
||
* Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Scene
|
||
Parsing with Multiscale Feature Learning, Purity Trees, and Optimal
|
||
Covers, ICML, 2012. * Clement Farabet, Camille Couprie, Laurent Najman,
|
||
Yann LeCun, Learning Hierarchical Features for Scene Labeling, PAMI,
|
||
2013. * University of Cambridge <a
|
||
href="http://mi.eng.cam.ac.uk/projects/segnet/">[Web]</a> * Vijay
|
||
Badrinarayanan, Alex Kendall and Roberto Cipolla “SegNet: A Deep
|
||
Convolutional Encoder-Decoder Architecture for Image Segmentation.”
|
||
arXiv preprint arXiv:1511.00561, 2015. <a
|
||
href="http://arxiv.org/abs/1511.00561">[Paper]</a> * Alex Kendall, Vijay
|
||
Badrinarayanan and Roberto Cipolla “Bayesian SegNet: Model Uncertainty
|
||
in Deep Convolutional Encoder-Decoder Architectures for Scene
|
||
Understanding.” arXiv preprint arXiv:1511.02680, 2015. <a
|
||
href="http://arxiv.org/abs/1511.00561">[Paper]</a> * Princeton * Fisher
|
||
Yu, Vladlen Koltun, “Multi-Scale Context Aggregation by Dilated
|
||
Convolutions”, ICLR 2016, [<a
|
||
href="http://arxiv.org/pdf/1511.07122v2.pdf">Paper</a>] * Univ. of
|
||
Washington, Allen AI * Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar
|
||
Divvala, Yejin Choi, Ali Farhadi, “Segment-Phrase Table for Semantic
|
||
Segmentation, Visual Entailment and Paraphrasing”, ICCV, 2015, [<a
|
||
href="http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Izadinia_Segment-Phrase_Table_for_ICCV_2015_paper.pdf">Paper</a>]
|
||
* INRIA * Iasonas Kokkinos, “Pusing the Boundaries of Boundary Detection
|
||
Using deep Learning”, ICLR 2016, [<a
|
||
href="http://arxiv.org/pdf/1511.07386v2.pdf">Paper</a>] * UCSB *
|
||
Niloufar Pourian, S. Karthikeyan, and B.S. Manjunath, “Weakly supervised
|
||
graph based semantic segmentation by learning communities of
|
||
image-parts”, ICCV, 2015, [<a
|
||
href="http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Pourian_Weakly_Supervised_Graph_ICCV_2015_paper.pdf">Paper</a>]</p>
|
||
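<p>The VOC2012 rankings quoted above are ordered by mean intersection-over-union between predicted and ground-truth label maps. Below is a minimal NumPy sketch of that metric, for orientation only; the official benchmark additionally handles an “ignore” label and accumulates counts over the whole test set.</p>
<pre><code class="language-python">import numpy as np

def mean_iou(pred, gt, num_classes):
    # Per-class IoU between two integer label maps, averaged over the
    # classes that actually appear in either map.
    ious = []
    for c in range(num_classes):
        p = (pred == c)
        g = (gt == c)
        union = np.logical_or(p, g).sum()
        if union:
            ious.append(np.logical_and(p, g).sum() / union)
    return sum(ious) / len(ious)
</code></pre>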
<h3 id="visual-attention-and-saliency">Visual Attention and
|
||
Saliency</h3>
|
||
<p><img
|
||
src="https://cloud.githubusercontent.com/assets/5226447/8492362/7ec65b88-2183-11e5-978f-017e45ddba32.png"
|
||
alt="saliency" /> (from Nian Liu, Junwei Han, Dingwen Zhang, Shifeng
|
||
Wen, Tianming Liu, Predicting Eye Fixations using Convolutional Neural
|
||
Networks, CVPR, 2015.)</p>
|
||
<ul>
|
||
<li>Mr-CNN <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Liu_Predicting_Eye_Fixations_2015_CVPR_paper.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Nian Liu, Junwei Han, Dingwen Zhang, Shifeng Wen, Tianming Liu,
|
||
Predicting Eye Fixations using Convolutional Neural Networks, CVPR,
|
||
2015.</li>
|
||
</ul></li>
|
||
<li>Learning a Sequential Search for Landmarks <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Singh_Learning_a_Sequential_2015_CVPR_paper.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Saurabh Singh, Derek Hoiem, David Forsyth, Learning a Sequential
|
||
Search for Landmarks, CVPR, 2015.</li>
|
||
</ul></li>
|
||
<li>Multiple Object Recognition with Visual Attention <a
|
||
href="http://arxiv.org/pdf/1412.7755.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Jimmy Lei Ba, Volodymyr Mnih, Koray Kavukcuoglu, Multiple Object
|
||
Recognition with Visual Attention, ICLR, 2015.</li>
|
||
</ul></li>
|
||
<li>Recurrent Models of Visual Attention <a
|
||
href="http://papers.nips.cc/paper/5542-recurrent-models-of-visual-attention.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Volodymyr Mnih, Nicolas Heess, Alex Graves, Koray Kavukcuoglu,
|
||
Recurrent Models of Visual Attention, NIPS, 2014.</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h3 id="object-recognition">Object Recognition</h3>
|
||
<ul>
|
||
<li>Weakly-supervised learning with convolutional neural networks <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Oquab_Is_Object_Localization_2015_CVPR_paper.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Maxime Oquab, Leon Bottou, Ivan Laptev, Josef Sivic, Is object
|
||
localization for free? – Weakly-supervised learning with convolutional
|
||
neural networks, CVPR, 2015.</li>
|
||
</ul></li>
|
||
<li>FV-CNN <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Cimpoi_Deep_Filter_Banks_2015_CVPR_paper.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Mircea Cimpoi, Subhransu Maji, Andrea Vedaldi, Deep Filter Banks for
|
||
Texture Recognition and Segmentation, CVPR, 2015.</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h3 id="human-pose-estimation">Human Pose Estimation</h3>
|
||
<ul>
|
||
<li>Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh, Realtime
|
||
Multi-Person 2D Pose Estimation using Part Affinity Fields, CVPR,
|
||
2017.</li>
|
||
<li>Leonid Pishchulin, Eldar Insafutdinov, Siyu Tang, Bjoern Andres,
|
||
Mykhaylo Andriluka, Peter Gehler, and Bernt Schiele, Deepcut: Joint
|
||
subset partition and labeling for multi person pose estimation, CVPR,
|
||
2016.</li>
|
||
<li>Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh,
|
||
Convolutional pose machines, CVPR, 2016.</li>
|
||
<li>Alejandro Newell, Kaiyu Yang, and Jia Deng, Stacked hourglass
|
||
networks for human pose estimation, ECCV, 2016.</li>
|
||
<li>Tomas Pfister, James Charles, and Andrew Zisserman, Flowing convnets
|
||
for human pose estimation in videos, ICCV, 2015.</li>
|
||
<li>Jonathan J. Tompson, Arjun Jain, Yann LeCun, Christoph Bregler,
|
||
Joint training of a convolutional network and a graphical model for
|
||
human pose estimation, NIPS, 2014.</li>
|
||
</ul>
|
||
<h3 id="understanding-cnn">Understanding CNN</h3>
|
||
<p><img
|
||
src="https://cloud.githubusercontent.com/assets/5226447/8452083/1aaa0066-2023-11e5-800b-2248ead51584.PNG"
|
||
alt="understanding" /> (from Aravindh Mahendran, Andrea Vedaldi,
|
||
Understanding Deep Image Representations by Inverting Them, CVPR,
|
||
2015.)</p>
|
||
<ul>
|
||
<li>Karel Lenc, Andrea Vedaldi, Understanding image representations by
|
||
measuring their equivariance and equivalence, CVPR, 2015. <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Lenc_Understanding_Image_Representations_2015_CVPR_paper.pdf">[Paper]</a></li>
|
||
<li>Anh Nguyen, Jason Yosinski, Jeff Clune, Deep Neural Networks are
|
||
Easily Fooled:High Confidence Predictions for Unrecognizable Images,
|
||
CVPR, 2015. <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Nguyen_Deep_Neural_Networks_2015_CVPR_paper.pdf">[Paper]</a></li>
|
||
<li>Aravindh Mahendran, Andrea Vedaldi, Understanding Deep Image
|
||
Representations by Inverting Them, CVPR, 2015. <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Mahendran_Understanding_Deep_Image_2015_CVPR_paper.pdf">[Paper]</a></li>
|
||
<li>Bolei Zhou, Aditya Khosla, Agata Lapedriza, Aude Oliva, Antonio
|
||
Torralba, Object Detectors Emerge in Deep Scene CNNs, ICLR, 2015. <a
|
||
href="http://arxiv.org/abs/1412.6856">[arXiv Paper]</a></li>
|
||
<li>Alexey Dosovitskiy, Thomas Brox, Inverting Visual Representations
|
||
with Convolutional Networks, arXiv, 2015. <a
|
||
href="http://arxiv.org/abs/1506.02753">[Paper]</a></li>
|
||
<li>Matthrew Zeiler, Rob Fergus, Visualizing and Understanding
|
||
Convolutional Networks, ECCV, 2014. <a
|
||
href="https://www.cs.nyu.edu/~fergus/papers/zeilerECCV2014.pdf">[Paper]</a></li>
|
||
</ul>
|
||
<h3 id="image-and-language">Image and Language</h3>
|
||
<h4 id="image-captioning">Image Captioning</h4>
|
||
<p><img
|
||
src="https://cloud.githubusercontent.com/assets/5226447/8452051/e8f81030-2022-11e5-85db-c68e7d8251ce.PNG"
|
||
alt="image_captioning" /> (from Andrej Karpathy, Li Fei-Fei, Deep
|
||
Visual-Semantic Alignments for Generating Image Description, CVPR,
|
||
2015.)</p>
|
||
<ul>
|
||
<li>UCLA / Baidu <a href="http://arxiv.org/pdf/1410.1090">[Paper]</a>
|
||
<ul>
|
||
<li>Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Alan L. Yuille, Explain
|
||
Images with Multimodal Recurrent Neural Networks, arXiv:1410.1090.</li>
|
||
</ul></li>
|
||
<li>Toronto <a href="http://arxiv.org/pdf/1411.2539">[Paper]</a>
|
||
<ul>
|
||
<li>Ryan Kiros, Ruslan Salakhutdinov, Richard S. Zemel, Unifying
|
||
Visual-Semantic Embeddings with Multimodal Neural Language Models,
|
||
arXiv:1411.2539.</li>
|
||
</ul></li>
|
||
<li>Berkeley <a href="http://arxiv.org/pdf/1411.4389">[Paper]</a>
|
||
<ul>
|
||
<li>Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus
|
||
Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term
|
||
Recurrent Convolutional Networks for Visual Recognition and Description,
|
||
arXiv:1411.4389.</li>
|
||
</ul></li>
|
||
<li>Google <a href="http://arxiv.org/pdf/1411.4555">[Paper]</a>
|
||
<ul>
|
||
<li>Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan, Show
|
||
and Tell: A Neural Image Caption Generator, arXiv:1411.4555.</li>
|
||
</ul></li>
|
||
<li>Stanford <a
|
||
href="http://cs.stanford.edu/people/karpathy/deepimagesent/">[Web]</a>
|
||
<a
|
||
href="http://cs.stanford.edu/people/karpathy/cvpr2015.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Andrej Karpathy, Li Fei-Fei, Deep Visual-Semantic Alignments for
|
||
Generating Image Description, CVPR, 2015.</li>
|
||
</ul></li>
|
||
<li>UML / UT <a href="http://arxiv.org/pdf/1412.4729">[Paper]</a>
|
||
<ul>
|
||
<li>Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach,
|
||
Raymond Mooney, Kate Saenko, Translating Videos to Natural Language
|
||
Using Deep Recurrent Neural Networks, NAACL-HLT, 2015.</li>
|
||
</ul></li>
|
||
<li>CMU / Microsoft <a
|
||
href="http://arxiv.org/pdf/1411.5654">[Paper-arXiv]</a> <a
|
||
href="http://www.cs.cmu.edu/~xinleic/papers/cvpr15_rnn.pdf">[Paper-CVPR]</a>
|
||
<ul>
|
||
<li>Xinlei Chen, C. Lawrence Zitnick, Learning a Recurrent Visual
|
||
Representation for Image Caption Generation, arXiv:1411.5654.</li>
|
||
<li>Xinlei Chen, C. Lawrence Zitnick, Mind’s Eye: A Recurrent Visual
|
||
Representation for Image Caption Generation, CVPR 2015</li>
|
||
</ul></li>
|
||
<li>Microsoft <a href="http://arxiv.org/pdf/1411.4952">[Paper]</a>
|
||
<ul>
|
||
<li>Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li
|
||
Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John
|
||
C. Platt, C. Lawrence Zitnick, Geoffrey Zweig, From Captions to Visual
|
||
Concepts and Back, CVPR, 2015.</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal / Univ. Toronto [<a
|
||
href="http://kelvinxu.github.io/projects/capgen.html">Web</a>] [<a
|
||
href="http://www.cs.toronto.edu/~zemel/documents/captionAttn.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville,
|
||
Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio, Show, Attend, and
|
||
Tell: Neural Image Caption Generation with Visual Attention,
|
||
arXiv:1502.03044 / ICML 2015</li>
|
||
</ul></li>
|
||
<li>Idiap / EPFL / Facebook [<a
|
||
href="http://arxiv.org/pdf/1502.03671">Paper</a>]
|
||
<ul>
|
||
<li>Remi Lebret, Pedro O. Pinheiro, Ronan Collobert, Phrase-based Image
|
||
Captioning, arXiv:1502.03671 / ICML 2015</li>
|
||
</ul></li>
|
||
<li>UCLA / Baidu [<a href="http://arxiv.org/pdf/1504.06692">Paper</a>]
|
||
<ul>
|
||
<li>Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, Alan L.
|
||
Yuille, Learning like a Child: Fast Novel Visual Concept Learning from
|
||
Sentence Descriptions of Images, arXiv:1504.06692</li>
|
||
</ul></li>
|
||
<li>MS + Berkeley
|
||
<ul>
|
||
<li>Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, C.
|
||
Lawrence Zitnick, Exploring Nearest Neighbor Approaches for Image
|
||
Captioning, arXiv:1505.04467 [<a
|
||
href="http://arxiv.org/pdf/1505.04467.pdf">Paper</a>]</li>
|
||
<li>Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong
|
||
He, Geoffrey Zweig, Margaret Mitchell, Language Models for Image
|
||
Captioning: The Quirks and What Works, arXiv:1505.01809 [<a
|
||
href="http://arxiv.org/pdf/1505.01809.pdf">Paper</a>]</li>
|
||
</ul></li>
|
||
<li>Adelaide [<a href="http://arxiv.org/pdf/1506.01144.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, Anthony
|
||
Dick, Image Captioning with an Intermediate Attributes Layer,
|
||
arXiv:1506.01144</li>
|
||
</ul></li>
|
||
<li>Tilburg [<a href="http://arxiv.org/pdf/1506.03694.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Grzegorz Chrupala, Akos Kadar, Afra Alishahi, Learning language
|
||
through pictures, arXiv:1506.03694</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal [<a
|
||
href="http://arxiv.org/pdf/1507.01053.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia
|
||
Content using Attention-based Encoder-Decoder Networks,
|
||
arXiv:1507.01053</li>
|
||
</ul></li>
|
||
<li>Cornell [<a href="http://arxiv.org/pdf/1508.02091.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Jack Hessel, Nicolas Savva, Michael J. Wilber, Image Representations
|
||
and New Domains in Neural Image Captioning, arXiv:1508.02091</li>
|
||
</ul></li>
|
||
<li>MS + City Univ. of HongKong [<a
|
||
href="http://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Yao_Learning_Query_and_ICCV_2015_paper.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Ting Yao, Tao Mei, and Chong-Wah Ngo, “Learning Query and Image
|
||
Similarities with Ranking Canonical Correlation Analysis”, ICCV,
|
||
2015</li>
|
||
</ul></li>
|
||
</ul>
|
||
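<p>Several captioning models above, most explicitly Show, Attend and Tell, condition each generated word on an attention-weighted average of spatial image features. Here is a minimal NumPy sketch of that soft-attention pooling step, with a plain dot-product score standing in for the papers' learned scoring networks; it is an illustration, not any paper's implementation.</p>
<pre><code class="language-python">import numpy as np

def soft_attention(features, query):
    # features: (num_locations, dim) image features from a CNN feature map.
    # query: (dim,) decoder state used to score each location.
    scores = features @ query
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ features                 # attention-weighted average
</code></pre>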
<h4 id="video-captioning">Video Captioning</h4>
|
||
<ul>
|
||
<li>Berkeley <a href="http://jeffdonahue.com/lrcn/">[Web]</a> <a
|
||
href="http://arxiv.org/pdf/1411.4389.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus
|
||
Rohrbach, Subhashini Venugopalan, Kate Saenko, Trevor Darrell, Long-term
|
||
Recurrent Convolutional Networks for Visual Recognition and Description,
|
||
CVPR, 2015.</li>
|
||
</ul></li>
|
||
<li>UT / UML / Berkeley <a
|
||
href="http://arxiv.org/pdf/1412.4729">[Paper]</a>
|
||
<ul>
|
||
<li>Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach,
|
||
Raymond Mooney, Kate Saenko, Translating Videos to Natural Language
|
||
Using Deep Recurrent Neural Networks, arXiv:1412.4729.</li>
|
||
</ul></li>
|
||
<li>Microsoft <a href="http://arxiv.org/pdf/1505.01861">[Paper]</a>
|
||
<ul>
|
||
<li>Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, Yong Rui, Joint
|
||
Modeling Embedding and Translation to Bridge Video and Language,
|
||
arXiv:1505.01861.</li>
|
||
</ul></li>
|
||
<li>UT / Berkeley / UML <a
|
||
href="http://arxiv.org/pdf/1505.00487">[Paper]</a>
|
||
<ul>
|
||
<li>Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond
|
||
Mooney, Trevor Darrell, Kate Saenko, Sequence to Sequence–Video to Text,
|
||
arXiv:1505.00487.</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal / Univ. Sherbrooke [<a
|
||
href="http://arxiv.org/pdf/1502.08029.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher
|
||
Pal, Hugo Larochelle, Aaron Courville, Describing Videos by Exploiting
|
||
Temporal Structure, arXiv:1502.08029</li>
|
||
</ul></li>
|
||
<li>MPI / Berkeley [<a
|
||
href="http://arxiv.org/pdf/1506.01698.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Anna Rohrbach, Marcus Rohrbach, Bernt Schiele, The Long-Short Story
|
||
of Movie Description, arXiv:1506.01698</li>
|
||
</ul></li>
|
||
<li>Univ. Toronto / MIT [<a
|
||
href="http://arxiv.org/pdf/1506.06724.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel
|
||
Urtasun, Antonio Torralba, Sanja Fidler, Aligning Books and Movies:
|
||
Towards Story-like Visual Explanations by Watching Movies and Reading
|
||
Books, arXiv:1506.06724</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal [<a
|
||
href="http://arxiv.org/pdf/1507.01053.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Kyunghyun Cho, Aaron Courville, Yoshua Bengio, Describing Multimedia
|
||
Content using Attention-based Encoder-Decoder Networks,
|
||
arXiv:1507.01053</li>
|
||
</ul></li>
|
||
<li>TAU / USC [<a href="https://arxiv.org/pdf/1612.06950.pdf">paper</a>]
|
||
<ul>
|
||
<li>Dotan Kaufman, Gil Levi, Tal Hassner, Lior Wolf, Temporal
|
||
Tessellation for Video Annotation and Summarization,
|
||
arXiv:1612.06950.</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h4 id="question-answering">Question Answering</h4>
|
||
<p><img
|
||
src="https://cloud.githubusercontent.com/assets/5226447/8452068/ffe7b1f6-2022-11e5-87ab-4f6d4696c220.PNG"
|
||
alt="question_answering" /> (from Stanislaw Antol, Aishwarya Agrawal,
|
||
Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, Devi
|
||
Parikh, VQA: Visual Question Answering, CVPR, 2015 SUNw:Scene
|
||
Understanding workshop)</p>
|
||
<ul>
|
||
<li>Virginia Tech / MSR <a href="http://www.visualqa.org/">[Web]</a> <a
|
||
href="http://arxiv.org/pdf/1505.00468">[Paper]</a>
|
||
<ul>
|
||
<li>Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell,
|
||
Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, VQA: Visual Question
|
||
Answering, CVPR, 2015 SUNw:Scene Understanding workshop.</li>
|
||
</ul></li>
|
||
<li>MPI / Berkeley <a
|
||
href="https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/vision-and-language/visual-turing-challenge/">[Web]</a>
|
||
<a href="http://arxiv.org/pdf/1505.01121">[Paper]</a>
|
||
<ul>
|
||
<li>Mateusz Malinowski, Marcus Rohrbach, Mario Fritz, Ask Your Neurons:
|
||
A Neural-based Approach to Answering Questions about Images,
|
||
arXiv:1505.01121.</li>
|
||
</ul></li>
|
||
<li>Toronto <a href="http://arxiv.org/pdf/1505.02074">[Paper]</a> <a
|
||
href="http://www.cs.toronto.edu/~mren/imageqa/data/cocoqa/">[Dataset]</a>
|
||
<ul>
|
||
<li>Mengye Ren, Ryan Kiros, Richard Zemel, Image Question Answering: A
|
||
Visual Semantic Embedding Model and a New Dataset, arXiv:1505.02074 /
|
||
ICML 2015 deep learning workshop.</li>
|
||
</ul></li>
|
||
<li>Baidu / UCLA <a href="http://arxiv.org/pdf/1505.05612">[Paper]</a>
|
||
<a href="">[Dataset]</a>
|
||
<ul>
|
||
<li>Hauyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, Wei Xu,
|
||
Are You Talking to a Machine? Dataset and Methods for Multilingual Image
|
||
Question Answering, arXiv:1505.05612.</li>
|
||
</ul></li>
|
||
<li>POSTECH [<a href="http://arxiv.org/pdf/1511.05756.pdf">Paper</a>]
|
||
[<a href="http://cvlab.postech.ac.kr/research/dppnet/">Project Page</a>]
|
||
<ul>
|
||
<li>Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han, Image Question
|
||
Answering using Convolutional Neural Network with Dynamic Parameter
|
||
Prediction, arXiv:1511.05765</li>
|
||
</ul></li>
|
||
<li>CMU / Microsoft Research [<a
|
||
href="http://arxiv.org/pdf/1511.02274v2.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Yang, Z., He, X., Gao, J., Deng, L., & Smola, A. (2015). Stacked
|
||
Attention Networks for Image Question Answering. arXiv:1511.02274.</li>
|
||
</ul></li>
|
||
<li>MetaMind [<a href="http://arxiv.org/pdf/1603.01417v1.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Xiong, Caiming, Stephen Merity, and Richard Socher. “Dynamic Memory
|
||
Networks for Visual and Textual Question Answering.” arXiv:1603.01417
|
||
(2016).</li>
|
||
</ul></li>
|
||
<li>SNU + NAVER [<a href="http://arxiv.org/abs/1606.01455">Paper</a>]
|
||
<ul>
|
||
<li>Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim,
|
||
Jung-Woo Ha, Byoung-Tak Zhang, <em>Multimodal Residual Learning for
|
||
Visual QA</em>, arXiv:1606:01455</li>
|
||
</ul></li>
|
||
<li>UC Berkeley + Sony [<a
|
||
href="https://arxiv.org/pdf/1606.01847">Paper</a>]
|
||
<ul>
|
||
<li>Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor
|
||
Darrell, and Marcus Rohrbach, <em>Multimodal Compact Bilinear Pooling
|
||
for Visual Question Answering and Visual Grounding</em>,
|
||
arXiv:1606.01847</li>
|
||
</ul></li>
|
||
<li>Postech [<a href="http://arxiv.org/pdf/1606.03647.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Hyeonwoo Noh and Bohyung Han, <em>Training Recurrent Answering Units
|
||
with Joint Loss Minimization for VQA</em>, arXiv:1606.03647</li>
|
||
</ul></li>
|
||
<li>SNU + NAVER [<a href="http://arxiv.org/abs/1610.04325">Paper</a>]
|
||
<ul>
|
||
<li>Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak
|
||
Zhang, <em>Hadamard Product for Low-rank Bilinear Pooling</em>,
|
||
arXiv:1610.04325.</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h3 id="image-generation">Image Generation</h3>
|
||
<ul>
|
||
<li>Convolutional / Recurrent Networks
|
||
<ul>
|
||
<li>Aäron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt,
|
||
Alex Graves, Koray Kavukcuoglu. “Conditional Image Generation with
|
||
PixelCNN Decoders”<a
|
||
href="https://arxiv.org/pdf/1606.05328v2.pdf">[Paper]</a><a
|
||
href="https://github.com/kundan2510/pixelCNN">[Code]</a></li>
|
||
<li>Alexey Dosovitskiy, Jost Tobias Springenberg, Thomas Brox, “Learning
|
||
to Generate Chairs with Convolutional Neural Networks”, CVPR, 2015. <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Dosovitskiy_Learning_to_Generate_2015_CVPR_paper.pdf">[Paper]</a></li>
|
||
<li>Karol Gregor, Ivo Danihelka, Alex Graves, Danilo Jimenez Rezende,
|
||
Daan Wierstra, “DRAW: A Recurrent Neural Network For Image Generation”,
|
||
ICML, 2015. [<a
|
||
href="https://arxiv.org/pdf/1502.04623v2.pdf">Paper</a>]</li>
|
||
</ul></li>
|
||
<li>Adversarial Networks
|
||
<ul>
|
||
<li>Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David
|
||
Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, Generative
|
||
Adversarial Networks, NIPS, 2014. <a
|
||
href="http://arxiv.org/abs/1406.2661">[Paper]</a></li>
|
||
<li>Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus, Deep
|
||
Generative Image Models using a Laplacian Pyramid of Adversarial
|
||
Networks, NIPS, 2015. <a
|
||
href="http://arxiv.org/abs/1506.05751">[Paper]</a></li>
|
||
<li>Lucas Theis, Aäron van den Oord, Matthias Bethge, “A note on the
|
||
evaluation of generative models”, ICLR 2016. [<a
|
||
href="http://arxiv.org/abs/1511.01844">Paper</a>]</li>
|
||
<li>Zhenwen Dai, Andreas Damianou, Javier Gonzalez, Neil Lawrence,
|
||
“Variationally Auto-Encoded Deep Gaussian Processes”, ICLR 2016. [<a
|
||
href="http://arxiv.org/pdf/1511.06455v2.pdf">Paper</a>]</li>
|
||
<li>Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov,
|
||
“Generating Images from Captions with Attention”, ICLR 2016, [<a
|
||
href="http://arxiv.org/pdf/1511.02793v2.pdf">Paper</a>]</li>
|
||
<li>Jost Tobias Springenberg, “Unsupervised and Semi-supervised Learning
|
||
with Categorical Generative Adversarial Networks”, ICLR 2016, [<a
|
||
href="http://arxiv.org/pdf/1511.06390v1.pdf">Paper</a>]</li>
|
||
<li>Harrison Edwards, Amos Storkey, “Censoring Representations with an
|
||
Adversary”, ICLR 2016, [<a
|
||
href="http://arxiv.org/pdf/1511.05897v3.pdf">Paper</a>]</li>
|
||
<li>Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, Ken Nakae, Shin
|
||
Ishii, “Distributional Smoothing with Virtual Adversarial Training”,
|
||
ICLR 2016, [<a
|
||
href="http://arxiv.org/pdf/1507.00677v8.pdf">Paper</a>]</li>
|
||
<li>Jun-Yan Zhu, Philipp Krahenbuhl, Eli Shechtman, and Alexei A. Efros,
|
||
“Generative Visual Manipulation on the Natural Image Manifold”, ECCV
|
||
2016. [<a href="https://arxiv.org/pdf/1609.03552v2.pdf">Paper</a>] [<a
|
||
href="https://github.com/junyanz/iGAN">Code</a>] [<a
|
||
href="https://youtu.be/9c4z6YsBGQ0">Video</a>]</li>
|
||
</ul></li>
|
||
<li>Mixing Convolutional and Adversarial Networks
|
||
<ul>
|
||
<li>Alec Radford, Luke Metz, Soumith Chintala, “Unsupervised
|
||
Representation Learning with Deep Convolutional Generative Adversarial
|
||
Networks”, ICLR 2016. [<a
|
||
href="http://arxiv.org/pdf/1511.06434.pdf">Paper</a>]</li>
|
||
</ul></li>
|
||
</ul>
|
||
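<p>The adversarial papers above share the two-player objective of Goodfellow et al. (arXiv:1406.2661): a discriminator D learns to tell real from generated samples while a generator G learns to fool it. Below is a minimal NumPy rendering of the two losses, using the non-saturating generator loss described in the paper; it is illustrative only and omits the networks and the optimizer.</p>
<pre><code class="language-python">import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-12):
    # D maximizes log D(x) + log(1 - D(G(z))) over real and fake batches;
    # we return the negation so it can be minimized.
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-12):
    # Non-saturating variant: G maximizes log D(G(z)).
    return -np.mean(np.log(d_fake + eps))
</code></pre>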
<h3 id="other-topics">Other Topics</h3>
|
||
<ul>
|
||
<li>Visual Analogy [<a
|
||
href="https://web.eecs.umich.edu/~honglak/nips2015-analogy.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Scott Reed, Yi Zhang, Yuting Zhang, Honglak Lee, Deep Visual Analogy
|
||
Making, NIPS, 2015</li>
|
||
</ul></li>
|
||
<li>Surface Normal Estimation <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Wang_Designing_Deep_Networks_2015_CVPR_paper.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Xiaolong Wang, David F. Fouhey, Abhinav Gupta, Designing Deep
|
||
Networks for Surface Normal Estimation, CVPR, 2015.</li>
|
||
</ul></li>
|
||
<li>Action Detection <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Gkioxari_Finding_Action_Tubes_2015_CVPR_paper.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Georgia Gkioxari, Jitendra Malik, Finding Action Tubes, CVPR,
|
||
2015.</li>
|
||
</ul></li>
|
||
<li>Crowd Counting <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Zhang_Cross-Scene_Crowd_Counting_2015_CVPR_paper.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Cong Zhang, Hongsheng Li, Xiaogang Wang, Xiaokang Yang, Cross-scene
|
||
Crowd Counting via Deep Convolutional Neural Networks, CVPR, 2015.</li>
|
||
</ul></li>
|
||
<li>3D Shape Retrieval <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Wang_Sketch-Based_3D_Shape_2015_CVPR_paper.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Fang Wang, Le Kang, Yi Li, Sketch-based 3D Shape Retrieval using
|
||
Convolutional Neural Networks, CVPR, 2015.</li>
|
||
</ul></li>
|
||
<li>Weakly-supervised Classification
|
||
<ul>
|
||
<li>Samaneh Azadi, Jiashi Feng, Stefanie Jegelka, Trevor Darrell,
|
||
“Auxiliary Image Regularization for Deep CNNs with Noisy Labels”, ICLR
|
||
2016, [<a href="http://arxiv.org/pdf/1511.07069v2.pdf">Paper</a>]</li>
|
||
</ul></li>
|
||
<li>Artistic Style <a href="http://arxiv.org/abs/1508.06576">[Paper]</a>
|
||
<a href="https://github.com/jcjohnson/neural-style">[Code]</a>
|
||
<ul>
|
||
<li>Leon A. Gatys, Alexander S. Ecker, Matthias Bethge, A Neural
|
||
Algorithm of Artistic Style.</li>
|
||
</ul></li>
|
||
<li>Human Gaze Estimation
|
||
<ul>
|
||
<li>Xucong Zhang, Yusuke Sugano, Mario Fritz, Andreas Bulling,
|
||
Appearance-Based Gaze Estimation in the Wild, CVPR, 2015. <a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Zhang_Appearance-Based_Gaze_Estimation_2015_CVPR_paper.pdf">[Paper]</a>
|
||
<a
|
||
href="https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/gaze-based-human-computer-interaction/appearance-based-gaze-estimation-in-the-wild-mpiigaze/">[Website]</a></li>
|
||
</ul></li>
|
||
<li>Face Recognition
|
||
<ul>
|
||
<li>Yaniv Taigman, Ming Yang, Marc’Aurelio Ranzato, Lior Wolf, DeepFace:
|
||
Closing the Gap to Human-Level Performance in Face Verification, CVPR,
|
||
2014. <a
|
||
href="https://www.cs.toronto.edu/~ranzato/publications/taigman_cvpr14.pdf">[Paper]</a></li>
|
||
<li>Yi Sun, Ding Liang, Xiaogang Wang, Xiaoou Tang, DeepID3: Face
|
||
Recognition with Very Deep Neural Networks, 2015. <a
|
||
href="http://arxiv.org/abs/1502.00873">[Paper]</a></li>
|
||
<li>Florian Schroff, Dmitry Kalenichenko, James Philbin, FaceNet: A
|
||
Unified Embedding for Face Recognition and Clustering, CVPR, 2015. <a
|
||
href="http://arxiv.org/abs/1503.03832">[Paper]</a></li>
|
||
</ul></li>
|
||
<li>Facial Landmark Detection
|
||
<ul>
|
||
<li>Yue Wu, Tal Hassner, KangGeon Kim, Gerard Medioni, Prem Natarajan,
|
||
Facial Landmark Detection with Tweaked Convolutional Neural Networks,
|
||
2015. <a href="http://arxiv.org/abs/1511.04031">[Paper]</a> <a
|
||
href="http://www.openu.ac.il/home/hassner/projects/tcnn_landmarks/">[Project]</a></li>
|
||
</ul></li>
|
||
</ul>
|
||
<h2 id="courses">Courses</h2>
|
||
<ul>
|
||
<li>Deep Vision
|
||
<ul>
|
||
<li>[Stanford] <a href="http://cs231n.stanford.edu/">CS231n:
|
||
Convolutional Neural Networks for Visual Recognition</a></li>
|
||
<li>[CUHK] <a
|
||
href="https://piazza.com/cuhk.edu.hk/spring2015/eleg5040/home">ELEG
|
||
5040: Advanced Topics in Signal Processing(Introduction to Deep
|
||
Learning)</a></li>
|
||
</ul></li>
|
||
<li>More Deep Learning
|
||
<ul>
|
||
<li>[Stanford] <a href="http://cs224d.stanford.edu/">CS224d: Deep
|
||
Learning for Natural Language Processing</a></li>
|
||
<li>[Oxford] <a
|
||
href="https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/">Deep
|
||
Learning by Prof. Nando de Freitas</a></li>
|
||
<li>[NYU] <a
|
||
href="http://cilvr.cs.nyu.edu/doku.php?id=courses:deeplearning2014:start">Deep
|
||
Learning by Prof. Yann LeCun</a></li>
|
||
</ul></li>
|
||
</ul>
|
||
<h2 id="books">Books</h2>
|
||
<ul>
|
||
<li>Free Online Books
|
||
<ul>
|
||
<li><a href="http://www.iro.umontreal.ca/~bengioy/dlbook/">Deep Learning
|
||
by Ian Goodfellow, Yoshua Bengio, and Aaron Courville</a></li>
|
||
<li><a href="http://neuralnetworksanddeeplearning.com/">Neural Networks
|
||
and Deep Learning by Michael Nielsen</a></li>
|
||
<li><a href="http://deeplearning.net/tutorial/deeplearning.pdf">Deep
|
||
Learning Tutorial by LISA lab, University of Montreal</a></li>
|
||
</ul></li>
|
||
</ul>
|
||
<h2 id="videos">Videos</h2>
|
||
<ul>
|
||
<li>Talks
|
||
<ul>
|
||
<li><a href="https://www.youtube.com/watch?v=n1ViNeWhC24">Deep Learning,
|
||
Self-Taught Learning and Unsupervised Feature Learning By Andrew
|
||
Ng</a></li>
|
||
<li><a href="https://www.youtube.com/watch?v=vShMxxqtDDs">Recent
|
||
Developments in Deep Learning By Geoff Hinton</a></li>
|
||
<li><a href="https://www.youtube.com/watch?v=sc-KbuZqGkI">The
|
||
Unreasonable Effectiveness of Deep Learning by Yann LeCun</a></li>
|
||
<li><a href="https://www.youtube.com/watch?v=4xsVFLnHC_0">Deep Learning
|
||
of Representations by Yoshua bengio</a></li>
|
||
</ul></li>
|
||
</ul>
|
||
<h2 id="software">Software</h2>
|
||
<h3 id="framework">Framework</h3>
|
||
<ul>
|
||
<li>Tensorflow: An open source software library for numerical
|
||
computation using data flow graph by Google [<a
|
||
href="https://www.tensorflow.org/">Web</a>]</li>
|
||
<li>Torch7: Deep learning library in Lua, used by Facebook and Google
|
||
Deepmind [<a href="http://torch.ch/">Web</a>]
|
||
<ul>
|
||
<li>Torch-based deep learning libraries: [<a
|
||
href="https://github.com/torchnet/torchnet">torchnet</a>],</li>
|
||
</ul></li>
|
||
<li>Caffe: Deep learning framework by the BVLC [<a
|
||
href="http://caffe.berkeleyvision.org/">Web</a>]</li>
|
||
<li>Theano: Mathematical library in Python, maintained by LISA lab [<a
|
||
href="http://deeplearning.net/software/theano/">Web</a>]
|
||
<ul>
|
||
<li>Theano-based deep learning libraries: [<a
|
||
href="http://deeplearning.net/software/pylearn2/">Pylearn2</a>], [<a
|
||
href="https://github.com/mila-udem/blocks">Blocks</a>], [<a
|
||
href="http://keras.io/">Keras</a>], [<a
|
||
href="https://github.com/Lasagne/Lasagne">Lasagne</a>]</li>
|
||
</ul></li>
|
||
<li>MatConvNet: CNNs for MATLAB [<a
|
||
href="http://www.vlfeat.org/matconvnet/">Web</a>]</li>
|
||
<li>MXNet: A flexible and efficient deep learning library for
|
||
heterogeneous distributed systems with multi-language support [<a
|
||
href="http://mxnet.io/">Web</a>]</li>
|
||
<li>Deepgaze: A computer vision library for human-computer interaction
|
||
based on CNNs [<a
|
||
href="https://github.com/mpatacchiola/deepgaze">Web</a>]</li>
|
||
</ul>
|
||
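<p>As a taste of the data-flow-graph style that the TensorFlow entry above refers to, here is a minimal example in the graph-mode TensorFlow 1.x API that was current when this list was maintained (Session-based execution; later 2.x releases execute eagerly instead):</p>
<pre><code class="language-python">import tensorflow as tf

# Build a tiny data flow graph: y = x W + b for a single dense layer.
x = tf.placeholder(tf.float32, shape=[None, 4])
W = tf.Variable(tf.zeros([4, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.matmul(x, W) + b

# Nothing runs until the graph is executed inside a session.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]}))
</code></pre>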
<h3 id="applications">Applications</h3>
|
||
<ul>
|
||
<li>Adversarial Training
|
||
<ul>
|
||
<li>Code and hyperparameters for the paper “Generative Adversarial
|
||
Networks” <a
|
||
href="https://github.com/goodfeli/adversarial">[Web]</a></li>
|
||
</ul></li>
|
||
<li>Understanding and Visualizing
|
||
<ul>
|
||
<li>Source code for “Understanding Deep Image Representations by
|
||
Inverting Them,” CVPR, 2015. <a
|
||
href="https://github.com/aravindhm/deep-goggle">[Web]</a></li>
|
||
</ul></li>
|
||
<li>Semantic Segmentation
|
||
<ul>
|
||
<li>Source code for the paper “Rich feature hierarchies for accurate
|
||
object detection and semantic segmentation,” CVPR, 2014. <a
|
||
href="https://github.com/rbgirshick/rcnn">[Web]</a></li>
|
||
<li>Source code for the paper “Fully Convolutional Networks for Semantic
|
||
Segmentation,” CVPR, 2015. <a
|
||
href="https://github.com/longjon/caffe/tree/future">[Web]</a></li>
|
||
</ul></li>
|
||
<li>Super-Resolution
|
||
<ul>
|
||
<li>Image Super-Resolution for Anime-Style-Art <a
|
||
href="https://github.com/nagadomi/waifu2x">[Web]</a></li>
|
||
</ul></li>
|
||
<li>Edge Detection
|
||
<ul>
|
||
<li>Source code for the paper “DeepContour: A Deep Convolutional Feature
|
||
Learned by Positive-Sharing Loss for Contour Detection,” CVPR, 2015. <a
|
||
href="https://github.com/shenwei1231/DeepContour">[Web]</a></li>
|
||
<li>Source code for the paper “Holistically-Nested Edge Detection”, ICCV
|
||
2015. <a href="https://github.com/s9xie/hed">[Web]</a></li>
|
||
</ul></li>
|
||
</ul>
|
||
<h2 id="tutorials">Tutorials</h2>
|
||
<ul>
|
||
<li>[CVPR 2014] <a
|
||
href="https://sites.google.com/site/deeplearningcvpr2014/">Tutorial on
|
||
Deep Learning in Computer Vision</a></li>
|
||
<li>[CVPR 2015] <a href="https://github.com/soumith/cvpr2015">Applied
|
||
Deep Learning for Computer Vision with Torch</a></li>
|
||
</ul>
|
||
<h2 id="blogs">Blogs</h2>
|
||
<ul>
|
||
<li><a
|
||
href="http://www.computervisionblog.com/2015/06/deep-down-rabbit-hole-cvpr-2015-and.html">Deep
|
||
down the rabbit hole: CVPR 2015 and beyond@Tombone’s Computer Vision
|
||
Blog</a></li>
|
||
<li><a
|
||
href="http://zoyathinks.blogspot.kr/2015/06/cvpr-recap-and-where-were-going.html">CVPR
|
||
recap and where we’re going@Zoya Bylinskii (MIT PhD Student)’s
|
||
Blog</a></li>
|
||
<li><a
|
||
href="http://www.wired.com/2015/06/facebook-googles-fake-brains-spawn-new-visual-reality/">Facebook’s
|
||
AI Painting@Wired</a></li>
|
||
<li><a
|
||
href="http://googleresearch.blogspot.kr/2015/06/inceptionism-going-deeper-into-neural.html">Inceptionism:
|
||
Going Deeper into Neural Networks@Google Research</a></li>
|
||
<li><a href="http://peterroelants.github.io/">Implementing Neural
|
||
networks</a></li>
|
||
</ul>
|