<h1 id="awesome-recurrent-neural-networks">Awesome Recurrent Neural
|
||
Networks</h1>
|
||
<p>A curated list of resources dedicated to recurrent neural networks
|
||
(closely related to <em>deep learning</em>).</p>
|
||
<p>Maintainers - <a href="https://github.com/myungsub">Myungsub
|
||
Choi</a>, <a href="https://github.com/jazzsaxmafia">Taeksoo Kim</a>, <a
|
||
href="https://github.com/kjw0612">Jiwon Kim</a></p>
|
||
<p>We have pages for other topics: <a
|
||
href="https://github.com/kjw0612/awesome-deep-vision">awesome-deep-vision</a>,
|
||
<a
|
||
href="https://github.com/kjw0612/awesome-random-forest">awesome-random-forest</a></p>
|
||
<h2 id="contributing">Contributing</h2>
|
||
<p>Please feel free to <a
|
||
href="https://github.com/kjw0612/awesome-rnn/pulls">pull requests</a>,
|
||
email Myungsub Choi (cms6539@gmail.com) or join our chats to add
|
||
links.</p>
|
||
<p>The project is not actively maintained.</p>
|
||
<p><a
|
||
href="https://gitter.im/kjw0612/awesome-rnn?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge"><img
|
||
src="https://badges.gitter.im/Join%20Chat.svg"
|
||
alt="Join the chat at https://gitter.im/kjw0612/awesome-rnn" /></a></p>
|
||
<h2 id="sharing">Sharing</h2>
|
||
<ul>
|
||
<li><a
|
||
href="http://twitter.com/home?status=http://jiwonkim.org/awesome-rnn%0AResources%20for%20Recurrent%20Neural%20Networks">Share
|
||
on Twitter</a></li>
|
||
<li><a
|
||
href="http://www.facebook.com/sharer/sharer.php?u=https://jiwonkim.org/awesome-rnn">Share
|
||
on Facebook</a></li>
|
||
<li><a
|
||
href="http://plus.google.com/share?url=https://jiwonkim.org/awesome-rnn">Share
|
||
on Google Plus</a></li>
|
||
<li><a
|
||
href="http://www.linkedin.com/shareArticle?mini=true&url=https://jiwonkim.org/awesome-rnn&title=Awesome%20Recurrent%20Neural&Networks&summary=&source=">Share
|
||
on LinkedIn</a></li>
|
||
</ul>
|
||
<h2 id="table-of-contents">Table of Contents</h2>
|
||
<ul>
|
||
<li><a href="#codes">Codes</a></li>
|
||
<li><a href="#theory">Theory</a>
|
||
<ul>
|
||
<li><a href="#lectures">Lectures</a></li>
|
||
<li><a href="#books--thesis">Books / Thesis</a></li>
|
||
<li><a href="#architecture-variants">Architecture Variants</a>
|
||
<ul>
|
||
<li><a href="#structure">Structure</a></li>
|
||
<li><a href="#memory">Memory</a></li>
|
||
</ul></li>
|
||
<li><a href="#surveys">Surveys</a></li>
|
||
</ul></li>
|
||
<li><a href="#applications">Applications</a>
|
||
<ul>
|
||
<li><a href="#natural-language-processing">Natural Language
|
||
Processing</a>
|
||
<ul>
|
||
<li><a href="#language-modeling">Language Modeling</a></li>
|
||
<li><a href="#speech-recognition">Speech Recognition</a></li>
|
||
<li><a href="#machine-translation">Machine Translation</a></li>
|
||
<li><a href="#conversation-modeling">Conversation Modeling</a></li>
|
||
<li><a href="#question-answering">Question Answering</a></li>
|
||
</ul></li>
|
||
<li><a href="#computer-vision">Computer Vision</a>
|
||
<ul>
|
||
<li><a href="#object-recognition">Object Recognition</a></li>
|
||
<li><a href="#image-generation">Image Generation</a></li>
|
||
<li><a href="#video-analysis">Video Analysis</a></li>
|
||
</ul></li>
|
||
<li><a href="#multimodal-cv--nlp">Multimodal (CV+NLP)</a>
|
||
<ul>
|
||
<li><a href="#image-captioning">Image Captioning</a></li>
|
||
<li><a href="#video-captioning">Video Captioning</a></li>
|
||
<li><a href="#visual-question-answering">Visual Question
|
||
Answering</a></li>
|
||
</ul></li>
|
||
<li><a href="#turing-machines">Turing Machines</a></li>
|
||
<li><a href="#robotics">Robotics</a></li>
|
||
<li><a href="#other">Other</a></li>
|
||
</ul></li>
|
||
<li><a href="#datasets">Datasets</a></li>
|
||
<li><a href="#blogs">Blogs</a></li>
|
||
<li><a href="#online-demos">Online Demos</a></li>
|
||
</ul>
|
||
<h2 id="codes">Codes</h2>
|
||
<ul>
|
||
<li><a href="https://www.tensorflow.org/">Tensorflow</a> - Python, C++
|
||
<ul>
|
||
<li><a
|
||
href="https://www.tensorflow.org/versions/master/get_started/index.html">Get
|
||
started</a>, <a
|
||
href="https://www.tensorflow.org/versions/master/tutorials/index.html">Tutorials</a>
|
||
<ul>
|
||
<li><a
|
||
href="https://www.tensorflow.org/versions/master/tutorials/recurrent/index.html">Recurrent
|
||
Neural Network Tutorial</a></li>
|
||
<li><a
|
||
href="https://www.tensorflow.org/versions/master/tutorials/seq2seq/index.html">Sequence-to-Sequence
|
||
Model Tutorial</a></li>
|
||
</ul></li>
|
||
<li><a
|
||
href="https://github.com/nlintz/TensorFlow-Tutorials">Tutorials</a> by
|
||
nlintz</li>
|
||
<li><a
|
||
href="https://github.com/aymericdamien/TensorFlow-Examples">Notebook
|
||
examples</a> by aymericdamien</li>
|
||
<li><a href="https://github.com/tensorflow/skflow">Scikit Flow
|
||
(skflow)</a> - Simplified Scikit-learn like Interface for
|
||
TensorFlow</li>
|
||
<li><a href="http://keras.io/">Keras</a> : (Tensorflow / Theano)-based
|
||
modular deep learning library similar to Torch</li>
|
||
<li><a
|
||
href="https://github.com/sherjilozair/char-rnn-tensorflow">char-rnn-tensorflow</a>
|
||
by sherjilozair: char-rnn in tensorflow</li>
|
||
</ul></li>
|
||
<li><a href="http://deeplearning.net/software/theano/">Theano</a> -
|
||
Python
|
||
<ul>
|
||
<li>Simple IPython <a
|
||
href="http://nbviewer.jupyter.org/github/craffel/theano-tutorial/blob/master/Theano%20Tutorial.ipynb">tutorial
|
||
on Theano</a></li>
|
||
<li><a href="http://www.deeplearning.net/tutorial/">Deep Learning
|
||
Tutorials</a>
|
||
<ul>
|
||
<li><a
|
||
href="http://www.deeplearning.net/tutorial/rnnslu.html#rnnslu">RNN for
|
||
semantic parsing of speech</a></li>
|
||
<li><a href="http://www.deeplearning.net/tutorial/lstm.html#lstm">LSTM
|
||
network for sentiment analysis</a></li>
|
||
</ul></li>
|
||
<li><a href="http://deeplearning.net/software/pylearn2/">Pylearn2</a> :
|
||
Library that wraps a lot of models and training algorithms in deep
|
||
learning</li>
|
||
<li><a href="https://github.com/mila-udem/blocks">Blocks</a> : modular
|
||
framework that enables building neural network models</li>
|
||
<li><a href="http://keras.io/">Keras</a> : (Tensorflow / Theano)-based
|
||
modular deep learning library similar to Torch</li>
|
||
<li><a href="https://github.com/Lasagne/Lasagne">Lasagne</a> :
|
||
Lightweight library to build and train neural networks in Theano</li>
|
||
<li><a href="https://github.com/gwtaylor/theano-rnn">theano-rnn</a> by
|
||
Graham Taylor</li>
|
||
<li><a href="https://github.com/IndicoDataSolutions/Passage">Passage</a>
|
||
: Library for text analysis with RNNs</li>
|
||
<li><a
|
||
href="https://github.com/Ivaylo-Popov/Theano-Lights">Theano-Lights</a> :
|
||
Contains many generative models</li>
|
||
</ul></li>
|
||
<li><a href="https://github.com/BVLC/caffe">Caffe</a> - C++ with
|
||
MATLAB/Python wrappers
|
||
<ul>
|
||
<li><a href="http://jeffdonahue.com/lrcn/">LRCN</a> by Jeff Donahue</li>
|
||
</ul></li>
|
||
<li><a href="http://torch.ch/">Torch</a> - Lua
|
||
<ul>
|
||
<li><a href="https://github.com/torchnet/torchnet">torchnet</a> :
|
||
modular framework that enables building neural network models</li>
|
||
<li><a href="https://github.com/karpathy/char-rnn">char-rnn</a> by
|
||
Andrej Karpathy : multi-layer RNN/LSTM/GRU for training/sampling from
|
||
character-level language models</li>
|
||
<li><a href="https://github.com/jcjohnson/torch-rnn">torch-rnn</a> by
|
||
Justin Johnson : reusable RNN/LSTM modules for torch7 - much faster and
|
||
memory efficient reimplementation of char-rnn</li>
|
||
<li><a href="https://github.com/karpathy/neuraltalk2">neuraltalk2</a> by
|
||
Andrej Karpathy : Recurrent Neural Network captions image, much faster
|
||
and better version of the original <a
|
||
href="https://github.com/karpathy/neuraltalk">neuraltalk</a></li>
|
||
<li><a href="https://github.com/wojzaremba/lstm">LSTM</a> by Wojciech
|
||
Zaremba : Long Short Term Memory Units to train a language model on word
|
||
level Penn Tree Bank dataset</li>
|
||
<li><a href="https://github.com/oxford-cs-ml-2015">Oxford</a> by Nando
|
||
de Freitas : Oxford Computer Science - Machine Learning 2015
|
||
Practicals</li>
|
||
<li><a href="https://github.com/Element-Research/rnn">rnn</a> by
|
||
Nicholas Leonard : general library for implementing RNN, LSTM, BRNN and
|
||
BLSTM (highly unit tested).</li>
|
||
</ul></li>
|
||
<li><a href="http://pytorch.org/">PyTorch</a> - Python
|
||
<ul>
|
||
<li><a
|
||
href="https://github.com/pytorch/examples/tree/master/word_language_model">Word-level
|
||
RNN example</a> : demonstrates PyTorch’s built in RNN modules for
|
||
language modeling</li>
|
||
<li><a href="https://github.com/spro/practical-pytorch">Practical
|
||
PyTorch tutorials</a> by Sean Robertson : focuses on using RNNs for
|
||
Natural Language Processing</li>
|
||
<li><a
|
||
href="https://github.com/rguthrie3/DeepLearningForNLPInPytorch">Deep
|
||
Learning For NLP In PyTorch</a> by Robert Guthrie : written for a
|
||
Natural Language Processing class at Georgia Tech</li>
|
||
</ul></li>
|
||
<li><a href="http://deeplearning4j.org/">DL4J</a> by <a
|
||
href="http://www.skymind.io/">Skymind</a> : Deep Learning library for
|
||
Java, Scala & Clojure on Hadoop, Spark & GPUs
|
||
<ul>
|
||
<li><a href="http://deeplearning4j.org/">Documentation</a> (Also in <a
|
||
href="http://deeplearning4j.org/zh-index.html">Chinese</a>, <a
|
||
href="http://deeplearning4j.org/ja-index.html">Japanese</a>, <a
|
||
href="http://deeplearning4j.org/kr-index.html">Korean</a>) : <a
|
||
href="http://deeplearning4j.org/usingrnns.html">RNN</a>, <a
|
||
href="http://deeplearning4j.org/lstm.html">LSTM</a></li>
|
||
<li><a
|
||
href="https://github.com/deeplearning4j/dl4j-examples/tree/master/dl4j-examples/src/main/java/org/deeplearning4j/examples/recurrent">rnn
|
||
examples</a></li>
|
||
</ul></li>
|
||
<li>Etc.
|
||
<ul>
|
||
<li><a
|
||
href="http://neon.nervanasys.com/docs/latest/index.html">Neon</a>: new
|
||
deep learning library in Python, with support for RNN/LSTM, and a fast
|
||
image captioning model</li>
|
||
<li><a href="https://github.com/IDSIA/brainstorm">Brainstorm</a>: deep
|
||
learning library in Python, developed by IDSIA, thereby including
|
||
various recurrent structures</li>
|
||
<li><a href="http://chainer.org/">Chainer</a> : new, flexible deep
|
||
learning library in Python</li>
|
||
<li><a href="http://joschu.github.io/">CGT</a>(Computational Graph
|
||
Toolkit) : replicates Theano’s API, but with very short compilation time
|
||
and multithreading</li>
|
||
<li><a href="https://sourceforge.net/p/rnnl/wiki/Home/">RNNLIB</a> by
|
||
Alex Graves : C++ based LSTM library</li>
|
||
<li><a href="http://rnnlm.org/">RNNLM</a> by Tomas Mikolov : C++ based
|
||
simple code</li>
|
||
<li><a href="https://github.com/yandex/faster-rnnlm">faster-RNNLM</a> of
|
||
Yandex : C++ based rnnlm implementation aimed to handle huge
|
||
datasets</li>
|
||
<li><a href="https://github.com/karpathy/neuraltalk">neuraltalk</a> by
|
||
Andrej Karpathy : numpy-based RNN/LSTM implementation</li>
|
||
<li><a
|
||
href="https://gist.github.com/karpathy/587454dc0146a6ae21fc">gist</a> by
|
||
Andrej Karpathy : raw numpy code that implements an efficient batched
|
||
LSTM</li>
|
||
<li><a href="https://github.com/karpathy/recurrentjs">Recurrentjs</a> by
|
||
Andrej Karpathy : a beta javascript library for RNN</li>
|
||
<li><a href="https://github.com/5vision/DARQN">DARQN</a> by 5vision :
|
||
Deep Attention Recurrent Q-Network</li>
|
||
</ul></li>
|
||
</ul>
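<p>To make the char-rnn style entries above concrete, here is a minimal sketch of a character-level LSTM language model in PyTorch. It is a toy illustration with a made-up corpus and hyperparameters, not any of the cited implementations:</p>
<pre><code># Minimal character-level LSTM language model (toy sketch, not a cited implementation).
# Assumes PyTorch is installed; corpus, sizes and the training loop are illustrative only.
import torch
import torch.nn as nn

text = "hello recurrent world. " * 200            # stand-in for a real text corpus
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}        # character to integer id
data = torch.tensor([stoi[c] for c in text])

class CharLSTM(nn.Module):
    def __init__(self, vocab, embed=32, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)  # h: (batch, seq, hidden)
        return self.head(h), state                  # logits over the next character

model = CharLSTM(len(chars))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

seq_len, batch = 64, 16
for step in range(200):                             # tiny training loop
    starts = [int(i) for i in torch.randint(0, len(data) - seq_len - 1, (batch,))]
    x = torch.stack([data[s:s + seq_len] for s in starts])
    y = torch.stack([data[s + 1:s + seq_len + 1] for s in starts])  # next-char targets
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
</code></pre>
<p>Sampling then works by feeding the model one character at a time and carrying the LSTM state forward; the Torch and TensorFlow repositories above follow the same pattern at much larger scale.</p>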
<h2 id="theory">Theory</h2>
|
||
<h3 id="lectures">Lectures</h3>
|
||
<ul>
|
||
<li>Stanford NLP (<a
|
||
href="http://cs224d.stanford.edu/index.html">CS224d</a>) by Richard
|
||
Socher
|
||
<ul>
|
||
<li><a
|
||
href="http://cs224d.stanford.edu/lecture_notes/LectureNotes3.pdf">Lecture
|
||
Note 3</a> : neural network basics</li>
|
||
<li><a
|
||
href="http://cs224d.stanford.edu/lecture_notes/LectureNotes4.pdf">Lecture
|
||
Note 4</a> : RNN language models, bi-directional RNN, GRU, LSTM</li>
|
||
</ul></li>
|
||
<li>Stanford vision (<a href="http://cs231n.github.io/">CS231n</a>) by
|
||
Andrej Karpathy
|
||
<ul>
|
||
<li>About NN basic, and CNN</li>
|
||
</ul></li>
|
||
<li>Oxford <a
|
||
href="https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/">Machine
|
||
Learning</a> by Nando de Freitas
|
||
<ul>
|
||
<li><a href="https://www.youtube.com/watch?v=56TYLaQN4N8">Lecture 12</a>
|
||
: Recurrent neural networks and LSTMs</li>
|
||
<li><a href="https://www.youtube.com/watch?v=-yX1SYeDHbg">Lecture 13</a>
|
||
: (guest lecture) Alex Graves on Hallucination with RNNs</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h3 id="books-thesis">Books / Thesis</h3>
|
||
<ul>
|
||
<li>Alex Graves (2008)
|
||
<ul>
|
||
<li><a href="http://www.cs.toronto.edu/~graves/preprint.pdf">Supervised
|
||
Sequence Labelling with Recurrent Neural Networks</a></li>
|
||
</ul></li>
|
||
<li>Tomas Mikolov (2012)
|
||
<ul>
|
||
<li><a
|
||
href="http://www.fit.vutbr.cz/~imikolov/rnnlm/thesis.pdf">Statistical
|
||
Language Models based on Neural Networks</a></li>
|
||
</ul></li>
|
||
<li>Ilya Sutskever (2013)
|
||
<ul>
|
||
<li><a
|
||
href="http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf">Training
|
||
Recurrent Neural Networks</a></li>
|
||
</ul></li>
|
||
<li>Richard Socher (2014)
|
||
<ul>
|
||
<li><a href="http://nlp.stanford.edu/~socherr/thesis.pdf">Recursive Deep
|
||
Learning for Natural Language Processing and Computer Vision</a></li>
|
||
</ul></li>
|
||
<li>Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016)
|
||
<ul>
|
||
<li><a href="http://www.deeplearningbook.org/contents/rnn.html">The Deep
|
||
Learning Book chapter 10</a></li>
|
||
</ul></li>
|
||
</ul>
|
||
<h3 id="architecture-variants">Architecture Variants</h3>
|
||
<h4 id="structure">Structure</h4>
|
||
<ul>
|
||
<li>Bi-directional RNN [<a
|
||
href="http://www.di.ufpe.br/~fnj/RNA/bibliografia/BRNN.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Mike Schuster and Kuldip K. Paliwal, <em>Bidirectional Recurrent
|
||
Neural Networks</em>, Trans. on Signal Processing 1997</li>
|
||
</ul></li>
|
||
<li>Multi-dimensional RNN [<a
|
||
href="http://arxiv.org/pdf/0705.2011.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Alex Graves, Santiago Fernandez, and Jurgen Schmidhuber,
|
||
<em>Multi-Dimensional Recurrent Neural Networks</em>, ICANN 2007</li>
|
||
</ul></li>
|
||
<li>GFRNN [<a href="http://arxiv.org/pdf/1502.02367">Paper-arXiv</a>]
|
||
[<a
|
||
href="http://jmlr.org/proceedings/papers/v37/chung15.pdf">Paper-ICML</a>]
|
||
[<a
|
||
href="http://jmlr.org/proceedings/papers/v37/chung15-supp.pdf">Supplementary</a>]
|
||
<ul>
|
||
<li>Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio,
|
||
<em>Gated Feedback Recurrent Neural Networks</em>, arXiv:1502.02367 /
|
||
ICML 2015</li>
|
||
</ul></li>
|
||
<li>Tree-Structured RNNs
|
||
<ul>
|
||
<li>Kai Sheng Tai, Richard Socher, and Christopher D. Manning,
|
||
<em>Improved Semantic Representations From Tree-Structured Long
|
||
Short-Term Memory Networks</em>, arXiv:1503.00075 / ACL 2015 [<a
|
||
href="http://arxiv.org/pdf/1503.00075">Paper</a>]</li>
|
||
<li>Samuel R. Bowman, Christopher D. Manning, and Christopher Potts,
|
||
<em>Tree-structured composition in neural networks without
|
||
tree-structured architectures</em>, arXiv:1506.04834 [<a
|
||
href="http://arxiv.org/pdf/1506.04834">Paper</a>]</li>
|
||
</ul></li>
|
||
<li>Grid LSTM [<a href="http://arxiv.org/pdf/1507.01526">Paper</a>] [<a
|
||
href="https://github.com/coreylynch/grid-lstm">Code</a>]
|
||
<ul>
|
||
<li>Nal Kalchbrenner, Ivo Danihelka, and Alex Graves, <em>Grid Long
|
||
Short-Term Memory</em>, arXiv:1507.01526</li>
|
||
</ul></li>
|
||
<li>Segmental RNN [<a
|
||
href="http://arxiv.org/pdf/1511.06018v2.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Lingpeng Kong, Chris Dyer, Noah Smith, “Segmental Recurrent Neural
|
||
Networks”, ICLR 2016.</li>
|
||
</ul></li>
|
||
<li>Seq2seq for Sets [<a
|
||
href="http://arxiv.org/pdf/1511.06391v4.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Oriol Vinyals, Samy Bengio, Manjunath Kudlur, “Order Matters:
|
||
Sequence to sequence for sets”, ICLR 2016.</li>
|
||
</ul></li>
|
||
<li>Hierarchical Recurrent Neural Networks [<a
|
||
href="http://arxiv.org/abs/1609.01704">Paper</a>]
|
||
<ul>
|
||
<li>Junyoung Chung, Sungjin Ahn, Yoshua Bengio, “Hierarchical Multiscale
|
||
Recurrent Neural Networks”, arXiv:1609.01704</li>
|
||
</ul></li>
|
||
</ul>
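<p>As a quick illustration of the first entry above: a bi-directional RNN simply runs one recurrent network forward and one backward over the sequence and concatenates their hidden states at each time step. A minimal, hypothetical PyTorch sketch (shapes and sizes invented for the example):</p>
<pre><code># Bi-directional LSTM in one call: PyTorch concatenates forward and backward states.
import torch
import torch.nn as nn

x = torch.randn(8, 20, 10)                 # (batch, time, features) dummy input
birnn = nn.LSTM(input_size=10, hidden_size=16,
                batch_first=True, bidirectional=True)
out, _ = birnn(x)                          # out: (8, 20, 32) = [forward ; backward]
fwd, bwd = out[..., :16], out[..., 16:]    # per-direction hidden states
</code></pre>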
<h4 id="memory">Memory</h4>
|
||
<ul>
|
||
<li>LSTM [<a
|
||
href="http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Sepp Hochreiter and Jurgen Schmidhuber, <em>Long Short-Term
|
||
Memory</em>, Neural Computation 1997</li>
|
||
</ul></li>
|
||
<li>GRU (Gated Recurrent Unit) [<a
|
||
href="http://arxiv.org/pdf/1406.1078.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Kyunghyun Cho, Bart van Berrienboer, Caglar Gulcehre, Dzmitry
|
||
Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio,
|
||
<em>Learning Phrase Representations using RNN Encoder-Decoder for
|
||
Statistical Machine Translation</em>, arXiv:1406.1078 / EMNLP 2014</li>
|
||
</ul></li>
|
||
<li>NTM [<a href="http://arxiv.org/pdf/1410.5401">Paper</a>]
|
||
<ul>
|
||
<li>A.Graves, G. Wayne, and I. Danihelka., <em>Neural Turing
|
||
Machines,</em> arXiv preprint arXiv:1410.5401</li>
|
||
</ul></li>
|
||
<li>Neural GPU [<a href="http://arxiv.org/pdf/1511.08228.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Łukasz Kaiser, Ilya Sutskever, arXiv:1511.08228 / ICML 2016 (under
|
||
review)</li>
|
||
</ul></li>
|
||
<li>Memory Network [<a href="http://arxiv.org/pdf/1410.3916">Paper</a>]
|
||
<ul>
|
||
<li>Jason Weston, Sumit Chopra, Antoine Bordes, <em>Memory
|
||
Networks,</em> arXiv:1410.3916</li>
|
||
</ul></li>
|
||
<li>Pointer Network [<a
|
||
href="http://arxiv.org/pdf/1506.03134">Paper</a>]
|
||
<ul>
|
||
<li>Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly, <em>Pointer
|
||
Networks</em>, arXiv:1506.03134 / NIPS 2015</li>
|
||
</ul></li>
|
||
<li>Deep Attention Recurrent Q-Network [<a
|
||
href="http://arxiv.org/abs/1512.01693">Paper</a>]
|
||
<ul>
|
||
<li>Ivan Sorokin, Alexey Seleznev, Mikhail Pavlov, Aleksandr Fedorov,
|
||
Anastasiia Ignateva, <em>Deep Attention Recurrent Q-Network</em> ,
|
||
arXiv:1512.01693</li>
|
||
</ul></li>
|
||
<li>Dynamic Memory Networks [<a
|
||
href="http://arxiv.org/abs/1506.07285">Paper</a>]
|
||
<ul>
|
||
<li>Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James
|
||
Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, Richard Socher,
|
||
“Ask Me Anything: Dynamic Memory Networks for Natural Language
|
||
Processing”, arXiv:1506.07285</li>
|
||
</ul></li>
|
||
</ul>
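<p>For quick reference, a sketch of the update equations behind the first two entries above, in the now-standard formulation (LSTM written with the commonly used forget gate, which postdates the original 1997 paper; GRU as in Cho et al.). Here x_t is the input, h_t the hidden state, σ the logistic sigmoid, and ⊙ elementwise multiplication:</p>
<pre><code>% LSTM cell (with forget gate)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)          % forget gate
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)          % input gate
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)          % output gate
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)   % candidate cell state
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

% GRU cell
z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)          % update gate
r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)          % reset gate
\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
</code></pre>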
<h3 id="surveys">Surveys</h3>
|
||
<ul>
|
||
<li>Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, <a
|
||
href="http://www.nature.com/nature/journal/v521/n7553/pdf/nature14539.pdf">Deep
|
||
Learning</a>, Nature 2015</li>
|
||
<li>Klaus Greff, Rupesh Kumar Srivastava, Jan Koutnik, Bas R.
|
||
Steunebrink, Jurgen Schmidhuber, <a
|
||
href="http://arxiv.org/pdf/1503.04069">LSTM: A Search Space Odyssey</a>,
|
||
arXiv:1503.04069</li>
|
||
<li>Zachary C. Lipton, <a href="http://arxiv.org/pdf/1506.00019">A
|
||
Critical Review of Recurrent Neural Networks for Sequence Learning</a>,
|
||
arXiv:1506.00019</li>
|
||
<li>Andrej Karpathy, Justin Johnson, Li Fei-Fei, <a
|
||
href="http://arxiv.org/pdf/1506.02078">Visualizing and Understanding
|
||
Recurrent Networks</a>, arXiv:1506.02078</li>
|
||
<li>Rafal Jozefowicz, Wojciech Zaremba, Ilya Sutskever, <a
|
||
href="http://jmlr.org/proceedings/papers/v37/jozefowicz15.pdf">An
|
||
Empirical Exploration of Recurrent Network Architectures</a>, ICML,
|
||
2015.</li>
|
||
</ul>
|
||
<h2 id="applications">Applications</h2>
|
||
<h3 id="natural-language-processing">Natural Language Processing</h3>
|
||
<h4 id="language-modeling">Language Modeling</h4>
|
||
<ul>
|
||
<li>Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan “Honza” Cernocky,
|
||
Sanjeev Khudanpur, <em>Recurrent Neural Network based Language
|
||
Model</em>, Interspeech 2010 [<a
|
||
href="http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf">Paper</a>]</li>
|
||
<li>Tomas Mikolov, Stefan Kombrink, Lukas Burget, Jan “Honza” Cernocky,
|
||
Sanjeev Khudanpur, <em>Extensions of Recurrent Neural Network Language
|
||
Model</em>, ICASSP 2011 [<a
|
||
href="http://www.fit.vutbr.cz/research/groups/speech/publi/2011/mikolov_icassp2011_5528.pdf">Paper</a>]</li>
|
||
<li>Stefan Kombrink, Tomas Mikolov, Martin Karafiat, Lukas Burget,
|
||
<em>Recurrent Neural Network based Language Modeling in Meeting
|
||
Recognition</em>, Interspeech 2011 [<a
|
||
href="http://www.fit.vutbr.cz/~imikolov/rnnlm/ApplicationOfRNNinMeetingRecognition_IS2011.pdf">Paper</a>]</li>
|
||
<li>Jiwei Li, Minh-Thang Luong, and Dan Jurafsky, <em>A Hierarchical
|
||
Neural Autoencoder for Paragraphs and Documents</em>, ACL 2015 [<a
|
||
href="http://arxiv.org/pdf/1506.01057">Paper</a>], [<a
|
||
href="https://github.com/jiweil/Hierarchical-Neural-Autoencoder">Code</a>]</li>
|
||
<li>Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, and Richard S. Zemel,
|
||
<em>Skip-Thought Vectors</em>, arXiv:1506.06726 / NIPS 2015 [<a
|
||
href="http://arxiv.org/pdf/1506.06726.pdf">Paper</a>]</li>
|
||
<li>Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush,
|
||
<em>Character-Aware Neural Language Models</em>, arXiv:1508.06615 [<a
|
||
href="http://arxiv.org/pdf/1508.06615">Paper</a>]</li>
|
||
<li>Xingxing Zhang, Liang Lu, and Mirella Lapata, <em>Tree Recurrent
|
||
Neural Networks with Application to Language Modeling</em>,
|
||
arXiv:1511.00060 [<a
|
||
href="http://arxiv.org/pdf/1511.00060.pdf">Paper</a>]</li>
|
||
<li>Felix Hill, Antoine Bordes, Sumit Chopra, and Jason Weston, <em>The
|
||
Goldilocks Principle: Reading children’s books with explicit memory
|
||
representations</em>, arXiv:1511.0230 [<a
|
||
href="http://arxiv.org/pdf/1511.02301.pdf">Paper</a>]</li>
|
||
</ul>
|
||
<h4 id="speech-recognition">Speech Recognition</h4>
|
||
<ul>
|
||
<li>Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman
|
||
Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick
|
||
Nguyen, Tara N. Sainath, and Brian Kingsbury, <em>Deep Neural Networks
|
||
for Acoustic Modeling in Speech Recognition</em>, IEEE Signam Processing
|
||
Magazine 2012 [<a
|
||
href="http://cs224d.stanford.edu/papers/maas_paper.pdf">Paper</a>]</li>
|
||
<li>Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton, <em>Speech
|
||
Recognition with Deep Recurrent Neural Networks</em>, arXiv:1303.5778 /
|
||
ICASSP 2013 [<a
|
||
href="http://www.cs.toronto.edu/~fritz/absps/RNN13.pdf">Paper</a>]</li>
|
||
<li>Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and
|
||
Yoshua Bengio, <em>Attention-Based Models for Speech Recognition</em>,
|
||
arXiv:1506.07503 / NIPS 2015 [<a
|
||
href="http://arxiv.org/pdf/1506.07503">Paper</a>]</li>
|
||
<li>Haşim Sak, Andrew Senior, Kanishka Rao, and Françoise Beaufays.
|
||
<em>Fast and Accurate Recurrent Neural Network Acoustic Models for
|
||
Speech Recognition</em>, arXiv:1507.06947 2015 [<a
|
||
href="http://arxiv.org/pdf/1507.06947v1.pdf">Paper</a>].</li>
|
||
</ul>
|
||
<h4 id="machine-translation">Machine Translation</h4>
|
||
<ul>
|
||
<li>Oxford [<a
|
||
href="http://www.nal.ai/papers/kalchbrennerblunsom_emnlp13">Paper</a>]
|
||
<ul>
|
||
<li>Nal Kalchbrenner and Phil Blunsom, <em>Recurrent Continuous
|
||
Translation Models</em>, EMNLP 2013</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal
|
||
<ul>
|
||
<li>Kyunghyun Cho, Bart van Berrienboer, Caglar Gulcehre, Dzmitry
|
||
Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio,
|
||
<em>Learning Phrase Representations using RNN Encoder-Decoder for
|
||
Statistical Machine Translation</em>, arXiv:1406.1078 / EMNLP 2014 [<a
|
||
href="http://arxiv.org/pdf/1406.1078">Paper</a>]</li>
|
||
<li>Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, and Yoshua
|
||
Bengio, <em>On the Properties of Neural Machine Translation:
|
||
Encoder-Decoder Approaches</em>, SSST-8 2014 [<a
|
||
href="http://www.aclweb.org/anthology/W14-4012">Paper</a>]</li>
|
||
<li>Jean Pouget-Abadie, Dzmitry Bahdanau, Bart van Merrienboer,
|
||
Kyunghyun Cho, and Yoshua Bengio, <em>Overcoming the Curse of Sentence
|
||
Length for Neural Machine Translation using Automatic Segmentation</em>,
|
||
SSST-8 2014</li>
|
||
<li>Dzmitry Bahdanau, KyungHyun Cho, and Yoshua Bengio, <em>Neural
|
||
Machine Translation by Jointly Learning to Align and Translate</em>,
|
||
arXiv:1409.0473 / ICLR 2015 [<a
|
||
href="http://arxiv.org/pdf/1409.0473">Paper</a>]</li>
|
||
<li>Sebastian Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio,
|
||
<em>On using very large target vocabulary for neural machine
|
||
translation</em>, arXiv:1412.2007 / ACL 2015 [<a
|
||
href="http://arxiv.org/pdf/1412.2007.pdf">Paper</a>]</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal + Middle East Tech. Univ. + Univ. Maine [<a
|
||
href="http://arxiv.org/pdf/1503.03535.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Caglar Gulcehre, Orhan Firat, Kelvin Xu, Kyunghyun Cho, Loic
|
||
Barrault, Huei-Chi Lin, Fethi Bougares, Holger Schwenk, and Yoshua
|
||
Bengio, <em>On Using Monolingual Corpora in Neural Machine
|
||
Translation</em>, arXiv:1503.03535</li>
|
||
</ul></li>
|
||
<li>Google [<a
|
||
href="http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, <em>Sequence to
|
||
Sequence Learning with Neural Networks</em>, arXiv:1409.3215 / NIPS
|
||
2014</li>
|
||
</ul></li>
|
||
<li>Google + NYU [<a href="http://arxiv.org/pdf/1410.8206">Paper</a>]
|
||
<ul>
|
||
<li>Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, and
|
||
Wojciech Zaremba, <em>Addressing the Rare Word Problem in Neural Machine
|
||
Transltaion</em>, arXiv:1410.8206 / ACL 2015</li>
|
||
</ul></li>
|
||
<li>ICT + Huawei [<a
|
||
href="http://arxiv.org/pdf/1506.06442.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, and Qun Liu, <em>A
|
||
Deep Memory-based Architecture for Sequence-to-Sequence Learning</em>,
|
||
arXiv:1506.06442</li>
|
||
</ul></li>
|
||
<li>Stanford [<a href="http://arxiv.org/pdf/1508.04025.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Minh-Thang Luong, Hieu Pham, and Christopher D. Manning,
|
||
<em>Effective Approaches to Attention-based Neural Machine
|
||
Translation</em>, arXiv:1508.04025</li>
|
||
</ul></li>
|
||
<li>Middle East Tech. Univ. + NYU + Univ. Montreal [<a
|
||
href="http://arxiv.org/pdf/1601.01073.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Orhan Firat, Kyunghyun Cho, and Yoshua Bengio, <em>Multi-Way,
|
||
Multilingual Neural Machine Translation with a Shared Attention
|
||
Mechanism</em>, arXiv:1601.01073</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h4 id="conversation-modeling">Conversation Modeling</h4>
|
||
<ul>
|
||
<li>Lifeng Shang, Zhengdong Lu, and Hang Li, <em>Neural Responding
|
||
Machine for Short-Text Conversation</em>, arXiv:1503.02364 / ACL 2015
|
||
[<a href="http://arxiv.org/pdf/1503.02364">Paper</a>]</li>
|
||
<li>Oriol Vinyals and Quoc V. Le, <em>A Neural Conversational
|
||
Model</em>, arXiv:1506.05869 [<a
|
||
href="http://arxiv.org/pdf/1506.05869">Paper</a>]</li>
|
||
<li>Ryan Lowe, Nissan Pow, Iulian V. Serban, and Joelle Pineau, <em>The
|
||
Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured
|
||
Multi-Turn Dialogue Systems</em>, arXiv:1506.08909 [<a
|
||
href="http://arxiv.org/pdf/1506.08909">Paper</a>]</li>
|
||
<li>Jesse Dodge, Andreea Gane, Xiang Zhang, Antoine Bordes, Sumit
|
||
Chopra, Alexander Miller, Arthur Szlam, and Jason Weston, <em>Evaluating
|
||
Prerequisite Qualities for Learning End-to-End Dialog Systems</em>,
|
||
arXiv:1511.06931 [<a
|
||
href="http://arxiv.org/pdf/1511.06931">Paper</a>]</li>
|
||
<li>Jason Weston, <em>Dialog-based Language Learning</em>,
|
||
arXiv:1604.06045, [<a
|
||
href="http://arxiv.org/pdf/1604.06045">Paper</a>]</li>
|
||
<li>Antoine Bordes and Jason Weston, <em>Learning End-to-End
|
||
Goal-Oriented Dialog</em>, arXiv:1605.07683 [<a
|
||
href="http://arxiv.org/pdf/1605.07683">Paper</a>]</li>
|
||
</ul>
|
||
<h4 id="question-answering">Question Answering</h4>
|
||
<ul>
|
||
<li>FAIR
|
||
<ul>
|
||
<li>Jason Weston, Antoine Bordes, Sumit Chopra, Tomas Mikolov, and
|
||
Alexander M. Rush, <em>Towards AI-Complete Question Answering: A Set of
|
||
Prerequisite Toy Tasks</em>, arXiv:1502.05698 [<a
|
||
href="https://research.facebook.com/researchers/1543934539189348">Web</a>]
|
||
[<a href="http://arxiv.org/pdf/1502.05698.pdf">Paper</a>]</li>
|
||
<li>Antoine Bordes, Nicolas Usunier, Sumit Chopra, and Jason Weston,
|
||
<em>Simple Question answering with Memory Networks</em>,
|
||
arXiv:1506.02075 [<a
|
||
href="http://arxiv.org/abs/1506.02075">Paper</a>]</li>
|
||
<li>Felix Hill, Antoine Bordes, Sumit Chopra, Jason Weston, “The
|
||
Goldilocks Principle: Reading Children’s Books with Explicit Memory
|
||
Representations”, ICLR 2016 [<a
|
||
href="http://arxiv.org/abs/1511.02301">Paper</a>]</li>
|
||
</ul></li>
|
||
<li>DeepMind + Oxford [<a
|
||
href="http://arxiv.org/pdf/1506.03340.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Karl M. Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt,
|
||
Will Kay, Mustafa Suleyman, and Phil Blunsom, <em>Teaching Machines to
|
||
Read and Comprehend</em>, arXiv:1506.03340 / NIPS 2015</li>
|
||
</ul></li>
|
||
<li>MetaMind [<a href="http://arxiv.org/pdf/1506.07285.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Ankit Kumar, Ozan Irsoy, Jonathan Su, James Bradbury, Robert
|
||
English, Brian Pierce, Peter Ondruska, Mohit Iyyer, Ishaan Gulrajani,
|
||
and Richard Socher, <em>Ask Me Anything: Dynamic Memory Networks for
|
||
Natural Language Processing</em>, arXiv:1506.07285</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h3 id="computer-vision">Computer Vision</h3>
|
||
<h4 id="object-recognition">Object Recognition</h4>
|
||
<ul>
|
||
<li>Pedro Pinheiro and Ronan Collobert, <em>Recurrent Convolutional
|
||
Neural Networks for Scene Labeling</em>, ICML 2014 [<a
|
||
href="http://jmlr.org/proceedings/papers/v32/pinheiro14.pdf">Paper</a>]</li>
|
||
<li>Ming Liang and Xiaolin Hu, <em>Recurrent Convolutional Neural
|
||
Network for Object Recognition</em>, CVPR 2015 [<a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Liang_Recurrent_Convolutional_Neural_2015_CVPR_paper.pdf">Paper</a>]</li>
|
||
<li>Wonmin Byeon, Thomas Breuel, Federico Raue1, and Marcus Liwicki1,
|
||
<em>Scene Labeling with LSTM Recurrent Neural Networks</em>, CVPR 2015
|
||
[<a
|
||
href="http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Byeon_Scene_Labeling_With_2015_CVPR_paper.pdf">Paper</a>]</li>
|
||
<li>Mircea Serban Pavel, Hannes Schulz, and Sven Behnke, <em>Recurrent
|
||
Convolutional Neural Networks for Object-Class Segmentation of RGB-D
|
||
Video</em>, IJCNN 2015 [<a
|
||
href="http://www.ais.uni-bonn.de/papers/IJCNN_2015_Pavel.pdf">Paper</a>]</li>
|
||
<li>Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav
|
||
Vineet, Zhizhong Su, Dalong Du, Chang Huang, and Philip H. S. Torr,
|
||
<em>Conditional Random Fields as Recurrent Neural Networks</em>,
|
||
arXiv:1502.03240 [<a
|
||
href="http://arxiv.org/pdf/1502.03240">Paper</a>]</li>
|
||
<li>Xiaodan Liang, Xiaohui Shen, Donglai Xiang, Jiashi Feng, Liang Lin,
|
||
and Shuicheng Yan, <em>Semantic Object Parsing with Local-Global Long
|
||
Short-Term Memory</em>, arXiv:1511.04510 [<a
|
||
href="http://arxiv.org/pdf/1511.04510.pdf">Paper</a>]</li>
|
||
<li>Sean Bell, C. Lawrence Zitnick, Kavita Bala, and Ross Girshick,
|
||
<em>Inside-Outside Net: Detecting Objects in Context with Skip Pooling
|
||
and Recurrent Neural Networks</em>, arXiv:1512.04143 / ICCV 2015
|
||
workshop [<a href="http://arxiv.org/pdf/1512.04143">Paper</a>]</li>
|
||
</ul>
|
||
<h4 id="visual-tracking">Visual Tracking</h4>
|
||
<ul>
|
||
<li>Quan Gan, Qipeng Guo, Zheng Zhang, and Kyunghyun Cho, <em>First Step
|
||
toward Model-Free, Anonymous Object Tracking with Recurrent Neural
|
||
Networks</em>, arXiv:1511.06425 [<a
|
||
href="http://arxiv.org/pdf/1511.06425">Paper</a>]</li>
|
||
</ul>
|
||
<h4 id="image-generation">Image Generation</h4>
|
||
<ul>
|
||
<li>Karol Gregor, Ivo Danihelka, Alex Graves, Danilo J. Rezende, and
|
||
Daan Wierstra, <em>DRAW: A Recurrent Neural Network for Image
|
||
Generation,</em> ICML 2015 [<a
|
||
href="http://arxiv.org/pdf/1502.04623">Paper</a>]</li>
|
||
<li>Angeliki Lazaridou, Dat T. Nguyen, R. Bernardi, and M. Baroni,
|
||
<em>Unveiling the Dreams of Word Embeddings: Towards Language-Driven
|
||
Image Generation,</em> arXiv:1506.03500 [<a
|
||
href="http://arxiv.org/pdf/1506.03500">Paper</a>]</li>
|
||
<li>Lucas Theis and Matthias Bethge, <em>Generative Image Modeling Using
|
||
Spatial LSTMs,</em> arXiv:1506.03478 / NIPS 2015 [<a
|
||
href="http://arxiv.org/pdf/1506.03478">Paper</a>]</li>
|
||
<li>Aaron van den Oord, Nal Kalchbrenner, and Koray Kavukcuoglu,
|
||
<em>Pixel Recurrent Neural Networks,</em> arXiv:1601.06759 [<a
|
||
href="http://arxiv.org/abs/1601.06759">Paper</a>]</li>
|
||
</ul>
|
||
<h4 id="video-analysis">Video Analysis</h4>
|
||
<ul>
|
||
<li>Univ. Toronto [<a href="http://arxiv.org/abs/1502.04681">paper</a>]
|
||
<ul>
|
||
<li>Nitish Srivastava, Elman Mansimov, Ruslan Salakhutdinov,
|
||
<em>Unsupervised Learning of Video Representations using LSTMs</em>,
|
||
arXiv:1502.04681 / ICML 2015</li>
|
||
</ul></li>
|
||
<li>Univ. Cambridge [<a
|
||
href="http://arxiv.org/abs/1511.06309">paper</a>]
|
||
<ul>
|
||
<li>Viorica Patraucean, Ankur Handa, Roberto Cipolla,
|
||
<em>Spatio-temporal video autoencoder with differentiable memory</em>,
|
||
arXiv:1511.06309</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h3 id="multimodal-cv-nlp">Multimodal (CV + NLP)</h3>
|
||
<h4 id="image-captioning">Image Captioning</h4>
|
||
<ul>
|
||
<li>UCLA + Baidu [<a
|
||
href="http://www.stat.ucla.edu/~junhua.mao/m-RNN.html">Web</a>] [<a
|
||
href="http://arxiv.org/pdf/1410.1090">Paper-arXiv1</a>], [<a
|
||
href="http://arxiv.org/pdf/1412.6632">Paper-arXiv2</a>]
|
||
<ul>
|
||
<li>Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, and Alan L. Yuille,
|
||
<em>Explain Images with Multimodal Recurrent Neural Networks</em>,
|
||
arXiv:1410.1090</li>
|
||
<li>Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan L.
|
||
Yuille, <em>Deep Captioning with Multimodal Recurrent Neural Networks
|
||
(m-RNN)</em>, arXiv:1412.6632 / ICLR 2015</li>
|
||
</ul></li>
|
||
<li>Univ. Toronto [<a href="http://arxiv.org/pdf/1411.2539">Paper</a>]
|
||
[<a href="http://deeplearning.cs.toronto.edu/i2t">Web demo</a>]
|
||
<ul>
|
||
<li>Ryan Kiros, Ruslan Salakhutdinov, and Richard S. Zemel, <em>Unifying
|
||
Visual-Semantic Embeddings with Multimodal Neural Language Models</em>,
|
||
arXiv:1411.2539 / TACL 2015</li>
|
||
</ul></li>
|
||
<li>Berkeley [<a href="http://jeffdonahue.com/lrcn/">Web</a>] [<a
|
||
href="http://arxiv.org/pdf/1411.4389">Paper</a>]
|
||
<ul>
|
||
<li>Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus
|
||
Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell,
|
||
<em>Long-term Recurrent Convolutional Networks for Visual Recognition
|
||
and Description</em>, arXiv:1411.4389 / CVPR 2015</li>
|
||
</ul></li>
|
||
<li>Google [<a href="http://arxiv.org/pdf/1411.4555">Paper</a>]
|
||
<ul>
|
||
<li>Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan,
|
||
<em>Show and Tell: A Neural Image Caption Generator</em>,
|
||
arXiv:1411.4555 / CVPR 2015</li>
|
||
</ul></li>
|
||
<li>Stanford <a
|
||
href="http://cs.stanford.edu/people/karpathy/deepimagesent/">[Web]</a>
|
||
<a
|
||
href="http://cs.stanford.edu/people/karpathy/cvpr2015.pdf">[Paper]</a>
|
||
<ul>
|
||
<li>Andrej Karpathy and Li Fei-Fei, <em>Deep Visual-Semantic Alignments
|
||
for Generating Image Description</em>, CVPR 2015</li>
|
||
</ul></li>
|
||
<li>Microsoft [<a href="http://arxiv.org/pdf/1411.4952">Paper</a>]
|
||
<ul>
|
||
<li>Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li
|
||
Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John
|
||
C. Platt, Lawrence Zitnick, and Geoffrey Zweig, <em>From Captions to
|
||
Visual Concepts and Back</em>, arXiv:1411.4952 / CVPR 2015</li>
|
||
</ul></li>
|
||
<li>CMU + Microsoft [<a
|
||
href="http://arxiv.org/pdf/1411.5654">Paper-arXiv</a>], [<a
|
||
href="http://www.cs.cmu.edu/~xinleic/papers/cvpr15_rnn.pdf">Paper-CVPR</a>]
|
||
<ul>
|
||
<li>Xinlei Chen, and C. Lawrence Zitnick, <em>Learning a Recurrent
|
||
Visual Representation for Image Caption Generation</em></li>
|
||
<li>Xinlei Chen, and C. Lawrence Zitnick, <em>Mind’s Eye: A Recurrent
|
||
Visual Representation for Image Caption Generation</em>, CVPR 2015</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal + Univ. Toronto [<a
|
||
href="http://kelvinxu.github.io/projects/capgen.html">Web</a>] [<a
|
||
href="http://www.cs.toronto.edu/~zemel/documents/captionAttn.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Kelvin Xu, Jimmy Lei Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville,
|
||
Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio, <em>Show,
|
||
Attend, and Tell: Neural Image Caption Generation with Visual
|
||
Attention</em>, arXiv:1502.03044 / ICML 2015</li>
|
||
</ul></li>
|
||
<li>Idiap + EPFL + Facebook [<a
|
||
href="http://arxiv.org/pdf/1502.03671">Paper</a>]
|
||
<ul>
|
||
<li>Remi Lebret, Pedro O. Pinheiro, and Ronan Collobert,
|
||
<em>Phrase-based Image Captioning</em>, arXiv:1502.03671 / ICML
|
||
2015</li>
|
||
</ul></li>
|
||
<li>UCLA + Baidu [<a href="http://arxiv.org/pdf/1504.06692">Paper</a>]
|
||
<ul>
|
||
<li>Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan L.
|
||
Yuille, <em>Learning like a Child: Fast Novel Visual Concept Learning
|
||
from Sentence Descriptions of Images</em>, arXiv:1504.06692</li>
|
||
</ul></li>
|
||
<li>MS + Berkeley
|
||
<ul>
|
||
<li>Jacob Devlin, Saurabh Gupta, Ross Girshick, Margaret Mitchell, and
|
||
C. Lawrence Zitnick, <em>Exploring Nearest Neighbor Approaches for Image
|
||
Captioning</em>, arXiv:1505.04467 (Note: technically not RNN) [<a
|
||
href="http://arxiv.org/pdf/1505.04467.pdf">Paper</a>]</li>
|
||
<li>Jacob Devlin, Hao Cheng, Hao Fang, Saurabh Gupta, Li Deng, Xiaodong
|
||
He, Geoffrey Zweig, and Margaret Mitchell, <em>Language Models for Image
|
||
Captioning: The Quirks and What Works</em>, arXiv:1505.01809 [<a
|
||
href="http://arxiv.org/pdf/1505.01809.pdf">Paper</a>]</li>
|
||
</ul></li>
|
||
<li>Adelaide [<a href="http://arxiv.org/pdf/1506.01144.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Qi Wu, Chunhua Shen, Anton van den Hengel, Lingqiao Liu, and Anthony
|
||
Dick, <em>Image Captioning with an Intermediate Attributes Layer</em>,
|
||
arXiv:1506.01144</li>
|
||
</ul></li>
|
||
<li>Tilburg [<a href="http://arxiv.org/pdf/1506.03694.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Grzegorz Chrupala, Akos Kadar, and Afra Alishahi, <em>Learning
|
||
language through pictures</em>, arXiv:1506.03694</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal [<a
|
||
href="http://arxiv.org/pdf/1507.01053.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Kyunghyun Cho, Aaron Courville, and Yoshua Bengio, <em>Describing
|
||
Multimedia Content using Attention-based Encoder-Decoder Networks</em>,
|
||
arXiv:1507.01053</li>
|
||
</ul></li>
|
||
<li>Cornell [<a href="http://arxiv.org/pdf/1508.02091.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Jack Hessel, Nicolas Savva, and Michael J. Wilber, <em>Image
|
||
Representations and New Domains in Neural Image Captioning</em>,
|
||
arXiv:1508.02091</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h4 id="video-captioning">Video Captioning</h4>
|
||
<ul>
|
||
<li>Berkeley [<a href="http://jeffdonahue.com/lrcn/">Web</a>] [<a
|
||
href="http://arxiv.org/pdf/1411.4389">Paper</a>]
|
||
<ul>
|
||
<li>Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus
|
||
Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell,
|
||
<em>Long-term Recurrent Convolutional Networks for Visual Recognition
|
||
and Description</em>, arXiv:1411.4389 / CVPR 2015</li>
|
||
</ul></li>
|
||
<li>UT Austin + UML + Berkeley [<a
|
||
href="http://arxiv.org/pdf/1412.4729">Paper</a>]
|
||
<ul>
|
||
<li>Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach,
|
||
Raymond Mooney, and Kate Saenko, <em>Translating Videos to Natural
|
||
Language Using Deep Recurrent Neural Networks</em>, arXiv:1412.4729</li>
|
||
</ul></li>
|
||
<li>Microsoft [<a href="http://arxiv.org/pdf/1505.01861">Paper</a>]
|
||
<ul>
|
||
<li>Yingwei Pan, Tao Mei, Ting Yao, Houqiang Li, and Yong Rui, <em>Joint
|
||
Modeling Embedding and Translation to Bridge Video and Language</em>,
|
||
arXiv:1505.01861</li>
|
||
</ul></li>
|
||
<li>UT Austin + Berkeley + UML [<a
|
||
href="http://arxiv.org/pdf/1505.00487">Paper</a>]
|
||
<ul>
|
||
<li>Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond
|
||
Mooney, Trevor Darrell, and Kate Saenko, <em>Sequence to Sequence–Video
|
||
to Text</em>, arXiv:1505.00487</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal + Univ. Sherbrooke [<a
|
||
href="http://arxiv.org/pdf/1502.08029.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher
|
||
Pal, Hugo Larochelle, and Aaron Courville, <em>Describing Videos by
|
||
Exploiting Temporal Structure</em>, arXiv:1502.08029</li>
|
||
</ul></li>
|
||
<li>MPI + Berkeley [<a
|
||
href="http://arxiv.org/pdf/1506.01698.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Anna Rohrbach, Marcus Rohrbach, and Bernt Schiele, <em>The
|
||
Long-Short Story of Movie Description</em>, arXiv:1506.01698</li>
|
||
</ul></li>
|
||
<li>Univ. Toronto + MIT [<a
|
||
href="http://arxiv.org/pdf/1506.06724.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel
|
||
Urtasun, Antonio Torralba, and Sanja Fidler, <em>Aligning Books and
|
||
Movies: Towards Story-like Visual Explanations by Watching Movies and
|
||
Reading Books</em>, arXiv:1506.06724</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal [<a
|
||
href="http://arxiv.org/pdf/1507.01053.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Kyunghyun Cho, Aaron Courville, and Yoshua Bengio, <em>Describing
|
||
Multimedia Content using Attention-based Encoder-Decoder Networks</em>,
|
||
arXiv:1507.01053</li>
|
||
</ul></li>
|
||
<li>Zhejiang Univ. + UTS [<a
|
||
href="http://arxiv.org/abs/1511.03476">Paper</a>]
|
||
<ul>
|
||
<li>Pingbo Pan, Zhongwen Xu, Yi Yang, Fei Wu, Yueting Zhuang,
|
||
<em>Hierarchical Recurrent Neural Encoder for Video Representation with
|
||
Application to Captioning</em>, arXiv:1511.03476</li>
|
||
</ul></li>
|
||
<li>Univ. Montreal + NYU + IBM [<a
|
||
href="http://arxiv.org/pdf/1511.04590.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, and Yoshua
|
||
Bengio, <em>Empirical performance upper bounds for image and video
|
||
captioning</em>, arXiv:1511.04590</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h4 id="visual-question-answering">Visual Question Answering</h4>
|
||
<ul>
|
||
<li>Virginia Tech. + MSR [<a href="http://www.visualqa.org/">Web</a>]
|
||
[<a href="http://arxiv.org/pdf/1505.00468">Paper</a>]
|
||
<ul>
|
||
<li>Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell,
|
||
Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh, <em>VQA: Visual
|
||
Question Answering</em>, arXiv:1505.00468 / CVPR 2015 SUNw:Scene
|
||
Understanding workshop</li>
|
||
</ul></li>
|
||
<li>MPI + Berkeley [<a
|
||
href="https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/vision-and-language/visual-turing-challenge/">Web</a>]
|
||
[<a href="http://arxiv.org/pdf/1505.01121">Paper</a>]
|
||
<ul>
|
||
<li>Mateusz Malinowski, Marcus Rohrbach, and Mario Fritz, <em>Ask Your
|
||
Neurons: A Neural-based Approach to Answering Questions about
|
||
Images</em>, arXiv:1505.01121</li>
|
||
</ul></li>
|
||
<li>Univ. Toronto [<a href="http://arxiv.org/pdf/1505.02074">Paper</a>]
|
||
[<a
|
||
href="http://www.cs.toronto.edu/~mren/imageqa/data/cocoqa/">Dataset</a>]
|
||
<ul>
|
||
<li>Mengye Ren, Ryan Kiros, and Richard Zemel, <em>Exploring Models and
|
||
Data for Image Question Answering</em>, arXiv:1505.02074 / ICML 2015
|
||
deep learning workshop</li>
|
||
</ul></li>
|
||
<li>Baidu + UCLA [<a href="http://arxiv.org/pdf/1505.05612">Paper</a>]
|
||
[<a href="">Dataset</a>]
|
||
<ul>
|
||
<li>Hauyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, and Wei
|
||
Xu, <em>Are You Talking to a Machine? Dataset and Methods for
|
||
Multilingual Image Question Answering</em>, arXiv:1505.05612 / NIPS
|
||
2015</li>
|
||
</ul></li>
|
||
<li>SNU + NAVER [<a href="http://arxiv.org/abs/1606.01455">Paper</a>]
|
||
<ul>
|
||
<li>Jin-Hwa Kim, Sang-Woo Lee, Dong-Hyun Kwak, Min-Oh Heo, Jeonghee Kim,
|
||
Jung-Woo Ha, Byoung-Tak Zhang, <em>Multimodal Residual Learning for
|
||
Visual QA</em>, arXiv:1606:01455</li>
|
||
</ul></li>
|
||
<li>UC Berkeley + Sony [<a
|
||
href="https://arxiv.org/pdf/1606.01847">Paper</a>]
|
||
<ul>
|
||
<li>Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor
|
||
Darrell, and Marcus Rohrbach, <em>Multimodal Compact Bilinear Pooling
|
||
for Visual Question Answering and Visual Grounding</em>,
|
||
arXiv:1606.01847</li>
|
||
</ul></li>
|
||
<li>Postech [<a href="http://arxiv.org/pdf/1606.03647.pdf">Paper</a>]
|
||
<ul>
|
||
<li>Hyeonwoo Noh and Bohyung Han, <em>Training Recurrent Answering Units
|
||
with Joint Loss Minimization for VQA</em>, arXiv:1606.03647</li>
|
||
</ul></li>
|
||
<li>SNU + NAVER [<a href="http://arxiv.org/abs/1610.04325">Paper</a>]
|
||
<ul>
|
||
<li>Jin-Hwa Kim, Kyoung Woon On, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak
|
||
Zhang, <em>Hadamard Product for Low-rank Bilinear Pooling</em>,
|
||
arXiv:1610.04325</li>
|
||
</ul></li>
|
||
<li>Video QA
|
||
<ul>
|
||
<li>CMU + UTS [<a href="http://arxiv.org/abs/1511.04670">paper</a>]
|
||
<ul>
|
||
<li>Linchao Zhu, Zhongwen Xu, Yi Yang, Alexander G. Hauptmann,
|
||
Uncovering Temporal Context for Video Question and Answering,
|
||
arXiv:1511.04670</li>
|
||
</ul></li>
|
||
<li>KIT + MIT + Univ. Toronto [<a
|
||
href="http://arxiv.org/abs/1512.02902">Paper</a>] [<a
|
||
href="http://movieqa.cs.toronto.edu/home/">Dataset</a>]
|
||
<ul>
|
||
<li>Makarand Tapaswi, Yukun Zhu, Rainer Stiefelhagen, Antonio Torralba,
|
||
Raquel Urtasun, Sanja Fidler, MovieQA: Understanding Stories in Movies
|
||
through Question-Answering, arXiv:1512.02902</li>
|
||
</ul></li>
|
||
</ul></li>
|
||
</ul>
|
||
<h4 id="turing-machines">Turing Machines</h4>
|
||
<ul>
|
||
<li>A.Graves, G. Wayne, and I. Danihelka., <em>Neural Turing
|
||
Machines,</em> arXiv preprint arXiv:1410.5401 [<a
|
||
href="http://arxiv.org/pdf/1410.5401">Paper</a>]</li>
|
||
<li>Jason Weston, Sumit Chopra, Antoine Bordes, <em>Memory
|
||
Networks,</em> arXiv:1410.3916 [<a
|
||
href="http://arxiv.org/pdf/1410.3916">Paper</a>]</li>
|
||
<li>Armand Joulin and Tomas Mikolov, <em>Inferring Algorithmic Patterns
|
||
with Stack-Augmented Recurrent Nets</em>, arXiv:1503.01007 / NIPS 2015
|
||
[<a href="http://arxiv.org/pdf/1503.01007">Paper</a>]</li>
|
||
<li>Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, and Rob Fergus,
|
||
<em>End-To-End Memory Networks</em>, arXiv:1503.08895 / NIPS 2015 [<a
|
||
href="http://arxiv.org/pdf/1503.08895">Paper</a>]</li>
|
||
<li>Wojciech Zaremba and Ilya Sutskever, <em>Reinforcement Learning
|
||
Neural Turing Machines,</em> arXiv:1505.00521 [<a
|
||
href="http://arxiv.org/pdf/1505.00521">Paper</a>]</li>
|
||
<li>Baolin Peng and Kaisheng Yao, <em>Recurrent Neural Networks with
|
||
External Memory for Language Understanding</em>, arXiv:1506.00195 [<a
|
||
href="http://arxiv.org/pdf/1506.00195.pdf">Paper</a>]</li>
|
||
<li>Fandong Meng, Zhengdong Lu, Zhaopeng Tu, Hang Li, and Qun Liu, <em>A
|
||
Deep Memory-based Architecture for Sequence-to-Sequence Learning</em>,
|
||
arXiv:1506.06442 [<a
|
||
href="http://arxiv.org/pdf/1506.06442.pdf">Paper</a>]</li>
|
||
<li>Arvind Neelakantan, Quoc V. Le, and Ilya Sutskever, <em>Neural
|
||
Programmer: Inducing Latent Programs with Gradient Descent</em>,
|
||
arXiv:1511.04834 [<a
|
||
href="http://arxiv.org/pdf/1511.04834.pdf">Paper</a>]</li>
|
||
<li>Scott Reed and Nando de Freitas, <em>Neural
|
||
Programmer-Interpreters</em>, arXiv:1511.06279 [<a
|
||
href="http://arxiv.org/pdf/1511.06279.pdf">Paper</a>]</li>
|
||
<li>Karol Kurach, Marcin Andrychowicz, and Ilya Sutskever, <em>Neural
|
||
Random-Access Machines</em>, arXiv:1511.06392 [<a
|
||
href="http://arxiv.org/pdf/1511.06392.pdf">Paper</a>]</li>
|
||
<li>Łukasz Kaiser and Ilya Sutskever, <em>Neural GPUs Learn
|
||
Algorithms</em>, arXiv:1511.08228 [<a
|
||
href="http://arxiv.org/pdf/1511.08228.pdf">Paper</a>]</li>
|
||
<li>Ethan Caballero, <em>Skip-Thought Memory Networks</em>,
|
||
arXiv:1511.6420 [<a
|
||
href="https://pdfs.semanticscholar.org/6b9f/0d695df0ce01d005eb5aa69386cb5fbac62a.pdf">Paper</a>]</li>
|
||
<li>Wojciech Zaremba, Tomas Mikolov, Armand Joulin, and Rob Fergus,
|
||
<em>Learning Simple Algorithms from Examples</em>, arXiv:1511.07275 [<a
|
||
href="http://arxiv.org/pdf/1511.07275.pdf">Paper</a>]</li>
|
||
</ul>
|
||
<h3 id="robotics">Robotics</h3>
|
||
<ul>
|
||
<li>Hongyuan Mei, Mohit Bansal, and Matthew R. Walter, <em>Listen,
|
||
Attend, and Walk: Neural Mapping of Navigational Instructions to Action
|
||
Sequences</em>, arXiv:1506.04089 [<a
|
||
href="http://arxiv.org/pdf/1506.04089.pdf">Paper</a>]</li>
|
||
<li>Marvin Zhang, Sergey Levine, Zoe McCarthy, Chelsea Finn, and Pieter
|
||
Abbeel, <em>Policy Learning with Continuous Memory States for Partially
|
||
Observed Robotic Control,</em> arXiv:1507.01273. <a
|
||
href="http://arxiv.org/pdf/1507.01273">[Paper]</a></li>
|
||
</ul>
|
||
<h3 id="other">Other</h3>
|
||
<ul>
|
||
<li>Alex Graves, <em>Generating Sequences With Recurrent Neural
|
||
Networks,</em> arXiv:1308.0850 <a
|
||
href="http://arxiv.org/abs/1308.0850">[Paper]</a></li>
|
||
<li>Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu,
|
||
<em>Recurrent Models of Visual Attention</em>, NIPS 2014 /
|
||
arXiv:1406.6247 [<a
|
||
href="http://arxiv.org/pdf/1406.6247.pdf">Paper</a>]</li>
|
||
<li>Wojciech Zaremba and Ilya Sutskever, <em>Learning to Execute</em>,
|
||
arXiv:1410.4615 [<a href="http://arxiv.org/pdf/1410.4615.pdf">Paper</a>]
|
||
[<a
|
||
href="https://github.com/wojciechz/learning_to_execute">Code</a>]</li>
|
||
<li>Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer,
|
||
<em>Scheduled Sampling for Sequence Prediction with Recurrent Neural
|
||
Networks</em>, arXiv:1506.03099 / NIPS 2015 [<a
|
||
href="http://arxiv.org/pdf/1506.03099">Paper</a>]</li>
|
||
<li>Bing Shuai, Zhen Zuo, Gang Wang, and Bing Wang, <em>DAG-Recurrent
|
||
Neural Networks For Scene Labeling</em>, arXiv:1509.00552 [<a
|
||
href="http://arxiv.org/pdf/1509.00552">Paper</a>]</li>
|
||
<li>Soren Kaae Sonderby, Casper Kaae Sonderby, Lars Maaloe, and Ole
|
||
Winther, <em>Recurrent Spatial Transformer Networks</em>,
|
||
arXiv:1509.05329 [<a
|
||
href="http://arxiv.org/pdf/1509.05329">Paper</a>]</li>
|
||
<li>Cesar Laurent, Gabriel Pereyra, Philemon Brakel, Ying Zhang, and
|
||
Yoshua Bengio, <em>Batch Normalized Recurrent Neural Networks</em>,
|
||
arXiv:1510.01378 [<a
|
||
href="http://arxiv.org/pdf/1510.01378">Paper</a>]</li>
|
||
<li>Jiwon Kim, Jung Kwon Lee, Kyoung Mu Lee, <em>Deeply-Recursive
|
||
Convolutional Network for Image Super-Resolution</em>, arXiv:1511.04491
|
||
<a href="http://arxiv.org/abs/1511.04491">[Paper]</a></li>
|
||
<li>Quan Gan, Qipeng Guo, Zheng Zhang, and Kyunghyun Cho, <em>First Step
|
||
toward Model-Free, Anonymous Object Tracking with Recurrent Neural
|
||
Networks</em>, arXiv:1511.06425 [<a
|
||
href="http://arxiv.org/pdf/1511.06425.pdf">Paper</a>]</li>
|
||
<li>Francesco Visin, Kyle Kastner, Aaron Courville, Yoshua Bengio,
|
||
Matteo Matteucci, and Kyunghyun Cho, <em>ReSeg: A Recurrent Neural
|
||
Network for Object Segmentation</em>, arXiv:1511.07053 [<a
|
||
href="http://arxiv.org/pdf/1511.07053.pdf">Paper</a>]</li>
|
||
<li>Juergen Schmidhuber, <em>On Learning to Think: Algorithmic
|
||
Information Theory for Novel Combinations of Reinforcement Learning
|
||
Controllers and Recurrent Neural World Models</em>, arXiv:1511.09249 <a
|
||
href="http://arxiv.org/pdf/1511.09249">[Paper]</a></li>
|
||
</ul>
|
||
<h2 id="datasets">Datasets</h2>
|
||
<ul>
|
||
<li>Speech Recognition
|
||
<ul>
|
||
<li><a href="http://www.openslr.org/resources.php">OpenSLR</a> (Open
|
||
Speech and Language Resources)
|
||
<ul>
|
||
<li><a href="http://www.openslr.org/12/">LibriSpeech ASR corpus</a></li>
|
||
</ul></li>
|
||
<li><a href="http://voxforge.org/home">VoxForge</a></li>
|
||
</ul></li>
|
||
<li>Image Captioning
|
||
<ul>
|
||
<li><a
|
||
href="http://nlp.cs.illinois.edu/HockenmaierGroup/Framing_Image_Description/KCCA.html">Flickr
|
||
8k</a></li>
|
||
<li><a href="http://shannon.cs.illinois.edu/DenotationGraph/">Flickr
|
||
30k</a></li>
|
||
<li><a href="http://mscoco.org/home/">Microsoft COCO</a></li>
|
||
</ul></li>
|
||
<li>Question Answering
|
||
<ul>
|
||
<li><a href="http://fb.ai/babi">The bAbI Project</a> - Dataset for text
|
||
understanding and reasoning, by Facebook AI Research. Contains:
|
||
<ul>
|
||
<li>The (20) QA bAbI tasks - [<a
|
||
href="http://arxiv.org/abs/1502.05698">Paper</a>]</li>
|
||
<li>The (6) dialog bAbI tasks - [<a
|
||
href="http://arxiv.org/abs/1605.07683">Paper</a>]</li>
|
||
<li>The Children’s Book Test - [<a
|
||
href="http://arxiv.org/abs/1511.02301">Paper</a>]</li>
|
||
<li>The Movie Dialog dataset - [<a
|
||
href="http://arxiv.org/abs/1511.06931">Paper</a>]</li>
|
||
<li>The MovieQA dataset - [<a
|
||
href="http://www.thespermwhale.com/jaseweston/babi/movie_dialog_dataset.tgz">Data</a>]</li>
|
||
<li>The Dialog-based Language Learning dataset - [<a
|
||
href="http://arxiv.org/abs/1604.06045">Paper</a>]</li>
|
||
<li>The SimpleQuestions dataset - [<a
|
||
href="http://arxiv.org/abs/1506.02075">Paper</a>]</li>
|
||
</ul></li>
|
||
<li><a href="https://stanford-qa.com/">SQuAD</a> - Stanford Question
|
||
Answering Dataset : [<a
|
||
href="http://arxiv.org/pdf/1606.05250">Paper</a>]</li>
|
||
</ul></li>
|
||
<li>Image Question Answering
|
||
<ul>
|
||
<li><a
|
||
href="https://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-computing/research/vision-and-language/visual-turing-challenge/">DAQUAR</a>
|
||
- built upon <a
|
||
href="http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html">NYU Depth
|
||
v2</a> by N. Silberman et al.</li>
|
||
<li><a href="http://www.visualqa.org/">VQA</a> - based on <a
|
||
href="http://mscoco.org/">MSCOCO</a> images</li>
|
||
<li><a href="http://www.cs.toronto.edu/~mren/imageqa/data/cocoqa/">Image
|
||
QA</a> - based on MSCOCO images</li>
|
||
<li><a href="http://idl.baidu.com/FM-IQA.html">Multilingual Image QA</a>
|
||
- built from scratch by Baidu - in Chinese, with English
|
||
translation</li>
|
||
</ul></li>
|
||
<li>Action Recognition
|
||
<ul>
|
||
<li><a href="http://www.thumos.info/home.html">THUMOS</a> : Large-scale
|
||
action recognition dataset</li>
|
||
<li><a
|
||
href="http://ai.stanford.edu/~syyeung/resources/multithumos.zip">MultiTHUMOS</a>
|
||
: Extension of THUMOS ’14 action detection dataset with dense
|
||
multilabele annotation</li>
|
||
</ul></li>
|
||
</ul>
|
||
<h2 id="blogs">Blogs</h2>
|
||
<ul>
|
||
<li><a
|
||
href="http://karpathy.github.io/2015/05/21/rnn-effectiveness/">The
|
||
Unreasonable Effectiveness of RNNs</a> by <a
|
||
href="http://cs.stanford.edu/people/karpathy/">Andrej Karpathy</a></li>
|
||
<li><a
|
||
href="http://colah.github.io/posts/2015-08-Understanding-LSTMs/">Understanding
|
||
LSTM Networks</a> in <a href="http://colah.github.io/">Colah’s
|
||
blog</a></li>
|
||
<li><a href="http://www.wildml.com/">WildML</a> blog’s RNN tutorial [<a
|
||
href="http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/">Part1</a>],
|
||
[<a
|
||
href="http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/">Part2</a>],
|
||
[<a
|
||
href="http://www.wildml.com/2015/10/recurrent-neural-networks-tutorial-part-3-backpropagation-through-time-and-vanishing-gradients/">Part3</a>],
|
||
[<a
|
||
href="http://www.wildml.com/2015/10/recurrent-neural-network-tutorial-part-4-implementing-a-grulstm-rnn-with-python-and-theano/">Part4</a>]</li>
|
||
<li><a
|
||
href="http://www.wildml.com/2016/08/rnns-in-tensorflow-a-practical-guide-and-undocumented-features/">RNNs
|
||
in Tensorflow, a Practical Guide and Undocumented Features</a></li>
|
||
<li><a href="https://svail.github.io/">Optimizing RNN Performance</a>
|
||
from Baidu’s Silicon Valley AI Lab.</li>
|
||
<li><a
|
||
href="http://nbviewer.jupyter.org/gist/yoavg/d76121dfde2618422139">Character
|
||
Level Language modelling using RNN</a> by Yoav Goldberg</li>
|
||
<li><a
|
||
href="http://peterroelants.github.io/posts/rnn_implementation_part01/">Implement
|
||
an RNN in Python</a>.</li>
|
||
<li><a
|
||
href="http://arunmallya.github.io/writeups/nn/lstm/index.html#/">LSTM
|
||
Backpropogation</a></li>
|
||
<li><a
|
||
href="https://danijar.com/introduction-to-recurrent-networks-in-tensorflow/">Introduction
|
||
to Recurrent Networks in TensorFlow</a> by Danijar Hafner</li>
|
||
<li><a
|
||
href="https://danijar.com/variable-sequence-lengths-in-tensorflow/">Variable
|
||
Sequence Lengths in TensorFlow</a> by Danijar Hafner</li>
|
||
<li><a
|
||
href="http://r2rt.com/written-memories-understanding-deriving-and-extending-the-lstm.html">Written
|
||
Memories: Understanding, Deriving and Extending the LSTM</a> by Silviu
|
||
Pitis</li>
|
||
</ul>
|
||
<h2 id="online-demos">Online Demos</h2>
|
||
<ul>
|
||
<li>Alex graves, hand-writing generation [<a
|
||
href="http://www.cs.toronto.edu/~graves/handwriting.html">link</a>]</li>
|
||
<li>Ink Poster: Handwritten post-it notes [<a
|
||
href="http://www.inkposter.com/?">link</a>]</li>
|
||
<li>LSTMVis: Visual Analysis for Recurrent Neural Networks [<a
|
||
href="http://lstm.seas.harvard.edu/">link</a>]</li>
|
||
</ul>
|
||
<p><a href="https://github.com/kjw0612/awesome-rnn">rnn.md
|
||
Github</a></p>
|