<h1 id="awesome-natural-language-generation-awesome">Awesome Natural
Language Generation <a href="https://awesome.re"><img
src="https://awesome.re/badge.svg" alt="Awesome" /></a></h1>
<figure>
<img src="logo.png" alt="Piscis Magnus from BL Harley 647" />
<figcaption aria-hidden="true">Piscis Magnus from BL Harley
647</figcaption>
</figure>
<p>Natural Language Generation is a broad domain with applications in
chat-bots, story generation, and data descriptions. There is a wide
spectrum of different technologies addressing parts or the whole of the
NLG process. This list aims to represent this diversity of NLG
applications and techniques by providing links to various projects,
tools, research papers, and learning materials.</p>
<h2 id="contents">Contents</h2>
<ul>
<li><a href="#datasets">Datasets</a></li>
<li><a href="#dialog">Dialog</a></li>
<li><a href="#evaluation">Evaluation</a></li>
<li><a href="#grammar">Grammar</a></li>
<li><a href="#libraries">Libraries</a></li>
<li><a href="#narrative-generation">Narrative Generation</a></li>
<li><a href="#neural-natural-language-generation">Neural Natural
Language Generation</a></li>
<li><a href="#papers-and-articles">Papers and Articles</a></li>
<li><a href="#products">Products</a></li>
<li><a href="#realizers">Realizers</a></li>
<li><a href="#templating-languages">Templating Languages</a></li>
<li><a href="#videos">Videos</a></li>
</ul>
<h2 id="datasets">Datasets</h2>
<ul>
<li><a href="https://github.com/UFAL-DSG/alex_context_nlg_dataset">Alex
Context NLG Dataset</a> - A dataset for NLG in dialogue systems in the
public transport information domain.</li>
<li><a href="https://github.com/harvardnlp/boxscore-data/">Box-score
data</a> - This dataset consists of (human-written) NBA basketball game
summaries aligned with their corresponding box- and line-scores.</li>
<li><a href="http://www.macs.hw.ac.uk/InteractionLab/E2E">E2E</a> - This
shared task focuses on recent end-to-end (E2E), data-driven NLG methods,
which jointly learn sentence planning and surface realisation from
non-aligned data.</li>
<li><a
href="https://github.com/pvougiou/Neural-Wikipedian">Neural-Wikipedian</a>
- The repository contains the code along with the required corpora that
were used in order to build a system that “learns” how to generate
English biographies for Semantic Web triples.</li>
<li><a
href="https://cs.stanford.edu/~pliang/data/weather-data.zip">WeatherGov</a>
- Computer-generated weather forecasts from weather.gov (US public
forecast), along with corresponding weather data.</li>
<li><a href="https://github.com/ThiagoCF05/webnlg">WebNLG</a> - The
enriched version of the WebNLG corpus, a resource for evaluating common NLG
tasks, including Discourse Ordering, Lexicalization and Referring
Expression Generation.</li>
<li><a
href="https://rlebret.github.io/wikipedia-biography-dataset/">WikiBio -
Wikipedia biography dataset</a> - This dataset gathers 728,321
biographies from Wikipedia. It aims at evaluating text generation
algorithms.</li>
<li><a
href="https://github.com/google-research-datasets/dstc8-schema-guided-dialogue">The
Schema-Guided Dialogue Dataset</a> - The Schema-Guided Dialogue (SGD)
dataset consists of over 20k annotated multi-domain, task-oriented
conversations between a human and a virtual assistant.</li>
<li><a
href="https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus">The
Wikipedia company corpus</a> - Company descriptions collected from
Wikipedia. The dataset contains semantic representations, short, and
long descriptions for 51K companies in English.</li>
<li><a href="https://nlds.soe.ucsc.edu/yelpnlg">YelpNLG</a> - YelpNLG
provides resources for natural language generation of restaurant
reviews.</li>
</ul>
<h2 id="dialog">Dialog</h2>
<ul>
<li><a href="https://github.com/rodrigopivi/Chatito">Chatito</a> -
Generate datasets for AI chatbots, NLP tasks, named entity recognition
or text classification models using a simple DSL!</li>
<li><a href="https://github.com/shawnwun/NNDIAL">NNDIAL</a> - NNDial is
an open source toolkit for building end-to-end trainable task-oriented
dialogue models.</li>
<li><a
href="https://github.com/uber-research/plato-research-dialogue-system">Plato</a>
- This is the Plato Research Dialogue System, a flexible platform for
developing conversational AI agents.</li>
<li><a href="https://github.com/shawnwun/RNNLG">RNNLG</a> - RNNLG is an
open source benchmark toolkit for Natural Language Generation (NLG) in
spoken dialogue system application domains.</li>
<li><a href="https://github.com/UFAL-DSG/tgen">TGen</a> - Statistical
NLG for spoken dialogue systems.</li>
</ul>
<h2 id="evaluation">Evaluation</h2>
<ul>
<li><a href="https://github.com/google-research/bleurt">BLEURT: a
Transfer Learning-Based Metric for Natural Language Generation</a></li>
<li><a href="https://github.com/neulab/compare-mt">compare-mt</a> - A
tool for holistic analysis of language generation systems.</li>
<li><a href="https://gem-benchmark.com/">GEM</a> - a benchmark
environment for NLG with a focus on its Evaluation, both through human
annotations and automated Metrics.</li>
<li><a href="https://github.com/Maluuba/nlg-eval">NLG-eval</a> -
Evaluation code for various unsupervised automated metrics for Natural
Language Generation.</li>
<li><a href="https://github.com/facebookresearch/vizseq">VizSeq</a> - A
Visual Analysis Toolkit for Text Generation Tasks.</li>
</ul>
<h2 id="grammar">Grammar</h2>
<ul>
<li><a href="https://github.com/OpenCCG/openccg">OpenCCG</a> - OpenCCG
library for parsing and realization with CCG.</li>
<li><a
href="http://www.grammaticalframework.org/">GrammaticalFramework</a> - A
programming language for multilingual grammar applications.</li>
<li><a href="https://github.com/mikelewis0/easyccg">EasyCCG</a> - A CCG
parser.</li>
<li><a href="https://github.com/bozsahin/ccglab">CCG Lab</a> - All
combinators, common grammar format, parsing to logical form, parameter
estimation for probabilistic CCG.</li>
<li><a href="https://github.com/texttheater/ccgweb">CCGweb</a> - A Web
platform for parsing and annotation.</li>
</ul>
<h2 id="libraries">Libraries</h2>
<ul>
<li><a
href="https://github.com/bradymholt/cron-expression-descriptor">Cron
Expression Descriptor</a> - A .NET library that converts cron
expressions into human readable descriptions.</li>
<li><a href="https://github.com/tokenmill/numberwords">Number Words</a>
- Convert a number to an approximate text expression, for example 0.23 to
"less than a quarter".</li>
<li><a href="https://docs.writebot.app">Writebot</a> - A NodeJS library
that makes it easier to use GPT-3 by using presets.</li>
</ul>
<h2 id="narrative-generation">Narrative Generation</h2>
<ul>
<li><a href="https://github.com/aherriot/story-generator">Random Story
Generator</a> - Using Natural Language Generation (NLG) to create a
random short story.</li>
<li><a href="https://github.com/galaxykate/tracery">Tracery</a> - A
story-grammar generation library for JavaScript.</li>
</ul>
<h2 id="neural-natural-language-generation">Neural Natural Language
Generation</h2>
<ul>
<li><a href="https://github.com/minimaxir/aitextgen">aitextgen</a> - A
robust Python tool for text-based AI training and generation using
GPT-2.</li>
<li><a href="https://github.com/diegma/graph-2-text">graph-2-text</a> -
A graph-to-sequence model implemented in PyTorch, combining graph
convolutional networks and OpenNMT-py.</li>
<li><a
href="https://github.com/neural-nuts/image-caption-generator">Image
Caption Generator</a> - A neural-network-based generative model for
captioning images using TensorFlow.</li>
<li><a href="https://github.com/kasnerz/lightnlg">lightnlg</a> - A
minimalistic codebase for finetuning and interacting with NLG models
using PyTorch Lightning.</li>
<li><a href="https://github.com/EagleW/PaperRobot">PaperRobot:
Incremental Draft Generation of Scientific Ideas</a> - PaperRobot
performs as an automatic research assistant.</li>
<li><a href="https://github.com/uber-research/PPLM">PPLM</a> - Plug and
Play Language Model implementation. Allows steering the topic and
attributes of text generated by GPT-2 models.</li>
<li><a
href="https://github.com/patil-suraj/question_generation">Question
Generation using Hugging Face transformers</a> - Question generation is the task
of automatically generating questions from a text paragraph.</li>
<li><a href="https://github.com/asyml/texar">Texar</a> - Texar is a
toolkit aiming to support a broad set of machine learning, especially
natural language processing and text generation tasks.</li>
<li><a href="https://github.com/minimaxir/textgenrnn">textgenrnn</a> -
Easily train your own text-generating neural network of any size and
complexity on any text dataset with a few lines of code.</li>
<li><a
href="https://github.com/turtlesoupy/this-word-does-not-exist">This Word
Does Not Exist</a> - This project allows people to train a variant
of GPT-2 that makes up words, definitions and examples from
scratch.</li>
<li><a
href="https://github.com/huggingface/transformers">Transformers</a> -
State-of-the-art Natural Language Processing for TensorFlow 2.0 and
PyTorch.</li>
<li><a
href="https://github.com/akanimax/natural-language-summary-generation-from-structured-data">Summary
Generation From Structured Data</a> - For converting information present
in the form of structured data into natural language text.</li>
</ul>
<h2 id="papers-and-articles">Papers and Articles</h2>
<ul>
<li><a href="https://arxiv.org/abs/2202.06935">2022: Repairing the
Cracked Foundation: A Survey of Obstacles in Evaluation Practices for
Generated Text</a></li>
<li><a
href="https://ehudreiter.com/2021/03/17/vision-nlg-can-help-humanise-data-and-ai/">2021:
Vision: NLG Can Help Humanise Data and AI</a></li>
<li><a href="https://openreview.net/forum?id=rygGQyrFvH">2020: The
Curious Case of Neural Text Degeneration</a></li>
<li><a href="https://arxiv.org/abs/2011.03992">2020: A Gold Standard
Methodology for Evaluating Accuracy in Data-To-Text Systems</a></li>
<li><a
href="https://www.sciencedirect.com/science/article/pii/S0885230819300919">2020:
Evaluating the state-of-the-art of End-to-End Natural Language
Generation: The E2E NLG challenge</a></li>
<li><a href="https://huggingface.co/blog/how-to-generate">2020: How to
generate text: using different decoding methods for language generation
with Transformers</a></li>
<li><a
href="https://www.cambridge.org/core/services/aop-cambridge-core/content/view/BA2417D73AF29F8073FF5B611CDEB97F/S135132492000025Xa.pdf/natural_language_generation_the_commercial_state_of_the_art_in_2020.pdf">2020:
Natural language generation: The commercial state of the art in
2020</a></li>
<li><a
href="https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/">2020:
Turing-NLG: A 17-billion-parameter language model by Microsoft</a></li>
<li><a href="https://www.inlg2019.com/assets/papers/178_Paper.pdf">2019:
A Closer Look at Recent Results of Verb Selection for Data-to-Text
NLG</a></li>
<li><a href="https://www.inlg2019.com/assets/papers/28_Paper.pdf">2019:
A Personalized Data-to-Text Support Tool for Cancer Patients</a></li>
<li><a href="https://www.inlg2019.com/assets/papers/79_Paper.pdf">2019:
Controlling Contents in Data-to-Document Generation with Human-Designed
Topic Labels</a></li>
<li><a
href="https://ehudreiter.com/2019/09/26/generated-texts-must-be-accurate/">2019:
Generated Texts Must Be Accurate!</a></li>
<li><a href="https://www.inlg2019.com/assets/papers/44_Paper.pdf">2019:
Hotel Scribe: Generating High Variation Hotel Descriptions</a></li>
<li><a href="https://www.inlg2019.com/assets/papers/32_Paper.pdf">2019:
Revisiting Challenges in Data-to-Text Generation with Fact
Grounding</a></li>
<li><a href="https://arxiv.org/pdf/1703.09902.pdf">2017: Survey of the
State of the Art in Natural Language Generation: Core tasks,
applications and evaluation</a></li>
<li><a href="https://arxiv.org/pdf/1606.03254.pdf">2016: Natural
Language Generation enhances human decision-making with uncertain
information</a></li>
</ul>
<h2 id="products">Products</h2>
<ul>
<li><a href="https://github.com/tokenmill/accelerated-text">Accelerated
Text</a> - Automatically generate multiple natural language descriptions
of your data varying in wording and structure.</li>
<li><a href="https://rosaenlg.org">RosaeNLG</a> - An open-source library
for node.js or client side (browser) execution, based on the Pug
template engine, to generate texts in English, French, German and
Italian.</li>
<li><a href="http://twinery.org/">Twine</a> - An open-source tool for
telling interactive, nonlinear stories.</li>
</ul>
<h2 id="realizers">Realizers</h2>
<ul>
<li><a href="https://github.com/kowey/GenI">GenI</a> - Surface realiser
(part of a Natural Language Generation system) using Tree Adjoining
Grammar.</li>
<li><a href="https://github.com/rali-udem/JSrealB">JSrealB</a> - A
JavaScript bilingual text realizer for web development.</li>
<li><a href="https://github.com/simplenlg/simplenlg">SimpleNLG</a> -
Java API for Natural Language Generation.</li>
<li><a href="https://github.com/sebischair/SimpleNLG-DE">SimpleNLG
DE</a> - German version of SimpleNLG 4.</li>
<li><a
href="https://github.com/rali-udem/SimpleNLG-EnFr">SimpleNLG-EnFr</a> -
SimpleNLG-EnFr 1.1 is a bilingual English/French adaptation of SimpleNLG
v4.2.</li>
</ul>
<h2 id="templating-languages">Templating Languages</h2>
<ul>
<li><a href="https://github.com/maetl/calyx">calyx</a> - A Ruby library
for generating text with recursive template grammars.</li>
<li><a href="https://github.com/spro/nalgene">nalgene</a> - Natural
language generation language.</li>
<li><a href="https://www.stringtemplate.org/">StringTemplate</a> - Java
template engine (with ports for C#, Objective-C, JavaScript, Scala) for
generating source code, web pages, emails, or any other formatted text
output.</li>
</ul>
<h2 id="videos">Videos</h2>
<ul>
<li><a href="https://www.youtube.com/watch?v=kFRw-wk5YOA">Data-To-Text:
Generating Textual Summaries of Complex Data - Ehud Reiter</a></li>
<li><a
href="https://slideslive.com/38922816/imitation-learning-and-its-application-to-natural-language-generation">Imitation
Learning and its Application to Natural Language Generation</a></li>
<li><a href="https://www.youtube.com/watch?v=4fjM72lbJaw">Natural
Language Generation (Introduction)</a></li>
<li><a href="https://www.youtube.com/watch?v=Ls7elVbN8bI">Strata Data
Conference | The future of natural language generation:
2017-2027</a></li>
<li><a href="https://www.youtube.com/watch?v=wgcDUX_BPpk">The Quest for
Automated Story Generation - Mark Riedl</a></li>
</ul>
<h2 id="license">License</h2>
<p><a href="http://creativecommons.org/publicdomain/zero/1.0"><img
src="http://mirrors.creativecommons.org/presskit/buttons/88x31/svg/cc-zero.svg"
alt="CC0" /></a></p>
<p>To the extent possible under law, <a
href="https://www.tokenmill.ai">TokenMill</a> has waived all copyright
and related or neighboring rights to this work.</p>