Natural Language Generation (NLG) is a broad domain with applications in
chatbots, story generation, and data descriptions. A wide spectrum of
technologies addresses part or all of the NLG process. This list aims to
represent this diversity of NLG applications and techniques by providing
links to various projects, tools, research papers, and learning
materials.
Alex Context NLG Dataset - A dataset for NLG in dialogue systems in the
public transport information domain.
Box-score data - This dataset consists of (human-written) NBA basketball
game summaries aligned with their corresponding box- and line-scores.
E2E - This shared task focuses on recent end-to-end (E2E), data-driven
NLG methods, which jointly learn sentence planning and surface
realisation from non-aligned data.
Neural-Wikipedian - The repository contains the code and the corpora
used to build a system that “learns” how to generate English biographies
for Semantic Web triples.
WeatherGov - Computer-generated weather forecasts from weather.gov (US
public forecast), along with corresponding weather data.
WebNLG - The enriched version of WebNLG, a resource for evaluating
common NLG tasks, including Discourse Ordering, Lexicalization and
Referring Expression Generation.
The Schema-Guided Dialogue Dataset - The Schema-Guided Dialogue (SGD)
dataset consists of over 20k annotated multi-domain, task-oriented
conversations between a human and a virtual assistant.
The Wikipedia company corpus - Company descriptions collected from
Wikipedia. The dataset contains semantic representations, short
descriptions, and long descriptions for 51K companies in English.
YelpNLG - YelpNLG provides resources for natural language generation of
restaurant reviews.
Dialog
Chatito - Generate datasets for AI chatbots, NLP tasks, named entity
recognition or text classification models using a simple DSL!
NNDIAL - NNDial is an open source toolkit for building end-to-end
trainable task-oriented dialogue models.
Plato - This is the Plato Research Dialogue System, a flexible platform
for developing conversational AI agents.
RNNLG - RNNLG is an open source benchmark toolkit for Natural Language
Generation (NLG) in spoken dialogue system application domains.
TGen - Statistical NLG for spoken dialogue systems.
Accelerated Text - Automatically generate multiple natural language
descriptions of your data varying in wording and structure.
RosaeNLG - An open-source library for Node.js or client-side (browser)
execution, based on the Pug template engine, to generate texts in
English, French, German and Italian.
Twine - An open-source tool for telling interactive, nonlinear stories.
Realizers
GenI - Surface realiser (part of a Natural Language Generation system)
using Tree Adjoining Grammar.
JSrealB - A JavaScript bilingual text realizer for web development.
SimpleNLG - Java API for Natural Language Generation.
StringTemplate - Java template engine (with ports for C#, Objective-C,
JavaScript, Scala) for generating source code, web pages, emails, or any
other formatted text output.