Update and add index

Jonas Zeunert
2024-04-23 15:17:38 +02:00
parent 4d0cd768f7
commit 8d4db5d359
726 changed files with 41721 additions and 53949 deletions

@@ -1,10 +1,9 @@
 Awesome Natural Language Generation !Awesome (https://awesome.re/badge.svg) (https://awesome.re)
 Awesome Natural Language Generation !Awesome (https://awesome.re/badge.svg) (https://awesome.re)
!Piscis Magnus from BL Harley 647 (logo.png)
Natural Language Generation is a broad domain with applications in chat-bots, story generation, and data descriptions. There is a wide spectrum of different technologies addressing parts or 
the whole of the NLG process. This list aims to represent this diversity of NLG applications and techniques by providing links to various projects, tools, research papers, and learning 
materials.
Natural Language Generation is a broad domain with applications in chat-bots, story generation, and data descriptions. There is a wide spectrum of different technologies addressing parts or the whole of the NLG process. This list aims 
to represent this diversity of NLG applications and techniques by providing links to various projects, tools, research papers, and learning materials.
Contents
@@ -25,19 +24,16 @@
- Alex Context NLG Dataset (https://github.com/UFAL-DSG/alex_context_nlg_dataset) - A dataset for NLG in dialogue systems in the public transport information domain.
- Box-score data (https://github.com/harvardnlp/boxscore-data/) - This dataset consists of (human-written) NBA basketball game summaries aligned with their corresponding box- and line-scores.
- E2E (http://www.macs.hw.ac.uk/InteractionLab/E2E) - This shared task focuses on recent end-to-end (E2E), data-driven NLG methods, which jointly learn sentence planning and surface 
realisation from non-aligned data.
- Neural-Wikipedian (https://github.com/pvougiou/Neural-Wikipedian) - The repository contains the code along with the required corpora that were used in order to build a system that "learns" 
how to generate English biographies for Semantic Web triples.
- E2E (http://www.macs.hw.ac.uk/InteractionLab/E2E) - This shared task focuses on recent end-to-end (E2E), data-driven NLG methods, which jointly learn sentence planning and surface realisation from non-aligned data.
- Neural-Wikipedian (https://github.com/pvougiou/Neural-Wikipedian) - The repository contains the code along with the required corpora that were used in order to build a system that "learns" how to generate English biographies for 
Semantic Web triples.
- WeatherGov (https://cs.stanford.edu/~pliang/data/weather-data.zip) - Computer-generated weather forecasts from weather.gov (US public forecast), along with corresponding weather data.
- WebNLG (https://github.com/ThiagoCF05/webnlg) - The enriched version of the WebNLG - a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring 
Expression Generation.
- WikiBio - wikipedia biography dataset (https://rlebret.github.io/wikipedia-biography-dataset/) - This dataset gathers 728,321 biographies from Wikipedia. It aims at evaluating text 
generation algorithms.
- The Schema-Guided Dialogue Dataset (https://github.com/google-research-datasets/dstc8-schema-guided-dialogue) - The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated 
multi-domain, task-oriented conversations between a human and a virtual assistant.
- The Wikipedia company corpus (https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus) - Company descriptions collected from Wikipedia. The dataset contains semantic 
representations, short, and long descriptions for 51K companies in English.
- WebNLG (https://github.com/ThiagoCF05/webnlg) - The enriched version of the WebNLG - a resource for evaluating common NLG tasks, including Discourse Ordering, Lexicalization and Referring Expression Generation.
- WikiBio - wikipedia biography dataset (https://rlebret.github.io/wikipedia-biography-dataset/) - This dataset gathers 728,321 biographies from Wikipedia. It aims at evaluating text generation algorithms.
- The Schema-Guided Dialogue Dataset (https://github.com/google-research-datasets/dstc8-schema-guided-dialogue) - The Schema-Guided Dialogue (SGD) dataset consists of over 20k annotated multi-domain, task-oriented conversations between 
a human and a virtual assistant.
- The Wikipedia company corpus (https://gricad-gitlab.univ-grenoble-alpes.fr/getalp/wikipediacompanycorpus) - Company descriptions collected from Wikipedia. The dataset contains semantic representations, short, and long descriptions for
51K companies in English.
- YelpNLG (https://nlds.soe.ucsc.edu/yelpnlg) - YelpNLG provides resources for natural language generation of restaurant reviews.
Dialog
@@ -83,15 +79,12 @@
- lightnlg (https://github.com/kasnerz/lightnlg) - A minimalistic codebase for finetuning and interacting with NLG models using PyTorch Lightning.
- PaperRobot: Incremental Draft Generation of Scientific Ideas (https://github.com/EagleW/PaperRobot) - We present a PaperRobot who performs as an automatic research assistant.
- PPLM (https://github.com/uber-research/PPLM) - Plug and Play Language Model implementation. Allows steering the topic and attributes of GPT-2 models.
- Question Generation using Hugging Face transformers (https://github.com/patil-suraj/question_generation) - Question generation is the task of automatically generating questions from a text 
paragraph.
- Question Generation using Hugging Face transformers (https://github.com/patil-suraj/question_generation) - Question generation is the task of automatically generating questions from a text paragraph.
- Texar (https://github.com/asyml/texar) - Texar is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks.
- textgenrnn (https://github.com/minimaxir/textgenrnn) - Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.
- This Word Does Not Exist (https://github.com/turtlesoupy/this-word-does-not-exist) - This project allows people to train a variant of GPT-2 that makes up words, definitions and 
examples from scratch.
- This Word Does Not Exist (https://github.com/turtlesoupy/this-word-does-not-exist) - This project allows people to train a variant of GPT-2 that makes up words, definitions and examples from scratch.
- Transformers (https://github.com/huggingface/transformers) - State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch.
- Summary Generation From Structured Data (https://github.com/akanimax/natural-language-summary-generation-from-structured-data) - For converting information present in the form of structured
data into natural language text.
- Summary Generation From Structured Data (https://github.com/akanimax/natural-language-summary-generation-from-structured-data) - For converting information present in the form of structured data into natural language text.
Papers and Articles
- 2022: Repairing the Cracked Foundation: A Survey of Obstacles in Evaluation Practices for Generated Text (https://arxiv.org/abs/2202.06935)
@@ -101,8 +94,7 @@
- 2020: Evaluating the state-of-the-art of End-to-End Natural Language Generation: The E2E NLG challenge (https://www.sciencedirect.com/science/article/pii/S0885230819300919)
- 2020: How to generate text: using different decoding methods for language generation with Transformers (https://huggingface.co/blog/how-to-generate)
- 2020: Natural language generation: The commercial state of the art in 2020 
(https://www.cambridge.org/core/services/aop-cambridge-core/content/view/BA2417D73AF29F8073FF5B611CDEB97F/S135132492000025Xa.pdf/natural_language_generation_the_commercial_state_of_the_art_in
_2020.pdf)
(https://www.cambridge.org/core/services/aop-cambridge-core/content/view/BA2417D73AF29F8073FF5B611CDEB97F/S135132492000025Xa.pdf/natural_language_generation_the_commercial_state_of_the_art_in_2020.pdf)
- 2020: Turing-NLG: A 17-billion-parameter language model by Microsoft (https://www.microsoft.com/en-us/research/blog/turing-nlg-a-17-billion-parameter-language-model-by-microsoft/)
- 2019: A Closer Look at Recent Results of Verb Selection for Data-to-Text NLG (https://www.inlg2019.com/assets/papers/178_Paper.pdf)
- 2019: A Personalized Data-to-Text Support Tool for Cancer Patients (https://www.inlg2019.com/assets/papers/28_Paper.pdf)
@@ -117,8 +109,7 @@
Products 
- Accelerated Text (https://github.com/tokenmill/accelerated-text) - Automatically generate multiple natural language descriptions of your data varying in wording and structure.
- RosaeNLG (https://rosaenlg.org) - An open-source library for node.js or client side (browser) execution, based on the Pug template engine, to generate texts in English, French, German and 
Italian.
- RosaeNLG (https://rosaenlg.org) - An open-source library for node.js or client side (browser) execution, based on the Pug template engine, to generate texts in English, French, German and Italian.
- Twine (http://twinery.org/) - An open-source tool for telling interactive, nonlinear stories.
Realizers
@@ -133,8 +124,7 @@
- calyx (https://github.com/maetl/calyx) - A Ruby library for generating text with recursive template grammars.
- nalgene (https://github.com/spro/nalgene) - Natural language generation language.
- StringTemplate (https://www.stringtemplate.org/) - Java template engine (with ports for C#, Objective-C, JavaScript, Scala) for generating source code, web pages, emails, or any other 
formatted text output. 
- StringTemplate (https://www.stringtemplate.org/) - Java template engine (with ports for C#, Objective-C, JavaScript, Scala) for generating source code, web pages, emails, or any other formatted text output. 
Videos