awesome-awesomeness/html/linguistics.html
2025-07-18 22:22:32 +02:00
<h3 id="awesome-linguistics">Awesome Linguistics</h3>
<p><a href="https://github.com/sindresorhus/awesome"><img
src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"
alt="Awesome" /></a></p>
<p>A curated list of anything remotely related to linguistics, sorted in
alphabetical order.</p>
<ul>
<li><a href="#programming">Programming</a>
<ul>
<li><a href="#platforms-and-toolkits">Platforms and toolkits</a></li>
<li><a href="#algorithms">Algorithms</a></li>
<li><a href="#data-sets">Data sets</a></li>
</ul></li>
<li><a href="#resources">Resources</a>
<ul>
<li><a href="#deep-learning-models-and-transformers">Deep learning
models and transformers</a></li>
<li><a href="#on-wikipedia">On Wikipedia</a></li>
<li><a href="#on-youtube">On YouTube</a></li>
<li><a href="#books">Books</a>
<ul>
<li><a href="#free">Free</a></li>
<li><a href="#non-free">Non-free</a></li>
<li><a href="#lists">Lists</a></li>
</ul></li>
</ul></li>
<li><a href="#standards">Standards</a></li>
<li><a href="#lists">Lists</a></li>
<li><a href="#communities">Communities</a></li>
</ul>
<h3 id="programming">Programming</h3>
<p><em>Libraries, frameworks and applications useful for developing
applications.</em></p>
<h3 id="platforms-and-toolkits">Platforms and toolkits</h3>
<ul>
<li><a href="https://www.clarin-d.net/en/analysing">CLARIN-D web
tools</a> - Tools for Analysing Research Data</li>
<li><a
href="https://notes.jan-oliver-ruediger.de/software/corpusexplorer-overview/">CorpusExplorer</a>
- Software for corpus linguists and text/data mining enthusiasts. The
CorpusExplorer combines over 50 interactive visualizations under a
user-friendly interface.</li>
<li><a
href="https://github.com/sexybiggetje/haxe-linguistics">Haxe-linguistics</a>
- An early-stage linguistic analysis and natural language processing
library for Haxe.</li>
<li><a href="https://github.com/NaturalNode/natural">Natural</a> -
General natural language tools for Node.js.</li>
<li><a href="http://www.nltk.org/">Natural Language ToolKit (NLTK)</a> -
The most complete platform for building Python programs to work with
human language data.</li>
<li><a href="https://snowballstem.org/">Snowball</a> - Snowball is a
language in which stemming algorithms can be easily represented.</li>
<li><a href="https://spacy.io/">spaCy</a> - Industrial-strength Natural
Language Processing in Python.</li>
<li><a href="http://hdl.handle.net/11022/1007-0000-0000-8E4E-A">Mate
Tools</a> - Web service via WebLicht.</li>
<li><a href="https://ubiai.tools/">UBIAI</a> - Easy-to-use text
annotation tool for teams, with auto-annotation features. Supports NER,
relation and document classification, and OCR annotation for invoice
labeling.</li>
<li><a
href="https://github.com/markuskiller/textblob-de">textblob-de</a> -
German language support for TextBlob; an alternative to spaCy (see
above).</li>
<li><a href="https://github.com/mongsvo/tyo">tyo</a> - A utility for
finding Typo-Bridges.</li>
<li><a href="https://github.com/mikahama/uralicNLP">UralicNLP</a> - An
open source Python library for processing morphologically rich and, for
the most part, endangered Uralic languages. It can do morphological
analysis, generation, lemmatization, disambiguation and lexical lookup
for a great many Uralic languages.</li>
</ul>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><a
href="http://snowball.tartarus.org/texts/stemmersoverview.html">Stemming
algorithms for various European languages</a> - Various stemming
algorithms from Snowball.</li>
<li><a href="http://tartarus.org/martin/PorterStemmer/">The Porter
Stemmer Algorithm</a> - The official home page for distribution of the
Porter Stemming Algorithm, written and maintained by its author, Martin
Porter.</li>
</ul>
<h3 id="data-sets">Data sets</h3>
<ul>
<li><a href="https://github.com/kirkins/euroromcom">EuroRomCom Data</a>
- JSON-formatted Pan-Romance word lists.</li>
<li><a
href="http://aranea.juls.savba.sk/aranea_about/_germanicum.html">Araneum
Germanicum</a></li>
<li><a
href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2638">CEHugeWebCorpus</a>
- German corpus based on CommonCrawl</li>
<li><a href="https://dwds.de">Digitales Wörterbuch der deutschen Sprache
(DWDS)</a></li>
<li><a
href="https://german-nlp-group.github.io/projects/gc4-corpus.html">GC4
Corpus</a> (CommonCrawl)</li>
<li><a href="https://www1.ids-mannheim.de/kl/projekte/korpora">IDS
Corpora</a> - German Reference Corpus</li>
<li><a href="https://wortschatz.uni-leipzig.de/en/download/">Leipzig
Corpora Collection</a> - Sampled sentences in different languages.</li>
<li><a
href="https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/sdewac.en.html">SdeWaC</a>
- Large German web corpus.</li>
<li><a
href="http://lingured.info/linguistic-resources/cwep/">C-WEP</a></li>
<li><a href="https://github.com/Rauschii/DysListGerman">DysList (list of
dyslexic errors)</a></li>
<li><a
href="https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/falko">Falko</a></li>
<li><a
href="https://www.linguistics.ruhr-uni-bochum.de/litkeycorpus/">Litkey</a></li>
<li><a
href="https://github.com/hdaSprachtechnologie/OpinionSpam">OpinionSpam</a></li>
</ul>
<h3 id="resources">Resources</h3>
<ul>
<li><a href="https://github.com/RIchardLitt/low-resource-languages">Low
Resource Languages</a> - A list of resources for conservation,
development, and documentation of low resource (human) languages.</li>
<li><a href="https://langsci-press.org/">Language Science Press</a> -
Language Science Press is a born-digital scholar-led open access
publisher in linguistics.</li>
</ul>
<h3 id="deep-learning-models-and-transformers">Deep learning models and
transformers</h3>
<ul>
<li><a href="https://github.com/dbmdz/berts">dbmdz BERT models</a></li>
<li><a href="https://deepset.ai/german-bert">Deepset German BERT
model</a></li>
<li><a href="https://github.com/DFKI-NLP/gevalm">Evaluating German
Transformer Language Models with Syntactic Agreement Tests</a></li>
<li><a
href="https://github.com/t-systems-on-site-services-gmbh/german-elmo-model">German
ELMo Model</a></li>
<li><a
href="https://github.com/PhilipMay/german-transformer-training">german-transformer-training</a></li>
<li><a
href="https://github.com/tonianelope/Multilingual-BERT">GermLM</a> (NER
exploration)</li>
<li><a href="https://github.com/bminixhofer/gerpt2">GerPT2</a></li>
<li><a href="https://github.com/UKPLab/sentence-transformers">Sentence
Transformers</a></li>
</ul>
<h3 id="on-wikipedia">On Wikipedia</h3>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Bag-of-words_model">Bag of
words model</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Document_classification">Document
classification</a></li>
<li><a href="https://en.wikipedia.org/wiki/Language_model">Language
models</a></li>
<li><a href="https://en.wikipedia.org/wiki/Naive_Bayes_classifier">Naive
Bayes classification</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Natural_language_processing">Natural
language processing</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Outline_of_natural_language_processing">Outline
of natural language processing</a></li>
<li><a href="https://en.wikipedia.org/wiki/Part-of-speech_tagging">Part-of-speech
tagging</a></li>
<li><a href="https://en.wikipedia.org/wiki/Sentiment_analysis">Sentiment
analysis</a></li>
<li><a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf">Term
frequency - inverse document frequency</a></li>
<li><a href="https://en.wikipedia.org/wiki/Vector_space_model">Vector
space model</a></li>
</ul>
<h3 id="on-youtube">On YouTube</h3>
<ul>
<li><a
href="https://www.youtube.com/playlist?list=PLegWUnz91WfuPebLI97-WueAP90JO-15i">Computational
Linguistics Lecture Playlist (YouTube)</a> - Lectures for a University of
Maryland class on computational linguistics.</li>
<li><a
href="https://www.youtube.com/channel/UCaMpov1PPVXGcKYgwHjXB3g">The
Virtual Linguistics Campus</a> - CC-licensed educational videos
interconnected with Marburg University's e-learning platform of the same
name.</li>
</ul>
<h3 id="books">Books</h3>
<p><em>Some of the more interesting and complete books.</em></p>
<h4 id="free">Free</h4>
<ul>
<li><a
href="https://ecampusontario.pressbooks.pub/essentialsoflinguistics2/">Essentials
of Linguistics, 2nd edition</a> - An introductory linguistics
textbook.</li>
<li><a
href="https://linguistics.ucla.edu/people/Kracht/courses/ling20-fall07/ling-intro.pdf">Introduction
to Linguistics</a></li>
<li><a href="https://www.nltk.org/book/">Natural Language Processing
with Python</a> - The book accompanying the NLTK package.</li>
<li><a href="https://www.tidytextmining.com">Text Mining with R</a></li>
</ul>
<h4 id="non-free">Non-free</h4>
<ul>
<li><a
href="https://books.google.com/books?id=o9iGAgAAQBAJ&amp;dq=Foundations+of+Computational+Linguistics&amp;hl=nl&amp;source=gbs_navlinks_s">Foundations
of Computational Linguistics</a></li>
<li><a href="https://books.google.nl/books?id=YiFDxbEX3SUC">Foundations
of Statistical Natural Language Processing</a></li>
<li><a
href="https://books.google.com/books/about/Semisupervised_Learning_for_Computationa.html?id=VCd67cGB_rAC&amp;redir_esc=y">Semisupervised
Learning for Computational Linguistics</a></li>
<li><a href="https://books.google.nl/books?id=fZmj5UNK8AQC">Speech and
Language Processing: An Introduction to Natural Language Processing,
Computational Linguistics and Speech Recognition</a></li>
<li><a
href="https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199276349.001.0001/oxfordhb-9780199276349">The
Oxford Handbook of Computational Linguistics</a></li>
</ul>
<h3 id="standards">Standards</h3>
<ul>
<li><a href="https://www.deutschestextarchiv.de/doku/basisformat/">DTA
Basisformat</a></li>
<li><a href="https://www.iso.org/committee/297592.html">ISO TC 37 SC
4</a></li>
<li><a
href="https://docs.oasis-open.org/uima/v1.0/os/uima-spec-os.html">UIMA</a></li>
</ul>
<h3 id="lists">Lists</h3>
<ul>
<li><a
href="https://www.goodreads.com/shelf/show/natural-language-processing">15
most popular books on Goodreads</a></li>
<li>GitHub topics <a
href="https://github.com/topics/corpus-linguistics">corpus-linguistics</a>
&amp; <a href="https://github.com/topics/nlp">nlp</a></li>
<li><a
href="https://github.com/niderhoff/nlp-datasets">nlp-datasets</a></li>
<li><a
href="https://github.com/sebastianruder/NLP-progress">NLP-progress</a></li>
<li><a
href="https://www.reddit.com/r/LanguageTechnology/">/r/LanguageTechnology/</a></li>
<li><a href="https://github.com/keon/awesome-nlp">awesome-nlp</a></li>
<li><a
href="https://github.com/alvations/awesome-community-curated-nlp">Awesome
Community-Curated NLP List</a></li>
<li><a
href="https://github.com/crownpku/Awesome-Chinese-NLP">awesome-chinese-nlp</a></li>
<li><a
href="https://github.com/fnielsen/awesome-danish">awesome-danish</a></li>
<li><a
href="https://github.com/oroszgy/awesome-hungarian-nlp">awesome-hungarian-nlp</a></li>
<li><a
href="https://github.com/harpribot/awesome-information-retrieval">Awesome
Information Retrieval</a></li>
<li><a href="https://github.com/kmkurn/id-nlp-resource">Indonesian
NLP</a></li>
<li><a href="https://github.com/web64/norwegian-nlp-resources">Norwegian
NLP resources</a></li>
<li><a href="https://github.com/adbar/German-NLP/">German NLP
resources</a></li>
<li><a
href="https://github.com/ksopyla/awesome-nlp-polish">awesome-nlp-polish</a></li>
<li><a
href="https://github.com/dav009/awesome-spanish-nlp">awesome-spanish-nlp</a></li>
<li><a
href="https://martinweisser.org/corpora_site/comp_ling_resources.html">M.
Weisser's list of NLP/Computational Linguistics Resources</a></li>
</ul>
<h3 id="communities">Communities</h3>
<ul>
<li><a href="https://linguistics.stackexchange.com/">Linguistics Stack
Exchange</a></li>
<li><a href="https://untranslatable.co/">Untranslatable.co, Multilingual
urban dictionary</a></li>
</ul>
<p><a
href="https://github.com/theimpossibleastronaut/awesome-linguistics">linguistics.md
GitHub</a></p>