
<h3 id="awesome-linguistics">Awesome Linguistics</h3>
<p><a href="https://github.com/sindresorhus/awesome"><img
src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"
alt="Awesome" /></a></p>
<p>A curated list of anything remotely related to linguistics, sorted in
alphabetical order.</p>
<ul>
<li><a href="#programming">Programming</a>
<ul>
<li><a href="#platforms-and-toolkits">Platforms and toolkits</a></li>
<li><a href="#algorithms">Algorithms</a></li>
<li><a href="#data-sets">Data sets</a></li>
</ul></li>
<li><a href="#resources">Resources</a>
<ul>
<li><a href="#deep-learning-models-and-transformers">Deep learning
models and transformers</a></li>
<li><a href="#on-wikipedia">On Wikipedia</a></li>
<li><a href="#on-youtube">On YouTube</a></li>
<li><a href="#books">Books</a>
<ul>
<li><a href="#free">Free</a></li>
<li><a href="#non-free">Non-free</a></li>
<li><a href="#lists">Lists</a></li>
</ul></li>
</ul></li>
<li><a href="#standards">Standards</a></li>
<li><a href="#lists">Lists</a></li>
<li><a href="#communities">Communities</a></li>
</ul>
<h3 id="programming">Programming</h3>
<p><em>Libraries, frameworks and applications useful for building
language-processing applications.</em></p>
<h3 id="platforms-and-toolkits">Platforms and toolkits</h3>
<ul>
<li><a href="https://www.clarin-d.net/en/analysing">CLARIN-D web
tools</a> - Tools for Analysing Research Data</li>
<li><a
href="https://notes.jan-oliver-ruediger.de/software/corpusexplorer-overview/">CorpusExplorer</a>
- Software for corpus linguists and text/data mining enthusiasts. The
CorpusExplorer combines over 50 interactive visualizations under a
user-friendly interface.</li>
<li><a
href="https://github.com/sexybiggetje/haxe-linguistics">Haxe-linguistics</a>
- Early-stage linguistic analysis and natural language processing
library for Haxe.</li>
<li><a href="https://github.com/NaturalNode/natural">Natural</a> -
General natural language tools for Node.js.</li>
<li><a href="http://www.nltk.org/">Natural Language ToolKit (NLTK)</a> -
A leading platform for building Python programs to work with
human language data.</li>
<li><a href="https://snowballstem.org/">Snowball</a> - Snowball is a
language in which stemming algorithms can be easily represented.</li>
<li><a href="https://spacy.io/">spaCy</a> - Industrial-strength Natural
Language Processing in Python.</li>
<li><a href="http://hdl.handle.net/11022/1007-0000-0000-8E4E-A">Mate
Tools</a>, webservice via <a
href="https://weblicht.sfs.uni-tuebingen.de/">WebLicht</a></li>
<li><a href="https://ubiai.tools/">UBIAI</a> - Text annotation tool for
teams with auto-annotation features. Supports NER, relation extraction
and document classification, as well as OCR annotation for invoice
labeling.</li>
<li><a
href="https://github.com/markuskiller/textblob-de">textblob-de</a> -
German language support for TextBlob; an alternative to spaCy (see
above).</li>
<li><a href="https://github.com/mikahama/uralicNLP">UralicNLP</a> - An
open source Python library for processing morphologically rich and, for
the most part, endangered Uralic languages. It can do morphological
analysis, generation, lemmatization, disambiguation and lexical lookup
for a great many Uralic languages.</li>
</ul>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><a
href="http://snowball.tartarus.org/texts/stemmersoverview.html">Stemming
algorithms for various European languages</a> - Various stemming
algorithms from Snowball.</li>
<li><a href="http://tartarus.org/martin/PorterStemmer/">The Porter
Stemmer Algorithm</a> - The ‘official’ home page for distribution of the
Porter Stemming Algorithm, written and maintained by its author, Martin
Porter.</li>
</ul>
<h3 id="data-sets">Data sets</h3>
<ul>
<li><a href="https://github.com/kirkins/euroromcom">EuroRomCom Data</a>
- JSON formatted Pan-Romance word lists.</li>
<li><a
href="http://aranea.juls.savba.sk/aranea_about/_germanicum.html">Araneum
Germanicum</a></li>
<li><a
href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2638">CEHugeWebCorpus</a>
- German corpus based on CommonCrawl</li>
<li><a href="https://dwds.de">Digitales Wörterbuch der deutschen Sprache
(DWDS)</a></li>
<li><a
href="https://german-nlp-group.github.io/projects/gc4-corpus.html">GC4
Corpus</a> (CommonCrawl)</li>
<li><a href="https://www1.ids-mannheim.de/kl/projekte/korpora">IDS
Corpora</a> - German Reference Corpus</li>
<li><a href="https://wortschatz.uni-leipzig.de/en/download/">Leipzig
Corpora Collection</a> - sampled sentences in different languages.</li>
<li><a
href="https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/sdewac.en.html">SdeWaC</a>
- Large German web corpus</li>
<li><a
href="http://lingured.info/linguistic-resources/cwep/">C-WEP</a></li>
<li><a href="https://github.com/Rauschii/DysListGerman">DysList (list of
dyslexic errors)</a></li>
<li><a
href="https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/falko">Falko</a></li>
<li><a
href="https://www.linguistics.ruhr-uni-bochum.de/litkeycorpus/">Litkey</a></li>
<li><a
href="https://github.com/hdaSprachtechnologie/OpinionSpam">OpinionSpam</a></li>
</ul>
<h3 id="resources">Resources</h3>
<ul>
<li><a href="https://www.lighttag.io/how-to-label-data/">How To Label
Data</a> - Guide on managing large-scale linguistic annotation
projects.</li>
<li><a href="https://github.com/RIchardLitt/low-resource-languages">Low
Resource Languages</a> - A list of resources for conservation,
development, and documentation of low-resource (human) languages.</li>
<li><a href="https://langsci-press.org/">Language Science Press</a> -
Language Science Press is a born-digital scholar-led open access
publisher in linguistics.</li>
</ul>
<h3 id="deep-learning-models-and-transformers">Deep learning models and
transformers</h3>
<ul>
<li><a href="https://github.com/dbmdz/berts">dbmdz BERT models</a></li>
<li><a href="https://deepset.ai/german-bert">Deepset German BERT
model</a></li>
<li><a href="https://github.com/DFKI-NLP/gevalm">Evaluating German
Transformer Language Models with Syntactic Agreement Tests</a></li>
<li><a
href="https://github.com/t-systems-on-site-services-gmbh/german-elmo-model">German
ELMo Model</a></li>
<li><a
href="https://github.com/PhilipMay/german-transformer-training">german-transformer-training</a></li>
<li><a
href="https://github.com/tonianelope/Multilingual-BERT">GermLM</a> (NER
exploration)</li>
<li><a href="https://github.com/bminixhofer/gerpt2">GerPT2</a></li>
<li><a href="https://github.com/UKPLab/sentence-transformers">Sentence
Transformers</a></li>
</ul>
<h3 id="on-wikipedia">On Wikipedia</h3>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Bag-of-words_model">Bag of
words model</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Document_classification">Document
classification</a></li>
<li><a href="https://en.wikipedia.org/wiki/Language_model">Language
models</a></li>
<li><a href="https://en.wikipedia.org/wiki/Naive_Bayes_classifier">Naive
Bayes classification</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Natural_language_processing">Natural
language processing</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Outline_of_natural_language_processing">Outline
of natural language processing</a></li>
<li><a href="https://en.wikipedia.org/wiki/Part-of-speech_tagging">Parts
of speech tagging</a></li>
<li><a href="https://en.wikipedia.org/wiki/Sentiment_analysis">Sentiment
analysis</a></li>
<li><a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf">Term
frequency - inverse document frequency</a></li>
<li><a href="https://en.wikipedia.org/wiki/Vector_space_model">Vector
space model</a></li>
</ul>
<h3 id="on-youtube">On YouTube</h3>
<ul>
<li><a
href="https://www.youtube.com/playlist?list=PLegWUnz91WfuPebLI97-WueAP90JO-15i">Computational
Linguistics Lecture Playlist (YouTube)</a> - Lectures from a University of
Maryland class on computational linguistics.</li>
<li><a
href="https://www.youtube.com/channel/UCaMpov1PPVXGcKYgwHjXB3g">The
Virtual Linguistics Campus</a> - CC-licensed educational videos
interconnected with Marburg University’s e-learning platform of the same
name.</li>
</ul>
<h3 id="books">Books</h3>
<p><em>Some of the more interesting and complete books.</em></p>
<h4 id="free">Free</h4>
<ul>
<li><a
href="https://ecampusontario.pressbooks.pub/essentialsoflinguistics2/">Essentials
of Linguistics, 2nd edition</a> - An introductory open textbook.</li>
<li><a
href="https://linguistics.ucla.edu/people/Kracht/courses/ling20-fall07/ling-intro.pdf">Introduction
to Linguistics</a></li>
<li><a href="https://www.nltk.org/book/">Natural Language Processing
with Python</a> - The book accompanying the NLTK package.</li>
<li><a href="https://www.tidytextmining.com">Text Mining with R</a></li>
</ul>
<h4 id="non-free">Non-free</h4>
<ul>
<li><a
href="https://books.google.com/books?id=o9iGAgAAQBAJ&amp;dq=Foundations+of+Computational+Linguistics&amp;hl=nl&amp;source=gbs_navlinks_s">Foundations
of Computational Linguistics</a></li>
<li><a href="https://books.google.nl/books?id=YiFDxbEX3SUC">Foundations
of Statistical Natural Language Processing</a></li>
<li><a
href="https://books.google.com/books/about/Semisupervised_Learning_for_Computationa.html?id=VCd67cGB_rAC&amp;redir_esc=y">Semisupervised
Learning for Computational Linguistics</a></li>
<li><a href="https://books.google.nl/books?id=fZmj5UNK8AQC">Speech and
Language Processing: An Introduction to Natural Language Processing,
Computational Linguistics and Speech Recognition</a></li>
<li><a
href="https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199276349.001.0001/oxfordhb-9780199276349">The
Oxford Handbook of Computational Linguistics</a></li>
</ul>
<h3 id="standards">Standards</h3>
<ul>
<li><a href="https://www.deutschestextarchiv.de/doku/basisformat/">DTA
Basisformat</a></li>
<li><a href="https://www.iso.org/committee/297592.html">ISO TC 37 SC
4</a></li>
<li><a
href="https://docs.oasis-open.org/uima/v1.0/os/uima-spec-os.html">UIMA</a></li>
</ul>
<h3 id="lists">Lists</h3>
<ul>
<li><a
href="https://www.goodreads.com/shelf/show/natural-language-processing">Most
popular natural language processing books on Goodreads</a></li>
<li>GitHub topics <a
href="https://github.com/topics/corpus-linguistics">corpus-linguistics</a>
&amp; <a href="https://github.com/topics/nlp">nlp</a></li>
<li><a
href="https://github.com/niderhoff/nlp-datasets">nlp-datasets</a></li>
<li><a
href="https://github.com/sebastianruder/NLP-progress">NLP-progress</a></li>
<li><a
href="https://www.reddit.com/r/LanguageTechnology/">/r/LanguageTechnology/</a></li>
<li><a href="https://github.com/keon/awesome-nlp">awesome-nlp</a></li>
<li><a
href="https://github.com/alvations/awesome-community-curated-nlp">Awesome
Community-Curated NLP List</a></li>
<li><a
href="https://github.com/crownpku/Awesome-Chinese-NLP">awesome-chinese-nlp</a></li>
<li><a
href="https://github.com/fnielsen/awesome-danish">awesome-danish</a></li>
<li><a
href="https://github.com/oroszgy/awesome-hungarian-nlp">awesome-hungarian-nlp</a></li>
<li><a
href="https://github.com/harpribot/awesome-information-retrieval">Awesome
Information Retrieval</a></li>
<li><a href="https://github.com/kmkurn/id-nlp-resource">Indonesian
NLP</a></li>
<li><a href="https://github.com/web64/norwegian-nlp-resources">Norwegian
NLP resources</a></li>
<li><a href="https://github.com/adbar/German-NLP/">German NLP
resources</a></li>
<li><a
href="https://github.com/ksopyla/awesome-nlp-polish">awesome-nlp-polish</a></li>
<li><a
href="https://github.com/dav009/awesome-spanish-nlp">awesome-spanish-nlp</a></li>
<li><a
href="https://martinweisser.org/corpora_site/comp_ling_resources.html">M.
Weisser’s list of NLP/Computational Linguistics Resources</a></li>
<li><a
href="https://www.coli.uni-saarland.de/~csporled/page.php?id=tools">NLP
tools (Saarland University)</a></li>
</ul>
<h3 id="communities">Communities</h3>
<ul>
<li><a href="https://linguistics.stackexchange.com/">Linguistics Stack
Exchange</a></li>
<li><a href="https://untranslatable.co/">Untranslatable.co</a> -
Multilingual urban dictionary.</li>
</ul>