<h3 id="awesome-linguistics">Awesome Linguistics</h3>
<p><a href="https://github.com/sindresorhus/awesome"><img
src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"
alt="Awesome" /></a></p>
<p>A curated list of anything remotely related to linguistics, sorted in
alphabetical order.</p>
<ul>
<li><a href="#programming">Programming</a>
<ul>
<li><a href="#platforms-and-toolkits">Platforms and toolkits</a></li>
<li><a href="#algorithms">Algorithms</a></li>
<li><a href="#data-sets">Data sets</a></li>
</ul></li>
<li><a href="#resources">Resources</a>
<ul>
<li><a href="#deep-learning-models-and-transformers">Deep learning
models and transformers</a></li>
<li><a href="#on-wikipedia">On Wikipedia</a></li>
<li><a href="#on-youtube">On YouTube</a></li>
<li><a href="#books">Books</a>
<ul>
<li><a href="#free">Free</a></li>
<li><a href="#non-free">Non-free</a></li>
<li><a href="#lists">Lists</a></li>
</ul></li>
</ul></li>
<li><a href="#standards">Standards</a></li>
<li><a href="#lists">Lists</a></li>
<li><a href="#communities">Communities</a></li>
</ul>
<h3 id="programming">Programming</h3>
<p><em>Libraries, frameworks and applications useful for building
language-processing software.</em></p>
<h3 id="platforms-and-toolkits">Platforms and toolkits</h3>
<ul>
<li><a href="https://www.clarin-d.net/en/analysing">CLARIN-D web
tools</a> - Tools for analysing research data.</li>
<li><a
href="https://notes.jan-oliver-ruediger.de/software/corpusexplorer-overview/">CorpusExplorer</a>
- Software for corpus linguists and text/data mining enthusiasts. The
CorpusExplorer combines over 50 interactive visualizations under a
user-friendly interface.</li>
<li><a
href="https://github.com/sexybiggetje/haxe-linguistics">Haxe-linguistics</a>
- Early-stage linguistic analysis and natural language processing library
for Haxe.</li>
<li><a href="https://github.com/NaturalNode/natural">Natural</a> -
General natural language tools for Node.js.</li>
<li><a href="http://www.nltk.org/">Natural Language Toolkit (NLTK)</a> -
A leading platform for building Python programs to work with
human language data.</li>
<li><a href="https://snowballstem.org/">Snowball</a> - A small
string-processing language in which stemming algorithms can be easily
represented.</li>
<li><a href="https://spacy.io/">spaCy</a> - Industrial-strength Natural
Language Processing in Python.</li>
<li><a href="http://hdl.handle.net/11022/1007-0000-0000-8E4E-A">Mate
Tools</a>, webservice via <a
href="https://weblicht.sfs.uni-tuebingen.de/">WebLicht</a></li>
<li><a href="https://ubiai.tools/">UBIAI</a> - Easy-to-use text
annotation tool for teams with extensive auto-annotation
features. Supports NER, relations and document classification as well as
OCR annotation for invoice labeling.</li>
<li><a
href="https://github.com/markuskiller/textblob-de">textblob-de</a> -
German language support for TextBlob; an alternative to spaCy (see
above).</li>
<li><a href="https://github.com/mikahama/uralicNLP">UralicNLP</a> - An
open source Python library for processing morphologically rich and, for
the most part, endangered Uralic languages. It can do morphological
analysis, generation, lemmatization, disambiguation and lexical lookup
for a great many Uralic languages.</li>
</ul>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><a
href="http://snowball.tartarus.org/texts/stemmersoverview.html">Stemming
algorithms for various European languages</a> - Various stemming
algorithms from Snowball.</li>
<li><a href="http://tartarus.org/martin/PorterStemmer/">The Porter
Stemmer Algorithm</a> - The ‘official’ home page for distribution of the
Porter Stemming Algorithm, written and maintained by its author, Martin
Porter.</li>
</ul>
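<p>For readers new to stemming, the general idea behind the algorithms
linked above can be sketched in a few lines of Python. This is a toy
suffix stripper, not the Porter or Snowball algorithm: the rule table,
the minimum stem length and the name <code>toy_stem</code> are
assumptions made purely for illustration.</p>

```python
# Toy suffix-stripping stemmer. This is NOT the Porter or Snowball
# algorithm; it only illustrates the idea of rule-based stemming.

# Hypothetical rule table: (suffix, replacement), tried in order.
SUFFIX_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]

# Assumption: a rule is skipped if it would leave fewer than 3 characters.
MIN_STEM_LENGTH = 3


def toy_stem(word):
    """Apply the first suffix rule that leaves a long enough stem."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            if len(stem) >= MIN_STEM_LENGTH:
                return stem
    # No rule applied: return the lowercased word unchanged.
    return word


if __name__ == "__main__":
    for w in ["caresses", "ponies", "walking", "jumped", "cats", "is"]:
        print(w, "->", toy_stem(w))
```

<p>Real stemmers add conditions on the shape of the remaining stem and
apply several rule groups in sequence, which is what the Snowball
language is designed to express.</p>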
<h3 id="data-sets">Data sets</h3>
<ul>
<li><a href="https://github.com/kirkins/euroromcom">EuroRomCom Data</a>
- JSON-formatted Pan-Romance word lists.</li>
<li><a
href="http://aranea.juls.savba.sk/aranea_about/_germanicum.html">Araneum
Germanicum</a></li>
<li><a
href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2638">CEHugeWebCorpus</a>
- German corpus based on CommonCrawl.</li>
<li><a href="https://dwds.de">Digitales Wörterbuch der deutschen Sprache
(DWDS)</a></li>
<li><a
href="https://german-nlp-group.github.io/projects/gc4-corpus.html">GC4
Corpus</a> (CommonCrawl)</li>
<li><a href="https://www1.ids-mannheim.de/kl/projekte/korpora">IDS
Corpora</a> - German Reference Corpus.</li>
<li><a href="https://wortschatz.uni-leipzig.de/en/download/">Leipzig
Corpora Collection</a> - Sampled sentences in different languages.</li>
<li><a
href="https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/sdewac.en.html">SdeWaC</a>
- Large German web corpus.</li>
<li><a
href="http://lingured.info/linguistic-resources/cwep/">C-WEP</a></li>
<li><a href="https://github.com/Rauschii/DysListGerman">DysList (list of
dyslexic errors)</a></li>
<li><a
href="https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/falko">Falko</a></li>
<li><a
href="https://www.linguistics.ruhr-uni-bochum.de/litkeycorpus/">Litkey</a></li>
<li><a
href="https://github.com/hdaSprachtechnologie/OpinionSpam">OpinionSpam</a></li>
</ul>
<h3 id="resources">Resources</h3>
<ul>
<li><a href="https://www.lighttag.io/how-to-label-data/">How To Label
Data</a> - Guide on managing large-scale linguistic annotation
projects.</li>
<li><a href="https://github.com/RIchardLitt/low-resource-languages">Low
Resource Languages</a> - A list of resources for conservation,
development, and documentation of low-resource (human) languages.</li>
<li><a href="https://langsci-press.org/">Language Science Press</a> -
A born-digital, scholar-led open access
publisher in linguistics.</li>
</ul>
<h3 id="deep-learning-models-and-transformers">Deep learning models and
transformers</h3>
<ul>
<li><a href="https://github.com/dbmdz/berts">dbmdz BERT models</a></li>
<li><a href="https://deepset.ai/german-bert">Deepset German BERT
model</a></li>
<li><a href="https://github.com/DFKI-NLP/gevalm">Evaluating German
Transformer Language Models with Syntactic Agreement Tests</a></li>
<li><a
href="https://github.com/t-systems-on-site-services-gmbh/german-elmo-model">German
ELMo Model</a></li>
<li><a
href="https://github.com/PhilipMay/german-transformer-training">german-transformer-training</a></li>
<li><a
href="https://github.com/tonianelope/Multilingual-BERT">GermLM</a> (NER
exploration)</li>
<li><a href="https://github.com/bminixhofer/gerpt2">GerPT2</a></li>
<li><a href="https://github.com/UKPLab/sentence-transformers">Sentence
Transformers</a></li>
</ul>
<h3 id="on-wikipedia">On Wikipedia</h3>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Bag-of-words_model">Bag-of-words
model</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Document_classification">Document
classification</a></li>
<li><a href="https://en.wikipedia.org/wiki/Language_model">Language
models</a></li>
<li><a href="https://en.wikipedia.org/wiki/Naive_Bayes_classifier">Naive
Bayes classification</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Natural_language_processing">Natural
language processing</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Outline_of_natural_language_processing">Outline
of natural language processing</a></li>
<li><a href="https://en.wikipedia.org/wiki/Part-of-speech_tagging">Part-of-speech
tagging</a></li>
<li><a href="https://en.wikipedia.org/wiki/Sentiment_analysis">Sentiment
analysis</a></li>
<li><a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf">Term
frequency - inverse document frequency</a></li>
<li><a href="https://en.wikipedia.org/wiki/Vector_space_model">Vector
space model</a></li>
</ul>
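<p>Two of the concepts linked above, the bag-of-words model and term
frequency - inverse document frequency, can be sketched with the Python
standard library alone. The whitespace tokenizer, the function names and
the particular weighting (tf = count / document length, idf = log(N /
df)) are assumptions chosen for brevity; other variants are common.</p>

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count tokens (a naive tokenizer)."""
    return Counter(text.lower().split())


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    bags = [bag_of_words(doc) for doc in documents]
    n_docs = len(bags)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append(
            {
                term: (count / total) * math.log(n_docs / df[term])
                for term, count in bag.items()
            }
        )
    return weights


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the cat ran"]
    for doc, w in zip(docs, tf_idf(docs)):
        print(doc, w)
```

<p>With this weighting, a term that occurs in every document gets a
weight of zero, while terms confined to a single document score highest.
Each resulting dictionary can be read as a sparse vector in the vector
space model.</p>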
<h3 id="on-youtube">On YouTube</h3>
<ul>
<li><a
href="https://www.youtube.com/playlist?list=PLegWUnz91WfuPebLI97-WueAP90JO-15i">Computational
Linguistics Lecture Playlist (YouTube)</a> - Lectures for a University of
Maryland class on computational linguistics.</li>
<li><a
href="https://www.youtube.com/channel/UCaMpov1PPVXGcKYgwHjXB3g">The
Virtual Linguistics Campus</a> - CC-licensed educational videos
interconnected with Marburg University’s e-learning platform of the same
name.</li>
</ul>
<h3 id="books">Books</h3>
<p><em>Some of the more interesting and complete books.</em></p>
<h4 id="free">Free</h4>
<ul>
<li><a
href="https://ecampusontario.pressbooks.pub/essentialsoflinguistics2/">Essentials
of Linguistics, 2nd edition</a> - An introductory textbook.</li>
<li><a
href="https://linguistics.ucla.edu/people/Kracht/courses/ling20-fall07/ling-intro.pdf">Introduction
to Linguistics</a></li>
<li><a href="https://www.nltk.org/book/">Natural Language Processing
with Python</a> - The book accompanying the NLTK package.</li>
<li><a href="https://www.tidytextmining.com">Text Mining with R</a></li>
</ul>
<h4 id="non-free">Non-free</h4>
<ul>
<li><a
href="https://books.google.com/books?id=o9iGAgAAQBAJ&dq=Foundations+of+Computational+Linguistics&hl=nl&source=gbs_navlinks_s">Foundations
of Computational Linguistics</a></li>
<li><a href="https://books.google.nl/books?id=YiFDxbEX3SUC">Foundations
of Statistical Natural Language Processing</a></li>
<li><a
href="https://books.google.com/books/about/Semisupervised_Learning_for_Computationa.html?id=VCd67cGB_rAC&redir_esc=y">Semisupervised
Learning for Computational Linguistics</a></li>
<li><a href="https://books.google.nl/books?id=fZmj5UNK8AQC">Speech and
Language Processing: An Introduction to Natural Language Processing,
Computational Linguistics and Speech Recognition</a></li>
<li><a
href="https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199276349.001.0001/oxfordhb-9780199276349">The
Oxford Handbook of Computational Linguistics</a></li>
</ul>
<h3 id="standards">Standards</h3>
<ul>
<li><a href="https://www.deutschestextarchiv.de/doku/basisformat/">DTA
Basisformat</a></li>
<li><a href="https://www.iso.org/committee/297592.html">ISO TC 37 SC
4</a></li>
<li><a
href="https://docs.oasis-open.org/uima/v1.0/os/uima-spec-os.html">UIMA</a></li>
</ul>
<h3 id="lists">Lists</h3>
<ul>
<li><a
href="https://www.goodreads.com/shelf/show/natural-language-processing">15
most popular books on Goodreads</a></li>
<li>GitHub topics <a
href="https://github.com/topics/corpus-linguistics">corpus-linguistics</a>
and <a href="https://github.com/topics/nlp">nlp</a></li>
<li><a
href="https://github.com/niderhoff/nlp-datasets">nlp-datasets</a></li>
<li><a
href="https://github.com/sebastianruder/NLP-progress">NLP-progress</a></li>
<li><a
href="https://www.reddit.com/r/LanguageTechnology/">/r/LanguageTechnology/</a></li>
<li><a href="https://github.com/keon/awesome-nlp">awesome-nlp</a></li>
<li><a
href="https://github.com/alvations/awesome-community-curated-nlp">Awesome
Community-Curated NLP List</a></li>
<li><a
href="https://github.com/crownpku/Awesome-Chinese-NLP">awesome-chinese-nlp</a></li>
<li><a
href="https://github.com/fnielsen/awesome-danish">awesome-danish</a></li>
<li><a
href="https://github.com/oroszgy/awesome-hungarian-nlp">awesome-hungarian-nlp</a></li>
<li><a
href="https://github.com/harpribot/awesome-information-retrieval">awesome
Information Retrieval</a></li>
<li><a href="https://github.com/kmkurn/id-nlp-resource">Indonesian
NLP</a></li>
<li><a href="https://github.com/web64/norwegian-nlp-resources">Norwegian
NLP resources</a></li>
<li><a href="https://github.com/adbar/German-NLP/">German NLP
resources</a></li>
<li><a
href="https://github.com/ksopyla/awesome-nlp-polish">awesome-nlp-polish</a></li>
<li><a
href="https://github.com/dav009/awesome-spanish-nlp">awesome-spanish-nlp</a></li>
<li><a
href="https://martinweisser.org/corpora_site/comp_ling_resources.html">M.
Weisser’s list of NLP/Computational Linguistics Resources</a></li>
<li><a
href="https://www.coli.uni-saarland.de/~csporled/page.php?id=tools">NLP
tools (Saarland University)</a></li>
</ul>
<h3 id="communities">Communities</h3>
<ul>
<li><a href="https://linguistics.stackexchange.com/">Linguistics Stack
Exchange</a></li>
<li><a href="https://untranslatable.co/">Untranslatable.co</a> -
Multilingual urban dictionary.</li>
</ul>