<h3 id="awesome-linguistics">Awesome Linguistics</h3>
<p><a href="https://github.com/sindresorhus/awesome"><img
src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"
alt="Awesome" /></a></p>
<p>A curated list of anything remotely related to linguistics, sorted in
alphabetical order.</p>
<ul>
<li><a href="#programming">Programming</a>
<ul>
<li><a href="#platforms-and-toolkits">Platforms and toolkits</a></li>
<li><a href="#algorithms">Algorithms</a></li>
<li><a href="#data-sets">Data sets</a></li>
</ul></li>
<li><a href="#resources">Resources</a>
<ul>
<li><a href="#deep-learning-models-and-transformers">Deep learning
models and transformers</a></li>
<li><a href="#on-wikipedia">On Wikipedia</a></li>
<li><a href="#on-youtube">On YouTube</a></li>
<li><a href="#books">Books</a>
<ul>
<li><a href="#free">Free</a></li>
<li><a href="#non-free">Non-free</a></li>
<li><a href="#lists">Lists</a></li>
</ul></li>
</ul></li>
<li><a href="#standards">Standards</a></li>
<li><a href="#lists">Lists</a></li>
<li><a href="#communities">Communities</a></li>
</ul>
<h3 id="programming">Programming</h3>
<p><em>Libraries, frameworks and applications useful for developing
language-processing software.</em></p>
<h3 id="platforms-and-toolkits">Platforms and toolkits</h3>
<ul>
<li><a href="https://www.clarin-d.net/en/analysing">CLARIN-D web
tools</a> - Tools for analysing research data.</li>
<li><a
href="https://notes.jan-oliver-ruediger.de/software/corpusexplorer-overview/">CorpusExplorer</a>
- Software for corpus linguists and text/data mining enthusiasts that
combines over 50 interactive visualizations under a user-friendly
interface.</li>
<li><a
href="https://github.com/sexybiggetje/haxe-linguistics">Haxe-linguistics</a>
- Early-stage linguistic analysis and natural language processing library
for Haxe.</li>
<li><a href="https://github.com/NaturalNode/natural">Natural</a> -
General natural language tools for Node.js.</li>
<li><a href="http://www.nltk.org/">Natural Language Toolkit (NLTK)</a> -
A leading platform for building Python programs to work with
human language data.</li>
<li><a href="https://snowballstem.org/">Snowball</a> - A
language in which stemming algorithms can be easily represented.</li>
<li><a href="https://spacy.io/">spaCy</a> - Industrial-strength Natural
Language Processing in Python.</li>
<li><a href="http://hdl.handle.net/11022/1007-0000-0000-8E4E-A">Mate
Tools</a> - Web service via WebLicht.</li>
<li><a href="https://ubiai.tools/">UBIAI</a> - Text
annotation tool for teams with auto-annotation
features. Supports NER, relations and document classification as well as
OCR annotation for invoice labeling.</li>
<li><a
href="https://github.com/markuskiller/textblob-de">textblob-de</a> -
German language support for TextBlob; an alternative to spaCy (see above).</li>
<li><a href="https://github.com/mongsvo/tyo">tyo</a> - A utility for
finding Typo-Bridges.</li>
<li><a href="https://github.com/mikahama/uralicNLP">UralicNLP</a> - An
open source Python library for processing morphologically rich and, for
the most part, endangered Uralic languages. It can do morphological
analysis, generation, lemmatization, disambiguation and lexical lookup
for a great many Uralic languages.</li>
</ul>
<h3 id="algorithms">Algorithms</h3>
<ul>
<li><a
href="http://snowball.tartarus.org/texts/stemmersoverview.html">Stemming
algorithms for various European languages</a> - Various stemming
algorithms from Snowball.</li>
<li><a href="http://tartarus.org/martin/PorterStemmer/">The Porter
Stemmer Algorithm</a> - The ‘official’ home page for distribution of the
Porter Stemming Algorithm, written and maintained by its author, Martin
Porter.</li>
</ul>
<h3 id="data-sets">Data sets</h3>
<ul>
<li><a href="https://github.com/kirkins/euroromcom">EuroRomCom Data</a>
- JSON-formatted Pan-Romance word lists.</li>
<li><a
href="http://aranea.juls.savba.sk/aranea_about/_germanicum.html">Araneum
Germanicum</a></li>
<li><a
href="https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2638">CEHugeWebCorpus</a>
- German corpus based on CommonCrawl.</li>
<li><a href="https://dwds.de">Digitales Wörterbuch der deutschen Sprache
(DWDS)</a></li>
<li><a
href="https://german-nlp-group.github.io/projects/gc4-corpus.html">GC4
Corpus</a> (CommonCrawl)</li>
<li><a href="https://www1.ids-mannheim.de/kl/projekte/korpora">IDS
Corpora</a> - German Reference Corpus.</li>
<li><a href="https://wortschatz.uni-leipzig.de/en/download/">Leipzig
Corpora Collection</a> - Sampled sentences in different languages.</li>
<li><a
href="https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/sdewac.en.html">SdeWaC</a>
- Large German web corpus.</li>
<li><a
href="http://lingured.info/linguistic-resources/cwep/">C-WEP</a></li>
<li><a href="https://github.com/Rauschii/DysListGerman">DysList (list of
dyslexic errors)</a></li>
<li><a
href="https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/falko">Falko</a></li>
<li><a
href="https://www.linguistics.ruhr-uni-bochum.de/litkeycorpus/">Litkey</a></li>
<li><a
href="https://github.com/hdaSprachtechnologie/OpinionSpam">OpinionSpam</a></li>
</ul>
<h3 id="resources">Resources</h3>
<ul>
<li><a href="https://github.com/RIchardLitt/low-resource-languages">Low
Resource Languages</a> - A list of resources for conservation,
development, and documentation of low-resource (human) languages.</li>
<li><a href="https://langsci-press.org/">Language Science Press</a> -
A born-digital, scholar-led open access
publisher in linguistics.</li>
</ul>
<h3 id="deep-learning-models-and-transformers">Deep learning models and
transformers</h3>
<ul>
<li><a href="https://github.com/dbmdz/berts">dbmdz BERT models</a></li>
<li><a href="https://deepset.ai/german-bert">Deepset German BERT
model</a></li>
<li><a href="https://github.com/DFKI-NLP/gevalm">Evaluating German
Transformer Language Models with Syntactic Agreement Tests</a></li>
<li><a
href="https://github.com/t-systems-on-site-services-gmbh/german-elmo-model">German
ELMo Model</a></li>
<li><a
href="https://github.com/PhilipMay/german-transformer-training">german-transformer-training</a></li>
<li><a
href="https://github.com/tonianelope/Multilingual-BERT">GermLM</a> (NER
exploration)</li>
<li><a href="https://github.com/bminixhofer/gerpt2">GerPT2</a></li>
<li><a href="https://github.com/UKPLab/sentence-transformers">Sentence
Transformers</a></li>
</ul>
<h3 id="on-wikipedia">On Wikipedia</h3>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Bag-of-words_model">Bag-of-words
model</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Document_classification">Document
classification</a></li>
<li><a href="https://en.wikipedia.org/wiki/Language_model">Language
models</a></li>
<li><a href="https://en.wikipedia.org/wiki/Naive_Bayes_classifier">Naive
Bayes classification</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Natural_language_processing">Natural
language processing</a></li>
<li><a
href="https://en.wikipedia.org/wiki/Outline_of_natural_language_processing">Outline
of natural language processing</a></li>
<li><a href="https://en.wikipedia.org/wiki/Part-of-speech_tagging">Part-of-speech
tagging</a></li>
<li><a href="https://en.wikipedia.org/wiki/Sentiment_analysis">Sentiment
analysis</a></li>
<li><a href="https://en.wikipedia.org/wiki/Tf%E2%80%93idf">Term
frequency - inverse document frequency</a></li>
<li><a href="https://en.wikipedia.org/wiki/Vector_space_model">Vector
space model</a></li>
</ul>
<h3 id="on-youtube">On YouTube</h3>
<ul>
<li><a
href="https://www.youtube.com/playlist?list=PLegWUnz91WfuPebLI97-WueAP90JO-15i">Computational
Linguistics Lecture Playlist (YouTube)</a> - Lectures for a University of
Maryland class on computational linguistics.</li>
<li><a
href="https://www.youtube.com/channel/UCaMpov1PPVXGcKYgwHjXB3g">The
Virtual Linguistics Campus</a> - CC-licensed educational videos
interconnected with Marburg University’s e-learning platform of the same
name.</li>
</ul>
<h3 id="books">Books</h3>
<p><em>Some of the more interesting and complete books.</em></p>
<h4 id="free">Free</h4>
<ul>
<li><a
href="https://ecampusontario.pressbooks.pub/essentialsoflinguistics2/">Essentials
of Linguistics, 2nd edition</a> - An introductory textbook.</li>
<li><a
href="https://linguistics.ucla.edu/people/Kracht/courses/ling20-fall07/ling-intro.pdf">Introduction
to Linguistics</a></li>
<li><a href="https://www.nltk.org/book/">Natural Language Processing
with Python</a> - The book accompanying the NLTK package.</li>
<li><a href="https://www.tidytextmining.com">Text Mining with R</a></li>
</ul>
<h4 id="non-free">Non-free</h4>
<ul>
<li><a
href="https://books.google.com/books?id=o9iGAgAAQBAJ&dq=Foundations+of+Computational+Linguistics&hl=nl&source=gbs_navlinks_s">Foundations
of Computational Linguistics</a></li>
<li><a href="https://books.google.nl/books?id=YiFDxbEX3SUC">Foundations
of Statistical Natural Language Processing</a></li>
<li><a
href="https://books.google.com/books/about/Semisupervised_Learning_for_Computationa.html?id=VCd67cGB_rAC&redir_esc=y">Semisupervised
Learning for Computational Linguistics</a></li>
<li><a href="https://books.google.nl/books?id=fZmj5UNK8AQC">Speech and
Language Processing: An Introduction to Natural Language Processing,
Computational Linguistics and Speech Recognition</a></li>
<li><a
href="https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199276349.001.0001/oxfordhb-9780199276349">The
Oxford Handbook of Computational Linguistics</a></li>
</ul>
<h3 id="standards">Standards</h3>
<ul>
<li><a href="https://www.deutschestextarchiv.de/doku/basisformat/">DTA
Basisformat</a></li>
<li><a href="https://www.iso.org/committee/297592.html">ISO TC 37 SC
4</a></li>
<li><a
href="https://docs.oasis-open.org/uima/v1.0/os/uima-spec-os.html">UIMA</a></li>
</ul>
<h3 id="lists">Lists</h3>
<ul>
<li><a
href="https://www.goodreads.com/shelf/show/natural-language-processing">15
most popular books on Goodreads</a></li>
<li>GitHub topics <a
href="https://github.com/topics/corpus-linguistics">corpus-linguistics</a>
and <a href="https://github.com/topics/nlp">nlp</a></li>
<li><a
href="https://github.com/niderhoff/nlp-datasets">nlp-datasets</a></li>
<li><a
href="https://github.com/sebastianruder/NLP-progress">NLP-progress</a></li>
<li><a
href="https://www.reddit.com/r/LanguageTechnology/">/r/LanguageTechnology/</a></li>
<li><a href="https://github.com/keon/awesome-nlp">awesome-nlp</a></li>
<li><a
href="https://github.com/alvations/awesome-community-curated-nlp">Awesome
Community-Curated NLP List</a></li>
<li><a
href="https://github.com/crownpku/Awesome-Chinese-NLP">awesome-chinese-nlp</a></li>
<li><a
href="https://github.com/fnielsen/awesome-danish">awesome-danish</a></li>
<li><a
href="https://github.com/oroszgy/awesome-hungarian-nlp">awesome-hungarian-nlp</a></li>
<li><a
href="https://github.com/harpribot/awesome-information-retrieval">awesome
Information Retrieval</a></li>
<li><a href="https://github.com/kmkurn/id-nlp-resource">Indonesian
NLP</a></li>
<li><a href="https://github.com/web64/norwegian-nlp-resources">Norwegian
NLP resources</a></li>
<li><a href="https://github.com/adbar/German-NLP/">German NLP
resources</a></li>
<li><a
href="https://github.com/ksopyla/awesome-nlp-polish">awesome-nlp-polish</a></li>
<li><a
href="https://github.com/dav009/awesome-spanish-nlp">awesome-spanish-nlp</a></li>
<li><a
href="https://martinweisser.org/corpora_site/comp_ling_resources.html">M.
Weisser’s list of NLP/Computational Linguistics Resources</a></li>
</ul>
<h3 id="communities">Communities</h3>
<ul>
<li><a href="https://linguistics.stackexchange.com/">Linguistics Stack
Exchange</a></li>
<li><a href="https://untranslatable.co/">Untranslatable.co</a> - Multilingual
urban dictionary.</li>
</ul>
<p><a
href="https://github.com/theimpossibleastronaut/awesome-linguistics">linguistics.md
on GitHub</a></p>