Awesome Linguistics

A curated list of anything remotely related to linguistics, sorted in alphabetical order.

- Programming (#programming)
  - **Platforms and toolkits** (#platforms-and-toolkits)
  - **Algorithms** (#algorithms)
  - **Data sets** (#data-sets)
- Resources (#resources)
  - **Deep learning models and transformers** (#deep-learning-models-and-transformers)
  - **On Wikipedia** (#on-wikipedia)
  - **On Youtube** (#on-youtube)
  - **Books** (#books)
    - **Free** (#free)
    - **Non free** (#non-free)
    - **Lists** (#lists)
- Standards (#standards)
- Lists (#lists)
- Communities (#communities)

Programming

Libraries, frameworks, and tools useful for developing applications.

Platforms and toolkits

⟡ CLARIN-D web tools (https://www.clarin-d.net/en/analysing) - Tools for analysing research data.
⟡ CorpusExplorer (https://notes.jan-oliver-ruediger.de/software/corpusexplorer-overview/) - Software for corpus linguists and text/data mining enthusiasts. CorpusExplorer combines over 50 interactive visualizations under a user-friendly interface.
⟡ Haxe-linguistics (https://github.com/sexybiggetje/haxe-linguistics) - Early linguistic analysis and natural language processing library for Haxe.
⟡ Natural (https://github.com/NaturalNode/natural) - General natural language tools for Node.js.
⟡ Natural Language ToolKit (NLTK) (http://www.nltk.org/) - A leading platform for building Python programs to work with human language data.
⟡ Snowball (https://snowballstem.org/) - A language in which stemming algorithms can be easily represented.
⟡ spaCy (https://spacy.io/) - Industrial-strength Natural Language Processing in Python.
⟡ Mate Tools (http://hdl.handle.net/11022/1007-0000-0000-8E4E-A) - Web service via WebLicht.
⟡ UBIAI (https://ubiai.tools/) - Easy-to-use text annotation tool for teams, with auto-annotation features. Supports NER, relations, and document classification, as well as OCR annotation for invoice labeling.
⟡ textblob-de (https://github.com/markuskiller/textblob-de) - An alternative to spaCy (see above).
⟡ tyo (https://github.com/mongsvo/tyo) - A utility for finding Typo-Bridges.
⟡ UralicNLP (https://github.com/mikahama/uralicNLP) - An open source Python library for processing morphologically rich and, for the most part, endangered Uralic languages. It supports morphological analysis, generation, lemmatization, disambiguation, and lexical lookup for many Uralic languages.

Algorithms

⟡ Stemming algorithms for various European languages (http://snowball.tartarus.org/texts/stemmersoverview.html) - Various stemming algorithms from Snowball.
⟡ The Porter Stemmer Algorithm (http://tartarus.org/martin/PorterStemmer/) - The ‘official’ home page for distribution of the Porter Stemming Algorithm, written and maintained by its author, Martin Porter.

Data sets

⟡ EuroRomCom Data (https://github.com/kirkins/euroromcom) - JSON-formatted Pan-Romance word lists.
⟡ Araneum Germanicum (http://aranea.juls.savba.sk/aranea_about/_germanicum.html)
⟡ CEHugeWebCorpus (https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-2638) - German corpus based on CommonCrawl.
⟡ Digitales Wörterbuch der deutschen Sprache (DWDS) (https://dwds.de)
⟡ GC4 Corpus (https://german-nlp-group.github.io/projects/gc4-corpus.html) - Based on CommonCrawl.
⟡ IDS Corpora (https://www1.ids-mannheim.de/kl/projekte/korpora) - German Reference Corpus.
⟡ Leipzig Corpora Collection (https://wortschatz.uni-leipzig.de/en/download/) - Sampled sentences in different languages.
⟡ SdeWaC (https://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/sdewac.en.html) - Large German web corpus.
⟡ C-WEP (http://lingured.info/linguistic-resources/cwep/)
⟡ DysList (https://github.com/Rauschii/DysListGerman) - List of dyslexic errors.
⟡ Falko (https://www.linguistik.hu-berlin.de/de/institut/professuren/korpuslinguistik/forschung/falko)
⟡ Litkey (https://www.linguistics.ruhr-uni-bochum.de/litkeycorpus/)
⟡ OpinionSpam (https://github.com/hdaSprachtechnologie/OpinionSpam)

Resources

⟡ Low Resource Languages (https://github.com/RIchardLitt/low-resource-languages) - A list of resources for conservation, development, and documentation of low-resource (human) languages.
⟡ Language Science Press (https://langsci-press.org/) - A born-digital, scholar-led open access publisher in linguistics.

Deep learning models and transformers

⟡ dbmdz BERT models (https://github.com/dbmdz/berts)
⟡ Deepset German BERT model (https://deepset.ai/german-bert)
⟡ Evaluating German Transformer Language Models with Syntactic Agreement Tests (https://github.com/DFKI-NLP/gevalm)
⟡ German ELMo Model (https://github.com/t-systems-on-site-services-gmbh/german-elmo-model)
⟡ german-transformer-training (https://github.com/PhilipMay/german-transformer-training)
⟡ GermLM (https://github.com/tonianelope/Multilingual-BERT) - NER exploration.
⟡ GerPT2 (https://github.com/bminixhofer/gerpt2)
⟡ Sentence Transformers (https://github.com/UKPLab/sentence-transformers)

On Wikipedia

⟡ Bag-of-words model (https://en.wikipedia.org/wiki/Bag-of-words_model)
⟡ Document classification (https://en.wikipedia.org/wiki/Document_classification)
⟡ Language models (https://en.wikipedia.org/wiki/Language_model)
⟡ Naive Bayes classification (https://en.wikipedia.org/wiki/Naive_Bayes_classifier)
⟡ Natural language processing (https://en.wikipedia.org/wiki/Natural_language_processing)
⟡ Outline of natural language processing (https://en.wikipedia.org/wiki/Outline_of_natural_language_processing)
⟡ Part-of-speech tagging (https://en.wikipedia.org/wiki/Part-of-speech_tagging)
⟡ Sentiment analysis (https://en.wikipedia.org/wiki/Sentiment_analysis)
⟡ Term frequency–inverse document frequency (https://en.wikipedia.org/wiki/Tf%E2%80%93idf)
⟡ Vector space model (https://en.wikipedia.org/wiki/Vector_space_model)

On Youtube

⟡ Computational Linguistics Lecture Playlist (https://www.youtube.com/playlist?list=PLegWUnz91WfuPebLI97-WueAP90JO-15i) - Lectures for a University of Maryland class on computational linguistics.
⟡ The Virtual Linguistics Campus (https://www.youtube.com/channel/UCaMpov1PPVXGcKYgwHjXB3g) - CC-licensed educational videos interconnected with Marburg University's e-learning platform of the same name.

Books

Some of the more interesting and complete books.

Free

⟡ Essentials of Linguistics, 2nd edition (https://ecampusontario.pressbooks.pub/essentialsoflinguistics2/) - An introductory textbook.
⟡ Introduction to Linguistics (https://linguistics.ucla.edu/people/Kracht/courses/ling20-fall07/ling-intro.pdf)
⟡ Natural Language Processing with Python (https://www.nltk.org/book/) - The NLTK book.
⟡ Text Mining with R (https://www.tidytextmining.com)

Non free

⟡ Foundations of Computational Linguistics (https://books.google.com/books?id=o9iGAgAAQBAJ&dq=Foundations+of+Computational+Linguistics&hl=nl&source=gbs_navlinks_s)
⟡ Foundations of Statistical Natural Language Processing (https://books.google.nl/books?id=YiFDxbEX3SUC)
⟡ Semisupervised Learning for Computational Linguistics (https://books.google.com/books/about/Semisupervised_Learning_for_Computationa.html?id=VCd67cGB_rAC&redir_esc=y)
⟡ Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics and Speech Recognition (https://books.google.nl/books?id=fZmj5UNK8AQC)
⟡ The Oxford Handbook of Computational Linguistics (https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780199276349.001.0001/oxfordhb-9780199276349)

Standards

⟡ DTA Basisformat (https://www.deutschestextarchiv.de/doku/basisformat/)
⟡ ISO TC 37 SC 4 (https://www.iso.org/committee/297592.html)
⟡ UIMA (https://docs.oasis-open.org/uima/v1.0/os/uima-spec-os.html)

Lists

⟡ 15 most popular books on Goodreads (https://www.goodreads.com/shelf/show/natural-language-processing)
⟡ GitHub topics corpus-linguistics (https://github.com/topics/corpus-linguistics) & nlp (https://github.com/topics/nlp)
⟡ nlp-datasets (https://github.com/niderhoff/nlp-datasets)
⟡ NLP-progress (https://github.com/sebastianruder/NLP-progress)
⟡ /r/LanguageTechnology/ (https://www.reddit.com/r/LanguageTechnology/)
⟡ awesome-nlp (https://github.com/keon/awesome-nlp)
⟡ Awesome Community-Curated NLP List (https://github.com/alvations/awesome-community-curated-nlp)
⟡ awesome-chinese-nlp (https://github.com/crownpku/Awesome-Chinese-NLP)
⟡ awesome-danish (https://github.com/fnielsen/awesome-danish)
⟡ awesome-hungarian-nlp (https://github.com/oroszgy/awesome-hungarian-nlp)
⟡ awesome Information Retrieval (https://github.com/harpribot/awesome-information-retrieval)
⟡ Indonesian NLP (https://github.com/kmkurn/id-nlp-resource)
⟡ Norwegian NLP resources (https://github.com/web64/norwegian-nlp-resources)
⟡ German NLP resources (https://github.com/adbar/German-NLP/)
⟡ awesome-nlp-polish (https://github.com/ksopyla/awesome-nlp-polish)
⟡ awesome-spanish-nlp (https://github.com/dav009/awesome-spanish-nlp)
⟡ M. Weisser's list of NLP/Computational Linguistics Resources (https://martinweisser.org/corpora_site/comp_ling_resources.html)

Communities

⟡ Linguistics Stack Exchange (https://linguistics.stackexchange.com/)
⟡ Untranslatable.co (https://untranslatable.co/) - Multilingual urban dictionary.

GitHub: https://github.com/theimpossibleastronaut/awesome-linguistics