<h1 id="awesome-linguistics-resources-for-spanish-awesome">Awesome
Linguistics Resources for Spanish <a
href="https://github.com/sindresorhus/awesome"><img
src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"
alt="Awesome" /></a></h1>
<p>A curated list of linguistic resources for Spanish NLP &amp;
computational linguistics (CL).</p>
<h2 id="clustering">Clustering</h2>
<ul>
<li><a
href="https://github.com/ArtificiAI/Multilingual-Latent-Dirichlet-Allocation-LDA">Multilingual
Latent Dirichlet Allocation LDA</a></li>
</ul>
<h2 id="speech">Speech</h2>
<ul>
<li><a href="http://www.speechocean.com/en-ASR-Corpora/631.html">Mexican
Spanish Speech Recognition DB - 150 Speakers</a></li>
<li><a href="http://www.speechocean.com/en-ASR-Corpora/603.html">Mexican
Spanish Speech Recognition DB - 299 Speakers</a></li>
<li><a
href="http://www.speechocean.com/en-Text-Corpora/692.html">Phonetic
Transcriptions of Spanish Pronunciation Lexicon</a></li>
<li><a
href="http://www.speech.cs.cmu.edu/sphinx/models/hub4spanish_itesm/">Sphinx
Speech Recognition Models</a></li>
</ul>
<h2 id="part-of-speech-taggers-pos-taggers">Part of Speech Taggers (POS
Taggers)</h2>
<ul>
<li><a
href="http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/">TreeTagger
- POSTagger</a></li>
<li><a href="http://nlp.stanford.edu/software/tagger.shtml">Stanford -
POSTagger</a></li>
<li><a href="http://nlp.lsi.upc.edu/freeling/">Freeling</a></li>
<li><a
href="https://github.com/ixa-ehu/ixa-pipe-pos">ixa-pipe-pos</a></li>
<li><a href="https://github.com/MaG21/estem">Ruby Snowball
Implementation</a></li>
<li><a href="https://code.google.com/p/spaghetti-tagger/">Spaghetti
POSTagger (based on NLTK + CESS corpus)</a></li>
</ul>
<h2 id="multiword-expressions-extractors-mlwe">Multiword Expression
Extractors (MWE)</h2>
<ul>
<li><a href="http://nlp.lsi.upc.edu/freeling/">Freeling</a></li>
</ul>
<h2 id="name-entity-recognition-ner">Named Entity Recognition (NER)</h2>
<ul>
<li><a href="http://opennlp.sourceforge.net/models-1.5/">OpenNLP -
Person/Place/Organization models</a></li>
<li><a
href="https://github.com/dbpedia-spotlight/dbpedia-spotlight/">DBPedia
Spotlight</a></li>
<li><a
href="http://gramatica.usc.es/pln/tools/CitiusTools.html">CitiusTagger -
Spanish NER and POSTagger</a></li>
</ul>
<h2 id="corpora">Corpora</h2>
<h3 id="shared-tasks">Shared tasks</h3>
<ul>
<li><a href="http://www.statmt.org/wmt06/shared-task/">Exploiting
Parallel Texts for Statistical Machine Translation - NAACL 2006 in New
York City</a></li>
<li><a
href="http://ufal.mff.cuni.cz/conll2009-st/trial-data.html">CoNLL-2009
Shared Task: Syntactic and Semantic Dependencies in Multiple
Languages</a></li>
<li><a href="http://www.quest.dcs.shef.ac.uk/wmt13_qe.html">Quality
Estimation (Spanish - English) WMT13</a></li>
<li><a href="http://www.statmt.org/wmt10/translation-task.html">ACL 2010
in Uppsala - Shared Task: Machine Translation for European
Languages</a></li>
<li><a href="http://www.daedalus.es/TASS2014/tass2014.php">TASS - 2014
(Sentiment Analysis focused on Spanish)</a></li>
<li><a
href="http://semeval2.fbk.eu/semeval2.php?location=tasks">SemEval-2 2010
Coreference Resolution in Multiple Languages</a></li>
<li><a href="http://sabcorpus.linkeddata.es/">SAB Corpus (Spanish Corpus
for Sentiment Analysis towards Brands)</a></li>
</ul>
<h3 id="corpora-1">Corpora</h3>
<ul>
<li><a
href="http://catalog.elra.info/product_info.php?products_id=636">Multilingual
Aligned Annotated Corpus (CRATER)</a></li>
<li><a href="http://elvira.lllf.uam.es/~sandoval/UAMTreebank.html">UAM
Treebank - 1,500 syntactically annotated sentences extracted from
newspapers (El País Digital and Compra Maestra)</a></li>
<li><a
href="http://www.elsnet.org/resources/eciCorpus.html">POSTagged/syntactic
dependencies - European Corpus Initiative Multilingual Corpus I</a></li>
<li><a href="http://sfncorpora.uab.es/CQPweb/cea/">The Corpus of
Contemporary Spanish (POS tags, lemmas)</a></li>
<li><a
href="http://sfn.uab.es:8080/SFN/dictionary/dictionary-information-lemmas-and-expanded-forms">Lemmas
Dictionary</a></li>
<li><a
href="http://www.sketchengine.co.uk/documentation/wiki/Corpora/TenTen/esTenTen">esTenTen
Spanish (POSTagged)</a></li>
<li><a href="http://www.statmt.org/europarl/">Europarl Corpus (Parallel
Corpus English-Spanish)</a></li>
<li><a
href="https://github.com/dav009/LatinamericanTextResources">Colombian
Political Speeches</a></li>
<li><a href="https://github.com/dav009/LatinamericanTextResources">South
American Slang Expressions/MWE</a></li>
<li><a
href="http://ufal.mff.cuni.cz/conll2009-st/trial/CoNLL2009-ST-Spanish-trial.zip">Syntax
and Semantic Annotations (Subset Ancora Corpus)</a></li>
<li><a href="http://www.iula.upf.edu/corpus/corpusuk.htm">Plurilingual
Specific Corpus on Economics, Medicine, Computer Science</a></li>
<li><a
href="http://code.google.com/p/copenhagen-dependency-treebank/">Copenhagen
Treebank (Dependency Parsing)</a></li>
<li><a href="http://trec.nist.gov/data/reuters/reuters.html">Reuters
Corpora RCV2 - New Corpora</a></li>
<li><a href="http://www.molinolabs.com/corpus.html">MolinoLabs Corpus -
News Corpora from Spain, Argentina and Mexico</a></li>
<li><a
href="http://panacea-lr.eu/en/info-for-researchers/data-sets/monolingual-corpora">PANACEA -
Legislation Corpus</a></li>
<li><a
href="http://panacea-lr.eu/en/info-for-researchers/data-sets/monolingual-corpora-n-grams/">PANACEA -
Legislation Ngram Corpus</a></li>
<li><a
href="http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/">PANACEA -
Dependency Parsed Corpus</a></li>
<li><a
href="http://panacea-lr.eu/en/info-for-researchers/data-sets/monolingual-lexica/">PANACEA -
Monolingual Lexica (MWE, Frames, Semantic Classes)</a></li>
<li><a
href="https://www.sfu.ca/~mtaboada/SFU_Review_Corpus.html">Opinion
Mining - User reviews on Cars, Hotels, Washing machines, Books, Cell
phones, Music, etc.</a></li>
<li><a href="http://www.celct.it/resources.php?id_page=CLTE">Cross
Lingual Textual Entailment (CLTE) Corpus (English-Spanish)</a></li>
<li><a href="http://ngrams.cavorite.com/datos/">Ngram Frequencies from
Colombian News Corpora</a></li>
<li><a
href="http://www.investigacion.frc.utn.edu.ar/mslabs/~jcastillo/Sagan-test-suite/">Sagan
Textual Entailment Test Suite</a></li>
<li><a href="http://gramatica.usc.es/~marcos/corpora_nle.tgz">Garcia,
Marcos and Pablo Gamallo, 2013 - Portuguese and Spanish biographical
relation extraction corpora (Garcia, Marcos and Pablo Gamallo, 2013.
Exploring the Effectiveness of Linguistic Knowledge for Biographical
Relation Extraction. Natural Language Engineering, CJO2013.
doi:10.1017/S1351324913000314.)</a></li>
<li><a
href="http://gramatica.usc.es/~marcos/resources/corpora_coref.tar.bz2">Garcia,
Marcos and Pablo Gamallo, 2014 - Portuguese, Spanish and Galician
coreference corpora (Garcia, Marcos and Pablo Gamallo, 2014.
Multilingual corpora with coreferential annotation of person entities.
In Proceedings of the 9th edition of the Language Resources and
Evaluation Conference (LREC 2014), Reykjavik: 3229-3233.)</a></li>
<li><a href="http://hpsg.fu-berlin.de/cow/">COW (Corpora from the Web)
Ngram/Annotated People's Names Corpora</a></li>
<li><a href="http://www.cs.upc.edu/~nlp/wikicorpus/">Wikicorpus - Portion
of the 2006 Wikipedia annotated with WordNet synsets and POS</a></li>
<li><a href="http://crscardellino.me/SBWCE/">Spanish Billion Words
Corpus with word2vec Embeddings</a></li>
<li><a href="https://traces1.inria.fr/oscar/">OSCAR (Open Super-large
Crawled ALMAnaCH coRpus) - Spanish subset</a></li>
</ul>
<h2 id="misc">Misc</h2>
<ul>
<li><a href="https://github.com/idio/wiki2vec">Word2Vec vectors for
Spanish Wikipedia Articles</a></li>
<li><a
href="http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/es/labels_es.nt.bz2">DBpedia
Spanish Entity Titles</a></li>
<li><a
href="http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/es/short_abstracts_es.nt.bz2">DBpedia
Spanish Abstracts</a></li>
<li><a
href="http://gramatica.usc.es/pln/tools/conjugador/download.html">Conshuga
- Galician Verb conjugator</a></li>
</ul>
<h2 id="contribute">Contribute</h2>
<p>Contributions welcome! Read the <a
href="contributing.md">contribution guidelines</a> first.</p>
<h2 id="license">License</h2>
<p><a href="https://creativecommons.org/publicdomain/zero/1.0/"><img
src="https://i.creativecommons.org/p/zero/1.0/88x31.png"
alt="CC0" /></a></p>
<p>To the extent possible under law, <a
href="http://alejandro.pictures">David Przybilla</a> has waived all
copyright and related or neighboring rights to this work.</p>