<h1 id="awesome-linguistics-resources-for-spanish-awesome">Awesome
Linguistics Resources for Spanish <a
href="https://github.com/sindresorhus/awesome"><img
src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"
alt="Awesome" /></a></h1>
<p>A curated list of linguistic resources for Spanish NLP &amp;
CL.</p>
<h1 id="clustering">Clustering</h1>
<ul>
<li><a
href="https://github.com/ArtificiAI/Multilingual-Latent-Dirichlet-Allocation-LDA">Multilingual
Latent Dirichlet Allocation LDA</a></li>
</ul>
<h1 id="speech">Speech</h1>
<ul>
<li><a href="http://www.speechocean.com/en-ASR-Corpora/631.html">Mexican
Spanish Speech Recognition DB - 150 Speakers</a></li>
<li><a href="http://www.speechocean.com/en-ASR-Corpora/603.html">Mexican
Spanish Speech Recognition DB - 299 Speakers</a></li>
<li><a
href="http://www.speechocean.com/en-Text-Corpora/692.html">Phonetic
Transcriptions of Spanish Pronunciation Lexicon</a></li>
<li><a
href="http://www.speech.cs.cmu.edu/sphinx/models/hub4spanish_itesm/">Sphinx
Speech Recognition Models</a></li>
</ul>
<h2 id="part-of-speech-taggers-pos-taggers">Part of Speech Taggers (POS
Taggers)</h2>
<ul>
<li><a
href="http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/">TreeTagger
- POSTagger</a></li>
<li><a href="http://nlp.stanford.edu/software/tagger.shtml">Stanford -
POSTagger</a></li>
<li><a href="http://nlp.lsi.upc.edu/freeling/">Freeling</a></li>
<li><a
href="https://github.com/ixa-ehu/ixa-pipe-pos">ixa-pipe-pos</a></li>
<li><a href="https://github.com/MaG21/estem">Ruby Snowball
Implementation</a></li>
<li><a href="https://code.google.com/p/spaghetti-tagger/">Spaghetti
POSTagger (based on NLTK + CESS corpus)</a></li>
</ul>
<h1 id="multiword-expressions-extractors-mlwe">Multiword Expressions
Extractors (MWE)</h1>
<ul>
<li><a href="http://nlp.lsi.upc.edu/freeling/">Freeling</a></li>
</ul>
<h2 id="name-entity-recognition-ner">Named Entity Recognition (NER)</h2>
<ul>
<li><a href="http://opennlp.sourceforge.net/models-1.5/">OpenNLP -
Person/Place/Organization models</a></li>
<li><a
href="https://github.com/dbpedia-spotlight/dbpedia-spotlight/">DBpedia
Spotlight</a></li>
<li><a
href="http://gramatica.usc.es/pln/tools/CitiusTools.html">CitiusTagger -
Spanish NER and POSTagger</a></li>
</ul>
<h2 id="corpora">Corpora</h2>
<h3 id="shared-tasks">Shared tasks</h3>
<ul>
<li><a href="http://www.statmt.org/wmt06/shared-task/">Exploiting
Parallel Texts for Statistical Machine Translation - NAACL 2006 in New
York City</a></li>
<li><a
href="http://ufal.mff.cuni.cz/conll2009-st/trial-data.html">CoNLL-2009
Shared Task: Syntactic and Semantic Dependencies in Multiple
Languages</a></li>
<li><a href="http://www.quest.dcs.shef.ac.uk/wmt13_qe.html">Quality
Estimation (Spanish - English) WMT13</a></li>
<li><a href="http://www.statmt.org/wmt10/translation-task.html">ACL 2010
in Uppsala - Shared Task: Machine Translation for European
Languages</a></li>
<li><a href="http://www.daedalus.es/TASS2014/tass2014.php">TASS - 2014
(Sentiment Analysis focused on Spanish)</a></li>
<li><a
href="http://semeval2.fbk.eu/semeval2.php?location=tasks">SemEval-2 2010
Coreference Resolution in Multiple Languages</a></li>
<li><a href="http://sabcorpus.linkeddata.es/">SAB Corpus (Spanish Corpus
for Sentiment Analysis towards Brands)</a></li>
</ul>
<h3 id="corpora-1">Corpora</h3>
<ul>
<li><a
href="http://catalog.elra.info/product_info.php?products_id=636">Multilingual
Aligned Annotated Corpus (CRATER)</a></li>
<li><a href="http://elvira.lllf.uam.es/~sandoval/UAMTreebank.html">UAM
Treebank - 1,500 syntactically annotated sentences extracted from
newspapers (El País Digital and Compra Maestra)</a></li>
<li><a
href="http://www.elsnet.org/resources/eciCorpus.html">POSTagged/syntactic
dependencies - European Corpus Initiative Multilingual Corpus I</a></li>
<li><a href="http://sfncorpora.uab.es/CQPweb/cea/">The Corpus of
Contemporary Spanish (POS tags, lemmas)</a></li>
<li><a
href="http://sfn.uab.es:8080/SFN/dictionary/dictionary-information-lemmas-and-expanded-forms">Lemmas
Dictionary</a></li>
<li><a
href="http://www.sketchengine.co.uk/documentation/wiki/Corpora/TenTen/esTenTen">esTenTen
Spanish (POSTagged)</a></li>
<li><a href="http://www.statmt.org/europarl/">Europarl Corpus (Parallel
Corpus English-Spanish)</a></li>
<li><a
href="https://github.com/dav009/LatinamericanTextResources">Colombian
Political Speeches</a></li>
<li><a href="https://github.com/dav009/LatinamericanTextResources">South
American Slang Expressions/MWE</a></li>
<li><a
href="http://ufal.mff.cuni.cz/conll2009-st/trial/CoNLL2009-ST-Spanish-trial.zip">Syntax
and Semantic Annotations (subset of the AnCora corpus)</a></li>
<li><a href="http://www.iula.upf.edu/corpus/corpusuk.htm">Plurilingual
Specific Corpus on Economics, Medicine, Computer Science</a></li>
<li><a
href="http://code.google.com/p/copenhagen-dependency-treebank/">Copenhagen
Treebank (Dependency Parsing)</a></li>
<li><a href="http://trec.nist.gov/data/reuters/reuters.html">Reuters
Corpora RCV2 - News Corpora</a></li>
<li><a href="http://www.molinolabs.com/corpus.html">MolinoLabs Corpus -
News Corpora from Spain, Argentina and Mexico</a></li>
<li><a
href="http://panacea-lr.eu/en/info-for-researchers/data-sets/monolingual-corpora">PANACEA -
Legislation Corpus</a></li>
<li><a
href="http://panacea-lr.eu/en/info-for-researchers/data-sets/monolingual-corpora-n-grams/">PANACEA -
Legislation Ngram Corpus</a></li>
<li><a
href="http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/">PANACEA -
Dependency Parsed Corpus</a></li>
<li><a
href="http://panacea-lr.eu/en/info-for-researchers/data-sets/monolingual-lexica/">PANACEA -
Monolingual Lexica (MWE, Frames, Semantic Classes)</a></li>
<li><a
href="https://www.sfu.ca/~mtaboada/SFU_Review_Corpus.html">Opinion
Mining - User reviews on Cars, Hotels, Washing machines, Books, Cell
phones, Music, etc.</a></li>
<li><a href="http://www.celct.it/resources.php?id_page=CLTE">Cross
Lingual Textual Entailment (CLTE) Corpus (English-Spanish)</a></li>
<li><a href="http://ngrams.cavorite.com/datos/">Ngram Frequencies from
Colombian News Corpora</a></li>
<li><a
href="http://www.investigacion.frc.utn.edu.ar/mslabs/~jcastillo/Sagan-test-suite/">Sagan
Textual Entailment Test Suite</a></li>
<li><a href="http://gramatica.usc.es/~marcos/corpora_nle.tgz">Garcia,
Marcos and Pablo Gamallo, 2013 - Portuguese and Spanish biographical
relation extraction corpora (Garcia, Marcos and Pablo Gamallo, 2013.
Exploring the Effectiveness of Linguistic Knowledge for Biographical
Relation Extraction. Natural Language Engineering, CJO2013.
doi:10.1017/S1351324913000314.)</a></li>
<li><a
href="http://gramatica.usc.es/~marcos/resources/corpora_coref.tar.bz2">Garcia,
Marcos and Pablo Gamallo, 2014 - Portuguese, Spanish and Galician
coreference corpora (Garcia, Marcos and Pablo Gamallo, 2014.
Multilingual corpora with coreferential annotation of person entities.
In Proceedings of the 9th edition of the Language Resources and
Evaluation Conference (LREC 2014), Reykjavik: 3229-3233.)</a></li>
<li><a href="http://hpsg.fu-berlin.de/cow/">COW (Corpora from the Web)
Ngram/Annotated People’s Name Corpora</a></li>
<li><a href="http://www.cs.upc.edu/~nlp/wikicorpus/">Wikicorpus - Portion
of 2006’s Wikipedia annotated with WordNet Synsets and POS</a></li>
<li><a href="http://crscardellino.me/SBWCE/">Spanish Billion Words
Corpus with word2vec Embeddings</a></li>
<li><a href="https://traces1.inria.fr/oscar/">OSCAR (Open Super-large
Crawled ALMAnaCH coRpus) - Spanish subset</a></li>
</ul>
<h2 id="misc">Misc</h2>
<ul>
<li><a href="https://github.com/idio/wiki2vec">Word2Vec vectors for
Wikipedia Spanish Articles</a></li>
<li><a
href="http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/es/labels_es.nt.bz2">DBpedia
Spanish Entities Titles</a></li>
<li><a
href="http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/es/short_abstracts_es.nt.bz2">DBpedia
Spanish Abstracts</a></li>
<li><a
href="http://gramatica.usc.es/pln/tools/conjugador/download.html">Conshuga
- Galician Verb conjugator</a></li>
</ul>
<h2 id="contribute">Contribute</h2>
<p>Contributions welcome! Read the <a
href="contributing.md">contribution guidelines</a> first.</p>
<h2 id="license">License</h2>
<p><a href="https://creativecommons.org/publicdomain/zero/1.0/"><img
src="https://i.creativecommons.org/p/zero/1.0/88x31.png"
alt="CC0" /></a></p>
<p>To the extent possible under law, <a
href="http://alejandro.pictures">David Przybilla</a> has waived all
copyright and related or neighboring rights to this work.</p>