Awesome
Linguistics Resources for Spanish 
Curated list of Linguistic Resources for doing Spanish NLP & CL.
Clustering
Speech
Part of Speech Taggers (POS Taggers)
Named Entity Recognition (NER)
Corpora
Shared tasks
Corpora
- Multilingual Aligned Annotated Corpus (CRATER)
- UAM Treebank - 1,500 syntactically annotated sentences extracted from newspapers (El País Digital and Compra Maestra)
- POS-tagged/syntactic dependencies - European Corpus Initiative Multilingual Corpus I
- The Corpus of Contemporary Spanish (POS tags, lemmas)
- Lemmas Dictionary
- esTenTen Spanish (POS-tagged)
- Europarl Corpus (Parallel Corpus English-Spanish)
- Colombian Political Speeches
- South American Slang Expressions/MTWE
- Syntax and Semantic Annotations (Subset of the AnCora Corpus)
- Plurilingual Specific Corpus on Economics, Medicine, Computer Science
- Copenhagen Treebank (Dependency Parsing)
- Reuters Corpora RCV2 - News Corpora
- MolinoLabs Corpus - News Corpora from Spain, Argentina and Mexico
- PANACEA - Legislation Corpus
- PANACEA - Legislation Ngram Corpus
- PANACEA - Dependency Parsed Corpus
- PANACEA - Monolingual Lexica (MWE, Frames, Semantic Classes)
- Opinion Mining - User reviews on Cars, Hotels, Washing machines, Books, Cell phones, Music, etc.
- Cross-Lingual Textual Entailment (CLTE) Corpus (English-Spanish)
- Ngram Frequencies out of Colombia News Corpora
- Sagan Textual Entailment Test Suite
- Garcia, Marcos and Pablo Gamallo, 2013 - Portuguese and Spanish biographical relation extraction corpora (Garcia, Marcos and Pablo Gamallo, 2013. Exploring the Effectiveness of Linguistic Knowledge for Biographical Relation Extraction. Natural Language Engineering, CJO2013. doi:10.1017/S1351324913000314.)
- Garcia, Marcos and Pablo Gamallo, 2014 - Portuguese, Spanish and Galician coreference corpora (Garcia, Marcos and Pablo Gamallo, 2014. Multilingual corpora with coreferential annotation of person entities. In Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik: 3229-3233.)
- COW (Corpora From the Web)
Ngram/Annotated People’s Name Corpora
- Wikicorpus - Portion of the 2006 Wikipedia annotated with WordNet Synsets and POS
- Spanish Billion Words Corpus with word2vec Embeddings
- OSCAR (Open Super-large Crawled ALMAnaCH coRpus), Spanish subset
Misc
Contribute
Contributions welcome! Read the contribution guidelines first.
License

To the extent possible under law, David Przybilla has waived all
copyright and related or neighboring rights to this work.