update lists

2025-07-18 22:22:32 +02:00
parent 55bed3b4a1
commit 5916c5c074
3078 changed files with 331679 additions and 357255 deletions

@@ -1,12 +1,12 @@
 Awesome Linguistics Resources for Spanish !Awesome (https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg) (https://github.com/sindresorhus/awesome)
Curated list of Linguistic Resources for doing Spanish NLP & CL.
 Clustering
- Multilingual Latent Dirichlet Allocation LDA (https://github.com/ArtificiAI/Multilingual-Latent-Dirichlet-Allocation-LDA)
 Speech
- Mexican Spanish Speech Recognition DB - 150 Speakers (http://www.speechocean.com/en-ASR-Corpora/631.html)
- Mexican Spanish Speech Recognition DB - 299 Speakers (http://www.speechocean.com/en-ASR-Corpora/603.html)
@@ -21,7 +21,7 @@
- Ruby Snowball Implementation (https://github.com/MaG21/estem)
- Spaghetti POS Tagger (based on NLTK + CESS corpus) (https://code.google.com/p/spaghetti-tagger/)
 Multiword Expressions Extractors (MLWE)
- Freeling (http://nlp.lsi.upc.edu/freeling/)
Named Entity Recognition (NER)
@@ -63,10 +63,10 @@
- Cross Lingual Textual Entailment (CLTE) Corpus (English-Spanish) (http://www.celct.it/resources.php?id_page=CLTE)
- Ngram Frequencies out of Colombia News Corpora (http://ngrams.cavorite.com/datos/)
- Sagan Textual Entailment Test Suite (http://www.investigacion.frc.utn.edu.ar/mslabs/~jcastillo/Sagan-test-suite/)
- Garcia, Marcos and Pablo Gamallo, 2013 - Portuguese and Spanish biographical relation extraction corpora (Garcia, Marcos and Pablo Gamallo, 2013. Exploring the Effectiveness of Linguistic Knowledge for Biographical Relation Extraction. Natural Language Engineering, CJO2013. doi:10.1017/S1351324913000314.) (http://gramatica.usc.es/~marcos/corpora_nle.tgz)
- Garcia, Marcos and Pablo Gamallo, 2014 - Portuguese, Spanish and Galician coreference corpora (Garcia, Marcos and Pablo Gamallo, 2014. Multilingual corpora with coreferential annotation of person entities. In Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik: 3229-3233.) (http://gramatica.usc.es/~marcos/resources/corpora_coref.tar.bz2)
- COW (Corpora from the Web) Ngram/Annotated People's Name Corpora (http://hpsg.fu-berlin.de/cow/)
- Wikicorpus - Portion of the 2006 Wikipedia annotated with WordNet synsets and POS tags (http://www.cs.upc.edu/~nlp/wikicorpus/)
- Spanish Billion Words Corpus with word2vec Embeddings (http://crscardellino.me/SBWCE/)
@@ -89,3 +89,5 @@
!CC0 (https://i.creativecommons.org/p/zero/1.0/88x31.png) (https://creativecommons.org/publicdomain/zero/1.0/)
To the extent possible under law, David Przybilla (http://alejandro.pictures) has waived all copyright and related or neighboring rights to this work.
spanishnlp GitHub: https://github.com/dav009/awesome-spanish-nlp