Updating conversion, creating readmes

Jonas Zeunert
2024-04-19 23:37:46 +02:00
parent 3619ac710a
commit 08e75b0f0a
635 changed files with 30878 additions and 37344 deletions


@@ -1,12 +1,12 @@
 Awesome Linguistics Resources for Spanish !Awesome (https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg) (https://github.com/sindresorhus/awesome)
A curated list of linguistic resources for Spanish NLP and computational linguistics (CL).
 Clustering
- Multilingual Latent Dirichlet Allocation LDA (https://github.com/ArtificiAI/Multilingual-Latent-Dirichlet-Allocation-LDA)
 Speech
- Mexican Spanish Speech Recognition DB - 150 Speakers (http://www.speechocean.com/en-ASR-Corpora/631.html)
- Mexican Spanish Speech Recognition DB - 299 Speakers (http://www.speechocean.com/en-ASR-Corpora/603.html)
@@ -21,7 +21,7 @@
- Ruby Snowball Implementation (https://github.com/MaG21/estem)
- Spaghetti POS Tagger (based on NLTK + CESS corpus) (https://code.google.com/p/spaghetti-tagger/)
 Multiword Expressions Extractors (MLWE)
- Freeling (http://nlp.lsi.upc.edu/freeling/)
 Named Entity Recognition (NER)
@@ -63,10 +63,10 @@
- Cross Lingual Textual Entailment (CLTE) Corpus (English-Spanish) (http://www.celct.it/resources.php?id_page=CLTE)
- N-gram Frequencies from Colombian News Corpora (http://ngrams.cavorite.com/datos/)
- Sagan Textual Entailment Test Suite (http://www.investigacion.frc.utn.edu.ar/mslabs/~jcastillo/Sagan-test-suite/)
- Garcia, Marcos and Pablo Gamallo, 2013 - Portuguese and Spanish biographical relation extraction corpora (Garcia, Marcos and Pablo Gamallo, 2013. Exploring the Effectiveness of Linguistic Knowledge for Biographical Relation Extraction. Natural Language Engineering, CJO2013. doi:10.1017/S1351324913000314.) (http://gramatica.usc.es/~marcos/corpora_nle.tgz)
- Garcia, Marcos and Pablo Gamallo, 2014 - Portuguese, Spanish and Galician coreference corpora (Garcia, Marcos and Pablo Gamallo, 2014. Multilingual corpora with coreferential annotation of person entities. In Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik: 3229-3233.) (http://gramatica.usc.es/~marcos/resources/corpora_coref.tar.bz2)
- COW (Corpora from the Web) N-gram / Annotated People's Name Corpora (http://hpsg.fu-berlin.de/cow/)
- Wikicorpus - Portion of the 2006 Wikipedia annotated with WordNet synsets and POS tags (http://www.cs.upc.edu/~nlp/wikicorpus/)
- Spanish Billion Words Corpus with word2vec Embeddings (http://crscardellino.me/SBWCE/)