[38;5;12m [39m[38;2;255;187;0m[1m[4mAwesome Linguistics Resources for Spanish [0m[38;5;14m[1m[4m![0m[38;2;255;187;0m[1m[4mAwesome[0m[38;5;14m[1m[4m (https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg)[0m[38;2;255;187;0m[1m[4m (https://github.com/sindresorhus/awesome)[0m
|
||
|
||
|
||
[38;5;12mA curated list of linguistic resources for Spanish NLP and computational linguistics.[39m
|
||
|
||
[38;5;12m [39m[38;2;255;187;0m[1m[4mClustering[0m
|
||
[38;5;12m- [39m[38;5;14m[1mMultilingual Latent Dirichlet Allocation LDA[0m[38;5;12m (https://github.com/ArtificiAI/Multilingual-Latent-Dirichlet-Allocation-LDA)[39m
|
||
|
||
[38;5;12m [39m[38;2;255;187;0m[1m[4mSpeech[0m
|
||
|
||
[38;5;12m- [39m[38;5;14m[1mMexican Spanish Speech Recognition DB - 150 Speakers[0m[38;5;12m (http://www.speechocean.com/en-ASR-Corpora/631.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mMexican Spanish Speech Recognition DB - 299 Speakers[0m[38;5;12m (http://www.speechocean.com/en-ASR-Corpora/603.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mPhonetic Transcriptions of Spanish Pronunciation Lexicon[0m[38;5;12m (http://www.speechocean.com/en-Text-Corpora/692.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mSphinx Speech Recognition Models[0m[38;5;12m (http://www.speech.cs.cmu.edu/sphinx/models/hub4spanish_itesm/)[39m
|
||
|
||
[38;2;255;187;0m[4mPart of Speech Taggers (POS Taggers)[0m
|
||
[38;5;12m- [39m[38;5;14m[1mTreeTagger - POSTagger[0m[38;5;12m (http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mStanford - POSTagger[0m[38;5;12m (http://nlp.stanford.edu/software/tagger.shtml)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mFreeling[0m[38;5;12m (http://nlp.lsi.upc.edu/freeling/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mixa-pipe-pos[0m[38;5;12m (https://github.com/ixa-ehu/ixa-pipe-pos)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mRuby Snowball Implementation[0m[38;5;12m (https://github.com/MaG21/estem)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mSpaghetti POSTagger (based on NLTK + CESS corpus)[0m[38;5;12m (https://code.google.com/p/spaghetti-tagger/)[39m
|
||
|
||
[38;5;12m [39m[38;2;255;187;0m[1m[4mMultiword Expression Extractors (MWE)[0m
|
||
[38;5;12m- [39m[38;5;14m[1mFreeling[0m[38;5;12m (http://nlp.lsi.upc.edu/freeling/)[39m
|
||
|
||
[38;2;255;187;0m[4mNamed Entity Recognition (NER)[0m
|
||
[38;5;12m- [39m[38;5;14m[1mOpenNLP - Person/Place/Organization models[0m[38;5;12m (http://opennlp.sourceforge.net/models-1.5/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mDBPedia Spotlight[0m[38;5;12m (https://github.com/dbpedia-spotlight/dbpedia-spotlight/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mCitiusTagger - Spanish NER and POSTagger[0m[38;5;12m (http://gramatica.usc.es/pln/tools/CitiusTools.html)[39m
|
||
|
||
[38;2;255;187;0m[4mShared tasks[0m
|
||
[38;5;12m- [39m[38;5;14m[1mExploiting Parallel Texts for Statistical Machine Translation - NAACL 2006 in New York City[0m[38;5;12m (http://www.statmt.org/wmt06/shared-task/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mCoNLL-2009 Shared Task: Syntactic and Semantic Dependencies in Multiple Languages[0m[38;5;12m (http://ufal.mff.cuni.cz/conll2009-st/trial-data.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mQuality Estimation (Spanish - English) WMT13[0m[38;5;12m (http://www.quest.dcs.shef.ac.uk/wmt13_qe.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mACL 2010 in Uppsala - Shared Task: Machine Translation for European Languages[0m[38;5;12m (http://www.statmt.org/wmt10/translation-task.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mTASS - 2014 (Sentiment Analysis focused on Spanish)[0m[38;5;12m (http://www.daedalus.es/TASS2014/tass2014.php)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mSemEval-2 2010 Coreference Resolution in Multiple Languages[0m[38;5;12m (http://semeval2.fbk.eu/semeval2.php?location=tasks)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mSAB Corpus (Spanish Corpus for Sentiment Analysis towards Brands)[0m[38;5;12m (http://sabcorpus.linkeddata.es/)[39m
|
||
|
||
[38;2;255;187;0m[4mCorpora[0m
|
||
[38;5;12m- [39m[38;5;14m[1mMultilingual Aligned Annotated Corpus (CRATER)[0m[38;5;12m (http://catalog.elra.info/product_info.php?products_id=636)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mUAM Treebank - 1,500 syntactically annotated sentences extracted from newspapers (El País Digital and Compra Maestra)[0m[38;5;12m (http://elvira.lllf.uam.es/~sandoval/UAMTreebank.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mPOSTagged/syntactic dependencies - European Corpus Initiative Multilingual Corpus I [0m[38;5;12m (http://www.elsnet.org/resources/eciCorpus.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mThe Corpus of Contemporary Spanish (POS tags, lemmas)[0m[38;5;12m (http://sfncorpora.uab.es/CQPweb/cea/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mLemmas Dictionary[0m[38;5;12m (http://sfn.uab.es:8080/SFN/dictionary/dictionary-information-lemmas-and-expanded-forms)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mesTenten Spanish (POSTagged) [0m[38;5;12m (http://www.sketchengine.co.uk/documentation/wiki/Corpora/TenTen/esTenTen)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mEuroparl Corpus (Parallel Corpus English-Spanish)[0m[38;5;12m (http://www.statmt.org/europarl/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mColombian Political Speeches[0m[38;5;12m (https://github.com/dav009/LatinamericanTextResources)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mSouth American Slang Expressions/MWE[0m[38;5;12m (https://github.com/dav009/LatinamericanTextResources)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mSyntax and Semantic Annotations (Subset Ancora Corpus)[0m[38;5;12m (http://ufal.mff.cuni.cz/conll2009-st/trial/CoNLL2009-ST-Spanish-trial.zip)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mPlurilingual Specific Corpus on Economics, Medicine, Computer Science[0m[38;5;12m (http://www.iula.upf.edu/corpus/corpusuk.htm)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mCopenhagen Treebank (Dependency Parsing)[0m[38;5;12m (http://code.google.com/p/copenhagen-dependency-treebank/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mReuters Corpora RCV2 - News Corpora[0m[38;5;12m (http://trec.nist.gov/data/reuters/reuters.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mMolinoLabs Corpus - News Corpora from Spain, Argentina and Mexico[0m[38;5;12m (http://www.molinolabs.com/corpus.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mPANACEA - Legislation Corpus[0m[38;5;12m (http://panacea-lr.eu/en/info-for-researchers/data-sets/monolingual-corpora)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mPANACEA - Legislation Ngram Corpus[0m[38;5;12m (http://panacea-lr.eu/en/info-for-researchers/data-sets/monolingual-corpora-n-grams/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mPANACEA - Dependency Parsed Corpus[0m[38;5;12m (http://panacea-lr.eu/en/info-for-researchers/data-sets/dependency-parsed-corpora/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mPANACEA - Monolingual Lexica (MWE, Frames, Semantic Classes)[0m[38;5;12m (http://panacea-lr.eu/en/info-for-researchers/data-sets/monolingual-lexica/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mOpinion Mining - User reviews on Cars, Hotels, Washing machines, Books, Cell phones, Music, etc.[0m[38;5;12m (https://www.sfu.ca/~mtaboada/SFU_Review_Corpus.html)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mCross Lingual Textual Entailment (CLTE) Corpus (English-Spanish)[0m[38;5;12m (http://www.celct.it/resources.php?id_page=CLTE)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mNgram Frequencies out of Colombia News Corpora[0m[38;5;12m (http://ngrams.cavorite.com/datos/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mSagan Textual Entailment Test Suite[0m[38;5;12m (http://www.investigacion.frc.utn.edu.ar/mslabs/~jcastillo/Sagan-test-suite/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mGarcia, Marcos and Pablo Gamallo, 2013 - Portuguese and Spanish biographical relation extraction corpora (Garcia, Marcos and Pablo Gamallo, 2013. Exploring the Effectiveness of Linguistic Knowledge for Biographical Relation Extraction. Natural Language Engineering, CJO2013. doi:10.1017/S1351324913000314.)[0m[38;5;12m (http://gramatica.usc.es/~marcos/corpora_nle.tgz)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mGarcia, Marcos and Pablo Gamallo, 2014 - Portuguese, Spanish and Galician coreference corpora (Garcia, Marcos and Pablo Gamallo, 2014. Multilingual corpora with coreferential annotation of person entities. In Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC 2014), Reykjavik: 3229-3233.)[0m[38;5;12m (http://gramatica.usc.es/~marcos/resources/corpora_coref.tar.bz2)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mCOW (Corpora from the Web) Ngram/Annotated People's Name Corpora[0m[38;5;12m (http://hpsg.fu-berlin.de/cow/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mWikicorpus - Portion of the 2006 Wikipedia annotated with WordNet synsets and POS tags[0m[38;5;12m (http://www.cs.upc.edu/~nlp/wikicorpus/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mSpanish Billion Words Corpus with word2vec Embeddings[0m[38;5;12m (http://crscardellino.me/SBWCE/)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mOSCAR (Open Super-large Crawled ALMAnaCH coRpus) - Spanish subset[0m[38;5;12m (https://traces1.inria.fr/oscar/)[39m
|
||
|
||
|
||
[38;2;255;187;0m[4mMisc[0m
|
||
|
||
[38;5;12m- [39m[38;5;14m[1mWord2Vec vectors for Wikipedia Spanish Articles[0m[38;5;12m (https://github.com/idio/wiki2vec)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mDBpedia Spanish Entities Titles[0m[38;5;12m (http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/es/labels_es.nt.bz2)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mDBpedia Spanish Abstracts[0m[38;5;12m (http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/es/short_abstracts_es.nt.bz2)[39m
|
||
[38;5;12m- [39m[38;5;14m[1mConshuga - Galician Verb conjugator[0m[38;5;12m (http://gramatica.usc.es/pln/tools/conjugador/download.html)[39m
|
||
|
||
[38;2;255;187;0m[4mContribute[0m
|
||
|
||
[38;5;12mContributions welcome! Read the [39m[38;5;14m[1mcontribution guidelines[0m[38;5;12m (contributing.md) first.[39m
|
||
|
||
[38;2;255;187;0m[4mLicense[0m
|
||
|
||
[38;5;14m[1m![0m[38;5;12mCC0[39m[38;5;14m[1m (https://i.creativecommons.org/p/zero/1.0/88x31.png)[0m[38;5;12m (https://creativecommons.org/publicdomain/zero/1.0/)[39m
|
||
|
||
[38;5;12mTo the extent possible under law, [39m[38;5;14m[1mDavid Przybilla[0m[38;5;12m (http://alejandro.pictures) has waived all copyright and related or neighboring rights to this work.[39m
|
||
|
||
[38;5;12mspanishnlp Github: https://github.com/dav009/awesome-spanish-nlp[39m
|