<h1 id="awesome-computational-biology-awesome">Awesome Computational
Biology <a href="https://awesome.re"><img
src="https://awesome.re/badge.svg" alt="Awesome" /></a></h1>
<p>A knowledge collection of databases, software and papers related to
computational biology.</p>
<blockquote>
<p>Computational biology involves the development and application of
data-analytical and theoretical methods, mathematical modelling and
computational simulation techniques to the study of biological,
ecological, behavioural, and social systems. - <a
href="https://en.wikipedia.org/wiki/Computational_biology">Wikipedia</a></p>
</blockquote>
<h2 id="contents">Contents</h2>
<ul>
<li><a href="#databases">Databases</a>
<ul>
<li><a href="#scrna">scRNA</a></li>
<li><a href="#compound">Compound</a></li>
<li><a href="#pathway">Pathway</a></li>
<li><a href="#mass-spectra">Mass Spectra</a></li>
<li><a href="#protein">Protein</a></li>
<li><a href="#genome">Genome</a></li>
<li><a href="#disease">Disease</a></li>
<li><a href="#interaction">Interaction</a></li>
<li><a href="#clinical-trial">Clinical Trial</a></li>
</ul></li>
<li><a href="#api">API</a></li>
<li><a href="#preprocess">Preprocess</a></li>
<li><a href="#machine-learning-tasks-and-models">Machine Learning Tasks
and Models</a>
<ul>
<li><a href="#drug-response-prediction">Drug Response
Prediction</a></li>
<li><a href="#drug-repurposing">Drug Repurposing</a></li>
<li><a href="#drug-target-interaction">Drug Target Interaction</a></li>
<li><a href="#compound-protein-interaction">Compound Protein
Interaction</a></li>
<li><a href="#pre-trained-embedding">Pre-trained embedding</a></li>
<li><a href="#llm-for-biology">LLM for biology</a></li>
</ul></li>
</ul>
<h2 id="databases">Databases</h2>
<h3 id="scrna">scRNA</h3>
<ul>
<li><a href="https://www.ncbi.nlm.nih.gov/geo/">Gene Expression
Omnibus</a> - Public functional genomics database.</li>
<li><a href="https://singlecell.broadinstitute.org/single_cell">Single
Cell Portal</a> - Public database of single-cell RNA data.</li>
<li><a href="https://www.ebi.ac.uk/gxa/sc/home">Single Cell Expression
Atlas</a> - Public database of single-cell RNA data.</li>
</ul>
<h3 id="compound">Compound</h3>
<ul>
<li><a href="https://pubchem.ncbi.nlm.nih.gov/">PubChem</a> - One of the
largest chemical databases, covering compounds, genes, and proteins.</li>
<li><a href="https://www.ebi.ac.uk/chebi/">ChEBI</a> - Chemical database
focused on small chemical compounds.</li>
<li><a href="https://www.ebi.ac.uk/chembl/">ChEMBL</a> - Database of
bioactive molecules with drug-like properties.</li>
<li><a href="http://www.chemspider.com/">ChemSpider</a> - Chemical
structure database.</li>
<li><a href="https://www.genome.jp/kegg/compound/">KEGG COMPOUND</a> -
Collection of small molecules and biopolymers.</li>
<li><a href="https://www.lipidmaps.org/databases/lmsd/overview">LIPID
MAPS</a> - Database of lipids.</li>
<li><a href="https://www.rhea-db.org/">Rhea</a> - Database of chemical
reactions.</li>
<li><a
href="https://repo-hub.broadinstitute.org/repurposing#download-data">Drug
Repurposing Hub</a> - Collection of drug repurposing data, including
drugs, mechanisms of action (MoA), targets, etc.</li>
<li><a href="https://idrblab.net/ttd/full-data-download">Therapeutic
Target Database</a> - Collection of drug-target, target-disease, and
drug-disease datasets.</li>
<li><a href="https://zinc.docking.org/">ZINC ligand discovery
database</a> - Free database of commercially available compounds for
virtual screening.</li>
<li><a href="http://moleculenet.ai/">MoleculeNet</a> - Benchmark for
molecular machine learning.</li>
<li><a
href="https://www.sciencedirect.com/science/article/abs/pii/S0166354220302412">Ames
Mutagenicity dataset</a> - Dataset for predicting mutagenicity.</li>
<li><a href="https://www.antibody-drug.com/">ADCdb</a> - Database for
antibody-drug conjugates.</li>
</ul>
<h3 id="pathway">Pathway</h3>
<ul>
<li><a href="https://www.pathwaycommons.org/">PathwayCommons</a> -
Database of pathways and interactions.</li>
<li><a href="https://www.genome.jp/kegg/pathway.html">KEGG PATHWAY</a> -
Collection of drawn pathway maps.</li>
<li><a href="https://wikipathways.org/">WikiPathways</a> - Database of
biological pathways.</li>
</ul>
<h3 id="mass-spectra">Mass Spectra</h3>
<ul>
<li><a href="http://www.massbank.jp/">MassBank</a> - Open source
databases and tools for mass spectrometry reference spectra.</li>
<li><a href="https://mona.fiehnlab.ucdavis.edu/">MoNA MassBank of North
America</a> - Meta database of metabolite mass spectra, metadata and
associated compounds.</li>
</ul>
<h3 id="protein">Protein</h3>
<ul>
<li><a href="https://www.proteinatlas.org/">THE HUMAN PROTEIN ATLAS</a>
- One of the largest human protein databases, covering cells, tissues,
and organs.</li>
<li><a href="https://www.rcsb.org/">PROTEIN DATA BANK</a> - Database of
the 3D shapes of proteins, nucleic acids, and complex assemblies.</li>
<li><a href="https://www.uniprot.org/">UniProt</a> - Collection of
functional information on proteins.</li>
<li><a href="https://alphafold.ebi.ac.uk/api-docs">AlphaFold Protein
Structure Database</a> - Database of 3D protein structures.</li>
<li><a href="https://www.rcsb.org/">RCSB Protein Data Bank (PDB)</a> -
Repository of 3D structural data of large biological molecules.</li>
<li><a href="https://predictioncenter.org/">Critical Assessment of
Structure Prediction (CASP)</a> - Experiment for advancing the methods
of predicting protein structure from sequence.</li>
<li><a href="https://uniclust.mmseqs.com/">Uniclust</a> - Collection of
clustered protein sequence databases.</li>
<li><a href="https://www.cathdb.info/">CATH database</a> - Hierarchical
classification of protein domain structures.</li>
</ul>
<h3 id="genome">Genome</h3>
<ul>
<li><a
href="https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml">Human
Genome Resources at NCBI</a> - Database of image, proteomics,
transcriptomics and systems biology.</li>
<li><a href="https://www.ncbi.nlm.nih.gov/genbank/">GenBank</a> -
Database of genetic sequences offered by NCBI.</li>
<li><a href="https://genome.ucsc.edu/">UCSC Genome Browser</a> - Genome
browser offered by UCSC.</li>
<li><a href="https://www.cbioportal.org/">cBioPortal</a> - Database of
cancer genomics. It provides a high-level overview across many
patients.</li>
<li><a href="https://www.10xgenomics.com/resources/datasets">10x
Genomics Dataset</a> - Collection of single-cell datasets.</li>
<li><a href="https://gtexportal.org/home/">The Genotype-Tissue
Expression (GTEx)</a> - Resource for studying human gene expression and
regulation.</li>
<li><a href="https://depmap.org/portal/">Dependency Map (DepMap)</a> -
Genome-wide CRISPR-Cas9 screens in cancer cell lines.</li>
<li><a href="https://cancer.sanger.ac.uk/cosmic">Catalogue Of Somatic
Mutations In Cancer (COSMIC)</a> - Comprehensive resource for exploring
somatic mutations in human cancers.</li>
<li><a href="https://www.ebi.ac.uk/metagenomics/">MGnify</a> - Free
resource for archiving, analysis, and browsing of metagenomic and
metatranscriptomic data.</li>
<li><a href="http://jaspar.genereg.net/">JASPAR</a> - Open-access
database of curated, non-redundant transcription factor binding
profiles.</li>
</ul>
<h3 id="disease">Disease</h3>
<ul>
<li><a href="https://www.genome.jp/kegg/drug/">KEGG DRUG</a> -
Comprehensive drug information resource for approved drugs.</li>
<li><a href="https://www.drugbank.com/">DrugBank</a> - A database of
drugs and targets maintained by the University of Alberta.</li>
</ul>
<h3 id="interaction">Interaction</h3>
<ul>
<li>Drug Gene Interaction
<ul>
<li><a href="https://www.dgidb.org/">DGIdb</a> - A database of drug-gene
interactions and the druggable genome.</li>
<li><a href="http://ctdbase.org/">Comparative Toxicogenomics
Database</a> - A database of chemical-gene interactions,
chemical-disease associations, gene-disease associations, and
chemical-phenotype associations.</li>
<li><a
href="https://snap.stanford.edu/biodata/datasets/10002/10002-ChG-Miner.html#:~:text=Dataset%20information,or%20activation%20of%20the%20drug.">SNAP</a>
- A dataset of drug-gene interactions.</li>
<li><a href="https://tdcommons.ai/">Therapeutics Data Commons</a> - A
database covering many tasks, such as drug-target, drug-response, and
drug-drug interaction.</li>
</ul></li>
<li>Drug (-Cell line) Response
<ul>
<li><a
href="https://dtp.cancer.gov/discovery_development/nci-60/">NCI60</a> -
A database that focuses on 60 cancer cell lines tested with many
drugs.</li>
<li><a href="https://www.cancerrxgene.org/">Genomics of Drug Sensitivity
in Cancer (GDSC)</a> - A database of drug sensitivity covering 1000
human cancer cell lines and hundreds of compounds.</li>
<li><a href="https://sites.broadinstitute.org/ccle/">Cancer Cell Line
Encyclopedia</a> - A database of cancer cell lines. It covers 1000 cell
lines.</li>
<li><a href="https://discover.nci.nih.gov/cellminercdb/">CellMiner Cross
Database (CellMinerCDB)</a> - Integration of multiple cancer cell line
databases.</li>
</ul></li>
<li>Chemical Protein Interaction
<ul>
<li><a href="http://stitch.embl.de/">STITCH</a> - A database of
chemical-protein interactions.</li>
<li><a href="https://www.bindingdb.org/rwd/bind/index.jsp">BindingDB</a>
- A database of compounds and targets.</li>
<li><a href="http://www.pdbbind.org.cn/">PDBBind</a> - Database of
experimentally measured binding affinity data for biomolecular
complexes.</li>
<li><a href="https://arxiv.org/abs/2001.01037">CrossDocked2020</a> -
Large-scale dataset for machine learning in structure-based virtual
screening.</li>
</ul></li>
<li>Protein-Protein Interaction
<ul>
<li><a href="https://string-db.org/">STRING</a> - Protein-protein
interaction networks for several organisms.</li>
<li><a href="https://thebiogrid.org/">BioGRID</a> - Database of protein,
genetic and chemical interactions.</li>
<li><a
href="http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/">HIPPIE</a> -
Human protein-protein interaction database.</li>
</ul></li>
<li>Knowledge Graph
<ul>
<li><a href="https://github.com/SuLab/DrugMechDB/tree/2.0.1">Drug
Mechanism Database (DrugMechDB)</a> - Database of the mechanism of
action from a drug to a disease.</li>
<li><a href="https://github.com/gnn4dr/DRKG">DRKG</a> - A library for
biological knowledge graphs.</li>
</ul></li>
</ul>
<h3 id="clinical-trial">Clinical Trial</h3>
<ul>
<li><a href="https://clinicaltrials.gov/">ClinicalTrials.gov</a> -
Database of privately and publicly funded clinical studies.</li>
<li><a href="https://icd.who.int/browse10/2019/en">ICD10</a> -
International Classification of Diseases, 10th revision.</li>
<li><a href="https://eudract.ema.europa.eu/">EU Drug Regulating
Authorities Clinical Trials DB (EudraCT)</a> - European database of
clinical trials.</li>
<li><a href="https://mimic.mit.edu/">MIMIC-IV</a> - Freely accessible
critical care database.</li>
</ul>
<h2 id="api">API</h2>
<ul>
<li><a
href="https://www.nlm.nih.gov/dataguide/edirect/esearch.html">PubMed
esearch</a> - API for searching articles in PubMed.</li>
</ul>
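<p>As a minimal sketch of how a PubMed search request might be assembled,
the Python snippet below builds a query URL for the NCBI E-utilities
<code>esearch</code> HTTP endpoint. This assumes the HTTP interface rather
than the EDirect command-line tool that the link above documents. The helper
name <code>build_esearch_url</code> and its defaults are hypothetical,
chosen only for illustration.</p>

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20):
    """Return an esearch URL asking PubMed for article IDs matching `term`.

    `build_esearch_url` is a hypothetical helper, not part of any library.
    """
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # free-text query
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # ask for JSON instead of the default XML
    }
    # urlencode escapes the values and joins them into a query string.
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(build_esearch_url("drug response prediction", retmax=5))
```

<p>Fetching the resulting URL (for example with
<code>urllib.request.urlopen</code>) should return a JSON document listing
matching PubMed IDs. Building the URL separately from the request keeps the
query logic easy to inspect before anything is sent.</p>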
<h2 id="preprocess">Preprocess</h2>
<ul>
<li><a href="https://github.com/cdk/cdk">Chemistry Development Kit</a> -
Software for cheminformatics and machine learning.</li>
<li><a href="https://github.com/rdkit/rdkit">RDKit</a> - Software for
cheminformatics and machine learning.</li>
<li><a href="https://scanpy.readthedocs.io/en/stable/">Scanpy</a> -
scRNA analysis library in Python.</li>
<li><a href="https://satijalab.org/seurat/">Seurat</a> - scRNA analysis
library in R.</li>
</ul>
<h2 id="machine-learning-tasks-and-models">Machine Learning Tasks and
Models</h2>
<h3 id="drug-response-prediction">Drug Response Prediction</h3>
<ul>
<li><a href="https://github.com/inoue0426/drGAT">drGAT</a>: A model for
drug response prediction that provides gene-level explainability through
an attention mechanism.</li>
<li><a href="https://github.com/weiba/MOFGCN/tree/main">MOFGCN</a>: GCN
+ heterogeneous network.</li>
<li><a
href="https://ieeexplore-ieee-org.ezp2.lib.umn.edu/stamp/stamp.jsp?tp=&arnumber=8723620&tag=1">DeepDSC</a>:
Autoencoder + fully connected NN.</li>
<li><a href="https://github.com/minwoopak/heteronet">DGDRP</a>:
Multi-view embedding NN.</li>
<li><a href="https://github.com/zhejiangzhuque/DeepAEG">DeepAEG</a>: GNN
embedding + attention.</li>
</ul>
<h3 id="drug-repurposing">Drug Repurposing</h3>
<ul>
<li><a
href="https://github.com/kexinhuang12345/DeepPurpose">DeepPurpose</a> -
A DL library for drug repurposing.</li>
</ul>
<h3 id="drug-target-interaction">Drug Target Interaction</h3>
<ul>
<li><a href="https://github.com/FangpingWan/NeoDTI">NeoDTI</a> - A
library for drug target interaction.</li>
</ul>
<h3 id="compound-protein-interaction">Compound Protein Interaction</h3>
<ul>
<li><a
href="https://github.com/mhlee0903/multi_channels_PINN">MCPINN</a> - A
library for drug discovery using compound protein interaction and
machine learning.</li>
<li><a
href="https://github.com/lifanchen-simm/transformerCPI">TransformerCPI</a>
- A library for compound protein interaction prediction using a
Transformer.</li>
</ul>
<h3 id="pre-trained-embedding">Pre-trained embedding</h3>
<ul>
<li><a href="https://github.com/facebookresearch/esm">Evolutionary Scale
Modeling</a> - A library for protein embeddings.</li>
<li><a
href="https://github.com/seyonechithrananda/bert-loves-chemistry">ChemBERTa-2</a>
- A library for chemical embedding and prediction.</li>
</ul>
<h3 id="llm-for-biology">LLM for biology</h3>
<ul>
<li><a
href="https://huggingface.co/AI4Chem/ChemLLM-7B-Chat">AI4Chem/ChemLLM-7B-Chat</a>
- LLM for chemistry and molecular science.</li>
<li><a href="https://github.com/microsoft/BioGPT">BioGPT</a> - LLM for
biomedical text generation.</li>
<li><a href="https://github.com/ncbi/GeneGPT">GeneGPT</a> - LLM for
biomedical information with several APIs.</li>
<li><a href="https://github.com/yiqunchen/GenePT">GenePT</a> -
Foundation LLM for single-cell data.</li>
<li><a href="https://github.com/cantinilab/scPRINT">scPRINT</a> -
Pretrained on 50M cells to denoise and perform zero imputation of any
single-cell RNA-seq profile.</li>
</ul>
<p><a
href="https://github.com/inoue0426/awesome-computational-biology">computationalbiology.md
Github</a></p>