awesome-awesomeness/html/computationalbiology.md2.html
2025-07-18 23:13:11 +02:00


<h1 id="awesome-computational-biology-awesome">Awesome Computational
Biology <a href="https://awesome.re"><img
src="https://awesome.re/badge.svg" alt="Awesome" /></a></h1>
<p>A knowledge collection of databases, software and papers related to
computational biology.</p>
<blockquote>
<p>Computational biology involves the development and application of
data-analytical and theoretical methods, mathematical modelling and
computational simulation techniques to the study of biological,
ecological, behavioural, and social systems. - <a
href="https://en.wikipedia.org/wiki/Computational_biology">Wikipedia</a></p>
</blockquote>
<h2 id="contents">Contents</h2>
<ul>
<li><a href="#databases">Databases</a>
<ul>
<li><a href="#scrna">scRNA</a></li>
<li><a href="#compound">Compound</a></li>
<li><a href="#pathway">Pathway</a></li>
<li><a href="#mass-spectra">Mass Spectra</a></li>
<li><a href="#protein">Protein</a></li>
<li><a href="#genome">Genome</a></li>
<li><a href="#disease">Disease</a></li>
<li><a href="#interaction">Interaction</a></li>
<li><a href="#clinical-trial">Clinical Trial</a></li>
</ul></li>
<li><a href="#api">API</a></li>
<li><a href="#preprocess">Preprocess</a></li>
<li><a href="#machine-learning-tasks-and-models">Machine Learning Tasks
and Models</a>
<ul>
<li><a href="#drug-response-prediction">Drug Response
Prediction</a></li>
<li><a href="#drug-repurposing">Drug Repurposing</a></li>
<li><a href="#drug-target-interaction">Drug Target Interaction</a></li>
<li><a href="#compound-protein-interaction">Compound Protein
Interaction</a></li>
<li><a href="#pre-trained-embedding">Pre-trained embedding</a></li>
<li><a href="#llm-for-biology">LLM for biology</a></li>
</ul></li>
</ul>
<h2 id="databases">Databases</h2>
<h3 id="scrna">scRNA</h3>
<ul>
<li><a href="https://www.ncbi.nlm.nih.gov/geo/">Gene Expression
Omnibus</a> - Public functional genomics database.</li>
<li><a href="https://singlecell.broadinstitute.org/single_cell">Single
Cell PORTAL</a> - Public database for single cell RNA.</li>
<li><a href="https://www.ebi.ac.uk/gxa/sc/home">Single Cell Expression
Atlas</a> - Public database for single cell RNA.</li>
</ul>
<h3 id="compound">Compound</h3>
<ul>
<li><a href="https://pubchem.ncbi.nlm.nih.gov/">PubChem</a> - One of the
largest chemical databases, covering compounds, genes and proteins.</li>
<li><a href="https://www.ebi.ac.uk/chebi/">ChEBI</a> - Chemical database
focused on small chemical compounds.</li>
<li><a href="https://www.ebi.ac.uk/chembl/">ChEMBL</a> - Database of
bioactive molecules with drug-like properties.</li>
<li><a href="http://www.chemspider.com/">ChemSpider</a> - Chemical
structure database.</li>
<li><a href="https://www.genome.jp/kegg/compound/">KEGG COMPOUND</a> -
Collection of small molecules and biopolymers.</li>
<li><a href="https://www.lipidmaps.org/databases/lmsd/overview">LIPID
MAPS</a> - Database of lipids.</li>
<li><a href="https://www.rhea-db.org/">Rhea</a> - Database of chemical
reactions.</li>
<li><a
href="https://repo-hub.broadinstitute.org/repurposing#download-data">Drug
Repurposing Hub</a> - Collection of drug repurposing data, including
drugs, mechanisms of action (MoA), and targets.</li>
<li><a href="https://idrblab.net/ttd/full-data-download">Therapeutic
Target Database</a> - Collection of drug-target, target-disease, and
drug-disease datasets.</li>
<li><a href="https://zinc.docking.org/">ZINC ligand discovery
database</a> - Free database of commercially-available compounds for
virtual screening.</li>
<li><a href="http://moleculenet.ai/">MoleculeNet</a> - Benchmark for
molecular machine learning.</li>
<li><a
href="https://www.sciencedirect.com/science/article/abs/pii/S0166354220302412">Ames
Mutagenicity dataset</a> - Dataset for predicting mutagenicity.</li>
<li><a href="https://www.antibody-drug.com/">ADCdb</a> - Database for
antibody-drug conjugates.</li>
</ul>
<h3 id="pathway">Pathway</h3>
<ul>
<li><a href="https://www.pathwaycommons.org/">PathwayCommons</a> -
Database of Pathways and Interactions.</li>
<li><a href="https://www.genome.jp/kegg/pathway.html">KEGG PATHWAY</a> -
Collection of drawn pathway maps.</li>
<li><a href="https://wikipathways.org/">WikiPathways</a> - Database of
biological pathways.</li>
</ul>
<h3 id="mass-spectra">Mass Spectra</h3>
<ul>
<li><a href="http://www.massbank.jp/">MassBank</a> - Open source
databases and tools for mass spectrometry reference spectra.</li>
<li><a href="https://mona.fiehnlab.ucdavis.edu/">MoNA MassBank of North
America</a> - Meta database of metabolite mass spectra, metadata and
associated compounds.</li>
</ul>
<h3 id="protein">Protein</h3>
<ul>
<li><a href="https://www.proteinatlas.org/">THE HUMAN PROTEIN ATLAS</a>
- One of the largest human protein databases, covering cells, tissues,
and organs.</li>
<li><a href="https://www.rcsb.org/">PROTEIN DATA BANK</a> - Database of
the 3D shapes of proteins, nucleic acids, and complex assemblies.</li>
<li><a href="https://www.uniprot.org/">UniProt</a> - The collection of
functional information on proteins.</li>
<li><a href="https://alphafold.ebi.ac.uk/api-docs">AlphaFold Protein
Structure Database</a> - Database of 3D protein structures.</li>
<li><a href="https://www.rcsb.org/">RCSB Protein Data Bank (PDB)</a> -
Repository of 3D structural data of large biological molecules.</li>
<li><a href="https://predictioncenter.org/">Critical Assessment of
Structure Prediction (CASP)</a> - Experiment for advancing the methods
of predicting protein structure from sequence.</li>
<li><a href="https://uniclust.mmseqs.com/">Uniclust</a> - Collection of
clustered protein sequence databases.</li>
<li><a href="https://www.cathdb.info/">CATH database</a> - Hierarchical
classification of protein domain structures.</li>
</ul>
<h3 id="genome">Genome</h3>
<ul>
<li><a
href="https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml">Human
Genome Resources at NCBI</a> - Database of image, proteomics,
transcriptomics and systems biology.</li>
<li><a href="https://www.ncbi.nlm.nih.gov/genbank/">GenBank</a> -
Database of genetic sequences offered by NCBI.</li>
<li><a href="https://genome.ucsc.edu/">UCSC Genome Browser</a> - Genome
browser offered by UCSC.</li>
<li><a href="https://www.cbioportal.org/">cBioPortal</a> - Database of
cancer genomics that provides an overview across many patients.</li>
<li><a href="https://www.10xgenomics.com/resources/datasets">10x
Genomics Dataset</a> - Collection of single-cell datasets.</li>
<li><a href="https://gtexportal.org/home/">The Genotype-Tissue
Expression (GTEx)</a> - Resource for studying human gene expression and
regulation.</li>
<li><a href="https://depmap.org/portal/">Dependency Map (DepMap)</a> -
Genome-wide CRISPR-Cas9 screens in cancer cell lines.</li>
<li><a href="https://cancer.sanger.ac.uk/cosmic">Catalogue Of Somatic
Mutations In Cancer (COSMIC)</a> - Comprehensive resource for exploring
somatic mutations in human cancers.</li>
<li><a href="https://www.ebi.ac.uk/metagenomics/">MGnify</a> - Free
resource for archiving, analysis, and browsing of metagenomic and
metatranscriptomic data.</li>
<li><a href="http://jaspar.genereg.net/">JASPAR</a> - Open-access
database of curated, non-redundant transcription factor binding
profiles.</li>
</ul>
<h3 id="disease">Disease</h3>
<ul>
<li><a href="https://www.genome.jp/kegg/drug/">KEGG DRUG</a> -
Comprehensive drug information resource for approved drugs.</li>
<li><a href="https://www.drugbank.com/">DrugBank</a> - A database of
drugs and targets maintained by the University of Alberta.</li>
</ul>
<h3 id="interaction">Interaction</h3>
<ul>
<li>Drug Gene Interaction
<ul>
<li><a href="https://www.dgidb.org/">DGIdb</a> - A database of drug-gene
interactions and the druggable genome.</li>
<li><a href="http://ctdbase.org/">Comparative Toxicogenomics
Database</a> - A database of Chemical-gene interactions,
Chemical-disease associations, Gene-disease associations, and
Chemical-phenotype associations.</li>
<li><a
href="https://snap.stanford.edu/biodata/datasets/10002/10002-ChG-Miner.html#:~:text=Dataset%20information,or%20activation%20of%20the%20drug.">SNAP</a>
- A dataset which contains Drug-gene interactions.</li>
<li><a href="https://tdcommons.ai/">Therapeutics Data Commons</a> - A
database for a lot of tasks such as drug-target, drug-response,
drug-drug interaction.</li>
</ul></li>
<li>Drug (-Cell line) Response
<ul>
<li><a
href="https://dtp.cancer.gov/discovery_development/nci-60/">NCI60</a> -
A database that focuses on 60 cancer cell lines tested with many drugs.</li>
<li><a href="https://www.cancerrxgene.org/">Genomics of Drug Sensitivity
in Cancer (GDSC)</a> - A database of drug sensitivity covering 1000
human cancer cell lines and hundreds of compounds.</li>
<li><a href="https://sites.broadinstitute.org/ccle/">Cancer Cell Line
Encyclopedia</a> - A database of cancer cell lines covering 1000 cell
lines.</li>
<li><a href="https://discover.nci.nih.gov/cellminercdb/">CellMiner Cross
Database (CellMinerCDB)</a> - Integration of multiple cancer cell line
databases.</li>
</ul></li>
<li>Chemical Protein Interaction
<ul>
<li><a href="http://stitch.embl.de/">STITCH</a> - A database of Chemical
Protein Interaction.</li>
<li><a href="https://www.bindingdb.org/rwd/bind/index.jsp">BindingDB</a>
- A database of compounds and targets.</li>
<li><a href="http://www.pdbbind.org.cn/">PDBBind</a> - Database of
experimentally measured binding affinity data for biomolecular
complexes.</li>
<li><a href="https://arxiv.org/abs/2001.01037">CrossDocked2020</a> -
Large-scale dataset for machine learning in structure-based virtual
screening.</li>
</ul></li>
<li>Protein-Protein Interaction
<ul>
<li><a href="https://string-db.org/">STRING</a> - Protein-Protein
Interaction Networks for several organisms.</li>
<li><a href="https://thebiogrid.org/">BioGRID</a> - Database of Protein,
Genetic and Chemical Interactions.</li>
<li><a
href="http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/">HIPPIE</a> -
Human Protein-Protein Interaction database.</li>
</ul></li>
<li>Knowledge Graph
<ul>
<li><a href="https://github.com/SuLab/DrugMechDB/tree/2.0.1">Drug
Mechanism Database (DrugMechDB)</a>: database of the mechanism of action
from a drug to a disease.</li>
<li><a href="https://github.com/gnn4dr/DRKG">DRKG</a> - A library for
biological knowledge graph.</li>
</ul></li>
</ul>
<h3 id="clinical-trial">Clinical Trial</h3>
<ul>
<li><a href="https://clinicaltrials.gov/">ClinicalTrials.gov</a> -
Database of privately and publicly funded clinical studies.</li>
<li><a href="https://icd.who.int/browse10/2019/en">ICD10</a> -
International Classification of Diseases, 10th revision.</li>
<li><a href="https://eudract.ema.europa.eu/">EU Drug Regulating
Authorities Clinical Trials DB (EudraCT)</a> - European database of
clinical trials.</li>
<li><a href="https://mimic.mit.edu/">MIMIC-IV</a> - Freely accessible
critical care database.</li>
</ul>
<h2 id="api">API</h2>
<ul>
<li><a
href="https://www.nlm.nih.gov/dataguide/edirect/esearch.html">PubMed
esearch</a>: API for searching articles in PubMed.</li>
</ul>
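<p>As a minimal sketch of how the esearch API can be called, the Python
snippet below builds a query URL for the public NCBI E-utilities ESearch
endpoint and extracts PubMed IDs from a JSON response. The helper names
(<code>build_esearch_url</code>, <code>parse_pmids</code>,
<code>search_pubmed</code>) are hypothetical and not part of any
library, and the default <code>retmax</code> value is an arbitrary
choice for illustration.</p>

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public NCBI E-utilities endpoint for ESearch.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=20):
    """Return an ESearch URL that queries PubMed and asks for JSON output."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_pmids(response_text):
    """Extract the list of PubMed IDs from an ESearch JSON response body."""
    data = json.loads(response_text)
    return data["esearchresult"]["idlist"]


def search_pubmed(term, retmax=20):
    """Hypothetical convenience wrapper: performs the HTTP request."""
    with urlopen(build_esearch_url(term, retmax)) as resp:
        return parse_pmids(resp.read().decode("utf-8"))
```

<p>Calling <code>search_pubmed("drug response prediction", retmax=5)</code>
would return a list of PubMed ID strings. For repeated queries, NCBI's
usage guidelines on request rate and API keys apply.</p>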
<h2 id="preprocess">Preprocess</h2>
<ul>
<li><a href="https://github.com/cdk/cdk">Chemistry Development Kit</a> -
Software for cheminformatics and machine learning.</li>
<li><a href="https://github.com/rdkit/rdkit">RDKit</a> - Software for
cheminformatics and machine learning.</li>
<li><a href="https://scanpy.readthedocs.io/en/stable/">Scanpy</a> -
scRNA analysis library in Python.</li>
<li><a href="https://satijalab.org/seurat/">Seurat</a> - scRNA analysis
library in R.</li>
</ul>
<h2 id="machine-learning-tasks-and-models">Machine Learning Tasks and
Models</h2>
<h3 id="drug-response-prediction">Drug Response Prediction</h3>
<ul>
<li><a href="https://github.com/inoue0426/drGAT">drGAT</a>: A model for
drug response prediction with gene-level explainability via an attention
mechanism.</li>
<li><a href="https://github.com/weiba/MOFGCN/tree/main">MOFGCN</a>: GCN
+ heterogeneous network</li>
<li><a
href="https://ieeexplore-ieee-org.ezp2.lib.umn.edu/stamp/stamp.jsp?tp=&amp;arnumber=8723620&amp;tag=1">DeepDSC</a>:
Autoencoder + Fully Connected NN</li>
<li><a href="https://github.com/minwoopak/heteronet">DGDRP</a>:
Multi-view embedding NN.</li>
<li><a href="https://github.com/zhejiangzhuque/DeepAEG">DeepAEG</a>: GNN
Embedding + Attention</li>
</ul>
<h3 id="drug-repurposing">Drug Repurposing</h3>
<ul>
<li><a
href="https://github.com/kexinhuang12345/DeepPurpose">DeepPurpose</a> -
A DL Library for Drug Repurposing.</li>
</ul>
<h3 id="drug-target-interaction">Drug Target Interaction</h3>
<ul>
<li><a href="https://github.com/FangpingWan/NeoDTI">NeoDTI</a> - A
library for Drug Target Interaction.</li>
</ul>
<h3 id="compound-protein-interaction">Compound Protein Interaction</h3>
<ul>
<li><a
href="https://github.com/mhlee0903/multi_channels_PINN">MCPINN</a> - A
library for drug discovery using Compound Protein Interaction and
Machine Learning.</li>
<li><a
href="https://github.com/lifanchen-simm/transformerCPI">TransformerCPI</a>
- A library for Compound Protein Interaction prediction using
Transformer.</li>
</ul>
<h3 id="pre-trained-embedding">Pre-trained embedding</h3>
<ul>
<li><a href="https://github.com/facebookresearch/esm">Evolutionary Scale
Modeling</a> - a library for protein embeddings.</li>
<li><a
href="https://github.com/seyonechithrananda/bert-loves-chemistry">ChemBERTa-2</a>
- a library for chemical embeddings and prediction.</li>
</ul>
<h3 id="llm-for-biology">LLM for biology</h3>
<ul>
<li><a
href="https://huggingface.co/AI4Chem/ChemLLM-7B-Chat">AI4Chem/ChemLLM-7B-Chat</a>
- LLM for chemical and molecule science</li>
<li><a href="https://github.com/microsoft/BioGPT">BioGPT</a> - LLM for
Biomedical text generation</li>
<li><a href="https://github.com/ncbi/GeneGPT">GeneGPT</a> - LLM for
biomedical information that uses several APIs.</li>
<li><a href="https://github.com/yiqunchen/GenePT">GenePT</a> -
foundation LLM for single cell data</li>
<li><a href="https://github.com/cantinilab/scPRINT">scPRINT</a> -
scPRINT is pretrained on 50M cells to denoise and perform zero
imputation of any single cell RNAseq profile.</li>
</ul>
<p><a
href="https://github.com/inoue0426/awesome-computational-biology">computationalbiology.md
Github</a></p>