Awesome Computational Biology
A knowledge collection of databases, software and papers related to
computational biology.
Computational biology involves the development and application of
data-analytical and theoretical methods, mathematical modelling and
computational simulation techniques to the study of biological,
ecological, behavioural, and social systems. - Wikipedia
Contents
Databases
scRNA
- Gene Expression Omnibus - Public functional genomics database.
- Single
Cell PORTAL - Public database for single cell RNA.
- Single Cell Expression Atlas - Public database for single cell RNA.
### Compound
- PubChem - One of the largest chemical databases, covering compounds,
genes and proteins.
- ChEBI - Chemical database
focused on small chemical compounds.
- ChEMBL - Database of
bioactive molecules with drug-like properties.
- ChemSpider - Chemical
structure database.
- KEGG COMPOUND -
Collection of small molecules and biopolymers.
- LIPID
MAPS - Database of lipids.
- Rhea - Database of chemical
reactions.
- Drug Repurposing Hub - Collection of drug repurposing data, including
drugs, mechanisms of action (MoA), and targets.
- Therapeutic Target Database - Collection of drug-target,
target-disease, and drug-disease datasets.
- ZINC ligand discovery
database - Free database of commercially-available compounds for
virtual screening.
- MoleculeNet - Benchmark for
molecular machine learning.
- Ames
Mutagenicity dataset - Dataset for predicting mutagenicity.
- ADCdb - Database for antibody-drug conjugates.
### Pathway
- PathwayCommons -
Database of Pathways and Interactions.
- KEGG PATHWAY - Collection of manually drawn pathway maps.
- WikiPathways - Database of biological pathways.
### Mass Spectra
- MassBank - Open source databases and tools for mass spectrometry
reference spectra.
- MoNA MassBank of North America - Meta database of metabolite mass
spectra, metadata and associated compounds.
### Protein
- THE HUMAN PROTEIN ATLAS - One of the largest human protein databases,
covering cells, tissues, and organs.
- PROTEIN DATA BANK - Database of
the 3D shapes of proteins, nucleic acids, and complex assemblies.
- UniProt - The collection of
functional information on proteins.
- AlphaFold Protein
Structure Database - Database of 3D protein structures.
- RCSB Protein Data Bank (PDB) -
Repository of 3D structural data of large biological molecules.
- Critical Assessment of
Structure Prediction (CASP) - Experiment for advancing the methods
of predicting protein structure from sequence.
- Uniclust - Collection of
clustered protein sequence databases.
- CATH database - Hierarchical classification of protein domain
structures.
### Genome
- Human
Genome Resources at NCBI - Database of image, proteomics,
transcriptomics and systems biology.
- GenBank - Database of genetic sequences maintained by NCBI.
- UCSC Genome Browser - Genome browser maintained by UCSC.
- cBioPortal - Database of cancer genomics. It provides a high-level
overview across a large number of patients.
- 10x
Genomics Dataset - Collection of single-cell datasets.
- The Genotype-Tissue
Expression (GTEx) - Resource for studying human gene expression and
regulation.
- Dependency Map (DepMap) -
Genome-wide CRISPR-Cas9 screens in cancer cell lines.
- Catalogue Of Somatic
Mutations In Cancer (COSMIC) - Comprehensive resource for exploring
somatic mutations in human cancers.
- MGnify - Free
resource for archiving, analysis, and browsing of metagenomic and
metatranscriptomic data.
- JASPAR - Open-access database of curated, non-redundant transcription
factor binding profiles.
### Disease
- KEGG DRUG -
Comprehensive drug information resource for approved drugs.
- DrugBank - A database of drugs and drug targets maintained by the
University of Alberta.
### Interaction
- Drug Gene Interaction
  - DGIdb - A database of drug-gene interactions and the druggable
    genome.
  - Comparative Toxicogenomics Database - A database of chemical-gene
    interactions, chemical-disease associations, gene-disease
    associations, and chemical-phenotype associations.
  - SNAP - A dataset of drug-gene interactions.
  - Therapeutics Data Commons - A collection of datasets for many tasks,
    such as drug-target interaction, drug response, and drug-drug
    interaction.
- Drug (-Cell line) Response
- Chemical Protein Interaction
  - STITCH - A database of chemical-protein interactions.
  - BindingDB - A database of compounds and targets.
  - PDBBind - Database of experimentally measured binding affinity data
    for biomolecular complexes.
  - CrossDocked2020 - Large-scale dataset for machine learning in
    structure-based virtual screening.
- Protein-Protein Interaction
  - STRING - Protein-protein interaction networks for several organisms.
  - BioGRID - Database of protein, genetic and chemical interactions.
  - HIPPIE - Human protein-protein interaction database.
- Knowledge Graph
- ClinicalTrials.gov -
Database of privately and publicly funded clinical studies.
- ICD10 -
International Classification of Diseases, 10th revision.
- EU Drug Regulating
Authorities Clinical Trials DB (EudraCT) - European database of
clinical trials.
- MIMIC-IV - Freely accessible
critical care database.
API
Preprocess
- Chemistry Development Kit - Software toolkit for cheminformatics and
machine learning.
- RDKit - Software toolkit for cheminformatics and machine learning.
- Scanpy -
scRNA analysis library in Python.
- Seurat - scRNA analysis
library in R.
Machine Learning Tasks and Models
Drug Response Prediction
- drGAT: A model for drug response prediction that uses an attention
mechanism to provide gene-level explainability.
- MOFGCN: GCN
+ heterogeneous network
- DeepDSC:
Autoencoder + Fully Connected NN
- DGDRP:
Multi-view embedding NN.
- DeepAEG: GNN
Embedding + Attention
Drug Repurposing
Drug Target Interaction
- NeoDTI - A library for drug-target interaction prediction.
Compound Protein Interaction
- MCPINN - A library for drug discovery using compound-protein
interaction and machine learning.
- TransformerCPI - A library for compound-protein interaction
prediction using a Transformer.
Pre-trained embedding
LLM for biology
- AI4Chem/ChemLLM-7B-Chat - LLM for chemistry and molecular science.
- BioGPT - LLM for biomedical text generation.
- GeneGPT - LLM for biomedical information retrieval using several APIs.
- GenePT - Foundation LLM for single-cell data.
- scPRINT - Model pretrained on 50M cells to denoise and perform zero
imputation on any single-cell RNA-seq profile.