Files
awesome-awesomeness/terminal/computationalbiology
2025-07-18 22:22:32 +02:00

17 KiB

Awesome Computational Biology !Awesome (https://awesome.re/badge.svg) (https://awesome.re)
 
A knowledge collection of databases, software and papers related to computational biology.
 
Computational biology involves the development and application of data-analytical and theoretical methods,
mathematical modelling and computational simulation techniques to the study of biological, ecological,
behavioural, and social systems. - Wikipedia (https://en.wikipedia.org/wiki/Computational_biology)
 
Contents
 
- Databases (#databases)
- scRNA (#scrna)
- Compound (#compound)
- Pathway (#pathway)
- Mass Spectra (#mass-spectra)
- Protein (#protein)
- Genome (#genome)
- Disease (#disease)
- Interaction (#interaction)
- Clinical Trial (#clinical-trial)
- API (#api)
- Preprocess (#preprocess)
- Machine Learning Tasks and Models (#machine-learning-tasks-and-models)
- Drug Response Prediction (#drug-response-prediction)
- Drug Repurposing (#drug-repurposing)
- Drug Target Interaction (#drug-target-interaction)
- Compound Protein Interaction (#compound-protein-interaction)
- Pre-trained embedding (#pre-trained-embedding)
- LLM for biology (#llm-for-biology)
 
Databases
scRNA
- Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/) - Public functional genemics database.
- Single Cell PORTAL (https://singlecell.broadinstitute.org/single_cell) - Public database for single cell RNA.
- Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/home) - Public database for single cell RNA.
Compound
- PubChem (https://pubchem.ncbi.nlm.nih.gov/) - One of the biggest chemical database such as compounds, genes and proteins.
- ChEBI (https://www.ebi.ac.uk/chebi/) - Chemical database focused on small chemical compounds.
- ChEMBL (https://www.ebi.ac.uk/chembl/) - Database of bioactive molecules with drug-like properties.
- ChemSpider (http://www.chemspider.com/) - Chemical structure database.
- KEGG COMPOUND (https://www.genome.jp/kegg/compound/) - Collection of small molecules and biopolymers.
- LIPID MAPS (https://www.lipidmaps.org/databases/lmsd/overview) - Database of lipids.
- Rhea (https://www.rhea-db.org/) - Database of chemical reactions.
- Drug Repurposing Hub (https://repo-hub.broadinstitute.org/repurposing#download-data) - Collections of drug repurposing data containing drug, moa, target etc.
- Therapeutic Target Database (https://idrblab.net/ttd/full-data-download) - collections of drug-target, target-disease, and drug-disease dataset.
- ZINC ligand discovery database (https://zinc.docking.org/) - Free database of commercially-available compounds for virtual screening.
- MoleculeNet (http://moleculenet.ai/) - Benchmark for molecular machine learning.
- Ames Mutagenicity dataset (https://www.sciencedirect.com/science/article/abs/pii/S0166354220302412) - Dataset for predicting mutagenicity.
- ADCdb (https://www.antibody-drug.com/) - Database for antibody-drug conjugates.
Pathway
- PathwayCommons (https://www.pathwaycommons.org/) - Database of Pathways and Interactions.
- KEGG PATHWAY (https://www.genome.jp/kegg/pathway.html) - Collection fo drawn pathway maps.
- WikiPathways (https://wikipathways.org/) - Database of biological pathways.
Mass Spectra
- MassBank (http://www.massbank.jp/) - Open souce databases and tools for mass spectrometry reference spectra.
- MoNA MassBank of North America (https://mona.fiehnlab.ucdavis.edu/) - Meta database of metabolite mass spectra, metadata and associated compounds.
Protein
- THE HUMAN PROTEIN ATLAS (https://www.proteinatlas.org/) - One of the biggest human protein database contained cells, tissues, and organs.
- PROTEIN DATA BANK (https://www.rcsb.org/) - Database of the 3D shapes of proteins, nucleic acids, and complex assemblies.
- UniProt (https://www.uniprot.org/) - The collection of functional information on proteins.
- AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk/api-docs) - Database of 3D protein structures.
- RCSB Protein Data Bank (PDB) (https://www.rcsb.org/) - Repository of 3D structural data of large biological molecules.
- Critical Assessment of Structure Prediction (CASP) (https://predictioncenter.org/) - Experiment for advancing the methods of predicting protein structure from sequence.
- Uniclust (https://uniclust.mmseqs.com/) - Collection of clustered protein sequence databases.
- CATH database (https://www.cathdb.info/) - Hierarchical classification of protein domain structures.
Genome
- Human Genome Resources at NCBI (https://www.ncbi.nlm.nih.gov/projects/genome/guide/human/index.shtml) - Database of image, proteomics, transcriptomics and systems biology.
- GenBank (https://www.ncbi.nlm.nih.gov/genbank/) - Database of genetic sequence offered by NCBI.
- UCSC Genome Browser (https://genome.ucsc.edu/) - Genome blowser offered by UCSC.
- cBioPortal (https://www.cbioportal.org/) - Database of Cancer Genomics. This has overall metaview for a lot of patients.
- 10x Genomics Dataset (https://www.10xgenomics.com/resources/datasets) - Collection of single-cell datasets.
- The Genotype-Tissue Expression (GTEx) (https://gtexportal.org/home/) - Resource for studying human gene expression and regulation.
- Dependency Map (DepMap) (https://depmap.org/portal/) - Genome-wide CRISPR-Cas9 screens in cancer cell lines.
- Catalogue Of Somatic Mutations In Cancer (COSMIC) (https://cancer.sanger.ac.uk/cosmic) - Comprehensive resource for exploring somatic mutations in human cancers.
- MGnify (https://www.ebi.ac.uk/metagenomics/) - Free resource for archiving, analysis, and browsing of metagenomic and metatranscriptomic data.
- JASPAR (http://jaspar.genereg.net/) - Open-access database of curated, non-redundant transcription factor binding profiles.
Disease
- KEGG DRUG (https://www.genome.jp/kegg/drug/) - Comprehensive drug information resource for approved drugs.
- DrugBank (https://www.drugbank.com/) - A database of drug and target maintained by the University of Alberta.
Interaction
- Drug Gene Interaction
- DGIdb (https://www.dgidb.org/) - A database of drug-gene interactions and the druggable genome.
- Comparative Toxicogenomics Database (http://ctdbase.org/) - A database of Chemical-gene interactions, Chemical-disease associations, Gene-disease associations, and Chemical-phenotype associations.
- SNAP (https://snap.stanford.edu/biodata/datasets/10002/10002-ChG-Miner.html#:~:text=Dataset%20information,or%20activation%20of%20the%20drug.) - A dataset which contains Drug-gene interactions.
- Therapeutics Data Commons (https://tdcommons.ai/) - A database for a lot of tasks such as drug-target, drug-response, drug-drug interaction.
- Drug (-Cell line) Response
- NCI60 (https://dtp.cancer.gov/discovery_development/nci-60/) A database which focus on 60 cancer cell lines with many drugs.
- Genomics of Drug Sensitivity in Cancer (GDSC) (https://www.cancerrxgene.org/) - A database of drug sensitibity which has 1000 human cancer cell lines and 100s compounds.
- Cancer Cell Line Encyclopedia (https://sites.broadinstitute.org/ccle/) - A database of cancer cell lines. This has 1000 cell lines.
- CellMiner Cross Database (CellMinerCDB) (https://discover.nci.nih.gov/cellminercdb/) - Integration of multiple cancer cell line databases.
- Chemical Protein Interaction
- STITCH (http://stitch.embl.de/) - A database of Chemical Protein Interaction.
- BindingDB (https://www.bindingdb.org/rwd/bind/index.jsp) - A database of compounds and targes.
- PDBBind (http://www.pdbbind.org.cn/) - Database of experimentally measured binding affinity data for biomolecular complexes.
- CrossDocked2020 (https://arxiv.org/abs/2001.01037) - Large-scale dataset for machine learning in structure-based virtual screening.
- Protein-Protein Interaction
- STRING (https://string-db.org/) - Protein-Protein Interaction Networks for several organisms.
- BioGRID (https://thebiogrid.org/) - Database of Protein, Genetic and Chemical Interactions.
- HIPPIE (http://cbdm-01.zdv.uni-mainz.de/~mschaefer/hippie/) - Human Protein-Protein Interaction database.
- Knowledge Graph
- Drug Mechanism Database (DrugMechDB) (https://github.com/SuLab/DrugMechDB/tree/2.0.1): database of the mechanism of action from a drug to a disease.
- DRKG (https://github.com/gnn4dr/DRKG) - A library for biological knowledge graph.
Clinical Trial
- ClinicalTrials.gov (https://clinicaltrials.gov/) - Database of privately and publicly funded clinical studies.
- ICD10 (https://icd.who.int/browse10/2019/en) - International Classification of Diseases, 10th revision.
- EU Drug Regulating Authorities Clinical Trials DB (EudraCT) (https://eudract.ema.europa.eu/) - European database of clinical trials.
- MIMIC-IV (https://mimic.mit.edu/) - Freely accessible critical care database.

API
- PubMed esearch (https://www.nlm.nih.gov/dataguide/edirect/esearch.html): API for searching articles in PubMed.
 
Preprocess
 
- Chemistry Development Kit (https://github.com/cdk/cdk) - A software of cheminformatics and Machine Learning.
- RDKit (https://github.com/rdkit/rdkit) - A software of cheminformatics and Machine Learning.
- Scanpy (https://scanpy.readthedocs.io/en/stable/) - scRNA analysis library in Python.
- Seurat (https://satijalab.org/seurat/) - scRNA analysis library in R.
 
Machine Learning Tasks and Models
 
Drug Response Prediction
- drGAT (https://github.com/inoue0426/drGAT): A model for drug response prediction with gene explainability with attention mechanism.
- MOFGCN (https://github.com/weiba/MOFGCN/tree/main): GCN + heterogeneous network
- DeepDSC (https://ieeexplore-ieee-org.ezp2.lib.umn.edu/stamp/stamp.jsp?tp=&arnumber=8723620&tag=1): Autoencoder + Fully Connected NN
- DGDRP (https://github.com/minwoopak/heteronet): Multi-view embedding NN.
- DeepAEG (https://github.com/zhejiangzhuque/DeepAEG): GNN Embedding + Attention
 
Drug Repurposing
 
- DeepPurpose (https://github.com/kexinhuang12345/DeepPurpose) - A DL Library for Drug Repurposing.
 
Drug Target Interaction
 
- NeoDTI (https://github.com/FangpingWan/NeoDTI) - A library for Drug Target Interaction.
 
Compound Protein Interaction
 
- MCPINN (https://github.com/mhlee0903/multi_channels_PINN) - A library for drug discovery using Compound Protein Interaction and Machine Learning.
- TransformerCPI (https://github.com/lifanchen-simm/transformerCPI) - A library for Compound Protein Interaction prediction using Transformer.
 
Pre-trained embedding
 
- Evolutionary Scale Modeling (https://github.com/facebookresearch/esm) - a library for protein embeddings.
- ChemBERTa-2 (https://github.com/seyonechithrananda/bert-loves-chemistry) - a library for chemical embeddingg and prediction.
 
LLM for biology
 
- AI4Chem/ChemLLM-7B-Chat (https://huggingface.co/AI4Chem/ChemLLM-7B-Chat) - LLM for chemical and molecule science
- BioGPT (https://github.com/microsoft/BioGPT) - LLM for Biomedical text generation
- GeneGPT (https://github.com/ncbi/GeneGPT) - LLM for biomedical information with several API.
- GenePT (https://github.com/yiqunchen/GenePT) - foundation LLM for single cell data
- scPRINT (https://github.com/cantinilab/scPRINT) - scPRINT is pretrained on 50M cells to denoise and perform zero imputation of any single cell RNAseq profile.
 
 
 
 
computationalbiology Github: https://github.com/inoue0426/awesome-computational-biology