<a href="https://krzjoa.github.io/awesome-python-data-science/"><img width="250" height="250" src="img/py-datascience.png" alt="pyds"></a>
<br>
<br>
<br>
Awesome Python Data Science
Probably the best curated list of data science software in Python
Contents
Machine Learning
General Purpose Machine Learning
- scikit-learn - Machine
learning in Python.

- PyCaret - An
open-source, low-code machine learning library in Python.

- Shogun -
Machine learning toolbox.
- xLearn - High
Performance, Easy-to-use, and Scalable Machine Learning Package.
- cuML - RAPIDS Machine
Learning Library.

- modAL - Modular
active learning framework for Python3.

- Sparkit-learn -
PySpark + scikit-learn = Sparkit-learn.

- mlpack - A scalable
C++ machine learning library (Python bindings).
- dlib - Toolkit for
making real-world machine learning and data analysis applications in C++
(Python bindings).
- MLxtend - Extension
and helper modules for Python’s data analysis and machine learning
libraries.

- hyperlearn
- 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn,
Statsmodels.

- Reproducible Experiment
Platform (REP) - Machine Learning toolbox for Humans.

- scikit-multilearn
- Multi-label classification for python.

- seqlearn -
Sequence classification toolkit for Python.

- pystruct - Simple
structured learning framework for Python.

- sklearn-expertsys
- Highly interpretable classifiers for scikit learn.

- RuleFit -
Implementation of the RuleFit algorithm.

- metric-learn
- Metric learning algorithms in Python.

- pyGAM - Generalized
Additive Models in Python.
- causalml - Uplift
modeling and causal inference with machine learning algorithms.

Gradient Boosting
- XGBoost - Scalable,
Portable, and Distributed Gradient Boosting.

- LightGBM - A
fast, distributed, high-performance gradient boosting framework.

- CatBoost - An
open-source gradient boosting on decision trees library.

- ThunderGBM -
Fast GBDTs and Random Forests on GPUs.

- NGBoost -
Natural Gradient Boosting for Probabilistic Prediction.
- TensorFlow
Decision Forests - A collection of state-of-the-art algorithms for
the training, serving and interpretation of Decision Forest models in
Keras.

Ensemble Methods
- ML-Ensemble - High performance
ensemble learning.

- Stacking - Simple
and useful stacking library written in Python.

- stacked_generalization
- Library for machine learning stacking generalization.

- vecstack - Python
package for stacking (machine learning technique).

Imbalanced Datasets
- imbalanced-learn
- Module to perform under-sampling and over-sampling with various
techniques.

- imbalanced-algorithms
- Python-based implementations of algorithms for learning on imbalanced
data.

Random Forests
Kernel Methods
- pyFM -
Factorization machines in python.

- fastFM - A library
for Factorization Machines.

- tffm - TensorFlow
implementation of an arbitrary order Factorization Machine.

- liquidSVM - An
implementation of SVMs.
- scikit-rvm
- Relevance Vector Machine implementation using the scikit-learn API.

- ThunderSVM - A
fast SVM Library on GPUs and CPUs.

Deep Learning
PyTorch
- PyTorch - Tensors
and Dynamic neural networks in Python with strong GPU acceleration.

- pytorch-lightning -
PyTorch Lightning is just organized PyTorch.

- ignite - High-level
library to help with training neural networks in PyTorch.

- skorch - A
scikit-learn compatible neural network library that wraps PyTorch.

- Catalyst -
High-level utils for PyTorch DL & RL research.

- ChemicalX - A
PyTorch-based deep learning library for drug pair scoring.

TensorFlow
- TensorFlow -
Computation using data flow graphs for scalable machine learning by
Google.

- TensorLayer -
Deep Learning and Reinforcement Learning Library for Researchers and
Engineers.

- TFLearn - Deep
learning library featuring a higher-level API for TensorFlow.

- Sonnet -
TensorFlow-based neural network library.

- tensorpack - A
Neural Net Training Interface on TensorFlow.

- Polyaxon - A
platform that helps you build, manage and monitor deep learning models.

- tfdeploy - Deploy
TensorFlow graphs for fast evaluation and export to TensorFlow-less
environments running numpy.

- tensorflow-upstream
- TensorFlow ROCm port.

- TensorFlow Fold -
Deep learning with dynamic computation graphs in TensorFlow.

- TensorLight - A
high-level framework for TensorFlow.

- Mesh TensorFlow -
Model Parallelism Made Easier.

- Ludwig - A toolbox that
allows one to train and test deep learning models without the need to
write code.

- Keras - A high-level neural networks
API running on top of TensorFlow.

- keras-contrib -
Keras community contributions.

- Hyperas - Keras
+ Hyperopt: A straightforward wrapper for convenient hyperparameter
optimization.

- Elephas -
Distributed Deep learning with Keras & Spark.

- qkeras - A
quantization deep learning library.

MXNet
- MXNet -
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with
Dynamic, Mutation-aware Dataflow Dep Scheduler.

- Gluon - A
clear, concise, simple yet powerful and efficient API for deep learning
(now included in MXNet).

- Xfer - Transfer Learning
library for Deep Neural Networks.

- MXNet -
HIP Port of MXNet.

JAX
- JAX - Composable
transformations of Python+NumPy programs: differentiate, vectorize, JIT
to GPU/TPU, and more.
- FLAX - A neural network
library for JAX that is designed for flexibility.
- Optax - A
gradient processing and optimization library for JAX.
Others
- transformers -
State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

- Tangent -
Source-to-Source Debuggable Derivatives in Pure Python.
- autograd -
Efficiently computes derivatives of numpy code.
- Caffe - A fast open
framework for deep learning.
- nnabla - Neural Network
Libraries by Sony.
Automated Machine Learning
- auto-sklearn -
An AutoML toolkit and a drop-in replacement for a scikit-learn
estimator.

- Auto-PyTorch -
Automatic architecture search and hyperparameter optimization for
PyTorch.

- AutoKeras -
AutoML library for deep learning.

- AutoGluon -
AutoML for Image, Text, Tabular, Time-Series, and MultiModal Data.
- TPOT - AutoML tool
that optimizes machine learning pipelines using genetic programming.

- MLBox - A
powerful Automated Machine Learning python library.
Natural Language Processing
- torchtext - Data
loaders and abstractions for text and NLP.

- gluon-nlp - NLP made
easy.

- KerasNLP -
Modular Natural Language Processing workflows with Keras.

- spaCy - Industrial-Strength Natural
Language Processing.
- NLTK - Modules, data
sets, and tutorials supporting research and development in Natural
Language Processing.
- CLTK - The Classical
Language Toolkit.
- gensim - Topic
Modelling for Humans.
- pyMorfologik
- Python binding for
Morfologik.
- skift - Scikit-learn
wrappers for Python fastText.

- Phonemizer -
Simple text-to-phonemes converter for multiple languages.
- flair - Very
simple framework for state-of-the-art NLP.
Computer Audition
- torchaudio - An audio
library for PyTorch.

- librosa - Python
library for audio and music analysis.
- Yaafe - Audio features
extraction.
- aubio - A library for
audio and music analysis.
- Essentia - Library for
audio and music analysis, description, and synthesis.
- LibXtract -
A simple, portable, lightweight library of audio feature extraction
functions.
- Marsyas - Music
Analysis, Retrieval, and Synthesis for Audio Signals.
- muda - A library for
augmenting annotated audio data.
- madmom - Python audio
and music signal processing library.
Computer Vision
- torchvision -
Datasets, Transforms, and Models specific to Computer Vision.

- PyTorch3D -
PyTorch3D is FAIR’s library of reusable components for deep learning
with 3D data.

- gluon-cv - Provides
implementations of the state-of-the-art deep learning models in computer
vision.

- KerasCV -
Industry-strength Computer Vision workflows with Keras.

- OpenCV - Open Source
Computer Vision Library.
- Decord - An efficient
video loader for deep learning with smart shuffling that’s super easy to
digest.
- MMEngine -
OpenMMLab Foundational Library for Training Deep Learning Models.

- scikit-image -
Image Processing SciKit (Toolbox for SciPy).
- imgaug - Image
augmentation for machine learning experiments.
- imgaug_extension
- Additional augmentations for imgaug.
- Augmentor -
Image augmentation library in Python for machine learning.
- albumentations
- Fast image augmentation library and easy-to-use wrapper around other
libraries.
- LAVIS - A One-stop
Library for Language-Vision Intelligence.
Time Series
- sktime
- A unified framework for machine learning with time series.

- darts - A python
library for easy manipulation and forecasting of time series.
- statsforecast
- Lightning fast forecasting with statistical and econometric
models.
- mlforecast -
Scalable machine learning-based time series forecasting.
- neuralforecast -
Scalable machine learning-based time series forecasting.
- tslearn - Machine
learning toolkit dedicated to time-series data.

- tick - Module
for statistical learning, with a particular emphasis on time-dependent
modeling.

- greykite - A
flexible, intuitive, and fast forecasting library.
- Prophet -
Automatic Forecasting Procedure.
- PyFlux - Open source
time series library for Python.
- bayesloop -
Probabilistic programming framework that facilitates objective model
selection for time-varying parameter models.
- luminol - Anomaly
Detection and Correlation library.
- dateutil -
Powerful extensions to the standard datetime module.
- maya - Makes it
very easy to parse strings and change timezones.
- Chaos
Genius - ML powered analytics engine for outlier/anomaly detection
and root cause analysis.
Reinforcement Learning
- Gymnasium - An
API standard for single-agent reinforcement learning environments, with
popular reference environments and related utilities (formerly Gym).
- PettingZoo -
An API standard for multi-agent reinforcement learning environments,
with popular reference environments and related utilities.
- MAgent2 -
An engine for high performance multi-agent environments with very large
numbers of agents, along with a set of reference environments.
- Stable
Baselines3 - A set of improved implementations of reinforcement
learning algorithms based on OpenAI Baselines.
- Shimmy -
An API conversion tool for popular external reinforcement learning
environments.
- EnvPool - C++-based
high-performance parallel environment execution engine (vectorized env)
for general RL environments.
- RLlib
- Scalable Reinforcement Learning.
- Tianshou
- An elegant PyTorch deep reinforcement learning library.

- Acme - A
library of reinforcement learning components and agents.
- Catalyst-RL -
PyTorch framework for RL research.

- d3rlpy - An offline
deep reinforcement learning library.
- DI-engine -
OpenDILab Decision AI Engine.

- TF-Agents - A
library for Reinforcement Learning in TensorFlow.

- TensorForce
- A TensorFlow library for applied reinforcement learning.

- TRFL - TensorFlow
Reinforcement Learning.

- Dopamine - A
research framework for fast prototyping of reinforcement learning
algorithms.
- keras-rl - Deep
Reinforcement Learning for Keras.

- garage - A
toolkit for reproducible reinforcement learning research.
- Horizon -
A platform for Applied Reinforcement Learning.
- rlpyt - Reinforcement
Learning in PyTorch.

- cleanrl -
High-quality single file implementation of Deep Reinforcement Learning
algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3,
SAC, PPG).
- Machin - A
reinforcement learning library designed for PyTorch.

- SKRL - Modular
reinforcement learning library (on PyTorch and JAX) with support for
NVIDIA Isaac Gym, Isaac Orbit and Omniverse Isaac Gym.

- Imitation -
Clean PyTorch implementations of imitation and reward learning
algorithms.

Graph Machine Learning
- pytorch_geometric
- Geometric Deep Learning Extension Library for PyTorch.

- pytorch_geometric_temporal
- Temporal Extension Library for PyTorch Geometric.

- PyTorch
Geometric Signed Directed - A signed/directed graph neural network
extension library for PyTorch Geometric.

- dgl - Python package built
to ease deep learning on graph, on top of existing DL frameworks.

- Spektral
- Deep learning on graphs.

- StellarGraph -
Machine Learning on Graphs.

- Graph
Nets - Build Graph Nets in TensorFlow.

- TensorFlow GNN - A
library to build Graph Neural Networks on the TensorFlow platform.

- Auto Graph Learning
- An autoML framework & toolkit for machine learning on graphs.
- PyTorch-BigGraph
- Generate embeddings from large-scale graph-structured data.

- Karate
Club - An unsupervised machine learning library for graph-structured
data.
- Little
Ball of Fur - A library for sampling graph structured data.
- GreatX - A
graph reliability toolbox based on PyTorch and PyTorch Geometric (PyG).

- Jraph - A
Graph Neural Network Library in JAX.
Learning-to-Rank & Recommender Systems
- LightFM - A Python
implementation of LightFM, a hybrid recommendation algorithm.
- Spotlight -
Deep recommender models using PyTorch.
- Surprise - A
Python scikit for building and analyzing recommender systems.
- RecBole - A
unified, comprehensive and efficient recommendation library.

- allRank - allRank
is a framework for training learning-to-rank neural models based on
PyTorch.

- TensorFlow
Recommenders - A library for building recommender system models
using TensorFlow.

- TensorFlow
Ranking - Learning to Rank in TensorFlow.

Probabilistic Graphical Models
- pomegranate -
Probabilistic and graphical models for Python.

- pgmpy - A python
library for working with Probabilistic Graphical Models.
- pyAgrum - A GRaphical
Universal Modeler.
Probabilistic Methods
- pyro - A flexible,
scalable deep probabilistic programming library built on PyTorch.

- PyMC - Bayesian
Stochastic Modelling in Python.
- ZhuSuan -
Bayesian Deep Learning.

- GPflow -
Gaussian processes in TensorFlow.

- InferPy - Deep
Probabilistic Modelling Made Easy.

- PyStan - Bayesian
inference using the No-U-Turn sampler (Python interface).
- sklearn-bayes
- Python package for Bayesian Machine Learning with scikit-learn API.

- skpro -
Supervised domain-agnostic prediction framework for probabilistic
modelling by The Alan Turing
Institute.

- PyVarInf -
Bayesian Deep Learning methods with Variational Inference for PyTorch.

- emcee - The Python
ensemble sampling toolkit for affine-invariant MCMC.
- hsmmlearn - A
library for hidden semi-Markov models with explicit durations.
- pyhsmm - Bayesian
inference in HSMMs and HMMs.
- GPyTorch - A
highly efficient and modular implementation of Gaussian Processes in
PyTorch.

- sklearn-crfsuite
- A scikit-learn-inspired API for CRFsuite.

Model Explanation
- dalex - moDel
Agnostic Language for Exploration and explanation.


- Shapley
- A data-driven framework to quantify the value of classifiers in a
machine learning ensemble.
- Alibi - Algorithms
for monitoring and explaining machine learning models.
- anchor - Code for
“High-Precision Model-Agnostic Explanations” paper.
- aequitas - Bias and
Fairness Audit Toolkit.
- Contrastive
Explanation - Contrastive Explanation (Foil Trees).

- yellowbrick -
Visual analysis and diagnostic tools to facilitate machine learning
model selection.

- scikit-plot
- An intuitive library to add plotting functionality to scikit-learn
objects.

- shap - A unified
approach to explain the output of any machine learning model.

- ELI5 - A library
for debugging/inspecting machine learning classifiers and explaining
their predictions.
- Lime - Explaining the
predictions of any machine learning classifier.

- FairML - A
Python toolbox for auditing machine learning models for bias.

- L2X - Code for
replicating the experiments in the paper Learning to Explain: An
Information-Theoretic Perspective on Model Interpretation.
- PDPbox - Partial
dependence plot toolbox.
- PyCEbox -
Python Individual Conditional Expectation Plot Toolbox.
- Skater -
Python Library for Model Interpretation.
- model-analysis -
Model analysis tools for TensorFlow.

- themis-ml - A
library that implements fairness-aware machine learning algorithms.

- treeinterpreter -
Interpreting scikit-learn’s decision tree and random forest predictions.

- AI Explainability 360 -
Interpretability and explainability of data and machine learning
models.
- Auralisation -
Auralisation of learned features in CNN (for audio).
- CapsNet-Visualization
- A visualization of the CapsNet layers to better understand how it
works.
- lucid - A
collection of infrastructure and tools for research in neural network
interpretability.
- Netron -
Visualizer for deep learning and machine learning models (no Python
code, but visualizes models from most Python Deep Learning
frameworks).
- FlashLight -
Visualization Tool for your NeuralNetwork.
- tensorboard-pytorch
- Tensorboard for PyTorch (and chainer, mxnet, numpy, …).
- mxboard - Logging
MXNet data for visualization in TensorBoard.

Genetic Programming
- gplearn -
Genetic Programming in Python.

- DEAP - Distributed
Evolutionary Algorithms in Python.
- karoo_gp - A
Genetic Programming platform for Python with GPU support.

- monkeys - A
strongly-typed genetic programming framework for Python.
- sklearn-genetic
- Genetic feature selection module for scikit-learn.

Optimization
- Optuna - A hyperparameter optimization framework.
- Spearmint - Bayesian optimization.
- BoTorch - Bayesian optimization in PyTorch.
- scikit-opt - Heuristic Algorithms for optimization.
- sklearn-genetic-opt - Hyperparameters tuning and feature selection
using evolutionary algorithms.
- SMAC3 - Sequential Model-based Algorithm Configuration.
- Optunity - A library containing various optimizers for
hyperparameter tuning.
- hyperopt - Distributed Asynchronous Hyperparameter Optimization in
Python.
- hyperopt-sklearn - Hyper-parameter optimization for sklearn.
- sklearn-deap - Use evolutionary algorithms instead of gridsearch in
scikit-learn.
- sigopt_sklearn - SigOpt wrappers for scikit-learn methods.
- Bayesian Optimization - A Python implementation of global
optimization with Gaussian processes.
- SafeOpt - Safe Bayesian Optimization.
- scikit-optimize - Sequential model-based optimization with a
scipy.optimize interface.
- Solid - A comprehensive gradient-free optimization framework written
in Python.
- PySwarms - A research toolkit for particle swarm optimization in
Python.
- Platypus - A Free and Open Source Python Library for Multiobjective
Optimization.
- GPflowOpt - Bayesian Optimization using GPflow.
- POT - Python Optimal Transport library.
- Talos - Hyperparameter Optimization for Keras Models.
- nlopt - Library for nonlinear optimization (global and local,
constrained or unconstrained).
- OR-Tools - An open-source software suite for optimization by Google;
provides a unified programming interface to a half dozen solvers: SCIP,
GLPK, GLOP, CP-SAT, CPLEX, and Gurobi.
Feature Engineering
General
- Featuretools -
Automated feature engineering.
- Feature
Engine - Feature engineering package with sklearn-like
functionality.

- OpenFE -
Automated feature generation with expert-level performance.
- skl-groups - A
scikit-learn addon to operate on set/“group”-based features.

- Feature
Forge - A set of tools for creating and testing machine learning
features.

- few - A feature
engineering wrapper for sklearn.

- scikit-mdr
- A sklearn-compatible Python implementation of Multifactor
Dimensionality Reduction (MDR) for feature construction.

- tsfresh -
Automatic extraction of relevant features from time series.

- dirty_cat -
Machine learning on dirty tabular data (especially: string-based
variables for classification and regression).

- NitroFE - Moving
window features.

- sk-transformer
- A collection of various pandas & scikit-learn compatible
transformers for all kinds of preprocessing and feature engineering
steps

Feature Selection
- scikit-feature -
Feature selection repository in Python.
- boruta_py -
Implementations of the Boruta all-relevant feature selection method.

- BoostARoota
- A fast xgboost feature selection algorithm.

- scikit-rebate -
A scikit-learn-compatible Python implementation of ReBATE, a suite of
Relief-based feature selection algorithms for Machine Learning.

- zoofs - A
feature selection library based on evolutionary algorithms.
Visualization
General Purposes
- Matplotlib -
Plotting with Python.
- seaborn -
Statistical data visualization using matplotlib.
- prettyplotlib
- Painlessly create beautiful matplotlib plots.
- python-ternary -
Ternary plotting library for Python with matplotlib.
- missingno -
Missing data visualization module for Python.
- chartify - Python
library that makes it easy for data scientists to create charts.
- physt - Improved
histograms.
Interactive plots
- animatplot - A
python package for animating plots built on matplotlib.
- plotly - A Python library that
makes interactive and publication-quality graphs.
- Bokeh - Interactive Web
Plotting for Python.
- Altair - Declarative
statistical visualization library for Python. Supports many data
transformations within the code that creates the graph.
- bqplot - Plotting
library for IPython/Jupyter notebooks.
- pyecharts -
A Python port of Echarts, a
charting and visualization library, for interactive visual
drawing.
Map
- folium
- Makes it easy to visualize data on an interactive open street map
- geemap - Python
package for interactive mapping with Google Earth Engine (GEE).
Automatic Plotting
- HoloViews - Stop
plotting your data - annotate your data and let it visualize
itself.
- AutoViz - Visualize
data automatically with 1 line of code (ideal for machine learning).
- SweetViz -
Visualize and compare datasets, target values and associations, with one
line of code.
NLP
- pyLDAvis - Interactive
topic model visualization.
Deployment
- fastapi - Modern, fast
(high-performance) web framework for building APIs with Python.
- streamlit - Makes it easy to
deploy machine learning models.
- streamsync -
No-code in the front, Python in the back. An open-source framework for
creating data apps.
- gradio - Create
UIs for your machine learning model in Python in 3 minutes.
- Vizro - A toolkit
for creating modular data visualization applications.
- datapane - A collection of APIs
to turn scripts and notebooks into interactive reports.
- binder - Enables sharing and
executing Jupyter Notebooks.
Statistics
- pandas_summary
- Extension to pandas dataframes describe function.

- Pandas
Profiling - Create HTML profiling reports from pandas DataFrame
objects.

- statsmodels
- Statistical modeling and econometrics in Python.
- stockstats -
Supplies a wrapper,
StockDataFrame, based on
pandas.DataFrame with inline stock statistics/indicators
support.
- weightedcalcs
- A pandas-based utility to calculate weighted means, medians,
distributions, standard deviations, and more.
- scikit-posthocs -
Pairwise Multiple Comparisons Post-hoc Tests.
- Alphalens -
Performance analysis of predictive (alpha) stock factors.
Data Manipulation
Data Frames
- pandas -
Powerful Python data analysis toolkit.
- polars - A fast
multi-threaded, hybrid-out-of-core DataFrame library.
- Arctic -
High-performance datastore for time series and tick data.
- datatable -
Data.table for Python.

- pandas_profiling
- Create HTML profiling reports from pandas DataFrame objects
- cuDF - GPU DataFrame
Library.

- blaze - NumPy and
pandas interface to Big Data.

- pandasql - Allows you
to query pandas DataFrames using SQL syntax.

- pandas-gbq -
pandas Google Big Query.

- xpandas -
Universal 1d/2d data containers with Transformers functionality for
data analysis by The Alan Turing
Institute.
- pysparkling
- A pure Python implementation of Apache Spark’s RDD and DStream
interfaces.

- modin - Speed
up your pandas workflows by changing a single line of code.

- swifter - A
package that efficiently applies any function to a pandas dataframe or
series in the fastest available manner.
- pandas-log
- A package that allows providing feedback about basic pandas operations
and finds both business logic and performance issues.
- vaex - Out-of-Core
DataFrames for Python, ML, visualize and explore big tabular data at a
billion rows per second.
- xarray - Xarray
combines the best features of NumPy and pandas for multidimensional data
selection by supplementing numerical axis labels with named dimensions
for more intuitive, concise, and less error-prone indexing
routines.
Pipelines
- pdpipe - Easy
pipelines for pandas DataFrames.
- SSPipe - Python pipe (|)
operator with support for DataFrames, NumPy, and PyTorch.
- pandas-ply -
Functional data manipulation for pandas.

- Dplython - Dplyr
for Python.

- sklearn-pandas
- pandas integration with sklearn.

- Dataset -
Helps you conveniently work with random or sequential batches of your
data and define data processing.
- pyjanitor - Clean
APIs for data cleaning.

- meza - A Python
toolkit for processing tabular data.
- Prodmodel -
Build system for data science pipelines.
- dopanda -
Hints and tips for using pandas in an analysis environment.

- Hamilton - A
microframework for dataframe generation that applies Directed Acyclic
Graphs specified by a flow of lazily evaluated Python functions.
Data-centric AI
- cleanlab - The
standard data-centric AI package for data quality and machine learning
with messy, real-world data and labels.
- snorkel - A
system for quickly generating training data with weak supervision.
- dataprep - Collect,
clean, and visualize your data in Python with a few lines of code.
Synthetic Data
- ydata-synthetic -
A package to generate synthetic tabular and time-series data leveraging
the state-of-the-art generative models.

Distributed Computing
- Horovod - Distributed
training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

- PySpark
- Exposes the Spark programming model to Python.

- Veles - Distributed
machine learning platform.
- Jubatus - Framework
and Library for Distributed Online Machine Learning.
- DMTK - Microsoft
Distributed Machine Learning Toolkit.
- PaddlePaddle -
PArallel Distributed Deep LEarning.
- dask-ml - Distributed
and parallel machine learning.

- Distributed -
Distributed computation in Python.
Experimentation
- mlflow - Open source
platform for the machine learning lifecycle.
- Neptune - A lightweight ML
experiment tracking, results visualization, and management tool.
- dvc - Data Version
Control | Git for Data & Models | ML Experiments Management.
- envd - 🏕️ machine
learning development environment for data science and AI/ML engineering
teams.
- Sacred - A tool to
help you configure, organize, log, and reproduce experiments.
- Ax - Adaptive
Experimentation Platform.

Data Validation
- great_expectations
- Always know what to expect from your data.
- pandera - A
lightweight, flexible, and expressive statistical data testing
library.
- deepchecks -
Validation & testing of ML models and data during model development,
deployment, and production.

- evidently -
Evaluate and monitor ML models from validation to production.
- TensorFlow
Data Validation - Library for exploring and validating machine
learning data.
Evaluation
- recmetrics
- Library of useful metrics and plots for evaluating recommender
systems.
- Metrics - Machine
learning evaluation metric.
- sklearn-evaluation
- Model evaluation made easy: plots, tables, and markdown reports.

- AI Fairness 360 -
Fairness metrics for datasets and ML models, explanations, and
algorithms to mitigate bias in datasets and models.
Computations
- numpy - The fundamental package
needed for scientific computing with Python.
- Dask - Parallel computing
with task scheduling.

- bottleneck -
Fast NumPy array functions written in C.
- CuPy - NumPy-like API
accelerated with CUDA.
- scikit-tensor -
Python library for multilinear algebra and tensor factorizations.
- numdifftools -
Solve automatic numerical differentiation problems in one or more
variables.
- quaternion - Add
built-in support for quaternions to numpy.
- adaptive -
Tools for adaptive and parallel sampling of mathematical functions.
- NumExpr - A fast
numerical expression evaluator for NumPy that comes with an integrated
computing virtual machine to speed calculations up by avoiding memory
allocation for intermediate results.
Web Scraping
- BeautifulSoup -
The easiest library to scrape static websites for beginners.
- Scrapy - Fast and extensible
scraping library. Can write rules and create customized scrapers without
touching the core.
- Selenium -
Use Selenium Python API to access all functionalities of Selenium
WebDriver in an intuitive way like a real user.
- Pattern - High level
scraping for well-established websites such as Google, Twitter, and
Wikipedia. Also has NLP, machine learning algorithms, and
visualization.
- twitterscraper -
Efficient library to scrape Twitter.
Spatial Analysis
- GeoPandas -
Python tools for geographic data.

- PySal - Python Spatial
Analysis Library.
Quantum Computing
- qiskit - Qiskit is an
open-source SDK for working with quantum computers at the level of
circuits, algorithms, and application modules.
- cirq - A python
framework for creating, editing, and invoking Noisy Intermediate Scale
Quantum (NISQ) circuits.
- PennyLane -
Quantum machine learning, automatic differentiation, and optimization of
hybrid quantum-classical computations.
- QML - A Python Toolkit
for Quantum Machine Learning.
Conversion
- sklearn-porter -
Transpile trained scikit-learn estimators to C, Java, JavaScript, and
others.
- ONNX - Open Neural
Network Exchange.
- MMdnn - A set of
tools to help users inter-operate among different deep learning
frameworks.
- treelite - Universal
model exchange and serialization format for decision tree forests.
Contributing
Contributions are welcome! :sunglasses: Read the
contribution
guideline.
License
This work is licensed under the Creative Commons Attribution 4.0
International License - CC BY 4.0