<h1 id="awesome-software-engineering-for-machine-learning-awesomeprs-welcome">Awesome
Software Engineering for Machine Learning <a
href="https://awesome.re"><img src="https://awesome.re/badge-flat2.svg"
alt="Awesome" /></a><a
href="https://github.com/SE-ML/awesome-seml/blob/master/contributing.md"><img
src="https://img.shields.io/badge/PRs-welcome-brightgreen.svg?style=flat-square"
alt="PRs Welcome" /></a></h1>
<p>Software Engineering for Machine Learning covers the techniques and
guidelines for building ML applications that do not concern the core ML
problem – e.g. the development of new algorithms – but rather the
surrounding activities like data ingestion, coding, testing, versioning,
deployment, quality control, and team collaboration. Good software
engineering practices enhance development, deployment and maintenance of
production-level applications using machine learning components.</p>
<p>⭐ Must-read</p>
<p>🎓 Scientific publication</p>
<p><br> Based on this literature, we compiled a survey on the adoption
of software engineering practices for applications with machine learning
components.</p>
<p>Feel free to <a href="https://se-ml.github.io/survey">take and share
the survey</a> and to <a href="https://se-ml.github.io/practices">read
more</a>!</p>
<h2 id="contents">Contents</h2>
<ul>
<li><a href="#broad-overviews">Broad Overviews</a></li>
<li><a href="#data-management">Data Management</a></li>
<li><a href="#model-training">Model Training</a></li>
<li><a href="#deployment-and-operation">Deployment and
Operation</a></li>
<li><a href="#social-aspects">Social Aspects</a></li>
<li><a href="#governance">Governance</a></li>
<li><a href="#tooling">Tooling</a></li>
</ul>
<h2 id="broad-overviews">Broad Overviews</h2>
<p>These resources cover all aspects.</p>
<ul>
<li><a
href="https://resources.sei.cmu.edu/asset_files/WhitePaper/2019_019_001_634648.pdf">AI
Engineering: 11 Foundational Practices</a> ⭐</li>
<li><a
href="https://pdfs.semanticscholar.org/2869/6212a4a204783e9dd3953f06e103c02c6972.pdf">Best
Practices for Machine Learning Applications</a></li>
<li><a
href="https://se-ml.github.io/practices/">Engineering Best Practices for
Machine Learning</a> ⭐</li>
<li><a
href="https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf">Hidden
Technical Debt in Machine Learning Systems</a> 🎓⭐</li>
<li><a
href="https://developers.google.com/machine-learning/guides/rules-of-ml">Rules
of Machine Learning: Best Practices for ML Engineering</a> ⭐</li>
<li><a
href="https://www.microsoft.com/en-us/research/publication/software-engineering-for-machine-learning-a-case-study/">Software
Engineering for Machine Learning: A Case Study</a> 🎓⭐</li>
</ul>
<h2 id="data-management">Data Management</h2>
<p>How to manage the data sets you use in machine learning.</p>
<ul>
<li><a
href="https://deepai.org/publication/a-survey-on-data-collection-for-machine-learning-a-big-data-ai-integration-perspective">A
Survey on Data Collection for Machine Learning: A Big Data - AI
Integration Perspective</a> 🎓</li>
<li><a
href="http://www.vldb.org/pvldb/vol11/p1781-schelter.pdf">Automating
Large-Scale Data Quality Verification</a> 🎓</li>
<li><a
href="https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46178.pdf">Data
management challenges in production machine learning</a></li>
<li><a href="https://mlsys.org/Conferences/2019/doc/2019/167.pdf">Data
Validation for Machine Learning</a> 🎓</li>
<li><a
href="https://www.altexsoft.com/blognp/datascience/how-to-organize-data-labeling-for-machine-learning-approaches-and-tools/">How
to organize data labelling for ML</a></li>
<li><a
href="https://aws.amazon.com/blogs/apn/the-curse-of-big-data-labeling-and-three-ways-to-solve-it/">The
curse of big data labeling and three ways to solve it</a></li>
<li><a
href="http://learningsys.org/nips17/assets/papers/paper_19.pdf">The Data
Linter: Lightweight, Automated Sanity Checking for ML Data Sets</a>
🎓</li>
<li><a href="https://www.cloudfactory.com/data-labeling-guide">The
ultimate guide to data labeling for ML</a></li>
</ul>
<h2 id="model-training">Model Training</h2>
<p>How to organize your model training experiments.</p>
<ul>
<li><a
href="https://nanonets.com/blog/10-best-practices-deep-learning/#track-model-experiments">10
Best Practices for Deep Learning</a></li>
<li><a
href="https://dl.acm.org/doi/abs/10.1145/1882471.1882479">Apples-to-apples
in cross-validation studies: pitfalls in classifier performance
measurement</a> 🎓</li>
<li><a
href="https://scontent-amt2-1.xx.fbcdn.net/v/t39.8562-6/159714417_1180893265647073_4215201353052552221_n.pdf?_nc_cat=111&ccb=1-3&_nc_sid=ae5e01&_nc_ohc=6WFnNMmyp68AX95bRHk&_nc_ht=scontent-amt2-1.xx&oh=7a548f822e659b7bb2f58a511c30ee19&oe=606F33AD">Fairness
On The Ground: Applying Algorithmic Fairness Approaches To Production
Systems</a> 🎓</li>
<li><a
href="https://medium.com/@hadyelsahar/how-do-you-manage-your-machine-learning-experiments-ab87508348ac">How
do you manage your Machine Learning Experiments?</a></li>
<li><a href="https://arxiv.org/pdf/1906.10742.pdf">Machine Learning
Testing: Survey, Landscapes and Horizons</a> 🎓</li>
<li><a
href="https://matthewmcateer.me/blog/machine-learning-technical-debt/">Nitpicking
Machine Learning Technical Debt</a></li>
<li><a
href="https://link.springer.com/article/10.1023/A:1009752403260">On
Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach</a>
🎓⭐</li>
<li><a href="https://arxiv.org/pdf/1611.08309.pdf">On human intellect
and machine failures: Troubleshooting integrative machine learning
systems</a> 🎓</li>
<li><a
href="https://www.jair.org/index.php/jair/article/download/11420/26488/">Pitfalls
and Best Practices in Algorithm Configuration</a> 🎓</li>
<li><a
href="https://academic.oup.com/bioinformatics/article/26/3/440/213774">Pitfalls
of supervised feature selection</a> 🎓</li>
<li><a
href="https://www.gartner.com/en/documents/3889770/preparing-and-architecting-for-machine-learning-2018-upd">Preparing
and Architecting for Machine Learning</a></li>
<li><a href="https://arxiv.org/abs/1910.05528">Preliminary Systematic
Literature Review of Machine Learning System Development Process</a>
🎓</li>
<li><a
href="https://towardsdatascience.com/software-development-best-practices-in-a-deep-learning-environment-a1769e9859b1">Software
development best practices in a deep learning environment</a></li>
<li><a
href="https://developers.google.com/machine-learning/testing-debugging">Testing
and Debugging in Machine Learning</a></li>
<li><a
href="https://www.microsoft.com/en-us/research/publication/what-went-wrong-and-why-diagnosing-situated-interaction-failures-in-the-wild/">What
Went Wrong and Why? Diagnosing Situated Interaction Failures in the
Wild</a> 🎓</li>
</ul>
<h2 id="deployment-and-operation">Deployment and Operation</h2>
<p>How to deploy and operate your models in a production
environment.</p>
<ul>
<li><a
href="https://algorithmia.com/blog/best-practices-in-machine-learning-infrastructure">Best
Practices in Machine Learning Infrastructure</a></li>
<li><a
href="http://pages.cs.wisc.edu/~wentaowu/papers/kdd20-ci-for-ml.pdf">Building
Continuous Integration Services for Machine Learning</a> 🎓</li>
<li><a href="https://martinfowler.com/articles/cd4ml.html">Continuous
Delivery for Machine Learning</a> ⭐</li>
<li><a
href="https://www.usenix.org/system/files/opml19papers-baylor.pdf">Continuous
Training for Production ML in the TensorFlow Extended (TFX) Platform</a>
🎓</li>
<li><a
href="https://ai.googleblog.com/2019/12/fairness-indicators-scalable.html">Fairness
Indicators: Scalable Infrastructure for Fair ML Systems</a> 🎓</li>
<li><a href="https://mapr.com/ebook/machine-learning-logistics/">Machine
Learning Logistics</a></li>
<li><a
href="https://blog.codecentric.de/en/2019/03/machine-learning-experiments-production/">Machine
learning: Moving from experiments to production</a></li>
<li><a
href="https://towardsdatascience.com/ml-ops-machine-learning-as-an-engineering-discipline-b86ca4874a3f">ML
Ops: Machine Learning as an Engineering Discipline</a></li>
<li><a
href="https://www.usenix.org/conference/atc18/presentation/sridhar">Model
Governance: Reducing the Anarchy of Production ML</a> 🎓</li>
<li><a href="http://hummer.io/docs/2019-ic2e-modelops.pdf">ModelOps:
Cloud-based lifecycle management for reliable and trusted AI</a></li>
<li><a
href="https://www.kdnuggets.com/2018/04/operational-machine-learning-successful-mlops.html">Operational
Machine Learning</a></li>
<li><a href="http://proceedings.mlr.press/v67/li17a/li17a.pdf">Scaling
Machine Learning as a Service</a> 🎓</li>
<li><a
href="https://dl.acm.org/doi/pdf/10.1145/3097983.3098021?download=true">TFX:
A TensorFlow-Based Production-Scale ML Platform</a> 🎓</li>
<li><a href="https://research.google/pubs/pub46555/">The ML Test Score:
A Rubric for ML Production Readiness and Technical Debt Reduction</a>
🎓</li>
<li><a href="https://arxiv.org/abs/2011.03395">Underspecification
Presents Challenges for Credibility in Modern Machine Learning</a>
🎓</li>
<li><a href="https://doi.org/10.1145/3076246.3076248">Versioning for
end-to-end machine learning pipelines</a> 🎓</li>
</ul>
<h2 id="social-aspects">Social Aspects</h2>
<p>How to organize teams and projects to ensure effective collaboration
and accountability.</p>
<ul>
<li><a
href="http://web.cs.ucla.edu/~miryung/Publications/tse2017-datascientists.pdf">Data
Scientists in Software Teams: State of the Art and Challenges</a>
🎓</li>
<li><a
href="https://github.com/chiphuyen/machine-learning-systems-design/blob/master/build/build1/consolidated.pdf">Machine
Learning Interviews</a></li>
<li><a
href="https://d1.awsstatic.com/whitepapers/aws-managing-ml-projects.pdf">Managing
Machine Learning Projects</a></li>
<li><a
href="https://dev.to/robogeek/principled-machine-learning-4eho">Principled
Machine Learning: Practices and Tools for Efficient
Collaboration</a></li>
</ul>
<h2 id="governance">Governance</h2>
<ul>
<li><a href="https://arxiv.org/pdf/2104.13299.pdf">A Human-Centered
Interpretability Framework Based on Weight of Evidence</a> 🎓</li>
<li><a href="https://berryvilleiml.com/docs/ara.pdf">An Architectural
Risk Analysis Of Machine Learning Systems</a></li>
<li><a
href="https://complexdiscovery.com/wp-content/uploads/2021/09/EDRi-Beyond-Debiasing-Report.pdf">Beyond
Debiasing</a></li>
<li><a href="https://dl.acm.org/doi/pdf/10.1145/3351095.3372873">Closing
the AI Accountability Gap: Defining an End-to-End Framework for Internal
Algorithmic Auditing</a> 🎓</li>
<li><a href="https://arxiv.org/abs/1609.05807">Inherent trade-offs in
the fair determination of risk scores</a> 🎓</li>
<li><a
href="https://ai.google/responsibilities/responsible-ai-practices/">Responsible
AI practices</a> ⭐</li>
<li><a href="https://arxiv.org/abs/2004.07213">Toward Trustworthy AI
Development: Mechanisms for Supporting Verifiable Claims</a></li>
<li><a href="https://dl.acm.org/doi/abs/10.1145/3453478">Understanding
Software-2.0</a> 🎓</li>
</ul>
<h2 id="tooling">Tooling</h2>
<p>Tooling can make your life easier.</p>
<p>We only share open source tools, or commercial platforms that offer
substantial free packages for research.</p>
<ul>
<li><a href="https://aimstack.io">Aim</a> - Aim is an open source
experiment tracking tool.</li>
<li><a href="https://airflow.apache.org/">Airflow</a> - Programmatically
author, schedule and monitor workflows.</li>
<li><a href="https://github.com/SeldonIO/alibi-detect">Alibi Detect</a>
- Python library focused on outlier, adversarial and drift
detection.</li>
<li><a href="https://github.com/microsoft/archai">Archai</a> - Neural
architecture search.</li>
<li><a href="https://dvc.org/">Data Version Control (DVC)</a> - DVC is a
data and ML experiments management tool.</li>
<li><a href="https://pair-code.github.io/facets/">Facets Overview /
Facets Dive</a> - Robust visualizations to aid in understanding machine
learning datasets.</li>
<li><a href="https://fairlearn.github.io/">Fairlearn</a> - A toolkit to
assess and improve the fairness of machine learning models.</li>
<li><a href="https://git-lfs.github.com/">Git Large File Storage
(LFS)</a> - Replaces large files such as datasets with text pointers
inside Git.</li>
<li><a
href="https://github.com/great-expectations/great_expectations">Great
Expectations</a> - Data validation and testing with integration in
pipelines.</li>
<li><a href="https://github.com/PetrochukM/HParams">HParams</a> - A
thoughtful approach to configuration management for machine learning
projects.</li>
<li><a href="https://www.kubeflow.org/">Kubeflow</a> - A platform for
data scientists who want to build and experiment with ML pipelines.</li>
<li><a href="https://github.com/heartexlabs/label-studio">Label
Studio</a> - A multi-type data labeling and annotation tool with
standardized output format.</li>
<li><a href="https://github.com/linkedin/LiFT">LiFT</a> - LinkedIn
fairness toolkit.</li>
<li><a href="https://mlflow.org/">MLflow</a> - Manage the ML lifecycle,
including experimentation, deployment, and a central model
registry.</li>
<li><a href="https://github.com/tensorflow/model-card-toolkit">Model
Card Toolkit</a> - Streamlines and automates the generation of model
cards; for model documentation.</li>
<li><a href="https://neptune.ai/">Neptune.ai</a> - Experiment tracking
tool bringing organization and collaboration to data science
projects.</li>
<li><a href="https://github.com/Neuraxio/Neuraxle">Neuraxle</a> -
Sklearn-like framework for hyperparameter tuning and AutoML in deep
learning projects.</li>
<li><a href="https://www.openml.org">OpenML</a> - An inclusive movement
to build an open, organized, online ecosystem for machine learning.</li>
<li><a
href="https://github.com/PyTorchLightning/pytorch-lightning">PyTorch
Lightning</a> - The lightweight PyTorch wrapper for high-performance AI
research. Scale your models, not the boilerplate.</li>
<li><a href="https://github.com/princetonvisualai/revise-tool">REVISE:
REvealing VIsual biaSEs</a> - Automatically detect bias in visual data
sets.</li>
<li><a
href="https://github.com/google-research/robustness_metrics">Robustness
Metrics</a> - Lightweight modules to evaluate the robustness of
classification models.</li>
<li><a href="https://github.com/SeldonIO/seldon-core">Seldon Core</a> -
An MLOps framework to package, deploy, monitor and manage thousands of
production machine learning models on Kubernetes.</li>
<li><a href="https://spark.apache.org/mllib/">Spark Machine Learning</a>
- Spark’s ML library consisting of common learning algorithms and
utilities.</li>
<li><a href="https://www.tensorflow.org/tensorboard/">TensorBoard</a> -
TensorFlow’s Visualization Toolkit.</li>
<li><a href="https://www.tensorflow.org/tfx/">TensorFlow Extended
(TFX)</a> - An end-to-end platform for deploying production ML
pipelines.</li>
<li><a href="https://github.com/tensorflow/data-validation">TensorFlow
Data Validation (TFDV)</a> - Library for exploring and validating
machine learning data. Similar to Great Expectations, but for TensorFlow
data.</li>
<li><a href="https://www.wandb.com/">Weights & Biases</a> -
Experiment tracking, model optimization, and dataset versioning.</li>
</ul>
<h2 id="contribute">Contribute</h2>
<p>Contributions welcome! Read the <a
href="contributing.md">contribution guidelines</a> first.</p>
<p><a href="https://github.com/SE-ML/awesome-seml">seml.md
GitHub</a></p>