<h1 id="awesome-pipeline">Awesome Pipeline</h1>
<p>A curated list of awesome pipeline toolkits inspired by <a
href="https://github.com/kahun/awesome-sysadmin">Awesome
Sysadmin</a>.</p>
<h2 id="pipeline-frameworks-libraries">Pipeline frameworks &amp;
libraries</h2>
<ul>
<li><a
href="http://docs.stackstorm.com/actionchain.html">ActionChain</a> - A
workflow system for simple linear success/failure workflows.</li>
<li><a href="https://github.com/diana-hep/adage">Adage</a> - Small
package to describe workflows that are not completely known at
definition time.</li>
<li><a href="https://github.com/aiidateam/aiida-core">AiiDA</a> -
Workflow manager with a strong focus on provenance, performance and
extensibility.</li>
<li><a href="https://github.com/airbnb/airflow">Airflow</a> -
Python-based workflow system created by Airbnb.</li>
<li><a href="http://www.anduril.org/anduril/site/">Anduril</a> -
Component-based workflow framework for scientific data analysis.</li>
<li><a href="https://www.antha-lang.org/">Antha</a> - High-level
language for biology.</li>
<li><a href="https://github.com/MG-RAST/AWE/">AWE</a> - Workflow and
resource management system with CWL support.</li>
<li><a href="https://github.com/argonne-lcf/balsam">Balsam</a> -
Python-based high throughput task and workflow engine.</li>
<li><a href="http://pcingola.github.io/BigDataScript/">Bds</a> -
Scripting language for data pipelines.</li>
<li><a href="https://github.com/evoldoers/biomake">BioMake</a> -
GNU-Make-like utility for managing builds and complex workflows.</li>
<li><a href="https://github.com/liyao001/BioQueue">BioQueue</a> -
Explicit framework with web monitoring and resource estimation.</li>
<li><a href="https://github.com/papenfusslab/bioshake">Bioshake</a> -
Haskell DSL built on Shake with strong typing and EDAM support.</li>
<li><a href="https://github.com/pveber/bistro">Bistro</a> - Library to
build and execute typed scientific workflows.</li>
<li><a href="https://github.com/ssadedin/bpipe/">Bpipe</a> - Tool for
running and managing bioinformatics pipelines.</li>
<li><a href="https://github.com/bloomreach/briefly">Briefly</a> - Python
meta-programming library for job flow control.</li>
<li><a href="http://clusterflow.io">Cluster Flow</a> - Command-line tool
which uses common cluster managers to run bioinformatics pipelines.</li>
<li><a href="https://github.com/monajemi/clusterjob">Clusterjob</a> -
Automated reproducibility and hassle-free submission of computational
jobs to clusters.</li>
<li><a href="https://www.sing-group.org/compi">Compi</a> - Application
framework for portable computational pipelines.</li>
<li><a
href="https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar">Compss</a>
- Programming model for distributed infrastructures.</li>
<li><a href="https://github.com/tburdett/Conan2">Conan2</a> -
Lightweight workflow management application.</li>
<li><a href="https://github.com/robdmc/consecution">Consecution</a> - A
Python pipeline abstraction inspired by Apache Storm topologies.</li>
<li><a href="https://mizzou-cbmi.github.io/">Cosmos</a> - Python library
for massively parallel workflows.</li>
<li><a href="https://github.com/couler-proj/couler">Couler</a> - Unified
interface for constructing and managing workflows on different workflow
engines, such as Argo Workflows, Tekton Pipelines, and Apache
Airflow.</li>
<li><a href="https://github.com/AgnostiqHQ/covalent">Covalent</a> -
Workflow orchestration toolkit for high-performance and quantum
computing research and development.</li>
<li><a href="https://github.com/broadinstitute/cromwell">Cromwell</a> -
Workflow management system from the Broad Institute, geared towards
scientific workflows.</li>
<li><a href="https://github.com/joergen7/cuneiform">Cuneiform</a> -
Advanced functional workflow language and framework, implemented in
Erlang.</li>
<li><a href="https://cylc.github.io/">Cylc</a> - A workflow engine for
cycling systems, originally developed for operational environmental
forecasting.</li>
<li><a href="https://github.com/thieman/dagobah">Dagobah</a> - Simple
DAG-based job scheduler in Python.</li>
<li><a href="https://github.com/fulcrumgenomics/dagr">Dagr</a> - A Scala-based
DSL and framework for writing and executing bioinformatics
pipelines as directed acyclic graphs.</li>
<li><a href="https://github.com/dagster-io/dagster">Dagster</a> -
Python-based API for defining DAGs that interfaces with popular workflow
managers for building data applications.</li>
<li><a href="https://datajoint.io">DataJoint</a> - An open-source
relational framework for scientific data pipelines.</li>
<li><a href="https://github.com/dask/dask">Dask</a> - Flexible
parallel computing library for analytics.</li>
<li><a href="https://www.getdbt.com/">Dbt</a> - Framework for writing
analytics workflows entirely in SQL. Covers the T in ETL and focuses on
analytics engineering.</li>
<li><a
href="https://github.com/googlegenomics/dockerflow">Dockerflow</a> -
Workflow runner that uses Dataflow to run a series of tasks in
Docker.</li>
<li><a href="https://github.com/Factual/drake">Drake</a> - Robust DSL
akin to Make, implemented in Clojure.</li>
<li><a href="https://github.com/ropensci/drake">Drake R package</a> -
Reproducibility and high-performance computing with an easy R-focused
interface. Unrelated to <a
href="https://github.com/factual/drake">Factual’s Drake</a>. Succeeded
by <a href="https://github.com/ropensci/targets">Targets</a>.</li>
<li><a href="https://github.com/CenturyLinkLabs/dray">Dray</a> - An
engine for managing the execution of container-based workflows.</li>
<li><a href="https://github.com/ecmwf/ecflow">ecFlow</a> - Workflow
manager developed by ECMWF.</li>
<li><a href="https://github.com/Ensembl/ensembl-hive">eHive</a> - System
for creating and running pipelines on a distributed compute
resource.</li>
<li><a href="https://github.com/fission/fission-workflows">Fission
Workflows</a> - A fast, lightweight workflow engine for serverless/FaaS
functions.</li>
<li><a href="https://github.com/druths/flex/">Flex</a> - Language-agnostic
framework for building flexible data science pipelines
(Python/Shell/Gnuplot).</li>
<li><a href="https://github.com/sahilseth/flowr">Flowr</a> - Robust and
efficient workflows using a simple language-agnostic approach (R
package).</li>
<li><a href="https://github.com/uzh/gc3pie">Gc3pie</a> - Python
libraries and tools for running applications on diverse grids and
clusters.</li>
<li><a href="https://guixwl.org/">Guix Workflow Language</a> - A
workflow management language extension for GNU Guix.</li>
<li><a href="https://github.com/mailund/gwf">Gwf</a> - Make-like utility
for submitting workflows via qsub.</li>
<li><a href="https://github.com/dagworks-inc/hamilton">Hamilton</a> - A
Python micro-framework for describing dataflows; runs anywhere Python
runs.</li>
<li><a href="https://github.com/It4innovations/HyperLoom">HyperLoom</a>
- Platform for defining and executing workflow pipelines in large-scale
distributed environments.</li>
<li><a href="https://joblib.readthedocs.io/en/latest/">Joblib</a> - Set
of tools to provide lightweight pipelining in Python.</li>
<li><a href="https://jug.readthedocs.io">Jug</a> - A task-based
parallelization framework for Python.</li>
<li><a href="https://github.com/quantumblacklabs/kedro">Kedro</a> -
Workflow development tool that helps you build data pipelines.</li>
<li><a href="https://github.com/kestra-io/kestra">Kestra</a> - Open-source
data orchestration and scheduling platform with declarative
syntax.</li>
<li><a href="https://github.com/hammerlab/ketrew">Ketrew</a> - Embedded
DSL in OCaml, alongside a client-server management
application.</li>
<li><a href="https://github.com/jtaghiyar/kronos">Kronos</a> - Workflow
assembler for cancer genome analytics and informatics.</li>
<li><a href="https://github.com/StanfordBioinformatics/loom">Loom</a> -
Tool for running bioinformatics workflows locally or in the cloud.</li>
<li><a href="http://www.hecbiosim.ac.uk/longbow">Longbow</a> - Job
proxying tool for biomolecular simulations.</li>
<li><a href="https://github.com/spotify/luigi">Luigi</a> - Python module
that helps you build complex pipelines of batch jobs.</li>
<li><a href="https://github.com/LLNL/maestrowf">Maestro</a> - YAML-based
HPC workflow execution tool.</li>
<li><a href="http://ccl.cse.nd.edu/software/makeflow/">Makeflow</a> -
Workflow engine for executing large complex workflows on clusters.</li>
<li><a href="https://github.com/mara/data-integration">Mara</a> - A
lightweight, opinionated ETL framework, halfway between plain scripts
and Apache Airflow.</li>
<li><a href="https://github.com/intentmedia/mario">Mario</a> - Scala
library for defining data pipelines.</li>
<li><a href="http://martian-lang.org/">Martian</a> - A language and
framework for developing and executing complex computational
pipelines.</li>
<li><a href="https://github.com/MD-Studio/MDStudio">MD Studio</a> -
Microservice based workflow engine.</li>
<li><a href="https://metaflow.org/">MetaFlow</a> - Open-source
framework from Netflix for DAG generation, aimed at data scientists, with
Python and R APIs.</li>
<li><a href="https://github.com/openstack/mistral">Mistral</a> - Python-based
workflow engine from the OpenStack project.</li>
<li><a href="https://github.com/mfiers/Moa">Moa</a> - Lightweight
workflows in bioinformatics.</li>
<li><a href="http://www.nextflow.io">Nextflow</a> - Flow-based
computational toolkit for reproducible and scalable bioinformatics
pipelines.</li>
<li><a href="https://github.com/nipy/nipype">NiPype</a> - Workflows and
interfaces for neuroimaging packages.</li>
<li><a href="https://github.com/adaptivegenome/openge">OpenGE</a> -
Accelerated framework for manipulating and interpreting high-throughput
sequencing data.</li>
<li><a href="https://www.pachyderm.io/">Pachyderm</a> - Distributed and
reproducible data pipelining and data management, built on the container
ecosystem.</li>
<li><a href="https://github.com/Parsl/parsl">Parsl</a> - Parallel
Scripting Library.</li>
<li><a
href="https://github.com/fstrozzi/bioruby-pipengine">PipEngine</a> -
Ruby based launcher for complex biological pipelines.</li>
<li><a href="https://github.com/pinterest/pinball">Pinball</a> - Python-based
workflow engine by Pinterest.</li>
<li><a href="https://github.com/systemslab/popper">Popper</a> - YAML-based
container-native workflow engine supporting Docker, Singularity,
Vagrant VMs with Docker daemon in VM, and local host.</li>
<li><a href="https://github.com/tweag/porcupine">Porcupine</a> - Haskell
workflow tool to express and compose tasks (optionally cached) whose
data sources and sinks are known ahead of time and rebindable, and which
can expose arbitrary sets of parameters to the outside world.</li>
<li><a href="https://docs.prefect.io/">Prefect</a> - Python-based
workflow orchestration engine.</li>
<li><a href="https://github.com/nipype/pydra">Pydra</a> - Lightweight,
DAG-based Python dataflow engine for reproducible and scalable
scientific pipelines.</li>
<li><a href="https://github.com/Illumina/pyflow">PyFlow</a> -
Lightweight parallel task engine.</li>
<li><a href="https://github.com/baffelli/pyperator">pyperator</a> -
Simple push-based Python workflow framework using asyncio, supporting
recursive networks.</li>
<li><a href="https://github.com/pwwang/pyppl">pyppl</a> - A lightweight
Python pipeline framework.</li>
<li><a href="https://pypyr.io">pypyr</a> - Automation task-runner for
sequential steps defined in a pipeline YAML file, with AWS and Slack
plug-ins.</li>
<li><a href="https://github.com/masa16/Pwrake/">Pwrake</a> - Parallel
workflow extension for Rake.</li>
<li><a href="https://bitbucket.org/berkeleylab/qdo">Qdo</a> -
Lightweight high-throughput queuing system for workflows with many small
tasks to perform.</li>
<li><a href="https://github.com/alastair-droop/qsubsec">Qsubsec</a> -
Simple tokenised template system for SGE.</li>
<li><a href="https://github.com/rabix/rabix">Rabix</a> - Python-based
workflow toolkit based on the Common Workflow Language and Docker.</li>
<li><a href="https://github.com/substantic/rain">Rain</a> - Framework
for large distributed task-based pipelines, written in Rust with Python
API.</li>
<li><a href="https://github.com/ray-project/ray">Ray</a> - Flexible,
high-performance distributed Python execution framework.</li>
<li><a href="https://github.com/insitro/redun">Redun</a> - Yet another
redundant workflow engine.</li>
<li><a href="https://github.com/grailbio/reflow">Reflow</a> - Language
and runtime for distributed, incremental data processing in the
cloud.</li>
<li><a href="https://github.com/richfitz/remake">Remake</a> - Make-like
declarative workflows in R.</li>
<li><a
href="http://physiology.med.cornell.edu/faculty/mason/lab/r-make/">Rmake</a>
- Wrapper for the creation of Makefiles, enabling massive
parallelization.</li>
<li><a href="https://github.com/bjpop/rubra">Rubra</a> - Pipeline system
for bioinformatics workflows.</li>
<li><a href="http://www.ruffus.org.uk">Ruffus</a> - Computation pipeline
library for Python.</li>
<li><a href="https://github.com/kirillseva/ruigi">Ruigi</a> - Pipeline
tool for R, inspired by Luigi.</li>
<li><a href="http://tonyfischetti.github.io/sake/">Sake</a> -
Self-documenting build automation tool.</li>
<li><a href="https://github.com/pharmbio/sciluigi">SciLuigi</a> - Helper
library for writing flexible scientific workflows in Luigi.</li>
<li><a href="http://scipipe.org">SciPipe</a> - Library for writing
scientific workflows in Go.</li>
<li><a href="https://signac.io">Signac</a> - Lightweight but scalable
framework for file-driven workflows to be run locally and on HPC
systems.</li>
<li><a href="https://github.com/soravux/scoop/">Scoop</a> - Scalable
Concurrent Operations in Python.</li>
<li><a href="https://github.com/nlgranger/SeqTools">Seqtools</a> -
Python library for lazy evaluation of pipelined transformations on
indexable containers.</li>
<li><a href="https://github.com/giacbrd/SmartPipeline">SmartPipeline</a>
- A framework for rapid development of robust data pipelines following a
simple design pattern.</li>
<li><a href="https://snakemake.readthedocs.io/en/stable">Snakemake</a> -
Tool for running and managing bioinformatics pipelines.</li>
<li><a href="https://github.com/knipknap/SpiffWorkflow">Spiff</a> -
Workflow engine implemented in Python, based on the Workflow Patterns
initiative.</li>
<li><a href="https://github.com/sailthru/stolos">Stolos</a> - Directed
Acyclic Graph task dependency scheduler that simplifies distributed
pipelines.</li>
<li><a href="https://github.com/minerva-ml/steppy">Steppy</a> -
Lightweight, open-source, Python 3 library for fast and reproducible
experimentation.</li>
<li><a href="https://stpipe.readthedocs.io/">Stpipe</a> - File
processing pipelines as a Python library.</li>
<li><a href="https://github.com/alpha-unito/streamflow">StreamFlow</a> -
Container-native workflow management system focused on hybrid
workflows.</li>
<li><a href="https://streampipes.apache.org">StreamPipes</a> - A
self-service IoT toolbox to enable non-technical users to connect,
analyze and explore IoT data streams.</li>
<li><a href="https://github.com/gilt/sundial">Sundial</a> - Job system on
AWS ECS or AWS Batch that manages dependencies and scheduling.</li>
<li><a href="https://github.com/Netflix/suro">Suro</a> - Java-based
distributed pipeline from Netflix.</li>
<li><a href="http://swift-lang.org">Swift</a> - Fast, easy parallel
scripting on multicores, clusters, clouds and supercomputers.</li>
<li><a href="https://github.com/ropensci/targets">Targets</a> - Dynamic,
function-oriented <a
href="https://www.gnu.org/software/make/">Make</a>-like reproducible
pipelines at scale in R.</li>
<li><a href="https://github.com/natcap/taskgraph">TaskGraph</a> - A
library to help manage complicated computational software pipelines
consisting of long running individual tasks.</li>
<li><a href="https://github.com/4dn-dcic/tibanna">Tibanna</a> - Tool
that helps you run genomic pipelines on Amazon cloud.</li>
<li><a href="https://github.com/BD2KGenomics/toil">Toil</a> -
Distributed pipeline workflow manager (mostly for genomics).</li>
<li><a href="https://www.wallaroolabs.com/">Wallaroo</a> - Framework for
streaming data applications and algorithms that react to real-time
events.</li>
<li><a href="http://worldmake.org/">WorldMake</a> - Easy, collaborative,
reproducible computing.</li>
<li><a href="http://opensource.nibr.com/yap/">Yap</a> - Extensible
parallel framework, written in Python using OpenMPI libraries.</li>
<li><a href="https://github.com/picanumber/yapp">Yapp</a> - A C++
parallel pipeline library for stream processing.</li>
<li><a href="https://zenaton.com">Zenaton</a> - Workflow engine for
orchestrating jobs, data and events across your applications and
third-party services.</li>
<li><a href="https://zenml.io">ZenML</a> - Extensible open-source MLOps
framework to create reproducible pipelines for data scientists.</li>
</ul>
<h2 id="workflow-platforms">Workflow platforms</h2>
<ul>
<li><a href="http://www.activepapers.org/">ActivePapers</a> -
Computational science made reproducible and publishable.</li>
<li><a href="https://github.com/automaticmode/active_workflow">Active
Workflow</a> - Polyglot workflows without leaving the comfort of your
technology stack.</li>
<li><a href="https://anvio.org/">Anvi’o</a> - A community and framework
centered around metagenomics, designed to facilitate reproducible
exploration and visualization of data.</li>
<li><a href="https://airavata.apache.org/">Apache Airavata</a> -
Framework for executing and managing computational workflows on
distributed computing resources.</li>
<li><a href="https://arteria-project.github.io/">Arteria</a> -
Event-driven automation for sequencing centers. Initiates workflows
based on events.</li>
<li><a href="http://arvados.org">Arvados</a> - A container-based
workflow platform.</li>
<li>Biokepler - Bioinformatics Scientific Workflow for Distributed
Analysis of Large-Scale Biological Data. (<a
href="https://web.archive.org/web/20190108162953/https://www.biokepler.org/"><em>inactive
since 10/2019</em></a>)</li>
<li><a href="http://github.com/llevar/butler">Butler</a> - Framework for
running scientific workflows on public and academic clouds.</li>
<li><a href="http://chipster.csc.fi">Chipster</a> - Open source platform
for data analysis.</li>
<li><a href="https://bitbucket.org/bromberglab/clubber">Clubber</a> -
Cluster Load Balancer for Bioinformatics e-Resources.</li>
<li><a href="https://www.digdag.io">Digdag</a> - Workflow manager
designed for simplicity, extensibility and collaboration.</li>
<li><a href="https://github.com/Tauffer-Consulting/domino">Domino</a> -
User-friendly, open-source visual workflow management platform.</li>
<li><a
href="https://github.com/materialsproject/fireworks">Fireworks</a> -
Centralized workflow server for dynamic workflows of high-throughput
computations.</li>
<li><a href="https://github.com/lyft/flyte">Flyte</a> -
Container-native, type-safe workflow and pipelines platform for
large-scale processing and ML.</li>
<li><a href="https://galaxyproject.org">Galaxy</a> - Powerful workflow
system which can be used on the command line or with the GUI.</li>
<li><a href="https://kepler-project.org/">Kepler</a> - Scientific
workflow application from the University of California.</li>
<li><a href="https://www.knime.org/knime-analytics-platform">KNIME
Analytics Platform</a> - General-purpose platform with many specialized
domain extensions.</li>
<li><a href="http://workflow.campagnelab.org">NextflowWorkbench</a> -
Integrated development environment for Nextflow, Docker and reusable
workflows.</li>
<li><a href="https://github.com/omegaml/omegaml">omega|ml DataOps
Platform</a> - Data &amp; model pipeline deployment for humans -
integrated, scalable, extensible.</li>
<li><a href="http://www.openmole.org/current/">OpenMOLE</a> - Workflow
management system for model exploration and parameter
optimization.</li>
<li><a href="http://ophidia.cmcc.it">Ophidia</a> - Data-analytics
platform with declarative workflows of distributed operations.</li>
<li><a href="https://github.com/orchest/orchest">Orchest</a> - An IDE
for Data Science.</li>
<li><a href="http://pegasus.isi.edu">Pegasus</a> - Workflow Management
System.</li>
<li><a href="https://github.com/creactiviti/piper">Piper</a> -
Distributed workflow engine designed to be dead simple.</li>
<li><a href="https://github.com/polyaxon/polyaxon">Polyaxon</a> - A
platform for machine learning experimentation workflow.</li>
<li><a href="https://github.com/reanahub/reana">Reana</a> - Platform for
reusable research data analyses developed by CERN.</li>
<li><a href="https://github.com/uzh/sushi">Sushi</a> - Supporting User
for SHell script Integration.</li>
<li><a href="http://ccg.murdoch.edu.au/yabi">Yabi</a> - Online research
environment for grid, HPC and cloud computing.</li>
<li><a href="http://www.taverna.org.uk">Taverna</a> - Domain-independent
workflow system.</li>
<li><a href="https://www.temporal.io/">Temporal</a> - Highly scalable,
developer-oriented <em>Workflow as Code</em> engine.</li>
<li><a href="http://www.vistrails.org/">VisTrails</a> - Scientific
workflow and provenance management system.</li>
<li><a href="http://www.wings-workflows.org">Wings</a> - Semantic
workflow system utilizing Pegasus as execution system.</li>
<li><a href="https://github.com/klugem/watchdog">Watchdog</a> - Workflow
management system for the automated and distributed analysis of
large-scale experimental data.</li>
<li><a href="https://www.flowhub.com.cn">FlowHub</a> - Workflow cloud
platform.</li>
</ul>
<h2 id="workflow-languages">Workflow languages</h2>
<ul>
<li><a
href="https://github.com/common-workflow-language/common-workflow-language">Common
Workflow Language</a></li>
<li><a href="http://cloudgene.uibk.ac.at/developer-guide">Cloudgene
Workflow Language</a></li>
<li><a
href="http://www.openmole.org/current/Documentation_Language.html">OpenMOLE
DSL</a></li>
<li><a href="https://github.com/openwdl/wdl">Workflow Description
Language</a></li>
<li><a href="http://www.yawlfoundation.org">Yet Another Workflow
Language</a></li>
<li><a href="https://github.com/calebwin/pipelines">Pipelines</a></li>
</ul>
<h2 id="workflow-standardization-initiatives">Workflow standardization
initiatives</h2>
<ul>
<li><a href="http://www.wf4ever-project.org">Workflow 4 Ever
Initiative</a></li>
<li><a href="http://wf4ever.github.io/ro">Workflow 4 Ever workflow
research object model</a></li>
<li><a href="http://www.workflowpatterns.com">Workflow Patterns
Initiative</a></li>
<li><a href="http://www.workflowpatterns.com/patterns">Workflow Patterns
Library</a></li>
<li><a href="http://www.researchobject.org">ResearchObject.org</a></li>
</ul>
<h2 id="etl-data-orchestration">ETL &amp; Data orchestration</h2>
<ul>
<li><a href="https://datalad.org">DataLad</a> - Git and git-annex based
data version control system with lightweight provenance
capture/re-execution support.</li>
<li><a href="https://dvc.org">DVC</a> - Data version control system for
ML projects with lightweight pipeline support.</li>
<li><a href="https://github.com/treeverse/lakeFS">lakeFS</a> -
Repeatable, atomic and versioned data lake on top of object
storage.</li>
<li><a href="https://github.com/projectnessie/nessie">Nessie</a> -
Provides Git-like capability &amp; version control for Iceberg Tables,
Delta Lake Tables &amp; SQL Views.</li>
</ul>
<h2 id="literate-programming-aka-interactive-notebooks">Literate
programming (aka interactive notebooks)</h2>
<ul>
<li><a href="http://beakernotebook.com/">Beaker</a> - Notebook-style
development environment.</li>
<li><a href="http://mybinder.org/">Binder</a> - Turn a GitHub repo into
a collection of interactive notebooks powered by Jupyter and
Kubernetes.</li>
<li><a href="https://ipython.org/">IPython</a> - A rich architecture for
interactive computing.</li>
<li><a href="https://jupyter.org/">Jupyter</a> - Language-agnostic
notebook literate programming environment.</li>
<li><a href="http://pathomx.org">Pathomx</a> - Interactive data
workflows built on Python.</li>
<li><a href="https://github.com/polynote/polynote">Polynote</a> - A
better notebook for Scala (and more). Built by Netflix.</li>
<li><a href="https://github.com/ploomber/ploomber">Ploomber</a> -
Consolidate your notebooks and scripts in a reproducible pipeline using
a <code>pipeline.yaml</code> file.</li>
<li><a href="http://rmarkdown.rstudio.com/r_notebooks.html">R
Notebooks</a> - R Markdown notebook literate programming
environment.</li>
<li><a href="https://www.redpointnotebooks.com/">RedPoint Notebooks</a>
- Web-native computational notebook for programmers supporting multiple
languages, APIs and webhooks.</li>
<li><a href="https://vatlab.github.io/sos-docs/">SoS</a> - Readable,
interactive, cross-platform and cross-language data science workflow
system.</li>
<li><a href="https://zeppelin.apache.org/">Zeppelin</a> - Web-based
notebook that enables interactive data analytics.</li>
</ul>
<h2 id="extract-transform-load-etl">Extract, transform, load (ETL)</h2>
<ul>
<li><a href="https://github.com/uber/cadence">Cadence</a> - Distributed,
scalable, durable, and highly available orchestration engine developed
by Uber.</li>
<li><a href="https://github.com/dataform-co/dataform">Dataform</a> -
Framework for managing SQL-based operations in your data
warehouse.</li>
<li><a href="http://www.kiba-etl.org">Kiba ETL</a> - A data processing
&amp; ETL framework for Ruby.</li>
<li><a href="https://etl.linkedpipes.com">LinkedPipes ETL</a> - Linked
Data publishing and consumption ETL tool.</li>
<li><a
href="https://community.hitachivantara.com/s/article/data-integration-kettle">Pentaho
Kettle</a> - A platform that delivers powerful ETL capabilities using a
metadata-driven approach.</li>
<li><a href="https://github.com/brexhq/substation">Substation</a> -
Cloud-native data pipeline and transformation toolkit
written in Go.</li>
</ul>
<h2 id="continuous-delivery-workflows">Continuous Delivery
workflows</h2>
<ul>
<li><a href="https://github.com/argoproj/argo">Argo</a> - Get stuff done
with container-native workflows for Kubernetes.</li>
<li><a href="https://github.com/ovh/cds">CDS</a> - A pipeline-based
Continuous Delivery Service written in Go.</li>
</ul>
<h2 id="build-automation-tools">Build automation tools</h2>
<ul>
<li><a href="http://bazel.io/">Bazel</a> - Build software just as
engineers do at Google.</li>
<li><a href="https://github.com/pydoit/doit">doit</a> - Highly
generalized task-management and automation in Python.</li>
<li><a href="http://gradle.org/">Gradle</a> - Unified cross-platform
builds.</li>
<li><a href="https://github.com/casey/just">Just</a> - Command and
recipe runner similar to Make, built in Rust.</li>
<li><a href="https://www.gnu.org/software/make/">Make</a> - The GNU Make
build system.</li>
<li><a href="https://github.com/prodmodel/prodmodel">Prodmodel</a> -
Build system for data science pipelines.</li>
<li><a href="http://www.scons.org/">Scons</a> - Python-based build tool
focused on C/C++ builds.</li>
<li><a href="https://github.com/ndmitchell/shake">Shake</a> - Define
robust build systems akin to GNU Make using Haskell.</li>
</ul>
<h2 id="automated-workflow-composition">Automated workflow
composition</h2>
<ul>
<li><a href="https://github.com/sanctuuary/APE">APE</a> - A tool for the
automated exploration of possible computational workflows based on
semantic annotations.</li>
</ul>
<h2 id="other-projects">Other projects</h2>
<ul>
<li><a href="http://hpcgridrunner.github.io/">HPC Grid Runner</a></li>
<li><a href="https://nifi.apache.org">NiFi</a> - Powerful and scalable
directed graphs of data routing, transformation, and system mediation
logic.</li>
<li><a href="https://github.com/gems-uff/noworkflow">noWorkflow</a> -
Supporting infrastructure to run scientific experiments without a
scientific workflow management system, and still get things like
provenance.</li>
<li><a href="https://www.reprozip.org/">Reprozip</a> - Simplifies the
process of creating reproducible experiments from command-line
executions.</li>
</ul>
<h2 id="related-lists">Related lists</h2>
<ul>
<li><a href="https://github.com/manuzhang/awesome-streaming">Awesome
streaming</a> - Curated list of awesome streaming frameworks and
applications.</li>
<li><a href="https://github.com/pawl/awesome-etl">Awesome ETL</a> -
Curated list of notable ETL (extract, transform, load) frameworks,
libraries and software.</li>
<li><a
href="https://github.com/meirwah/awesome-workflow-engines">Awesome
workflow engines</a> - Curated list of awesome open source workflow
engines.</li>
<li><a
href="https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems">Computational
Data Analysis Workflow Systems</a></li>
</ul>