<h1 id="awesome-pipeline">Awesome Pipeline</h1>
<p>A curated list of awesome pipeline toolkits inspired by <a
href="https://github.com/kahun/awesome-sysadmin">Awesome
Sysadmin</a></p>
<h2 id="pipeline-frameworks-libraries">Pipeline frameworks &
libraries</h2>
<ul>
<li><a
href="http://docs.stackstorm.com/actionchain.html">ActionChain</a> - A
workflow system for simple linear success/failure workflows.</li>
<li><a href="https://github.com/diana-hep/adage">Adage</a> - Small
package to describe workflows that are not completely known at
definition time.</li>
<li><a href="https://github.com/aiidateam/aiida-core">AiiDA</a> -
Workflow manager with a strong focus on provenance, performance and
extensibility.</li>
<li><a href="https://github.com/airbnb/airflow">Airflow</a> -
Python-based workflow system created by Airbnb.</li>
<li><a href="http://www.anduril.org/anduril/site/">Anduril</a> -
Component-based workflow framework for scientific data analysis.</li>
<li><a href="https://www.antha-lang.org/">Antha</a> - High-level
language for biology.</li>
<li><a href="https://github.com/MG-RAST/AWE/">AWE</a> - Workflow and
resource management system with CWL support.</li>
<li><a href="https://github.com/argonne-lcf/balsam">Balsam</a> -
Python-based high throughput task and workflow engine.</li>
<li><a href="http://pcingola.github.io/BigDataScript/">Bds</a> -
Scripting language for data pipelines.</li>
<li><a href="https://github.com/evoldoers/biomake">BioMake</a> -
GNU-Make-like utility for managing builds and complex workflows.</li>
<li><a href="https://github.com/liyao001/BioQueue">BioQueue</a> -
Explicit framework with web monitoring and resource estimation.</li>
<li><a href="https://github.com/papenfusslab/bioshake">Bioshake</a> -
Haskell DSL built on shake with strong typing and EDAM support.</li>
<li><a href="https://github.com/pveber/bistro">Bistro</a> - Library to
build and execute typed scientific workflows.</li>
<li><a href="https://github.com/ssadedin/bpipe/">Bpipe</a> - Tool for
running and managing bioinformatics pipelines.</li>
<li><a href="https://github.com/bloomreach/briefly">Briefly</a> - Python
Meta-programming Library for Job Flow Control.</li>
<li><a href="http://clusterflow.io">Cluster Flow</a> - Command-line tool
which uses common cluster managers to run bioinformatics pipelines.</li>
<li><a href="https://github.com/monajemi/clusterjob">Clusterjob</a> -
Automated reproducibility and hassle-free submission of computational
jobs to clusters.</li>
<li><a href="https://www.sing-group.org/compi">Compi</a> - Application
framework for portable computational pipelines.</li>
<li><a
href="https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar">Compss</a>
- Programming model for distributed infrastructures.</li>
<li><a href="https://github.com/tburdett/Conan2">Conan2</a> -
Light-weight workflow management application.</li>
<li><a href="https://github.com/robdmc/consecution">Consecution</a> - A
Python pipeline abstraction inspired by Apache Storm topologies.</li>
<li><a href="https://mizzou-cbmi.github.io/">Cosmos</a> - Python library
for massively parallel workflows.</li>
<li><a href="https://github.com/couler-proj/couler">Couler</a> - Unified
interface for constructing and managing workflows on different workflow
engines, such as Argo Workflows, Tekton Pipelines, and Apache
Airflow.</li>
<li><a href="https://github.com/AgnostiqHQ/covalent">Covalent</a> -
Workflow orchestration toolkit for high-performance and quantum
computing research and development.</li>
<li><a href="https://github.com/broadinstitute/cromwell">Cromwell</a> -
Workflow Management System geared towards scientific workflows from the
Broad Institute.</li>
<li><a href="https://github.com/joergen7/cuneiform">Cuneiform</a> -
Advanced functional workflow language and framework, implemented in
Erlang.</li>
<li><a href="https://cylc.github.io/">Cylc</a> - A workflow engine for
cycling systems, originally developed for operational environmental
forecasting.</li>
<li><a href="https://github.com/thieman/dagobah">Dagobah</a> - Simple
DAG-based job scheduler in Python.</li>
<li><a href="https://github.com/fulcrumgenomics/dagr">Dagr</a> - A
Scala-based DSL and framework for writing and executing bioinformatics
pipelines as Directed Acyclic Graphs.</li>
<li><a href="https://github.com/dagster-io/dagster">Dagster</a> -
Python-based API for defining DAGs that interfaces with popular workflow
managers for building data applications.</li>
<li><a href="https://datajoint.io">DataJoint</a> - An open-source
relational framework for scientific data pipelines.</li>
<li><a href="https://github.com/dask/dask">Dask</a> - Flexible
parallel computing library for analytics.</li>
<li><a href="https://www.getdbt.com/">Dbt</a> - Framework for writing
analytics workflows entirely in SQL. Covers the T part of ETL and
focuses on analytics engineering.</li>
<li><a
href="https://github.com/googlegenomics/dockerflow">Dockerflow</a> -
Workflow runner that uses Dataflow to run a series of tasks in
Docker.</li>
<li><a href="https://github.com/Factual/drake">Drake</a> - Robust DSL
akin to Make, implemented in Clojure.</li>
<li><a href="https://github.com/ropensci/drake">Drake R package</a> -
Reproducibility and high-performance computing with an easy R-focused
interface. Unrelated to <a
href="https://github.com/factual/drake">Factual’s Drake</a>. Succeeded
by <a href="https://github.com/ropensci/targets">Targets</a>.</li>
<li><a href="https://github.com/CenturyLinkLabs/dray">Dray</a> - An
engine for managing the execution of container-based workflows.</li>
<li><a href="https://github.com/ecmwf/ecflow">ecFlow</a> - Workflow
manager.</li>
<li><a href="https://github.com/Ensembl/ensembl-hive">eHive</a> - System
for creating and running pipelines on a distributed compute
resource.</li>
<li><a href="https://github.com/fission/fission-workflows">Fission
Workflows</a> - A fast, lightweight workflow engine for serverless/FaaS
functions.</li>
<li><a href="https://github.com/druths/flex/">Flex</a> - Language
agnostic framework for building flexible data science pipelines
(Python/Shell/Gnuplot).</li>
<li><a href="https://github.com/sahilseth/flowr">Flowr</a> - Robust and
efficient workflows using a simple language agnostic approach (R
package).</li>
<li><a href="https://github.com/uzh/gc3pie">Gc3pie</a> - Python
libraries and tools for running applications on diverse Grids and
clusters.</li>
<li><a href="https://guixwl.org/">Guix Workflow Language</a> - A
workflow management language extension for GNU Guix.</li>
<li><a href="https://github.com/mailund/gwf">Gwf</a> - Make-like utility
for submitting workflows via qsub.</li>
<li><a href="https://github.com/dagworks-inc/hamilton">Hamilton</a> - A
Python micro-framework for describing dataflows; runs anywhere Python
runs.</li>
<li><a href="https://github.com/It4innovations/HyperLoom">HyperLoom</a>
- Platform for defining and executing workflow pipelines in large-scale
distributed environments.</li>
<li><a href="https://joblib.readthedocs.io/en/latest/">Joblib</a> - Set
of tools to provide lightweight pipelining in Python.</li>
<li><a href="https://jug.readthedocs.io">Jug</a> - A task-based
parallelization framework for Python.</li>
<li><a href="https://github.com/quantumblacklabs/kedro">Kedro</a> -
Workflow development tool that helps you build data pipelines.</li>
<li><a href="https://github.com/kestra-io/kestra">Kestra</a> - Open
source data orchestration and scheduling platform with declarative
syntax.</li>
<li><a href="https://github.com/hammerlab/ketrew">Ketrew</a> - Embedded
DSL in the OCaml language alongside a client-server management
application.</li>
<li><a href="https://github.com/jtaghiyar/kronos">Kronos</a> - Workflow
assembler for cancer genome analytics and informatics.</li>
<li><a href="https://github.com/StanfordBioinformatics/loom">Loom</a> -
Tool for running bioinformatics workflows locally or in the cloud.</li>
<li><a href="http://www.hecbiosim.ac.uk/longbow">Longbow</a> - Job
proxying tool for biomolecular simulations.</li>
<li><a href="https://github.com/spotify/luigi">Luigi</a> - Python module
that helps you build complex pipelines of batch jobs.</li>
<li><a href="https://github.com/LLNL/maestrowf">Maestro</a> - YAML based
HPC workflow execution tool.</li>
<li><a href="http://ccl.cse.nd.edu/software/makeflow/">Makeflow</a> -
Workflow engine for executing large complex workflows on clusters.</li>
<li><a href="https://github.com/mara/data-integration">Mara</a> - A
lightweight, opinionated ETL framework, halfway between plain scripts
and Apache Airflow.</li>
<li><a href="https://github.com/intentmedia/mario">Mario</a> - Scala
library for defining data pipelines.</li>
<li><a href="http://martian-lang.org/">Martian</a> - A language and
framework for developing and executing complex computational
pipelines.</li>
<li><a href="https://github.com/MD-Studio/MDStudio">MD Studio</a> -
Microservice based workflow engine.</li>
<li><a href="https://metaflow.org/">MetaFlow</a> - Open-sourced
framework from Netflix, for DAG generation for data scientists. Python
and R APIs.</li>
<li><a href="https://github.com/openstack/mistral">Mistral</a> -
Python-based workflow engine by the OpenStack project.</li>
<li><a href="https://github.com/mfiers/Moa">Moa</a> - Lightweight
workflows in bioinformatics.</li>
<li><a href="http://www.nextflow.io">Nextflow</a> - Flow-based
computational toolkit for reproducible and scalable bioinformatics
pipelines.</li>
<li><a href="https://github.com/nipy/nipype">NiPype</a> - Workflows and
interfaces for neuroimaging packages.</li>
<li><a href="https://github.com/adaptivegenome/openge">OpenGE</a> -
Accelerated framework for manipulating and interpreting high-throughput
sequencing data.</li>
<li><a href="https://www.pachyderm.io/">Pachyderm</a> - Distributed and
reproducible data pipelining and data management, built on the container
ecosystem.</li>
<li><a href="https://github.com/Parsl/parsl">Parsl</a> - Parallel
Scripting Library.</li>
<li><a
href="https://github.com/fstrozzi/bioruby-pipengine">PipEngine</a> -
Ruby based launcher for complex biological pipelines.</li>
<li><a href="https://github.com/pinterest/pinball">Pinball</a> - Python
based workflow engine by Pinterest.</li>
<li><a href="https://github.com/systemslab/popper">Popper</a> - YAML
based container-native workflow engine supporting Docker, Singularity,
Vagrant VMs with Docker daemon in VM, and local host.</li>
<li><a href="https://github.com/tweag/porcupine">Porcupine</a> - Haskell
workflow tool to express and compose tasks (optionally cached) whose
datasources and sinks are known ahead of time and rebindable, and which
can expose arbitrary sets of parameters to the outside world.</li>
<li><a href="https://docs.prefect.io/">Prefect</a> - Python based
workflow engine powering Prefect.</li>
<li><a href="https://github.com/nipype/pydra">Pydra</a> - Lightweight,
DAG-based Python dataflow engine for reproducible and scalable
scientific pipelines.</li>
<li><a href="https://github.com/Illumina/pyflow">PyFlow</a> -
Lightweight parallel task engine.</li>
<li><a href="https://github.com/baffelli/pyperator">pyperator</a> -
Simple push-based Python workflow framework using asyncio, supporting
recursive networks.</li>
<li><a href="https://github.com/pwwang/pyppl">pyppl</a> - A
lightweight Python pipeline framework.</li>
<li><a href="https://pypyr.io">pypyr</a> - Automation task-runner for
sequential steps defined in a pipeline YAML, with AWS and Slack
plug-ins.</li>
<li><a href="https://github.com/masa16/Pwrake/">Pwrake</a> - Parallel
workflow extension for Rake.</li>
<li><a href="https://bitbucket.org/berkeleylab/qdo">Qdo</a> -
Lightweight high-throughput queuing system for workflows with many small
tasks to perform.</li>
<li><a href="https://github.com/alastair-droop/qsubsec">Qsubsec</a> -
Simple tokenised template system for SGE.</li>
<li><a href="https://github.com/rabix/rabix">Rabix</a> - Python-based
workflow toolkit based on the Common Workflow Language and Docker.</li>
<li><a href="https://github.com/substantic/rain">Rain</a> - Framework
for large distributed task-based pipelines, written in Rust with Python
API.</li>
<li><a href="https://github.com/ray-project/ray">Ray</a> - Flexible,
high-performance distributed Python execution framework.</li>
<li><a href="https://github.com/insitro/redun">Redun</a> - Yet another
redundant workflow engine.</li>
<li><a href="https://github.com/grailbio/reflow">Reflow</a> - Language
and runtime for distributed, incremental data processing in the
cloud.</li>
<li><a href="https://github.com/richfitz/remake">Remake</a> - Make-like
declarative workflows in R.</li>
<li><a
href="http://physiology.med.cornell.edu/faculty/mason/lab/r-make/">Rmake</a>
- Wrapper for the creation of Makefiles, enabling massive
parallelization.</li>
<li><a href="https://github.com/bjpop/rubra">Rubra</a> - Pipeline system
for bioinformatics workflows.</li>
<li><a href="http://www.ruffus.org.uk">Ruffus</a> - Computation Pipeline
library for Python.</li>
<li><a href="https://github.com/kirillseva/ruigi">Ruigi</a> - Pipeline
tool for R, inspired by Luigi.</li>
<li><a href="http://tonyfischetti.github.io/sake/">Sake</a> -
Self-documenting build automation tool.</li>
<li><a href="https://github.com/pharmbio/sciluigi">SciLuigi</a> - Helper
library for writing flexible scientific workflows in Luigi.</li>
<li><a href="http://scipipe.org">SciPipe</a> - Library for writing
Scientific Workflows in Go.</li>
<li><a href="https://signac.io">Signac</a> - Lightweight, but scalable
framework for file-driven workflows to be run locally and on HPC
systems.</li>
<li><a href="https://github.com/soravux/scoop/">Scoop</a> - Scalable
Concurrent Operations in Python.</li>
<li><a href="https://github.com/nlgranger/SeqTools">Seqtools</a> -
Python library for lazy evaluation of pipelined transformations on
indexable containers.</li>
<li><a href="https://github.com/giacbrd/SmartPipeline">SmartPipeline</a>
- A framework for rapid development of robust data pipelines following a
simple design pattern.</li>
<li><a href="https://snakemake.readthedocs.io/en/stable">Snakemake</a> -
Tool for running and managing bioinformatics pipelines.</li>
<li><a href="https://github.com/knipknap/SpiffWorkflow">Spiff</a> -
Based on the Workflow Patterns initiative and implemented in
Python.</li>
<li><a href="https://github.com/sailthru/stolos">Stolos</a> - Directed
Acyclic Graph task dependency scheduler that simplifies distributed
pipelines.</li>
<li><a href="https://github.com/minerva-ml/steppy">Steppy</a> -
Lightweight, open-source, Python 3 library for fast and reproducible
experimentation.</li>
<li><a href="https://stpipe.readthedocs.io/">Stpipe</a> - File
processing pipelines as a Python library.</li>
<li><a href="https://github.com/alpha-unito/streamflow">StreamFlow</a> -
Container native workflow management system focused on hybrid
workflows.</li>
<li><a href="https://streampipes.apache.org">StreamPipes</a> - A
self-service IoT toolbox to enable non-technical users to connect,
analyze and explore IoT data streams.</li>
<li><a href="https://github.com/gilt/sundial">Sundial</a> - Jobsystem on
AWS ECS or AWS Batch managing dependencies and scheduling.</li>
<li><a href="https://github.com/Netflix/suro">Suro</a> - Java-based
distributed pipeline from Netflix.</li>
<li><a href="http://swift-lang.org">Swift</a> - Fast easy parallel
scripting - on multicores, clusters, clouds and supercomputers.</li>
<li><a href="https://github.com/ropensci/targets">Targets</a> - Dynamic,
function-oriented <a
href="https://www.gnu.org/software/make/">Make</a>-like reproducible
pipelines at scale in R.</li>
<li><a href="https://github.com/natcap/taskgraph">TaskGraph</a> - A
library to help manage complicated computational software pipelines
consisting of long running individual tasks.</li>
<li><a href="https://github.com/4dn-dcic/tibanna">Tibanna</a> - Tool
that helps you run genomic pipelines on Amazon cloud.</li>
<li><a href="https://github.com/BD2KGenomics/toil">Toil</a> -
Distributed pipeline workflow manager (mostly for genomics).</li>
<li><a href="http://opensource.nibr.com/yap/">Yap</a> - Extensible
parallel framework, written in Python using OpenMPI libraries.</li>
<li><a href="https://github.com/picanumber/yapp">Yapp</a> - A C++
parallel pipeline library for stream processing.</li>
<li><a href="https://www.wallaroolabs.com/">Wallaroo</a> - Framework for
streaming data applications and algorithms that react to real-time
events.</li>
<li><a href="http://worldmake.org/">WorldMake</a> - Easy Collaborative
Reproducible Computing.</li>
<li><a href="https://zenaton.com">Zenaton</a> - Workflow engine for
orchestrating jobs, data and events across your applications and third
party services.</li>
<li><a href="https://zenml.io">ZenML</a> - Extensible open-source MLOps
framework to create reproducible pipelines for data scientists.</li>
</ul>
<h2 id="workflow-platforms">Workflow platforms</h2>
<ul>
<li><a href="http://www.activepapers.org/">ActivePapers</a> -
Computational science made reproducible and publishable.</li>
<li><a href="https://github.com/automaticmode/active_workflow">Active
Workflow</a> - Polyglot workflows without leaving the comfort of your
technology stack.</li>
<li><a href="https://anvio.org/">Anvi’o</a> - A community and framework
centered around metagenomics, designed to facilitate reproducible
exploration and visualization of data.</li>
<li><a href="https://airavata.apache.org/">Apache Airavata</a> -
Framework for executing and managing computational workflows on
distributed computing resources.</li>
<li><a href="https://arteria-project.github.io/">Arteria</a> -
Event-driven automation for sequencing centers. Initiates workflows
based on events.</li>
<li><a href="http://arvados.org">Arvados</a> - A container based
workflow platform.</li>
<li>Biokepler - Bioinformatics Scientific Workflow for Distributed
Analysis of Large-Scale Biological Data. (<a
href="https://web.archive.org/web/20190108162953/https://www.biokepler.org/"><em>inactive
since 10/2019</em></a>)</li>
<li><a href="http://github.com/llevar/butler">Butler</a> - Framework for
running scientific workflows on public and academic clouds.</li>
<li><a href="http://chipster.csc.fi">Chipster</a> - Open source platform
for data analysis.</li>
<li><a href="https://bitbucket.org/bromberglab/clubber">Clubber</a> -
Cluster Load Balancer for Bioinformatics e-Resources.</li>
<li><a href="https://www.digdag.io">Digdag</a> - Workflow manager
designed for simplicity, extensibility and collaboration.</li>
<li><a href="https://github.com/Tauffer-Consulting/domino">Domino</a> -
User friendly and open source visual workflow management platform.</li>
<li><a
href="https://github.com/materialsproject/fireworks">Fireworks</a> -
Centralized workflow server for dynamic workflows of high-throughput
computations.</li>
<li><a href="https://github.com/lyft/flyte">Flyte</a> -
Container-native, type-safe workflow and pipelines platform for large
scale processing and ML.</li>
<li><a href="https://galaxyproject.org">Galaxy</a> - Powerful workflow
system which can be used on the command line or with the GUI.</li>
<li><a href="https://kepler-project.org/">Kepler</a> - Kepler scientific
workflow application from University of California.</li>
<li><a href="https://www.knime.org/knime-analytics-platform">KNIME
Analytics Platform</a> - General-purpose platform with many specialized
domain extensions.</li>
<li><a href="http://workflow.campagnelab.org">NextflowWorkbench</a> -
Integrated development environment for Nextflow, Docker and Reusable
Workflows.</li>
<li><a href="https://github.com/omegaml/omegaml">omega|ml DataOps
Platform</a> - Data & model pipeline deployment for humans -
integrated, scalable, extensible.</li>
<li><a href="http://www.openmole.org/current/">OpenMOLE</a> - Workflow
Management System for exploration of models and parameter
optimization.</li>
<li><a href="http://ophidia.cmcc.it">Ophidia</a> - Data-analytics
platform with declarative workflows of distributed operations.</li>
<li><a href="https://github.com/orchest/orchest">Orchest</a> - An IDE
for Data Science.</li>
<li><a href="http://pegasus.isi.edu">Pegasus</a> - Workflow Management
System.</li>
<li><a href="https://github.com/creactiviti/piper">Piper</a> -
Distributed workflow engine designed to be dead simple.</li>
<li><a href="https://github.com/polyaxon/polyaxon">Polyaxon</a> - A
platform for machine learning experimentation workflow.</li>
<li><a href="https://github.com/reanahub/reana">Reana</a> - Platform for
reusable research data analyses developed by CERN.</li>
<li><a href="https://github.com/uzh/sushi">Sushi</a> - Supporting User
for SHell script Integration.</li>
<li><a href="http://ccg.murdoch.edu.au/yabi">Yabi</a> - Online research
environment for grid, HPC and cloud computing.</li>
<li><a href="http://www.taverna.org.uk">Taverna</a> - Domain independent
workflow system.</li>
<li><a href="https://www.temporal.io/">Temporal</a> - Highly scalable
developer oriented <em>Workflow as Code</em> engine.</li>
<li><a href="http://www.vistrails.org/">VisTrails</a> - Scientific
workflow and provenance management system.</li>
<li><a href="http://www.wings-workflows.org">Wings</a> - Semantic
workflow system utilizing Pegasus as execution system.</li>
<li><a href="https://github.com/klugem/watchdog">Watchdog</a> - Workflow
management system for the automated and distributed analysis of
large-scale experimental data.</li>
<li><a href="https://www.flowhub.com.cn">FlowHub</a> - FlowHub is a new
workflow cloud platform.</li>
</ul>
<h2 id="workflow-languages">Workflow languages</h2>
<ul>
<li><a
href="https://github.com/common-workflow-language/common-workflow-language">Common
Workflow Language</a></li>
<li><a href="http://cloudgene.uibk.ac.at/developer-guide">Cloudgene
Workflow Language</a></li>
<li><a
href="http://www.openmole.org/current/Documentation_Language.html">OpenMOLE
DSL</a></li>
<li><a href="https://github.com/openwdl/wdl">Workflow Description
Language</a></li>
<li><a href="http://www.yawlfoundation.org">Yet Another Workflow
Language</a></li>
<li><a href="https://github.com/calebwin/pipelines">Pipelines</a></li>
</ul>
<h2 id="workflow-standardization-initiatives">Workflow standardization
initiatives</h2>
<ul>
<li><a href="http://www.wf4ever-project.org">Workflow 4 Ever
Initiative</a></li>
<li><a href="http://wf4ever.github.io/ro">Workflow 4 Ever workflow
research object model</a></li>
<li><a href="http://www.workflowpatterns.com">Workflow Patterns
Initiative</a></li>
<li><a href="http://www.workflowpatterns.com/patterns">Workflow Patterns
Library</a></li>
<li><a href="http://www.researchobject.org">ResearchObject.org</a></li>
</ul>
<h2 id="etl-data-orchestration">ETL & Data orchestration</h2>
<ul>
<li><a href="https://datalad.org">DataLad</a> - Git and git-annex based
data version control system with lightweight provenance
capture/re-execution support.</li>
<li><a href="https://dvc.org">DVC</a> - Data version control system for
ML projects with lightweight pipeline support.</li>
<li><a href="https://github.com/treeverse/lakeFS">lakeFS</a> -
Repeatable, atomic and versioned data lake on top of object
storage.</li>
<li><a href="https://github.com/projectnessie/nessie">Nessie</a> -
Provides Git-like capability & version control for Iceberg Tables,
Delta Lake Tables & SQL Views.</li>
</ul>
<h2 id="literate-programming-aka-interactive-notebooks">Literate
programming (aka interactive notebooks)</h2>
<ul>
<li><a href="http://beakernotebook.com/">Beaker</a> - Notebook-style
development environment.</li>
<li><a href="http://mybinder.org/">Binder</a> - Turn a GitHub repo into
a collection of interactive notebooks powered by Jupyter and
Kubernetes.</li>
<li><a href="https://ipython.org/">IPython</a> - A rich architecture for
interactive computing.</li>
<li><a href="https://jupyter.org/">Jupyter</a> - Language-agnostic
notebook literate programming environment.</li>
<li><a href="http://pathomx.org">Pathomx</a> - Interactive data
workflows built on Python.</li>
<li><a href="https://github.com/polynote/polynote">Polynote</a> - A
better notebook for Scala (and more). Built by Netflix.</li>
<li><a href="https://github.com/ploomber/ploomber">Ploomber</a> -
Consolidate your notebooks and scripts in a reproducible pipeline using
a <code>pipeline.yaml</code> file.</li>
<li><a href="http://rmarkdown.rstudio.com/r_notebooks.html">R
Notebooks</a> - R Markdown notebook literate programming
environment.</li>
<li><a href="https://www.redpointnotebooks.com/">RedPoint Notebooks</a>
- Web-native computational notebook for programmers supporting multiple
languages, APIs and webhooks.</li>
<li><a href="https://vatlab.github.io/sos-docs/">SoS</a> - Readable,
interactive, cross-platform and cross-language data science workflow
system.</li>
<li><a href="https://zeppelin.apache.org/">Zeppelin</a> - Web-based
notebook that enables interactive data analytics.</li>
</ul>
<h2 id="extract-transform-load-etl">Extract, transform, load (ETL)</h2>
<ul>
<li><a href="https://github.com/uber/cadence">Cadence</a> - Distributed,
scalable, durable, and highly available orchestration engine developed
by Uber.</li>
<li><a href="https://github.com/dataform-co/dataform">Dataform</a> -
Framework for managing SQL-based operations in your data
warehouse.</li>
<li><a href="http://www.kiba-etl.org">Kiba ETL</a> - A data processing
& ETL framework for Ruby.</li>
<li><a href="https://etl.linkedpipes.com">LinkedPipes ETL</a> - Linked
Data publishing and consumption ETL tool.</li>
<li><a
href="https://community.hitachivantara.com/s/article/data-integration-kettle">Pentaho
Kettle</a> - A platform that delivers powerful ETL capabilities, using a
groundbreaking, metadata-driven approach.</li>
<li><a href="https://github.com/brexhq/substation">Substation</a> -
Cloud-native data pipeline and transformation toolkit
written in Go.</li>
</ul>
<h2 id="continuous-delivery-workflows">Continuous Delivery
workflows</h2>
<ul>
<li><a href="https://github.com/argoproj/argo">Argo</a> - Get stuff done
with container-native workflows for Kubernetes.</li>
<li><a href="https://github.com/ovh/cds">CDS</a> - A pipeline based
Continuous Delivery Service written in Golang.</li>
</ul>
<h2 id="build-automation-tools">Build automation tools</h2>
<ul>
<li><a href="http://bazel.io/">Bazel</a> - Build software just as
engineers do at Google.</li>
<li><a href="https://github.com/pydoit/doit">doit</a> - Highly
generalized task-management and automation in Python.</li>
<li><a href="http://gradle.org/">Gradle</a> - Unified cross-platform
builds.</li>
<li><a href="https://github.com/casey/just">Just</a> - Command and
recipe runner similar to Make, built in Rust.</li>
<li><a href="https://www.gnu.org/software/make/">Make</a> - The GNU Make
build system.</li>
<li><a href="https://github.com/prodmodel/prodmodel">Prodmodel</a> -
Build system for data science pipelines.</li>
<li><a href="http://www.scons.org/">Scons</a> - Python library focused
on C/C++ builds.</li>
<li><a href="https://github.com/ndmitchell/shake">Shake</a> - Define
robust build systems akin to GNU Make using Haskell.</li>
</ul>
<h2 id="automated-workflow-composition">Automated workflow
composition</h2>
<ul>
<li><a href="https://github.com/sanctuuary/APE">APE</a> - A tool for the
automated exploration of possible computational workflows based on
semantic annotations.</li>
</ul>
<h2 id="other-projects">Other projects</h2>
<ul>
<li><a href="http://hpcgridrunner.github.io/">HPC Grid Runner</a></li>
<li><a href="https://nifi.apache.org">NiFi</a> - Powerful and scalable
directed graphs of data routing, transformation, and system mediation
logic.</li>
<li><a href="https://github.com/gems-uff/noworkflow">noWorkflow</a> -
Supporting infrastructure to run scientific experiments without a
scientific workflow management system, and still get things like
provenance.</li>
<li><a href="https://www.reprozip.org/">Reprozip</a> - Simplifies the
process of creating reproducible experiments from command-line
executions.</li>
</ul>
<h2 id="related-lists">Related lists</h2>
<ul>
<li><a href="https://github.com/manuzhang/awesome-streaming">Awesome
streaming</a> - Curated list of awesome streaming frameworks and
applications.</li>
<li><a href="https://github.com/pawl/awesome-etl">Awesome ETL</a> -
Curated list of notable ETL (extract, transform, load) frameworks,
libraries and software.</li>
<li><a
href="https://github.com/meirwah/awesome-workflow-engines">Awesome
workflow engines</a> - Curated list of awesome open source workflow
engines.</li>
<li><a
href="https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems">Computational
Data Analysis Workflow Systems</a></li>
</ul>