update
619
html/pipeline.html
Normal file
@@ -0,0 +1,619 @@
|
||||
<h1 id="awesome-pipeline">Awesome Pipeline</h1>
|
||||
<p>A curated list of awesome pipeline toolkits inspired by <a
|
||||
href="https://github.com/kahun/awesome-sysadmin">Awesome
|
||||
Sysadmin</a></p>
|
||||
<h2 id="pipeline-frameworks-libraries">Pipeline frameworks &
|
||||
libraries</h2>
|
||||
<ul>
|
||||
<li><a
|
||||
href="http://docs.stackstorm.com/actionchain.html">ActionChain</a> - A
|
||||
workflow system for simple linear success/failure workflows.</li>
|
||||
<li><a href="https://github.com/diana-hep/adage">Adage</a> - Small
|
||||
package to describe workflows that are not completely known at
|
||||
definition time.</li>
|
||||
<li><a href="https://github.com/aiidateam/aiida-core">AiiDA</a> -
|
||||
Workflow manager with a strong focus on provenance, performance and
|
||||
extensibility.</li>
|
||||
<li><a href="https://github.com/airbnb/airflow">Airflow</a> -
|
||||
Python-based workflow system created by Airbnb.</li>
|
||||
<li><a href="http://www.anduril.org/anduril/site/">Anduril</a> -
|
||||
Component-based workflow framework for scientific data analysis.</li>
|
||||
<li><a href="https://www.antha-lang.org/">Antha</a> - High-level
|
||||
language for biology.</li>
|
||||
<li><a href="https://argoproj.github.io/argo-workflows/">Argo
|
||||
Workflows</a> - Container-native workflow engine for orchestrating
|
||||
parallel data processing, ML, or CI jobs on Kubernetes.</li>
|
||||
<li><a href="https://autosubmit.readthedocs.io/">Autosubmit</a> - An
|
||||
open source Python experiment and workflow manager used to manage
|
||||
complex workflows on Cloud and HPC platforms.</li>
|
||||
<li><a href="https://github.com/MG-RAST/AWE/">AWE</a> - Workflow and
|
||||
resource management system with CWL support.</li>
|
||||
<li><a href="https://github.com/argonne-lcf/balsam">Balsam</a> -
|
||||
Python-based high throughput task and workflow engine.</li>
|
||||
<li><a href="http://pcingola.github.io/BigDataScript/">Bds</a> -
|
||||
Scripting language for data pipelines.</li>
|
||||
<li><a href="https://beam.apache.org/">Beam</a> - Unified programming
|
||||
model for batch and streaming data-parallel processing pipelines.</li>
|
||||
<li><a href="https://github.com/evoldoers/biomake">BioMake</a> -
|
||||
GNU-Make-like utility for managing builds and complex workflows.</li>
|
||||
<li><a href="https://github.com/liyao001/BioQueue">BioQueue</a> -
|
||||
Explicit framework with web monitoring and resource estimation.</li>
|
||||
<li><a href="https://github.com/papenfusslab/bioshake">Bioshake</a> -
|
||||
Haskell DSL built on shake with strong typing and EDAM support.</li>
|
||||
<li><a href="https://github.com/pveber/bistro">Bistro</a> - Library to
|
||||
build and execute typed scientific workflows.</li>
|
||||
<li><a href="https://github.com/ssadedin/bpipe/">Bpipe</a> - Tool for
|
||||
running and managing bioinformatics pipelines.</li>
|
||||
<li><a href="https://github.com/bloomreach/briefly">Briefly</a> - Python
|
||||
Meta-programming Library for Job Flow Control.</li>
|
||||
<li><a href="https://github.com/dagworks-inc/burr">Burr</a> - Python
|
||||
based lightweight graph (i.e. can do loops and conditional branching,
|
||||
and not just DAGs) orchestrator.</li>
|
||||
<li><a href="http://clusterflow.io">Cluster Flow</a> - Command-line tool
|
||||
which uses common cluster managers to run bioinformatics pipelines.</li>
|
||||
<li><a href="https://github.com/monajemi/clusterjob">Clusterjob</a> -
|
||||
Automated reproducibility, and hassle-free submission of computational
|
||||
jobs to clusters.</li>
|
||||
<li><a href="https://github.com/cocoindex-io/cocoindex">Cocoindex</a> -
|
||||
ETL framework to build fresh index.</li>
|
||||
<li><a href="https://www.sing-group.org/compi">Compi</a> - Application
|
||||
framework for portable computational pipelines.</li>
|
||||
<li><a
|
||||
href="https://www.bsc.es/research-and-development/software-and-apps/software-list/comp-superscalar">Compss</a>
|
||||
- Programming model for distributed infrastructures.</li>
|
||||
<li><a href="https://github.com/tburdett/Conan2">Conan2</a> -
|
||||
Light-weight workflow management application.</li>
|
||||
<li><a href="https://github.com/robdmc/consecution">Consecution</a> - A
|
||||
Python pipeline abstraction inspired by Apache Storm topologies.</li>
|
||||
<li><a href="https://mizzou-cbmi.github.io/">Cosmos</a> - Python library
|
||||
for massively parallel workflows.</li>
|
||||
<li><a href="https://github.com/couler-proj/couler">Couler</a> - Unified
|
||||
interface for constructing and managing workflows on different workflow
|
||||
engines, such as Argo Workflows, Tekton Pipelines, and Apache
|
||||
Airflow.</li>
|
||||
<li><a href="https://github.com/AgnostiqHQ/covalent">Covalent</a> -
|
||||
Workflow orchestration toolkit for high-performance and quantum
|
||||
computing research and development.</li>
|
||||
<li><a href="https://github.com/broadinstitute/cromwell">Cromwell</a> -
|
||||
Workflow Management System geared towards scientific workflows from the
|
||||
Broad Institute.</li>
|
||||
<li><a href="https://github.com/joergen7/cuneiform">Cuneiform</a> -
|
||||
Advanced functional workflow language and framework, implemented in
|
||||
Erlang.</li>
|
||||
<li><a href="https://cylc.github.io/">Cylc</a> - A workflow engine for
|
||||
cycling systems, originally developed for operational environmental
|
||||
forecasting.</li>
|
||||
<li><a href="https://github.com/thieman/dagobah">Dagobah</a> - Simple
|
||||
DAG-based job scheduler in Python.</li>
|
||||
<li><a href="https://github.com/fulcrumgenomics/dagr">Dagr</a> - A Scala
|
||||
based DSL and framework for writing and executing bioinformatics
|
||||
pipelines as Directed Acyclic Graphs.</li>
|
||||
<li><a href="https://github.com/dagster-io/dagster">Dagster</a> -
|
||||
Python-based API for defining DAGs that interfaces with popular workflow
|
||||
managers for building data applications.</li>
|
||||
<li><a href="https://datajoint.io">DataJoint</a> - An open-source
|
||||
relational framework for scientific data pipelines.</li>
|
||||
<li><a href="https://github.com/dask/dask">Dask</a> - Dask is a flexible
|
||||
parallel computing library for analytics.</li>
|
||||
<li><a href="https://www.getdbt.com/">Dbt</a> - Framework for writing
|
||||
analytics workflows entirely in SQL. Covers the T part of ETL, focusing on
|
||||
analytics engineering.</li>
|
||||
<li><a
|
||||
href="https://github.com/googlegenomics/dockerflow">Dockerflow</a> -
|
||||
Workflow runner that uses Dataflow to run a series of tasks in
|
||||
Docker.</li>
|
||||
<li><a href="https://github.com/dotflow-io/dotflow">Dotflow</a> - Python
|
||||
library for creating pipelines and workflows easily.</li>
|
||||
<li><a href="https://github.com/Factual/drake">Drake</a> - Robust DSL
|
||||
akin to Make, implemented in Clojure.</li>
|
||||
<li><a href="https://github.com/ropensci/drake">Drake R package</a> -
|
||||
Reproducibility and high-performance computing with an easy R-focused
|
||||
interface. Unrelated to <a
|
||||
href="https://github.com/factual/drake">Factual’s Drake</a>. Succeeded
|
||||
by <a href="https://github.com/ropensci/targets">Targets</a>.</li>
|
||||
<li><a href="https://github.com/CenturyLinkLabs/dray">Dray</a> - An
|
||||
engine for managing the execution of container-based workflows.</li>
|
||||
<li><a href="https://github.com/ecmwf/ecflow">ecFlow</a> - Workflow
|
||||
manager.</li>
|
||||
<li><a href="https://github.com/Ensembl/ensembl-hive">eHive</a> - System
|
||||
for creating and running pipelines on a distributed compute
|
||||
resource.</li>
|
||||
<li><a href="https://github.com/fission/fission-workflows">Fission
|
||||
Workflows</a> - A fast, lightweight workflow engine for serverless/FaaS
|
||||
functions.</li>
|
||||
<li><a href="https://github.com/druths/flex/">Flex</a> - Language
|
||||
agnostic framework for building flexible data science pipelines
|
||||
(Python/Shell/Gnuplot).</li>
|
||||
<li><a href="https://github.com/sahilseth/flowr">Flowr</a> - Robust and
|
||||
efficient workflows using a simple language agnostic approach (R
|
||||
package).</li>
|
||||
<li><a href="https://github.com/uzh/gc3pie">Gc3pie</a> - Python
|
||||
libraries and tools for running applications on diverse Grids and
|
||||
clusters.</li>
|
||||
<li><a href="https://guixwl.org/">Guix Workflow Language</a> - A
|
||||
workflow management language extension for GNU Guix.</li>
|
||||
<li><a href="https://github.com/mailund/gwf">Gwf</a> - Make-like utility
|
||||
for submitting workflows via qsub.</li>
|
||||
<li><a href="https://github.com/dagworks-inc/hamilton">Hamilton</a> - A
|
||||
Python micro-framework for describing dataflows; runs anywhere Python
|
||||
runs.</li>
|
||||
<li><a href="https://github.com/argoproj-labs/hera">Hera</a> - Hera is
|
||||
an Argo Python SDK. Hera aims to make construction and submission of
|
||||
various Argo Project resources easy and accessible to everyone! Hera
|
||||
abstracts away low-level setup details while still maintaining a
|
||||
consistent vocabulary with Argo.</li>
|
||||
<li><a href="https://github.com/It4innovations/HyperLoom">HyperLoom</a>
|
||||
- Platform for defining and executing workflow pipelines in large-scale
|
||||
distributed environments.</li>
|
||||
<li><a
|
||||
href="https://github.com/It4innovations/hyperqueue">HyperQueue</a> -
|
||||
HPC-focused task scheduler that automatically assigns tasks to Slurm/PBS
|
||||
allocations and submits them for the user.</li>
|
||||
<li><a href="https://joblib.readthedocs.io/en/latest/">Joblib</a> - Set
|
||||
of tools to provide lightweight pipelining in Python.</li>
|
||||
<li><a href="https://jug.readthedocs.io">Jug</a> - A task-based
|
||||
parallelization framework for Python.</li>
|
||||
<li><a href="https://github.com/quantumblacklabs/kedro">Kedro</a> -
|
||||
Workflow development tool that helps you build data pipelines.</li>
|
||||
<li><a href="https://github.com/kestra-io/kestra">Kestra</a> - Open
|
||||
source data orchestration and scheduling platform with declarative
|
||||
syntax.</li>
|
||||
<li><a href="https://github.com/hammerlab/ketrew">Ketrew</a> - Embedded
|
||||
DSL in the OCaml language alongside a client-server management
|
||||
application.</li>
|
||||
<li><a href="https://github.com/Nike-Inc/koheesio">Koheesio</a> - Python framework for
|
||||
building efficient data pipelines.</li>
|
||||
<li><a href="https://github.com/jtaghiyar/kronos">Kronos</a> - Workflow
|
||||
assembler for cancer genome analytics and informatics.</li>
|
||||
<li><a
|
||||
href="https://www.kubeflow.org/docs/components/pipelines/">Kubeflow
|
||||
Pipelines</a> - Framework for building and deploying portable, scalable
|
||||
machine learning workflows using Docker containers and Argo
|
||||
Workflows.</li>
|
||||
<li><a href="https://github.com/StanfordBioinformatics/loom">Loom</a> -
|
||||
Tool for running bioinformatics workflows locally or in the cloud.</li>
|
||||
<li><a href="http://www.hecbiosim.ac.uk/longbow">Longbow</a> - Job
|
||||
proxying tool for biomolecular simulations.</li>
|
||||
<li><a href="https://github.com/spotify/luigi">Luigi</a> - Python module
|
||||
that helps you build complex pipelines of batch jobs.</li>
|
||||
<li><a href="https://github.com/LLNL/maestrowf">Maestro</a> - YAML based
|
||||
HPC workflow execution tool.</li>
|
||||
<li><a href="http://ccl.cse.nd.edu/software/makeflow/">Makeflow</a> -
|
||||
Workflow engine for executing large complex workflows on clusters.</li>
|
||||
<li><a href="https://github.com/kinto-b/makepipe">makepipe</a> - An R
|
||||
package which provides a set of simple tools for transforming an
|
||||
existing workflow into a self-documenting pipeline with very minimal
|
||||
upfront costs.</li>
|
||||
<li><a href="https://github.com/mara/data-integration">Mara</a> - A
|
||||
lightweight, opinionated ETL framework, halfway between plain scripts
|
||||
and Apache Airflow.</li>
|
||||
<li><a href="https://github.com/intentmedia/mario">Mario</a> - Scala
|
||||
library for defining data pipelines.</li>
|
||||
<li><a href="http://martian-lang.org/">Martian</a> - A language and
|
||||
framework for developing and executing complex computational
|
||||
pipelines.</li>
|
||||
<li><a href="https://github.com/MD-Studio/MDStudio">MD Studio</a> -
|
||||
Microservice based workflow engine.</li>
|
||||
<li><a href="https://metaflow.org/">MetaFlow</a> - Open-sourced
|
||||
framework from Netflix, for DAG generation for data scientists. Python
|
||||
and R APIs.</li>
|
||||
<li><a href="https://github.com/openstack/mistral">Mistral</a> - Python
|
||||
based workflow engine by the OpenStack project.</li>
|
||||
<li><a href="https://github.com/mfiers/Moa">Moa</a> - Lightweight
|
||||
workflows in bioinformatics.</li>
|
||||
<li><a href="http://www.nextflow.io">Nextflow</a> - Flow-based
|
||||
computational toolkit for reproducible and scalable bioinformatics
|
||||
pipelines.</li>
|
||||
<li><a href="https://github.com/NitorCreations/nFlow">nFlow</a> -
|
||||
Embeddable JVM-based workflow engine with high availability, fault
|
||||
tolerance, and support for multiple databases. Additional libraries are
|
||||
provided for visualization and REST API.</li>
|
||||
<li><a href="https://github.com/nipy/nipype">NiPype</a> - Workflows and
|
||||
interfaces for neuroimaging packages.</li>
|
||||
<li><a href="https://github.com/adaptivegenome/openge">OpenGE</a> -
|
||||
Accelerated framework for manipulating and interpreting high-throughput
|
||||
sequencing data.</li>
|
||||
<li><a href="https://www.pachyderm.io/">Pachyderm</a> - Distributed and
|
||||
reproducible data pipelining and data management, built on the container
|
||||
ecosystem.</li>
|
||||
<li><a href="https://parsl-project.org/">Parsl</a> - Productive parallel
|
||||
programming, for creating parallel programs composed of Python functions
|
||||
and external components.</li>
|
||||
<li><a href="https://github.com/pipefunc/pipefunc">PipeFunc</a> -
|
||||
Lightweight function pipeline (DAG) creation in pure Python for
|
||||
scientific workflows.</li>
|
||||
<li><a
|
||||
href="https://github.com/fstrozzi/bioruby-pipengine">PipEngine</a> -
|
||||
Ruby based launcher for complex biological pipelines.</li>
|
||||
<li><a href="https://github.com/pinterest/pinball">Pinball</a> - Python
|
||||
based workflow engine by Pinterest.</li>
|
||||
<li><a href="https://github.com/systemslab/popper">Popper</a> - YAML
|
||||
based container-native workflow engine supporting Docker, Singularity,
|
||||
Vagrant VMs with Docker daemon in VM, and local host.</li>
|
||||
<li><a href="https://github.com/tweag/porcupine">Porcupine</a> - Haskell
|
||||
workflow tool to express and compose tasks (optionally cached) whose
|
||||
datasources and sinks are known ahead of time and rebindable, and which
|
||||
can expose arbitrary sets of parameters to the outside world.</li>
|
||||
<li><a href="https://docs.prefect.io/">Prefect</a> - Python based
|
||||
workflow engine powering Prefect.</li>
|
||||
<li><a href="https://github.com/nipype/pydra">Pydra</a> - Lightweight,
|
||||
DAG-based Python dataflow engine for reproducible and scalable
|
||||
scientific pipelines.</li>
|
||||
<li><a href="https://github.com/Illumina/pyflow">PyFlow</a> -
|
||||
Lightweight parallel task engine.</li>
|
||||
<li><a href="https://github.com/baffelli/pyperator">pyperator</a> -
|
||||
Simple push-based Python workflow framework using asyncio, supporting
|
||||
recursive networks.</li>
|
||||
<li><a href="https://github.com/pwwang/pyppl">pyppl</a> - A Python
|
||||
lightweight pipeline framework.</li>
|
||||
<li><a href="https://pypyr.io">pypyr</a> - Automation task-runner for
|
||||
sequential steps defined in a pipeline YAML, with AWS and Slack
|
||||
plug-ins.</li>
|
||||
<li><a href="https://github.com/pytask-dev/pytask">pytask</a> - A
|
||||
workflow management system that facilitates reproducible data
|
||||
analyses.</li>
|
||||
<li><a href="https://github.com/masa16/Pwrake/">Pwrake</a> - Parallel
|
||||
workflow extension for Rake.</li>
|
||||
<li><a href="https://bitbucket.org/berkeleylab/qdo">Qdo</a> -
|
||||
Lightweight high-throughput queuing system for workflows with many small
|
||||
tasks to perform.</li>
|
||||
<li><a href="https://github.com/alastair-droop/qsubsec">Qsubsec</a> -
|
||||
Simple tokenised template system for SGE.</li>
|
||||
<li><a href="https://github.com/rabix/rabix">Rabix</a> - Python-based
|
||||
workflow toolkit based on the Common Workflow Language and Docker.</li>
|
||||
<li><a href="https://github.com/substantic/rain">Rain</a> - Framework
|
||||
for large distributed task-based pipelines, written in Rust with Python
|
||||
API.</li>
|
||||
<li><a href="https://github.com/ray-project/ray">Ray</a> - Flexible,
|
||||
high-performance distributed Python execution framework.</li>
|
||||
<li><a href="https://github.com/insitro/redun">Redun</a> - Yet another
|
||||
redundant workflow engine.</li>
|
||||
<li><a href="https://github.com/grailbio/reflow">Reflow</a> - Language
|
||||
and runtime for distributed, incremental data processing in the
|
||||
cloud.</li>
|
||||
<li><a href="https://github.com/richfitz/remake">Remake</a> - Make-like
|
||||
declarative workflows in R.</li>
|
||||
<li><a
|
||||
href="http://physiology.med.cornell.edu/faculty/mason/lab/r-make/">Rmake</a>
|
||||
- Wrapper for the creation of Makefiles, enabling massive
|
||||
parallelization.</li>
|
||||
<li><a href="https://github.com/bjpop/rubra">Rubra</a> - Pipeline system
|
||||
for bioinformatics workflows.</li>
|
||||
<li><a href="http://www.ruffus.org.uk">Ruffus</a> - Computation Pipeline
|
||||
library for Python.</li>
|
||||
<li><a href="https://github.com/kirillseva/ruigi">Ruigi</a> - Pipeline
|
||||
tool for R, inspired by Luigi.</li>
|
||||
<li><a href="http://tonyfischetti.github.io/sake/">Sake</a> -
|
||||
Self-documenting build automation tool.</li>
|
||||
<li><a href="https://github.com/pharmbio/sciluigi">SciLuigi</a> - Helper
|
||||
library for writing flexible scientific workflows in Luigi.</li>
|
||||
<li><a href="http://scipipe.org">SciPipe</a> - Library for writing
|
||||
Scientific Workflows in Go.</li>
|
||||
<li><a href="https://signac.io">Signac</a> - Lightweight, but scalable
|
||||
framework for file-driven workflows to be run locally and on HPC
|
||||
systems.</li>
|
||||
<li><a href="https://github.com/soravux/scoop/">Scoop</a> - Scalable
|
||||
Concurrent Operations in Python.</li>
|
||||
<li><a href="https://github.com/nlgranger/SeqTools">Seqtools</a> -
|
||||
Python library for lazy evaluation of pipelined transformations on
|
||||
indexable containers.</li>
|
||||
<li><a href="https://github.com/giacbrd/SmartPipeline">SmartPipeline</a>
|
||||
- A framework for rapid development of robust data pipelines following a
|
||||
simple design pattern.</li>
|
||||
<li><a href="https://snakemake.readthedocs.io/en/stable">Snakemake</a> -
|
||||
Tool for running and managing bioinformatics pipelines.</li>
|
||||
<li><a href="https://github.com/knipknap/SpiffWorkflow">Spiff</a> -
|
||||
Based on the Workflow Patterns initiative and implemented in
|
||||
Python.</li>
|
||||
<li><a href="https://github.com/sailthru/stolos">Stolos</a> - Directed
|
||||
Acyclic Graph task dependency scheduler that simplifies distributed
|
||||
pipelines.</li>
|
||||
<li><a href="https://github.com/minerva-ml/steppy">Steppy</a> -
|
||||
Lightweight, open-source, Python 3 library for fast and reproducible
|
||||
experimentation. (This repository has been archived by the owner on Jun
|
||||
22, 2022.)</li>
|
||||
<li><a href="https://stpipe.readthedocs.io/">Stpipe</a> - File
|
||||
processing pipelines as a Python library.</li>
|
||||
<li><a href="https://github.com/alpha-unito/streamflow">StreamFlow</a> -
|
||||
Container native workflow management system focused on hybrid
|
||||
workflows.</li>
|
||||
<li><a href="https://streampipes.apache.org">StreamPipes</a> - A
|
||||
self-service IoT toolbox to enable non-technical users to connect,
|
||||
analyze and explore IoT data streams.</li>
|
||||
<li><a href="https://github.com/gilt/sundial">Sundial</a> - Jobsystem on
|
||||
AWS ECS or AWS Batch managing dependencies and scheduling.</li>
|
||||
<li><a href="https://github.com/Netflix/suro">Suro</a> - Java-based
|
||||
distributed pipeline from Netflix.</li>
|
||||
<li><a href="http://swift-lang.org">Swift</a> - Fast easy parallel
|
||||
scripting - on multicores, clusters, clouds and supercomputers.</li>
|
||||
<li><a href="https://github.com/ices-tools-prod/TAF">TAF</a> - R package
|
||||
to organize reproducible scientific workflows.</li>
|
||||
<li><a href="https://github.com/ropensci/targets">Targets</a> - Dynamic,
|
||||
function-oriented <a
|
||||
href="https://www.gnu.org/software/make/">Make</a>-like reproducible
|
||||
pipelines at scale in R.</li>
|
||||
<li><a href="https://github.com/natcap/taskgraph">TaskGraph</a> - A
|
||||
library to help manage complicated computational software pipelines
|
||||
consisting of long running individual tasks.</li>
|
||||
<li><a href="https://github.com/4dn-dcic/tibanna">Tibanna</a> - Tool
|
||||
that helps you run genomic pipelines on Amazon cloud.</li>
|
||||
<li><a href="https://github.com/BD2KGenomics/toil">Toil</a> -
|
||||
Distributed pipeline workflow manager (mostly for genomics).</li>
|
||||
<li><a href="http://opensource.nibr.com/yap/">Yap</a> - Extensible
|
||||
parallel framework, written in Python using OpenMPI libraries.</li>
|
||||
<li><a href="https://github.com/picanumber/yapp">Yapp</a> - A C++
|
||||
parallel pipeline library for stream processing.</li>
|
||||
<li><a href="https://www.wallaroolabs.com/">Wallaroo</a> - Framework for
|
||||
streaming data applications and algorithms that react to real-time
|
||||
events.</li>
|
||||
<li><a href="http://worldmake.org/">WorldMake</a> - Easy Collaborative
|
||||
Reproducible Computing.</li>
|
||||
<li><a href="https://zenaton.com">Zenaton</a> - Workflow engine for
|
||||
orchestrating jobs, data and events across your applications and third
|
||||
party services.</li>
|
||||
<li><a href="https://zenml.io">ZenML</a> - Extensible open-source MLOps
|
||||
framework to create reproducible pipelines for data scientists.</li>
|
||||
</ul>
|
||||
<h2 id="workflow-platforms">Workflow platforms</h2>
|
||||
<ul>
|
||||
<li><a href="http://www.activepapers.org/">ActivePapers</a> -
|
||||
Computational science made reproducible and publishable.</li>
|
||||
<li><a href="https://github.com/automaticmode/active_workflow">Active
|
||||
Workflow</a> - Polyglot workflows without leaving the comfort of your
|
||||
technology stack.</li>
|
||||
<li><a href="https://anvio.org/">Anvi’o</a> - A community and framework
|
||||
centered around metagenomics, designed to facilitate reproducible
|
||||
exploration and visualization of data.</li>
|
||||
<li><a href="https://airavata.apache.org/">Apache Airavata</a> -
|
||||
Framework for executing and managing computational workflows on
|
||||
distributed computing resources.</li>
|
||||
<li><a href="https://arteria-project.github.io/">Arteria</a> -
|
||||
Event-driven automation for sequencing centers. Initiates workflows
|
||||
based on events.</li>
|
||||
<li><a href="http://arvados.org">Arvados</a> - A container based
|
||||
workflow platform.</li>
|
||||
<li>Biokepler - Bioinformatics Scientific Workflow for Distributed
|
||||
Analysis of Large-Scale Biological Data. (<a
|
||||
href="https://web.archive.org/web/20190108162953/https://www.biokepler.org/"><em>inactive
|
||||
since 10/2019</em></a>)</li>
|
||||
<li><a href="http://github.com/llevar/butler">Butler</a> - Framework for
|
||||
running scientific workflows on public and academic clouds.</li>
|
||||
<li><a href="http://chipster.csc.fi">Chipster</a> - Open source platform
|
||||
for data analysis.</li>
|
||||
<li><a href="https://bitbucket.org/bromberglab/clubber">Clubber</a> -
|
||||
Cluster Load Balancer for Bioinformatics e-Resources.</li>
|
||||
<li><a href="https://www.digdag.io">Digdag</a> - Workflow manager
|
||||
designed for simplicity, extensibility and collaboration.</li>
|
||||
<li><a href="https://github.com/Tauffer-Consulting/domino">Domino</a> -
|
||||
User friendly and open source visual workflow management platform.</li>
|
||||
<li><a
|
||||
href="https://github.com/materialsproject/fireworks">Fireworks</a> -
|
||||
Centralized workflow server for dynamic workflows of high-throughput
|
||||
computations.</li>
|
||||
<li><a href="https://github.com/flojoy-ai/studio">Flojoy</a> - Open
|
||||
source visual Python scripting for test, measurement, and robotics
|
||||
control.</li>
|
||||
<li><a href="https://github.com/lyft/flyte">Flyte</a> -
|
||||
Container-native, type-safe workflow and pipelines platform for large
|
||||
scale processing and ML.</li>
|
||||
<li><a href="https://galaxyproject.org">Galaxy</a> - Powerful workflow
|
||||
system which can be used on the command line or with the GUI.</li>
|
||||
<li><a href="https://github.com/ESIPFed/Geoweaver">Geoweaver</a> -
|
||||
In-browser tool for data processing workflows with high-performance
|
||||
server support, featuring code history and workflow orchestration.</li>
|
||||
<li><a href="https://kepler-project.org/">Kepler</a> - Kepler scientific
|
||||
workflow application from University of California.</li>
|
||||
<li><a href="https://www.knime.org/knime-analytics-platform">KNIME
|
||||
Analytics Platform</a> - General-purpose platform with many specialized
|
||||
domain extensions.</li>
|
||||
<li><a href="https://www.kubeflow.org/">Kubeflow</a> - Toolkit for
|
||||
making deployments of machine learning workflows on Kubernetes simple,
|
||||
portable and scalable.</li>
|
||||
<li><a href="http://workflow.campagnelab.org">NextflowWorkbench</a> -
|
||||
Integrated development environment for Nextflow, Docker and Reusable
|
||||
Workflows.</li>
|
||||
<li><a href="https://github.com/omegaml/omegaml">omega|ml DataOps
|
||||
Platform</a> - Data & model pipeline deployment for humans -
|
||||
integrated, scalable, extensible.</li>
|
||||
<li><a href="http://www.openmole.org/current/">OpenMOLE</a> - Workflow
|
||||
Management System for exploration of models and parameter
|
||||
optimization.</li>
|
||||
<li><a href="http://ophidia.cmcc.it">Ophidia</a> - Data-analytics
|
||||
platform with declarative workflows of distributed operations.</li>
|
||||
<li><a href="https://github.com/orchest/orchest">Orchest</a> - An IDE
|
||||
for Data Science.</li>
|
||||
<li><a href="http://pegasus.isi.edu">Pegasus</a> - Workflow Management
|
||||
System.</li>
|
||||
<li><a href="https://github.com/creactiviti/piper">Piper</a> -
|
||||
Distributed workflow engine designed to be dead simple.</li>
|
||||
<li><a href="https://github.com/polyaxon/polyaxon">Polyaxon</a> - A
|
||||
platform for machine learning experimentation workflow.</li>
|
||||
<li><a href="https://github.com/reanahub/reana">Reana</a> - Platform for
|
||||
reusable research data analyses developed by CERN.</li>
|
||||
<li><a href="https://github.com/uzh/sushi">Sushi</a> - Supporting User
|
||||
for SHell script Integration.</li>
|
||||
<li><a href="http://ccg.murdoch.edu.au/yabi">Yabi</a> - Online research
|
||||
environment for grid, HPC and cloud computing.</li>
|
||||
<li><a href="http://www.taverna.org.uk">Taverna</a> - Domain independent
|
||||
workflow system.</li>
|
||||
<li><a href="https://www.temporal.io/">Temporal</a> - Highly scalable
|
||||
developer oriented <em>Workflow as Code</em> engine.</li>
|
||||
<li><a href="https://github.com/windmill-labs/windmill">Windmill</a> -
|
||||
Developer platform and workflow engine to turn scripts into internal
|
||||
tools.</li>
|
||||
<li><a href="http://www.vistrails.org/">VisTrails</a> - Scientific
|
||||
workflow and provenance management system.</li>
|
||||
<li><a href="http://www.wings-workflows.org">Wings</a> - Semantic
|
||||
workflow system utilizing Pegasus as execution system.</li>
|
||||
<li><a href="https://github.com/klugem/watchdog">Watchdog</a> - Workflow
|
||||
management system for the automated and distributed analysis of
|
||||
large-scale experimental data.</li>
|
||||
<li><a href="https://www.flowhub.com.cn">FlowHub</a> - FlowHub is a new
|
||||
workflow cloud platform.</li>
|
||||
</ul>
|
||||
<h2 id="workflow-languages">Workflow languages</h2>
|
||||
<ul>
|
||||
<li><a
|
||||
href="https://github.com/common-workflow-language/common-workflow-language">Common
|
||||
Workflow Language</a></li>
|
||||
<li><a href="http://cloudgene.uibk.ac.at/developer-guide">Cloudgene
|
||||
Workflow Language</a></li>
|
||||
<li><a
|
||||
href="http://www.openmole.org/current/Documentation_Language.html">OpenMOLE
|
||||
DSL</a></li>
|
||||
<li><a href="https://github.com/openwdl/wdl">Workflow Description
|
||||
Language</a></li>
|
||||
<li><a href="http://www.yawlfoundation.org">Yet Another Workflow
|
||||
Language</a></li>
|
||||
<li><a href="https://github.com/calebwin/pipelines">Pipelines</a></li>
|
||||
</ul>
|
||||
<h2 id="workflow-standardization-initiatives">Workflow standardization
|
||||
initiatives</h2>
|
||||
<ul>
|
||||
<li><a href="http://www.wf4ever-project.org">Workflow 4 Ever
|
||||
Initiative</a></li>
|
||||
<li><a href="http://wf4ever.github.io/ro">Workflow 4 Ever workflow
|
||||
research object model</a></li>
|
||||
<li><a href="http://www.workflowpatterns.com">Workflow Patterns
|
||||
Initiative</a></li>
|
||||
<li><a href="http://www.workflowpatterns.com/patterns">Workflow Patterns
|
||||
Library</a></li>
|
||||
<li><a href="http://www.researchobject.org">ResearchObject.org</a></li>
|
||||
</ul>
|
||||
<h2 id="etl-data-orchestration">ETL & Data orchestration</h2>
|
||||
<ul>
|
||||
<li><a href="https://datalad.org">DataLad</a> - git and git-annex based
|
||||
data version control system with lightweight provenance
|
||||
capture/re-execution support.</li>
|
||||
<li><a href="https://dvc.org">DVC</a> - Data version control system for
|
||||
ML projects with lightweight pipeline support.</li>
|
||||
<li><a href="https://github.com/treeverse/lakeFS">lakeFS</a> -
|
||||
Repeatable, atomic and versioned data lake on top of object
|
||||
storage.</li>
|
||||
<li><a href="https://github.com/projectnessie/nessie">Nessie</a> -
|
||||
Provides Git-like capability & version control for Iceberg Tables,
|
||||
Delta Lake Tables & SQL Views.</li>
|
||||
</ul>
|
||||
<h2 id="literate-programming-aka-interactive-notebooks">Literate
|
||||
programming (aka interactive notebooks)</h2>
|
||||
<ul>
|
||||
<li><a href="http://beakernotebook.com/">Beaker</a> - Notebook-style
|
||||
development environment.</li>
|
||||
<li><a href="http://mybinder.org/">Binder</a> - Turn a GitHub repo into
|
||||
a collection of interactive notebooks powered by Jupyter and
|
||||
Kubernetes.</li>
|
||||
<li><a href="https://ipython.org/">IPython</a> - A rich architecture for
|
||||
interactive computing.</li>
|
||||
<li><a href="https://jupyter.org/">Jupyter</a> - Language-agnostic
|
||||
notebook literate programming environment.</li>
|
||||
<li><a href="https://orgmode.org/">Org Mode</a> - GNU Emacs major mode for
|
||||
computational notebooks, literate programming, and much more.</li>
|
||||
<li><a href="http://pathomx.org">Pathomx</a> - Interactive data
|
||||
workflows built on Python.</li>
|
||||
<li><a href="https://github.com/polynote/polynote">Polynote</a> - A
|
||||
better notebook for Scala (and more). Built by Netflix.</li>
|
||||
<li><a href="https://github.com/ploomber/ploomber">Ploomber</a> -
|
||||
Consolidate your notebooks and scripts in a reproducible pipeline using
|
||||
a <code>pipeline.yaml</code> file.</li>
|
||||
<li><a href="http://rmarkdown.rstudio.com/r_notebooks.html">R
|
||||
Notebooks</a> - R Markdown notebook literate programming
|
||||
environment.</li>
|
||||
<li><a href="https://www.redpointnotebooks.com/">RedPoint Notebooks</a>
|
||||
- Web-native computational notebook for programmers supporting multiple
|
||||
languages, APIs and webooks.</li>
|
||||
<li><a href="https://vatlab.github.io/sos-docs/">SoS</a> - Readable,
|
||||
interactive, cross-platform and cross-language data science workflow
|
||||
system.</li>
|
||||
<li><a href="https://zeppelin.apache.org/">Zeppelin</a> - Web-based
|
||||
notebook that enables interactive data analytics.</li>
|
||||
</ul>
|
||||
<h2 id="extract-transform-load-etl">Extract, transform, load (ETL)</h2>
<ul>
<li><a href="https://github.com/uber/cadence">Cadence</a> - Distributed,
scalable, durable, and highly available orchestration engine developed
by Uber.</li>
<li><a href="https://github.com/dataform-co/dataform">Dataform</a> -
Framework for managing SQL-based operations in your data
warehouse.</li>
<li><a href="https://hevodata.com/integrations/pipeline/">Hevo</a> -
Fully automated, no-code data pipeline platform that supports
150+ ready-to-use integrations across databases, SaaS applications,
cloud storage, SDKs, and streaming services.</li>
<li><a href="http://www.kiba-etl.org">Kiba ETL</a> - A data processing
& ETL framework for Ruby.</li>
<li><a href="https://etl.linkedpipes.com">LinkedPipes ETL</a> - Linked
Data publishing and consumption ETL tool.</li>
<li><a
href="https://community.hitachivantara.com/s/article/data-integration-kettle">Pentaho
Kettle</a> - A platform that delivers powerful ETL capabilities using a
groundbreaking, metadata-driven approach.</li>
<li><a href="https://github.com/brexhq/substation">Substation</a> -
Cloud-native data pipeline and transformation toolkit
written in Go.</li>
</ul>
<h2 id="continuous-delivery-workflows">Continuous Delivery
workflows</h2>
<ul>
<li><a href="https://github.com/argoproj/argo">Argo</a> - Get stuff done
with container-native workflows for Kubernetes.</li>
<li><a href="https://github.com/ovh/cds">CDS</a> - A pipeline-based
Continuous Delivery Service written in Golang.</li>
</ul>
<h2 id="build-automation-tools">Build automation tools</h2>
<ul>
<li><a href="http://bazel.io/">Bazel</a> - Build software just as
engineers do at Google.</li>
<li><a href="https://github.com/pydoit/doit">doit</a> - Highly
generalized task-management and automation in Python.</li>
<li><a href="http://gradle.org/">Gradle</a> - Unified cross-platform
builds.</li>
<li><a href="https://github.com/casey/just">Just</a> - Command and
recipe runner similar to Make, built in Rust.</li>
<li><a href="https://www.gnu.org/software/make/">Make</a> - The GNU Make
build system.</li>
<li><a href="https://github.com/prodmodel/prodmodel">Prodmodel</a> -
Build system for data science pipelines.</li>
<li><a href="http://www.scons.org/">SCons</a> - Python library focused
on C/C++ builds.</li>
<li><a href="https://github.com/ndmitchell/shake">Shake</a> - Define
robust build systems akin to GNU Make using Haskell.</li>
</ul>
<h2 id="automated-workflow-composition">Automated workflow
composition</h2>
<ul>
<li><a href="https://github.com/sanctuuary/APE">APE</a> - A tool for the
automated exploration of possible computational workflows based on
semantic annotations.</li>
</ul>
<h2 id="other-projects">Other projects</h2>
<ul>
<li><a href="http://hpcgridrunner.github.io/">HPC Grid Runner</a></li>
<li><a href="https://nifi.apache.org">NiFi</a> - Powerful and scalable
directed graphs of data routing, transformation, and system mediation
logic.</li>
<li><a href="https://github.com/gems-uff/noworkflow">noWorkflow</a> -
Supporting infrastructure to run scientific experiments without a
scientific workflow management system, and still get things like
provenance.</li>
<li><a href="https://www.reprozip.org/">ReproZip</a> - Simplifies the
process of creating reproducible experiments from command-line
executions.</li>
</ul>
<h2 id="related-lists">Related lists</h2>
<ul>
<li><a href="https://github.com/manuzhang/awesome-streaming">Awesome
streaming</a> - Curated list of awesome streaming frameworks,
applications.</li>
<li><a href="https://github.com/pawl/awesome-etl">Awesome ETL</a> -
Curated list of notable ETL (extract, transform, load) frameworks,
libraries and software.</li>
<li><a
href="https://github.com/meirwah/awesome-workflow-engines">Awesome
workflow engines</a> - Curated list of awesome open source workflow
engines.</li>
<li><a
href="https://github.com/common-workflow-language/common-workflow-language/wiki/Existing-Workflow-systems">Computational
Data Analysis Workflow Systems</a></li>
</ul>
<p><a href="https://github.com/pditommaso/awesome-pipeline">pipeline.md
GitHub</a></p>