update lists

2025-07-18 22:22:32 +02:00
parent 55bed3b4a1
commit 5916c5c074
3078 changed files with 331679 additions and 357255 deletions


@@ -31,9 +31,13 @@
- Apache Heron (incubating) (https://github.com/apache/incubator-heron) Java - a realtime, distributed, fault-tolerant stream processing engine from Twitter.
- Apache Samza (https://github.com/apache/samza) Scala/Java - distributed stream processing framework that builds on Kafka (messaging, storage) and YARN (fault tolerance, processor isolation, security and resource management).
- Apache Spark Streaming (https://github.com/apache/spark) Scala - makes it easy to build scalable fault-tolerant streaming applications.
- Apache Storm (https://github.com/apache/storm) Clojure/Java - distributed real-time computation system. Storm is to stream processing what Hadoop is to batch processing. 
- Apache Storm (https://github.com/apache/storm) Clojure/Java - distributed real-time computation system. Storm is to stream processing what Hadoop is to batch processing.
- ArkFlow (https://github.com/arkflow-rs/arkflow) Rust - High-performance Rust stream processing engine, providing powerful data stream processing capabilities, supporting multiple input/output sources and processors.
- Arroyo (https://github.com/ArroyoSystems/arroyo) Rust - a distributed stream processing engine. Supports SQL and Rust pipelines. Scales up to millions of events per second. Supports stateful operations like windows and joins, state 
checkpointing for fault-tolerance and recovery of pipelines. Uses the Timely Dataflow model.
- AthenaX (https://github.com/uber/AthenaX) Java - Uber's Stream Analytics Framework used in production
- Bytewax (https://github.com/bytewax/bytewax) Python - data parallel, distributed, stateful stream processing framework.
- CocoIndex (https://github.com/cocoindex-io/cocoindex) Rust/Python - ETL framework to build fresh index for AI, with realtime incremental updates.
- Faust (https://github.com/robinhood/faust) Python - stream processing library, porting the ideas from Kafka Streams to Python
- Gearpump (https://github.com/gearpump/gearpump) Scala - lightweight real-time distributed streaming engine built on Akka.
- Hazelcast Jet (https://github.com/hazelcast/hazelcast-jet) Java - A general purpose distributed data processing engine, built on top of Hazelcast.
@@ -52,12 +56,15 @@
- Teknek (https://github.com/edwardcapriolo/teknek-core) Java - Simple elegant stream processing with interactive prototyping shell SOL (Stream Operator Language)
Mesos, designed for high performance data processing jobs that require flexibility & control.
- Trill (https://github.com/Microsoft/trill) .NET/C# - Trill is a high-performance one-pass in-memory streaming analytics engine from Microsoft Research.
- Wallaroo (https://github.com/WallarooLabs/wallaroo) Python - A fast, stream-processing framework. Wallaroo makes it easy to react to data in real-time. By eliminating infrastructure complexity, going from prototype to production has 
never been simpler.
- Wallaroo (https://github.com/WallarooLabs/wallaroo) Python - A fast, stream-processing framework. Wallaroo makes it easy to react to data in real-time. By eliminating infrastructure complexity, going from prototype to production has never been
simpler.
- LightSaber (https://github.com/lsds/LightSaber) C++ - Multi-core Window-Based Stream Processing Engine. LightSaber uses code generation for efficient window aggregation.
- HStreamDB (https://github.com/hstreamdb/hstream) Haskell - The streaming database built for IoT data storage and real-time processing.
- Kuiper (https://github.com/emqx/kuiper) Golang - A lightweight IoT data analytics and streaming engine for the edge, implemented in Go, that can run on all kinds of resource-constrained edge devices.
- WindFlow (https://paragroup.github.io/WindFlow) C++ - A C++17 Data Stream Processing Parallel Library for Multicores and GPUs
- WindFlow (https://paragroup.github.io/WindFlow) C++ - A C++17 Data Stream Processing Parallel Library for Multicores and GPUs.
- RisingWave (https://github.com/risingwavelabs/risingwave) Rust - A PostgreSQL-compatible streaming database that is designed to build event-driven applications, real-time ETL pipelines, continuous analytics services, and feature stores for AI 
applications. It excels in extracting fresh and consistent insights from real-time event streams, database CDC, and time series data within sub-seconds. It unifies streaming and batch processing, enabling users to ingest, join, and analyze both 
live and historical data at a cloud scale.
Streaming Library
@@ -67,27 +74,27 @@
- Daggy (https://github.com/synacker/daggy) C++ - real-time streams aggregation and catching. 
- Benthos (https://github.com/Jeffail/benthos) Go - Benthos is a high performance and resilient message streaming service, able to connect various sources and sinks and perform arbitrary actions, transformations and filters on payloads
- FS2(prev. 'Scalaz-Stream') (https://github.com/functional-streams-for-scala/fs2) Scala - Compositional, streaming I/O library for Scala.
- FastStream (https://github.com/airtai/faststream) Python - powerful and easy-to-use Python library simplifying the process of writing producers and consumers for message queues, handling all the parsing, networking and documentation 
generation automatically. Supports multiple protocols such as Apache Kafka, RabbitMQ and the like.
- FastStream (https://github.com/airtai/faststream) Python - powerful and easy-to-use Python library simplifying the process of writing producers and consumers for message queues, handling all the parsing, networking and documentation generation
automatically. Supports multiple protocols such as Apache Kafka, RabbitMQ and the like.
- monix (https://github.com/monix/monix) Scala - high-performance Scala / Scala.js library for composing asynchronous and event-based programs.
- Quix Streams (https://github.com/quixio/quix-streams) Python - a streaming library originally designed for the McLaren Formula 1 racing team that can process high volumes of time-series data with up to nanosecond precision using 
Apache Kafka as a message broker.
- Scramjet Node.js (https://github.com/scramjetorg/framework-js) - Node.js functional reactive stream programming framework written on top of Node.js object streams + the legacy Scramjet.js version 
(https://github.com/scramjetorg/scramjet)
- Quix Streams (https://github.com/quixio/quix-streams) Python - a streaming library originally designed for the McLaren Formula 1 racing team that can process high volumes of time-series data with up to nanosecond precision using Apache Kafka 
as a message broker.
- Scramjet Node.js (https://github.com/scramjetorg/framework-js) - Node.js functional reactive stream programming framework written on top of Node.js object streams + the legacy Scramjet.js version (https://github.com/scramjetorg/scramjet)
- Scramjet Python (https://github.com/scramjetorg/framework-python) - Python functional reactive stream programming framework written from scratch operating on object, string and buffer streams.
- Scramjet C++ (https://github.com/scramjetorg/framework-cpp) - C++ functional reactive stream programming framework written on top of Node.js object streams.
- Streamline (https://github.com/hortonworks/streamline) Java - Stream Analytics Framework by Hortonworks, designed as a wrapper around existing streaming solutions like Storm. Aimed to allow users to drag-and-drop streaming components
to focus on business logic.
- Streamline (https://github.com/hortonworks/streamline) Java - Stream Analytics Framework by Hortonworks, designed as a wrapper around existing streaming solutions like Storm. Aimed to allow users to drag-and-drop streaming components to focus 
on business logic.
- StreamAlert (https://github.com/airbnb/streamalert) Python - Airbnb's Real-time Data Analysis and Alerting.
- Swave (https://github.com/sirthias/swave) Scala - A lightweight Reactive Streams Infrastructure Toolkit for Scala.
- Streamz (https://github.com/python-streamz/streamz) Python - A lightweight library for building pipelines to manage continuous streams of data; supports complex pipelines that involve branching, joining, flow control, feedback, back 
pressure, and so on.
- Streamz (https://github.com/python-streamz/streamz) Python - A lightweight library for building pipelines to manage continuous streams of data; supports complex pipelines that involve branching, joining, flow control, feedback, back pressure, 
and so on.
- Stream Ops (https://github.com/nanosai/stream-ops-java) Java - A fully embeddable data streaming engine and stream processing API for Java.
- Substation (https://github.com/brexhq/substation) Go - Substation is a cloud native data pipeline and transformation toolkit written in Go.
- SwimOS (https://github.com/swimos/swim-rust) Rust - A framework for building real-time streaming data processing applications written in Rust.
- Tributary (https://github.com/timkpaine/tributary) Python - A python library for constructing dataflow graphs. Supports synchronous, reactive data streams built using python generators that mimic complex event processors, as well as 
lazily-evaluated acyclic graphs and functional currying streams.
- YoMo (https://github.com/yomorun/yomo) Go - An open source Streaming Serverless Framework for building low-latency geo-distributed systems. YoMo is built atop the QUIC Transport Protocol (https://en.wikipedia.org/wiki/QUIC) and a Functional 
Reactive Programming interface.
- YoMo (https://github.com/yomorun/yomo) Go - An open source Streaming Serverless Framework for building low-latency geo-distributed systems. YoMo is built atop the QUIC Transport Protocol (https://en.wikipedia.org/wiki/QUIC) and a Functional Reactive 
Programming interface.
- Mediapipe (https://github.com/google/mediapipe) - Cross-platform, customizable ML solutions for live and streaming media.
Streaming Application
@@ -100,27 +107,28 @@
IoT
- sensorbee (https://github.com/sensorbee/sensorbee) Go - lightweight stream processing engine for IoT.
- Apache Edgent (https://github.com/apache/incubator-edgent) Java - a programming model and runtime that enables continuous streaming analytics on gateways and edge devices which can work with centralized systems to provide efficient 
and timely analytics across the whole IoT ecosystem: from the center to the edge, open sourced by IBM.
- Apache Edgent (https://github.com/apache/incubator-edgent) Java - a programming model and runtime that enables continuous streaming analytics on gateways and edge devices which can work with centralized systems to provide efficient and timely 
analytics across the whole IoT ecosystem: from the center to the edge, open sourced by IBM.
- Apache StreamPipes (https://github.com/apache/incubator-streampipes) Java - a self-service (Industrial) IoT toolbox to enable non-technical users to connect, analyze and explore IoT data streams.
DSL
- Apache Beam (https://github.com/apache/beam) Java, Python, SQL, Scala, Go - unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, 
supporting Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs), open sourced by Google.
- Apache Beam (https://github.com/apache/beam) Java, Python, SQL, Scala, Go - unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting 
Enterprise Integration Patterns (EIPs) and Domain Specific Languages (DSLs), open sourced by Google.
- coast (https://github.com/bkirwi/coast) Scala - a DSL that builds DAGs on top of Samza and provides exactly-once semantics.
- Esper (https://github.com/espertechinc/esper) Java - component for complex event processing (CEP) and event series analysis.
- Streamparse (https://github.com/Parsely/streamparse) Python - lets you run Python code against real-time streams of data via Apache Storm.
- summingbird (https://github.com/twitter/summingbird) Scala - library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed 
MapReduce platforms, including Storm and Scalding.
- summingbird (https://github.com/twitter/summingbird) Scala - library that lets you write MapReduce programs that look like native Scala or Java collection transformations and execute them on a number of well-known distributed MapReduce 
platforms, including Storm and Scalding.
Data Pipeline
- Apache Kafka (https://github.com/apache/kafka) Scala/Java - distributed, partitioned, replicated commit log service, which provides the functionality of a messaging system, but with a unique design.
- Apache Pulsar (https://github.com/apache/incubator-pulsar) Java - distributed pub-sub messaging platform with a very flexible messaging model and an intuitive client API.
- Apache RocketMQ (https://github.com/apache/rocketmq) Java - distributed messaging and streaming platform with low latency, high performance and reliability, trillion-level capacity and flexible scalability.
- brooklin (https://github.com/linkedin/Brooklin/) Java - a distributed system intended for streaming data between various heterogeneous source and destination systems with high reliability and throughput at scale from Linkedin 
(replaced databus).
- AutoMQ (https://github.com/AutoMQ/automq) Scala/Java - cloud-first alternative to Kafka by decoupling durability to S3 and EBS. 100% Kafka compatible. 10x cost-effective. Autoscale in seconds. Single-digit ms latency.
- brooklin (https://github.com/linkedin/Brooklin/) Java - a distributed system intended for streaming data between various heterogeneous source and destination systems with high reliability and throughput at scale from Linkedin (replaced 
databus).
- camus (https://github.com/linkedin/camus) Java - Linkedin's Kafka -> HDFS pipeline.
- databus (https://github.com/linkedin/databus) Java - Linkedin's source-agnostic distributed change data capture system.
- flume (https://github.com/apache/flume) Java - distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
@@ -133,8 +141,8 @@
- Redpanda (https://github.com/redpanda-data/redpanda) C++ - Redpanda is Kafka compatible, ZooKeeper-free, JVM-free and source available.
- RudderStack (https://github.com/rudderlabs/rudder-server) Go - an open source customer data infrastructure (segment, mparticle alternative).
- suro (https://github.com/Netflix/suro) Java - data pipeline service for collecting, aggregating, and dispatching large volume of application events including log data.
- StreamSets Data Collector (https://github.com/streamsets/datacollector-oss) Java - continuous big data ingestion infrastructure that reads from and writes to a large number of end-points, including S3, JDBC, Hadoop, Kafka, Cassandra 
and many others.
- StreamSets Data Collector (https://github.com/streamsets/datacollector-oss) Java - continuous big data ingestion infrastructure that reads from and writes to a large number of end-points, including S3, JDBC, Hadoop, Kafka, Cassandra and many 
others.
Online Machine Learning 
@@ -155,8 +163,8 @@
- StreamCQL (https://github.com/Zhiqiang-He/StreamCQL) Java - Continuous Query Language on RealTime Computation System.
- ksqlDB (https://github.com/confluentinc/ksql) Java - A cloud-native, source-available database (https://ksqldb.io/) purpose-built for stream processing applications
- Materialize (https://materialize.com) Rust - A source-available streaming SQL engine for maintaining materialized views on data from message brokers and databases.
- Siddhi (https://github.com/siddhi-io/siddhi) Java - A cloud native Streaming and Complex Event Processing engine that understands Streaming SQL queries in order to capture events from diverse data sources, process them, detect 
complex conditions, and publish output to various endpoints in real time.
- Siddhi (https://github.com/siddhi-io/siddhi) Java - A cloud native Streaming and Complex Event Processing engine that understands Streaming SQL queries in order to capture events from diverse data sources, process them, detect complex 
conditions, and publish output to various endpoints in real time.
- Proton (https://github.com/timeplus-io/proton) C++ - A unified streaming and historical data analytics database in a single binary, powered by ClickHouse.
Benchmark
@@ -183,8 +191,8 @@
- Azure Stream Analytics (https://azure.microsoft.com/en-us/services/stream-analytics/) .NET - a massively scalable, fully managed, real-time data stream engine provided by Microsoft Azure.
- Cloud Dataflow (https://cloud.google.com/dataflow/) Java, Python, SQL, Scala - Google's managed stream and batch data processing engine. Supports running Beam pipelines.
- concord (https://www.slideshare.net/concord-io/may-2016-data-by-the-bay-concord-simple-flexible-stream-processing-on-apache-mesos) C++ - a distributed stream processing framework built in C++ on top of Apache Mesos.
- IBM Streams (https://www.ibm.com/analytics/us/en/technology/stream-computing/) Python/Java/Scala - platform for distributed processing and real-time analytics. Provides toolkits for advanced analytics like geospatial, time series, 
etc. out of the box.
- IBM Streams (https://www.ibm.com/analytics/us/en/technology/stream-computing/) Python/Java/Scala - platform for distributed processing and real-time analytics. Provides toolkits for advanced analytics like geospatial, time series, etc. out of 
the box.
- jubatus (http://jubat.us/en/) C++ - distributed processing framework and streaming machine learning library.
- millwheel (http://research.google.com/pubs/pub41378.html) - framework for building low-latency data-processing applications that is widely used at Google.
- NVIDIA Deep Stream (https://developer.nvidia.com/deepstream-sdk) Python/C/C++ - a platform for real-time image, video and audio processing, preferably on edge devices or in the cloud.
@@ -203,3 +211,5 @@
Licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (http://creativecommons.org/licenses/by-sa/4.0/)
streaming Github: https://github.com/manuzhang/awesome-streaming