update lists
@@ -1,4 +1,4 @@
|
||||
[38;5;12m [39m[38;2;255;187;0m[1m[4mAwesome Data Engineering [0m[38;5;14m[1m[4m![0m[38;2;255;187;0m[1m[4mAwesome[0m[38;5;14m[1m[4m (https://awesome.re/badge-flat2.svg)[0m[38;2;255;187;0m[1m[4m (https://github.com/sindresorhus/awesome)[0m
|
||||
|
||||
[38;5;11m[1m▐[0m[38;5;12m [39m[38;5;12mA curated list of awesome things related to Data Engineering.[39m
|
||||
|
||||
@@ -28,6 +28,7 @@
|
||||
[38;5;12m - [39m[38;5;14m[1mForums[0m[38;5;12m (#forums)[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mConferences[0m[38;5;12m (#conferences)[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mPodcasts[0m[38;5;12m (#podcasts)[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mBooks[0m[38;5;12m (#books)[39m
|
||||
|
||||
[38;2;255;187;0m[4mDatabases[0m
|
||||
|
||||
@@ -61,8 +62,8 @@
|
||||
[38;5;12m - [39m[38;5;14m[1mClickHouse[0m[38;5;12m (https://clickhouse.tech) - Distributed columnar DBMS for OLAP. SQL.[39m
|
||||
[38;5;12m- Document[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mMongoDB[0m[38;5;12m (https://www.mongodb.com) - An open-source, document database designed for ease of development and scaling.[39m
|
||||
[48;5;235m[38;5;249m- **Percona Server for MongoDB** (https://www.percona.com/software/mongo-database/percona-server-for-mongodb) - Percona Server for MongoDB® is a free, enhanced, fully compatible, open source, drop-in replacement for the MongoDB® Community Edition that includes enterprise-grade features and functionality.[49m[39m[48;5;235m[38;5;249m [49m[39m
|
||||
[48;5;235m[38;5;249m- **MemDB** (https://github.com/rain1017/memdb) - Distributed Transactional In-Memory Database (based on MongoDB).[49m[39m[48;5;235m[38;5;249m [49m[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mElasticsearch[0m[38;5;12m (https://www.elastic.co/) - Search & Analyze Data in Real Time.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mCouchbase[0m[38;5;12m (https://www.couchbase.com/) - The highest performing NoSQL distributed database.[39m
|
||||
@@ -86,8 +87,7 @@
|
||||
[38;5;12m - [39m[38;5;14m[1mHeroic[0m[38;5;12m (https://github.com/spotify/heroic) - A scalable time series database based on Cassandra and Elasticsearch, by Spotify.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mDruid[0m[38;5;12m (https://github.com/apache/incubator-druid) - Column oriented distributed data store ideal for powering interactive applications.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mRiak-TS[0m[38;5;12m (https://basho.com/products/riak-ts/) - Riak TS is the only enterprise-grade NoSQL time series database optimized specifically for IoT and Time Series data.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mAkumuli[0m[38;5;12m (https://github.com/akumuli/Akumuli) - Akumuli is a numeric time-series database. It can be used to capture, store and process time-series data in real time. The word "akumuli" can be translated from Esperanto as "accumulate".[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mRhombus[0m[38;5;12m (https://github.com/Pardot/Rhombus) - A time-series object store for Cassandra that handles all the complexity of building wide row indexes.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mDalmatiner DB[0m[38;5;12m (https://github.com/dalmatinerdb/dalmatinerdb) - Fast distributed metrics database.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mBlueflood[0m[38;5;12m (https://github.com/rackerlabs/blueflood) - A distributed system designed to ingest and process time series data.[39m
|
||||
@@ -98,11 +98,12 @@
|
||||
[38;5;12m - [39m[38;5;14m[1mcayley[0m[38;5;12m (https://github.com/cayleygraph/cayley) - An open-source graph database. Google.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mSnappydata[0m[38;5;12m (https://github.com/SnappyDataInc/snappydata) - SnappyData: OLTP + OLAP Database built on Apache Spark.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mTimescaleDB[0m[38;5;12m (https://www.timescale.com/) - Built as an extension on top of PostgreSQL, TimescaleDB is a time-series SQL database providing fast analytics, scalability, with automated data management on a proven storage engine.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mDuckDB[0m[38;5;12m (https://duckdb.org/) - DuckDB is a fast in-process analytical database that has zero external dependencies, runs on Linux/macOS/Windows, offers a rich SQL dialect, and is free and extensible.[39m
|
||||
|
||||
[38;2;255;187;0m[4mData Comparison[0m
|
||||
|
||||
[38;5;12m- [39m[38;5;14m[1mdatacompy[0m[38;5;12m (https://github.com/capitalone/datacompy) - DataComPy is a Python library that facilitates the comparison of two DataFrames in pandas, Polars, Spark and more. The library goes beyond basic equality checks by providing detailed insights into discrepancies at both row and column levels.[39m
|
||||
|
||||
[38;2;255;187;0m[4mData Ingestion[0m
|
||||
|
||||
@@ -116,21 +117,27 @@
|
||||
[38;5;12m - [39m[38;5;14m[1mkafka-manager[0m[38;5;12m (https://github.com/yahoo/kafka-manager) - A tool for managing Apache Kafka.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mkafka-node[0m[38;5;12m (https://github.com/SOHU-Co/kafka-node) - Node.js client for Apache Kafka 0.8.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mSecor[0m[38;5;12m (https://github.com/pinterest/secor) - Pinterest's Kafka to S3 distributed consumer.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mKafka-logger[0m[38;5;12m (https://github.com/uber/kafka-logger) - Kafka-winston logger for Node.js from Uber.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mAWS Kinesis[0m[38;5;12m (https://aws.amazon.com/kinesis/) - A fully managed, cloud-based service for real-time data processing over large, distributed data streams.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mRabbitMQ[0m[38;5;12m (https://www.rabbitmq.com/) - Robust messaging for applications.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mdlt[0m[38;5;12m (https://www.dlthub.com) - A fast and simple pipeline-building library for Python data developers that runs in notebooks, cloud functions, Airflow, etc.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mFluentD[0m[38;5;12m (https://www.fluentd.org) - An open source data collector for unified logging layer.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mEmbulk[0m[38;5;12m (https://www.embulk.org) - An open source bulk data loader that helps data transfer between various databases, storages, file formats, and cloud services.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mApache Sqoop[0m[38;5;12m (https://sqoop.apache.org) - A tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mHeka[0m[38;5;12m (https://github.com/mozilla-services/heka) - Data Acquisition and Processing Made Easy. Deprecated.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mGobblin[0m[38;5;12m (https://github.com/apache/incubator-gobblin) - Universal data ingestion framework for Hadoop from LinkedIn.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mNakadi[0m[38;5;12m (https://nakadi.io) - Nakadi is an open source event messaging platform that provides a REST API on top of Kafka-like queues.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mPravega[0m[38;5;12m (https://www.pravega.io) - Pravega provides a new storage abstraction - a stream - for continuous and unbounded data.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mApache Pulsar[0m[38;5;12m (https://pulsar.apache.org/) - Apache Pulsar is an open-source distributed pub-sub messaging system.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mAWS Data Wrangler[0m[38;5;12m (https://github.com/awslabs/aws-data-wrangler) - Utility belt to handle data on AWS.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mAirbyte[0m[38;5;12m (https://airbyte.io/) - Open-source data integration for modern data teams.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mArtie[0m[38;5;12m (https://www.artie.com/) - Real-time data ingestion tool leveraging change data capture.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mSling[0m[38;5;12m (https://slingdata.io/) - Sling is a CLI data integration tool specialized in moving data between databases, as well as storage systems.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mMeltano[0m[38;5;12m (https://meltano.com/) - CLI & code-first ELT.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mSinger SDK[0m[38;5;12m (https://sdk.meltano.com) - The fastest way to build custom data extractors and loaders compliant with the Singer Spec.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mGoogle Sheets ETL[0m[38;5;12m (https://github.com/fulldecent/google-sheets-etl) - Live import all your Google Sheets to your data warehouse.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mCsvPath Framework[0m[38;5;12m (https://www.csvpath.org/) - A delimited data preboarding framework that fills the gap between MFT and the data lake.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mEstuary Flow[0m[38;5;12m (https://estuary.dev) - No/low-code data pipeline platform that handles both batch and real-time data ingestion.[39m
|
||||
|
||||
[38;2;255;187;0m[4mFile System[0m
|
||||
|
||||
@@ -139,13 +146,14 @@
|
||||
[38;5;12m- [39m[38;5;14m[1mAWS S3[0m[38;5;12m (https://aws.amazon.com/s3/) - Object storage built to retrieve any amount of data from anywhere.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1msmart_open[0m[38;5;12m (https://github.com/RaRe-Technologies/smart_open) - Utils for streaming large files (S3, HDFS, gzip, bz2).[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mAlluxio[0m[38;5;12m (https://www.alluxio.org/) - Alluxio is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks, such as Spark and MapReduce.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mCEPH[0m[38;5;12m (https://ceph.com/) - Ceph is a unified, distributed storage system designed for excellent performance, reliability, and scalability.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mJuiceFS[0m[38;5;12m (https://github.com/juicedata/juicefs) - JuiceFS is a high-performance Cloud-Native file system driven by object storage for large-scale data storage.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mOrangeFS[0m[38;5;12m (https://www.orangefs.org/) - Orange File System is a branch of the Parallel Virtual File System.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mSnackFS[0m[38;5;12m (https://github.com/tuplejump/snackfs-release) - SnackFS is our bite-sized, lightweight HDFS compatible file system built over Cassandra.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mGlusterFS[0m[38;5;12m (https://www.gluster.org/) - Gluster Filesystem.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mXtreemFS[0m[38;5;12m (https://www.xtreemfs.org/) - Fault-tolerant distributed file system for all storage needs.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mSeaweedFS[0m[38;5;12m (https://github.com/chrislusf/seaweedfs) - Seaweed-FS is a simple and highly scalable distributed file system. It has two objectives: to store billions of files and to serve them fast. Instead of supporting full POSIX file system semantics, Seaweed-FS chooses to implement only a key-to-file mapping. Similar to the word "NoSQL", you can call it "NoFS".[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mS3QL[0m[38;5;12m (https://github.com/s3ql/s3ql/) - S3QL is a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mLizardFS[0m[38;5;12m (https://lizardfs.com/) - LizardFS Software Defined Storage is a distributed, parallel, scalable, fault-tolerant, Geo-Redundant and highly available file system.[39m
|
||||
|
||||
@@ -170,6 +178,7 @@
|
||||
[38;5;12m- [39m[38;5;14m[1mApache Samza[0m[38;5;12m (https://samza.apache.org) - Apache Samza is a distributed stream processing framework.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mApache NiFi[0m[38;5;12m (https://nifi.apache.org/) - An easy to use, powerful, and reliable system to process and distribute data.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mApache Hudi[0m[38;5;12m (https://hudi.apache.org/) - An open source framework for managing storage for real-time processing; one of its most interesting features is upsert support.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mCocoIndex[0m[38;5;12m (https://github.com/cocoindex-io/cocoindex) - An open source ETL framework to build fresh indexes for AI.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mVoltDB[0m[38;5;12m (https://voltdb.com/) - VoltDB is an ACID-compliant RDBMS that uses a [39m[38;5;14m[1mshared nothing architecture[0m[38;5;12m (https://en.wikipedia.org/wiki/Shared-nothing_architecture).[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mPipelineDB[0m[38;5;12m (https://github.com/pipelinedb/pipelinedb) - The Streaming SQL Database.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mSpring Cloud Dataflow[0m[38;5;12m (https://cloud.spring.io/spring-cloud-dataflow/) - Streaming and tasks execution between Spring Boot apps.[39m
|
||||
@@ -177,12 +186,13 @@
|
||||
[38;5;12m- [39m[38;5;14m[1mRobinhood's Faust[0m[38;5;12m (https://github.com/faust-streaming/faust) - Forever scalable event processing & in-memory durable K/V store as a library with asyncio & static typing.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mHStreamDB[0m[38;5;12m (https://github.com/hstreamdb/hstream) - The streaming database built for IoT data storage and real-time processing.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mKuiper[0m[38;5;12m (https://github.com/emqx/kuiper) - Lightweight IoT data analytics and streaming software for the edge, implemented in Go, that can run on all kinds of resource-constrained edge devices.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mZilla[0m[38;5;12m (https://github.com/aklivity/zilla) - An API gateway built for event-driven architectures and streaming that supports standard protocols such as HTTP, SSE, gRPC, MQTT, and the native Kafka protocol.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mSwimOS[0m[38;5;12m (https://github.com/swimos/swim-rust) - A framework for building real-time streaming data processing applications that supports a wide range of ingestion sources.[39m
|
||||
|
||||
[38;2;255;187;0m[4mBatch Processing[0m
|
||||
|
||||
[38;5;12m- [39m[38;5;14m[1mHadoop MapReduce[0m[38;5;12m (https://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html) - Hadoop MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mSpark[0m[38;5;12m (https://spark.apache.org/) - A multi-language engine for executing data engineering, data science, and machine learning on single-node machines or clusters.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mSpark Packages[0m[38;5;12m (https://spark-packages.org) - A community index of packages for Apache Spark.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mDeep Spark[0m[38;5;12m (https://github.com/Stratio/deep-spark) - Connecting Apache Spark with different data stores. Deprecated.[39m
|
||||
@@ -192,14 +202,14 @@
|
||||
[38;5;12m- [39m[38;5;14m[1mAWS EMR[0m[38;5;12m (https://aws.amazon.com/emr/) - A web service that makes it easy to quickly and cost-effectively process vast amounts of data.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mData Mechanics[0m[38;5;12m (https://www.datamechanics.co) - A cloud-based platform deployed on Kubernetes making Apache Spark more developer-friendly and cost-effective.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mTez[0m[38;5;12m (https://tez.apache.org/) - An application framework which allows for a complex directed-acyclic-graph of tasks for processing data.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mBistro[0m[38;5;12m (https://github.com/asavinov/bistro) - A lightweight engine for general-purpose data processing, including both batch and stream analytics. It is based on a novel data model that represents data via _functions_ and processes data via _column operations_, as opposed to having only set operations as in conventional approaches like MapReduce or SQL.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mSubstation[0m[38;5;12m (https://github.com/brexhq/substation) - Substation is a cloud native data pipeline and transformation toolkit written in Go.[39m
|
||||
[38;5;12m- Batch ML[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mH2O[0m[38;5;12m (https://www.h2o.ai/) - Fast scalable machine learning API for smarter applications.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mMahout[0m[38;5;12m (https://mahout.apache.org/) - An environment for quickly creating scalable performant machine learning applications.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mSpark MLlib[0m[38;5;12m (https://spark.apache.org/docs/latest/ml-guide.html) - Spark's scalable machine learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as underlying optimization primitives.[39m
|
||||
[38;5;12m- Batch Graph[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mGraphLab Create[0m[38;5;12m (https://turi.com/products/create/docs/) - A machine learning platform that enables data scientists and app developers to easily create intelligent apps at scale.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mGiraph[0m[38;5;12m (https://giraph.apache.org/) - An iterative graph processing system built for high scalability.[39m
|
||||
@@ -217,7 +227,7 @@
|
||||
[38;5;12m- [39m[38;5;14m[1mZingChart[0m[38;5;12m (https://www.zingchart.com/) - Fast JavaScript charts for any data set.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mC3.js[0m[38;5;12m (https://c3js.org) - D3-based reusable chart library.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mD3.js[0m[38;5;12m (https://d3js.org/) - A JavaScript library for manipulating documents based on data.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mD3Plus[0m[38;5;12m (https://d3plus.org) - D3's simplier, easier to use cousin. Mostly predefined templates that you can just plug data in.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mD3Plus[0m[38;5;12m (https://d3plus.org) - D3's simpler, easier to use cousin. Mostly predefined templates that you can just plug data in.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mSmoothieCharts[0m[38;5;12m (https://smoothiecharts.org) - A JavaScript Charting Library for Streaming Data.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mPyXley[0m[38;5;12m (https://github.com/stitchfix/pyxley) - Python helpers for building dashboards using Flask and React.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mPlotly[0m[38;5;12m (https://github.com/plotly/dash) - Flask, JS, and CSS boilerplate for interactive, web-based visualization apps in Python.[39m
|
||||
@@ -225,45 +235,51 @@
|
||||
[38;5;12m- [39m[38;5;14m[1mRedash[0m[38;5;12m (https://redash.io/) - Make Your Company Data Driven. Connect to any data source, easily visualize and share your data.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mMetabase[0m[38;5;12m (https://github.com/metabase/metabase) - Metabase is the easy, open source way for everyone in your company to ask questions and learn from data.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mPyQtGraph[0m[38;5;12m (https://www.pyqtgraph.org/) - PyQtGraph is a pure-python graphics and GUI library built on PyQt4 / PySide and numpy. It is intended for use in mathematics / scientific / engineering applications.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mSeaborn[0m[38;5;12m (https://seaborn.pydata.org) - A Python visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.[39m
|
||||
|
||||
[38;2;255;187;0m[4mWorkflow[0m
|
||||
|
||||
[38;5;12m- [39m[38;5;14m[1mLuigi[0m[38;5;12m (https://github.com/spotify/luigi) - Luigi is a Python module that helps you build complex pipelines of batch jobs.[39m
|
||||
[38;5;12m - [39m[38;5;14m[1mCronQ[0m[38;5;12m (https://github.com/seatgeek/cronq) - An application cron-like system. [39m[38;5;14m[1mUsed[0m[38;5;12m (https://chairnerd.seatgeek.com/building-out-the-seatgeek-data-pipeline/) w/Luige. Deprecated.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mCronQ[0m[38;5;12m (https://github.com/seatgeek/cronq) - An application cron-like system. [39m[38;5;14m[1mUsed[0m[38;5;12m (https://chairnerd.seatgeek.com/building-out-the-seatgeek-data-pipeline/) w/Luige. Deprecated.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mCascading[0m[38;5;12m (https://www.cascading.org/) - Java based application development platform.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mAirflow[0m[38;5;12m (https://github.com/apache/airflow) - Airflow is a system to programmaticaly author, schedule and monitor data pipelines.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mAzkaban[0m[38;5;12m [39m[38;5;12m(https://azkaban.github.io/)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mAzkaban[39m[38;5;12m [39m[38;5;12mis[39m[38;5;12m [39m[38;5;12ma[39m[38;5;12m [39m[38;5;12mbatch[39m[38;5;12m [39m[38;5;12mworkflow[39m[38;5;12m [39m[38;5;12mjob[39m[38;5;12m [39m[38;5;12mscheduler[39m[38;5;12m [39m[38;5;12mcreated[39m[38;5;12m [39m[38;5;12mat[39m[38;5;12m [39m[38;5;12mLinkedIn[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mrun[39m[38;5;12m [39m[38;5;12mHadoop[39m[38;5;12m [39m[38;5;12mjobs.[39m[38;5;12m [39m[38;5;12mAzkaban[39m[38;5;12m [39m[38;5;12mresolves[39m[38;5;12m [39m[38;5;12mthe[39m[38;5;12m [39m[38;5;12mordering[39m[38;5;12m [39m[38;5;12mthrough[39m[38;5;12m [39m[38;5;12mjob[39m[38;5;12m [39m[38;5;12mdependencies[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mprovides[39m[38;5;12m [39m[38;5;12man[39m[38;5;12m [39m[38;5;12measy[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12muse[39m[38;5;12m [39m[38;5;12mweb[39m[38;5;12m [39m[38;5;12muser[39m[38;5;12m [39m[38;5;12minterface[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mmaintain[39m
|
||||
[38;5;12mand[39m[38;5;12m [39m[38;5;12mtrack[39m[38;5;12m [39m[38;5;12myour[39m[38;5;12m [39m[38;5;12mworkflows.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mAirflow[0m[38;5;12m (https://github.com/apache/airflow) - Airflow is a system to programmatically author, schedule, and monitor data pipelines.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mAzkaban[0m[38;5;12m [39m[38;5;12m(https://azkaban.github.io/)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mAzkaban[39m[38;5;12m [39m[38;5;12mis[39m[38;5;12m [39m[38;5;12ma[39m[38;5;12m [39m[38;5;12mbatch[39m[38;5;12m [39m[38;5;12mworkflow[39m[38;5;12m [39m[38;5;12mjob[39m[38;5;12m [39m[38;5;12mscheduler[39m[38;5;12m [39m[38;5;12mcreated[39m[38;5;12m [39m[38;5;12mat[39m[38;5;12m [39m[38;5;12mLinkedIn[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mrun[39m[38;5;12m [39m[38;5;12mHadoop[39m[38;5;12m [39m[38;5;12mjobs.[39m[38;5;12m [39m[38;5;12mAzkaban[39m[38;5;12m [39m[38;5;12mresolves[39m[38;5;12m [39m[38;5;12mthe[39m[38;5;12m [39m[38;5;12mordering[39m[38;5;12m [39m[38;5;12mthrough[39m[38;5;12m [39m[38;5;12mjob[39m[38;5;12m [39m[38;5;12mdependencies[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mprovides[39m[38;5;12m [39m[38;5;12man[39m[38;5;12m [39m[38;5;12measy-to-use[39m[38;5;12m [39m[38;5;12mweb[39m[38;5;12m [39m[38;5;12muser[39m[38;5;12m [39m[38;5;12minterface[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mmaintain[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mtrack[39m
|
||||
[38;5;12myour[39m[38;5;12m [39m[38;5;12mworkflows.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mOozie[0m[38;5;12m (https://oozie.apache.org/) - Oozie is a workflow scheduler system to manage Apache Hadoop jobs.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mPinball[0m[38;5;12m (https://github.com/pinterest/pinball) - DAG based workflow manager. Job flows are defined programmaticaly in Python. Support output passing between jobs.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mPinball[0m[38;5;12m (https://github.com/pinterest/pinball) - DAG based workflow manager. Job flows are defined programmatically in Python. Support output passing between jobs.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mDagster[0m[38;5;12m (https://github.com/dagster-io/dagster) - Dagster is an open-source Python library for building data applications.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mHamilton[0m[38;5;12m (https://github.com/dagworks-inc/hamilton) - Hamilton is a lightweight library to define data transformations as a directed-acyclic graph (DAG). If you like dbt for SQL transforms, you will like Hamilton for Python processing.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mKedro[0m[38;5;12m (https://kedro.readthedocs.io/en/latest/) - Kedro is a framework that makes it easy to build robust and scalable data pipelines by providing uniform project templates, data abstraction, configuration and pipeline assembly.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mDataform[0m[38;5;12m [39m[38;5;12m(https://dataform.co/)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mAn[39m[38;5;12m [39m[38;5;12mopen-source[39m[38;5;12m [39m[38;5;12mframework[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mweb[39m[38;5;12m [39m[38;5;12mbased[39m[38;5;12m [39m[38;5;12mIDE[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mmanage[39m[38;5;12m [39m[38;5;12mdatasets[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mtheir[39m[38;5;12m [39m[38;5;12mdependencies.[39m[38;5;12m [39m[38;5;12mSQLX[39m[38;5;12m [39m[38;5;12mextends[39m[38;5;12m [39m[38;5;12myour[39m[38;5;12m [39m[38;5;12mexisting[39m[38;5;12m [39m[38;5;12mSQL[39m[38;5;12m [39m[38;5;12mwarehouse[39m[38;5;12m [39m[38;5;12mdialect[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12madd[39m[38;5;12m [39m[38;5;12mfeatures[39m[38;5;12m [39m[38;5;12mthat[39m[38;5;12m [39m[38;5;12msupport[39m[38;5;12m [39m[38;5;12mdependency[39m[38;5;12m [39m[38;5;12mmanagement,[39m[38;5;12m [39m[38;5;12mtesting,[39m[38;5;12m [39m
|
||||
[38;5;12mdocumentation[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mmore.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mDataform[0m[38;5;12m [39m[38;5;12m(https://dataform.co/)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mAn[39m[38;5;12m [39m[38;5;12mopen-source[39m[38;5;12m [39m[38;5;12mframework[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mweb[39m[38;5;12m [39m[38;5;12mbased[39m[38;5;12m [39m[38;5;12mIDE[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mmanage[39m[38;5;12m [39m[38;5;12mdatasets[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mtheir[39m[38;5;12m [39m[38;5;12mdependencies.[39m[38;5;12m [39m[38;5;12mSQLX[39m[38;5;12m [39m[38;5;12mextends[39m[38;5;12m [39m[38;5;12myour[39m[38;5;12m [39m[38;5;12mexisting[39m[38;5;12m [39m[38;5;12mSQL[39m[38;5;12m [39m[38;5;12mwarehouse[39m[38;5;12m [39m[38;5;12mdialect[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12madd[39m[38;5;12m [39m[38;5;12mfeatures[39m[38;5;12m [39m[38;5;12mthat[39m[38;5;12m [39m[38;5;12msupport[39m[38;5;12m [39m[38;5;12mdependency[39m[38;5;12m [39m[38;5;12mmanagement,[39m[38;5;12m [39m[38;5;12mtesting,[39m[38;5;12m [39m[38;5;12mdocumentation[39m[38;5;12m [39m
|
||||
[38;5;12mand[39m[38;5;12m [39m[38;5;12mmore.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mCensus[0m[38;5;12m (https://getcensus.com/) - A reverse-ETL tool that let you sync data from your cloud data warehouse to SaaS applications like Salesforce, Marketo, HubSpot, Zendesk, etc. No engineering favors required—just SQL.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mdbt[0m[38;5;12m (https://getdbt.com/) - A command line tool that enables data analysts and engineers to transform data in their warehouses more effectively.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mRudderStack[0m[38;5;12m [39m[38;5;12m(https://github.com/rudderlabs/rudder-server)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mA[39m[38;5;12m [39m[38;5;12mwarehouse-first[39m[38;5;12m [39m[38;5;12mCustomer[39m[38;5;12m [39m[38;5;12mData[39m[38;5;12m [39m[38;5;12mPlatform[39m[38;5;12m [39m[38;5;12mthat[39m[38;5;12m [39m[38;5;12menables[39m[38;5;12m [39m[38;5;12myou[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mcollect[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mfrom[39m[38;5;12m [39m[38;5;12mevery[39m[38;5;12m [39m[38;5;12mapplication,[39m[38;5;12m [39m[38;5;12mwebsite[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mSaaS[39m[38;5;12m [39m[38;5;12mplatform,[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mthen[39m[38;5;12m [39m[38;5;12mactivate[39m[38;5;12m [39m[38;5;12mit[39m[38;5;12m [39m[38;5;12min[39m[38;5;12m [39m[38;5;12myour[39m[38;5;12m [39m[38;5;12mwarehouse[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m
|
||||
[38;5;12mbusiness[39m[38;5;12m [39m[38;5;12mtools.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mKestra[0m[38;5;12m (https://kestra.io/) - Scalable, event-driven, language-agnostic orchestration and scheduling platform to manage millions of workflows declaratively in code.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mRudderStack[0m[38;5;12m (https://github.com/rudderlabs/rudder-server) - A warehouse-first Customer Data Platform that enables you to collect data from every application, website and SaaS platform, and then activate it in your warehouse and business tools.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mPACE[0m[38;5;12m (https://github.com/getstrm/pace) - An open source framework that allows you to enforce agreements on how data should be accessed, used, and transformed, regardless of the data platform (Snowflake, BigQuery, DataBricks, etc.)[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mPrefect[0m[38;5;12m (https://prefect.io/) - Prefect is an orchestration and observability platform. With it, developers can rapidly build and scale resilient code, and triage disruptions effortlessly.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mMultiwoven[0m[38;5;12m (https://github.com/Multiwoven/multiwoven) - The open-source reverse ETL, data activation platform for modern data teams.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mSuprSend[0m[38;5;12m [39m[38;5;12m(https://www.suprsend.com/products/workflows)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mCreate[39m[38;5;12m [39m[38;5;12mautomated[39m[38;5;12m [39m[38;5;12mworkflows[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mlogic[39m[38;5;12m [39m[38;5;12musing[39m[38;5;12m [39m[38;5;12mAPI's[39m[38;5;12m [39m[38;5;12mfor[39m[38;5;12m [39m[38;5;12myour[39m[38;5;12m [39m[38;5;12mnotification[39m[38;5;12m [39m[38;5;12mservice.[39m[38;5;12m [39m[38;5;12mAdd[39m[38;5;12m [39m[38;5;12mtemplates,[39m[38;5;12m [39m[38;5;12mbatching,[39m[38;5;12m [39m[38;5;12mpreferences,[39m[38;5;12m [39m[38;5;12minapp[39m[38;5;12m [39m[38;5;12minbox[39m[38;5;12m [39m[38;5;12mwith[39m[38;5;12m [39m[38;5;12mworkflows[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mtrigger[39m[38;5;12m [39m[38;5;12mnotifications[39m[38;5;12m [39m
|
||||
[38;5;12mdirectly[39m[38;5;12m [39m[38;5;12mfrom[39m[38;5;12m [39m[38;5;12myour[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mwarehouse.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mSuprSend[0m[38;5;12m [39m[38;5;12m(https://www.suprsend.com/products/workflows)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mCreate[39m[38;5;12m [39m[38;5;12mautomated[39m[38;5;12m [39m[38;5;12mworkflows[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mlogic[39m[38;5;12m [39m[38;5;12musing[39m[38;5;12m [39m[38;5;12mAPI's[39m[38;5;12m [39m[38;5;12mfor[39m[38;5;12m [39m[38;5;12myour[39m[38;5;12m [39m[38;5;12mnotification[39m[38;5;12m [39m[38;5;12mservice.[39m[38;5;12m [39m[38;5;12mAdd[39m[38;5;12m [39m[38;5;12mtemplates,[39m[38;5;12m [39m[38;5;12mbatching,[39m[38;5;12m [39m[38;5;12mpreferences,[39m[38;5;12m [39m[38;5;12minapp[39m[38;5;12m [39m[38;5;12minbox[39m[38;5;12m [39m[38;5;12mwith[39m[38;5;12m [39m[38;5;12mworkflows[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mtrigger[39m[38;5;12m [39m[38;5;12mnotifications[39m[38;5;12m [39m[38;5;12mdirectly[39m[38;5;12m [39m[38;5;12mfrom[39m[38;5;12m [39m
|
||||
[38;5;12myour[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mwarehouse.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mKestra[0m[38;5;12m (https://github.com/kestra-io/kestra) - A versatile open source orchestrator and scheduler built on Java, designed to handle a broad range of workflows with a language-agnostic, API-first architecture.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mMage[0m[38;5;12m (https://www.mage.ai) - Open-source data pipeline tool for transforming and integrating data.[39m
|
||||
|
||||
[38;2;255;187;0m[4mData Lake Management[0m
|
||||
|
||||
[38;5;12m- [39m[38;5;14m[1mlakeFS[0m[38;5;12m (https://github.com/treeverse/lakeFS) - lakeFS is an open source platform that delivers resilience and manageability to object-storage based data lakes.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mProject Nessie[0m[38;5;12m (https://github.com/projectnessie/nessie) - Project Nessie is a Transactional Catalog for Data Lakes with Git-like semantics. Works with Apache Iceberg tables.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mIlum[0m[38;5;12m (https://ilum.cloud/) - Ilum is a modular Data Lakehouse platform that simplifies the management and monitoring of Apache Spark clusters across Kubernetes and Hadoop environments.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mGravitino[0m[38;5;12m (https://github.com/apache/gravitino) - Gravitino is an open-source, unified metadata management for data lakes, data warehouses, and external catalogs. [39m
|
||||
|
||||
[38;2;255;187;0m[4mELK Elastic Logstash Kibana[0m
|
||||
|
||||
[38;5;12m- [39m[38;5;14m[1mdocker-logstash[0m[38;5;12m (https://github.com/pblittle/docker-logstash) - A highly configurable logstash (1.4.4) - docker image running Elasticsearch (1.7.0) - and Kibana (3.1.2).[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mdocker-logstash[0m[38;5;12m (https://github.com/pblittle/docker-logstash) - A highly configurable Logstash (1.4.4) - Docker image running Elasticsearch (1.7.0) - and Kibana (3.1.2).[39m
|
||||
[38;5;12m- [39m[38;5;14m[1melasticsearch-jdbc[0m[38;5;12m (https://github.com/jprante/elasticsearch-jdbc) - JDBC importer for Elasticsearch.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mZomboDB[0m[38;5;12m (https://github.com/zombodb/zombodb) - Postgres Extension that allows creating an index backed by Elasticsearch.[39m
|
||||
|
||||
[38;2;255;187;0m[4mDocker[0m
|
||||
|
||||
[38;5;12m- [39m[38;5;14m[1mGockerize[0m[38;5;12m (https://github.com/redbooth/gockerize) - Package golang service into minimal docker containers.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mGockerize[0m[38;5;12m (https://github.com/redbooth/gockerize) - Package golang service into minimal Docker containers.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mFlocker[0m[38;5;12m (https://github.com/ClusterHQ/flocker) - Easily manage Docker containers & their data.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mRancher[0m[38;5;12m (https://rancher.com/rancher-os/) - RancherOS is a 20mb Linux distro that runs the entire OS as Docker containers.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mKontena[0m[38;5;12m (https://www.kontena.io/) - Application Containers for Masses.[39m
|
||||
@@ -272,8 +288,8 @@
|
||||
[38;5;12m- [39m[38;5;14m[1mcAdvisor[0m[38;5;12m (https://github.com/google/cadvisor) - Analyzes resource usage and performance characteristics of running containers.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mMicro S3 persistence[0m[38;5;12m (https://github.com/figadore/micro-s3-persistence) - Docker microservice for saving/restoring volume data to S3.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mRocker-compose[0m[38;5;12m (https://github.com/grammarly/rocker-compose) - Docker composition tool with idempotency features for deploying apps composed of multiple containers. Deprecated.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mNomad[0m[38;5;12m (https://github.com/hashicorp/nomad) - Nomad is a cluster manager, designed for both long lived services and short lived batch processing workloads.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mImageLayers[0m[38;5;12m (https://imagelayers.io/) - Vizualize docker images and the layers that compose them.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mNomad[0m[38;5;12m (https://github.com/hashicorp/nomad) - Nomad is a cluster manager, designed for both long-lived services and short-lived batch processing workloads.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mImageLayers[0m[38;5;12m (https://imagelayers.io/) - Visualize Docker images and the layers that compose them.[39m
|
||||
|
||||
[38;2;255;187;0m[4mDatasets[0m
|
||||
|
||||
@@ -287,7 +303,7 @@
|
||||
|
||||
[38;5;12m- [39m[38;5;14m[1mGitHub Archive[0m[38;5;12m (https://www.gharchive.org/) - GitHub's public timeline since 2011, updated every hour.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mCommon Crawl[0m[38;5;12m (https://commoncrawl.org/) - Open source repository of web crawl data.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mWikipedia[0m[38;5;12m (https://dumps.wikimedia.org/enwiki/latest/) - Wikipedia's complete copy of all wikis, in the form of wikitext source and metadata embedded in XML. A number of raw database tables in SQL form are also available.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mWikipedia[0m[38;5;12m (https://dumps.wikimedia.org/enwiki/latest/) - Wikipedia's complete copy of all wikis, in the form of Wikitext source and metadata embedded in XML. A number of raw database tables in SQL form are also available.[39m
|
||||
|
||||
[38;2;255;187;0m[4mMonitoring[0m
|
||||
|
||||
@@ -304,15 +320,17 @@
|
||||
|
||||
[38;2;255;187;0m[4mTesting[0m
|
||||
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mGrai[0m[38;5;12m [39m[38;5;12m(https://github.com/grai-io/grai-core/)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mA[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mcatalog[39m[38;5;12m [39m[38;5;12mtool[39m[38;5;12m [39m[38;5;12mthat[39m[38;5;12m [39m[38;5;12mintegrates[39m[38;5;12m [39m[38;5;12minto[39m[38;5;12m [39m[38;5;12myour[39m[38;5;12m [39m[38;5;12mCI[39m[38;5;12m [39m[38;5;12msystem[39m[38;5;12m [39m[38;5;12mexposing[39m[38;5;12m [39m[38;5;12mdownstream[39m[38;5;12m [39m[38;5;12mimpact[39m[38;5;12m [39m[38;5;12mtesting[39m[38;5;12m [39m[38;5;12mof[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mchanges.[39m[38;5;12m [39m[38;5;12mThese[39m[38;5;12m [39m[38;5;12mtests[39m[38;5;12m [39m[38;5;12mprevent[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mchanges[39m[38;5;12m [39m[38;5;12mwhich[39m[38;5;12m [39m[38;5;12mmight[39m[38;5;12m [39m[38;5;12mbreak[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mpipelines[39m[38;5;12m [39m[38;5;12mor[39m[38;5;12m [39m[38;5;12mBI[39m[38;5;12m [39m
|
||||
[38;5;12mdashboards[39m[38;5;12m [39m[38;5;12mfrom[39m[38;5;12m [39m[38;5;12mmaking[39m[38;5;12m [39m[38;5;12mit[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mproduction.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mGrai[0m[38;5;12m [39m[38;5;12m(https://github.com/grai-io/grai-core/)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mA[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mcatalog[39m[38;5;12m [39m[38;5;12mtool[39m[38;5;12m [39m[38;5;12mthat[39m[38;5;12m [39m[38;5;12mintegrates[39m[38;5;12m [39m[38;5;12minto[39m[38;5;12m [39m[38;5;12myour[39m[38;5;12m [39m[38;5;12mCI[39m[38;5;12m [39m[38;5;12msystem[39m[38;5;12m [39m[38;5;12mexposing[39m[38;5;12m [39m[38;5;12mdownstream[39m[38;5;12m [39m[38;5;12mimpact[39m[38;5;12m [39m[38;5;12mtesting[39m[38;5;12m [39m[38;5;12mof[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mchanges.[39m[38;5;12m [39m[38;5;12mThese[39m[38;5;12m [39m[38;5;12mtests[39m[38;5;12m [39m[38;5;12mprevent[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mchanges[39m[38;5;12m [39m[38;5;12mwhich[39m[38;5;12m [39m[38;5;12mmight[39m[38;5;12m [39m[38;5;12mbreak[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mpipelines[39m[38;5;12m [39m[38;5;12mor[39m[38;5;12m [39m[38;5;12mBI[39m[38;5;12m [39m[38;5;12mdashboards[39m[38;5;12m [39m[38;5;12mfrom[39m[38;5;12m [39m
|
||||
[38;5;12mmaking[39m[38;5;12m [39m[38;5;12mit[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mproduction.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mDQOps[0m[38;5;12m (https://github.com/dqops/dqo) - An open-source data quality platform for the whole data platform lifecycle from profiling new data sources to applying full automation of data quality monitoring.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mDataKitchen[0m[38;5;12m (https://datakitchen.io/) - Open Source Data Observability for end-to-end Data Journey Observability, data profiling, anomaly detection, and auto-created data quality validation tests.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1mRunSQL[0m[38;5;12m (https://runsql.com/) - Free online SQL playground for MySQL, PostgreSQL, and SQL Server. Create database structures, run queries, and share results instantly.[39m
|
||||
|
||||
[38;2;255;187;0m[4mCommunity[0m
|
||||
|
||||
[38;2;255;187;0m[4mForums[0m
|
||||
|
||||
[38;5;12m- [39m[38;5;14m[1m/r/dataengineering[0m[38;5;12m (https://www.reddit.com/r/dataengineering/) - News, tips and background on Data Engineering.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1m/r/dataengineering[0m[38;5;12m (https://www.reddit.com/r/dataengineering/) - News, tips, and background on Data Engineering.[39m
|
||||
[38;5;12m- [39m[38;5;14m[1m/r/etl[0m[38;5;12m (https://www.reddit.com/r/ETL/) - Subreddit focused on ETL.[39m
|
||||
|
||||
[38;2;255;187;0m[4mConferences[0m
|
||||
@@ -322,5 +340,13 @@
|
||||
[38;2;255;187;0m[4mPodcasts[0m
|
||||
|
||||
[38;5;12m- [39m[38;5;14m[1mData Engineering Podcast[0m[38;5;12m (https://www.dataengineeringpodcast.com/) - The show about modern data infrastructure.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mThe[0m[38;5;14m[1m [0m[38;5;14m[1mData[0m[38;5;14m[1m [0m[38;5;14m[1mStack[0m[38;5;14m[1m [0m[38;5;14m[1mShow[0m[38;5;12m [39m[38;5;12m(https://datastackshow.com/)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mA[39m[38;5;12m [39m[38;5;12mshow[39m[38;5;12m [39m[38;5;12mwhere[39m[38;5;12m [39m[38;5;12mthey[39m[38;5;12m [39m[38;5;12mtalk[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mengineers,[39m[38;5;12m [39m[38;5;12manalysts,[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mscientists[39m[38;5;12m [39m[38;5;12mabout[39m[38;5;12m [39m[38;5;12mtheir[39m[38;5;12m [39m[38;5;12mexperience[39m[38;5;12m [39m[38;5;12maround[39m[38;5;12m [39m[38;5;12mbuilding[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mmaintaining[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12minfrastructure,[39m[38;5;12m [39m[38;5;12mdelivering[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mproducts,[39m[38;5;12m [39m
|
||||
[38;5;12mand[39m[38;5;12m [39m[38;5;12mdriving[39m[38;5;12m [39m[38;5;12mbetter[39m[38;5;12m [39m[38;5;12moutcomes[39m[38;5;12m [39m[38;5;12macross[39m[38;5;12m [39m[38;5;12mtheir[39m[38;5;12m [39m[38;5;12mbusinesses[39m[38;5;12m [39m[38;5;12mwith[39m[38;5;12m [39m[38;5;12mdata.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mThe[0m[38;5;14m[1m [0m[38;5;14m[1mData[0m[38;5;14m[1m [0m[38;5;14m[1mStack[0m[38;5;14m[1m [0m[38;5;14m[1mShow[0m[38;5;12m [39m[38;5;12m(https://datastackshow.com/)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mA[39m[38;5;12m [39m[38;5;12mshow[39m[38;5;12m [39m[38;5;12mwhere[39m[38;5;12m [39m[38;5;12mthey[39m[38;5;12m [39m[38;5;12mtalk[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mengineers,[39m[38;5;12m [39m[38;5;12manalysts,[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mscientists[39m[38;5;12m [39m[38;5;12mabout[39m[38;5;12m [39m[38;5;12mtheir[39m[38;5;12m [39m[38;5;12mexperience[39m[38;5;12m [39m[38;5;12maround[39m[38;5;12m [39m[38;5;12mbuilding[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mmaintaining[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12minfrastructure,[39m[38;5;12m [39m[38;5;12mdelivering[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mproducts,[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m
|
||||
[38;5;12mdriving[39m[38;5;12m [39m[38;5;12mbetter[39m[38;5;12m [39m[38;5;12moutcomes[39m[38;5;12m [39m[38;5;12macross[39m[38;5;12m [39m[38;5;12mtheir[39m[38;5;12m [39m[38;5;12mbusinesses[39m[38;5;12m [39m[38;5;12mwith[39m[38;5;12m [39m[38;5;12mdata.[39m
|
||||
|
||||
[38;2;255;187;0m[4mBooks[0m
|
||||
|
||||
[38;5;12m- [39m[38;5;14m[1mSnowflake Data Engineering[0m[38;5;12m (https://www.manning.com/books/snowflake-data-engineering) - A practical introduction to data engineering on the Snowflake cloud data platform.[39m
|
||||
[38;5;12m-[39m[38;5;12m [39m[38;5;14m[1mBest[0m[38;5;14m[1m [0m[38;5;14m[1mData[0m[38;5;14m[1m [0m[38;5;14m[1mScience[0m[38;5;14m[1m [0m[38;5;14m[1mBooks[0m[38;5;12m [39m[38;5;12m(https://www.appliedaicourse.com/blog/data-science-books/)[39m[38;5;12m [39m[38;5;12m-[39m[38;5;12m [39m[38;5;12mThis[39m[38;5;12m [39m[38;5;12mblog[39m[38;5;12m [39m[38;5;12moffers[39m[38;5;12m [39m[38;5;12ma[39m[38;5;12m [39m[38;5;12mcurated[39m[38;5;12m [39m[38;5;12mlist[39m[38;5;12m [39m[38;5;12mof[39m[38;5;12m [39m[38;5;12mtop[39m[38;5;12m [39m[38;5;12mdata[39m[38;5;12m [39m[38;5;12mscience[39m[38;5;12m [39m[38;5;12mbooks,[39m[38;5;12m [39m[38;5;12mcategorized[39m[38;5;12m [39m[38;5;12mby[39m[38;5;12m [39m[38;5;12mtopics[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m[38;5;12mlearning[39m[38;5;12m [39m[38;5;12mstages,[39m[38;5;12m [39m[38;5;12mto[39m[38;5;12m [39m[38;5;12maid[39m[38;5;12m [39m[38;5;12mreaders[39m[38;5;12m [39m[38;5;12min[39m[38;5;12m [39m[38;5;12mbuilding[39m[38;5;12m [39m[38;5;12mfoundational[39m[38;5;12m [39m[38;5;12mknowledge[39m[38;5;12m [39m[38;5;12mand[39m[38;5;12m [39m
|
||||
[38;5;12mstaying[39m[38;5;12m [39m[38;5;12mupdated[39m[38;5;12m [39m[38;5;12mwith[39m[38;5;12m [39m[38;5;12mindustry[39m[38;5;12m [39m[38;5;12mtrends.[39m
|
||||
|
||||
[38;5;12mdataengineering Github: https://github.com/igorbarinov/awesome-data-engineering[39m
|
||||
|
||||