Update render script and Makefile

This commit is contained in:
Jonas Zeunert
2024-04-22 21:54:39 +02:00
parent 2d63fe63cd
commit 4d0cd768f7
10975 changed files with 47095 additions and 4031084 deletions


@@ -1,7 +1,7 @@
 Awesome Hadoop !Awesome (https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg) (https://github.com/sindresorhus/awesome)
 Awesome Hadoop !Awesome (https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg) (https://github.com/sindresorhus/awesome)
A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources. Inspired by Awesome PHP (https://github.com/ziadoz/awesome-php), Awesome Python (https://github.com/vinta/awesome-python) and Awesome 
Sysadmin (https://github.com/kahun/awesome-sysadmin)
A curated list of amazingly awesome Hadoop and Hadoop ecosystem resources. Inspired by Awesome PHP (https://github.com/ziadoz/awesome-php), Awesome Python 
(https://github.com/vinta/awesome-python) and Awesome Sysadmin (https://github.com/kahun/awesome-sysadmin)
- Awesome Hadoop (#awesome-hadoop)
- **Hadoop** (#hadoop) 
@@ -44,17 +44,17 @@
⟡ hdfs-du (https://github.com/twitter/hdfs-du) - HDFS-DU is an interactive visualization of the Hadoop distributed file system. 
⟡ White Elephant (https://github.com/linkedin/white-elephant) - Hadoop log aggregator and dashboard
⟡ Genie (https://github.com/Netflix/genie) - Genie provides REST-ful APIs to run Hadoop, Hive and Pig jobs, and to manage multiple Hadoop resources and perform job submissions across them.
⟡ Apache Kylin (http://kylin.incubator.apache.org/) - Apache Kylin is an open source Distributed Analytics Engine from eBay Inc. that provides SQL interface and multi-dimensional analysis (OLAP) on Hadoop 
supporting extremely large datasets
⟡ Apache Kylin (http://kylin.incubator.apache.org/) - Apache Kylin is an open source Distributed Analytics Engine from eBay Inc. that provides SQL interface and multi-dimensional analysis 
(OLAP) on Hadoop supporting extremely large datasets
⟡ Crunch (https://github.com/jondot/crunch) - Go-based toolkit for ETL and feature extraction on Hadoop
⟡ Apache Ignite (http://ignite.apache.org/) - Distributed in-memory platform
YARN
⟡ Apache Slider (http://slider.incubator.apache.org/) - Apache Slider is a project in incubation at the Apache Software Foundation with the goal of making it possible and easy to deploy existing applications 
onto a YARN cluster.
⟡ Apache Twill (http://twill.incubator.apache.org/) - Apache Twill is an abstraction over Apache Hadoop® YARN that reduces the complexity of developing distributed applications, allowing developers to focus more
on their application logic.
⟡ Apache Slider (http://slider.incubator.apache.org/) - Apache Slider is a project in incubation at the Apache Software Foundation with the goal of making it possible and easy to deploy 
existing applications onto a YARN cluster.
⟡ Apache Twill (http://twill.incubator.apache.org/) - Apache Twill is an abstraction over Apache Hadoop® YARN that reduces the complexity of developing distributed applications, allowing 
developers to focus more on their application logic.
⟡ mpich2-yarn (https://github.com/alibaba/mpich2-yarn) - Running MPICH2 on Yarn
NoSQL
@@ -75,11 +75,11 @@
⟡ Apache Hive (http://hive.apache.org) - The Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL
⟡ Apache Phoenix (http://phoenix.apache.org) A SQL skin over HBase supporting secondary indices
⟡ Apache HAWQ (incubating)
 (http://hawq.incubator.apache.org/) - Apache HAWQ is a Hadoop native SQL query engine that combines the key technological advantages of MPP database with the scalability and convenience of Hadoop
⟡ Apache HAWQ (incubating) (http://hawq.incubator.apache.org/) - Apache HAWQ is a Hadoop native SQL query engine that combines the key technological advantages of MPP database with the 
scalability and convenience of Hadoop
⟡ Lingual (http://www.cascading.org/projects/lingual/) - SQL interface for Cascading (MR/Tez job generator)
⟡ Apache Impala (https://impala.apache.org/) - Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala has been 
described as the open-source equivalent of Google F1, which inspired its development in 2012.
⟡ Apache Impala (https://impala.apache.org/) - Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache 
Hadoop. Impala has been described as the open-source equivalent of Google F1, which inspired its development in 2012.
⟡ Presto (https://prestodb.io/) - Distributed SQL Query Engine for Big Data. Open sourced by Facebook.
⟡ Apache Tajo (http://tajo.apache.org/) - Data warehouse system for Apache Hadoop
⟡ Apache Drill (https://drill.apache.org/) - Schema-free SQL Query Engine
@@ -89,8 +89,8 @@
⟡ Apache Calcite (http://calcite.apache.org/) - A Dynamic Data Management Framework
⟡ Apache Atlas (http://atlas.incubator.apache.org/) - Metadata tagging & lineage capture supporting complex business data taxonomies
⟡ Apache Kudu (https://kudu.apache.org/) - Kudu provides a combination of fast inserts/updates and efficient columnar scans to enable multiple real-time analytic workloads across a single storage layer, 
complementing HDFS and Apache HBase.
⟡ Apache Kudu (https://kudu.apache.org/) - Kudu provides a combination of fast inserts/updates and efficient columnar scans to enable multiple real-time analytic workloads across a single 
storage layer, complementing HDFS and Apache HBase.
⟡ Confluent Schema registry for Kafka
 (https://github.com/confluentinc/schema-registry) - Schema Registry provides a serving layer for your metadata. It provides a RESTful interface for storing and retrieving Avro schemas.
⟡ Hortonworks Schema Registry (https://github.com/hortonworks/registry) - Schema Registry is a framework to build metadata repositories.
@@ -120,7 +120,8 @@
⟡ packetpig (https://github.com/packetloop/packetpig) - Open Source Big Data Security Analytics
⟡ akela (https://github.com/mozilla-metrics/akela) - Mozilla's utility library for Hadoop, HBase, Pig, etc.
⟡ seqpig (http://seqpig.sourceforge.net/) - Simple and scalable scripting for large sequencing data sets (e.g., bioinformatics) in Hadoop 
⟡ Lipstick (https://github.com/Netflix/Lipstick) - Pig workflow visualization tool. Introducing Lipstick on A(pache) Pig (http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html)
⟡ Lipstick (https://github.com/Netflix/Lipstick) - Pig workflow visualization tool. Introducing Lipstick on A(pache) Pig 
(http://techblog.netflix.com/2013/06/introducing-lipstick-on-apache-pig.html)
⟡ PigPen (https://github.com/Netflix/PigPen) - PigPen is map-reduce for Clojure, or distributed Clojure. It compiles to Apache Pig, but you don't need to know much about Pig to use it.
Libraries and Tools
@@ -136,11 +137,11 @@
⟡ hdfs - A native go client for HDFS (https://github.com/colinmarc/hdfs)
⟡ Oozie Eclipse Plugin (https://marketplace.eclipse.org/content/oozie-eclipse-plugin) - A graphical editor for editing Apache Oozie workflows inside Eclipse.
⟡ snakebite (https://pypi.python.org/pypi/snakebite/) - A pure python HDFS client
⟡ Apache Parquet (https://parquet.apache.org/) - Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or
programming language.
⟡ Apache Parquet (https://parquet.apache.org/) - Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing 
framework, data model or programming language.
⟡ Apache Superset (incubating) (https://superset.incubator.apache.org/) - Apache Superset (incubating) is a modern, enterprise-ready business intelligence web application
⟡ Schema Registry UI
 (https://github.com/Landoop/schema-registry-ui) - Web tool for the Confluent Schema Registry in order to create / view / search / evolve / view history & configure Avro schemas of your Kafka cluster.
⟡ Schema Registry UI (https://github.com/Landoop/schema-registry-ui) - Web tool for the Confluent Schema Registry in order to create / view / search / evolve / view history & configure Avro 
schemas of your Kafka cluster.
Realtime Data Processing
@@ -148,8 +149,8 @@
⟡ Apache Samza (http://samza.apache.org/)
⟡ Apache Spark (http://spark.apache.org/streaming/)
⟡ Apache Flink (https://flink.apache.org) - Apache Flink is a platform for efficient, distributed, general-purpose data processing. It supports exactly once stream processing.
⟡ Apache Pulsar (incubating) (http://pulsar.incubator.apache.org/) - Apache Pulsar (incubating) is a highly scalable, low latency messaging platform running on commodity hardware. It provides simple pub-sub 
semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication.
⟡ Apache Pulsar (incubating) (http://pulsar.incubator.apache.org/) - Apache Pulsar (incubating) is a highly scalable, low latency messaging platform running on commodity hardware. It provides
simple pub-sub semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication.
⟡ Apache Druid (incubating) (http://druid.incubator.apache.org/) - A high-performance, column-oriented, distributed data store.
Distributed Computing and Programming
@@ -161,8 +162,8 @@
⟡ Cascading (http://www.cascading.org/) - Cascading is the proven application development platform for building data applications on Hadoop.
⟡ Apache Flink (http://flink.apache.org/) - Apache Flink is a platform for efficient, distributed, general-purpose data processing.
⟡ Apache Apex (incubating) (http://apex.incubator.apache.org/) - Enterprise-grade unified stream and batch processing engine.
⟡ Apache Livy (incubating) (https://livy.incubator.apache.org/) - Apache Livy (incubating) is a web service that exposes a REST interface for managing long-running Apache Spark contexts in your cluster. With Livy,
new applications can be built on top of Apache Spark that require fine-grained interaction with many Spark contexts.
⟡ Apache Livy (incubating) (https://livy.incubator.apache.org/) - Apache Livy (incubating) is a web service that exposes a REST interface for managing long-running Apache Spark contexts in your
cluster. With Livy, new applications can be built on top of Apache Spark that require fine-grained interaction with many Spark contexts.
Packaging, Provisioning and Monitoring
@@ -196,8 +197,8 @@
⟡ Big Data Benchmark (https://amplab.cs.berkeley.edu/benchmark/)
⟡ HiBench (https://github.com/intel-hadoop/HiBench)
⟡ YCSB (https://github.com/brianfrankcooper/YCSB) - The Yahoo! Cloud Serving Benchmark (YCSB) is an open-source specification and program suite for evaluating retrieval and maintenance capabilities of computer 
programs. It is often used to compare relative performance of NoSQL database management systems.
⟡ YCSB (https://github.com/brianfrankcooper/YCSB) - The Yahoo! Cloud Serving Benchmark (YCSB) is an open-source specification and program suite for evaluating retrieval and maintenance 
capabilities of computer programs. It is often used to compare relative performance of NoSQL database management systems.
Machine learning and Big Data analytics
@@ -208,8 +209,8 @@
⟡ RHadoop (https://github.com/RevolutionAnalytics/RHadoop/wiki) including RHDFS, RHBase, RMR2, plyrmr
⟡ Apache Lens (http://lens.apache.org/)
⟡ Apache SINGA (incubating) (https://singa.incubator.apache.org/) - SINGA is a general distributed deep learning platform for training big deep learning models over large datasets
⟡ BigDL (https://bigdl-project.github.io/) - BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark programs, which can 
directly run on top of existing Spark or Hadoop clusters.
⟡ BigDL (https://bigdl-project.github.io/) - BigDL is a distributed deep learning library for Apache Spark; with BigDL, users can write their deep learning applications as standard Spark 
programs, which can directly run on top of existing Spark or Hadoop clusters.
⟡ Apache Hivemall (incubating) (http://hivemall.incubator.apache.org/) - Apache Hivemall is a scalable machine learning library that runs on Apache Hive, Spark and Pig.
Misc.
@@ -247,7 +248,7 @@
  ⟡ Flume UDP Source (https://github.com/whitepages/flume-udp-source)
  ⟡ .Net FlumeNG Clients (https://github.com/marksl/DotNetFlumeNG.Clients)
 Resources
 Resources
Various resources, such as books, websites and articles.
Websites
@@ -284,5 +285,5 @@
⟡ DataWorks Summit (https://dataworkssummit.com/)
⟡ Spark Summit (https://databricks.com/sparkaisummit)
 Other Awesome Lists
 Other Awesome Lists
Other amazingly awesome lists can be found in the awesome-awesomeness (https://github.com/bayandin/awesome-awesomeness) and awesome (https://github.com/sindresorhus/awesome) list.