<p><a
href="https://spark.apache.org/"><img src="https://cdn.rawgit.com/awesome-spark/awesome-spark/f78a16db/spark-logo-trademark.svg" align="right"></a></p>
<h1 id="awesome-spark-awesome">Awesome Spark <a
href="https://github.com/sindresorhus/awesome"><img
src="https://cdn.rawgit.com/sindresorhus/awesome/d7305f38d29fed78fa85652e3a63e154dd8e8829/media/badge.svg"
alt="Awesome" /></a></h1>
<p>A curated list of awesome <a href="https://spark.apache.org/">Apache
Spark</a> packages and resources.</p>
<p><em>Apache Spark is an open-source cluster-computing framework.
Originally developed at the <a
href="https://www.universityofcalifornia.edu/">University of
California</a>, <a href="https://amplab.cs.berkeley.edu/">Berkeley's
AMPLab</a>, the Spark codebase was later donated to the <a
href="https://www.apache.org/">Apache Software Foundation</a>, which has
maintained it since. Spark provides an interface for programming entire
clusters with implicit data parallelism and fault-tolerance</em> (<a
href="#wikipedia-2017">Wikipedia 2017</a>).</p>
<p>Users of Apache Spark can choose among the Python, R, Scala, and
Java programming languages to interface with the Apache Spark
APIs.</p>
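<p>As a minimal illustration, the PySpark sketch below creates a
session and runs a simple DataFrame transformation; the application
name and sample data are placeholders, and equivalent programs can be
written in R, Scala, or Java.</p>
<pre><code># Minimal PySpark sketch (illustrative values only).
from pyspark.sql import SparkSession

# Create or reuse a session; the entry point to the DataFrame API.
spark = SparkSession.builder.appName("awesome-spark-example").getOrCreate()

# Build a small DataFrame and apply a simple transformation.
df = spark.createDataFrame([(1, "spark"), (2, "flink")], ["id", "name"])
df.filter(df.id == 1).show()

spark.stop()
</code></pre>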
<h2 id="packages">Packages</h2>
<h3 id="language-bindings">Language Bindings</h3>
<ul>
<li><a href="https://github.com/Kotlin/kotlin-spark-api">Kotlin for
Apache Spark</a>
<img src="https://img.shields.io/github/last-commit/Kotlin/kotlin-spark-api.svg">
- Kotlin API bindings and extensions.</li>
<li><a href="https://github.com/dotnet/spark">.NET for Apache Spark</a>
<img src="https://img.shields.io/github/last-commit/dotnet/spark.svg"> -
.NET bindings.</li>
<li><a href="https://github.com/rstudio/sparklyr">sparklyr</a>
<img src="https://img.shields.io/github/last-commit/rstudio/sparklyr.svg">
- An alternative R backend, using <a
href="https://github.com/hadley/dplyr"><code>dplyr</code></a>.</li>
<li><a href="https://github.com/tweag/sparkle">sparkle</a>
<img src="https://img.shields.io/github/last-commit/tweag/sparkle.svg">
- Haskell on Apache Spark.</li>
<li><a
href="https://github.com/sjrusso8/spark-connect-rs">spark-connect-rs</a>
<img src="https://img.shields.io/github/last-commit/sjrusso8/spark-connect-rs.svg">
- Rust bindings.</li>
<li><a
href="https://github.com/apache/spark-connect-go">spark-connect-go</a>
<img src="https://img.shields.io/github/last-commit/apache/spark-connect-go.svg">
- Golang bindings.</li>
<li><a
href="https://github.com/mdrakiburrahman/spark-connect-csharp">spark-connect-csharp</a>
<img src="https://img.shields.io/github/last-commit/mdrakiburrahman/spark-connect-csharp.svg">
- C# bindings.</li>
</ul>
<h3 id="notebooks-and-ides">Notebooks and IDEs</h3>
<ul>
<li><a href="https://almond.sh/">almond</a>
<img src="https://img.shields.io/github/last-commit/almond-sh/almond.svg">
- A Scala kernel for <a href="https://jupyter.org/">Jupyter</a>.</li>
<li><a href="https://zeppelin.incubator.apache.org/">Apache Zeppelin</a>
<img src="https://img.shields.io/github/last-commit/apache/zeppelin.svg">
- Web-based notebook that enables interactive data analytics with
pluggable backends, integrated plotting, and extensive Spark support
out-of-the-box.</li>
<li><a href="https://polynote.org/">Polynote</a>
<img src="https://img.shields.io/github/last-commit/polynote/polynote.svg">
- An IDE-inspired polyglot notebook that supports mixing multiple
languages in one notebook and sharing data between them seamlessly. It
encourages reproducible notebooks with its immutable data model.
Originating from <a
href="https://medium.com/netflix-techblog/open-sourcing-polynote-an-ide-inspired-polyglot-notebook-7f929d3f447">Netflix</a>.</li>
<li><a
href="https://github.com/jupyter-incubator/sparkmagic">sparkmagic</a>
<img src="https://img.shields.io/github/last-commit/jupyter-incubator/sparkmagic.svg">
- <a href="https://jupyter.org/">Jupyter</a> magics and kernels for
working with remote Spark clusters, for interactively working with
remote Spark clusters through <a
href="https://github.com/cloudera/livy">Livy</a>, in Jupyter
notebooks.</li>
</ul>
<h3 id="general-purpose-libraries">General Purpose Libraries</h3>
<ul>
<li><a href="https://github.com/yaooqinn/itachi">itachi</a>
<img src="https://img.shields.io/github/last-commit/yaooqinn/itachi.svg">
- A library that brings useful functions from modern database management
systems to Apache Spark.</li>
<li><a href="https://github.com/mrpowers-io/spark-daria">spark-daria</a>
<img src="https://img.shields.io/github/last-commit/mrpowers-io/spark-daria.svg">
- A Scala library with essential Spark functions and extensions to make
you more productive.</li>
<li><a href="https://github.com/mrpowers-io/quinn">quinn</a>
<img src="https://img.shields.io/github/last-commit/mrpowers-io/quinn.svg">
- A native PySpark implementation of spark-daria.</li>
<li><a
href="https://github.com/apache/datafu/tree/master/datafu-spark">Apache
DataFu</a>
<img src="https://img.shields.io/github/last-commit/apache/datafu.svg">
- A library of general purpose functions and UDFs.</li>
<li><a href="https://github.com/joblib/joblib-spark">Joblib Apache Spark
Backend</a>
<img src="https://img.shields.io/github/last-commit/joblib/joblib-spark.svg">
- <a href="https://github.com/joblib/joblib"><code>joblib</code></a>
backend for running tasks on Spark clusters.</li>
</ul>
<h3 id="sql-data-sources">SQL Data Sources</h3>
<p>SparkSQL has <a
href="https://spark.apache.org/docs/latest/sql-data-sources-load-save-functions.html#manually-specifying-options">serveral
built-in Data Sources</a> for files. These include <code>csv</code>,
<code>json</code>, <code>parquet</code>, <code>orc</code>, and
<code>avro</code>. It also supports JDBC databases as well as Apache
Hive. Additional data sources can be added by including the packages
listed below, or writing your own.</p>
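<p>As a rough sketch of how these sources are used (the file paths
below are hypothetical), the built-in readers and third-party packages
all plug into the same <code>spark.read</code> interface:</p>
<pre><code># Sketch of the unified data source API (hypothetical file paths).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-sources").getOrCreate()

# Built-in file sources have dedicated reader methods.
df_json = spark.read.json("events.json")

# The generic form names the format and passes options explicitly.
df_csv = (spark.read.format("csv")
          .option("header", "true")
          .option("inferSchema", "true")
          .load("events.csv"))

# A third-party source such as spark-xml uses the same API once its
# package is on the classpath (format name and options per its docs).
# df_xml = spark.read.format("xml").option("rowTag", "record").load("events.xml")
</code></pre>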
<ul>
<li><a href="https://github.com/databricks/spark-xml">Spark XML</a>
<img src="https://img.shields.io/github/last-commit/databricks/spark-xml.svg">
- XML parser and writer.</li>
<li><a
href="https://github.com/datastax/spark-cassandra-connector">Spark
Cassandra Connector</a>
<img src="https://img.shields.io/github/last-commit/datastax/spark-cassandra-connector.svg">
- Cassandra support, including a data source, an API, and support for
arbitrary queries.</li>
<li><a href="https://github.com/mongodb/mongo-spark">Mongo-Spark</a>
<img src="https://img.shields.io/github/last-commit/mongodb/mongo-spark.svg">
- Official MongoDB connector.</li>
</ul>
<h3 id="storage">Storage</h3>
<ul>
<li><a href="https://github.com/delta-io/delta">Delta Lake</a>
<img src="https://img.shields.io/github/last-commit/delta-io/delta.svg">
- Storage layer with ACID transactions.</li>
<li><a href="https://github.com/apache/hudi">Apache Hudi</a>
<img src="https://img.shields.io/github/last-commit/apache/hudi.svg"> -
Upserts, Deletes And Incremental Processing on Big Data..</li>
<li><a href="https://github.com/apache/iceberg">Apache Iceberg</a>
<img src="https://img.shields.io/github/last-commit/apache/iceberg.svg">
- Upserts, Deletes And Incremental Processing on Big Data..</li>
<li><a href="https://docs.lakefs.io/integrations/spark.html">lakeFS</a>
<img src="https://img.shields.io/github/last-commit/treeverse/lakefs.svg">
- Integration with the lakeFS atomic versioned storage layer.</li>
</ul>
<h3 id="bioinformatics">Bioinformatics</h3>
<ul>
<li><a href="https://github.com/bigdatagenomics/adam">ADAM</a>
<img src="https://img.shields.io/github/last-commit/bigdatagenomics/adam.svg">
- Set of tools designed to analyse genomics data.</li>
<li><a href="https://github.com/hail-is/hail">Hail</a>
<img src="https://img.shields.io/github/last-commit/hail-is/hail.svg"> -
Genetic analysis framework.</li>
</ul>
<h3 id="gis">GIS</h3>
<ul>
<li><a href="https://github.com/apache/incubator-sedona">Apache
Sedona</a>
<img src="https://img.shields.io/github/last-commit/apache/incubator-sedona.svg">
- Cluster computing system for processing large-scale spatial data.</li>
</ul>
<h3 id="graph-processing">Graph Processing</h3>
<ul>
<li><a href="https://github.com/graphframes/graphframes">GraphFrames</a>
<img src="https://img.shields.io/github/last-commit/graphframes/graphframes.svg">
- Data frame based graph API.</li>
<li><a
href="https://github.com/neo4j-contrib/neo4j-spark-connector">neo4j-spark-connector</a>
<img src="https://img.shields.io/github/last-commit/neo4j-contrib/neo4j-spark-connector.svg">
- Bolt protocol based, Neo4j Connector with RDD, DataFrame and GraphX /
GraphFrames support.</li>
</ul>
<h3 id="machine-learning-extension">Machine Learning Extension</h3>
<ul>
<li><a href="https://systemml.apache.org/">Apache SystemML</a>
<img src="https://img.shields.io/github/last-commit/apache/systemml.svg">
- Declarative machine learning framework on top of Spark.</li>
<li><a
href="https://mahout.apache.org/users/sparkbindings/home.html">Mahout
Spark Bindings</a> [status unknown] - Linear algebra DSL and optimizer
with R-like syntax.</li>
<li><a href="http://keystone-ml.org/">KeystoneML</a> - Type safe machine
learning pipelines with RDDs.</li>
<li><a href="https://github.com/jpmml/jpmml-spark">JPMML-Spark</a>
<img src="https://img.shields.io/github/last-commit/jpmml/jpmml-spark.svg">
- PMML transformer library for Spark ML.</li>
<li><a href="https://mitdbg.github.io/modeldb">ModelDB</a>
<img src="https://img.shields.io/github/last-commit/mitdbg/modeldb.svg">
- A system to manage machine learning models for <code>spark.ml</code>
and <a
href="https://github.com/scikit-learn/scikit-learn"><code>scikit-learn</code></a>
<img src="https://img.shields.io/github/last-commit/scikit-learn/scikit-learn.svg">.</li>
<li><a href="https://github.com/h2oai/sparkling-water">Sparkling
Water</a>
<img src="https://img.shields.io/github/last-commit/h2oai/sparkling-water.svg">
- <a href="http://www.h2o.ai/">H2O</a> interoperability layer.</li>
<li><a href="https://github.com/intel-analytics/BigDL">BigDL</a>
<img src="https://img.shields.io/github/last-commit/intel-analytics/BigDL.svg">
- Distributed Deep Learning library.</li>
<li><a href="https://github.com/combust/mleap">MLeap</a>
<img src="https://img.shields.io/github/last-commit/combust/mleap.svg">
- Execution engine and serialization format which supports deployment of
<code>o.a.s.ml</code> models without dependency on
<code>SparkSession</code>.</li>
<li><a href="https://github.com/Azure/mmlspark">Microsoft ML for Apache
Spark</a>
<img src="https://img.shields.io/github/last-commit/Azure/mmlspark.svg">
- A distributed machine learning library with support for LightGBM,
Vowpal Wabbit, OpenCV, deep learning, Cognitive Services, and model
deployment.</li>
<li><a
href="https://mlflow.org/docs/latest/python_api/mlflow.spark.html#module-mlflow.spark">MLflow</a>
<img src="https://img.shields.io/github/last-commit/mlflow/mlflow.svg">
- Machine learning orchestration platform.</li>
</ul>
<h3 id="middleware">Middleware</h3>
<ul>
<li><a href="https://github.com/apache/incubator-livy">Livy</a>
<img src="https://img.shields.io/github/last-commit/apache/incubator-livy.svg">
- REST server with extensive language support (Python, R, Scala),
ability to maintain interactive sessions and object sharing.</li>
<li><a
href="https://github.com/spark-jobserver/spark-jobserver">spark-jobserver</a>
<img src="https://img.shields.io/github/last-commit/spark-jobserver/spark-jobserver.svg">
- Simple Spark-as-a-Service which supports object sharing using
so-called named objects. JVM only.</li>
<li><a href="https://github.com/apache/incubator-toree">Apache Toree</a>
<img src="https://img.shields.io/github/last-commit/apache/incubator-toree.svg">
- IPython protocol based middleware for interactive applications.</li>
<li><a href="https://github.com/apache/kyuubi">Apache Kyuubi</a>
<img src="https://img.shields.io/github/last-commit/apache/kyuubi.svg">
- A distributed multi-tenant JDBC server for large-scale data processing
and analytics, built on top of Apache Spark.</li>
</ul>
<h3 id="monitoring">Monitoring</h3>
<ul>
<li><a href="https://github.com/datamechanics/delight">Data Mechanics
Delight</a>
<img src="https://img.shields.io/github/last-commit/datamechanics/delight.svg">
- Cross-platform monitoring tool (Spark UI / Spark History Server
replacement).</li>
</ul>
<h3 id="utilities">Utilities</h3>
<ul>
<li><a href="https://github.com/Tubular/sparkly">sparkly</a>
<img src="https://img.shields.io/github/last-commit/Tubular/sparkly.svg">
- Helpers &amp; syntactic sugar for PySpark.</li>
<li><a href="https://github.com/nchammas/flintrock">Flintrock</a>
<img src="https://img.shields.io/github/last-commit/nchammas/flintrock.svg">
- A command-line tool for launching Spark clusters on EC2.</li>
<li><a href="https://github.com/ironmussa/Optimus/">Optimus</a>
<img src="https://img.shields.io/github/last-commit/ironmussa/Optimus.svg">
- Data cleansing and exploration utilities aimed at simplifying data
cleaning.</li>
</ul>
<h3 id="natural-language-processing">Natural Language Processing</h3>
<ul>
<li><a href="https://github.com/JohnSnowLabs/spark-nlp">spark-nlp</a>
<img src="https://img.shields.io/github/last-commit/JohnSnowLabs/spark-nlp.svg">
- Natural language processing library built on top of Apache Spark
ML.</li>
</ul>
<h3 id="streaming">Streaming</h3>
<ul>
<li><a href="https://bahir.apache.org/">Apache Bahir</a>
<img src="https://img.shields.io/github/last-commit/apache/bahir.svg"> -
Collection of streaming connectors excluded from Spark 2.0 (Akka, MQTT,
Twitter, ZeroMQ).</li>
</ul>
<h3 id="interfaces">Interfaces</h3>
<ul>
<li><a href="https://beam.apache.org/">Apache Beam</a>
<img src="https://img.shields.io/github/last-commit/apache/beam.svg"> -
Unified data processing engine supporting both batch and streaming
applications. Apache Spark is one of the supported execution
environments.</li>
<li><a href="https://github.com/databricks/koalas">Koalas</a>
<img src="https://img.shields.io/github/last-commit/databricks/koalas.svg">
- Pandas DataFrame API on top of Apache Spark.</li>
</ul>
<h3 id="data-quality">Data quality</h3>
<ul>
<li><a href="https://github.com/awslabs/deequ">deequ</a>
<img src="https://img.shields.io/github/last-commit/awslabs/deequ.svg">
- A library built on top of Apache Spark for defining “unit tests for
data” that measure data quality in large datasets (see the sketch after
this list).</li>
<li><a href="https://github.com/awslabs/python-deequ">python-deequ</a>
<img src="https://img.shields.io/github/last-commit/awslabs/python-deequ.svg">
- Python API for Deequ.</li>
</ul>
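<p>A rough sketch of the “unit tests for data” idea through the PyDeequ
API follows; the column names, data, and thresholds are made up, and
the builder methods are assumptions based on the project's published
examples rather than a verified reference.</p>
<pre><code># Hedged sketch of "unit tests for data" with PyDeequ (assumed API).
from pyspark.sql import SparkSession, Row
import pydeequ
from pydeequ.checks import Check, CheckLevel
from pydeequ.verification import VerificationSuite, VerificationResult

# PyDeequ needs the Deequ JAR on the Spark classpath.
spark = (SparkSession.builder
         .config("spark.jars.packages", pydeequ.deequ_maven_coord)
         .config("spark.jars.excludes", pydeequ.f2j_maven_coord)
         .getOrCreate())

df = spark.createDataFrame([Row(id=1, name="a"), Row(id=2, name=None)])

# Declare constraints ("unit tests for data") and run them over the DataFrame.
check = Check(spark, CheckLevel.Error, "basic checks")
result = (VerificationSuite(spark)
          .onData(df)
          .addCheck(check.hasSize(lambda n: n >= 2)
                         .isComplete("name")
                         .isUnique("id"))
          .run())

# Inspect which constraints passed or failed.
VerificationResult.checkResultsAsDataFrame(spark, result).show()
</code></pre>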
<h3 id="testing">Testing</h3>
<ul>
<li><a
href="https://github.com/holdenk/spark-testing-base">spark-testing-base</a>
<img src="https://img.shields.io/github/last-commit/holdenk/spark-testing-base.svg">
- Collection of base test classes.</li>
<li><a
href="https://github.com/mrpowers-io/spark-fast-tests">spark-fast-tests</a>
<img src="https://img.shields.io/github/last-commit/mrpowers-io/spark-fast-tests.svg">
- A lightweight and fast testing framework.</li>
<li><a href="https://github.com/MrPowers/chispa">chispa</a>
<img src="https://img.shields.io/github/last-commit/MrPowers/chispa.svg">
- PySpark test helpers with beautiful error messages.</li>
</ul>
<h3 id="web-archives">Web Archives</h3>
<ul>
<li><a href="https://github.com/archivesunleashed/aut">Archives
Unleashed Toolkit</a>
<img src="https://img.shields.io/github/last-commit/archivesunleashed/aut.svg">
- Open-source toolkit for analyzing web archives.</li>
</ul>
<h3 id="workflow-management">Workflow Management</h3>
<ul>
<li><a
href="https://github.com/broadinstitute/cromwell#spark-backend">Cromwell</a>
<img src="https://img.shields.io/github/last-commit/broadinstitute/cromwell.svg">
- Workflow management system with <a
href="https://github.com/broadinstitute/cromwell#spark-backend">Spark
backend</a>.</li>
</ul>
<h2 id="resources">Resources</h2>
<h3 id="books">Books</h3>
<ul>
<li><a
href="https://www.oreilly.com/library/view/learning-spark-2nd/9781492050032/">Learning
Spark, 2nd Edition</a> - Introduction to the Spark API, with Spark 3.0
covered. A good source of knowledge about basic concepts.</li>
<li><a href="http://shop.oreilly.com/product/0636920035091.do">Advanced
Analytics with Spark</a> - Useful collection of Spark processing
patterns. Accompanying GitHub repository: <a
href="https://github.com/sryza/aas">sryza/aas</a>.</li>
<li><a
href="https://jaceklaskowski.gitbooks.io/mastering-apache-spark/">Mastering
Apache Spark</a> - Interesting compilation of notes by <a
href="https://github.com/jaceklaskowski">Jacek Laskowski</a>. Focused on
different aspects of Spark internals.</li>
<li><a href="https://www.manning.com/books/spark-in-action">Spark in
Action</a> - A book in Manning's “in action” family with 400+ pages.
Starts gently, step by step, and covers a large number of topics. A
free excerpt shows how to <a
href="http://freecontent.manning.com/how-to-start-developing-spark-applications-in-eclipse/">set
up Eclipse for Spark application development</a> and how to bootstrap a
new application using the provided Maven archetype. The accompanying
GitHub repo is <a
href="https://github.com/spark-in-action/first-edition">here</a>.</li>
</ul>
<h3 id="papers">Papers</h3>
<ul>
<li><a href="https://arxiv.org/pdf/2009.08044.pdf">Large-Scale
Intelligent Microservices</a> - Microsoft paper that presents an Apache
Spark-based micro-service orchestration framework that extends database
operations to include web service primitives.</li>
<li><a
href="https://people.csail.mit.edu/matei/papers/2012/nsdi_spark.pdf">Resilient
Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster
Computing</a> - Paper introducing a core distributed memory
abstraction.</li>
<li><a
href="https://amplab.cs.berkeley.edu/wp-content/uploads/2015/03/SparkSQLSigmod2015.pdf">Spark
SQL: Relational Data Processing in Spark</a> - Paper introducing
relational underpinnings, code generation and Catalyst optimizer.</li>
<li><a
href="https://cs.stanford.edu/~matei/papers/2018/sigmod_structured_streaming.pdf">Structured
Streaming: A Declarative API for Real-Time Applications in Apache
Spark</a> - Introduces Structured Streaming, a high-level declarative
streaming API based on automatically incrementalizing a static
relational query.</li>
</ul>
<h3 id="moocs">MOOCS</h3>
<ul>
<li><a
href="https://www.edx.org/xseries/data-science-engineering-apache-spark">Data
Science and Engineering with Apache Spark (edX XSeries)</a> - Series of
five courses (<a
href="https://www.edx.org/course/introduction-apache-spark-uc-berkeleyx-cs105x">Introduction
to Apache Spark</a>, <a
href="https://www.edx.org/course/distributed-machine-learning-apache-uc-berkeleyx-cs120x">Distributed
Machine Learning with Apache Spark</a>, <a
href="https://www.edx.org/course/big-data-analysis-apache-spark-uc-berkeleyx-cs110x">Big
Data Analysis with Apache Spark</a>, <a
href="https://www.edx.org/course/advanced-apache-spark-data-science-data-uc-berkeleyx-cs115x">Advanced
Apache Spark for Data Science and Data Engineering</a>, <a
href="https://www.edx.org/course/advanced-distributed-machine-learning-uc-berkeleyx-cs125x">Advanced
Distributed Machine Learning with Apache Spark</a>) covering different
aspects of software engineering and data science. Python-oriented.</li>
<li><a href="https://www.coursera.org/learn/big-data-analysys">Big Data
Analysis with Scala and Spark (Coursera)</a> - Scala-oriented
introductory course. Part of <a
href="https://www.coursera.org/specializations/scala">Functional
Programming in Scala Specialization</a>.</li>
</ul>
<h3 id="workshops">Workshops</h3>
<ul>
<li><a href="http://ampcamp.berkeley.edu">AMP Camp</a> - Periodical
training event organized by the <a
href="https://amplab.cs.berkeley.edu/">UC Berkeley AMPLab</a>. A source
of useful exercises and recorded workshops covering different tools from
the <a href="https://amplab.cs.berkeley.edu/software/">Berkeley Data
Analytics Stack</a>.</li>
</ul>
<h3 id="projects-using-spark">Projects Using Spark</h3>
<ul>
<li><a href="https://github.com/OryxProject/oryx">Oryx 2</a> - <a
href="http://lambda-architecture.net/">Lambda architecture</a> platform
built on Apache Spark and <a href="http://kafka.apache.org/">Apache
Kafka</a> with specialization for real-time large scale machine
learning.</li>
<li><a href="https://github.com/linkedin/photon-ml">Photon ML</a> - A
machine learning library supporting classical Generalized Mixed Model
and Generalized Additive Mixed Effect Model.</li>
<li><a href="https://prediction.io/">PredictionIO</a> - Machine Learning
server for developers and data scientists to build and deploy predictive
applications in a fraction of the time.</li>
<li><a href="https://github.com/Stratio/Crossdata">Crossdata</a> - Data
integration platform with extended DataSource API and multi-user
environment.</li>
</ul>
<h3 id="docker-images">Docker Images</h3>
<ul>
<li><a href="https://hub.docker.com/r/apache/spark">apache/spark</a> -
Apache Spark Official Docker images.</li>
<li><a
href="https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook">jupyter/docker-stacks/pyspark-notebook</a>
- PySpark with Jupyter Notebook and Mesos client.</li>
<li><a
href="https://github.com/sequenceiq/docker-spark">sequenceiq/docker-spark</a>
- YARN images from <a
href="http://www.sequenceiq.com/">SequenceIQ</a>.</li>
<li><a
href="https://hub.docker.com/r/datamechanics/spark">datamechanics/spark</a>
- An easy-to-set-up Docker image for Apache Spark from <a
href="https://www.datamechanics.co/">Data Mechanics</a>.</li>
</ul>
<h3 id="miscellaneous">Miscellaneous</h3>
<ul>
<li><a href="https://gitter.im/spark-scala/Lobby">Spark with Scala
Gitter channel</a> - “<em>A place to discuss and ask questions about
using Scala for Spark programming</em>” started by <a
href="https://github.com/deanwampler"><span class="citation"
data-cites="deanwampler">@deanwampler</span></a>.</li>
<li><a
href="http://apache-spark-user-list.1001560.n3.nabble.com/">Apache Spark
User List</a> and <a
href="http://apache-spark-developers-list.1001551.n3.nabble.com/">Apache
Spark Developers List</a> - Mailing lists dedicated to usage questions
and development topics respectively.</li>
</ul>
<h2 id="references">References</h2>
<p id="wikipedia-2017">
Wikipedia. 2017. “Apache Spark — Wikipedia, the Free Encyclopedia.”
<a href="https://en.wikipedia.org/w/index.php?title=Apache_Spark&amp;oldid=781182753" class="uri">https://en.wikipedia.org/w/index.php?title=Apache_Spark&amp;oldid=781182753</a>.
</p>
<h2 id="license">License</h2>
<p xmlns:dct="http://purl.org/dc/terms/">
<a rel="license" href="http://creativecommons.org/publicdomain/mark/1.0/">
<img src="https://mirrors.creativecommons.org/presskit/buttons/88x31/svg/publicdomain.svg"
style="border-style: none;" alt="Public Domain Mark" /> </a> <br />
This work (<span property="dct:title">Awesome Spark</span>, by
<a href="https://github.com/awesome-spark/awesome-spark" rel="dct:creator">https://github.com/awesome-spark/awesome-spark</a>),
identified by
<a href="https://github.com/zero323" rel="dct:publisher"><span
property="dct:title">Maciej Szymkiewicz</span></a>, is free of known
copyright restrictions.
</p>
<p>Apache Spark, Spark, Apache, and the Spark logo are
<a href="https://www.apache.org/foundation/marks/">trademarks</a> of
<a href="http://www.apache.org">The Apache Software Foundation</a>. This
compilation is not endorsed by The Apache Software Foundation.</p>
<p>Inspired by <a
href="https://github.com/sindresorhus/awesome">sindresorhus/awesome</a>.</p>
<p><a href="https://github.com/awesome-spark/awesome-spark">spark.md
Github</a></p>