Awesome HPC

High Performance Computing tools and resources for engineers and administrators.

High Performance Computing (HPC) (https://en.wikipedia.org/wiki/Supercomputer) most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than a typical desktop computer or workstation can provide, in order to solve large problems in science, engineering, or business.

Contents

- Provisioning (#provisioning)
- Workload Managers (#workload-managers)
- Pipelines (#pipelines)
- Applications (#applications)
- Compilers (#compilers)
- MPI (#mpi)
- Parallel Computing (#parallel-computing)
- Benchmarking (#benchmarking)
- Miscellaneous (#miscellaneous)
- Performance (#performance)
- Parallel Shells (#parallel-shells)
- Containers (#containers)
- Environment Management (#environment-management)
- Visualization (#visualization)
- Parallel Filesystems (#parallel-filesystems)
- Programming Languages (#programming-languages)
- Monitoring (#monitoring)
- Journals (#journals)
- Podcasts (#podcasts)
- Blogs (#blogs)
- Conferences (#conferences)
- Websites (#websites)
- User Groups (#user-groups)

Provisioning

- Grendel (https://grendel.readthedocs.io/) - Bare metal provisioning system for HPC Linux clusters (Source Code (https://github.com/ubccr/grendel)) GPL-3.
- xCAT (https://xcat.org/) - xCAT is a toolkit for deployment and administration of clusters of all sizes (Source Code (https://github.com/xcat2/xcat-core)) EPL-1.0.
- Warewulf (https://warewulf.hpcng.org/) - Warewulf is a stateless and diskless container operating system provisioning system for large clusters of bare metal and/or virtual systems (Source Code (https://github.com/hpcng/warewulf)) BSD-3.
- Rocks (http://www.rocksclusters.org/) - A Linux distribution for developing Linux clusters other.
- Cobbler (https://cobbler.github.io/) - Cobbler is a Linux installation server that allows for rapid setup of network installation environments (Source Code (https://github.com/cobbler/cobbler)) GPL-2.0.
- Base Command Manager (https://docs.nvidia.com/base-command-manager/index.html) - Base Command Manager allows administrators to quickly build and manage heterogeneous clusters Proprietary.
- Scyld (https://www.penguinsolutions.com/computing/products/software/scyld-clusterware/) - Scyld ClusterWare builds on the continuing evolution of the Beowulf clusters first developed at NASA in the 1990s Proprietary.
- BlueBanquise (https://bluebanquise.com) - BlueBanquise is an open source cluster deployment and management stack built on Python and Ansible (Source Code (https://github.com/bluebanquise/bluebanquise)) MIT.

Workload Managers

- Slurm (https://slurm.schedmd.com/documentation.html) - A free and open source job scheduler (Source Code (https://github.com/SchedMD/slurm)) OSS.
- LSF (https://www.ibm.com/products/hpc-workload-management) - Job scheduler and workload management software developed by IBM Proprietary.
- Moab (https://adaptivecomputing.com/moab-hpc-suite/) - Moab is a workload manager and job scheduler other.
- Torque (https://en.wikipedia.org/wiki/TORQUE) - Torque is a workload manager and job scheduler other.
- OpenLava (https://en.wikipedia.org/wiki/OpenLava) - OpenLava is a workload manager and job scheduler other.
- UGE/SGE (https://en.wikipedia.org/wiki/Univa_Grid_Engine) - Univa Grid Engine is a workload management engine for HPC Proprietary.
- Volcano (https://volcano.sh/) - Volcano is a batch system built on Kubernetes Apache-2.0.
- Maui (https://www.mhpcc.hpc.mil/) - Maui is a workload manager and job scheduler other.
- Kube Batch (https://github.com/kubernetes-sigs/kube-batch) - A batch scheduler for Kubernetes aimed at high-performance workloads, e.g. AI/ML, big data, HPC Apache-2.0.
- OpenPBS (https://www.openpbs.org/) - OpenPBS® software optimizes job scheduling and workload management in high-performance computing (HPC) environments (Source Code (https://github.com/openpbs/openpbs)) other.

Pipelines

- Nextflow (https://nextflow.io) - Data-driven computational pipelines Apache-2.0.
- Cromwell (https://cromwell.readthedocs.io/en/stable/) - Scientific workflow engine designed for simplicity & scalability (Source Code (https://github.com/broadinstitute/cromwell)) BSD-3.
- Pegasus (https://pegasus.isi.edu/) - A configurable system for mapping and executing scientific workflows over a wide range of computational infrastructure (Source Code (https://github.com/pegasus-isi/pegasus)) Apache-2.0.

Applications

- Spack (https://spack.io) - A flexible package manager that supports multiple versions, configurations, platforms, and compilers (Source Code (https://github.com/spack/spack)) other.
- EasyBuild (https://easybuild.io/) - EasyBuild - building software with ease (Source Code (https://github.com/easybuilders/easybuild)) GPL-2.

Compilers

- Nvidia (https://developer.nvidia.com/hpc-compilers) - NVIDIA HPC compiler suite for Fortran and C/C++ with OpenACC Proprietary.
- Portland Group (https://www.pgroup.com/index.htm) - The Portland Group Fortran and C/C++ compilers, now integrated into the NVIDIA HPC SDK Proprietary.
- Intel (https://software.intel.com/content/www/us/en/develop/tools/oneapi/all-toolkits.html#hpc-kit) - The Intel compiler suite offers compilers for many languages used in the HPC space Proprietary.
- Cray (https://bluewaters.ncsa.illinois.edu/cray-compiler) - A suite of compilers designed and optimized to target the AMD Interlagos instruction set Proprietary.
- GNU (https://gcc.gnu.org/) - The GNU Compiler Collection is a suite of compilers targeting many languages (Source Code (https://gcc.gnu.org/git.html)) GPL-3.
- LLVM (https://llvm.org/) - The LLVM project is a collection of modular compilers and toolchains (Source Code (https://github.com/llvm/llvm-project)) OSS.

MPI

- OpenMPI (https://www.open-mpi.org/) - OpenMPI is an open source implementation of the MPI-3.1 standard (Source Code (https://github.com/open-mpi/ompi)) BSD.
- MPICH (https://www.mpich.org/) - MPICH is a high-performance and widely portable implementation of the MPI-3.1 standard (Source Code (https://github.com/pmodels/mpich)) other.
- MVAPICH (https://mvapich.cse.ohio-state.edu/) - MVAPICH is an open source implementation of the MPI-3.1 standard developed by Ohio State University BSD.
- Intel-MPI (https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html) - Intel-MPI is Intel's MPI-3.1 implementation included in their compiler suite other.

Parallel Computing

- ArrayFire (https://arrayfire.org/docs/index.htm) - A general purpose tensor library that simplifies the process of software development for parallel architectures other.
- OpenMP (https://www.openmp.org/) - OpenMP is an application programming interface that supports multi-platform shared-memory multiprocessing programming other.

Benchmarking

- OSU Benchmarks (https://mvapich.cse.ohio-state.edu/benchmarks/) - A collection of benchmarking tools for MPI developed by Ohio State University other.
- Intel MPI Benchmarks (https://software.intel.com/content/www/us/en/develop/articles/intel-mpi-benchmarks.html) - A set of benchmarks developed by Intel for use with Intel MPI other.
- HPCC Systems (https://hpccsystems.com/) - HPCC Systems (High Performance Computing Cluster) is an open source, massively parallel processing platform for big data processing and analytics (Source Code (https://github.com/hpcc-systems/HPCC-Platform)) other.
- LINPACK (https://www.netlib.org/linpack/) - LINPACK is a set of efficient Fortran subroutines for solving linear systems; benchmarks built on it are useful for HPC other.
- IOzone (https://www.iozone.org/) - IOzone is a filesystem benchmark tool OSS.
- IOR (https://www.vi4io.org/tools/benchmarks/ior) - Interleaved or Random is a useful benchmarking tool for testing parallel filesystems other.
- MDtest (https://www.vi4io.org/tools/benchmarks/mdtest) - MDtest is an MPI-based application for evaluating the metadata performance of a file system other.
- FIO (https://fio.readthedocs.io/en/latest/fio_doc.html) - Flexible I/O is an advanced disk benchmark that depends upon the kernel's AIO access library (Source Code (https://git.kernel.dk/cgit/fio/)) GPL-2.
- elbencho (https://github.com/breuner/elbencho) - A distributed storage benchmark for files, objects & blocks with support for GPUs GPL-3.

Miscellaneous

- OpenOnDemand (https://openondemand.org/) - Open OnDemand helps computational researchers and students efficiently utilize remote computing resources by making them easy to access from any device (Source Code (https://github.com/OSC/openondemand.org)) MIT.
- Open XDMoD (https://open.xdmod.org) - Open XDMoD is an open source tool to facilitate the management of high performance computing resources (Source Code (https://github.com/ubccr/xdmod/)) LGPL-3.
- ColdFront (https://coldfront.readthedocs.io/en/latest/) - ColdFront is an open source resource allocation system designed to provide a central portal for administration, reporting, and measuring scientific impact of HPC resources (Source Code (https://github.com/ubccr/coldfront)) GPL-3.
- Pavilion2 (https://pavilion2.readthedocs.io/) - Pavilion is a Python 3 (3.6+) based framework for running and analyzing tests targeting HPC systems (Source Code (https://github.com/hpc/pavilion2)) other.
- ReFrame (https://reframe-hpc.readthedocs.io/en/stable/) - A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems (Source Code (https://github.com/reframe-hpc/reframe)) BSD-3.
- OLCF Test Harness (https://olcf.github.io/olcf-test-harness/) - The OLCF Test Harness (OTH) helps automate the testing of applications, tools, and other system software (Source Code (https://github.com/olcf/olcf-test-harness)) other.
- GoSlmailer (https://github.com/CLIP-HPC/goslmailer) - Goslmailer is a drop-in notification delivery solution for Slurm that can deliver to Slack, Mattermost, Teams, and more.

Performance

- TotalView (https://totalview.io/products/totalview) - TotalView is a debugging tool for HPC applications Proprietary.
- Tau (https://www.cs.uoregon.edu/research/tau/home.php) - TAU Performance System® is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, UPC, Java, and Python other.
- Valgrind (https://www.valgrind.org/) - Valgrind is a tool designed to profile programs and detect memory leaks (Source Code (https://sourceware.org/git/?p=valgrind.git)) GPL-2.
- Paraver (https://tools.bsc.es/paraver) - Paraver is a very flexible data browser that is part of the CEPBA-Tools toolkit other.
- PAPI (http://icl.cs.utk.edu/papi) - Performance Application Programming Interface (PAPI) is a performance analysis tool (Source Code (https://bitbucket.org/icl/papi/src/master/)) other.

Parallel Shells

- pdsh (https://linux.die.net/man/1/pdsh) - pdsh runs terminal commands across multiple hosts in parallel (Source Code (https://github.com/chaos/pdsh)) GPL-2.
- ClusterShell (https://clustershell.readthedocs.io/en/latest/intro.html) - Scalable cluster administration Python framework (Source Code (https://github.com/cea-hpc/clustershell)) LGPL-2.1.

Containers

- Apptainer (https://apptainer.org) - Apptainer is an open source container system (Source Code (https://github.com/apptainer/apptainer)) BSD.
- Charliecloud (https://hpc.github.io/charliecloud/) - Charliecloud provides user-defined software stacks (UDSS) for high-performance computing (HPC) centers (Source Code (https://github.com/hpc/charliecloud)) Apache-2.0.
- Docker (https://www.docker.com/) - Docker is a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers other.
- uDocker (https://indigo-dc.github.io/udocker/) - A basic user tool to execute simple Docker containers in batch or interactive systems without root privileges (Source Code (https://github.com/indigo-dc/udocker)) Apache-2.0.
- Shifter (https://www.nersc.gov/research-and-development/user-defined-images/) - Shifter provides Linux containers for HPC (Source Code (https://github.com/NERSC/shifter)) other.
- HPC Container Maker (https://github.com/NVIDIA/hpc-container-maker) - HPC Container Maker is an open source tool that makes it easier to generate container specification files Apache-2.0.
- Sarus (https://github.com/eth-cscs/sarus) - An OCI-compatible container engine for HPC BSD.
- Singularity HPC (https://singularity-hpc.readthedocs.io) - Singularity Registry HPC (shpc) allows you to install containers as modules (Source Code (https://github.com/singularityhub/singularity-hpc)) MPL 2.0.

Environment Management

- Lmod (https://lmod.readthedocs.io/en/latest/) - Lmod: an environment module system based on Lua that reads TCL modules and supports a software hierarchy (Source Code (https://github.com/TACC/Lmod)) other.
- Environment Modules (https://modules.readthedocs.io/en/latest/) - Environment Modules provides dynamic modification of a user's environment (Source Code (https://github.com/cea-hpc/modules)) GPL-2.
- Anaconda (https://www.anaconda.com/) - Anaconda is a Python and R distribution for use in computational science other.
- Mamba (https://mamba.readthedocs.io/en/latest/) - Mamba is a reimplementation of the conda package manager in C++ (Source Code (https://github.com/mamba-org/mamba)) BSD.

Visualization

- VisIt (https://visit-dav.github.io/visit-website/) - VisIt - Visualization and Data Analysis for Mesh-based Scientific Data (Source Code (https://github.com/visit-dav/visit)) BSD-3.
- ParaView (https://www.paraview.org/) - ParaView is an open-source, multi-platform data analysis and visualization application based on the Visualization Toolkit (VTK) (Source Code (https://github.com/Kitware/ParaView)) BSD-3.

Parallel Filesystems

- GPFS (https://www.ibm.com/docs/en/gpfs/4.1.0.4?topic=guide-introducing-general-parallel-file-system) - GPFS is high-performance clustered file system software developed by IBM Proprietary.
- Quobyte (https://www.quobyte.com/storage-for/high-performance-computing-hpc) - A high performance filesystem Proprietary.
- Ceph (https://ceph.io/en/) - Ceph is a distributed object, block, and file storage platform (Source Code (https://github.com/ceph/ceph)) other.
- Weka (https://www.weka.io/) - A file system designed for HPC Proprietary.
- Lustre/Exascaler (https://www.lustre.org/) - Lustre is an open-source, distributed parallel file system software platform designed for scalability, high performance, and high availability (Source Code (https://git.whamcloud.com/fs/lustre-release.git)) other.
- BeeGFS (https://www.beegfs.io/c/) - BeeGFS is a hardware-independent POSIX parallel file system developed with a strong focus on performance and designed for ease of use, simple installation, and management Proprietary.
- OrangeFS (http://www.orangefs.org/) - OrangeFS is a next generation parallel file system for Linux clusters (Source Code (https://github.com/waltligon/orangefs)) other.
- MooseFS (https://moosefs.com/) - Moose File System is an open-source, POSIX-compliant distributed file system developed by Core Technology (Source Code (https://github.com/moosefs/moosefs)) GPL-2.0.

Programming Languages

- Julia (https://julialang.org/) - Julia is a high-level, high-performance dynamic language for technical computing MIT.
- Futhark (https://futhark-lang.org/) - Futhark is a purely functional data-parallel programming language in the ML family ISC.
- Chapel (https://chapel-lang.org/) - Chapel is a programming language designed for productive parallel computing at scale Apache-2.0.

Monitoring

Prometheus Based

- Slurm Exporter (https://github.com/treydock/prometheus-slurm-exporter) - Prometheus exporter for performance metrics from Slurm GPL-3.0.
- Slurm Exporter (https://github.com/ubccr/slurm-exporter) - Slurm exporter for Prometheus using the REST API GPL-3.0.
- Infiniband Exporter (https://github.com/treydock/infiniband_exporter) - The InfiniBand exporter collects counters from InfiniBand switches and HCAs Apache-2.0.
- Cgroup Exporter (https://github.com/treydock/cgroup_exporter) - Produces metrics from cgroups Apache-2.0.
- Cgroup Exporter (https://github.com/phpHavok/cgroups_exporter) - A Prometheus exporter for cgroup-level metrics unknown.
- GPFS Exporter (https://github.com/treydock/gpfs_exporter) - The GPFS exporter collects metrics from the GPFS filesystem Apache-2.0.
- Lustre Exporter (https://github.com/GSI-HPC/lustre_exporter) - Prometheus exporter for use with the Lustre parallel filesystem GPL-3.0.
- DCGM Exporter (https://github.com/NVIDIA/dcgm-exporter) - NVIDIA GPU metrics exporter for Prometheus leveraging DCGM Apache-2.0.

Journals

- The Journal of Supercomputing (https://www.springer.com/journal/11227) - An International Journal of High-Performance Computer Design, Analysis, and Use.
Podcasts

- This Week in HPC (https://www.intersect360.com/media/podcasts/) - Each week, Intersect360 Research CEO Addison Snell and HPCwire editor Tiffany Trader dissect the week's top HPC stories.
- Exascale Project (https://www.exascaleproject.org/podcast/) - ECP's Let's Talk Exascale podcast goes behind the scenes to chat with some of the people who are bringing a capable and sustainable exascale computing ecosystem to fruition.
- @HPCpodcast (https://insidehpc.com/category/resources/hpc-podcast/) - Join Shahin Khan and Doug Black as they discuss supercomputing technologies and the applications, markets, and policies that shape them.

Blogs

- HPCwire (https://www.hpcwire.com/) - Since 1987, covering the fastest computers in the world and the people who run them.
- InsideHPC (https://insidehpc.com/) - insideHPC is a global publication recognized for its comprehensive and insightful coverage of the HPC-AI community, linking vendors, end-users, and HPC strategists.
- The Next Platform (https://www.nextplatform.com/category/hpc/) - Offers in-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.
- The Register HPC (http://www.theregister.co.uk/data_centre/hpc/) - The Register is a leading and trusted global online enterprise technology news publication, reaching roughly 40 million readers worldwide.
- HPC at Dell (http://hpcatdell.com) - High-Performance Computing knowledge base articles from Dell.

Conferences

- PEARC (https://pearc.acm.org/) - Practice & Experience in Advanced Research Computing.
- Supercomputing (SC) (https://supercomputing.org/) - The International Conference for High Performance Computing, Networking, Storage, and Analysis.
- Supercomputing International (ISC) (https://www.isc-hpc.com/) - ISC High Performance, an annual international conference on high performance computing held in Germany.
- CCGrid (https://dl.acm.org/conference/ccgrid) - IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing.
- IEEE-HPEC (https://ieee-hpec.org/) - IEEE High Performance Embedded Computing.
- Hot Chips (https://hotchips.org) - Semiconductor industry's leading conference on high-performance microprocessors and related circuits.
- Hot Interconnects (https://hoti.org) - IEEE conference on software architectures and implementations for interconnection networks of all scales.
- ESSA (https://sites.google.com/view/essa-2024/) - Workshop on Extreme-Scale Storage and Analysis.
- IEEE-IPDPS (https://www.ipdps.org/) - IEEE International Parallel & Distributed Processing Symposium.
- ESPM2 Workshop (http://nowlab.cse.ohio-state.edu/espm2/) - International Workshop on Extreme Scale Programming Models and Middleware.
- LCI Workshops (https://linuxclustersinstitute.org/workshops/) - The Linux Clusters Institute (LCI) provides education and advanced technical training for the deployment and use of computing clusters to the high performance computing community worldwide.
- HPC Carpentry (https://www.hpc-carpentry.org/) - Teaching basic skills for high-performance computing.

Websites

- Top500 (https://top500.org) - The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world.

User Groups

- MVAPICH (https://mug.mvapich.cse.ohio-state.edu/) - The MUG conference provides an open forum for all attendees (users, system administrators, researchers, engineers, and students) to discuss and share their knowledge on using MVAPICH libraries.
- Slurm (https://slurm.schedmd.com/slurm_ug_agenda.html) - The annual Slurm user group meeting.

Contributing

Contributing guidelines can be found in contributing.md (contributing.md).

GitHub: https://github.com/dstdev/awesome-hpc