2025-07-18 22:22:32 +02:00


 
Awesome HPC
 
High Performance Computing tools and resources for engineers and administrators.
 
High Performance Computing (HPC) (https://en.wikipedia.org/wiki/Supercomputer) most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business.
 
 
Contents
 


- Provisioning (#provisioning)
- Workload Managers (#workload-managers)
- Pipelines (#pipelines)
- Applications (#applications)
- Compilers (#compilers)
- MPI (#mpi)
- Parallel Computing (#parallel-computing)
- Benchmarking (#benchmarking)
- Miscellaneous (#miscellaneous)
- Performance (#performance)
- Parallel Shells (#parallel-shells)
- Containers (#containers)
- Environment Management (#environment-management)
- Visualization (#visualization)
- Parallel Filesystems (#parallel-filesystems)
- Programming Languages (#programming-languages)
- Monitoring (#monitoring)
- Journals (#journals)
- Podcasts (#podcasts)
- Blogs (#blogs)
- Conferences (#conferences)
- Websites (#websites)
- User Groups (#user-groups)
 
 
 
Provisioning
- Grendel (https://grendel.readthedocs.io/) - Bare-metal provisioning system for HPC Linux clusters (Source Code (https://github.com/ubccr/grendel)) GPL-3.
- xCAT (https://xcat.org/) - xCAT is a toolkit for deployment and administration of clusters of all sizes (Source Code (https://github.com/xcat2/xcat-core)) EPL-1.0.
- Warewulf (https://warewulf.hpcng.org/) - Warewulf is a stateless and diskless container operating system provisioning system for large clusters of bare metal and/or virtual systems (Source Code (https://github.com/hpcng/warewulf)) BSD-3.
- Rocks (http://www.rocksclusters.org/) - A Linux distribution for developing Linux clusters other.
- Cobbler (https://cobbler.github.io/) - Cobbler is a Linux installation server that allows for rapid setup of network installation environments (Source Code (https://github.com/cobbler/cobbler)) GPL-2.0.
- Base Command Manager (https://docs.nvidia.com/base-command-manager/index.html) - Base Command Manager allows administrators to quickly build and manage heterogeneous clusters Proprietary.
- Scyld (https://www.penguinsolutions.com/computing/products/software/scyld-clusterware/) - Scyld ClusterWare builds on the continuing evolution of the Beowulf clusters first developed at NASA in the 1990s Proprietary.
- BlueBanquise (https://bluebanquise.com) - BlueBanquise is an open source cluster deployment and management stack built on Python and Ansible (Source Code (https://github.com/bluebanquise/bluebanquise)) MIT.
 
Workload Managers
- Slurm (https://slurm.schedmd.com/documentation.html) - A free and open source job scheduler (Source Code (https://github.com/SchedMD/slurm)) OSS.
- LSF (https://www.ibm.com/products/hpc-workload-management) - A job scheduler and workload management software developed by IBM Proprietary.
- Moab (https://adaptivecomputing.com/moab-hpc-suite/) - Moab is a workload manager and job scheduler other.
- Torque (https://en.wikipedia.org/wiki/TORQUE) - Torque is a workload manager and job scheduler other.
- OpenLava (https://en.wikipedia.org/wiki/OpenLava) - OpenLava is a workload manager and job scheduler other.
- UGE/SGE (https://en.wikipedia.org/wiki/Univa_Grid_Engine) - Univa Grid Engine is a workload management engine for HPC Proprietary.
- Volcano (https://volcano.sh/) - Volcano is a batch system built on Kubernetes Apache-2.0.
- Maui (https://www.mhpcc.hpc.mil/) - Maui is a workload manager and job scheduler other.
- Kube Batch (https://github.com/kubernetes-sigs/kube-batch) - A batch scheduler for Kubernetes aimed at high-performance workloads such as AI/ML, big data, and HPC Apache-2.0.
- OpenPBS (https://www.openpbs.org/) - OpenPBS® software optimizes job scheduling and workload management in high-performance computing (HPC) environments (Source Code (https://github.com/openpbs/openpbs)) other.
 
Pipelines
- Nextflow (https://nextflow.io) - Data-driven computational pipelines Apache-2.0.
- Cromwell (https://cromwell.readthedocs.io/en/stable/) - Scientific workflow engine designed for simplicity & scalability (Source Code (https://github.com/broadinstitute/cromwell)) BSD-3.
- Pegasus (https://pegasus.isi.edu/) - A configurable system for mapping and executing scientific workflows over a wide range of computational infrastructure (Source Code (https://github.com/pegasus-isi/pegasus)) Apache-2.0.
 
Applications
- Spack (https://spack.io) - A flexible package manager that supports multiple versions, configurations, platforms, and compilers (Source Code (https://github.com/spack/spack)) other.
- EasyBuild (https://easybuild.io/) - EasyBuild - building software with ease (Source Code (https://github.com/easybuilders/easybuild)) GPL-2.
 
Compilers
- Nvidia (https://developer.nvidia.com/hpc-compilers) - NVIDIA HPC compiler suite for Fortran and C/C++ with OpenACC support Proprietary.
- Portland Group (https://www.pgroup.com/index.htm) - The Portland Group (PGI) Fortran and C/C++ compilers, now integrated into the NVIDIA HPC SDK Proprietary.
- Intel (https://software.intel.com/content/www/us/en/develop/tools/oneapi/all-toolkits.html#hpc-kit) - The Intel compiler suite offers many language compilers for use in the HPC space Proprietary.
- Cray (https://bluewaters.ncsa.illinois.edu/cray-compiler) - A suite of compilers designed and optimized for Cray systems, such as the AMD Interlagos processors of Blue Waters Proprietary.
- GNU (https://gcc.gnu.org/) - The GNU Compiler Collection is a suite of compilers targeting many languages (Source Code (https://gcc.gnu.org/git.html)) GPL-3.
- LLVM (https://llvm.org/) - The LLVM project is a collection of modular compilers and toolchains (Source Code (https://github.com/llvm/llvm-project)) OSS.
 
MPI
- OpenMPI (https://www.open-mpi.org/) - OpenMPI is an open source implementation of the MPI-3.1 standard (Source Code (https://github.com/open-mpi/ompi)) BSD.
- MPICH (https://www.mpich.org/) - MPICH is a high-performance and widely portable implementation of the MPI-3.1 standard (Source Code (https://github.com/pmodels/mpich)) other.
- MVAPICH (https://mvapich.cse.ohio-state.edu/) - MVAPICH is an open source implementation of the MPI-3.1 standard developed by Ohio State University BSD.
- Intel-MPI (https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html) - Intel-MPI is Intel's MPI-3.1 implementation, included in the Intel oneAPI HPC Toolkit other.
 
Parallel Computing
- ArrayFire (https://arrayfire.org/docs/index.htm) - A general-purpose tensor library that simplifies software development for parallel architectures other.
- OpenMP (https://www.openmp.org/) - OpenMP is an application programming interface that supports multi-platform shared-memory multiprocessing programming other.
 
Benchmarking
- OSU Benchmarks (https://mvapich.cse.ohio-state.edu/benchmarks/) - A collection of benchmarking tools for MPI developed by Ohio State University other.
- Intel MPI Benchmarks (https://software.intel.com/content/www/us/en/develop/articles/intel-mpi-benchmarks.html) - A set of MPI benchmarks developed by Intel that can be run with Intel MPI or other MPI implementations other.
- HPCC Systems (https://hpccsystems.com/) - HPCC Systems (High Performance Computing Cluster) is an open source, massively parallel computing platform for big data processing and analytics (Source Code (https://github.com/hpcc-systems/HPCC-Platform)) other.
- LINPACK (https://www.netlib.org/linpack/) - LINPACK is a collection of Fortran subroutines for solving systems of linear equations; the LINPACK benchmarks derived from it are widely used to measure HPC floating-point performance other.
- IOzone (https://www.iozone.org/) - IOzone is a filesystem benchmark tool OSS.
- IOR (https://www.vi4io.org/tools/benchmarks/ior) - IOR (Interleaved or Random) is a benchmark for measuring parallel filesystem I/O performance other.
- MDtest (https://www.vi4io.org/tools/benchmarks/mdtest) - MDtest is an MPI-based application for evaluating the metadata performance of a file system other.
- FIO (https://fio.readthedocs.io/en/latest/fio_doc.html) - Flexible I/O Tester (fio) is an I/O workload generator and benchmark that supports many I/O engines, including Linux native AIO (libaio) and io_uring (Source Code (https://git.kernel.dk/cgit/fio/)) GPL-2.
- elbencho (https://github.com/breuner/elbencho) - A distributed storage benchmark for files, objects & blocks with support for GPUs GPL-3.
 
Miscellaneous
- OpenOnDemand (https://openondemand.org/) - Open OnDemand helps computational researchers and students efficiently utilize remote computing resources by making them easy to access from any device (Source Code (https://github.com/OSC/openondemand.org)) MIT.
- Open XDMoD (https://open.xdmod.org) - Open XDMoD is an open source tool to facilitate the management of high performance computing resources (Source Code (https://github.com/ubccr/xdmod/)) LGPL-3.
- ColdFront (https://coldfront.readthedocs.io/en/latest/) - ColdFront is an open source resource allocation system designed to provide a central portal for administration, reporting, and measuring scientific impact of HPC resources (Source Code (https://github.com/ubccr/coldfront)) GPL-3.
- Pavilion2 (https://pavilion2.readthedocs.io/) - Pavilion is a Python 3 (3.6+) based framework for running and analyzing tests targeting HPC systems (Source Code (https://github.com/hpc/pavilion2)) other.
- ReFrame (https://reframe-hpc.readthedocs.io/en/stable/) - A powerful Python framework for writing and running portable regression tests and benchmarks for HPC systems (Source Code (https://github.com/reframe-hpc/reframe)) BSD-3.
- OLCF Test Harness (https://olcf.github.io/olcf-test-harness/) - The OLCF Test Harness (OTH) helps automate the testing of applications, tools, and other system software (Source Code (https://github.com/olcf/olcf-test-harness)) other.
- GoSlmailer (https://github.com/CLIP-HPC/goslmailer) - Goslmailer is a drop-in notification delivery solution for Slurm that can deliver to Slack, Mattermost, Microsoft Teams, and more.
 
Performance
- TotalView (https://totalview.io/products/totalview) - TotalView is a debugging tool for HPC applications Proprietary.
- Tau (https://www.cs.uoregon.edu/research/tau/home.php) - TAU Performance System® is a portable profiling and tracing toolkit for performance analysis of parallel programs written in Fortran, C, C++, UPC, Java, Python other.
- Valgrind (https://www.valgrind.org/) - Valgrind is an instrumentation framework for dynamic analysis tools that detect memory leaks and other memory errors and profile programs (Source Code (https://sourceware.org/git/?p=valgrind.git)) GPL-2.
- Paraver (https://tools.bsc.es/paraver) - Paraver is a very flexible data browser that is part of the CEPBA-Tools toolkit other.
- PAPI (http://icl.cs.utk.edu/papi) - The Performance Application Programming Interface (PAPI) provides a portable interface to hardware performance counters for performance analysis (Source Code (https://bitbucket.org/icl/papi/src/master/)) other.
 
Parallel Shells
- pdsh (https://linux.die.net/man/1/pdsh) - pdsh runs terminal commands across multiple hosts in parallel (Source Code (https://github.com/chaos/pdsh)) GPL-2.
- ClusterShell (https://clustershell.readthedocs.io/en/latest/intro.html) - Scalable cluster administration Python framework (Source Code (https://github.com/cea-hpc/clustershell)) LGPL-2.1.
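Both pdsh and ClusterShell accept compact hostlist expressions such as `node[01-03,07]` in place of a long list of node names. The Python sketch below shows one way such an expression can be expanded. It is a simplified illustration, not ClusterShell's `NodeSet` implementation: it assumes at most one bracket group per expression, decimal ranges only, and zero padding that follows the width of each range's start value. The function name `expand_hostlist` is hypothetical.

```python
import re


def expand_hostlist(expr: str) -> list[str]:
    """Expand a simple hostlist such as 'node[01-03,07]' (hypothetical helper).

    Simplified sketch: handles at most one bracket group and keeps the
    zero padding of each range's start value.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        return [expr]  # no bracket group: a single plain hostname
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            start, end = part.split("-", 1)
            width = len(start)  # '01' -> width 2, so 1 is rendered as '01'
            for n in range(int(start), int(end) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
        else:
            hosts.append(f"{prefix}{part}{suffix}")  # single index like '07'
    return hosts
```

For example, `expand_hostlist("node[01-03,07]")` returns `["node01", "node02", "node03", "node07"]`. The real tools go further, for instance handling several bracket groups and folding an expanded list back into compact form.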
 
Containers
- Apptainer (https://apptainer.org) - Apptainer is an open source container system (Source Code (https://github.com/apptainer/apptainer)) BSD.
- Charliecloud (https://hpc.github.io/charliecloud/) - Charliecloud provides user-defined software stacks (UDSS) for high-performance computing (HPC) centers (Source Code (https://github.com/hpc/charliecloud)) Apache-2.0.
- Docker (https://www.docker.com/) - Docker is a set of platform as a service products that use OS-level virtualization to deliver software in packages called containers other.
- uDocker (https://indigo-dc.github.io/udocker/) - A basic user tool to execute simple docker containers in batch or interactive systems without root privileges (Source Code (https://github.com/indigo-dc/udocker)) Apache-2.0.
- Shifter (https://www.nersc.gov/research-and-development/user-defined-images/) - Shifter brings Linux containers to HPC systems (Source Code (https://github.com/NERSC/shifter)) other.
- HPC Container Maker (https://github.com/NVIDIA/hpc-container-maker) - HPC Container Maker is an open source tool to make it easier to generate container specification files Apache-2.0.
- Sarus (https://github.com/eth-cscs/sarus) - An OCI-compatible container engine for HPC BSD.
- Singularity HPC (https://singularity-hpc.readthedocs.io) - Singularity Registry HPC (shpc) allows you to install containers as modules (Source Code (https://github.com/singularityhub/singularity-hpc)) MPL-2.0.
 
Environment Management
- Lmod (https://lmod.readthedocs.io/en/latest/) - Lmod is a Lua-based environment module system that reads Tcl modulefiles and supports a software hierarchy (Source Code (https://github.com/TACC/Lmod)) other.
- Environment Modules (https://modules.readthedocs.io/en/latest/) - Environment Modules provides dynamic modification of a user's environment through modulefiles (Source Code (https://github.com/cea-hpc/modules)) GPL-2.
- Anaconda (https://www.anaconda.com/) - Anaconda is a Python and R distribution for use in computational science other.
- Mamba (https://mamba.readthedocs.io/en/latest/) - Mamba is a reimplementation of the conda package manager in C++ (Source Code (https://github.com/mamba-org/mamba)) BSD.
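Environment Modules and Lmod change a shell's environment when a module is loaded, most commonly by putting a package's directories at the front of path-like variables such as `PATH`. The Python sketch below models that idea on a plain dictionary. `prepend_path` and `remove_path` are hypothetical helpers loosely named after the modulefile commands; the sketch assumes a simplified behavior and ignores details the real tools handle, such as reference counting of shared entries and emitting shell code for the user's session.

```python
import os


def prepend_path(env: dict[str, str], var: str, directory: str) -> None:
    """Put `directory` at the front of a path-like variable in `env`.

    Simplified model: an existing copy of `directory` is moved to the
    front instead of being duplicated.
    """
    entries = [p for p in env.get(var, "").split(os.pathsep) if p and p != directory]
    env[var] = os.pathsep.join([directory] + entries)


def remove_path(env: dict[str, str], var: str, directory: str) -> None:
    """Drop every occurrence of `directory` from a path-like variable in `env`."""
    entries = [p for p in env.get(var, "").split(os.pathsep) if p and p != directory]
    if entries:
        env[var] = os.pathsep.join(entries)
    else:
        env.pop(var, None)  # unset the variable once nothing is left in it


# Simulate loading, then unloading, a hypothetical "gcc/13" module.
env = {"PATH": os.pathsep.join(["/usr/bin", "/bin"])}
prepend_path(env, "PATH", "/opt/gcc/13/bin")             # PATH now starts with /opt/gcc/13/bin
prepend_path(env, "LD_LIBRARY_PATH", "/opt/gcc/13/lib64")  # created because it was unset
remove_path(env, "PATH", "/opt/gcc/13/bin")
remove_path(env, "LD_LIBRARY_PATH", "/opt/gcc/13/lib64")  # removed again when empty
```

Unloading reverses the load here only because the sketch removes exactly what it added; the real module systems record their changes so that an unload restores the previous state more robustly.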
 
Visualization
- VisIt (https://visit-dav.github.io/visit-website/) - VisIt - Visualization and Data Analysis for Mesh-based Scientific Data (Source Code (https://github.com/visit-dav/visit)) BSD-3.
- Paraview (https://www.paraview.org/) - ParaView is an open-source, multi-platform data analysis and visualization application based on Visualization Toolkit (VTK) (Source Code (https://github.com/Kitware/ParaView)) BSD-3.
 
Parallel Filesystems
- GPFS (https://www.ibm.com/docs/en/gpfs/4.1.0.4?topic=guide-introducing-general-parallel-file-system) - GPFS is a high-performance clustered file system software developed by IBM Proprietary.
- Quobyte (https://www.quobyte.com/storage-for/high-performance-computing-hpc) - A high performance filesystem Proprietary.
- Ceph (https://ceph.io/en/) - Ceph is a distributed object, block, and file storage platform (Source Code (https://github.com/ceph/ceph)) other.
- Weka (https://www.weka.io/) - A file system designed for HPC Proprietary.
- Lustre/EXAScaler (https://www.lustre.org/) - Lustre is an open-source, distributed parallel file system software platform designed for scalability, high performance, and high availability (Source Code (https://git.whamcloud.com/fs/lustre-release.git)) other.
- BeeGFS (https://www.beegfs.io/c/) - BeeGFS is a hardware-independent POSIX parallel file system developed with a strong focus on performance and designed for ease of use, simple installation, and management Proprietary.
- OrangeFS (http://www.orangefs.org/) - OrangeFS is a next generation parallel file system for Linux clusters (Source Code (https://github.com/waltligon/orangefs)) other.
- MooseFS (https://moosefs.com/) - Moose File System is an Open-source, POSIX-compliant distributed file system developed by Core Technology (Source Code (https://github.com/moosefs/moosefs)) GPL-2.0.
 
Programming Languages
- Julia (https://julialang.org/) - Julia is a high-level, high-performance dynamic language for technical computing MIT.
- Futhark (https://futhark-lang.org/) - Futhark is a purely functional data-parallel programming language in the ML family ISC.
- Chapel (https://chapel-lang.org/) - Chapel is a programming language designed for productive parallel computing at scale Apache-2.0.
 
Monitoring
Prometheus-based
- Slurm Exporter (https://github.com/treydock/prometheus-slurm-exporter) - Prometheus exporter for performance metrics from Slurm GPL-3.0.
- Slurm Exporter (https://github.com/ubccr/slurm-exporter) - Slurm Exporter for Prometheus using the REST API GPL-3.0.
- Infiniband Exporter (https://github.com/treydock/infiniband_exporter) - The InfiniBand exporter collects counters from InfiniBand switches and HCAs Apache-2.0.
- Cgroup Exporter (https://github.com/treydock/cgroup_exporter) - Produces metrics from cgroups Apache-2.0.
- Cgroup Exporter (https://github.com/phpHavok/cgroups_exporter) - A Prometheus exporter for cgroup-level metrics unknown.
- GPFS Exporter (https://github.com/treydock/gpfs_exporter) - The GPFS exporter collects metrics from the GPFS filesystem Apache-2.0.
- Lustre Exporter (https://github.com/GSI-HPC/lustre_exporter) - Prometheus exporter for use with the Lustre parallel filesystem GPL-3.0.
- DCGM Exporter (https://github.com/NVIDIA/dcgm-exporter) - NVIDIA GPU metrics exporter for Prometheus leveraging DCGM Apache-2.0.
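The exporters above publish metrics over HTTP in Prometheus's text exposition format: `# HELP` and `# TYPE` lines for each metric family, followed by one `name{label="value"} value` sample per line. The Python sketch below renders a single gauge in that format to show what an exporter's output looks like. The metric name `slurm_nodes` and its `state` label are hypothetical examples, not taken from any exporter listed here; real exporters usually rely on a Prometheus client library instead of hand-written formatting.

```python
def escape_help(text: str) -> str:
    """Escape backslash and newline in HELP text, as the text format requires."""
    return text.replace("\\", "\\\\").replace("\n", "\\n")


def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline in a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_gauge(name: str, help_text: str,
                 samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one gauge metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {escape_help(help_text)}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(
                f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")  # sample without labels
    return "\n".join(lines) + "\n"


# Hypothetical node counts per Slurm state, as an exporter might report them.
text = format_gauge(
    "slurm_nodes",
    "Number of nodes by state.",
    [({"state": "idle"}, 12), ({"state": "alloc"}, 116)],
)
```

Here `text` holds the two header lines followed by `slurm_nodes{state="idle"} 12` and `slurm_nodes{state="alloc"} 116`, which is the shape Prometheus scrapes from an exporter's `/metrics` endpoint.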
 
Journals
- The Journal of Supercomputing (https://www.springer.com/journal/11227) - An international journal of high-performance computer design, analysis, and use.
 
Podcasts
- This week in HPC (https://www.intersect360.com/media/podcasts/) - Each week, Intersect360 Research CEO Addison Snell and HPCwire editor Tiffany Trader dissect the week's top HPC stories.
- Exascale Computing Project (https://www.exascaleproject.org/podcast/) - ECP's Let's Talk Exascale podcast goes behind the scenes to chat with some of the people who are bringing a capable and sustainable exascale computing ecosystem to fruition.
- @HPCpodcast (https://insidehpc.com/category/resources/hpc-podcast/) - Join Shahin Khan and Doug Black as they discuss supercomputing technologies and the applications, markets, and policies that shape them.
 
 
Blogs
- HPCwire (https://www.hpcwire.com/) - Covering the fastest computers in the world and the people who run them since 1987.
- InsideHPC (https://insidehpc.com/) - insideHPC is a global publication recognized for its comprehensive and insightful coverage of the HPC-AI community, linking vendors, end-users and HPC strategists.
- The Next Platform (https://www.nextplatform.com/category/hpc/) - Offers in-depth coverage of high-end computing at large enterprises, supercomputing centers, hyperscale data centers, and public clouds.
- The Register HPC (http://www.theregister.co.uk/data_centre/hpc/) - The Register is a leading and trusted global online enterprise technology news publication, reaching roughly 40 million readers worldwide.
- HPC at Dell (http://hpcatdell.com) - High-Performance Computing knowledge base articles from Dell.
 
Conferences
 
- PEARC (https://pearc.acm.org/) - Practice & Experience in Advanced Research Computing.
- Supercomputing (SC) (https://supercomputing.org/) - The International Conference for High Performance Computing, Networking, Storage, and Analysis.
- ISC High Performance (ISC) (https://www.isc-hpc.com/) - The annual International Supercomputing Conference on high performance computing.
- CCGrid (https://dl.acm.org/conference/ccgrid) - IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing.
- IEEE-HPEC (https://ieee-hpec.org/) - IEEE High Performance Embedded Computing.
- Hot Chips (https://hotchips.org) - The semiconductor industry's leading conference on high-performance microprocessors and related circuits.
- Hot Interconnects (https://hoti.org) - IEEE conference on software architectures and implementations for interconnection networks of all scales.
- ESSA (https://sites.google.com/view/essa-2024/) - Workshop on Extreme-Scale Storage and Analysis.
- IEEE-IPDPS (https://www.ipdps.org/) - IEEE International Parallel & Distributed Processing Symposium.
- ESPM2 Workshop (http://nowlab.cse.ohio-state.edu/espm2/) - International Workshop on Extreme Scale Programming Models and Middleware.
- LCI Workshops (https://linuxclustersinstitute.org/workshops/) - The Linux Clusters Institute (LCI) provides education and advanced technical training on the deployment and use of computing clusters to the high performance computing community worldwide.
- HPC Carpentry (https://www.hpc-carpentry.org/) - Teaching basic skills for high-performance computing.
 
Websites
 
- Top500 (https://top500.org) - The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world.
 
User Groups
- MVAPICH (https://mug.mvapich.cse.ohio-state.edu/) - The MUG conference provides an open forum for all attendees (users, system administrators, researchers, engineers, and students) to discuss and share their knowledge on using MVAPICH libraries.
- Slurm (https://slurm.schedmd.com/slurm_ug_agenda.html) - The annual Slurm user group meeting.
 
Contributing
 
Contributing guidelines can be found in contributing.md (contributing.md).
 
awesome-hpc on GitHub: https://github.com/dstdev/awesome-hpc