Awesome Linux Containers


About the Author
Hello, everyone! My name is Filipp, and I have been working with high-load
distributed systems and services, security, monitoring, continuous
deployment, and release management (the DevOps domain) since 2012.
One of my passions is developing DevOps solutions and contributing to
the open-source community. By sharing my knowledge and experiences, I
strive to save time for both myself and others while fostering a culture
of collaboration and learning.
I had to leave my home country, Belarus, due to my participation in
protests against the oppressive regime of dictator Lukashenko, who
maintains a close affiliation with Putin. Since then, I have been trying
to rebuild my life from scratch in other countries.
If you are seeking a skilled DevOps lead or architect to enhance your
project, I invite you to connect with me on LinkedIn
or explore my valuable contributions on GitHub. Let’s collaborate and
create some cool solutions together :)
Foundations
- Open Container Initiative
The Open Container Initiative is a lightweight, open governance
structure, formed under the auspices of the Linux Foundation, for the
express purpose of creating open industry standards around container
formats and runtimes.
- Cloud Native Computing Foundation
The Cloud Native Computing Foundation will create and drive the adoption
of a new set of common container technologies informed by technical
merit and end user value, and inspired by Internet-scale computing.
- Cloud Foundry Foundation
The Cloud is our foundry.
Specifications
- Open Container Specifications
This project is where the Open Container Initiative Specifications are
written. This is a work in progress.
- App Container basics
App Container (appc) is an open specification that defines several
aspects of how to run applications in containers: an image format,
runtime environment, and discovery protocol.
- Systemd Container Interface
Systemd is a suite of basic building blocks for a Linux system. It
provides a system and service manager that runs as PID 1 and starts the
rest of the system. If you write a container solution, please consider
supporting the interfaces this specification describes.
- Nulecule Specification
Nulecule defines a pattern and model for packaging complex
multi-container applications and services, referencing all their
dependencies, including orchestration metadata in a container image for
building, deploying, monitoring, and active management.
- Oracle microcontainer manifesto
This is not a new container format, but simply a specific method for
constructing a container that allows for better security and
stability.
- Cloud Native Application Bundle Specification
A package format specification that describes a technology for bundling,
installing, and managing distributed applications that are, by design,
cloud agnostic.
Clouds
- Amazon EC2 Container Service
Container management service that supports Docker containers and allows
you to easily run applications on a managed cluster of Amazon EC2
instances.
- Google Cloud Platform
Run Docker containers on Google Cloud Platform, powered by Kubernetes.
Google Container Engine actively schedules your containers, based on
declared needs, on a managed cluster of virtual machines.
- Jelastic
Unlimited PaaS and Container-Based IaaS in a Joint Cloud Solution for
DevOps.
- Joyent
High-Performance Container-Native Infrastructure for Today’s Demanding
Real-Time Web and Mobile Applications.
- Kubernetes
Manage a cluster of Linux containers as a single system to accelerate
Dev and simplify Ops.
- Mesosphere
The Mesosphere Datacenter Operating System (DCOS) is a new kind of
operating system that spans all of the machines in your datacenter or
cloud. It provides a highly elastic and highly scalable way of
deploying applications, services, and big data infrastructure on shared
resources.
- OpenShift Origin
OpenShift Origin is a distribution of Kubernetes optimized for continuous
application development and multi-tenant deployment. Origin adds
developer and operations-centric tools on top of Kubernetes to enable
rapid application development, easy deployment and scaling, and
long-term lifecycle maintenance for small and large teams.
- Warden
Manages isolated, ephemeral, and resource controlled environments. Part
of Cloud Foundry - the open platform as a service project.
- Virtuozzo
A platform, built on Virtuozzo containers, that can be easily run on top
of any bare-metal or virtual servers in any public or private cloud, to
automate, optimize, and accelerate internal IT and development
processes.
- Rancher
Rancher is a complete, open source platform for deploying and managing
containers in production. It includes commercially-supported
distributions of Kubernetes, Mesos, and Docker Swarm, making it easy to
run containerized applications on any infrastructure.
- Docker Swarm
Docker Swarm is native clustering for Docker.
- Azure Container Service
Azure Container Service optimizes the configuration of popular open
source tools and technologies specifically for Azure.
- CIAO
Cloud Integrated Advanced Orchestrator for Intel Clear Linux OS.
- Alibaba Cloud Container Service
Container Service is a high-performance and scalable container
application management service that enables you to use Docker and
Kubernetes to manage the lifecycle of containerized applications.
- Nomad
HashiCorp Nomad is a single binary that schedules applications and
services on Linux, Windows, and Mac. It is an open source scheduler that
uses a declarative job file for scheduling virtualized, containerized,
and standalone applications.
Operating Systems
- CoreOS
A lightweight Linux operating system designed for clustered deployments
providing automation, security, and scalability for your most critical
applications.
- RancherOS
RancherOS is a tiny Linux distro that runs the entire OS as Docker
containers.
- Project Atomic
Project Atomic provides the best platform for your Linux Docker
Kubernetes (LDK) application stack. Use immutable infrastructure to
deploy and scale your containerized applications.
- Snappy Ubuntu Core
Ubuntu Core is the perfect system for large-scale cloud container
deployments, bringing transactional updates to the world’s favourite
container platform.
- ResinOS
A host OS tailored for containers, designed for reliability, proven in
production.
- Photon
Photon OS is a minimal Linux container host designed to have a small
footprint and tuned for VMware platforms. Photon is intended to invite
collaboration around running containerized and Linux applications in a
virtualized environment.
- Clear Linux Project
The Clear Linux Project for Intel Architecture is a distribution built
for various cloud use cases.
- CargOS
CargOS is a new lightweight, open source platform for Docker hosts that
aims for speed, manageability, and security. Releases are built for
64-bit Intel/AMD CPUs.
- OSv
OSv is the open source operating system designed for the cloud. Built
from the ground up for effortless deployment and management, with
superior performance.
- HypriotOS
A minimal Debian-based operating system optimized to run Docker. It
makes it dead easy to use Docker on any Raspberry Pi.
- MCL
MCL (Minimal Container Linux) is a from-scratch minimal Linux
OS designed specifically to run containers. It has a small footprint of
~50MB and boots within seconds. It is currently optimized to run
Docker.
Hypervisors
- Docker
An open platform for distributed applications for developers and
sysadmins. The de facto standard.
- LXD
Daemon based on liblxc offering a REST API to manage LXC
containers.
- OpenVZ
OpenVZ is container-based virtualization for Linux. OpenVZ creates
multiple secure, isolated Linux containers (otherwise known as VEs or
VPSs) on a single physical server enabling better server utilization and
ensuring that applications do not conflict.
- MultiDocker
Create a secure multi-user Docker machine, where each user is segregated
into an independent container.
- Lithos
Lithos is a process supervisor and containerizer for running services.
It is not intended to be the system init, but rather tries to be a base
tool to build container orchestration on.
- containerd
A container runtime which can manage a complete container lifecycle -
from image transfer/storage to container execution, supervision and
networking.
Containers
- runc
runc is a CLI tool for spawning and running containers according to the
OCI specification.
- Bocker
Docker implemented in around 100 lines of bash.
- Rocket
rkt (pronounced “rock-it”) is a CLI for running app containers on Linux.
rkt is designed to be composable, secure, and fast. Based on the appc
specification.
- LXC
LXC is the well known set of tools, templates, library and language
bindings. It’s pretty low level, very flexible and covers just about
every containment feature supported by the upstream kernel.
- Vagga
Vagga is a fully-userspace container engine inspired by Vagrant and
Docker, specialized for development environments.
- libct
Libct is a container management library which provides a convenient API
for frontend programs to manage a container during its whole
lifetime.
- libvirt
A big toolkit to interact with the virtualization capabilities of recent
versions of Linux (and other OSes).
- systemd-nspawn
Spawn a namespace container for debugging, testing and building. Part of
systemd.
- porto
The main goal of Porto is to create a convenient, reliable interface
over several Linux kernel mechanisms such as cgroups, namespaces, mounts,
networking, etc.
- udocker
A basic user tool to execute simple containers in batch or interactive
systems without root privileges.
- Let Me Contain That For You
LMCTFY is the open source version of Google’s container stack, which
provides Linux application containers.
- cc-oci-runtime
Intel Clear Linux OCI (Open Containers Initiative) compatible
runtime.
- railcar
Railcar is a Rust implementation of the Open Container Initiative’s
runtime spec. It is similar to the reference implementation runc, but it
is implemented completely in Rust for memory safety without needing the
overhead of a garbage collector or multiple threads.
- Kata Containers
Kata Containers is a new open source project building extremely
lightweight virtual machines that seamlessly plug into the containers
ecosystem.
- plash
Lightweight, rootless containers.
- runv
Hypervisor-based (KVM, Xen, QEMU) Runtime for OCI. Security by
isolation.
- podman
Full management of container lifecycle.
- firecracker
Firecracker runs workloads in lightweight virtual machines, called
microVMs, which combine the security and isolation properties provided
by hardware virtualization technology with the speed and flexibility of
containers.
- sysbox
Sysbox is a “runc” that creates secure (rootless) containers / pods that
run not just microservices, but most workloads that run in VMs (e.g.,
systemd, Docker, and Kubernetes), seamlessly.
- youki
A container runtime written in Rust.
- footloose
Containers that look like Virtual Machines.
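OCI runtimes such as runc, railcar and youki all consume an OCI bundle: a root filesystem plus a config.json. Below is a minimal Python sketch of that file. The field names come from the OCI runtime specification. The helper name `minimal_oci_config` and its defaults are illustrative assumptions.

A real bundle needs more than this, for example mounts for /proc and /dev, capabilities and rlimits. `runc spec` generates a complete template.

```python
import json

# Namespace types as spelled in the OCI runtime spec ("network", not "net").
DEFAULT_NAMESPACES = ("pid", "network", "ipc", "uts", "mount")


def minimal_oci_config(args, rootfs="rootfs", hostname="sandbox",
                       readonly=True, namespaces=DEFAULT_NAMESPACES):
    """Return a dict shaped like a minimal OCI runtime config.json.

    Sketch only: not enough on its own for runc to start a container.
    """
    return {
        "ociVersion": "1.0.2",
        "process": {
            "terminal": False,
            "user": {"uid": 0, "gid": 0},
            "args": list(args),
            "env": ["PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"],
            "cwd": "/",
        },
        # A read-only root filesystem, in line with "container is ro".
        "root": {"path": rootfs, "readonly": readonly},
        "hostname": hostname,
        "linux": {"namespaces": [{"type": ns} for ns in namespaces]},
    }


if __name__ == "__main__":
    print(json.dumps(minimal_oci_config(["sh"]), indent=2))
```

Each entry in `linux.namespaces` without a `path` asks the runtime to create a new namespace of that type.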
Sandboxes
- Firejail
Firejail is a SUID sandbox program that reduces the risk of security
breaches by restricting the running environment of untrusted
applications using Linux namespaces, seccomp-bpf and Linux
capabilities.
- NsJail
NsJail is a process isolation tool for Linux. It makes use of the
namespacing, resource control, and seccomp-bpf syscall filter subsystems
of the Linux kernel.
- Subuser
Securing the Linux desktop with Docker.
- Snappy
Snappy Ubuntu Core is a new rendition of Ubuntu with transactional
updates - a minimal server image with the same libraries as today’s
Ubuntu, but applications are provided through a simpler mechanism.
- xdg-app
xdg-app is a system for building, distributing and running sandboxed
desktop applications on Linux.
- Bubblewrap
Run applications in a sandbox using Linux namespaces without root
privileges, with user namespacing provided via a setuid binary.
- singularity
Universal application containers for Linux.
- Lxroot
Lxroot is a flexible, lightweight, and safer alternative to chroot
and/or Docker for non-root users on Linux.
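To illustrate the namespace-based sandboxes above, the following sketch builds a Bubblewrap command line without running it. The flags shown are bwrap options: `--ro-bind`, `--proc`, `--dev`, `--tmpfs`, `--unshare-all`, `--share-net` and `--die-with-parent`. The helper `bwrap_argv` is hypothetical. Most distributions will also need /lib, /lib64 or /bin bound into the sandbox.

```python
import shlex


def bwrap_argv(command, ro_binds=("/usr",), share_net=False):
    """Return an argv list that runs `command` inside a bubblewrap sandbox.

    Every path in ro_binds is bind-mounted read-only at the same location.
    """
    argv = ["bwrap"]
    for path in ro_binds:
        argv += ["--ro-bind", path, path]
    argv += [
        "--proc", "/proc",    # fresh procfs matching the new PID namespace
        "--dev", "/dev",      # new minimal /dev with only basic device nodes
        "--tmpfs", "/tmp",    # private, throwaway /tmp
        "--unshare-all",      # new user (if possible), ipc, pid, net, uts, cgroup namespaces
        "--die-with-parent",  # kill the sandbox when the calling process exits
    ]
    if share_net:
        argv.append("--share-net")  # keep host networking despite --unshare-all
    return argv + list(command)


if __name__ == "__main__":
    # /lib and /lib64 are often needed too; adjust for your distribution.
    print(shlex.join(bwrap_argv(["sh", "-c", "id"], ro_binds=("/usr", "/lib", "/lib64"))))
```

If bwrap is installed, the list can be passed straight to `subprocess.run`.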
Partial Access
- nsenter
Run a program in the namespaces of other processes. Part of
util-linux.
- ip-netns
Process network namespace management. Part of iproute2.
- unshare
Run a program with some namespaces unshared from its parent. Part of
util-linux.
- python-nsenter
This Python package allows entering Linux kernel namespaces (mount, IPC,
net, PID, user and UTS) by doing the “setns” syscall.
- butter
Python library to interface with low-level Linux features (inotify,
fanotify, timerfd, signalfd, eventfd, containers) with asyncio
support.
- pyspaces
Work with Linux namespaces through glibc in pure Python.
- CRIU
Checkpoint/Restore In Userspace is a software tool for Linux operating
system. Using this tool, you can freeze a running application (or part
of it) and checkpoint it to a hard drive as a collection of files. CRIU
is integrated with Docker and LXC to implement live migration of
containers.
- Moby
A “Lego set” of toolkit components for building container software,
created by Docker.
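nsenter, ip-netns and python-nsenter all work with the per-process namespace handles under /proc/<pid>/ns. Two processes share a namespace exactly when the corresponding links resolve to the same identifier, for example `net:[4026531992]`. A sketch like the one below therefore shows which namespaces nsenter would actually have to switch. The function names are hypothetical. Inspecting another user's process generally requires root.

```python
import os


def namespace_ids(pid="self"):
    """Map namespace name -> kernel identifier, e.g. 'net' -> 'net:[4026531992]'."""
    ns_dir = f"/proc/{pid}/ns"
    try:
        names = os.listdir(ns_dir)
    except PermissionError:
        return {}  # listing another user's /proc/<pid>/ns usually needs root
    ids = {}
    for name in sorted(names):
        try:
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # entry not readable (permissions, or the process exited)
    return ids


def shared_namespaces(pid_a, pid_b="self"):
    """Return {namespace: True if both processes are in the same one}."""
    a, b = namespace_ids(pid_a), namespace_ids(pid_b)
    return {name: a[name] == b[name] for name in sorted(a.keys() & b.keys())}


if __name__ == "__main__":
    # Compare PID 1 (host init, or the container's own init) with this process.
    for name, same in shared_namespaces(1).items():
        print(f"{name:20} {'shared' if same else 'separate'}")
```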
Filesystem
- container-diff
A tool for analyzing and comparing container images.
- buildah
A tool which facilitates building OCI container images.
- skopeo
Work with remote image registries: retrieving information and images,
signing content.
- img
Standalone, daemon-less, unprivileged Dockerfile and OCI compatible
container image builder.
- dgr
Command line utility designed to build and to configure at runtime App
Containers Images (ACI) and App Container Pods (POD) based on convention
over configuration.
- Whaler
Whaler is designed to reverse engineer a Docker Image into the
Dockerfile that created it.
- dive
A tool for exploring each layer in a docker image.
- go-containerregistry
Go library and CLIs for working with container registries.
- kaniko
Kaniko is a tool to build container images from a Dockerfile, inside a
container or Kubernetes cluster.
- umoci
Umoci is a tool to manipulate OCI container images, and can be used as a
rudimentary build tool.
- docker pushrm
A Docker CLI plugin that lets you push the README.md file from the
current directory to a container registry. Supports Docker Hub, Quay, and
Harbor.
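Tools such as dive and container-diff inspect images built from stacked read-only layers. Each running container adds its own writable copy-on-write layer on top. The toy model below shows why two containers started from one image do not see each other's writes and cannot modify the image. Plain dicts stand in for layer filesystems, and `LayeredFS` is a hypothetical name. The model ignores deletions, which real union filesystems such as overlayfs record as whiteout files.

```python
from collections import ChainMap


class LayeredFS:
    """Toy copy-on-write view over read-only image layers.

    `image_layers` is a list of dicts (path -> content), top layer first.
    Each instance gets its own empty writable layer, like a container.
    """

    def __init__(self, image_layers):
        self.upper = {}  # this container's private writable layer
        # ChainMap looks keys up in order and sends all writes to the first map.
        self.view = ChainMap(self.upper, *image_layers)

    def read(self, path):
        return self.view[path]  # the topmost layer that contains the path wins

    def write(self, path, content):
        self.view[path] = content  # lands in self.upper; image layers stay untouched


if __name__ == "__main__":
    base = {"/etc/hostname": "image", "/bin/sh": "<binary>"}
    web, db = LayeredFS([base]), LayeredFS([base])
    web.write("/etc/hostname", "web")
    print(web.read("/etc/hostname"), db.read("/etc/hostname"))  # web image
```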
Dashboard
- LXC-Web-Panel
Web panel for LXC on Ubuntu.
- Liman
Basic docker monitoring web application.
- portainer
Lightweight Docker management UI.
- swarmpit
Lightweight mobile-friendly Docker Swarm management UI.
Best practices
- The Twelve-Factor App
The twelve-factor app is a methodology for building
software-as-a-service apps.
- Container Best Practices
A collaborative project to document container-based application
architecture, creation and management from Project Atomic.
Security
- Docker bench security
The Docker Bench for Security is a script that checks for dozens of
common best-practices around deploying Docker containers in
production.
- CoreOS Clair
Open Source Vulnerability Analysis for your Containers.
- bane
Custom AppArmor profile generator for docker containers.
- OpenSCAP
The OpenSCAP ecosystem provides multiple tools to assist administrators
and auditors with assessment, measurement and enforcement of security
baselines.
- drydock
Drydock provides a flexible way of assessing the security of your Docker
daemon configuration and containers using editable audit templates.
- trireme
Security by segmentation for Docker and Kubernetes.
- goss
Quick and Easy server testing/validation.
- sockguard
A proxy for docker.sock that enforces access control and isolated
privileges.
- gvisor
gVisor is a user-space kernel, written in Go, that implements a
substantial portion of the Linux system surface. It includes an Open
Container Initiative (OCI) runtime called runsc that provides an
isolation boundary between the application and the host kernel. The
runsc runtime integrates with Docker and Kubernetes, making it simple to
run sandboxed containers.
- docker-explorer
A tool to help forensicate offline docker acquisitions.
- oci-seccomp-bpf-hook
OCI hook to trace syscalls and generate a seccomp profile.
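oci-seccomp-bpf-hook records the syscalls a container makes so they can be turned into a seccomp profile. The sketch below builds the resulting allowlist in the JSON shape Docker accepts (`defaultAction`, `architectures`, `syscalls`). The helper name is hypothetical. A real profile needs every syscall the workload, its libc and the runtime use.

```python
import json


def allowlist_profile(syscalls, arch="SCMP_ARCH_X86_64"):
    """Build a Docker-style seccomp profile that allows only `syscalls`.

    Any other syscall is rejected with an error (SCMP_ACT_ERRNO) instead of
    killing the process, which makes missing entries easier to debug.
    """
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": [arch],
        "syscalls": [
            {"names": sorted(set(syscalls)), "action": "SCMP_ACT_ALLOW"},
        ],
    }


if __name__ == "__main__":
    # Hypothetical traced list; far too short for a real workload.
    traced = ["read", "write", "exit_group", "read"]
    print(json.dumps(allowlist_profile(traced), indent=2))
```

Saved to a file, the output can be passed to `docker run --security-opt seccomp=profile.json ...`.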
Links
Levels of security problems
- regular application
- always untrusted -> know it
- suid bit -> mount with nosuid
- limit available syscalls -> seccomp-bpf, grsec
- leak to another container (bug in namespaces, filesystem) -> user
namespaces with a different UID mapping for each container (e.g. UID 1000
in one container is 14293 outside, in another 15398); security modules
like SELinux or AppArmor
- system services like cron, ssh
- run as root -> isolate via bastion host or vm
- using /dev -> “devices” control group
The following device nodes are created in the container by
default:
/dev/console, /dev/null, /dev/zero, /dev/full, /dev/tty*, /dev/urandom, /dev/random, /dev/fuse
The Docker images are also mounted with nodev, which means that even if
a device node was pre-created in the image, it could not be used by
processes within the container to talk to the kernel.
- root calls -> capabilities (cap_sys_admin warning!)
Here is the current list of capabilities that Docker keeps by default:
chown, dac_override, fowner, fsetid, kill, setgid, setuid, setpcap,
net_bind_service, net_raw, sys_chroot, mknod, setfcap, and audit_write.
Docker drops all other capabilities, including the following:
CAP_SYS_MODULE Insert/Remove kernel modules
CAP_SYS_RAWIO Modify Kernel Memory
CAP_SYS_PACCT Configure process accounting
CAP_SYS_NICE Modify Priority of processes
CAP_SYS_RESOURCE Override Resource Limits
CAP_SYS_TIME Modify the system clock
CAP_SYS_TTY_CONFIG Configure tty devices
CAP_AUDIT_CONTROL Configure Audit Subsystem
CAP_MAC_OVERRIDE Ignore Kernel MAC Policy
CAP_MAC_ADMIN Configure MAC Configuration
CAP_SYSLOG Modify Kernel printk behavior
CAP_NET_ADMIN Configure the network
CAP_SYS_ADMIN Catch all
- uses /proc, /sys -> remount ro, drop cap_sys_admin; security modules
like SELinux or AppArmor; some parts of these filesystems are
“namespace-aware”
Docker mounts these file systems into the container as “read-only” mount
points:
. /sys
. /proc/sys
. /proc/sysrq-trigger
. /proc/irq
. /proc/bus
Copy-on-write file systems
Docker uses copy-on-write file systems. This means containers can use
the same file system image as the base for the container. When a
container writes content to the image, it gets written to a
container-specific file system. This prevents one container from seeing
the changes of another container even if they wrote to the same file
system image. Just as important, one container cannot change the image
content to affect the processes in another container.
- uid 0 -> user namespaces, uid 0 mapped to an unprivileged uid outside
- system services like devices, network, filesystems
- root -> more of these services should run on the host, outside the
container; isolate sensitive functions and run them in a non-privileged context
- full privileges -> isolate on kernel level
- kernel drivers, network stack, security policies
- absolute privileges -> run it in separate vm
- general: immutable infrastructure
- container is ro
- write to small separate rw nosuid part
src
src
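To check which of the capabilities listed above a containerized process actually holds, decode the `CapEff` line of /proc/<pid>/status. The bit numbers in the table follow linux/capability.h. `DOCKER_DEFAULT_CAPS` reflects Docker's default set in recent releases, which also includes fsetid. Treat it as an assumption that may differ between versions. Anything returned beyond that set, sys_admin in particular, deserves a closer look.

```python
# Capability names indexed by bit number, as in linux/capability.h
# (CAP_CHOWN = 0 ... CAP_CHECKPOINT_RESTORE = 40).
CAP_NAMES = [
    "chown", "dac_override", "dac_read_search", "fowner", "fsetid", "kill",
    "setgid", "setuid", "setpcap", "linux_immutable", "net_bind_service",
    "net_broadcast", "net_admin", "net_raw", "ipc_lock", "ipc_owner",
    "sys_module", "sys_rawio", "sys_chroot", "sys_ptrace", "sys_pacct",
    "sys_admin", "sys_boot", "sys_nice", "sys_resource", "sys_time",
    "sys_tty_config", "mknod", "lease", "audit_write", "audit_control",
    "setfcap", "mac_override", "mac_admin", "syslog", "wake_alarm",
    "block_suspend", "audit_read", "perfmon", "bpf", "checkpoint_restore",
]

# Docker's default capability set in recent releases (assumption: may vary by version).
DOCKER_DEFAULT_CAPS = {
    "chown", "dac_override", "fowner", "fsetid", "kill", "setgid", "setuid",
    "setpcap", "net_bind_service", "net_raw", "sys_chroot", "mknod",
    "setfcap", "audit_write",
}


def decode_caps(hex_mask):
    """Turn a hex capability mask (e.g. the CapEff value) into a set of names."""
    mask = int(hex_mask, 16)
    names = {name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1}
    if mask >> len(CAP_NAMES):
        names.add("unknown")  # bits defined by a kernel newer than this table
    return names


def effective_caps(pid="self"):
    """Return the effective capabilities of a process from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    raise ValueError("CapEff line not found")


if __name__ == "__main__":
    extra = effective_caps() - DOCKER_DEFAULT_CAPS
    print("beyond Docker defaults:", sorted(extra) or "none")
```

For a root process in a default Docker container, CapEff reads `00000000a80425fb`, which decodes exactly to `DOCKER_DEFAULT_CAPS`.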
Technologies for security
Things are better. For example, most modern container technologies
can make use of Linux’s built-in security tools such as:
- AppArmor, SELinux and Seccomp policies;
- Grsecurity;
- Control groups (cgroups);
- Kernel namespaces
src
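Kernel user namespaces implement the UID remapping described earlier (UID 0 or 1000 inside, an unprivileged UID outside) through /proc/<pid>/uid_map. Each line holds three numbers: the first ID inside, the first ID outside, and the length of the range. The sketch below performs that translation. The function names are hypothetical. The outside IDs are relative to the user namespace of the process reading the file.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside, outside, count) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, count = map(int, fields)
            entries.append((inside, outside, count))
    return entries


def to_outside_id(inside_id, id_map):
    """Translate an ID seen inside the namespace to the outside ID, or None."""
    for inside, outside, count in id_map:
        if inside <= inside_id < inside + count:
            return outside + (inside_id - inside)
    return None  # unmapped IDs appear as the overflow ID (usually 65534)


if __name__ == "__main__":
    with open("/proc/self/uid_map") as f:
        # In the initial user namespace this prints [(0, 0, 4294967295)].
        print(parse_id_map(f.read()))
```

With a map such as `0 100000 65536`, container root becomes UID 100000 on the host. Giving each container its own range keeps a breakout from one container from landing in another's files.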
Sure, you’re deploying seccomp, but you can’t use SELinux inside your
container, because the policy isn’t per-namespace (though LXC does
apply an AppArmor profile to each container…)
sVirt - SELinux for KVM
src
Major kernel subsystems that are not namespaced include:
- SELinux
- Cgroups
- file systems under /sys
- /proc/sys, /proc/sysrq-trigger, /proc/irq, /proc/bus
Devices are not namespaced:
- /dev/mem
- /dev/sd* file system devices
- kernel modules
If you can communicate or attack one of these as a privileged
process, you can own the system.
src
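Because /sys and parts of /proc are not namespaced, the mitigation listed earlier is to mount them read-only. The sketch below checks this from inside a container by parsing /proc/self/mountinfo, where field 5 is the mount point and field 6 holds the per-mount options. For each sensitive path it finds the mount that contains it, so a /proc/sys without its own mount is judged by the options of /proc. The helper names are hypothetical, and octal escapes such as `\040` in mount points are not decoded.

```python
SENSITIVE = ("/sys", "/proc/sys", "/proc/sysrq-trigger", "/proc/irq", "/proc/bus")


def parse_mountinfo(text):
    """Map mount point -> set of per-mount options; later (stacked) mounts win."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 6:
            mounts[fields[4]] = set(fields[5].split(","))
    return mounts


def containing_mount(path, mounts):
    """Return the mount point whose subtree contains path (longest prefix)."""
    best = None
    for mp in mounts:
        if path == mp or path.startswith(mp.rstrip("/") + "/"):
            if best is None or len(mp) > len(best):
                best = mp
    return best


def writable_sensitive_paths(text, sensitive=SENSITIVE):
    """List sensitive paths whose containing mount is not read-only."""
    mounts = parse_mountinfo(text)
    flagged = []
    for path in sensitive:
        mp = containing_mount(path, mounts)
        if mp is not None and "ro" not in mounts[mp]:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    with open("/proc/self/mountinfo") as f:
        print(writable_sensitive_paths(f.read()) or "all sensitive paths are read-only")
```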
- sysdig-container-ecosystem
The ecosystem of awesome new technologies emerging around containers and
microservices can be a little overwhelming, to say the least. We thought
we might be able to help: welcome to the Container Ecosystem
Project.
- doger.io
This page is an attempt to document the ins and outs of containers on
Linux. This is not just restricted to programmers looking to implement
containers or use container like features in their own code but also
Sysadmins and Users who want to get more of a handle on how containers
work ‘under the hood’.