• Overview and a smidge of History (go back to the very beginning of isolation concepts)
  • Kernel namespaces and some mention of Unix/OSX/NT simulacrum
  • runtimes and overlays over runtimes
  • orchestraaaaaaation

{Building, Moving, *} Containers


  • containerfile/dockerfile
  • podman, buildah, and the non-docker ecosystem

Deploying Container Example

This example approach utilizes a set of principles:

  • logical segmentation of the deployment/debugging into one place
  • utilizing the podman ecosystem
  • running applications as rootless-as-non-root
  • leveraging systemd as an init/orchestration system, rather than the container runtime.

We will choose to deploy HedgeDoc as it requires a relational database, which leads us to utilize a pod. We’re covering the notional pattern for a deployment with a pod and two containers, where a full deployment would have other infrastructure in place (e.g. TLS termination via a reverse proxy).

We first create a user to run our application. Conventionally you would create a “system” user (e.g. UID/GID between 500 and 1000) for the identity of a daemonized application. However, systemd-journald combines all log output for UIDs under 1000 into the system log, so we pick a UID above that range to keep this user's logs separate. A good follow-on read is Users, Groups, UIDs and GIDs on systemd systems, since rootless podman gets us into UID/GID mapping quickly.

We create a group, and a user associated with that group:

[root@lab ~]# groupadd -g 2000 hedgedoc
[root@lab ~]# useradd -g 2000 -u 2000 -d /zfs/app/hedgedoc -s /sbin/nologin hedgedoc
  • UID and GID both 2000
  • home directory in /zfs/app/hedgedoc where we have prepped a zfs dataset to have acltype set to posix
  • setting the shell for the user to /sbin/nologin
[root@lab ~]# usermod --add-subuids 200000000-200065535 --add-subgids 200000000-200065535 hedgedoc
[root@lab ~]# loginctl enable-linger hedgedoc

We use loginctl to set up lingering for a user:

If enabled for a specific user, a user manager is spawned for the user at boot and kept around after logouts. This allows users who are not logged in to run long-running services.

[root@lab ~]# loginctl user-status hedgedoc

We check the status of the user.
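
The usermod call earlier hands the hedgedoc user a block of subordinate UIDs and GIDs, which rootless podman uses for UID/GID mapping inside containers. As a rough sketch, the START-END notation on the command line corresponds to a name:start:count entry in /etc/subuid and /etc/subgid. The helper name subid_entry is hypothetical and only illustrates the arithmetic:

```shell
# Hypothetical helper (not part of shadow-utils): convert the START-END range
# passed to usermod into the name:start:count form stored in /etc/subuid.
subid_entry() {
    user=$1
    range=$2
    start=${range%-*}   # text before the dash
    end=${range#*-}     # text after the dash
    count=$((end - start + 1))
    printf '%s:%s:%s\n' "$user" "$start" "$count"
}

subid_entry hedgedoc 200000000-200065535
# prints hedgedoc:200000000:65536
```

On the lab host, grep hedgedoc /etc/subuid /etc/subgid should show matching entries for both files.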

[root@lab ~]# runuser -l -s /bin/bash hedgedoc

We become the hedgedoc user, specifying a shell explicitly because the account's login shell is /sbin/nologin.

[hedgedoc@lab ~]$ vim ~/.bashrc
export XDG_RUNTIME_DIR=/run/user/$(id -u)

# append to the history file, don't overwrite it
shopt -s histappend

# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)
# negative values disable truncation (bash 4.3 or newer)
export HISTSIZE=-1
export HISTFILESIZE=-1
export HISTTIMEFORMAT="[%F %T] "
  • We set up XDG_RUNTIME_DIR so that we can use systemctl --user
  • We set up unlimited-length history with a timestamp on each entry
[hedgedoc@lab ~]$ source ~/.bashrc
[hedgedoc@lab ~]$ systemctl --user status

We expect to see output of the systemd user slice.
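
If systemctl --user cannot connect to the user bus, a common cause is a missing runtime directory. A minimal sketch for checking it, assuming the conventional /run/user/UID location; the function name check_runtime_dir is made up for this example:

```shell
# Hypothetical check: report whether the per-user runtime directory exists.
# Falls back to the conventional /run/user/<uid> path when the variable is unset.
check_runtime_dir() {
    dir=${XDG_RUNTIME_DIR:-/run/user/$(id -u)}
    if [ -d "$dir" ]; then
        echo "ok: $dir"
    else
        echo "missing: $dir (was enable-linger run for this user?)"
    fi
}

check_runtime_dir
```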

[hedgedoc@lab ~]$ podman pod create --name hedgedoc --publish
[hedgedoc@lab ~]$ mkdir database
[hedgedoc@lab ~]$ vim ~/database.env
[hedgedoc@lab ~]$ podman run --detach --pod=hedgedoc --name=database --label "io.containers.autoupdate=image" --env-file ~/database.env --volume ~/database/:/var/lib/postgresql/data:Z
  • we run in --detach mode because we’re confident about this invocation. To see the output interactively, we would omit --detach.
  • we associate this container instance with the hedgedoc pod via --pod
  • we add the label io.containers.autoupdate=image to the container, which podman-auto-update uses to find containers it should update
  • we bind the volume with :Z, which tells podman to relabel the objects for SELinux. We use :Z (capitalized) because this volume can be private to this container. If we wanted the volume to be shared with another container we would use :z (lower-case).
  • we pin the image to a specific tag, 13, because migrating between major releases of postgres is something we want to handle ourselves, while still letting podman update every time a new release of 13 is pushed to the registry.
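
The contents of database.env are not shown here. As a sketch, assuming the official postgres image and its documented POSTGRES_USER, POSTGRES_PASSWORD and POSTGRES_DB variables, the file could be written as follows. The password is a placeholder, and demo_home is a stand-in for the hedgedoc home directory so the snippet can be tried anywhere:

```shell
# Stand-in for ~hedgedoc (assumption, for illustration only).
demo_home=$(mktemp -d)

# Write the env file in a subshell with a restrictive umask, since it
# holds a credential.
(
    umask 077
    cat > "$demo_home/database.env" <<'EOF'
POSTGRES_USER=hedgedoc
POSTGRES_PASSWORD=change-me
POSTGRES_DB=hedgedoc
EOF
)

cat "$demo_home/database.env"
```

podman reads the file passed to --env-file as one KEY=value pair per line, so no export keyword or shell quoting is needed.
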
[hedgedoc@lab ~]$ mkdir uploads
[hedgedoc@lab ~]$ vim application.env
CMD_DB_URL=postgres://hedgedoc:[email protected]:5432/hedgedoc?sslmode=disable
[hedgedoc@lab ~]$ podman run --pod=hedgedoc --name=application --detach --label "io.containers.autoupdate=image" --env-file ~/application.env --volume ~/uploads/:/hedgedoc/public/uploads:Z
  • we use the :latest tag here to consume the updates of this application aggressively.
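
Because both containers live in the same pod, they share a network namespace, so the application can reach the database over the loopback interface. The sketch below assembles a CMD_DB_URL value under that assumption. The host localhost and the password change-me are placeholders chosen for illustration, not the values from this deployment, and hedgedoc_db_url is a hypothetical helper:

```shell
# Hypothetical helper: build the CMD_DB_URL line for application.env.
# Arguments: user, password, host, port, database name.
hedgedoc_db_url() {
    user=$1 pass=$2 host=$3 port=$4 name=$5
    printf 'CMD_DB_URL=postgres://%s:%s@%s:%s/%s?sslmode=disable\n' \
        "$user" "$pass" "$host" "$port" "$name"
}

# Placeholder values (assumptions); the user and database name follow
# the naming used elsewhere in this example.
hedgedoc_db_url hedgedoc change-me localhost 5432 hedgedoc
```

A password containing characters such as @ or / would need to be percent-encoded before being placed in the URL.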

At this stage we can validate that the application is running. Once that is done, we should daemonize its existence:

[hedgedoc@lab ~]$ mkdir -p ~/.config/systemd/user/
[hedgedoc@lab ~]$ podman generate systemd --files --name --new --no-header hedgedoc 
  • we use --name because we want our container instances to have logical names for at-a-glance recognition
  • we use --new to ensure that we’re instancing new containers each time the service starts
  • pod-hedgedoc.service is created and uses the Requires= and Before= directives to link with container-application.service and container-database.service
  • the container-* service definitions use Type=notify, which lets systemd manage richer dependency interactions between the units.
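
One detail worth noting: with --files, podman generate systemd writes the unit files into the current working directory. If the command was run from the home directory, the units still have to land in ~/.config/systemd/user/ before systemctl --user can see them. A small sketch of that step follows; install_units is a hypothetical helper, and it assumes the pod-*.service and container-*.service file naming described in the list:

```shell
# Hypothetical helper: move generated unit files from src into dest.
install_units() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    for unit in "$src"/pod-*.service "$src"/container-*.service; do
        [ -e "$unit" ] || continue   # skip patterns that matched nothing
        mv "$unit" "$dest"/
    done
}

# Example, run as the hedgedoc user:
#   install_units "$HOME" "$HOME/.config/systemd/user"
#   systemctl --user daemon-reload
```

Running the generate command from inside ~/.config/systemd/user/ would avoid the move, at the cost of splitting the shell history across directories.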

We then want to enable this set as a user service:

[hedgedoc@lab ~]$ systemctl --user enable --now pod-hedgedoc.service

This re-instances the containers, with systemd now orchestrating podman in place of our manual invocations as a user. We can examine the state of the units with:

[hedgedoc@lab ~]$ systemctl --user status

We can examine the logging state of those units in aggregate with:

[hedgedoc@lab ~]$ journalctl --user -e

We can also ask podman to attempt to update these containers at a specific interval with the podman-auto-update systemd integration:

[hedgedoc@lab ~]$ systemctl --user enable --now podman-auto-update.timer

This will run podman-auto-update.service around midnight, which will pull new container images and re-instance as necessary, while cleaning up stale images.

This deployment method is comfortable and useful for a few reasons:

  • It appears to follow a “golden path” that the podman developers are utilizing (e.g. Dan Walsh is cognizant of the home directory style of deployment).
  • It utilizes the systemd user slice, which can be generalized to many workloads. For example you could have user units defined for different development environments on your workstation.
  • It contains all concepts of an application deployment into a single directory with a shell history. If you need to debug/interact with this application deployment you can examine what you’ve done historically.

Enabling CPU, CPUSET, and I/O delegation

By default, only the memory and pids controllers are delegated to a non-root user.

To allow delegation of other controllers such as cpu, cpuset, and io, run the following commands:

[root@lab ~]# mkdir -p /etc/systemd/system/user@.service.d
[root@lab ~]# cat <<EOF | tee /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids
EOF
[root@lab ~]# systemctl daemon-reload

After changing the systemd configuration, you need to re-login or reboot the host. Rebooting the host is recommended. You will then be able to set limits as well as observe utilization with podman stats.
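
After the reboot, one way to confirm the change is to read the cgroup.controllers file for the user's systemd instance. The sketch below assumes cgroup v2 mounted at /sys/fs/cgroup and the usual user.slice layout; missing_controllers is a hypothetical helper that prints every controller from the wanted list that is absent from the file:

```shell
# Hypothetical helper: print each wanted controller that is not listed
# in the given cgroup.controllers file (one name per line).
missing_controllers() {
    file=$1
    shift
    have=" $(cat "$file") "   # pad with spaces so "cpu" cannot match "cpuset"
    for c in "$@"; do
        case "$have" in
            *" $c "*) ;;
            *) printf '%s\n' "$c" ;;
        esac
    done
}

# Assumed path layout for the current user's delegated cgroup.
ctl="/sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers"
if [ -r "$ctl" ]; then
    missing_controllers "$ctl" cpu cpuset io memory pids
fi
```

No output means every listed controller has been delegated. If cpu, cpuset or io are still printed, the drop-in has not taken effect for that user session.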