Containers

TODO

Overview and a smidge of History (go back to the very beginning of isolation concepts)

Kernel namespaces and some mention of Unix/OSX/NT simulacrum

runtimes and overlays over runtimes

orchestraaaaaaation

{Building, Moving, *} Containers

TODO

containerfile/dockerfile

podman, buildah, and the non-docker ecosystem

Deploying Container Example

This example follows a few principles:

keeping everything needed to deploy and debug the application in one place
using the podman ecosystem
running applications rootless, as a non-root user
leveraging systemd, rather than the container runtime, as the init/orchestration system

We deploy HedgeDoc because it requires a relational database, which gives us a reason to use a pod. This covers the notional pattern for a deployment with one pod and two containers; a full deployment would have other infrastructure in place (e.g. TLS termination via a reverse proxy).

We first create a user to run our application. Conventionally you would create a “system” user (e.g. UID/GID between 500 and 1000) for the identities of daemonized applications. However, systemd-journald combines all log output for UIDs under 1000 into the system log, so we pick a UID above that range to keep this application's logs in its own user journal. A good follow-on read is Users, Groups, UIDs and GIDs on systemd systems, since rootless podman gets us into UID/GID mapping quickly.

We create a group and a user associated with that group:

[root@lab ~]# groupadd -g 2000 hedgedoc

[root@lab ~]# useradd -g 2000 -u 2000 -d /zfs/app/hedgedoc -s /sbin/nologin hedgedoc

UID and GID both 2000
home directory in /zfs/app/hedgedoc where we have prepped a zfs dataset to have acltype set to posix
setting the shell for the user to /sbin/nologin

[root@lab ~]# usermod --add-subuids 200000000-200065535 --add-subgids 200000000-200065535 hedgedoc

adding subordinate UID and GID ranges (subuid and subgid), which are used for user namespace mappings
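For reference, /etc/subuid and /etc/subgid store each allocation as name:start:count rather than as a start-end range. A minimal sketch of the arithmetic, using the range from the usermod invocation above (the variable names are illustrative only):

```shell
# Range passed to usermod --add-subuids / --add-subgids above.
subid_start=200000000
subid_end=200065535

# /etc/subuid and /etc/subgid record "name:start:count".
subid_count=$(( subid_end - subid_start + 1 ))
subid_entry="hedgedoc:${subid_start}:${subid_count}"

echo "$subid_entry"

# To compare against the real files (read-only):
#   grep '^hedgedoc:' /etc/subuid /etc/subgid
```

The range works out to 65536 IDs, the size of a full 16-bit UID space inside the user namespace.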

[root@lab ~]# loginctl enable-linger hedgedoc

We use loginctl to set up lingering for a user:

If enabled for a specific user, a user manager is spawned for the user at boot and kept around after logouts. This allows users who are not logged in to run long-running services.

[root@lab ~]# loginctl user-status hedgedoc

We check the status of the user.
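If only the lingering flag is of interest, it can be checked without parsing the status output. The sketch below assumes logind records lingering as a marker file named after the user under /var/lib/systemd/linger; linger_enabled and the LINGER_DIR override are hypothetical names used for illustration.

```shell
# Assumption: logind keeps one marker file per lingering user in
# /var/lib/systemd/linger. LINGER_DIR is a hypothetical override for testing.
linger_enabled() {
    [ -e "${LINGER_DIR:-/var/lib/systemd/linger}/$1" ]
}

# Prints one line either way.
if linger_enabled hedgedoc; then
    echo "lingering is enabled for hedgedoc"
else
    echo "lingering is not enabled for hedgedoc"
fi
```

loginctl show-user hedgedoc --property=Linger reports the same information through logind itself.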

[root@lab ~]# runuser -l -s /bin/bash hedgedoc

Become the hedgedoc user, specifying a shell explicitly because the account's login shell is /sbin/nologin.

[hedgedoc@lab ~]$ vim ~/.bashrc

export XDG_RUNTIME_DIR=/run/user/$(id -u)

# append to the history file, don't overwrite it
shopt -s histappend

# for setting history length see HISTSIZE and HISTFILESIZE in bash(1)
HISTSIZE=
HISTFILESIZE=
HISTIGNORE='ls:ll:cd:pwd:bg:fg:history:exit'
export HISTTIMEFORMAT="[%F %T] "

We set up XDG_RUNTIME_DIR so that we can use systemctl --user
We set up unlimited length history with some formatting
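The XDG_RUNTIME_DIR export can be made slightly more defensive so that it does not overwrite a value the login path already provided. A sketch for ~/.bashrc; the runtime_dir_for_uid helper is hypothetical and simply encodes the /run/user/UID convention used above.

```shell
# Hypothetical helper: the per-user runtime directory for a given UID.
runtime_dir_for_uid() {
    printf '/run/user/%s\n' "$1"
}

# Only set XDG_RUNTIME_DIR when the login path (here: runuser) left it unset.
if [ -z "${XDG_RUNTIME_DIR:-}" ]; then
    XDG_RUNTIME_DIR=$(runtime_dir_for_uid "$(id -u)")
    export XDG_RUNTIME_DIR
fi
```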

[hedgedoc@lab ~]$ source ~/.bashrc

[hedgedoc@lab ~]$ systemctl --user status

We expect to see output of the systemd user slice.

[hedgedoc@lab ~]$ podman pod create --name hedgedoc --publish 127.0.0.1:8388:3000

[hedgedoc@lab ~]$ mkdir database

[hedgedoc@lab ~]$ vim ~/database.env

PGDATA=/var/lib/postgresql/data/pgdata
POSTGRES_PASSWORD=some-long-password-to-be-secure
POSTGRES_USER=hedgedoc
POSTGRES_DB=hedgedoc

[hedgedoc@lab ~]$ podman run --detach --pod=hedgedoc --name=database --label "io.containers.autoupdate=image" --env-file ~/database.env --volume ~/database/:/var/lib/postgresql/data:Z docker.io/postgres:13

we run in --detach mode because we’re confident about this invocation; to see the output interactively, omit --detach.
we associate this container instance with a pod
we add a label to the container of io.containers.autoupdate=image which is used by podman-auto-update
we bind the volume with :Z, which tells podman to relabel the objects for SELinux. We use :Z (capitalized) because this volume can be private to this container. If we wanted the volume to be shared with another container we would use :z (lower-case).
we pin to a specific tag, 13, because migrating between major releases of postgres is something we’ll want to handle ourselves, but we still want podman to update every time a new release of 13 is pushed to the registry.
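Because the env file is edited in one command and consumed by --env-file in another, a quick pre-flight check can catch a mistyped path or a missing variable before the container starts. This is only a sketch; env_file_has_keys is a hypothetical helper, not part of podman.

```shell
# Hypothetical helper: succeed only if the file is readable and every KEY
# given after the file name appears in it as KEY=...
env_file_has_keys() {
    env_file=$1
    shift
    [ -r "$env_file" ] || return 1
    for key in "$@"; do
        grep -q "^${key}=" "$env_file" || return 1
    done
}

# Usage:
#   env_file_has_keys ~/database.env POSTGRES_USER POSTGRES_PASSWORD POSTGRES_DB \
#       || echo "database.env is missing or incomplete" >&2
```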

[hedgedoc@lab ~]$ mkdir uploads

[hedgedoc@lab ~]$ vim application.env

CMD_LOGLEVEL=verbose
CMD_DB_URL=postgres://hedgedoc:some-long-password-to-be-secure@localhost:5432/hedgedoc?sslmode=disable
CMD_HOST=0.0.0.0
CMD_PORT=3000
CMD_DOMAIN=hedgedoc.name.tld
CMD_PROTOCOL_USESSL=true
CMD_URL_ADDPORT=false
CMD_SESSION_SECRET=another-long-password-to-be-secure
CMD_AUTO_VERSION_CHECK=false
CMD_DEFAULT_PERMISSION=limited
CMD_IMAGE_UPLOAD_TYPE=filesystem

[hedgedoc@lab ~]$ podman run --pod=hedgedoc --name=application --detach --label "io.containers.autoupdate=image" --env-file ~/application.env --volume ~/uploads/:/hedgedoc/public/uploads:Z quay.io/hedgedoc/hedgedoc:latest

we use the :latest tag here to consume the updates of this application aggressively.
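One way to confirm that the pod is answering on its published port, assuming curl is installed, is to poll the loopback address from the podman pod create invocation. The helper names below are hypothetical, and the check only treats HTTP errors (4xx/5xx) and connection failures as "not up".

```shell
# Address and port default to the --publish argument of `podman pod create`.
hedgedoc_url() {
    printf 'http://%s:%s/\n' "${1:-127.0.0.1}" "${2:-8388}"
}

# Poll a URL once per second until it answers or the attempts run out.
wait_for_http() {
    wait_url=$1
    wait_tries=${2:-10}
    while [ "$wait_tries" -gt 0 ]; do
        curl --silent --fail --output /dev/null "$wait_url" && return 0
        wait_tries=$(( wait_tries - 1 ))
        sleep 1
    done
    return 1
}

# Usage:
#   podman ps --pod
#   wait_for_http "$(hedgedoc_url)" && echo "HedgeDoc is answering"
```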

At this stage we can validate that the application is running. Once that is done, we daemonize it:

[hedgedoc@lab ~]$ mkdir -p ~/.config/systemd/user/

[hedgedoc@lab ~]$ podman generate systemd --files --name --new --no-header hedgedoc

we use --name because we want our container instances to have logical names for at-a-glance recognition
we use --new to ensure that we’re instancing new containers each time the service starts
pod-hedgedoc.service is created and uses the Requires= and Before= directives to link with container-application.service and container-database.service
the container-* service definitions use Type=notify, which enables richer dependency handling by systemd.
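Per podman-generate-systemd(1), --files writes the unit files into the current working directory, so when the command is run from the home directory the units still need to end up in ~/.config/systemd/user/ before systemd can find them. A minimal sketch of that step; install_user_units is a hypothetical helper.

```shell
# Hypothetical helper: move generated pod-*/container-* unit files from a
# source directory into the systemd user unit directory.
install_user_units() {
    units_src=$1
    units_dest=$2
    mkdir -p "$units_dest"
    for unit in "$units_src"/pod-*.service "$units_src"/container-*.service; do
        [ -e "$unit" ] || continue   # skip glob patterns that matched nothing
        mv "$unit" "$units_dest"/
    done
}

# Usage:
#   install_user_units "$HOME" "$HOME/.config/systemd/user"
#   systemctl --user daemon-reload
```

Running the generate command from inside ~/.config/systemd/user/ in the first place avoids the move entirely.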

We then want to enable this set as a user service:

[hedgedoc@lab ~]$ systemctl --user enable --now pod-hedgedoc.service

This re-creates the containers, with systemd now orchestrating podman rather than our manual invocations. We can examine the state of the units with:

[hedgedoc@lab ~]$ systemctl --user status

We can examine the logging state of those units in aggregate with:

[hedgedoc@lab ~]$ journalctl --user -e

We can also ask podman to attempt to update these containers at a specific interval with the podman-auto-update systemd integration:

[hedgedoc@lab ~]$ systemctl --user enable --now podman-auto-update.timer

This will run podman-auto-update.service around midnight, which will pull new container images and re-instance as necessary, while cleaning up stale images.
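To see what the timer would do without waiting for it, the update can be previewed by hand. The sketch assumes a podman release recent enough to support auto-update --dry-run; preview_auto_update is a hypothetical wrapper.

```shell
# Preview pending image updates without pulling or restarting anything.
# Assumption: the installed podman supports `auto-update --dry-run`.
preview_auto_update() {
    if command -v podman >/dev/null 2>&1; then
        podman auto-update --dry-run
    else
        echo "podman not found in PATH" >&2
        return 127
    fi
}

# Usage:
#   preview_auto_update
#   systemctl --user list-timers podman-auto-update.timer
```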

This deployment method is comfortable/useful for a few reasons:

It appears to follow a “golden path” that the podman developers are utilizing (e.g. Dan Walsh is cognizant of the home directory style of deployment).
It utilizes the systemd user slice, which can be generalized to many workloads. For example you could have user units defined for different development environments on your workstation.
It keeps everything about an application deployment in a single directory, along with a shell history. If you need to debug/interact with this application deployment you can examine what you’ve done historically.

Enabling CPU, CPUSET, and I/O delegation

By default, a non-root user can only have the memory and pids controllers delegated.

To allow delegation of other controllers such as cpu, cpuset, and io, run the following commands:

[root@lab ~]# mkdir -p /etc/systemd/system/user@.service.d

[root@lab ~]# cat <<EOF | tee /etc/systemd/system/user@.service.d/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids
EOF

[root@lab ~]# systemctl daemon-reload

After changing the systemd configuration, you need to log in again or reboot the host. Rebooting the host is recommended. You will then be able to set limits as well as observe utilization with podman stats.
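Whether the delegation took effect can be checked by reading the cgroup.controllers file of the user's systemd manager. The sketch assumes cgroup v2 mounted at /sys/fs/cgroup; both helper names are hypothetical.

```shell
# Path of the cgroup.controllers file for a user's systemd manager.
# Assumption: cgroup v2 is mounted at /sys/fs/cgroup.
user_controllers_file() {
    printf '/sys/fs/cgroup/user.slice/user-%s.slice/user@%s.service/cgroup.controllers\n' "$1" "$1"
}

# Succeed if the controller name appears as a whole word in the given file.
has_controller() {
    grep -qw -- "$1" "$2"
}

# Usage (as the hedgedoc user, after logging in again or rebooting):
#   has_controller cpu "$(user_controllers_file "$(id -u)")" && echo "cpu is delegated"
```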