TODO
- Overview and a smidge of History (go back to the very beginning of isolation concepts)
- Kernel namespaces and some mention of Unix/OSX/NT simulacrum
- runtimes and overlays over runtimes
- orchestraaaaaaation
{Building, Moving, *} Containers #
TODO
- containerfile/dockerfile
- podman, buildah, and the non-docker ecosystem
Container Deployment Example #
This example approach follows a set of principles:
- keeping the deployment and its debugging logically segmented in one place
- utilizing the podman ecosystem
- running applications rootless, as a non-root user
- leveraging systemd as the init/orchestration system, rather than the container runtime.
We will deploy HedgeDoc, as it requires a relational database, which leads us to utilize a pod. We're covering the notional pattern for a deployment with one pod and two containers; a full deployment would have other infrastructure in place (e.g. TLS termination via a reverse proxy).
We first create a user for running our application. Conventionally you would create a "system" user (e.g. UID/GID between 500 and 1000) for the identities of daemonized applications. However, systemd-journald combines all log output for UIDs under 1000 into the system log. A good follow-on read is Users, Groups, UIDs and GIDs on systemd systems, as with rootless podman we're going to get into UID/GID mapping quickly.
We will create a group and a user associated with that group:
[root@lab ~]# groupadd -g 2000 hedgedoc
[root@lab ~]# useradd -g 2000 -u 2000 -d /zfs/app/hedgedoc -s /sbin/nologin hedgedoc
- UID and GID both `2000`
- home directory in `/zfs/app/hedgedoc`, where we have prepped a ZFS dataset with `acltype` set to `posix`
- the shell for the user set to `/sbin/nologin`
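A quick sanity check of the identity we just created:

[root@lab ~]# id hedgedoc

which should report `uid=2000(hedgedoc)` and `gid=2000(hedgedoc)`.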
[root@lab ~]# usermod --add-subuids 200000000-200065535 --add-subgids 200000000-200065535 hedgedoc
- adding subuid and subgid ranges which will be used for user namespace mappings
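These ranges land in /etc/subuid and /etc/subgid; we can verify with:

[root@lab ~]# grep hedgedoc /etc/subuid /etc/subgid

Each file should contain an entry of the form `hedgedoc:200000000:65536`, i.e. a starting ID and a count (here 65536 IDs).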
[root@lab ~]# loginctl enable-linger hedgedoc
We use loginctl to set up lingering for a user; quoting loginctl(1):
If enabled for a specific user, a user manager is spawned for the user at boot and kept around after logouts. This allows users who are not logged in to run long-running services.
[root@lab ~]# loginctl user-status hedgedoc
We check the status of the user.
[root@lab ~]# runuser -l -s /bin/bash hedgedoc
Become the hedgedoc user, specifying a shell explicitly since the account's shell is `/sbin/nologin`.
[hedgedoc@lab ~]$ vim ~/.bashrc
export XDG_RUNTIME_DIR=/run/user/$(id -u)
# append to the history file, don't overwrite it
shopt -s histappend
# empty HISTSIZE and HISTFILESIZE mean unlimited history; see bash(1)
HISTSIZE=
HISTFILESIZE=
HISTIGNORE='ls:ll:cd:pwd:bg:fg:history:exit'
export HISTTIMEFORMAT="[%F %T] "
- We set `XDG_RUNTIME_DIR` so that we can use `systemctl --user`
- We set up unlimited-length history with timestamp formatting
[hedgedoc@lab ~]$ source ~/.bashrc
[hedgedoc@lab ~]$ systemctl --user status
We expect to see output of the systemd user slice.
[hedgedoc@lab ~]$ podman pod create --name hedgedoc --publish 127.0.0.1:8388:3000
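Note that we publish ports on the pod, not on individual containers: every container in the pod shares one network namespace, so the application's port 3000 will be reachable on the host only at 127.0.0.1:8388. We can confirm the pod exists with:

[hedgedoc@lab ~]$ podman pod ps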
[hedgedoc@lab ~]$ mkdir database
[hedgedoc@lab ~]$ vim ~/database.env
PGDATA=/var/lib/postgresql/data/pgdata
POSTGRES_PASSWORD=some-long-password-to-be-secure
POSTGRES_USER=hedgedoc
POSTGRES_DB=hedgedoc
[hedgedoc@lab ~]$ podman run --detach --pod=hedgedoc --name=database --label "io.containers.autoupdate=image" --env-file ~/database.env --volume ~/database/:/var/lib/postgresql/data:Z docker.io/postgres:13
- we run in `--detach` mode because we're confident about this invocation; if we wanted to see the output interactively we would not pass `--detach`
- we associate this container instance with a pod
- we add a label to the container of `io.containers.autoupdate=image`, which is used by podman-auto-update
- we volume bind with `:Z`, which tells podman to relabel the objects. We use `:Z` (capitalized) because this volume can be private to this container. If we wanted the volume to be shared with another container we would use `:z` (lower-case)
- we fix ourselves to a specific tag, `13`, as migrating between major releases of postgres is something we'll want to handle ourselves, but we want podman to update every time a new release of `13` is pushed to the registry
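If the database does not come up cleanly, its output is one command away; the same applies to the application container we start below:

[hedgedoc@lab ~]$ podman logs --tail 20 database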
[hedgedoc@lab ~]$ mkdir uploads
[hedgedoc@lab ~]$ vim ~/application.env
CMD_LOGLEVEL=verbose
CMD_DB_URL=postgres://hedgedoc:[email protected]:5432/hedgedoc?sslmode=disable
CMD_HOST=0.0.0.0
CMD_PORT=3000
CMD_DOMAIN=hedgedoc.name.tld
CMD_PROTOCOL_USESSL=true
CMD_URL_ADDPORT=false
CMD_SESSION_SECRET=another-long-password-to-be-secure
CMD_AUTO_VERSION_CHECK=false
CMD_DEFAULT_PERMISSION=limited
CMD_IMAGE_UPLOAD_TYPE=filesystem
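Note that `CMD_DB_URL` points at `127.0.0.1:5432`: because containers in the pod share a network namespace, the application reaches postgres over the pod-local loopback. Before starting the application, we can confirm postgres is accepting connections using the client tooling already present in the postgres image:

[hedgedoc@lab ~]$ podman exec database pg_isready -U hedgedoc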
[hedgedoc@lab ~]$ podman run --pod=hedgedoc --name=application --detach --label "io.containers.autoupdate=image" --env-file ~/application.env --volume ~/uploads/:/hedgedoc/public/uploads:Z quay.io/hedgedoc/hedgedoc:latest
- we use the `:latest` tag here to consume updates of this application aggressively
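Before daemonizing, a quick smoke test from the host is worthwhile, given the 127.0.0.1:8388 publish from earlier:

[hedgedoc@lab ~]$ podman ps --pod
[hedgedoc@lab ~]$ curl -sI http://127.0.0.1:8388/

Both containers should show as Up in the hedgedoc pod, and curl should come back with a successful HTTP response.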
Once we have validated that the application is running, we daemonize it:
[hedgedoc@lab ~]$ mkdir -p ~/.config/systemd/user/
[hedgedoc@lab ~]$ podman generate systemd --files --name --new --no-header hedgedoc
[hedgedoc@lab ~]$ mv pod-hedgedoc.service container-*.service ~/.config/systemd/user/
[hedgedoc@lab ~]$ systemctl --user daemon-reload
- we use `--name` because we want our container instances to have logical names for at-a-glance recognition
- we use `--new` to ensure that we're instancing new containers each time the service starts
- `pod-hedgedoc.service` is created and uses the `Requires=` and `Before=` directives to link with `container-application.service` and `container-database.service`
- the `container-*` service definitions use `Type=notify`, which enables richer dependency interaction managed by systemd
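For orientation, the dependency wiring in the generated pod unit looks roughly like this (an illustrative sketch; the exact output varies by podman version):

[Unit]
Description=Podman pod-hedgedoc.service
Requires=container-database.service container-application.service
Before=container-database.service container-application.service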
We then want to enable this set as a user service:
[hedgedoc@lab ~]$ systemctl --user enable --now pod-hedgedoc.service
This will re-instance the containers, with systemd now orchestrating podman rather than our ad-hoc invocations as a user. We can examine the state of the units with:
[hedgedoc@lab ~]$ systemctl --user status
We can examine the logging state of those units in aggregate with:
[hedgedoc@lab ~]$ journalctl --user -e
We can also ask podman to attempt to update these containers at a specific interval with the podman-auto-update systemd integration:
[hedgedoc@lab ~]$ systemctl --user enable --now podman-auto-update.timer
This will run `podman-auto-update.service` around midnight, which will pull new container images and re-instance as necessary, while cleaning up stale images.
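We can confirm the timer is scheduled, and preview what an update run would change, with:

[hedgedoc@lab ~]$ systemctl --user list-timers podman-auto-update.timer
[hedgedoc@lab ~]$ podman auto-update --dry-run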
This deployment method is comfortable/useful for a few reasons:
- It appears to follow a “golden path” that the podman developers are utilizing (e.g. Dan Walsh is cognizant of the home directory style of deployment).
- It utilizes the systemd user slice, which can be generalized to many workloads. For example you could have user units defined for different development environments on your workstation.
- It contains an entire application deployment within a single directory, along with its shell history. If you need to debug or interact with this deployment, you can review what you've done historically.
Enabling CPU, CPUSET, and I/O delegation #
By default, a non-root user only gets the memory and pids controllers delegated.
To allow delegation of other controllers, such as cpu, cpuset, and io, run the following commands:
[root@lab ~]# mkdir -p /etc/systemd/system/[email protected]
[root@lab ~]# cat <<EOF | tee /etc/systemd/system/[email protected]/delegate.conf
[Service]
Delegate=cpu cpuset io memory pids
EOF
[root@lab ~]# systemctl daemon-reload
After changing the systemd configuration, you need to re-login or reboot the host (rebooting is recommended). You will then be able to set limits as well as observe utilization with `podman stats`.
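To confirm delegation took effect, inspect the controllers available to the user's systemd instance (assuming cgroup v2); cpu, cpuset, and io should now be listed alongside memory and pids:

[hedgedoc@lab ~]$ cat /sys/fs/cgroup/user.slice/user-$(id -u).slice/user@$(id -u).service/cgroup.controllers

With that in place, resource flags such as `--cpus` on `podman run` work rootless, and `podman stats` reports utilization.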