Archlinux Metal to Server With ZFS

In this post I’ll outline a simplified install procedure that will take you from metal to a machine that is managed over ssh and has ZFS. Arch has several guiding principles; the one you should be most aware of is versatility: the user’s freedom to build and use systems how they want. In following this guide you’re letting me make a significant number of choices for you. For many folks, a first pass through the ArchLinux wiki’s install procedure is too much to feel comfortable making headway, so the trade-off is following guides like this to get your foot in the door until you feel comfortable swapping things out. You should maintain your own notes on the install procedure, and consider contributing directly to the ArchLinux wiki or upstream projects’ documentation. If you find anything wrong with this guide or would like to share improvements, please don’t hesitate to contact me!

I’m assuming you are going to administer this server via the root account, but that you’ll have your own user account to ssh into and sudo from. I’m also assuming you want to implement full disk encryption: servers are where you typically store data, and there is no reason the physical removal of a system should result in exfiltration of the stored data.

Our goals will be:

  • install arch from the actual arch media (and not some sissy downstream distribution that tries to make life easier for you)
  • consume the “new breed” of systemd based initramfs
  • set up networking and control in a headless manner using ssh
  • set up ZFS for storing anything important
  • set up systemd-nspawn containers for compartmentalizing applications


At this time you should have acquired a copy of the Arch ISO from the mirrors and written it to a bootable disk. From a Linux environment where dd is available, that looks something like this:

dd if=<archiso.iso> of=/dev/<target_device> bs=4M status=progress
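It’s also worth verifying the image against the checksums the mirrors publish before writing it. Here is a sketch of the mechanic using stand-in files (in practice archlinux-x86_64.iso and the sums file come from the mirror you downloaded from; the names here are illustrative):

```shell
# Work in a scratch directory with a stand-in for the real ISO.
cd "$(mktemp -d)"
printf 'stand-in for the iso\n' > archlinux-x86_64.iso
# The mirror publishes a file of digests like this one:
sha256sum archlinux-x86_64.iso > sha256sums.txt
# Verify the image against the published digests.
sha256sum -c sha256sums.txt   # → archlinux-x86_64.iso: OK
```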

Before you boot into the Arch live environment you’ll want to get into the BIOS and change a number of settings. Firmware menus vary, so you might want to ask other associates about this, but at minimum ensure the following are set:

  • UEFI boot enabled, delete all other boot records as we’re about to make a new one
  • secure boot disabled (ArchLinux doesn’t have signed bootloader/kernel, however you could roll your own if you ever got brave enough)
  • PXE boot disabled (we don’t want to boot from a network target)

You might also want to update your BIOS before proceeding. Many chip manufacturers are now getting hit with problems that are only correctable via BIOS or microcode updates.

From there get into your one time boot menu and select the bootable disk you’ve prepared.

You will now be in the Arch live environment, and are ready to start stage zero. If at any time you are unable to use the network, ensure you have a cable plugged in and run systemctl start dhcpcd.service to request an address from the network.

Base Bootable Install

Find which disk you’re going to work on via fdisk:

fdisk -l

which will give you an output of the disks in your environment, as well as the partition tables and partitions already on those disks. Figure out which disk you want to use and remember it; it will have a name like /dev/sda or /dev/nvme0n1.

Partition your disk; we’ll use GPT for our partition table via a program called cgdisk:

cgdisk /dev/disk

From here you need to delete all partitions, and create new partitions to match something like this:

  • partition 1, sized 1GiB, partition type efi (hex ef00)
  • partition 2, sized 100%, partition type linux (hex 8300)

Partitions show up under /dev/ as a number appended to your disk name, e.g. /dev/sda1 and /dev/sda2 would be the first and second partition of the /dev/sda disk respectively (NVMe disks insert a p, e.g. /dev/nvme0n1p1). We’ll refer to partition 1 as disk.1 and partition 2 as disk.2 from here on.
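That naming rule can be sketched as a tiny shell helper (purely illustrative; part_dev is a hypothetical function, not a real tool or part of the install):

```shell
# part_dev: print the device node for partition $2 of disk $1.
# Disks whose names end in a digit (nvme0n1, mmcblk0) get a "p"
# separator before the partition number; others (sda, vda) do not.
part_dev() {
  case "$1" in
    *[0-9]) printf '%sp%s\n' "$1" "$2" ;;
    *)      printf '%s%s\n'  "$1" "$2" ;;
  esac
}

part_dev /dev/sda 1      # → /dev/sda1
part_dev /dev/nvme0n1 2  # → /dev/nvme0n1p2
```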

We will then lay a FAT32 filesystem in on our first partition, which is going to be our boot partition:

mkfs.vfat -F32 /dev/disk.1

We will then lay a LUKS encryption container in on our second partition, which is going to hold our root filesystem:

cryptsetup --cipher aes-xts-plain64 --key-size 512 --hash sha512 -y --use-random luksFormat /dev/disk.2

LUKS allows you to add up to eight passphrases. Just make sure you remember the one you set now; you can always add another in the future.

We will decrypt and open our LUKS container, mapping it onto a device named luks:

cryptsetup luksOpen /dev/disk.2 luks

This device shows up under /dev/mapper. Formerly we used LVM before laying in our root filesystem, but machines have so much memory nowadays, and I never use hibernate, so we’re going to put btrfs right on top of the LUKS container:

mkfs.btrfs /dev/mapper/luks

Note that just by using btrfs you don’t get all of the fancy advantages of a check-summing filesystem: to actually get self-healing you need redundancy, which means at least a mirror. We don’t typically set up btrfs as a mirror for the root OS. The idea behind our builds is that they are quickly reproducible in the event of an OS drive failure, rather than tolerant to that failure through significant added install complexity.

We will now mount the filesystems. First we mount root, then we mount our boot partition inside of root. We will pass some arguments to the mount procedure to specifically enable both compression and trim via discard:

mount -o compress=lzo,ssd,autodefrag,discard /dev/mapper/luks /mnt

mkdir /mnt/boot

mount /dev/disk.1 /mnt/boot

We’re now ready for the most unique step, a pacstrap, where we pass in the minimal set of packages we’ll need as we continue the installation procedure. A key choice here is base-devel, which augments base with development packages; we plan to use these with the AUR later. If you want to use our epiphyte mirror, add Server =$repo/os/$arch before [core], [extra], and [community] in your /etc/pacman.conf.

pacstrap /mnt base base-devel btrfs-progs vim

We then use genfstab to generate the fstab. We will use something called redirection via the >> operator below. This redirects the output of genfstab and appends it to the file passed as the next argument:

genfstab -pU /mnt >> /mnt/etc/fstab
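If the distinction between > and >> is new to you: > truncates the target file, while >> appends to it, which matters whenever the target already has contents. A quick demonstration on a throwaway file:

```shell
f=$(mktemp)               # throwaway file for the demonstration
echo "first"  >  "$f"     # > truncates: the file now holds exactly one line
echo "second" >> "$f"     # >> appends: the file now holds two lines
cat "$f"
# → first
# → second
```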

We now arch-chroot into our newly installed system:

arch-chroot /mnt /bin/bash

Set our timezone:

rm -f /etc/localtime

ln -s /usr/share/zoneinfo/<zone_info> /etc/localtime

hwclock --systohc --utc

Set our hostname:

echo "hal9000" > /etc/hostname

Set and then generate locales:

vim /etc/locale.gen

Find the line that reads en_US.UTF-8 UTF-8 and uncomment it, save, then generate the locales:

locale-gen

Set the system locale:

echo "LANG=en_US.UTF-8" > /etc/locale.conf

Set the system keymap:

echo "KEYMAP=us" > /etc/vconsole.conf

Set the root password:

passwd

Configure the mkinitcpio for systemd based initramfs:

vim /etc/mkinitcpio.conf

HOOKS=(base systemd autodetect modconf block keyboard sd-vconsole sd-encrypt filesystems fsck)

Generate the initramfs:

mkinitcpio -p linux

Set up the systemd-boot:

bootctl install

Now we’re going to set several options for our boot; for instance, we disable resume from hibernation. You’ll need to ensure that all options are on a single line:

vim /boot/loader/entries/arch.conf
title ArchLinux
linux /vmlinuz-linux
initrd /initramfs-linux.img
options noresume hibernate=noresume rd.luks.uuid=<LUKS_UUID> rd.luks.options=discard,tries=0,timeout=0 root=UUID=<VG-ROOT_UUID> rootflags=x-systemd.device-timeout=0

Assuming you’ve been working on an nvme drive, you’ll need to find two different UUIDs here:

  • rd.luks.uuid : the UUID of your LUKS container : /dev/disk.2
  • root=UUID= : the UUID of your root filesystem : /dev/mapper/luks

You can also examine this via lsblk:

lsblk -f

NAME            FSTYPE          LABEL UUID        MOUNTPOINT
nvme0n1
├─nvme0n1p1     vfat                  <BOOT_UUID> /boot
└─nvme0n1p2     crypto_LUKS           <LUKS_UUID>
  └─luks        btrfs                 <ROOT_UUID> /

You can find out a partition’s UUID by running blkid on the device, which following our convention would be blkid /dev/disk.2. If you don’t want to type it out, consider writing /boot/loader/entries/arch.conf until you get to ...UUID=, then use redirection like we have before:

blkid -o value -s UUID /dev/disk.2 >> /boot/loader/entries/arch.conf

Now when you go back into /boot/loader/entries/arch.conf with vim you’ll be able to whittle down to just the UUID without having to transcribe to something as archaic as paper.

It is best to also create a loader entry for your fallback initramfs:

cp /boot/loader/entries/arch.conf /boot/loader/entries/fallback.conf

Change ArchLinux to ArchLinuxFallback, and change /initramfs-linux.img to /initramfs-linux-fallback.img.
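Those two edits are small enough to script with sed; here the transformation is demonstrated on a scratch copy of the entry (on the real system you’d read /boot/loader/entries/arch.conf and write fallback.conf next to it):

```shell
cd "$(mktemp -d)"
# Scratch copy of the loader entry written earlier.
printf 'title ArchLinux\nlinux /vmlinuz-linux\ninitrd /initramfs-linux.img\n' > arch.conf
# Swap the title and the initramfs image for the fallback entry.
sed -e 's/^title ArchLinux$/title ArchLinuxFallback/' \
    -e 's|/initramfs-linux.img|/initramfs-linux-fallback.img|' \
    arch.conf > fallback.conf
cat fallback.conf
# → title ArchLinuxFallback
# → linux /vmlinuz-linux
# → initrd /initramfs-linux-fallback.img
```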

Then ensure that you have the ability to select the fallback:

vim /boot/loader/loader.conf

timeout 1
default arch

Now we close up shop and reboot into the installed system:

exit

umount -R /mnt

reboot

At this time you should be able to reboot and get back to your root shell after typing in your encryption password. If you don’t get back in, you need to re-examine your steps and ensure you can pass this stage.

Networking and Core Services

We will use systemd-networkd to set up our networking. There are interesting and complex ways that you can configure systemd-networkd, seek documentation for advanced use cases. In this case we’ll assume a single adapter and dhcp via ipv4. Figure out what the adapter is called on your system by running ip addr, we will be referencing the adapter by name. In the case of this example the adapter is named enp1s0.

vim /etc/systemd/network/enp1s0.network
[Match]
Name=enp1s0

[Network]
DHCP=ipv4


Now enable and start systemd-networkd:

systemctl enable --now systemd-networkd.service

Normally I’ve consumed almost every possible service from systemd that I can; the exception has been systemd-resolved. It’s caused a ton of hard-to-diagnose problems over the last couple of years. Periodically I re-evaluate, but as of 2018Q3 there are still blocking issues for use (primarily with systemd-nspawn containers).

So in setting up your resolver you should manually set it to a resolver of choice; for example, if you’re running your own DNS server (authoritative or recursive), point at it:

vim /etc/resolv.conf
nameserver <address.of.your.resolver>

Now test to see if you can get routing and network to function:

ping -c 4 <a.known.ip.address>

If you’re getting responses then your addressing and routing are working properly. Now test to see if DNS resolution is working:

ping -c 4 archlinux.org

If you’re getting responses then your DNS resolution is working properly.

Set up network time:

systemctl enable systemd-timesyncd.service

Now we’ll install the basic useful packages and daemons:

pacman -S htop sudo openssh

Create a user account for yourself so you stop using root; I’ll use my initials:

useradd -m -s /bin/bash agd

passwd agd

Edit the sudoers file via visudo so that we can give the wheel group access to sudo privileges:

visudo

Uncomment %wheel ALL=(ALL) ALL, save, then add yourself to the wheel group:

usermod -a -G wheel agd

Change to your user and test to see if sudo is working:

su agd

sudo su

Now we’ll get the ssh daemon running so you can control this system via your remote system. We must first start the daemon and copy over our keys to the server from our remote system, then we’ll harden the daemon.

systemctl enable --now sshd.service

From your remote system, assuming you have an ssh key pair generated (should be using Curve25519), use the ssh-copy-id command to copy your pubkey to the server and set up the proper file/folder permissions. You can do this by hand, however ssh-copy-id is a very nice way to ensure you don’t muck up permissions:

ssh-copy-id -i ~/.ssh/<your_key>.pub address.of.your.server
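ssh-copy-id exists precisely because these permissions are easy to get wrong. For reference, the by-hand equivalent on the server looks roughly like this (sshd refuses keys when these modes are looser):

```shell
mkdir -p ~/.ssh
chmod 700 ~/.ssh                   # directory must be owner-only
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys   # key file must be owner read/write only
# then append your public key as one line to ~/.ssh/authorized_keys
```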

Ensure you can log in via your key. Now let’s harden the ssh daemon; you’ll want to delete everything from /etc/ssh/sshd_config and paste in the following:

vim /etc/ssh/sshd_config
# from
Port <port>
Protocol 2
HostKey /etc/ssh/ssh_host_ed25519_key
HostKey /etc/ssh/ssh_host_rsa_key
KexAlgorithms curve25519-sha256@libssh.org,ecdh-sha2-nistp521,ecdh-sha2-nistp384,ecdh-sha2-nistp256,diffie-hellman-group-exchange-sha256
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
AuthenticationMethods publickey
Subsystem sftp  /usr/lib/ssh/sftp-server -f AUTHPRIV -l INFO
PermitRootLogin No

Notice we set an alternative port for listening. Change this to whatever you prefer, however you’ll need to remember it because our next steps are to bring up a firewall. Restart the ssh daemon…

systemctl restart sshd.service

…and then attempt logging into your server again via the remote host.

Now we’re gonna wanna bring up a firewall:

vim /etc/iptables/iptables.rules

*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:TCP - [0:0]
:UDP - [0:0]

-A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate INVALID -j DROP
-A INPUT -p icmp -m icmp --icmp-type 8 -m conntrack --ctstate NEW -j ACCEPT
-A INPUT -p udp -m conntrack --ctstate NEW -j UDP
-A INPUT -p tcp --tcp-flags FIN,SYN,RST,ACK SYN -m conntrack --ctstate NEW -j TCP
-A INPUT -p udp -j REJECT --reject-with icmp-port-unreachable
-A INPUT -p tcp -j REJECT --reject-with tcp-reset
-A INPUT -j REJECT --reject-with icmp-proto-unreachable

-A TCP -p tcp --dport <port> -j ACCEPT

COMMIT

Enable and start the daemon:

systemctl enable --now iptables.service

Now you have a server that is network accessible, controllable over a hardened ssh daemon, and is blocking everything but ssh. You can get a lot more complex with all of these steps, but these are the absolute basics. Reboot to ensure all your settings held and that you didn’t forget to enable a necessary daemon.

Consider limiting exposure via ssh ProxyJump.

Storage via ZFS

ZFS is an incredibly powerful next-generation filesystem which was taken over by evil Oracle during the Sun Microsystems acquisition, but is such a phenomenon that an open movement started behind it. ZFS on Linux is significantly supported by LLNL, and on Arch we harness the packaging of the ArchZFS project. The ArchZFS project is a wonderful community resource; a lot of people use ArchLinux as a testbed to register issues upstream to ZoL.

First let’s snag the signing key from the ArchZFS project and locally sign it so that we can install packages from their repository:

pacman-key -r F75D9D76
pacman-key --lsign-key F75D9D76

Now you have a choice: you can use the ArchZFS repository directly, or you can use our "ctrl" repository, which keeps the linux package in sync with the ArchZFS release. Without the epiphyte ctrl repository you may fall into a window where the current zfs-linux packages are not compatible with the current kernel. You actually have one more option, installing zfs-linux-git, but that should only be for testing.

If you’re going the straight ArchZFS repo route, add this at the end of your /etc/pacman.conf:

vim /etc/pacman.conf
Server =$repo/x86_64

If you’re going to use epiphyte ctrl, add this before the [core] directive in /etc/pacman.conf:

vim /etc/pacman.conf
Server =$repo

Now let’s refresh the repository databases and install ZFS:

pacman -Syyu

pacman -S zfs-linux

Now enable the ZFS services, we’re not enabling all of them… just the ones we want:

systemctl enable zfs-import-cache.service

systemctl enable zfs-import.target

systemctl enable zfs-mount.service

systemctl enable zfs-zed.service

systemctl enable zfs.target

Now that you have ZFS, we should prepare some disks. We’ll just do a mirror in this situation; however, much more complex parity-based layouts are possible via raidz. I’d suggest you take a peek at this other post on ZFS performance parameters and topology.

First we must generate a key to encrypt our drives:

dd bs=512 count=4 if=/dev/urandom of=/etc/crypt.key iflag=fullblock

This generates a 2048-byte key of random data. We use a password to unlock the root filesystem, but we can store keys on that filesystem to use stronger passphrases for the critical storage devices. Make sure you back up this key someplace other than the root drive; you will never be able to type this key in by hand, so you’ll need an uncorrupted copy on disk to decrypt the block devices. Also make sure it’s only readable by root:

chmod 600 /etc/crypt.key
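As a sanity check, 4 blocks of 512 bytes should leave a 2048-byte file readable only by its owner; demonstrated here against a scratch path rather than /etc/crypt.key:

```shell
k=$(mktemp)   # scratch stand-in for /etc/crypt.key
dd bs=512 count=4 if=/dev/urandom of="$k" iflag=fullblock 2>/dev/null
chmod 600 "$k"
stat -c '%s %a' "$k"   # → 2048 600
```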

Now we’ll encrypt two devices (sdd and sde respectively) using that key:

cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 --hash sha512 --use-random -y --key-file=/etc/crypt.key /dev/sdd

cryptsetup luksFormat --cipher aes-xts-plain64 --key-size 512 --hash sha512 --use-random -y --key-file=/etc/crypt.key /dev/sde

The --use-random directive samples /dev/random, which blocks when the kernel’s entropy pool runs low. On a server without a desktop environment you’re not generating as much entropy as you would if you had a mouse to move around, so you might have to generate entropy to have enough available to complete the encryption. It’s advisable to install rng-tools, as it will give you an immediate boost to your entropy pool.

pacman -S rng-tools

If you still need more, then curl a copy of the ArchLinux iso and delete it. If you’re in a bandwidth poor environment dd one of your disks to /dev/null.
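You can watch what the kernel thinks of your entropy pool directly; on the kernels of this era a value persistently near zero meant luksFormat would stall:

```shell
# Prints the kernel's current entropy estimate, in bits.
cat /proc/sys/kernel/random/entropy_avail
```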

We now need to ensure that the devices we encrypted are decrypted at boot time, we do this via the crypttab. It is just like the fstab, but its responsibility is to decrypt devices at boot. We need to ensure that the devices have distinct names once they are decrypted. On large disk systems you’ll want to identify the tray identifier and use an appropriately descriptive name in crypttab. We’ll snag the UUID from each device:

blkid -o value -s UUID /dev/sdd

blkid -o value -s UUID /dev/sde

From this we would use disk0 and disk1 as the names for the devices. You can do whatever you please, but be consistent and unique with the naming. We add the devices to the crypttab as follows:

vim /etc/crypttab
disk0    UUID=<UUID_of_sdd>    /etc/crypt.key
disk1    UUID=<UUID_of_sde>    /etc/crypt.key

Now reboot and check your /dev/mapper to see if those two decrypted devices show up.

Now we’ll create the ZFS mirror. We’re going to use some initialization parameters from our other write up:

zpool create -o ashift=12 -O normalization=formD -O xattr=sa -O relatime=on tank mirror /dev/mapper/disk0 /dev/mapper/disk1

You will now have a ZFS mirrored device mounted to /tank on your system. Reboot to ensure that everything comes up fine.

Containers via systemd-nspawn

First we’ll set up networking for systemd-nspawn:

systemctl enable --now

Set up shared networking with a service override:

sudo systemctl edit systemd-nspawn@.service

[Service]
ExecStart=
ExecStart=/usr/bin/systemd-nspawn --quiet --keep-unit --boot --link-journal=try-guest --machine=%I

Prepare the container:

pacman -S arch-install-scripts

cd /var/lib/machines

mkdir -p containername

pacstrap --ignore linux --ignore nano --ignore linux-firmware -c -d containername/ base vim bash-completion

machinectl start containername

machinectl shell containername

You can set up disk binds to move data between the container and host operating system:

mkdir -p /etc/systemd/nspawn

mkdir -p /opt/containername

vim /etc/systemd/nspawn/containername.nspawn
[Files]
Bind=/opt/containername
Bind=/var/cache/pacman

Above we bound /var/cache/pacman into the container, which may save you some bandwidth as you update the host and containers.

You can set the container to start as a service:

machinectl enable containername

When it works as intended systemd-nspawn can be a really valuable tool to containerize application deployment in a way that is simple to comprehend and easy to maintain.


You should be in a livable state now. When I first began, it took me a couple of times through before I stopped making silly mistakes. The major hurdle is always getting yourself to a bootable state after the live media install.

Consider the choices made for you in this guide and re-examine them to ensure they make sense for your lifestyle.

Good luck, enjoy your journey with ArchLinux. Read, Contribute, Evangelize.