Girders Blog
Notes on building internet applications

PostgreSQL with ZFS mounts inside Iocage Jails

Nov 5, 2021

This article contains my notes from an exeriment setting up a PostgreSQL 14 Cluster with ZFS mounts inside FreeBSD iocage jails. I hope you find it useful, but be aware that it may contain errors and inaccuracies due to my understanding of how things work.

There are two approaches here, Host-Managed or Jail-Managed datasets.

Host-Managed ZFS

Older versions of FreeBSD required PostgreSQL to run on the host instead of in jails. It may have been in issue with shared memory? Current versions of FreeBSD can now run Postgres inside a jail and allows us to run multiple postgres jails on the same host, if needed.

I have a host “fhost” freshly installed with FreeBSD 12.2 and ZFS. I intend to install 2 jails on it to run the Postgres cluster, though in production they would be on separate hosts. “db01” is the primary server and “db02” is the standby/replica in the cluster.

My goal is to mount ZFS datasets on the host inside the jail for the Postgres data directory. Additionally, I want to have separate datasets for the base (table data) and pg_wal (write-ahead log files) subdirectories.

zroot/jail-mounts                    mounted at  /zroot/jail-mounts
zroot/jail-mounts/db01               mounted at  /zroot/jail-mounts/fdb1
zroot/jail-mounts/db01/data14        mounted at  /var/db/postgres/data14 (PG_DATA)
zroot/jail-mounts/db01/data14/base   mounted at  /var/db/postgres/data14/base
zroot/jail-mounts/db01/data14/pg_wal mounted at  /var/db/postgres/data14/pg_wal
zroot/jail-mounts/db02               mounted at  /zroot/jail-mounts/db02
zroot/jail-mounts/db02/data14        mounted at  /var/db/postgres/data14 (PG_DATA)
zroot/jail-mounts/db02/data14/base   mounted at  /var/db/postgres/data14/base
zroot/jail-mounts/db02/data14/pg_wal mounted at  /var/db/postgres/data14/pg_wal

The catch is you can’t have the base or pg_wal datasets mounted to create the PGDATA directory. Postgresl requires it to be empty. They will be set up later.

ZFS Mounts for PGDATA

These commands assume the root user, otherwise you may need to add sudo. Postgres data should be stored compressed and have atime and setuid turned off for performance. Create the PGDATA directory for db01 as follows:

zfs create zroot/jail-mounts
zfs create zroot/jail-mounts/db01
zfs create -o compression=lz4 zroot/jail-mounts/db01/data14
zfs set atime=off zroot/jail-mounts/db01
zfs set setuid=off zroot/jail-mounts/db01
zfs create -o mountpoint=/var/db/postgres/data14 zroot/jail-mounts/db01/data14
zfs set jailed=on zroot/jail-mounts/db01

# Repeat above db01 commands for db02 datasets

Turn on “jailed” last. Otherwise it won’t let you set mountpoints, etc. When you go back to make changes to the dataset tree, you need to turn jailed=off first. Otherwise you get this funny message talking about a non-global zone:

cannot create 'zroot/...': 'mountpoint' cannot be set on dataset in a non-global zone

Attributes like jailed, compression, atime and setuid are inherited by subdirectory datasets.

Iocage

Iocage is a wrapper utility around FreeBSD Jails, an alternative to ezjail. This process is done on the host only.

pkg install iocage

To enable iocage in /etc/rc.conf, add this line:

iocage_enable="yes"

Before iocage is functional, it needs to activate.

iocage activate zroot

iocage now needs to fetch a RELEASE, which is used to create jails. This prompts you to select a release, then downloads it. I’m using 12.2-RELEASE.

iocage fetch

Install Primary Postgres Jail

Create Jail. By default, it creates a “clone” jail. Though you can create a basejail (add -b option) or thick jail (add -T option). After that, set up the ZFS mount options and boot into the jail.

iocage create -n db01 -r 12.2-RELEASE
iocage set allow_mount_zfs=1 db01
iocage set allow_mount=1 db01
iocage set jail_zfs=1 db01
iocage set jail_zfs_dataset=jail-mounts/db01 db01
iocage start db01
iocage console db01

Install PostgreSQL. Creating the cluster (initdb) demands that $PGDATA is empty

pkg install postgresql14-server postgresql14-contrib postgresql14-client
chown postgres:postgres /var/db/postgres/data14
su -l postgres
$ initdb -E UTF-8 /var/db/postgres/data14
$ exit # Back to root

Configure PostgreSQL for streaming replication. Details can be found in my article: Setup PostgreSQL 14 Streaming Replication

vi postgresql.conf (set listen address, etc.)
vi pg_hba.conf     (allow connections from our network, and replication users)

Now, to get the PGDATA/base and PGDATA/pg_wal directories pulled out into their own datasets we need to move these out of the way before mounting them. Then exit the console and stop the jail

mv /var/db/postgres/data14/base /var/db/postgres/data14/base0
mv /var/db/postgres/data14/pg_wal /var/db/postgres/data14/pg_wal0
exit # Log out of jail console to host
iocage stop db01

Create the subdirectory datasets. First, we set jailed=off to be able to access into it. Create the datasets, then set jailed=on again. Jailed datasets are not visible except to the running jail.

zfs set jailed=off zroot/jail-mounts/db01/data14
zfs create -o mountpoint=/var/db/postgres/data14/base zroot/jail-mounts/db01/data14/base
zfs create -o mountpoint=/var/db/postgres/data14/pg_wal zroot/jail-mounts/db01/data14/pg_wal
zfs set jailed=on zroot/jail-mounts/db01/data14

Start the jail (we didn’t set Postgres to start on boot yet! It wouldn’t be happy.) Move items from the PGDATA/base0 and pg_wal0 dirs over to the new home. Make sure permissions are right, everything owned by postgres.

iocage start db01
iocage console db01
cd /var/db/postgres/data14
chown postgres:postgres base pg_wal
mv base0/* base/
mv pg_wal0/* pg_wal
rmdir base0 pg_wal0

Cross our fingers and start up postgres.

vi /etc/rc.conf  to add: postgresql_enable="yes"
service postgresql start
psql -U postgres -l

createuser -U postgres --replication repl
createuser -U postgres -c appuser
createdb -U appuser appdb

Install Standby/Replica Postgres Jail

Create the jail like the db01 jail.

iocage create -n db02 -r 12.2-RELEASE
iocage set allow_mount_zfs=1 db02
iocage set allow_mount=1 db02
iocage set jail_zfs=1 db02
iocage set jail_zfs_dataset=jail-mounts/db02 db02
iocage start db02
iocage console db02

Install PostgreSQL but DO NOT do initdb. Instead, do a pg_basebackup, something like this:

pkg install postgresql14-server postgresql14-contrib postgresql14-client
chown postgres:postgres /var/db/postgres/data14
su -l postgres
$ pg_basebackup --pgdata /var/db/postgres/data14 --format=p
  --write-recovery-conf --checkpoint=fast --label=mffb --progress
  --host=db01 --port=5432 --username=repl

Now we finish the jail just like we did to db01 after the “initdb” step;

How easy was that?

Jail-Managed ZFS

We can manage ZFS from with the jail!

iocage stop db01
iocage set jail_zfs=on jail_zfs_dataset=data db01

Now we set it up. Alternatively, let’s set up a /db dataset to hold the postgresql data under, and adopt a naming convention including the environment, hostname and postgresql major version.

iocage start db01
iocage console db01
vim /etc/rc.conf    ### Add "zfs_enable="YES"
service zfs enable  ### or This command
zfs list
    NAME         USED  AVAIL  REFER  MOUNTPOINT
    zroot       99.7G   154G    88K  /zroot
    zroot/data    96K   154G    96K  none

Once we have that working, now create the datasets inside the jail.

zfs create zroot/data/db
zfs create \
    -o recordsize=16k \          ## Reduces write amplication with compression
    -o compress=lz4 \            ## Compress our data
    -o redundant_metadata=most \ ## causes ZFS to reduce the number of copies of some types of metadata that it keeps
    -o primarycache=metadata \   ## if working set fits in RAM
    -o atime=off \               ## not needed, eliminates useless writes
    -o reservation=100G          ## To ensure we don't use fill up the storage pool
    zroot/data/db/prod-db01-pg14

For compression on FreeBSD 13 or later, perhaps zstd would be a better option?

Mount and verify.

zfs set mountpoint=/db/prod-db91-pg14 zroot/data/db/prod-db01-pg14

zfs list
    NAME                      USED  AVAIL  REFER  MOUNTPOINT
    zroot                    99.7G   154G    88K  /zroot
    zroot/data                288K   154G    96K  none
    zroot/data/db             192K   154G    96K  none
    zroot/data/db/prod-db01-pg14    96K   154G    96K  /db/prod-db01-pg14

ls -l /db/prod-db01-pg14/
    total 0

On the host, we find the dataset as well. Note that it’s mount point is for the jail, and doesn’t exist on the host.

zroot/data/db/prod-db01-pg14     96K   154G    96K  /db/prod-db01-pg14

Now we can create our postgresql data directory under /db/prod-db01-pg14

initdb -E UTF-8 /db/prod-db01-pg14/data14

Thanks, Chip!

PostgreSQL Configuration for ZFS

Set full_page_writes=off Because ZFS always writes full blocks, we can disable full page writes in PostgreSQL.

For a database server, adjust your ZFS ARC size (50% of RAM default) to use any free memory not used by PG shared buffers or other needs.

TOAST Compression

Disable TOAST compression, as ZFS is handling the compression, we don’t need the overhead to double-compress it. TOAST is an area where very large columns are stored outside of the record on the page.

Unfortunately, this seems a per-column setting, not database wide.

You can only set the compression scheme with create table, either pglz or lz4 (if PostgreSQL was built --with-lz4).

CREATE TABLE posts (..., body TEXT COMPRESSION lz4, ...  );

Use alter table to disable compression on each large TOAST column. It doesn’t change the compression on existing data, only for new data. Use the storage parameter EXTERNAL for external, uncompressed data.

ALTER TABLE posts ALTER COLUMN body SET STORAGE EXTERNAL;

Good luck!