PostgreSQL with ZFS mounts inside Iocage Jails
This article contains my notes from an exeriment setting up a PostgreSQL 14 Cluster with ZFS mounts inside FreeBSD iocage jails. I hope you find it useful, but be aware that it may contain errors and inaccuracies due to my understanding of how things work.
There are two approaches here, Host-Managed or Jail-Managed datasets.
Host-Managed ZFS
Older versions of FreeBSD required PostgreSQL to run on the host instead of in jails. It may have been in issue with shared memory? Current versions of FreeBSD can now run Postgres inside a jail and allows us to run multiple postgres jails on the same host, if needed.
I have a host “fhost” freshly installed with FreeBSD 12.2 and ZFS. I intend to install 2 jails on it to run the Postgres cluster, though in production they would be on separate hosts. “db01” is the primary server and “db02” is the standby/replica in the cluster.
My goal is to mount ZFS datasets on the host inside the jail for the Postgres data directory. Additionally, I want to have separate datasets for the base (table data) and pg_wal (write-ahead log files) subdirectories.
zroot/jail-mounts mounted at /zroot/jail-mounts
zroot/jail-mounts/db01 mounted at /zroot/jail-mounts/fdb1
zroot/jail-mounts/db01/data14 mounted at /var/db/postgres/data14 (PG_DATA)
zroot/jail-mounts/db01/data14/base mounted at /var/db/postgres/data14/base
zroot/jail-mounts/db01/data14/pg_wal mounted at /var/db/postgres/data14/pg_wal
zroot/jail-mounts/db02 mounted at /zroot/jail-mounts/db02
zroot/jail-mounts/db02/data14 mounted at /var/db/postgres/data14 (PG_DATA)
zroot/jail-mounts/db02/data14/base mounted at /var/db/postgres/data14/base
zroot/jail-mounts/db02/data14/pg_wal mounted at /var/db/postgres/data14/pg_wal
The catch is you can’t have the base or pg_wal datasets mounted to create the PGDATA directory. Postgresl requires it to be empty. They will be set up later.
ZFS Mounts for PGDATA
These commands assume the root user, otherwise you may need to add sudo. Postgres data should be stored compressed and have atime and setuid turned off for performance. Create the PGDATA directory for db01 as follows:
zfs create zroot/jail-mounts
zfs create zroot/jail-mounts/db01
zfs create -o compression=lz4 zroot/jail-mounts/db01/data14
zfs set atime=off zroot/jail-mounts/db01
zfs set setuid=off zroot/jail-mounts/db01
zfs create -o mountpoint=/var/db/postgres/data14 zroot/jail-mounts/db01/data14
zfs set jailed=on zroot/jail-mounts/db01
# Repeat above db01 commands for db02 datasets
Turn on “jailed” last. Otherwise it won’t let you set mountpoints, etc.
When you go back to make changes to the dataset tree, you need to turn jailed=off
first. Otherwise you get this funny message talking about a non-global zone:
cannot create 'zroot/...': 'mountpoint' cannot be set on dataset in a non-global zone
Attributes like jailed, compression, atime and setuid are inherited by subdirectory datasets.
Iocage
Iocage is a wrapper utility around FreeBSD Jails, an alternative to ezjail. This process is done on the host only.
pkg install iocage
To enable iocage in /etc/rc.conf, add this line:
iocage_enable="yes"
Before iocage is functional, it needs to activate.
iocage activate zroot
iocage now needs to fetch a RELEASE, which is used to create jails. This prompts you to select a release, then downloads it. I’m using 12.2-RELEASE.
iocage fetch
Install Primary Postgres Jail
Create Jail. By default, it creates a “clone” jail. Though you can create a basejail (add -b option) or thick jail (add -T option). After that, set up the ZFS mount options and boot into the jail.
iocage create -n db01 -r 12.2-RELEASE
iocage set allow_mount_zfs=1 db01
iocage set allow_mount=1 db01
iocage set jail_zfs=1 db01
iocage set jail_zfs_dataset=jail-mounts/db01 db01
iocage start db01
iocage console db01
Install PostgreSQL. Creating the cluster (initdb) demands that $PGDATA is empty
pkg install postgresql14-server postgresql14-contrib postgresql14-client
chown postgres:postgres /var/db/postgres/data14
su -l postgres
$ initdb -E UTF-8 /var/db/postgres/data14
$ exit # Back to root
Configure PostgreSQL for streaming replication. Details can be found in my article: Setup PostgreSQL 14 Streaming Replication
vi postgresql.conf (set listen address, etc.)
vi pg_hba.conf (allow connections from our network, and replication users)
Now, to get the PGDATA/base and PGDATA/pg_wal directories pulled out into their own datasets we need to move these out of the way before mounting them. Then exit the console and stop the jail
mv /var/db/postgres/data14/base /var/db/postgres/data14/base0
mv /var/db/postgres/data14/pg_wal /var/db/postgres/data14/pg_wal0
exit # Log out of jail console to host
iocage stop db01
Create the subdirectory datasets. First, we set jailed=off to be able to access into it. Create the datasets, then set jailed=on again. Jailed datasets are not visible except to the running jail.
zfs set jailed=off zroot/jail-mounts/db01/data14
zfs create -o mountpoint=/var/db/postgres/data14/base zroot/jail-mounts/db01/data14/base
zfs create -o mountpoint=/var/db/postgres/data14/pg_wal zroot/jail-mounts/db01/data14/pg_wal
zfs set jailed=on zroot/jail-mounts/db01/data14
Start the jail (we didn’t set Postgres to start on boot yet! It wouldn’t be happy.) Move items from the PGDATA/base0 and pg_wal0 dirs over to the new home. Make sure permissions are right, everything owned by postgres.
iocage start db01
iocage console db01
cd /var/db/postgres/data14
chown postgres:postgres base pg_wal
mv base0/* base/
mv pg_wal0/* pg_wal
rmdir base0 pg_wal0
Cross our fingers and start up postgres.
vi /etc/rc.conf to add: postgresql_enable="yes"
service postgresql start
psql -U postgres -l
createuser -U postgres --replication repl
createuser -U postgres -c appuser
createdb -U appuser appdb
Install Standby/Replica Postgres Jail
Create the jail like the db01 jail.
iocage create -n db02 -r 12.2-RELEASE
iocage set allow_mount_zfs=1 db02
iocage set allow_mount=1 db02
iocage set jail_zfs=1 db02
iocage set jail_zfs_dataset=jail-mounts/db02 db02
iocage start db02
iocage console db02
Install PostgreSQL but DO NOT do initdb. Instead, do a pg_basebackup, something like this:
pkg install postgresql14-server postgresql14-contrib postgresql14-client
chown postgres:postgres /var/db/postgres/data14
su -l postgres
$ pg_basebackup --pgdata /var/db/postgres/data14 --format=p
--write-recovery-conf --checkpoint=fast --label=mffb --progress
--host=db01 --port=5432 --username=repl
Now we finish the jail just like we did to db01 after the “initdb” step;
- Move base and pg_wal away
- Stop jail
- Create base and pg_wal ZFS datasets under db02/data14
- Restart db02 jail
- Move base0 and pg_wal0 files back where they belong
- Configure Postgresql for standby mode
- Start postgres, configure in /etc/rc.conf for start at boot
How easy was that?
Jail-Managed ZFS
We can manage ZFS from with the jail!
iocage stop db01
iocage set jail_zfs=on jail_zfs_dataset=data db01
- jail_zfs - sets a number of sysctls to allow a zfs dataset to be managed from inside the jail.
- jail_zfs_dataset - name of the extra zfs dataset in the jail.
- Do not create the dataset, iocage will do that for us.
Now we set it up. Alternatively, let’s set up a /db dataset to hold the postgresql data under, and adopt a naming convention including the environment, hostname and postgresql major version.
iocage start db01
iocage console db01
vim /etc/rc.conf ### Add "zfs_enable="YES"
service zfs enable ### or This command
zfs list
NAME USED AVAIL REFER MOUNTPOINT
zroot 99.7G 154G 88K /zroot
zroot/data 96K 154G 96K none
Once we have that working, now create the datasets inside the jail.
zfs create zroot/data/db
zfs create \
-o recordsize=16k \ ## Reduces write amplication with compression
-o compress=lz4 \ ## Compress our data
-o redundant_metadata=most \ ## causes ZFS to reduce the number of copies of some types of metadata that it keeps
-o primarycache=metadata \ ## if working set fits in RAM
-o atime=off \ ## not needed, eliminates useless writes
-o reservation=100G ## To ensure we don't use fill up the storage pool
zroot/data/db/prod-db01-pg14
For compression on FreeBSD 13 or later, perhaps zstd would be a better option?
Mount and verify.
zfs set mountpoint=/db/prod-db91-pg14 zroot/data/db/prod-db01-pg14
zfs list
NAME USED AVAIL REFER MOUNTPOINT
zroot 99.7G 154G 88K /zroot
zroot/data 288K 154G 96K none
zroot/data/db 192K 154G 96K none
zroot/data/db/prod-db01-pg14 96K 154G 96K /db/prod-db01-pg14
ls -l /db/prod-db01-pg14/
total 0
On the host, we find the dataset as well. Note that it’s mount point is for the jail, and doesn’t exist on the host.
zroot/data/db/prod-db01-pg14 96K 154G 96K /db/prod-db01-pg14
Now we can create our postgresql data directory under /db/prod-db01-pg14
initdb -E UTF-8 /db/prod-db01-pg14/data14
Thanks, Chip!
PostgreSQL Configuration for ZFS
Set full_page_writes=off
Because ZFS always writes full blocks, we can disable full page writes in PostgreSQL.
For a database server, adjust your ZFS ARC size (50% of RAM default) to use any free memory not used by PG shared buffers or other needs.
TOAST Compression
Disable TOAST compression, as ZFS is handling the compression, we don’t need the overhead to double-compress it. TOAST is an area where very large columns are stored outside of the record on the page.
Unfortunately, this seems a per-column setting, not database wide.
You can only set the compression scheme with create table
,
either pglz or lz4 (if PostgreSQL was built --with-lz4
).
CREATE TABLE posts (..., body TEXT COMPRESSION lz4, ... );
Use alter table
to disable compression on each large TOAST column.
It doesn’t change the compression on existing data, only for new data.
Use the storage parameter EXTERNAL
for external, uncompressed data.
ALTER TABLE posts ALTER COLUMN body SET STORAGE EXTERNAL;
Good luck!