Blog version of provoke_ZFS_corruption
- Github: https://github.com/HankB/provoke_ZFS_corruption/tree/main
- Blog: https://HankB.github.io/provoke_ZFS_corruption/
- Update on methodology
Directory of tests starting with preliminary exploration.
started | completed | results | ZFS ver | OS | kernel ver | notes |
---|---|---|---|---|---|---|
2025-02-03 | 2025-02-04 | corruption | zfs-2.1.11-1+deb12u1 | Debian 12 | 6.1.0-30-amd64 | methodology exploration |
2025-02-05 | 2025-02-05 | corruption in 15 minutes | zfs-2.1.11-1+deb12u1 | Debian 12 | 6.1.0-30-amd64 | methodology exploration |
2025-02-07 | 2025-02-07 | corruption in 10 hours | zfs-2.3.99-170-FreeBSD_g34205715e | 15.0-CURRENT FreeBSD | main-n275087-cdacb12065e4 | FreeBSD on Pi 4B |
2025-02-11 | 2025-02-11 | corruption in 2 hours | zfs-2.1.11-1+deb12u1 | Debian 12 | 6.1.0-30-amd64 | repeat methodology exploration, test FreeBSD tweaks |
2025-02-12 | 2025-02-12 | corruption instantly [1] | zfs-2.0.3-9~bpo10+1 | Debian 10 | 5.10.0-0.deb10.24-amd64 | repeat previous tests using new methodology |
[1] The test ran for hours with the wrong file ownership, so the stir process changed nothing. Once file ownership was fixed, corruption was nearly instant.
2025-02-04 process improvements
These have worked, producing corruption in less than a day. This should shorten testing time and also shows that the hardware in use will produce corruption with the current Debian kernel and ZFS. More details. In addition, the scripts have been tweaked to work on FreeBSD (with `bash`) and corruption was produced on a FreeBSD host.
2025-02-11 Test instructions
Configure a host and two pools, `send` and `recv`. Review the scripts that will be used, because my user name `hbarta` has been hard coded in too many places. (PRs to correct that gratefully accepted.)
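One way to carry out that review is to list every line that mentions the hard-coded name before running anything. This is a minimal sketch, assuming the scripts sit in a local checkout and end in `.sh`; the function name `find_hardcoded_user` is made up for illustration and is not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository.
# Prints file:line:text for every occurrence of a user name in *.sh files.
find_hardcoded_user() {
    local user="$1" dir="${2:-.}"
    # -r recurse, -n show line numbers, -F treat the name as a fixed string
    grep -rnF --include='*.sh' -- "$user" "$dir"
}

# Example (run from the top of the checkout):
#   find_hardcoded_user hbarta .
```

Each reported line is a candidate for replacing with your own user name, or with a variable such as `$USER`.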
- Run the `populate_pool.sh` script as `root` to populate the `send` pool. It does not terminate when there is sufficient data and will need to be killed when the `send` pool reaches about 50% capacity. The easiest way to kill it is to find the PID of the script and `kill <PID>` as root.
- Run `syncoid` as `root` manually once to mirror `send` to `recv`. Set permissions or ownership on the resulting files in `send` to allow user modification.
- Execute `zfs allow` as `root` to provide appropriate permissions so that `syncoid` can run as your user.
- Create a directory `~/logs` where the scripts will write logs.
- Start the following three scripts. (I hard link them to the user `~/bin` directory for convenience but it is left to the user to make them available in the PATH.)
  - `thrash_stir.sh` - This script will modify random files in the pool continuously and also capture recursive snapshots.
  - `thrash_syncoid.sh` - This script will invoke `syncoid` continuously.
  - `manage_snaps.sh` - This script will manage snapshots every minute to limit retained snapshots to 100 for each dataset.
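The snapshot limit described for `manage_snaps.sh` can be sketched roughly as follows. This is not the repository's script: the function names `snaps_to_prune` and `prune_dataset` are hypothetical, and the sketch assumes that destroying the oldest snapshots first is the intended policy.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the repository's manage_snaps.sh.
KEEP=100   # snapshots to retain per dataset, as described above

# Reads snapshot names on stdin, oldest first, and prints the ones
# that fall outside the newest $1 entries (the ones to destroy).
snaps_to_prune() {
    local keep="$1"
    awk -v keep="$keep" '
        { name[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print name[i] }
    '
}

# Prunes one dataset. `-s creation` sorts oldest first and `-d 1`
# restricts the listing to this dataset's own snapshots.
prune_dataset() {
    local ds="$1"
    zfs list -H -t snapshot -o name -s creation -d 1 "$ds" |
        snaps_to_prune "$KEEP" |
        while read -r snap; do
            zfs destroy "$snap"
        done
}

# Example: prune every dataset in the send pool once a minute.
#   while true; do
#       zfs list -H -o name -r send | while read -r ds; do prune_dataset "$ds"; done
#       sleep 60
#   done
```

Keeping the selection logic in `snaps_to_prune` separate from the `zfs destroy` call makes it possible to check which snapshots would be removed without touching a pool.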
All three scripts will check for corruption and exit when it is detected. In addition, I add a cron job as `root` to `zpool scrub` both pools several times/day.
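The corruption check and the scrub schedule might look something like the sketch below. How the repository's scripts actually detect corruption is not shown here, so this assumes a check on the `errors:` line of `zpool status`; the function name `status_shows_errors`, the six-hour interval, and the `/sbin/zpool` path are likewise assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not taken from the repository's scripts.

# Reads `zpool status` text on stdin. Succeeds (exit 0) if any pool
# reports an "errors:" line other than "No known data errors".
status_shows_errors() {
    grep '^errors:' | grep -qv 'No known data errors'
}

# Example use inside a test loop:
#   if zpool status send recv | status_shows_errors; then
#       echo "corruption detected, exiting"
#       exit 1
#   fi

# Example root crontab entry (assumed schedule: every six hours):
#   0 */6 * * * /sbin/zpool scrub send recv
```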
For details on the exact commands used to prepare the test, please see the methodology exploration.