I have a Dell R720 server with enterprise-grade SSDs in my homelab. This powerhouse runs all my home services, ad blockers, and this blog!
For the first time in recent memory, one of my drives failed.
The first challenge was to identify which drive had failed. Unfortunately, I couldn't find an easier way: I had to pull the drives one by one to see which one it was. Once I had identified it, I ordered a replacement SSD.
This is what you will see if you check the status of your pool:
replicator# zpool status
  pool: backups
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: scrub repaired 540K in 0 days 05:28:45 with 0 errors on Wed Mar 6 18:51:22 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        backups                                         DEGRADED     0     0     0
          raidz3-0                                      DEGRADED     0     0     0
            gptid/a38d9d54-2470-11e7-be70-ac220b8c944c  ONLINE       0     0     0
            gptid/18b749b3-c0b6-11e7-81f8-ac220b8c944c  ONLINE       0     0     0
            15678995806359064346                        OFFLINE      0     0     0  was /dev/gptid/8016f0cf-5557-11e4-a84e-ac220b8c944c
            gptid/a0fb3bb7-c685-11e4-acbc-ac220b8c944c  ONLINE       0     0     0
            gptid/80f31598-5557-11e4-a84e-ac220b8c944c  ONLINE       0     0     0

errors: No known data errors
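With more drives or several pools, scanning this output by eye gets tedious. Here is a minimal sketch of a filter that picks out the unhealthy devices, assuming the column layout shown above. The function name failed_vdevs is made up for illustration and is not part of ZFS.

```shell
# Hypothetical helper (not part of ZFS): reads `zpool status` text on stdin
# and prints "<device> <state>" for every line whose second column is one of
# the unhealthy device states.
failed_vdevs() {
  awk '$2 ~ /^(OFFLINE|FAULTED|UNAVAIL|REMOVED)$/ { print $1, $2 }'
}
```

Used as `zpool status backups | failed_vdevs`, this prints the numeric ID of the offline device, which is the value `zpool replace` expects as the old device.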
I have a pool named backups with 5 drives in total; one of them has failed and is showing an OFFLINE state.
Simply identify and remove the failed drive, and replace it with the new drive. Make sure you don't partition the new drive; the resilvering process will do that for you automatically!
The next step is to identify the new drive's device ID. The easiest way is to go to the Proxmox GUI and click on the name of your instance > Disks.
All the existing drives in the pool were the same make and model, but the new drive was a different model, so it was easy for me to see that it had shown up as /dev/sdm. You can also try copying the previous device ID and running the command:
ls -la /dev/disk/by-id | grep -i 'previous-device-id-here'
This will tell you which /dev node the device was attached as.
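Names like /dev/sdm can change between reboots, so it can be handy to look up the persistent /dev/disk/by-id names for a device as well. Here is a small sketch, assuming a Linux host where `readlink -f` is available. The helper name by_id_names is hypothetical.

```shell
# Hypothetical helper: print every symlink in a by-id directory that resolves
# to the given device node. The second argument defaults to /dev/disk/by-id
# and exists only so the function can be tried against a scratch directory.
by_id_names() {
  target=$(readlink -f "$1") || return 1
  dir=${2:-/dev/disk/by-id}
  for link in "$dir"/*; do
    if [ -L "$link" ] && [ "$(readlink -f "$link")" = "$target" ]; then
      printf '%s\n' "$link"
    fi
  done
  return 0
}
```

For example, `by_id_names /dev/sdm` lists the by-id links that currently point at the new drive.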
Replace the drive by running this command, substituting your own pool name and the ID that zpool status shows for the failed device:
zpool replace faster 9181524188806271229 /dev/sdm
The syntax of the above command is as follows:
# zpool replace <pool> <old device> <new device>
This will start the resilvering process and replace the dead drive. You can check the progress by running the zpool status command.
Notice the speed of the resilvering drive!
ZFS makes it super easy to replace a dead drive!
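Rather than re-running zpool status by hand, the check can be wrapped in a loop. This is a rough sketch, assuming the scan line contains the phrase "resilver in progress" while the rebuild is running. Both function names are illustrative.

```shell
# Succeeds (exit 0) if the `zpool status` text on stdin reports a running
# resilver, and fails otherwise.
resilver_in_progress() {
  grep -q 'resilver in progress'
}

# Poll the given pool until the resilver is no longer reported as running.
# $1 = pool name, $2 = seconds between checks (default 60).
wait_for_resilver() {
  while zpool status "$1" | resilver_in_progress; do
    sleep "${2:-60}"
  done
}
```

Calling `wait_for_resilver backups 300` would check every five minutes and return once the scan line no longer reports a resilver in progress.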
Once the process completes, your pool will no longer be in the DEGRADED state and will become ONLINE.
I would also recommend the following steps after the resilvering is completed:
# scrub your pool
zpool scrub [your-pool-name]
# run SMART tests on the new drive (X = the new replacement disk)
smartctl -t long /dev/sdX
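Before kicking off the scrub, it can be worth confirming from a script that the pool really is healthy again. Here is a minimal sketch that reads the state: line from zpool status output. The name pool_state is hypothetical.

```shell
# Hypothetical helper: print the value of the first "state:" line found in
# `zpool status` text read from stdin (for example ONLINE or DEGRADED).
pool_state() {
  awk '$1 == "state:" { print $2; exit }'
}
```

One way to use it: `[ "$(zpool status backups | pool_state)" = ONLINE ] && zpool scrub backups`, which only starts the scrub once the pool reports ONLINE.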
We're done now!