Jul 08 2013

While ZFS is very easy to use, we sometimes forget that we still need to replace faulted disks. Here is a quick note so I remember how the commands work.
Here is some very basic information on the required steps. I also added info about a ZFS boot rpool.

% zpool status mypool
  pool: mypool
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
 scrub: scrub in progress for 0h3m, 0.01% done, 447h11m to go
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c0t7d0  FAULTED      0     0     0  too many errors

errors: No known data errors
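
Side note: if you just want a quick check of which pools need attention (rather than the full status of every pool), the -x option should only list pools with problems and report that all pools are healthy otherwise:

% zpool status -x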

Before we start with ZFS, there is a command I love using to find out if, and how many, errors we have on the disks: smartctl. I look for the “Reallocated_Sector_Ct” line, which corresponds to the number of reallocated (bad) disk sectors. Since I have several disks, I also use parallel to query them all at the same time. The -k option keeps the output in order, and nl adds a simple line number. Here disk 2 has 2 errors.

% parallel -k smartctl -a ::: /dev/rdsk/c0t*d0s0 | grep _Ct | nl
1   5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
2   5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2
3   5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
4   5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
5   5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
6   5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
7   5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
8   5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0
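
Another place worth looking, at least on Solaris, is the fault manager; it should list the devices it has diagnosed as faulty, and it makes a good cross-check against the SMART counters above:

% fmadm faulty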

Before you physically remove and replace the hard disk, you may need to offline it in ZFS and unconfigure it in Solaris.

zpool offline mypool c0t0d0
cfgadm -c unconfigure c0::dsk/c0t0d0
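
If you are not sure of the exact attachment point name to pass to cfgadm, listing them all first should help; note that the c0::dsk/c0t0d0 name above comes from my controller, and yours may differ:

% cfgadm -al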

If you have a hot-swap controller (I think most SATA 3 and SAS controllers are), you can now simply remove the defective disk. I added the cfgadm step after discovering that some operating systems do not like losing disks without notice. After inserting the new disk, you can reboot, or try this command to re-configure the disk in Solaris…

cfgadm -c configure c0::dsk/c0t0d0

I did see this error: cfgadm: Attachment point not found

The solution in my case: I waited an hour and tried again. I was very happy it worked, because I was not at all sure a reboot would succeed; GRUB was complaining that the other disk in use (the mirror disk) did not have a backup slice #2.
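
If waiting does not do it, something that may be worth trying (a guess on my part, not something I needed here) is forcing Solaris to clean up and rebuild the /dev links, then listing the attachment points again:

% devfsadm -Cv
% cfgadm -al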

On an x86 system, while Solaris normally has 8 or 10 partitions (also called slices, numbered 0-7 or 0-9), DOS and the x86 BIOS use 4 partitions per disk. On Solaris, the format command has an fdisk sub-command (very similar to the one old DOS used). So you need to create at least one fdisk partition on the disk before you can create Solaris slices inside that DOS partition.
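
If, as in my example, all you want is a single Solaris fdisk partition covering the whole disk, the -B option of fdisk should do it non-interactively when run against the p0 whole-disk device (c0t0d0 being the disk I am replacing here):

% fdisk -B /dev/rdsk/c0t0d0p0

Otherwise, the interactive format session below does the same thing.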

[root@srvgrm01:~ ]% format

Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 189>
          /pci@0,0/pci8086,340a@3/pci1028,1f10@0/sd@0,0
       1. c0t1d0 <DEFAULT cyl 60797 alt 2 hd 255 sec 189>
          /pci@0,0/pci8086,340a@3/pci1028,1f10@0/sd@1,0
       2. c0t2d0 <ATA-WDC WD2003FYYS-0-1D02-1.82TB>
          /pci@0,0/pci8086,340a@3/pci1028,1f10@0/sd@2,0
       3. c0t3d0 <ATA-WDC WD2003FYYS-0-1D02-1.82TB>
          /pci@0,0/pci8086,340a@3/pci1028,1f10@0/sd@3,0
       4. c0t4d0 <ATA-WDC WD2003FYYS-0-1D02-1.82TB>
          /pci@0,0/pci8086,340a@3/pci1028,1f10@0/sd@4,0
       5. c0t5d0 <ATA-WDC WD2003FYYS-0-1D02-1.82TB>
          /pci@0,0/pci8086,340a@3/pci1028,1f10@0/sd@5,0
       6. c0t6d0 <ATA-WDC WD2003FYYS-0-1D02-1.82TB>
          /pci@0,0/pci8086,340a@3/pci1028,1f10@0/sd@6,0
       7. c0t7d0 <ATA-WDC WD2003FYYS-0-1D02-1.82TB>
          /pci@0,0/pci8086,340a@3/pci1028,1f10@0/sd@7,0
Specify disk (enter its number): 0

selecting c0t0d0
[disk formatted]

format> fdisk
No fdisk table exists. The default partition for the disk is:

a 100% "SOLARIS System" partition

Type "y" to accept the default partition, otherwise type "n" to edit the
partition table.
y
format>

 

The purpose of my example is to end up with a disk that has a single DOS partition, a 100% Solaris partition. That partition is then split using the Solaris slicing scheme: a smaller root slice (#0), and the rest of the disk space in slice #7.

You can now go to the format partition menu, set up the slices, then print the layout. As an example, here is how to set up one standalone Solaris slice (here slice #7):

partition> 7

Part      Tag    Flag     Cylinders         Size            Blocks

7 unassigned    wm       0                0         (0/0/0)              0

Enter partition id tag[unassigned]: stand

Enter partition permission flags[wm]:

Enter new starting cyl[1]: 10622

Enter partition size[0b, 0c, 10622e, 0.00mb, 0.00gb]: ?

Expecting up to 2418184125 blocks, 50175 cylinders,  50175 end cylinder,  1180754.00 megabytes, or 1153.08 gigabytes

Enter partition size[0b, 0c, 10622e, 0.00mb, 0.00gb]: 50175c

partition> p

Current partition table (original):
Total disk cylinders available: 60797 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       1 - 10621      244.08GB    (10621/0/0)  511879095
  1 unassigned    wm       0                0         (0/0/0)              0
  2     backup    wu       0 - 60796        1.36TB    (60797/0/0) 2930111415
  3 unassigned    wm       0                0         (0/0/0)              0
  4 unassigned    wm       0                0         (0/0/0)              0
  5 unassigned    wm       0                0         (0/0/0)              0
  6 unassigned    wm       0                0         (0/0/0)              0
  7      stand    wm   10622 - 60796        1.13TB    (50175/0/0) 2418184125
  8       boot    wu       0 -     0       23.53MB    (1/0/0)          48195
  9 unassigned    wm       0                0         (0/0/0)              0

partition> label
Ready to label disk, continue? y
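
Side note: when the new disk replaces half of a mirror and is the same size as the surviving disk, you can probably skip the manual slicing above and copy the VTOC label from the good disk instead. Here c0t1d0 is assumed to be the surviving mirror disk and c0t0d0 the replacement, as in my rpool example further down:

% prtvtoc /dev/rdsk/c0t1d0s2 | fmthard -s - /dev/rdsk/c0t0d0s2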

After this, a zpool replace command should be issued. However, the system may not see your device right away. In some cases it works immediately; in others a reboot is required. I think it may depend on the controller and the BIOS.

% zpool replace mypool c0t7d0 c0t7d0
% zpool replace -f mypool c0t7d0 c0t7d0
% zpool online mypool c0t7d0
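
On some controllers you may not even need the manual replace: if the pool's autoreplace property is turned on, a new disk inserted into the same physical slot should be picked up and resilvered automatically. I have not relied on this myself, and it likely depends on the controller and BIOS as mentioned above:

% zpool set autoreplace=on mypool
% zpool get autoreplace mypool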

In any case, when you do this on a bootable root rpool:

% zpool status rpool
… let the disk resilver before installing the boot blocks. Then:
On SPARC systems do:
% installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t0d0s0

On x86 systems do:
% installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t0d0s0

Notes: Yes, it is the same device name twice (the old and the new disk have the same address location). I also put the data in slice #7 because the Solaris layout keeps the old backup slice (#2), which has been there for backup purposes since Solaris was called SunOS, and installgrub still needs it to work properly.

% zpool status rpool

  pool: rpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scan: resilver in progress since Mon Jul  8 17:03:58 2013
    1.23G scanned out of 26.1G at 12.2M/s, 0h34m to go
    1.23G resilvered, 4.70% done

config:
        NAME                STATE     READ WRITE CKSUM
        rpool               DEGRADED     0     0     0
          mirror-0          DEGRADED     0     0     0
            replacing-0     DEGRADED     0     0     0
              c0t0d0s0/old  FAULTED      0     0     0  corrupted data
              c0t0d0s0      ONLINE       0     0     0  (resilvering)
            c0t1d0s0        ONLINE       0     0     0

errors: No known data errors

Or for a normal pool…

%  zpool status mypool
  pool: mypool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h0m, 0.00% done, 545h13m to go
config:

        NAME              STATE     READ WRITE CKSUM
        mypool            DEGRADED     0     0     0
          raidz1-0        DEGRADED     0     0     0
            c0t2d0        ONLINE       0     0     0
            c0t3d0        ONLINE       0     0     0
            c0t4d0        ONLINE       0     0     0
            c0t5d0        ONLINE       0     0     0
            c0t6d0        ONLINE       0     0     0
            replacing-5   DEGRADED     0     0     0
              c0t7d0s0/o  FAULTED      0     0     0  too many errors
              c0t7d0      ONLINE       0     0     0  5.24M resilvered

After completion, the old disk entry is removed, the errors are cleared, and the pool goes back to its optimal state. Here is the status.

% zpool status
  pool: mypool
 state: ONLINE
 scrub: resilver completed after 50h43m with 0 errors on Sat Oct 15 18:49:54 2011
config:

        NAME        STATE     READ WRITE CKSUM
        mypool      ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c0t5d0  ONLINE       0     0     0
            c0t6d0  ONLINE       0     0     0
            c0t7d0  ONLINE       0     0     0  645G resilvered

errors: No known data errors
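
Once the resilver has completed, it is probably a good idea to run a scrub so ZFS re-reads and verifies everything, including the data written to the new disk, then check the status one more time:

% zpool scrub mypool
% zpool status mypool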

A very good and simple guide is How to Replace a Disk in the ZFS Root Pool, and another one is from SysInternals.

 

Hope this helps.
Rejean.

One Response to “zpool (ZFS) howto replace faulted disk – (revised sept 2013)”

  1. Rejean says:

    I ran into an issue with ZFS when replacing a damaged 1 or 2 TB disk in a 5-6 disk array. The resilvering time was very slow!!!!!!!! It took 3 weeks! The whole time I was stressed about losing another disk…

    I am a bit worried now.
