Managing Software RAID Arrays
This quickie covers:
- Tearing down an existing software RAID 5 array.
- Deleting its underlying primary partition on each array member.
- Creating an extended partition and a new set of four logical partitions on each member.
- Creating four new RAID 5 arrays out of the new logical partitions.
- Ensuring that /etc/mdadm.conf and /etc/fstab are updated.
Warnings And Assumptions
This tutorial covers fundamentally changing the storage on a server. There is a very high, very real chance that all data can be lost.
WARNING: DO NOT FOLLOW THIS TUTORIAL ON A LIVE MACHINE! Use this to practice on a test machine only.
It is assumed that you have a fundamental understanding of and comfort with the Linux command line, specifically, the bash terminal.
Note: This tutorial was written on CentOS 5.5 (EL5). It should be fairly easy to adapt to most recent Linux distributions. This example also shows the RAID array being used to back DRBD in a RHCS cluster. This was done to show how to ensure that dependent resources are cleared and can be safely skipped if it's not appropriate to your setup.
Let's Begin
So then, let's get to work.
Viewing The Current Configuration
We want to look at three things:
- The current RAID devices.
- What is using the RAID device we plan to delete.
- The current partition layout.
Current RAID Configuration
Checking the current RAID devices involves seeing which devices are online and which are configured. The first is checked by looking at the /proc/mdstat file, and the latter by looking in the /etc/mdadm.conf file.
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
264960 blocks [4/4] [UUUU]
md1 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
6289152 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
md3 : active raid5 sdd4[3] sdc4[2] sdb4[1] sda4[0]
1395148608 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md2 : active raid5 sdd2[3] sdc2[2] sdb2[1] sda2[0]
62917632 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
cat /etc/mdadm.conf
DEVICE partitions
MAILADDR root
ARRAY /dev/md2 level=raid5 num-devices=4 uuid=7af5fde9:646394dd:d46d09a3:eb495b50
ARRAY /dev/md0 level=raid1 num-devices=4 uuid=2280ed9e:24f99bf5:4cb4f32c:f3b58eb4
ARRAY /dev/md1 level=raid5 num-devices=4 uuid=5ae2c898:5837f4a0:a3f0a617:955802c1
ARRAY /dev/md3 level=raid5 num-devices=4 metadata=0.90 spares=1 UUID=a2636590:fcb1e82a:3f1d7145:41a20e6d
So we see that four devices are configured and operating. We will tear down /dev/md3.
Ensuring that /dev/md3 is no longer in use
We need to ensure that /dev/md3 is no longer in use by checking what is mounted and confirming that no other program uses it.
There are *many* applications that might use the raw storage space.
We'll look for local file systems by using df and checking /etc/fstab.
Note: If you have any reason to suspect trouble with your system, you may want to avoid df, as it can hang when a mount breaks. Instead, run cat /proc/mounts. The output is a little more cryptic, but it should always return you to the terminal.
df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md2 57G 2.5G 51G 5% /
/dev/md0 251M 37M 201M 16% /boot
tmpfs 7.7G 0 7.7G 0% /dev/shm
none 7.7G 40K 7.7G 1% /var/lib/xenstored
It's not mounted.
cat /etc/fstab
/dev/md2 / ext4 defaults 1 1
/dev/md0 /boot ext3 defaults 1 2
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/md1 swap swap defaults 0 0
There is no corresponding entry.
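If you want a single quick check that covers mounts, swap and fstab in one pass, a grep works too; no output means nothing references the array. The fuser call (part of the psmisc package) additionally reports any process holding the raw device open. Treat this as an optional shortcut, not a replacement for the checks that follow.
grep md3 /proc/mounts /proc/swaps /etc/fstab
fuser -v /dev/md3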
Now let's see if it's part of a DRBD resource. This involves checking /proc/drbd, if it exists, and then checking /etc/drbd.conf. If you're not using DRBD, neither of these files will exist and you can move on.
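If you only want to know whether DRBD is active at all, checking for /proc/drbd is enough; the file exists only while the drbd kernel module is loaded, and it lists the state of each active resource.
cat /proc/drbd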
Note: If you have a DRBD resource using /dev/md3, make sure that the resource is not in use by LVM before destroying the DRBD resource. If you don't remove LVM first, then you might have trouble later as the LVM signature may persist.
First, look in /etc/drbd.conf and see if any of the configured resources use /dev/md3. If they do, make sure you tear down the matching resource on the other node.
cat /etc/drbd.conf
#
# please have a a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf
#
global {
usage-count yes;
}
common {
protocol C;
syncer {
rate 15M;
}
disk {
fencing resource-and-stonith;
}
handlers {
outdate-peer "/sbin/obliterate";
}
net {
allow-two-primaries;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
startup {
become-primary-on both;
}
}
resource r0 {
device /dev/drbd0;
meta-disk internal;
on an-node01.alteeve.com {
address 192.168.2.71:7789;
disk /dev/md3;
}
on an-node02.alteeve.com {
address 192.168.2.72:7789;
disk /dev/md3;
}
}
Here we see that /dev/md3 is in fact in use by DRBD and its block device name is /dev/drbd0. Given this, we'll want to go back and look at df and /etc/fstab again to make sure that /dev/drbd0 wasn't listed. It wasn't, so we're ok to proceed.
We won't actually destroy this resource yet; we'll come back to it once we know that LVM is not using /dev/md3 or /dev/drbd0.
Ensuring that LVM is not using /dev/md3 or the DRBD resource
Now let's look to make sure it's not an LVM physical volume. The pvdisplay tool will show us what physical volumes, if any, are in use.
pvdisplay
--- Physical volume ---
PV Name /dev/drbd0
VG Name drbd_x3_vg0
PV Size 1.30 TB / not usable 1.81 MB
Allocatable yes
PE Size (KByte) 4096
Total PE 340612
Free PE 327812
Allocated PE 12800
PV UUID 0ncgtw-GQ4u-srvn-mDEb-T8xq-O51D-93Xs2I
Here we see that the DRBD resource, which we know is using the RAID array we want to tear down, is in use as a PV. Before we can delete it though, we'll need to see if there are any VGs using this PV. If so, then we'll check for LVs on that VG.
So, we can use vgdisplay to see what, if any, volume groups there are and what physical volumes they use. We'll use the -v argument, which will tell us the PVs and, at the same time, what LVs use this VG.
vgdisplay -v
Finding all volume groups
Finding volume group "drbd_x3_vg0"
--- Volume group ---
VG Name drbd_x3_vg0
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 10
VG Access read/write
VG Status resizable
Clustered yes
Shared no
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 1.30 TB
PE Size 4.00 MB
Total PE 340612
Alloc PE / Size 12800 / 50.00 GB
Free PE / Size 327812 / 1.25 TB
VG UUID lqRQR9-vg5V-HexC-oeJ8-30wg-l9EL-2CJHMT
--- Logical volume ---
LV Name /dev/drbd_x3_vg0/xen_shared
VG Name drbd_x3_vg0
LV UUID RfeQ2Q-W2fF-edK2-Ove3-PmJc-lF1B-QAk063
LV Write Access read/write
LV Status available
# open 0
LV Size 50.00 GB
Current LE 12800
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 768
Block device 253:0
--- Physical volumes ---
PV Name /dev/drbd0
PV UUID 0ncgtw-GQ4u-srvn-mDEb-T8xq-O51D-93Xs2I
PV Status allocatable
Total PE / Free PE 340612 / 327812
Here we see just one VG, which has one LV, on the /dev/drbd0 PV.
Warning: We're stopping here, but it is entirely possible that there is a file system like GFS2 on the LV. It's an exercise for the reader to ensure that whatever uses the LV is removed from the system.
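One quick, read-only way to see what, if anything, sits on the LV before destroying it is to check for a file system signature. Both commands below are safe to run against a block device; the LV path is the one from the example above.
blkid /dev/drbd_x3_vg0/xen_shared
file -s /dev/drbd_x3_vg0/xen_shared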
Removing everything on top of /dev/md3
So at this point, we know that there is a whole stack of stuff on top of our RAID array. We need to remove these things in the proper order: top down. So here is the plan:
- Remove anything using the LV (exercise for the reader).
- Remove the LV.
- Remove the VG.
- Remove the PV.
- Remove the DRBD resource.
- Remove the RAID device.
Removing LVM stuff
To remove the logical volume, use lvremove.
lvremove /dev/drbd_x3_vg0/xen_shared
Do you really want to remove active clustered logical volume xen_shared? [y/n]: y
Logical volume "xen_shared" successfully removed
Now remove the volume group with the vgremove command.
vgremove drbd_x3_vg0
Volume group "drbd_x3_vg0" successfully removed
Next, remove /dev/drbd0 as an LVM physical volume.
pvremove /dev/drbd0
Labels on physical volume "/dev/drbd0" successfully wiped
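Before moving on to DRBD, it's worth confirming the LVM stack really is gone. The terse LVM reporting commands should no longer show /dev/drbd0 or drbd_x3_vg0.
pvs
vgs
lvs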
Removing DRBD stuff
You'll need to do the next step on both nodes for DRBD. These steps will stop the DRBD resource and then wipe the DRBD meta-data from the backing devices. Once done, we'll edit /etc/drbd.conf and delete the r0 { } directive.
This is a simple two-step process: take down the resource and then wipe the meta-data.
drbdadm down r0
drbdadm wipe-md r0
Do you really want to wipe out the DRBD meta data?
[need to type 'yes' to confirm] yes
Wiping meta data...
DRBD meta data block successfully wiped out.
Now edit /etc/drbd.conf and delete the r0 { } directive. Using the example from earlier, the configuration will be as shown below.
vim /etc/drbd.conf
#
# please have a a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf
#
global {
usage-count yes;
}
common {
protocol C;
syncer {
rate 15M;
}
disk {
fencing resource-and-stonith;
}
handlers {
outdate-peer "/sbin/obliterate";
}
net {
allow-two-primaries;
after-sb-0pri discard-zero-changes;
after-sb-1pri discard-secondary;
after-sb-2pri disconnect;
}
startup {
become-primary-on both;
}
}
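It doesn't hurt to let drbdadm re-parse the edited file before moving on; drbdadm dump reads the configuration and prints it back, so a typo or an unbalanced brace will show up immediately.
drbdadm dump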
Destroying /dev/md3
First, we need to stop the array using the mdadm tool. Before we start though, let's take a look at the current RAID arrays.
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
264960 blocks [4/4] [UUUU]
md1 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
6289152 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
md3 : active raid5 sdd4[3] sdc4[2] sdb4[1] sda4[0]
1395148608 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md2 : active raid5 sdd2[3] sdc2[2] sdb2[1] sda2[0]
62917632 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
Let's stop /dev/md3 now.
mdadm --stop /dev/md3
mdadm: stopped /dev/md3
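If you want to be extra tidy, you can also wipe the old RAID superblocks from the freed member partitions so the stale metadata can never be detected again. This step is optional here, as creating the new arrays will overwrite the old metadata anyway, but mdadm supports it directly and it is safe now that the array is stopped.
mdadm --zero-superblock /dev/sd[abcd]4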
Now let's remove /dev/md3 from /etc/mdadm.conf. In my case, this meant modifying my mdadm.conf this way:
vim /etc/mdadm.conf
From:
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md2 level=raid5 num-devices=4 uuid=7af5fde9:646394dd:d46d09a3:eb495b50
ARRAY /dev/md0 level=raid1 num-devices=4 uuid=2280ed9e:24f99bf5:4cb4f32c:f3b58eb4
ARRAY /dev/md1 level=raid5 num-devices=4 uuid=5ae2c898:5837f4a0:a3f0a617:955802c1
ARRAY /dev/md3 level=raid5 num-devices=4 metadata=0.90 spares=1 UUID=a2636590:fcb1e82a:3f1d7145:41a20e6d
To:
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md2 level=raid5 num-devices=4 uuid=7af5fde9:646394dd:d46d09a3:eb495b50
ARRAY /dev/md0 level=raid1 num-devices=4 uuid=2280ed9e:24f99bf5:4cb4f32c:f3b58eb4
ARRAY /dev/md1 level=raid5 num-devices=4 uuid=5ae2c898:5837f4a0:a3f0a617:955802c1
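A quick grep confirms that nothing in the configuration still points at the old array; no output means you are clear to repartition.
grep md3 /etc/mdadm.conf /etc/fstab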
Managing Partitions
Now that the RAID array is gone, we can finally change the partition table on the member drives. As we saw in /proc/mdstat, the partitions in use by /dev/md3 were /dev/sd[abcd]4, which is what we've wanted to delete and split up into new partitions from the beginning.
The following steps will need to be done on all drives and, if you are using a cluster or DRBD, on all nodes. Be careful when modifying the various drives and nodes to ensure that the new geometry matches on all drives.
The steps we will take are:
- Delete the existing fourth primary partition.
- Create an extended partition using the space formerly held by the deleted partition.
- Create four new logical partitions within it: 20 GiB, 2x 100 GiB and the last being the remainder of the free space.
- Note: We'll be creating RAID 5 arrays, so the usable space of each array will be (n-1) times the size of one member partition; with four members, that is three times the size of the partition used on each drive.
- Set the new partition ID on the four logical partitions to fd (Linux raid autodetect).
- Write the changes out to disk.
- Once all disks are changed, reboot.
The commands below will show the fdisk shell arguments entered. Please adjust as needed for your environment. Note that when no typed command is shown, it means the default was selected by simply pressing <enter>.
fdisk /dev/sda
The number of cylinders for this disk is set to 60801.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Check the current partition scheme.
Command (m for help): p
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 33 265041 fd Linux raid autodetect
/dev/sda2 34 2644 20972857+ fd Linux raid autodetect
/dev/sda3 2645 2905 2096482+ fd Linux raid autodetect
/dev/sda4 2906 60801 465049620 fd Linux raid autodetect
Delete the fourth partition.
Command (m for help): d
Partition number (1-4): 4
Create the extended partition within which we'll create the four actual partitions. Accept the defaults so that it spans all of the space freed by the deleted partition.
Command (m for help): n
Command action
e extended
p primary partition (1-4)
e
Selected partition 4
First cylinder (2906-60801, default 2906):
Using default value 2906
Last cylinder or +size or +sizeM or +sizeK (2906-60801, default 60801):
Create the first 20 GiB logical partition.
Command (m for help): n
First cylinder (2906-60801, default 2906):
Using default value 2906
Last cylinder or +size or +sizeM or +sizeK (2906-60801, default 60801): +20G
Set its partition type to fd (Linux raid autodetect).
Command (m for help): t
Partition number (1-5): 5
Hex code (type L to list codes): fd
Changed system type of partition 5 to fd (Linux raid autodetect)
Repeat the two steps above to create the two 100 GiB partitions and the last partition, which will consume the remainder of the free space.
Command (m for help): n
First cylinder (5339-60801, default 5339):
Using default value 5339
Last cylinder or +size or +sizeM or +sizeK (5339-60801, default 60801): +100G
Command (m for help): t
Partition number (1-6): 6
Hex code (type L to list codes): fd
Changed system type of partition 6 to fd (Linux raid autodetect)
Command (m for help): n
First cylinder (17498-60801, default 17498):
Using default value 17498
Last cylinder or +size or +sizeM or +sizeK (17498-60801, default 60801): +100G
Command (m for help): t
Partition number (1-7): 7
Hex code (type L to list codes): fd
Changed system type of partition 7 to fd (Linux raid autodetect)
Command (m for help): n
First cylinder (29657-60801, default 29657):
Using default value 29657
Last cylinder or +size or +sizeM or +sizeK (29657-60801, default 60801):
Using default value 60801
Command (m for help): t
Partition number (1-8): 8
Hex code (type L to list codes): fd
Changed system type of partition 8 to fd (Linux raid autodetect)
Check the geometry to make sure it is what we expect.
Command (m for help): p
Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 33 265041 fd Linux raid autodetect
/dev/sda2 34 2644 20972857+ fd Linux raid autodetect
/dev/sda3 2645 2905 2096482+ fd Linux raid autodetect
/dev/sda4 2906 60801 465049620 5 Extended
/dev/sda5 2906 5338 19543041 fd Linux raid autodetect
/dev/sda6 5339 17497 97667136 fd Linux raid autodetect
/dev/sda7 17498 29656 97667136 fd Linux raid autodetect
/dev/sda8 29657 60801 250172181 fd Linux raid autodetect
Perfect, now we can write the changes to disk.
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
The error above is because the disk we altered is used by the root file system. This is why we will need to reboot before we can proceed, but not until all of the disks on the server have been altered.
So go back and repeat these steps for /dev/sdb, /dev/sdc and /dev/sdd.
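Once all four drives have been edited, it's worth confirming that they ended up with matching geometry before you reboot. fdisk -l is read-only and accepts several devices at once.
fdisk -l /dev/sda /dev/sdb /dev/sdc /dev/sdd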
Once done editing all of the disks, reboot before proceeding.
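After the reboot, confirm that the kernel is using the new table; /proc/partitions should now list sda5 through sda8 and their counterparts on the other drives.
cat /proc/partitions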
Creating the new RAID Arrays
The final steps are:
- Create the four new arrays using mdadm.
- Add the new arrays to /etc/mdadm.conf.
The command to actually create the new arrays is pretty straightforward. Let's look at the first one, which creates the new /dev/md3.
mdadm --create /dev/md3 --homehost=localhost.localdomain --raid-devices=4 --level=5 --spare-devices=0 /dev/sd[abcd]5
mdadm: array /dev/md3 started.
The --homehost= switch is used to tell mdadm to automatically assemble this array should it move to another host. This is because, by default, the actual host name is used to tag the array. Without this, should you move this array to another system, it would not auto assemble as the current host name would not match the new server's host name. However, you may wish to omit this if you prefer the default behaviour.
Once you run the above command, you should be able to cat the /proc/mdstat file and see the new array assembling.
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md3 : active raid5 sdd5[4] sdc5[2] sdb5[1] sda5[0]
58628928 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
[==============>......] recovery = 73.6% (14388164/19542976) finish=1.8min speed=45201K/sec
md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
264960 blocks [4/4] [UUUU]
md1 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
6289152 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
md2 : active raid5 sdd2[3] sdc2[2] sdb2[1] sda2[0]
62917632 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
You don't need to worry about thrashing the disks by creating all of the arrays now. Each new array will wait for the previous one to finish syncing before it begins its own resync.
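If you want to keep an eye on the rebuilds, watch (part of the standard procps package) re-runs a command every two seconds; press ctrl+c to exit.
watch cat /proc/mdstat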
mdadm --create /dev/md4 --homehost=localhost.localdomain --raid-devices=4 --level=5 --spare-devices=0 /dev/sd[abcd]6
mdadm: array /dev/md4 started.
mdadm --create /dev/md5 --homehost=localhost.localdomain --raid-devices=4 --level=5 --spare-devices=0 /dev/sd[abcd]7
mdadm: array /dev/md5 started.
mdadm --create /dev/md6 --homehost=localhost.localdomain --raid-devices=4 --level=5 --spare-devices=0 /dev/sd[abcd]8
mdadm: array /dev/md6 started.
Now if we look again at /proc/mdstat, we'll see resync=DELAYED on the new arrays that are pending synchronization.
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid1]
md6 : active raid5 sdd8[4] sdc8[2] sdb8[1] sda8[0]
750516288 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
resync=DELAYED
md5 : active raid5 sdd7[4] sdc7[2] sdb7[1] sda7[0]
293001216 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
resync=DELAYED
md4 : active raid5 sdd6[4] sdc6[2] sdb6[1] sda6[0]
293001216 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
[>....................] recovery = 4.0% (3970688/97667072) finish=34.5min speed=45208K/sec
md3 : active raid5 sdd5[3] sdc5[2] sdb5[1] sda5[0]
58628928 blocks level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
md0 : active raid1 sdd1[3] sdc1[2] sdb1[1] sda1[0]
264960 blocks [4/4] [UUUU]
md1 : active raid5 sdd3[3] sdc3[2] sdb3[1] sda3[0]
6289152 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
md2 : active raid5 sdd2[3] sdc2[2] sdb2[1] sda2[0]
62917632 blocks level 5, 256k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
Almost done. We just need to add the new arrays to /etc/mdadm.conf so that they're assembled on boot. Note that you can proceed now. There is no need to wait for the arrays to finish synchronizing.
Note: Until the array has finished synchronizing, the unedited output from the next command will include spares=1, even though there are no spares in the array. You can manually delete these arguments or wait for all arrays to finish synchronizing, if you prefer.
To append the four new arrays to the /etc/mdadm.conf file, you can use the command shown below. The 'Before' and 'After' listings show the file on either side of running it.
Before:
cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md2 level=raid5 num-devices=4 uuid=7af5fde9:646394dd:d46d09a3:eb495b50
ARRAY /dev/md0 level=raid1 num-devices=4 uuid=2280ed9e:24f99bf5:4cb4f32c:f3b58eb4
ARRAY /dev/md1 level=raid5 num-devices=4 uuid=5ae2c898:5837f4a0:a3f0a617:955802c1
After:
mdadm --detail --scan | grep -e md3 -e md4 -e md5 -e md6 | sed -e "s/spares=1 //" >> /etc/mdadm.conf
cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md2 level=raid5 num-devices=4 uuid=7af5fde9:646394dd:d46d09a3:eb495b50
ARRAY /dev/md0 level=raid1 num-devices=4 uuid=2280ed9e:24f99bf5:4cb4f32c:f3b58eb4
ARRAY /dev/md1 level=raid5 num-devices=4 uuid=5ae2c898:5837f4a0:a3f0a617:955802c1
ARRAY /dev/md3 level=raid5 num-devices=4 metadata=0.90 UUID=a9010440:446a912e:bfe78010:bc810f04
ARRAY /dev/md4 level=raid5 num-devices=4 metadata=0.90 UUID=7c8aff84:31b8e4c2:bfe78010:bc810f04
ARRAY /dev/md5 level=raid5 num-devices=4 metadata=0.90 UUID=de81e923:12d84547:bfe78010:bc810f04
ARRAY /dev/md6 level=raid5 num-devices=4 metadata=0.90 UUID=63c9bfd4:70cb8535:bfe78010:bc810f04
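As an optional final check, running mdadm --detail against any of the new arrays will confirm that the device is healthy and that its UUID matches the entry just written to /etc/mdadm.conf.
mdadm --detail /dev/md3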
And, we're done!
Any questions, feedback, advice, complaints or meanderings are welcome.