Anvil! Tutorial 2 - Growing Storage

One common task an administrator will face as their Anvil! moves through its usable life is running out of resources. Adding RAM to an Anvil! is pretty trivial, but what about adding storage space?

One option is to simply add new drives, create a new array, build a new DRBD resource and set it up as a new clustered LVM PV. However, this approach is not very efficient and, for many people, not possible due to a lack of empty drive bays.

Task Ahead

Warning: THIS IS AN ADVANCED PROCEDURE!
This process will require manipulating storage at a very low level. It will also degrade your Anvil! for long periods of time. Please practice on a non-production test Anvil! before attempting in production.
When you do move to production, be sure you have good backups, and please schedule a maintenance window or, at the very least, work during a low-usage period.

The ultimate goal of this tutorial is to expand the storage without causing any downtime for servers hosted on your Anvil!.

This tutorial will cover:

  1. Adding a new hard drive to an existing RAID array to grow it (using MegaCli64).
  2. Rebuilding existing partitions on each node to use the new space.
  3. Resizing DRBD resources.
  4. Growing the clustered LVM physical volumes.

Assumptions

This tutorial is designed as an extension to the main "AN!Cluster Tutorial 2" tutorial. As such, this tutorial assumes a matching configuration. If your Anvil! differs, please adjust this tutorial to match your install.

Growing The Physical RAID Array

Note: This assumes you are using an LSI based RAID controller and the MegaCli64 command line tool. If you are using software RAID or another hardware vendor's RAID controller, please consult the appropriate documentation for growing your RAID array.

The first step is to physically insert your new hard drive. Note that if you use a new disk that is larger than the existing disks, only a portion of the space will be used. For example, we will be installing a 450 GB hard drive into an array consisting of three 300 GB disks in RAID level 5. So our existing array is 600 GB large, and will grow to 900 GB, not 1,050 GB. This is because all disks in a given array are treated as being the size of the smallest member.
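
As a quick sanity check, RAID 5 usable capacity is (number of member disks - 1) x (size of the smallest member):

(3 - 1) x 300 GB = 600 GB   (the current three-disk array)
(4 - 1) x 300 GB = 900 GB   (after adding the fourth disk; the extra 150 GB on the 450 GB drive goes unused)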

You cannot use a new disk that is smaller than the drives already in the array.

Starting Point

Let's start by looking at the current array.

The logical disks:

an-c05n01
MegaCli64 LDInfo Lall aAll
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 557.75 GB
Sector Size         : 512
Is VD emulated      : No
Parity Size         : 278.875 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 3
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: No
an-c05n02
MegaCli64 LDInfo Lall aAll
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 557.75 GB
Sector Size         : 512
Is VD emulated      : No
Parity Size         : 278.875 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 3
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: No

The physical disks in the logical disk:

an-c05n01 an-c05n02
MegaCli64 PDList aAll
Adapter #0

Enclosure Device ID: 252
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 13
WWN: 5000C50043EE29E0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c50043ee29e1
SAS Address(1): 0x0
Connected Port Number: 2(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3T7X6    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :43C (109.40 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: N/A
Device Id: 12
WWN: 5000C5004310F4B4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5004310f4b5
SAS Address(1): 0x0
Connected Port Number: 1(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3CMMC    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :46C (114.80 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 2
Enclosure position: N/A
Device Id: 11
WWN: 5000C500430189E4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c500430189e5
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3CD2Z    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :42C (107.60 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No
MegaCli64 PDList aAll
Adapter #0

Enclosure Device ID: 252
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 10
WWN: 5000C50043112280
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c50043112281
SAS Address(1): 0x0
Connected Port Number: 2(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3DE9Z    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :40C (104.00 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: N/A
Device Id: 9
WWN: 5000C5004312760C
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5004312760d
SAS Address(1): 0x0
Connected Port Number: 1(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3DNG7    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :41C (105.80 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 2
Enclosure position: N/A
Device Id: 8
WWN: 5000C50043126B4C
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c50043126b4d
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3E01G    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :39C (102.20 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No

Finally, the current partition table:

an-c05n01
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 599GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
 1      1049kB  538MB   537MB   primary   ext4            boot
 2      538MB   43.5GB  42.9GB  primary   ext4
 3      43.5GB  45.6GB  2147MB  primary   linux-swap(v1)
 4      45.6GB  599GB   553GB   extended                  lba
 5      45.6GB  333GB   287GB   logical
 6      333GB   599GB   266GB   logical
an-c05n02
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 599GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
 1      1049kB  538MB   537MB   primary   ext4            boot
 2      538MB   43.5GB  42.9GB  primary   ext4
 3      43.5GB  45.6GB  2147MB  primary   linux-swap(v1)
 4      45.6GB  599GB   553GB   extended                  lba
 5      45.6GB  333GB   287GB   logical
 6      333GB   599GB   266GB   logical

So we see, in summary:

  • We have one logical disk that is 557.75 GiB (the controller labels the sizes "GB", but they are base-2 values, so really GiB)
  • The logical disk comprises three physical disks, both nodes conveniently reporting matching disk addresses:
    • [252:0]; 279.396 GiB
    • [252:1]; 279.396 GiB
    • [252:2]; 279.396 GiB
  • Finally, parted reports both nodes partitioned with:
    • Partition 1, 537 MB (/boot)
    • Partition 2, 42.9 GB (/)
    • Partition 3, 2147 MB (<swap>)
    • Partition 4, 553 GB (extended partition containing the two logical partitions)
    • Partition 5, 287 GB (backing /dev/drbd0)
    • Partition 6, 266 GB (backing /dev/drbd1)

Now we're ready to start!

Inserting The New Disk

Note: As mentioned earlier, we will be using new 450 GB hard drives (one per node). However, because the existing drives are 300 GB, only the first 300 GB of the new drives will actually be used.

After ensuring that your nodes support hot-swap hard drives, insert the new disk into an empty bay on each node.

If your nodes do not support hot-swap, then withdraw the first node, power it off, install the new drive, boot it back up and rejoin it to the Anvil!. Then migrate the servers over, withdraw the second node, power it off, install its new drive, boot it back up and rejoin it to the Anvil!.

In our case, the drives do support hot-swap, so we can install the drives while the node is running.

Note: If you are using drives that were previously in an array, they may be listed as Foreign. This will need to be cleared before you can use the drives to grow your array.
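
If you do need to clear a foreign configuration, the controller can scan for and clear it. This is a hedged sketch, assuming the same single adapter as the rest of this tutorial; the exact syntax varies between MegaCli versions, so please verify against yours:

MegaCli64 CfgForeign Scan aAll
MegaCli64 CfgForeign Clear aAll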

After physically inserting the new disk, we'll re-run MegaCli64 PDList aAll to see what address the new disk has.

an-c05n01 an-c05n02
MegaCli64 PDList aAll
Adapter #0

Enclosure Device ID: 252
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 13
WWN: 5000C50043EE29E0
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c50043ee29e1
SAS Address(1): 0x0
Connected Port Number: 2(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3T7X6    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :43C (109.40 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: N/A
Device Id: 12
WWN: 5000C5004310F4B4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5004310f4b5
SAS Address(1): 0x0
Connected Port Number: 1(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3CMMC    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :46C (114.80 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 2
Enclosure position: N/A
Device Id: 11
WWN: 5000C500430189E4
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c500430189e5
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3CD2Z    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :42C (107.60 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 6
Enclosure position: N/A
Device Id: 5
WWN: 5000CCA00F5CA29F
Sequence Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 419.186 GB [0x3465f870 Sectors]
Non Coerced Size: 418.686 GB [0x3455f870 Sectors]
Coerced Size: 418.656 GB [0x34550000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: A42B
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000cca00f5ca29d
SAS Address(1): 0x0
Connected Port Number: 3(path0) 
Inquiry Data: HITACHI HUS156045VLS600 A42BJVWMYA6L            
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :34C (93.20 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No
MegaCli64 PDList aAll
Adapter #0

Enclosure Device ID: 252
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 10
WWN: 5000C50043112280
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c50043112281
SAS Address(1): 0x0
Connected Port Number: 2(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3DE9Z    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :41C (105.80 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Enclosure position: N/A
Device Id: 9
WWN: 5000C5004312760C
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c5004312760d
SAS Address(1): 0x0
Connected Port Number: 1(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3DNG7    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :41C (105.80 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 2
Enclosure position: N/A
Device Id: 8
WWN: 5000C50043126B4C
Sequence Number: 2
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 279.396 GB [0x22ecb25c Sectors]
Non Coerced Size: 278.896 GB [0x22dcb25c Sectors]
Coerced Size: 278.875 GB [0x22dc0000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Online, Spun Up
Device Firmware Level: 1703
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000c50043126b4d
SAS Address(1): 0x0
Connected Port Number: 0(path0) 
Inquiry Data: SEAGATE ST3300657SS     17036SJ3E01G    @#87980 
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :39C (102.20 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No



Enclosure Device ID: 252
Slot Number: 6
Enclosure position: N/A
Device Id: 12
WWN: 5000CCA00FAEBD33
Sequence Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS

Raw Size: 419.186 GB [0x3465f870 Sectors]
Non Coerced Size: 418.686 GB [0x3455f870 Sectors]
Coerced Size: 418.656 GB [0x34550000 Sectors]
Sector Size:  0
Logical Sector Size:  0
Physical Sector Size:  0
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: A42B
Shield Counter: 0
Successful diagnostics completion on :  N/A
SAS Address(0): 0x5000cca00faebd31
SAS Address(1): 0x0
Connected Port Number: 3(path0) 
Inquiry Data: HITACHI HUS156045VLS600 A42BJVY333DM            
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None 
Device Speed: 6.0Gb/s 
Link Speed: 6.0Gb/s 
Media Type: Hard Disk Device
Drive:  Not Certified
Drive Temperature :34C (93.20 F)
PI Eligibility:  No 
Drive is formatted for PI information:  No
PI: No PI
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Port-1 :
Port status: Active
Port's Linkspeed: Unknown 
Drive has flagged a S.M.A.R.T alert : No

There it is. On both nodes, it came up with the address [252:6].
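
Tip: the PDList output is long. If you would rather not scan it by eye, you can filter it down to the interesting fields (plain grep, nothing MegaCli-specific); the new disk is the one reporting "Unconfigured(good)":

MegaCli64 PDList aAll | grep -E "Slot Number|Firmware state|Coerced Size"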

Adding the New Disk to the Existing Array

Warning: In this example, there is only one physical controller and only one logical disk on that controller. So for us, we know that we're going to specify "a0" (adapter #0) and "L0" (logical disk #0). If you have multiple controllers and/or logical disks, be sure you specify the correct adapter and logical disk numbers!
Warning: This will likely put your array into WriteThrough caching mode, which could significantly reduce disk performance.

We will now add this new disk to controller 0 (a0), logical disk 0 (L0).

an-c05n01 an-c05n02
MegaCli64 LDRecon Start r5 add PhysDrv[252:6] L0 a0
Start Reconstruction of Virtual Drive Success.

Exit Code: 0x00
MegaCli64 LDRecon Start r5 add PhysDrv[252:6] L0 a0
Start Reconstruction of Virtual Drive Success.

Exit Code: 0x00

If you look at the front of your nodes, their drive LEDs should be lighting up like crazy!

Depending on your system, the logical disk may immediately show the new size, or it may show the old size until the rebuild is completed. You can check with MegaCli64 LDInfo Lall aAll:

an-c05n01
MegaCli64 LDInfo Lall aAll
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 557.75 GB
Sector Size         : 512
Is VD emulated      : No
Parity Size         : 278.875 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 3
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Ongoing Progresses:
  Reconstruction           : Completed 7%, Taken 20 min.
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: No
an-c05n02
MegaCli64 LDInfo Lall aAll
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 557.75 GB
Sector Size         : 512
Is VD emulated      : No
Parity Size         : 278.875 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 3
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Ongoing Progresses:
  Reconstruction           : Completed 7%, Taken 20 min.
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: No

In this case, the arrays are still showing as 557.75 GiB. Note also that the caching policy has changed to WriteThrough, so we can expect disk access speeds to be reduced.

So now we wait.

Note: On my Fujitsu RX300 S6 nodes, this rebuild took approximately 4 hours to complete. How long it takes on your systems will depend on the speed of the controllers and disks, the size of the disks and the number of disks already in the array.
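
Rather than re-running LDInfo over and over, you can ask the controller for the reconstruction progress directly. A sketch, assuming the same adapter (a0) and logical disk (L0) as above:

MegaCli64 LDRecon ShowProg L0 a0

MegaCli64 LDRecon ProgDsply L0 a0 should show the progress continuously, updating until the reconstruction finishes.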

Once the rebuild is done, the arrays should show their new size:

an-c05n01
MegaCli64 LDInfo Lall aAll
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 836.625 GB
Sector Size         : 512
Is VD emulated      : No
Parity Size         : 278.875 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 4
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Ongoing Progresses:
  Background Initialization: Completed 99%, Taken 34 min.
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: No
an-c05n02
MegaCli64 LDInfo Lall aAll
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-5, Secondary-0, RAID Level Qualifier-3
Size                : 836.625 GB
Sector Size         : 512
Is VD emulated      : No
Parity Size         : 278.875 GB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 4
Span Depth          : 1
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disabled
Ongoing Progresses:
  Background Initialization: Completed 89%, Taken 24 min.
Encryption Type     : None
Bad Blocks Exist: No
Is VD Cached: No

We can see that the new size is 836.625 GiB and the caching policy has returned to WriteBack.

Updating the Kernel's View

If we look at the partition table, we'll see that the kernel still thinks that /dev/sda is the old size.

an-c05n01 an-c05n02
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 599GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
 1      1049kB  538MB   537MB   primary   ext4            boot
 2      538MB   43.5GB  42.9GB  primary   ext4
 3      43.5GB  45.6GB  2147MB  primary   linux-swap(v1)
 4      45.6GB  599GB   553GB   extended                  lba
 5      45.6GB  333GB   287GB   logical
 6      333GB   599GB   266GB   logical
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 599GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
 1      1049kB  538MB   537MB   primary   ext4            boot
 2      538MB   43.5GB  42.9GB  primary   ext4
 3      43.5GB  45.6GB  2147MB  primary   linux-swap(v1)
 4      45.6GB  599GB   553GB   extended                  lba
 5      45.6GB  333GB   287GB   logical
 6      333GB   599GB   266GB   logical

We could reboot now to update it, but there is a way to tell the kernel to rescan the device without rebooting. The following command prints nothing to the console, but it does write an entry to the system logs.

an-c05n01
echo 1 > /sys/block/sda/device/rescan
tail -n 3 /var/log/messages
Aug  4 01:45:42 an-c05n01 kernel: sd 0:2:0:0: [sda] 1754529792 512-byte logical blocks: (898 GB/836 GiB)
Aug  4 01:45:42 an-c05n01 kernel: sd 0:2:0:0: [sda] 4096-byte physical blocks
Aug  4 01:45:42 an-c05n01 kernel: sda: detected capacity change from 598879502336 to 898319253504
an-c05n02
echo 1 > /sys/block/sda/device/rescan
tail -n 3 /var/log/messages
Aug  4 01:46:16 an-c05n02 kernel: sd 0:2:0:0: [sda] 1754529792 512-byte logical blocks: (898 GB/836 GiB)
Aug  4 01:46:16 an-c05n02 kernel: sd 0:2:0:0: [sda] 4096-byte physical blocks
Aug  4 01:46:16 an-c05n02 kernel: sda: detected capacity change from 598879502336 to 898319253504
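
As a quick cross-check, the kernel's numbers agree with the controller's: 1754529792 sectors x 512 bytes = 898,319,253,504 bytes, which is exactly 836.625 GiB, matching the size LDInfo reported above.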

Now parted should see the new space.

an-c05n01 an-c05n02
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 898GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
 1      1049kB  538MB   537MB   primary   ext4            boot
 2      538MB   43.5GB  42.9GB  primary   ext4
 3      43.5GB  45.6GB  2147MB  primary   linux-swap(v1)
 4      45.6GB  599GB   553GB   extended                  lba
 5      45.6GB  333GB   287GB   logical
 6      333GB   599GB   266GB   logical
        599GB   898GB   299GB             Free Space
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 898GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
 1      1049kB  538MB   537MB   primary   ext4            boot
 2      538MB   43.5GB  42.9GB  primary   ext4
 3      43.5GB  45.6GB  2147MB  primary   linux-swap(v1)
 4      45.6GB  599GB   553GB   extended                  lba
 5      45.6GB  333GB   287GB   logical
 6      333GB   599GB   266GB   logical
        599GB   898GB   299GB             Free Space

There is the new space!

Updating Partition Geometry

Planning the new Partitions

Warning: The next steps are quite risky! The data on each node will be deleted, causing the Anvil! to be degraded while the DRBD resources run a full re-sync! Plan accordingly and double-check your backups!

Now that we've added 300 GB to each node, it's time to decide where to assign it. For the sake of this tutorial, we will divvy it up evenly between the two existing DRBD resources. You can, of course, divvy up the space however you want.

As we saw above, the current sizes of the partitions backing our DRBD resources are:

  • /dev/sda5: 287 GB
  • /dev/sda6: 266 GB

We have 299 GB unallocated, which we will essentially divide in two. So the new desired sizes are:

  • /dev/sda5: 436 GB (287 + 149)
  • /dev/sda6: 416 GB (266 + 150)

Knowing this, we will delete partitions 4 (extended partition), 5 (backing /dev/drbd0) and 6 (backing /dev/drbd1). We will recreate them with the new geometry (allowing for rounding):

Number  Start   End    Size
 4      45.6G   898G   852G
 5      45.6G   482G   437G
 6      482G    898G   416G

Now we're ready!

Rebuilding Partitions on an-c05n01

We're going to rebuild an-c05n01 first. To do this, we need to pull it out of the Anvil!. We can't leave it in the cluster and put the node into Diskless state because we're going to be rebuilding DRBD's backing devices, including creating new meta-data.

So start by migrating servers off of an-c05n01 and then withdraw it from the cluster.

Once an-c05n01 is out of the Anvil! entirely, we're ready to start.

We're going to tell DRBD to wipe the existing meta-data, use parted to delete the existing partitions (4, 5 and 6), then recreate them with the new geometry. Once done, we'll reboot to ensure the kernel sees the new partitions properly. Finally, we'll set them up as new DRBD backing devices, reconnect them to an-c05n02 and let them sync.

First, verify that an-c05n01 is no longer in the Anvil!.

an-c05n01
clustat
Could not connect to CMAN: No such file or directory
an-c05n02
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 12:01:55 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Offline
 an-c05n02.alteeve.ca                                       2 Online, Local, rgmanager

 Service Name                                  Owner (Last)                                  State         
 ------- ----                                  ----- ------                                  -----         
 service:libvirtd_n01                          (an-c05n01.alteeve.ca)                        stopped       
 service:libvirtd_n02                          an-c05n02.alteeve.ca                          started       
 service:storage_n01                           (an-c05n01.alteeve.ca)                        stopped       
 service:storage_n02                           an-c05n02.alteeve.ca                          started       
 vm:vm01-c7                                    an-c05n02.alteeve.ca                          started       
 vm:vm02-win2008r2                             an-c05n02.alteeve.ca                          started

Good.

Now, because this is such an invasive procedure, let's be extra careful and ensure that drbd is also showing the node as stopped.

an-c05n01
/etc/init.d/drbd status
drbd not loaded
an-c05n02
/etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
m:res  cs            ro               ds                 p  mounted  fstype
0:r0   WFConnection  Primary/Unknown  UpToDate/Outdated  C
1:r1   WFConnection  Primary/Unknown  UpToDate/Outdated  C

Excellent, we're ready to start!

Note: For the next little bit, we will be working on an-c05n01. It's a good idea to log out of an-c05n02 entirely, just to reduce the risk of an errant command.

We don't want DRBD's current meta-data to be reused, as DRBD might think the data on the partitions is partially reusable after the partition geometry changes.

an-c05n01
drbdadm wipe-md r{0,1}
Do you really want to wipe out the DRBD meta data?
[need to type 'yes' to confirm] yes
Wiping meta data...
DRBD meta data block successfully wiped out.

Do you really want to wipe out the DRBD meta data?
[need to type 'yes' to confirm] yes
Wiping meta data...
DRBD meta data block successfully wiped out.

Now let's take one more look at parted to re-check that it really is partitions 4, 5 and 6 that we want to delete.

(Yes, paranoid. That's the only safe operating mode!)

an-c05n01
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 898GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
 1      1049kB  538MB   537MB   primary   ext4            boot
 2      538MB   43.5GB  42.9GB  primary   ext4
 3      43.5GB  45.6GB  2147MB  primary   linux-swap(v1)
 4      45.6GB  599GB   553GB   extended                  lba
 5      45.6GB  333GB   287GB   logical
 6      333GB   599GB   266GB   logical
        599GB   898GB   299GB             Free Space

Excellent, let's delete the partitions.

an-c05n01
parted -a opt /dev/sda "rm 6"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.
Information: You may need to update /etc/fstab.
an-c05n01
parted -a opt /dev/sda "rm 5"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.
Information: You may need to update /etc/fstab.
an-c05n01
parted -a opt /dev/sda "rm 4"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.
Information: You may need to update /etc/fstab.

Verify that the partitions are gone.

an-c05n01
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 898GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type     File system     Flags
        32.3kB  1049kB  1016kB           Free Space
 1      1049kB  538MB   537MB   primary  ext4            boot
 2      538MB   43.5GB  42.9GB  primary  ext4
 3      43.5GB  45.6GB  2147MB  primary  linux-swap(v1)
        45.6GB  898GB   853GB            Free Space

Excellent. Now we can create the new extended partition.

an-c05n01
parted -a opt /dev/sda "mkpart extended 45.6G 898G"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.

Now the new fifth partition:

an-c05n01
parted -a opt /dev/sda "mkpart logical 45.6G 482G"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.

Finally, the last partition.

an-c05n01
parted -a opt /dev/sda "mkpart logical 482G 898G"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.

Let's make sure the new geometry is what we want:

an-c05n01
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 898GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
 1      1049kB  538MB   537MB   primary   ext4            boot
 2      538MB   43.5GB  42.9GB  primary   ext4
 3      43.5GB  45.6GB  2147MB  primary   linux-swap(v1)
 4      45.6GB  898GB   853GB   extended                  lba
 5      45.6GB  482GB   436GB   logical
 6      482GB   898GB   416GB   logical
Warning: If you are using safe_anvil_start, please be sure to disable it!

Excellent!

As you saw with the warnings that were printed, we need to reboot to ensure that the kernel sees the new partition geometry. This is because / is on /dev/sda. If you're resizing a device that doesn't host /, you can simply run partprobe /dev/sdX to ensure the kernel has the updated geometry and avoid a reboot. You will know when you can do this because you will not have seen any "WARNING: the kernel failed to re-read the partition table..." messages.
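
For example, on a hypothetical machine where the array being grown is a second device, /dev/sdb, that does not host /:

partprobe /dev/sdb
blockdev --getsize64 /dev/sdb

Here, blockdev simply prints the device size in bytes so you can confirm the kernel sees the new capacity; /dev/sdb is a placeholder for your actual non-root device.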

an-c05n01
reboot

Rejoining an-c05n01 to the Anvil!

Warning: This next step is tricky, and must be followed exactly to ensure a full resync of data is done. Please do not rush or skip any steps!
Note: The next steps will have an-c05n01 running cman only, not rgmanager. This upsets Striker, so please be sure no one uses the Anvil! dashboard until you are done with the next sections.

We're going to zero out the start of the resized partitions, create new DRBD meta-data, invalidate the data on the partitions, manually start cman and then connect the resized partitions to our peer. Once this is done and the resync has started, we will start rgmanager and restore the node to the Anvil! proper.

Make sure that the node is still out of the Anvil! and didn't accidentally try to join the cluster:

an-c05n01
clustat
Could not connect to CMAN: No such file or directory
an-c05n02
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 12:39:48 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Offline
 an-c05n02.alteeve.ca                                       2 Online, Local, rgmanager

 Service Name                                  Owner (Last)                                  State         
 ------- ----                                  ----- ------                                  -----         
 service:libvirtd_n01                          (an-c05n01.alteeve.ca)                        stopped       
 service:libvirtd_n02                          an-c05n02.alteeve.ca                          started       
 service:storage_n01                           (an-c05n01.alteeve.ca)                        stopped       
 service:storage_n02                           an-c05n02.alteeve.ca                          started       
 vm:vm01-c7                                    an-c05n02.alteeve.ca                          started       
 vm:vm02-win2008r2                             an-c05n02.alteeve.ca                          started

Let's check DRBD:

an-c05n01
/etc/init.d/drbd status
drbd not loaded
an-c05n02
/etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
m:res  cs            ro               ds                 p  mounted  fstype
0:r0   WFConnection  Primary/Unknown  UpToDate/Outdated  C
1:r1   WFConnection  Primary/Unknown  UpToDate/Outdated  C

We're good to go.

Before we start, we're going to zero-out the new partitions. This will help ensure that DRBD doesn't think it sees the old LVM data and refuses to create the new meta-data.

an-c05n01
dd if=/dev/zero of=/dev/sda5 bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 8.41807 s, 498 MB/s
dd if=/dev/zero of=/dev/sda6 bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 10.8088 s, 388 MB/s
Note: In the following example, despite wiping the old meta-data earlier, create-md still found what looks like old meta-data. This is because internal meta-data lives at the end of the backing device, well past the first 4 GB our dd run zeroed. If you see the same questions, please answer "yes" to overwrite the old meta-data.

Now we will load the drbd kernel module and then create our new meta-data.

an-c05n01
modprobe drbd
drbdadm create-md r{0,1}
You want me to create a v08 style flexible-size internal meta data block.
There appears to be a v08 flexible-size internal meta data block
already in place on /dev/sda5 at byte offset 436363849728
Do you really want to overwrite the existing v08 meta-data?
[need to type 'yes' to confirm] yes
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
You want me to create a v08 style flexible-size internal meta data block.
There appears to be a v08 flexible-size internal meta data block
already in place on /dev/sda6 at byte offset 416318222336
Do you really want to overwrite the existing v08 meta-data?
[need to type 'yes' to confirm] yes
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.

Now we're going to attach the backing devices.

an-c05n01
drbdadm attach r{0,1}
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:426123532
 1: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:406548324

To be extra careful, we're going to tell DRBD to invalidate the backing devices. This is to ensure that DRBD doesn't see any old data and think it might be useful. This way, when the resync is complete, we can be certain our copy of the data is complete.

an-c05n01
drbdadm invalidate r{0,1}
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:426123532
 1: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:406548324

The oos:X values didn't change, so that was probably unnecessary, but it's always better to be extra safe.

The next step will be to connect to an-c05n02, but before we do, we need to make sure our fencing is available. Fencing is provided by cman, so we will start cman and make sure the node has properly joined an-c05n02.

Check the current state:

an-c05n01
clustat
Could not connect to CMAN: No such file or directory
an-c05n02
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 15:07:42 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Offline
 an-c05n02.alteeve.ca                                       2 Online, Local, rgmanager

 Service Name                                  Owner (Last)                                  State         
 ------- ----                                  ----- ------                                  -----         
 service:libvirtd_n01                          (an-c05n01.alteeve.ca)                        stopped       
 service:libvirtd_n02                          an-c05n02.alteeve.ca                          started       
 service:storage_n01                           (an-c05n01.alteeve.ca)                        stopped       
 service:storage_n02                           an-c05n02.alteeve.ca                          started       
 vm:vm01-c7                                    an-c05n02.alteeve.ca                          started       
 vm:vm02-win2008r2                             an-c05n02.alteeve.ca                          started

Now start cman and join the cluster.

an-c05n01
/etc/init.d/cman start
Starting cluster: 
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum...                                   [  OK  ]
   Starting fenced...                                      [  OK  ]
   Starting dlm_controld...                                [  OK  ]
   Tuning DLM kernel config...                             [  OK  ]
   Starting gfs_controld...                                [  OK  ]
   Unfencing self...                                       [  OK  ]
   Joining fence domain...                                 [  OK  ]
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 15:08:41 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Online, Local
 an-c05n02.alteeve.ca                                       2 Online
an-c05n02
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 15:09:30 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Online
 an-c05n02.alteeve.ca                                       2 Online, Local, rgmanager

 Service Name                                  Owner (Last)                                  State         
 ------- ----                                  ----- ------                                  -----         
 service:libvirtd_n01                          (an-c05n01.alteeve.ca)                        stopped       
 service:libvirtd_n02                          an-c05n02.alteeve.ca                          started       
 service:storage_n01                           (an-c05n01.alteeve.ca)                        stopped       
 service:storage_n02                           an-c05n02.alteeve.ca                          started       
 vm:vm01-c7                                    an-c05n02.alteeve.ca                          started       
 vm:vm02-win2008r2                             an-c05n02.alteeve.ca                          started

Now that fencing is available, we can safely connect our DRBD resources.

an-c05n01
drbdadm connect r{0,1}
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:73728 dw:73728 dr:0 al:0 bm:4 lo:0 pe:24 ua:0 ap:0 ep:1 wo:f oos:280546916
	[>....................] sync'ed:  0.1% (273968/274040)M
	finish: 6:19:07 speed: 12,288 (12,288) want: 30,720 K/sec
 1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:1920 dw:1920 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:259636584
	[>....................] sync'ed:  0.1% (253548/253552)M
	finish: 216:21:49 speed: 320 (320) want: 250 K/sec

Notice that r1 didn't honour the configured resync rate of 30M, so let's give it a kick.

an-c05n01
drbdadm adjust all
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:1164363 dw:1164363 dr:0 al:0 bm:71 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:279456308
	[>....................] sync'ed:  0.5% (272904/274040)M
	finish: 8:15:29 speed: 9,388 (10,036) want: 30,720 K/sec
 1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:117249 dw:117249 dr:0 al:0 bm:7 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:259521256
	[>....................] sync'ed:  0.1% (253436/253552)M
	finish: 15:05:05 speed: 4,776 (1,008) want: 30,720 K/sec

That's better.

Now we can safely start rgmanager.

an-c05n01
/etc/init.d/rgmanager start
Starting Cluster Service Manager:                          [  OK  ]
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 15:14:10 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Online, Local, rgmanager
 an-c05n02.alteeve.ca                                       2 Online, rgmanager

 Service Name                                  Owner (Last)                                  State         
 ------- ----                                  ----- ------                                  -----         
 service:libvirtd_n01                          an-c05n01.alteeve.ca                          started       
 service:libvirtd_n02                          an-c05n02.alteeve.ca                          started       
 service:storage_n01                           an-c05n01.alteeve.ca                          started       
 service:storage_n02                           an-c05n02.alteeve.ca                          started       
 vm:vm01-c7                                    an-c05n02.alteeve.ca                          started       
 vm:vm02-win2008r2                             an-c05n02.alteeve.ca                          started
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:645617 dw:645129 dr:8900 al:0 bm:40 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:277834280
	[>....................] sync'ed:  0.3% (271320/271952)M
	finish: 4:58:26 speed: 15,492 (15,000) want: 30,720 K/sec
 1: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:645276 dw:645120 dr:780 al:0 bm:40 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:258043624
	[>....................] sync'ed:  0.3% (251992/252624)M
	finish: 4:37:10 speed: 15,512 (15,000) want: 30,720 K/sec

Excellent!

This is where we stop for a while. We cannot proceed with repartitioning an-c05n02 until an-c05n01 is UpToDate on both resources.
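
You can keep an eye on the resync from either node with something as simple as:

watch cat /proc/drbd

watch re-runs the command every two seconds by default; use -n to set a longer interval if you prefer.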

Rebuilding Partitions on an-c05n02

Warning: DO NOT PROCEED UNTIL BOTH RESOURCES ARE UpToDate ON an-c05n01!

Once an-c05n01 is fully UpToDate, we'll be ready to migrate the servers over and withdraw an-c05n02 from the Anvil!.

So, first step: verify that both resources are UpToDate:

an-c05n01
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:1208 nr:278491534 dw:278484770 dr:94392 al:4 bm:16998 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:258710794 dw:258696946 dr:84936 al:0 bm:15791 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
an-c05n02
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:364891927 nr:1716 dw:21702 dr:365385907 al:78 bm:22288 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
 1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
    ns:260120301 nr:0 dw:593485 dr:261102972 al:2010 bm:15889 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

Took a while, didn't it?

Now to move the servers from an-c05n02 to an-c05n01 and then withdraw an-c05n02 from the Anvil!.
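
The migration and withdrawal commands aren't captured here but, assuming the same procedure we used earlier to withdraw an-c05n01, they would look like this:

an-c05n01
# Live-migrate both servers over to this node.
clusvcadm -M vm:vm01-c7 -m an-c05n01.alteeve.ca
clusvcadm -M vm:vm02-win2008r2 -m an-c05n01.alteeve.ca
an-c05n02
# Stop rgmanager; this also stops the libvirtd_n02 and storage_n02
# services, as we'll see in clustat below. Then leave the cluster.
/etc/init.d/rgmanager stop
/etc/init.d/cman stop

Once that's done, the Anvil! should look like this: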

an-c05n01
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 20:01:47 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Online, Local, rgmanager
 an-c05n02.alteeve.ca                                       2 Offline

 Service Name                                  Owner (Last)                                  State         
 ------- ----                                  ----- ------                                  -----         
 service:libvirtd_n01                          an-c05n01.alteeve.ca                          started       
 service:libvirtd_n02                          (an-c05n02.alteeve.ca)                        stopped       
 service:storage_n01                           an-c05n01.alteeve.ca                          started       
 service:storage_n02                           (an-c05n02.alteeve.ca)                        stopped       
 vm:vm01-c7                                    an-c05n01.alteeve.ca                          started       
 vm:vm02-win2008r2                             an-c05n01.alteeve.ca                          started
/etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
m:res  cs            ro               ds                 p  mounted  fstype
0:r0   WFConnection  Primary/Unknown  UpToDate/Outdated  C
1:r1   WFConnection  Primary/Unknown  UpToDate/Outdated  C
an-c05n02
clustat
Could not connect to CMAN: No such file or directory
/etc/init.d/drbd status
drbd not loaded

Now we're ready to delete the old partitions on an-c05n02 and recreate them with the new sizes. We're going to match the geometry we used earlier for an-c05n01.

The first step, if you recall, was to wipe the old DRBD meta-data:

an-c05n02
drbdadm wipe-md r{0,1}
Do you really want to wipe out the DRBD meta data?
[need to type 'yes' to confirm] yes
Wiping meta data...
DRBD meta data block successfully wiped out.

Do you really want to wipe out the DRBD meta data?
[need to type 'yes' to confirm] yes
Wiping meta data...
DRBD meta data block successfully wiped out.

Now let's take a look at the current partition table, out of an excess of paranoia again.

an-c05n02
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 898GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
 1      1049kB  538MB   537MB   primary   ext4            boot
 2      538MB   43.5GB  42.9GB  primary   ext4
 3      43.5GB  45.6GB  2147MB  primary   linux-swap(v1)
 4      45.6GB  599GB   553GB   extended                  lba
 5      45.6GB  333GB   287GB   logical
 6      333GB   599GB   266GB   logical
        599GB   898GB   299GB             Free Space

Ok, so we're ready to delete partitions 4, 5 and 6.

an-c05n02
parted -a opt /dev/sda "rm 6"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.
Information: You may need to update /etc/fstab.
parted -a opt /dev/sda "rm 5"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.
Information: You may need to update /etc/fstab.
parted -a opt /dev/sda "rm 4"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.
Information: You may need to update /etc/fstab.
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 898GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type     File system     Flags
        32.3kB  1049kB  1016kB           Free Space
 1      1049kB  538MB   537MB   primary  ext4            boot
 2      538MB   43.5GB  42.9GB  primary  ext4
 3      43.5GB  45.6GB  2147MB  primary  linux-swap(v1)
        45.6GB  898GB   853GB            Free Space

There, they're gone. Now we'll recreate the partitions to match the geometry of an-c05n01.

an-c05n02
parted -a opt /dev/sda "mkpart extended 45.6G 898G"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.
parted -a opt /dev/sda "mkpart logical 45.6G 482G"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.
parted -a opt /dev/sda "mkpart logical 482G 898G"
Warning: WARNING: the kernel failed to re-read the partition table on /dev/sda (Device or resource busy).  As
a result, it may not reflect all of your changes until after reboot.
parted -a opt /dev/sda "print free"
Model: LSI RAID 5/6 SAS 6G (scsi)
Disk /dev/sda: 898GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos

Number  Start   End     Size    Type      File system     Flags
        32.3kB  1049kB  1016kB            Free Space
 1      1049kB  538MB   537MB   primary   ext4            boot
 2      538MB   43.5GB  42.9GB  primary   ext4
 3      43.5GB  45.6GB  2147MB  primary   linux-swap(v1)
 4      45.6GB  898GB   853GB   extended                  lba
 5      45.6GB  482GB   436GB   logical
 6      482GB   898GB   416GB   logical

Excellent!

Now, as before, we will have to reboot; / is on /dev/sda, so the kernel was unable to re-read the partition table, and a reboot is the only way to ensure it sees the new geometry.

an-c05n02
reboot
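
Once the node is back up, it doesn't hurt to confirm that the kernel now agrees with parted before going any further:

an-c05n02
# The kernel's view of the disk; sda5 and sda6 should now show their
# new, larger sizes (reported in 1 KiB blocks).
cat /proc/partitions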

Rejoining an-c05n02 to the Anvil!

Warning: This next step is tricky, and must be followed exactly to ensure a full resync of data is done. Please do not rush or skip any steps!
Note: The next steps will have an-c05n01 running cman only, not rgmanager. This upsets Striker, so please be sure no one uses the Anvil! dashboard until you are done with the next sections.

As before, we're going to zero out the start of the resized partitions, create new DRBD meta-data, invalidate the data on the partitions, manually start cman and then connect the resized partitions to our peer. Once this is done and the resync has started, we will start rgmanager and restore the node to the Anvil! proper.
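
At a glance, the whole sequence looks like this. Every command is walked through below, so please don't run ahead.

an-c05n02
# Summary only; see the detailed steps below before running anything.
dd if=/dev/zero of=/dev/sda5 bs=4M count=1000   # zero the start of each partition
dd if=/dev/zero of=/dev/sda6 bs=4M count=1000
modprobe drbd                                   # load the DRBD kernel module
drbdadm create-md r{0,1}                        # create fresh meta-data
drbdadm attach r{0,1}                           # attach the backing devices
drbdadm invalidate r{0,1}                       # force a full resync
/etc/init.d/cman start                          # join the cluster to get fencing
drbdadm connect r{0,1}                          # connect to the peer; resync starts
/etc/init.d/rgmanager start                     # restore the node to the Anvil!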

Make sure that the node is still out of the Anvil! and didn't accidentally try to join the cluster:

an-c05n01
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 20:26:34 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Online, Local, rgmanager
 an-c05n02.alteeve.ca                                       2 Offline

 Service Name                                  Owner (Last)                                  State         
 ------- ----                                  ----- ------                                  -----         
 service:libvirtd_n01                          an-c05n01.alteeve.ca                          started       
 service:libvirtd_n02                          (an-c05n02.alteeve.ca)                        stopped       
 service:storage_n01                           an-c05n01.alteeve.ca                          started       
 service:storage_n02                           (an-c05n02.alteeve.ca)                        stopped       
 vm:vm01-c7                                    an-c05n01.alteeve.ca                          started       
 vm:vm02-win2008r2                             an-c05n01.alteeve.ca                          started
an-c05n02
clustat
Could not connect to CMAN: No such file or directory

Double-check DRBD as well:

an-c05n01
/etc/init.d/drbd status
drbd driver loaded OK; device status:
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
m:res  cs            ro               ds                 p  mounted  fstype
0:r0   WFConnection  Primary/Unknown  UpToDate/Outdated  C
1:r1   WFConnection  Primary/Unknown  UpToDate/Outdated  C
an-c05n02
/etc/init.d/drbd status
drbd not loaded

Perfect.

Now we'll zero out the start of both partitions, just to ensure that DRBD doesn't see any remnants of old data.

an-c05n02
dd if=/dev/zero of=/dev/sda5 bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 8.38742 s, 500 MB/s
dd if=/dev/zero of=/dev/sda6 bs=4M count=1000
1000+0 records in
1000+0 records out
4194304000 bytes (4.2 GB) copied, 9.52675 s, 440 MB/s

We're ready to load the drbd kernel module and create the new meta-data.

an-c05n02
modprobe drbd
drbdadm create-md r{0,1}
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success
strange bm_offset -24824 (expected: -24880)
strange bm_offset -24824 (expected: -24880)
Writing meta data...
initializing activity log
NOT initialized bitmap
New drbd meta data block successfully created.
success

Unlike last time, it didn't see any old meta-data, so we didn't have to type yes. Do note that it reported an odd bm_offset; this is harmless.

We're ready to attach the backing devices now.

an-c05n02
drbdadm attach r{0,1}
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:426123532
 1: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:406548324

As we did with an-c05n01, we're going to play it safe and invalidate both backing devices, just to be sure DRBD does a full resync.

an-c05n02
drbdadm invalidate r{0,1}
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:426123532
 1: cs:StandAlone ro:Secondary/Unknown ds:Inconsistent/DUnknown   r----s
    ns:0 nr:0 dw:0 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:406548324

Again, the oos:X counts didn't change, but it's still better to be safe.

Now we'll start cman alone to get fencing and then connect our resources.

an-c05n02
/etc/init.d/cman start
Starting cluster: 
   Checking if cluster has been disabled at boot...        [  OK  ]
   Checking Network Manager...                             [  OK  ]
   Global setup...                                         [  OK  ]
   Loading kernel modules...                               [  OK  ]
   Mounting configfs...                                    [  OK  ]
   Starting cman...                                        [  OK  ]
   Waiting for quorum...                                   [  OK  ]
   Starting fenced...                                      [  OK  ]
   Starting dlm_controld...                                [  OK  ]
   Tuning DLM kernel config...                             [  OK  ]
   Starting gfs_controld...                                [  OK  ]
   Unfencing self...                                       [  OK  ]
   Joining fence domain...                                 [  OK  ]
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 20:46:52 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Online
 an-c05n02.alteeve.ca                                       2 Online, Local

Verify by checking the other node.

an-c05n01
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 20:47:48 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Online, Local, rgmanager
 an-c05n02.alteeve.ca                                       2 Online

 Service Name                                  Owner (Last)                                  State         
 ------- ----                                  ----- ------                                  -----         
 service:libvirtd_n01                          an-c05n01.alteeve.ca                          started       
 service:libvirtd_n02                          (an-c05n02.alteeve.ca)                        stopped       
 service:storage_n01                           an-c05n01.alteeve.ca                          started       
 service:storage_n02                           (an-c05n02.alteeve.ca)                        stopped       
 vm:vm01-c7                                    an-c05n01.alteeve.ca                          started       
 vm:vm02-win2008r2                             an-c05n01.alteeve.ca                          started

Good.

Now we'll connect the resources and make sure they start sync'ing.

an-c05n02
drbdadm connect r{0,1}
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:12288 dw:12288 dr:0 al:0 bm:0 lo:0 pe:24 ua:0 ap:0 ep:1 wo:f oos:426111244
	[>....................] sync'ed:  0.1% (416124/416136)M
	finish: 9:32:43 speed: 12,288 (12,288) want: 30,720 K/sec
 1: cs:SyncTarget ro:Secondary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:12288 dw:12288 dr:0 al:0 bm:0 lo:0 pe:24 ua:0 ap:0 ep:1 wo:f oos:406536036
	[>....................] sync'ed:  0.1% (397004/397016)M
	finish: 9:06:25 speed: 12,288 (12,288) want: 30,720 K/sec

Excellent! They're both targeting the desired 30 MiB/sec (want: 30,720 K/sec), so there's no need for a drbdadm adjust all this time.
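
As an aside, that want: 30,720 K/sec value comes from the syncer rate in the DRBD configuration. In DRBD 8.3 syntax, it is set something like this (a sketch; your file layout and location may differ):

# Typically in /etc/drbd.d/global_common.conf (8.3-era syntax).
common {
        syncer {
                # 30M is 30 MiB/sec, which /proc/drbd reports as 30,720 K/sec.
                rate 30M;
        }
}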

Let's start up rgmanager now!

an-c05n02
/etc/init.d/rgmanager start
Starting Cluster Service Manager:                          [  OK  ]

If everything went properly, both nodes should be back in the Anvil! and resync'ing.

an-c05n01
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 20:53:21 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Online, Local, rgmanager
 an-c05n02.alteeve.ca                                       2 Online, rgmanager

 Service Name                                  Owner (Last)                                  State         
 ------- ----                                  ----- ------                                  -----         
 service:libvirtd_n01                          an-c05n01.alteeve.ca                          started       
 service:libvirtd_n02                          an-c05n02.alteeve.ca                          started       
 service:storage_n01                           an-c05n01.alteeve.ca                          started       
 service:storage_n02                           an-c05n02.alteeve.ca                          started       
 vm:vm01-c7                                    an-c05n01.alteeve.ca                          started       
 vm:vm02-win2008r2                             an-c05n01.alteeve.ca                          started
an-c05n02
clustat
Cluster Status for an-cluster-05 @ Mon Aug  4 20:53:32 2014
Member Status: Quorate

 Member Name                                            ID   Status
 ------ ----                                            ---- ------
 an-c05n01.alteeve.ca                                       1 Online, rgmanager
 an-c05n02.alteeve.ca                                       2 Online, Local, rgmanager

 Service Name                                  Owner (Last)                                  State         
 ------- ----                                  ----- ------                                  -----         
 service:libvirtd_n01                          an-c05n01.alteeve.ca                          started       
 service:libvirtd_n02                          an-c05n02.alteeve.ca                          started       
 service:storage_n01                           an-c05n01.alteeve.ca                          started       
 service:storage_n02                           an-c05n02.alteeve.ca                          started       
 vm:vm01-c7                                    an-c05n01.alteeve.ca                          started       
 vm:vm02-win2008r2                             an-c05n01.alteeve.ca                          started

Let's check DRBD, just to be careful:

an-c05n01
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:SyncSource ro:Primary/Primary ds:UpToDate/Inconsistent C r-----
    ns:992848 nr:278492177 dw:278487965 dr:1132716 al:60 bm:17058 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:425133452
	[>....................] sync'ed:  0.1% (415168/415212)M
	finish: 421:45:33 speed: 276 (272) K/sec
 1: cs:SyncSource ro:Primary/Primary ds:UpToDate/Inconsistent C r-----
    ns:1002809 nr:258710944 dw:258698014 dr:1137688 al:30 bm:15852 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:405546000
	[>....................] sync'ed:  0.1% (396040/396080)M
	finish: 402:19:38 speed: 272 (272) K/sec
an-c05n02
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:45811 dw:45195 dr:7168 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:425132164
	[>....................] sync'ed:  0.1% (415168/415212)M
	finish: 408:46:51 speed: 288 (272) want: 250 K/sec
 1: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:45433 dw:45149 dr:1564 al:0 bm:3 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:405544712
	[>....................] sync'ed:  0.1% (396036/396080)M
	finish: 389:56:48 speed: 288 (272) want: 250 K/sec

Doh! The sync rate has fallen back to the 250 K/sec default (see the want: values above), so we'll kick it with drbdadm adjust.

an-c05n02
drbdadm adjust all
cat /proc/drbd
version: 8.3.16 (api:88/proto:86-97)
GIT-hash: a798fa7e274428a357657fb52f0ecf40192c1985 build by root@rhel6-builder.alteeve.ca, 2014-04-20 12:16:31
 0: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:167979 dw:167239 dr:7732 al:0 bm:10 lo:0 pe:24 ua:0 ap:0 ep:1 wo:f oos:425010144
	[>....................] sync'ed:  0.1% (415048/415212)M
	finish: 23:03:29 speed: 5,112 (652) want: 30,720 K/sec
 1: cs:SyncTarget ro:Primary/Primary ds:Inconsistent/UpToDate C r-----
    ns:0 nr:167578 dw:167170 dr:2160 al:0 bm:10 lo:0 pe:24 ua:0 ap:0 ep:1 wo:f oos:405422692
	[>....................] sync'ed:  0.1% (395920/396080)M
	finish: 21:59:44 speed: 5,112 (652) want: 30,720 K/sec

That's better.

At this point, you can proceed to the next step without waiting for an-c05n02's resources to finish sync'ing.

Updating Clustered LVM

If we had simply extended the partitions under a DRBD resource, we would first have to resize the resource so that DRBD relocates its internal meta-data to the new end of the backing device. In our case though, we deleted the old meta-data entirely and then recreated it, so we don't need to worry about that.
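
For the record, had we grown the backing partitions in place and kept the meta-data, the command would have been drbdadm's resize, run on one node once both backing devices were enlarged, connected and UpToDate:

an-c05n01
# Not needed in this tutorial; shown only for the "extended in place"
# case. This relocates the internal meta-data to the new end of the
# backing device and grows the resource into the new space.
drbdadm resize r{0,1}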

So the next step, then, is to grow the clustered LVM physical volumes.

Let's see what the PV looks like, to start.

an-c05n01
pvscan
  PV /dev/drbd1   VG an-c05n02_vg0   lvm2 [247.61 GiB / 47.61 GiB free]
  PV /dev/drbd0   VG an-c05n01_vg0   lvm2 [267.62 GiB / 27.62 GiB free]
  Total: 2 [515.23 GiB] / in use: 2 [515.23 GiB] / in no VG: 0 [0   ]
an-c05n02
pvscan
  PV /dev/drbd1   VG an-c05n02_vg0   lvm2 [247.61 GiB / 47.61 GiB free]
  PV /dev/drbd0   VG an-c05n01_vg0   lvm2 [267.62 GiB / 27.62 GiB free]
  Total: 2 [515.23 GiB] / in use: 2 [515.23 GiB] / in no VG: 0 [0   ]

Still the old size, as expected. So now we tell LVM to resize the two DRBD-backed PVs. This can be done from either node; we'll use an-c05n01.

an-c05n01
pvresize -v /dev/drbd0
    Using physical volume(s) on command line
    Archiving volume group "an-c05n01_vg0" metadata (seqno 5).
    Resizing volume "/dev/drbd0" to 561241288 sectors.
    Resizing physical volume /dev/drbd0 from 0 to 104033 extents.
    Updating physical volume "/dev/drbd0"
    Creating volume group backup "/etc/lvm/backup/an-c05n01_vg0" (seqno 6).
  Physical volume "/dev/drbd0" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized
pvresize -v /dev/drbd1
    Using physical volume(s) on command line
    Archiving volume group "an-c05n02_vg0" metadata (seqno 4).
    Resizing volume "/dev/drbd1" to 519277008 sectors.
    Resizing physical volume /dev/drbd1 from 0 to 99254 extents.
    Updating physical volume "/dev/drbd1"
    Creating volume group backup "/etc/lvm/backup/an-c05n02_vg0" (seqno 5).
  Physical volume "/dev/drbd1" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized

Now we'll check pvscan again from both nodes. If all went well, they'll be the new size.

an-c05n01
pvscan
  PV /dev/drbd1   VG an-c05n02_vg0   lvm2 [387.71 GiB / 187.71 GiB free]
  PV /dev/drbd0   VG an-c05n01_vg0   lvm2 [406.38 GiB / 166.38 GiB free]
  Total: 2 [794.09 GiB] / in use: 2 [794.09 GiB] / in no VG: 0 [0   ]
an-c05n02
pvscan
  PV /dev/drbd1   VG an-c05n02_vg0   lvm2 [387.71 GiB / 187.71 GiB free]
  PV /dev/drbd0   VG an-c05n01_vg0   lvm2 [406.38 GiB / 166.38 GiB free]
  Total: 2 [794.09 GiB] / in use: 2 [794.09 GiB] / in no VG: 0 [0   ]

That's it, we're done!

If you check vgdisplay, you will see that the volume groups already show the new space, so there is nothing more for us to do.

an-c05n01
vgdisplay
  --- Volume group ---
  VG Name               an-c05n02_vg0
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  5
  VG Access             read/write
  VG Status             resizable
  Clustered             yes
  Shared                no
  MAX LV                0
  Cur LV                1
  Open LV               1
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               387.71 GiB
  PE Size               4.00 MiB
  Total PE              99254
  Alloc PE / Size       51200 / 200.00 GiB
  Free  PE / Size       48054 / 187.71 GiB
  VG UUID               vF9KbQ-X02p-uUSN-gGxc-fqVV-Ufgt-DxvMme
   
  --- Volume group ---
  VG Name               an-c05n01_vg0
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  6
  VG Access             read/write
  VG Status             resizable
  Clustered             yes
  Shared                no
  MAX LV                0
  Cur LV                2
  Open LV               2
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               406.38 GiB
  PE Size               4.00 MiB
  Total PE              104033
  Alloc PE / Size       61440 / 240.00 GiB
  Free  PE / Size       42593 / 166.38 GiB
  VG UUID               i61Whg-en6H-dd2S-DLrK-TqhA-7lyg-H9xmOL

That wasn't the fastest procedure, but it wasn't that bad, now was it? :)


Any questions, feedback, advice, complaints or meanderings are welcome.