Managing Drive Failures with AN!CDB
Revision as of 07:28, 11 February 2014
Note: At this time, only LSI-based controllers are supported. Please see this section of the AN!Cluster Tutorial 2 for required node configuration.
The most common repair needed on Anvil! nodes is the replacement of failing or failed physical disks.
AN!CDB provides a very easy-to-use interface for managing this. In this tutorial, we will physically eject a drive from a small running logical volume, simulating a failure.
Introducing AN!CDB Drive Management
On the main AN!CDB page, you can click on either node's name in the "Cluster Nodes - Control" section.
Click on the name of the node you want to work on. In our case, we will work on an-c05n01.alteeve.ca.
Storage Display Window
The storage display window shows your storage controller(s), their auxiliary power supply for write-back caching if installed, the logical disk(s) and each logical disk's constituent drives.
The auxiliary power and logical disks will be slightly indented under their parent controller.
The physical disks associated with a given logical disk are further indented, to show their association.
In this example, we have only one RAID controller. It has an auxiliary power pack, and a single logical volume has been created.
The logical volume is a RAID level 5 array with four physical disks.
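The nesting described above can be sketched as a small data structure. This is only an illustration of the display hierarchy, not AN!CDB's actual code. The class names, labels, and the drive names other than 252:0 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogicalDisk:
    name: str
    raid_level: int
    physical_disks: List[str] = field(default_factory=list)


@dataclass
class Controller:
    name: str
    aux_power: Optional[str] = None  # None when no auxiliary power is installed
    logical_disks: List[LogicalDisk] = field(default_factory=list)


def render(controller: Controller, indent: str = "  ") -> List[str]:
    """Return display lines, indenting each child under its parent."""
    lines = [controller.name]
    if controller.aux_power is not None:
        lines.append(f"{indent}{controller.aux_power}")
    for ld in controller.logical_disks:
        lines.append(f"{indent}{ld.name} (RAID {ld.raid_level})")
        # Member drives sit one level deeper than their logical disk.
        for pd in ld.physical_disks:
            lines.append(f"{indent * 2}{pd}")
    return lines


# Hypothetical layout matching the example in the text: one controller,
# one auxiliary power pack, one RAID 5 logical disk with four drives.
example = Controller(
    name="Controller 0",
    aux_power="Auxiliary power pack",
    logical_disks=[
        LogicalDisk("Logical disk 0", 5, ["252:0", "252:1", "252:2", "252:3"])
    ],
)

if __name__ == "__main__":
    print("\n".join(render(example)))
```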
Managing the Physical Disk Identification ID Light
The first task we will explore is using identification lights to find a physical drive in a node.
If a drive fails completely, its fault light will light up, making the failed drive easy to find. However, the AN!CDB alert system can notify us of pending failures.
In these cases, the drive's fault light may not illuminate, so it becomes critical to identify the failing drive. Removing the wrong drive, when another drive is unhealthy, may well leave your node non-operational.
That's no fun.
Each physical drive, whether in an array or unconfigured, will have a pair of buttons labelled Turn On and Turn Off. The button you click determines whether the drive's ID light turns on or off.
Illuminating a Drive's ID Light
Let's illuminate!
We will identify the drive with the somewhat-cryptic name '252:0'.
The storage page will reload, indicating whether the command succeeded or not.
If you now look at the front of your node, you should see one of the drives lit up.
Most excellent.
Shutting off a Drive's ID Light
To turn the ID light off, simply click on the drive's Turn Off button.
As before, the success or failure will be reported.
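For readers who want to see what the Turn On and Turn Off buttons correspond to on the command line, here is a minimal sketch. It assumes the drive name 252:0 follows the usual LSI "enclosure ID:slot number" convention, and that the controller is driven with LSI's MegaCli tool and its -PdLocate option. Whether AN!CDB issues exactly this command is an assumption. The helper names, the MegaCli64 binary name, and the default adapter number 0 are hypothetical. The code only builds the argument list and does not run anything.

```python
import re
from typing import List, Tuple


def parse_drive_name(name: str) -> Tuple[int, int]:
    """Split a drive name such as '252:0' into (enclosure_id, slot_number)."""
    match = re.fullmatch(r"(\d+):(\d+)", name.strip())
    if match is None:
        raise ValueError(f"not an enclosure:slot drive name: {name!r}")
    return int(match.group(1)), int(match.group(2))


def locate_command(name: str, turn_on: bool, adapter: int = 0,
                   binary: str = "MegaCli64") -> List[str]:
    """Build (but do not run) a MegaCli command to start or stop the ID light."""
    enclosure, slot = parse_drive_name(name)
    action = "-start" if turn_on else "-stop"
    return [binary, "-PdLocate", action,
            f"-physdrv[{enclosure}:{slot}]", f"-a{adapter}"]


if __name__ == "__main__":
    print(" ".join(locate_command("252:0", turn_on=True)))
    print(" ".join(locate_command("252:0", turn_on=False)))
```

Validating the name before building the command keeps a mistyped drive name from ever reaching the controller.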
Refreshing The Storage Page
Warning: AN!CDB doesn't (yet) use a command key to prevent a request being sent again if a page is manually reloaded (ctrl + r, F5, etc). In almost all cases, this is harmless, as AN!CDB won't do something dangerous without verifying it is still safe to do so. Just the same, please always use the "reload" icon shown below.
After issuing a command to the storage manager, please do not use your browser's "refresh" function. It is always better to click on the reload icon.
This will reload the page with the most up to date state of the storage in your node.
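The "command key" mentioned in the warning is a single-use token attached to each request. The following is a minimal sketch of that general idea, not something AN!CDB implements today. The class and method names are hypothetical.

```python
import secrets
from typing import Set


class CommandKeys:
    """Issue single-use keys so a repeated (reloaded) request is rejected."""

    def __init__(self) -> None:
        self._unused: Set[str] = set()

    def issue(self) -> str:
        """Create a fresh key to embed in the link or form for one command."""
        key = secrets.token_hex(16)
        self._unused.add(key)
        return key

    def redeem(self, key: str) -> bool:
        """Return True the first time a key is presented, False afterwards."""
        if key in self._unused:
            self._unused.remove(key)
            return True
        return False


if __name__ == "__main__":
    keys = CommandKeys()
    key = keys.issue()
    print(keys.redeem(key))  # first submission is accepted
    print(keys.redeem(key))  # a browser reload re-sends the same key
```

With a scheme like this, a browser refresh would re-send an already redeemed key and the command would simply be ignored.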
Failure Recovery
Now the fun part: breaking things!
Failing a Drive
Warning: Physically ejecting a drive that is powered up and running is very dangerous! The electrical connections will be live and there is a possibility of a short destroying components. Further, platter-based drives will be under centripetal force, and moving the physical drive out of its current rotational plane risks a very destructive head crash. DO NOT EJECT A POWERED ON, SPINNING DRIVE! Like, EVAR. We understood the risks when writing this tutorial and, even then, used development nodes.
For this tutorial, we will physically eject the drive we identified earlier, called '252:0', which is a member of logical drive #0. This will cause the logical drive to enter a degraded state and the ejected drive will completely vanish from the storage page.
Now we will eject the drive.
Notice how 252:0 is gone and how the logical drive's state is now Degraded?
Underneath the "Degraded" state is more detail on which drive is missing and how big the replacement drive has to be.
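Why one missing drive leaves the array degraded rather than dead follows from RAID level 5 tolerating the loss of exactly one member. The sketch below illustrates that rule and the replacement size check. The function names and the "Optimal" and "Offline" labels are assumptions, as only "Degraded" appears on the storage page shown in this tutorial.

```python
def raid5_state(expected_members: int, present_members: int) -> str:
    """Classify a RAID 5 array by how many member drives are missing."""
    if expected_members < 3:
        raise ValueError("RAID 5 needs at least three member drives")
    if not 0 <= present_members <= expected_members:
        raise ValueError("present_members is out of range")
    missing = expected_members - present_members
    if missing == 0:
        return "Optimal"
    if missing == 1:
        return "Degraded"  # data still available, but no redundancy left
    return "Offline"       # two or more drives lost; RAID 5 cannot recover


def replacement_fits(required_size: int, candidate_size: int) -> bool:
    """A replacement drive must be at least as large as the reported size."""
    return candidate_size >= required_size


if __name__ == "__main__":
    # The tutorial's array: four drives, one ejected.
    print(raid5_state(expected_members=4, present_members=3))
```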
Recovering the Ejected Disk
In our case, we know that the drive we ejected is healthy, so we will re-insert it into the bay.
If your drive really has failed, then remove it and insert the replacement drive.
Once inserted, the drive will be marked by the controller as Unconfigured(bad). It will also be listed as an "Unconfigured Physical Disk".
Note: If the drive was marked bad automatically by the controller, do not try to repair it. Replace it!
In this state, the physical drive is useless. Before we can use it, we must click on the Make Good link by the drive's State.
Once you click on Make Good, the drive will be flagged as healthy and brought online.
Now the physical disk is usable again.
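As a command-line counterpart to the Make Good link, here is a hedged sketch that builds the matching MegaCli -PDMakeGood command and honours the note above by refusing drives the controller flagged on its own. That AN!CDB runs this exact command is an assumption. The function name, the marked_bad_by_controller flag, the MegaCli64 binary name, and the default adapter number are hypothetical.

```python
import re
from typing import List


def make_good_command(name: str, marked_bad_by_controller: bool = False,
                      adapter: int = 0,
                      binary: str = "MegaCli64") -> List[str]:
    """Build (but do not run) a MegaCli command to clear Unconfigured(bad)."""
    if marked_bad_by_controller:
        # Per the note above: such a drive should be replaced, not repaired.
        raise ValueError("drive was marked bad by the controller; replace it")
    match = re.fullmatch(r"(\d+):(\d+)", name.strip())
    if match is None:
        raise ValueError(f"not an enclosure:slot drive name: {name!r}")
    enclosure, slot = match.groups()
    return [binary, "-PDMakeGood", f"-PhysDrv[{enclosure}:{slot}]",
            f"-a{adapter}"]


if __name__ == "__main__":
    print(" ".join(make_good_command("252:0")))
```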
Adding the Recovered/Replacement Drive to the Degraded Array