Managing Drive Failures with AN!CDB

Revision as of 07:04, 11 February 2014


Note: At this time, only LSI-based controllers are supported. Please see the "Monitoring LSI-Based RAID Controllers with MegaCli" section of AN!Cluster Tutorial 2 for the required node configuration.

The most common repair needed on Anvil! nodes is the replacement of failing or failed physical disks.

AN!CDB provides an easy-to-use interface for managing this. In this tutorial, we will physically eject a drive from a small running logical volume, simulating a failure.

Introducing AN!CDB Drive Management

On the main AN!CDB page, you can click on either node's name in the "Cluster Nodes - Control" section.

AN!CDB main page with node names as click-able items.

Click on the name of the node you want to work on. In our case, we will work on an-c05n01.alteeve.ca.

Storage Display Window

The storage display window shows your storage controller(s), their auxiliary power supply for write-back caching if installed, the logical disk(s) and each logical disk's constituent drives.

The auxiliary power and logical disks will be slightly indented under their parent controller.

The physical disks associated with a given logical disk are indented further, to show which logical disk they belong to.

AN!CDB storage management page.

In this example, we have only one RAID controller. It has an auxiliary power pack, and a single logical volume has been created on it.

The logical volume is a RAID level 5 array with four physical disks.
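
The nesting described above can be pictured as a small tree. The following is only an illustrative sketch in Python, not AN!CDB's own code. The class names, the labels and the drive names other than '252:0' are hypothetical, chosen to mirror the example of one controller, one auxiliary power pack and a four-disk RAID level 5 logical volume.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogicalDisk:
    name: str                          # hypothetical label, not AN!CDB's wording
    raid_level: int
    physical_disks: List[str] = field(default_factory=list)


@dataclass
class Controller:
    name: str                          # hypothetical label
    aux_power: Optional[str] = None    # None when no auxiliary power is installed
    logical_disks: List[LogicalDisk] = field(default_factory=list)


def render(controller: Controller, indent: str = "  ") -> List[str]:
    """Return one line per item, indented the way the storage page nests them."""
    lines = [controller.name]
    if controller.aux_power is not None:
        lines.append(f"{indent}{controller.aux_power}")
    for logical_disk in controller.logical_disks:
        lines.append(
            f"{indent}{logical_disk.name} (RAID level {logical_disk.raid_level})"
        )
        for physical_disk in logical_disk.physical_disks:
            # Physical disks sit one level deeper than their logical disk.
            lines.append(f"{indent * 2}Physical disk {physical_disk}")
    return lines


# Only '252:0' appears in this tutorial; the other three names are assumed.
example = Controller(
    name="RAID controller 0",
    aux_power="Auxiliary power pack",
    logical_disks=[
        LogicalDisk("Logical disk 0", 5, ["252:0", "252:1", "252:2", "252:3"]),
    ],
)
layout = render(example)
print("\n".join(layout))
```

Reading the output top to bottom gives the same parent and child relationships that the indentation on the storage page conveys.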

Managing the Physical Disk Identification ID Light

The first task we will explore is using identification lights to find a physical drive in a node.

If a drive fails completely, its fault light will light up, making the failed drive easy to find. However, the AN!CDB alert system can also notify us of pending failures.

In these cases, the drive's fault light may not illuminate, so it becomes critical to identify the failing drive some other way. Removing the wrong drive while another drive is unhealthy may well leave your node non-operational.

That's no fun.

Each physical drive, whether in an array or unconfigured, will have a pair of buttons labelled Turn On and Turn Off. These switch the drive's ID light on and off, respectively.

Illuminating a Drive's ID Light

Let's illuminate!

We will identify the drive with the somewhat-cryptic name '252:0'.
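
The name is easier to read once it is split in two. MegaCli addresses physical drives as an enclosure ID and a slot number joined by a colon. Assuming AN!CDB shows that same address unchanged (which this tutorial does not state outright), '252:0' would be slot 0 in enclosure 252. The helper below is a hypothetical sketch of that reading, not part of AN!CDB.

```python
from typing import NamedTuple


class DriveAddress(NamedTuple):
    enclosure: int
    slot: int


def parse_drive_name(name: str) -> DriveAddress:
    """Split an 'enclosure:slot' name (assumed format) into its two numbers."""
    enclosure_text, separator, slot_text = name.partition(":")
    if not separator:
        raise ValueError(f"expected 'enclosure:slot', got {name!r}")
    return DriveAddress(int(enclosure_text), int(slot_text))


address = parse_drive_name("252:0")
print(address.enclosure, address.slot)
```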

Turning on the ID light for physical disk '252:0'.

The storage page will reload, indicating whether the command succeeded or not.

Physical disk '252:0' illuminated successfully.

If you now look at the front of your node, you should see one of the drives lit up.

Locating physical disk '252:0' on the node's front panel.

Most excellent.

Shutting off a Drive's ID Light

To turn the ID light off, simply click on the drive's Turn Off button.

Turning off the ID light for physical disk '252:0'.

As before, the success or failure will be reported.

Physical disk '252:0' ID light turned off successfully.
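
This tutorial does not show what AN!CDB runs when these buttons are clicked. As a rough sketch only, and assuming the buttons map onto MegaCli's PdLocate operation, the equivalent command lines could be built as below. The function name, the MegaCli64 binary name and the choice of adapter 0 are assumptions for illustration. The sketch only builds the argument list and does not execute anything.

```python
from typing import List


def pd_locate_command(
    drive: str,
    on: bool,
    adapter: int = 0,              # assumed adapter number
    binary: str = "MegaCli64",     # assumed binary name; it varies by install
) -> List[str]:
    """Build (but do not run) a MegaCli PdLocate command for one drive."""
    action = "-start" if on else "-stop"
    return [binary, "-PdLocate", action, f"-physdrv[{drive}]", f"-a{adapter}"]


turn_on = pd_locate_command("252:0", on=True)
turn_off = pd_locate_command("252:0", on=False)
print(" ".join(turn_on))
print(" ".join(turn_off))
```

Building the command as a list rather than a single string keeps the square brackets in the drive address away from any shell interpretation if the list is later passed to a process runner.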

Refreshing The Storage Page

Warning: AN!CDB doesn't (yet) use a command key to prevent a request from being sent again if a page is manually reloaded (ctrl + r, <f5>, etc). In almost all cases, this is harmless, as AN!CDB won't do something dangerous without verifying that it is still safe to do so. Just the same, please always use the "reload" icon shown below.

After issuing a command to the storage manager, please do not use your browser's "refresh" function. It is always better to click on the reload icon.

Storage page "refresh" icon.

This will reload the page with the most up to date state of the storage in your node.

Storage page reloaded properly.
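
For readers curious about the "command key" mentioned in the warning above, the general idea is a one-time token: the page that offers a command carries a fresh key, and the server accepts each key only once, so a browser reload that resends the same request is ignored. AN!CDB does not do this at the time of writing, so the following is a hypothetical sketch of the technique with invented names, not a description of its code.

```python
import secrets


class CommandKeys:
    """Hand out one-time keys and accept each of them exactly once."""

    def __init__(self) -> None:
        self._unused = set()

    def issue(self) -> str:
        # A fresh random key would be embedded in each command link or form.
        key = secrets.token_hex(16)
        self._unused.add(key)
        return key

    def redeem(self, key: str) -> bool:
        # True only the first time a known key is presented.
        if key in self._unused:
            self._unused.remove(key)
            return True
        return False


keys = CommandKeys()
key = keys.issue()
first = keys.redeem(key)     # the original click
replay = keys.redeem(key)    # the same request resent by a browser reload
print(first, replay)
```

Until something along these lines exists, the reload icon remains the safe way to refresh the page.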

Failure Recovery

Now the fun part: breaking things!


Storage page pre-failure state.


 
