{{howto_header}}

{{warning|1=This is little more than raw notes; do not consider anything here to be valid or accurate at this time.}}
= Installing =


It is installed automatically now. No need to install it manually.

== PostgreSQL Setup ==

Install the needed packages:
 
<syntaxhighlight lang="bash">
yum install -y postgresql postgresql-server postgresql-plperl postgresql-contrib postgresql-libs Scanner
</syntaxhighlight>
<syntaxhighlight lang="text">
...
Complete!
</syntaxhighlight>
 
Initialize the database:
 
<syntaxhighlight lang="bash">
/etc/init.d/postgresql initdb
</syntaxhighlight>
<syntaxhighlight lang="text">
Initializing database:                                    [  OK  ]
</syntaxhighlight>
 
Enable and start the service:
 
<syntaxhighlight lang="bash">
chkconfig postgresql on
/etc/init.d/postgresql start
</syntaxhighlight>
<syntaxhighlight lang="text">
Starting postgresql service:                              [  OK  ]
</syntaxhighlight>
 
Create the <span class="code">striker</span> database user:
 
<syntaxhighlight lang="bash">
su - postgres -c "createuser --no-superuser --createdb --no-createrole striker"
</syntaxhighlight>
<syntaxhighlight lang="bash">
# no output expected
</syntaxhighlight>
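
If you'd like to confirm the user was created, psql's standard <span class="code">\du</span> meta-command lists all roles:

<syntaxhighlight lang="bash">
su - postgres -c "psql -c '\du'"
</syntaxhighlight>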
 
Set the <span class="code">postgres</span> and <span class="code">striker</span> user passwords:
 
<syntaxhighlight lang="bash">
su - postgres -c "psql -U postgres"
</syntaxhighlight>
<syntaxhighlight lang="text">
psql (8.4.20)
Type "help" for help.
</syntaxhighlight>
 
<syntaxhighlight lang="bash">
postgres=# \password
</syntaxhighlight>
<syntaxhighlight lang="text">
Enter new password:
Enter it again:
</syntaxhighlight>
 
<syntaxhighlight lang="bash">
postgres=# \password striker
</syntaxhighlight>
<syntaxhighlight lang="text">
Enter new password:
Enter it again:
</syntaxhighlight>
 
Exit.
 
<syntaxhighlight lang="bash">
postgres=# \q
</syntaxhighlight>
 
{{warning|1=In the below example, the [[BCN]] is <span class="code">10.20.0.0/16</span> and the IFN is <span class="code">192.168.199.0/24</span>. If you have different networks, be sure to adjust your values accordingly!}}
 
Configure access:
 
<syntaxhighlight lang="bash">
cp /var/lib/pgsql/data/pg_hba.conf /var/lib/pgsql/data/pg_hba.conf.striker
vim /var/lib/pgsql/data/pg_hba.conf
diff -u /var/lib/pgsql/data/pg_hba.conf.striker /var/lib/pgsql/data/pg_hba.conf
</syntaxhighlight>
<syntaxhighlight lang="diff">
--- /var/lib/pgsql/data/pg_hba.conf.striker 2015-03-05 14:33:40.902733374 +0000
+++ /var/lib/pgsql/data/pg_hba.conf 2015-03-05 14:34:44.861733318 +0000
@@ -65,9 +65,13 @@
# TYPE  DATABASE    USER        CIDR-ADDRESS          METHOD
+# dashboards
+host    all        all        192.168.199.0/24      md5
+# node servers
+host    all        all        10.20.0.0/16          md5
# "local" is for Unix domain socket connections only
-local  all        all                              ident
+local  all        all                              md5
# IPv4 local connections:
host    all        all        127.0.0.1/32          ident
# IPv6 local connections:
</syntaxhighlight>
 
<syntaxhighlight lang="bash">
cp /var/lib/pgsql/data/postgresql.conf /var/lib/pgsql/data/postgresql.conf.striker
vim /var/lib/pgsql/data/postgresql.conf
diff -u /var/lib/pgsql/data/postgresql.conf.striker /var/lib/pgsql/data/postgresql.conf
</syntaxhighlight>
<syntaxhighlight lang="diff">
--- /var/lib/pgsql/data/postgresql.conf.striker 2015-03-05 14:35:35.388733307 +0000
+++ /var/lib/pgsql/data/postgresql.conf 2015-03-05 14:36:07.111733159 +0000
@@ -56,7 +56,7 @@
# - Connection Settings -
-#listen_addresses = 'localhost' # what IP address(es) to listen on;
+listen_addresses = '*' # what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to 'localhost', '*' = all
# (change requires restart)
</syntaxhighlight>
 
<syntaxhighlight lang="bash">
/etc/init.d/postgresql restart
</syntaxhighlight>
<syntaxhighlight lang="text">
Stopping postgresql service:                              [  OK  ]
Starting postgresql service:                              [  OK  ]
</syntaxhighlight>
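
With local connections now using <span class="code">md5</span>, a quick sanity check is to connect as <span class="code">striker</span>; it should prompt for the password set above and return a single row:

<syntaxhighlight lang="bash">
psql -U striker -d postgres -c "SELECT 1;"
</syntaxhighlight>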
 
== Striker Database Setup ==
 
Create DB:
 
<syntaxhighlight lang="bash">
su - postgres -c "createdb --owner striker scanner"
</syntaxhighlight>
<syntaxhighlight lang="text">
Password:
</syntaxhighlight>
 
The SQL files we need to load are found in the <span class="code">/etc/striker/SQL</span> directory.
 
The core SQL file is <span class="code"></span>
 
<syntaxhighlight lang="bash">
ls -lah /etc/striker/SQL/
</syntaxhighlight>
<syntaxhighlight lang="text">
total 64K
drwxr-xr-x. 2 root root 4.0K Mar  4 23:50 .
drwxr-xr-x. 5 root root 4.0K Mar  4 23:50 ..
-rw-r--r--. 1 root root  397 Mar  4 23:41 00_drop_db.sql
-rw-r--r--. 1 root root 2.5K Mar  4 23:41 01_create_node.sql
-rw-r--r--. 1 root root 3.2K Mar  4 23:41 02_create_alerts.sql
-rw-r--r--. 1 root root 1.9K Mar  4 23:41 03_create_alert_listeners.sql
-rw-r--r--. 1 root root 1.3K Mar  4 23:41 04_load_alert_listeners.sql
-rw-r--r--. 1 root root 3.2K Mar  4 23:41 05_create_random_agent.sql
-rw-r--r--. 1 root root 3.4K Mar  4 23:41 06a_create_snm_apc_pdu.sql
-rw-r--r--. 1 root root 3.6K Mar  4 23:41 06b_create_snmp_brocade_switch.sql
-rw-r--r--. 1 root root 3.4K Mar  4 23:41 06_create_snm_apc_ups.sql
-rw-r--r--. 1 root root 3.5K Mar  4 23:41 07_create_ipmi.sql
-rw-r--r--. 1 root root 5.9K Mar  4 23:41 08_create_raid.sql
-rw-r--r--. 1 root root 3.8K Mar  4 23:41 09_create_bonding.sql
-rw-r--r--. 1 root root 1.2K Mar  4 23:41 Makefile
</syntaxhighlight>
 
{{note|1=The default database owner name is <span class="code">striker</span>. If you used a different owner name, please update the <span class="code">.sql</span> files with the command <span class="code">sed -i 's/striker/yourname/' *.sql</span>.}}
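
For example, if your database owner were named <span class="code">admin</span> (a hypothetical name, for illustration only), the change would be:

<syntaxhighlight lang="bash">
cd /etc/striker/SQL
sed -i 's/striker/admin/' *.sql
</syntaxhighlight>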
 
Load the SQL tables into the database.
 
<syntaxhighlight lang="bash">
cat /etc/striker/SQL/*.sql > /tmp/all.sql
psql scanner -U striker -f /tmp/all.sql
</syntaxhighlight>
<syntaxhighlight lang="text">
Password for user striker:
</syntaxhighlight>
<syntaxhighlight lang="text">
<sql load messages>
</syntaxhighlight>
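
To confirm that the tables were created, you can list them with psql's <span class="code">\dt</span> meta-command:

<syntaxhighlight lang="bash">
psql -U striker -d scanner -c "\dt"
</syntaxhighlight>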
 
Test:
 
<syntaxhighlight lang="bash">
psql -U striker -d scanner -c "SELECT * FROM alert_listeners"
</syntaxhighlight>
<syntaxhighlight lang="text">
Password for user striker:
</syntaxhighlight>
<syntaxhighlight lang="text">
id |      name      |    mode      |  level  |  contact_info  | language | added_by |            updated           
----+----------------+---------------+---------+----------------+----------+----------+-------------------------------
  1 | screen        | Screen        | DEBUG  | screen        | en_CA    |        0 | 2014-12-11 14:42:13.273057-05
  2 | Tom Legrady    | Email        | DEBUG  | tom@striker.ca | en_CA    |        0 | 2014-12-11 16:54:25.477321-05
  3 | Health Monitor | HealthMonitor | WARNING |                | en_CA    |        0 | 2015-01-14 14:08:15-05
(3 rows)
</syntaxhighlight>
 
Done!
 
== Configure ScanCore on a Node ==
 
Install dependencies:
 
<syntaxhighlight lang="bash">
yum install Scanner postgresql perl-DBD-Pg mailx
</syntaxhighlight>
 
On the clients, you need to be sure your configuration files are set the way you want.
 
Most important is that the connection details for the databases on the dashboards are configured properly. Most installs have two dashboards, and Scanner will record its data to both for resiliency.
 
The configuration files are found in <span class="code">/etc/striker/Config/</span>.
 
<syntaxhighlight lang="bash">
ls -lah /etc/striker/Config/
</syntaxhighlight>
<syntaxhighlight lang="text">
total 68K
drwxr-xr-x. 2 root root 4.0K Mar  5 15:06 .
drwxr-xr-x. 5 root root 4.0K Mar  5 15:06 ..
-rw-r--r--. 1 root root  741 Mar  4 23:41 bonding.conf
-rw-r--r--. 1 root root 1.1K Mar  4 23:41 dashboard.conf
-rw-r--r--. 1 root root  379 Mar  4 23:41 db.conf
-rw-r--r--. 1 root root 5.1K Mar  4 23:41 ipmi.conf
-rw-r--r--. 1 root root  939 Mar  4 23:41 nodemonitor.conf
-rw-r--r--. 1 root root 1.2K Mar  4 23:41 raid.conf
-rw-r--r--. 1 root root  961 Mar  4 23:41 scanner.conf
-rw-r--r--. 1 root root 1.7K Mar  4 23:41 snmp_apc_pdu.conf
-rw-r--r--. 1 root root 8.9K Mar  4 23:41 snmp_apc_ups.conf
-rw-r--r--. 1 root root 4.7K Mar  4 23:41 snmp_brocade_switch.conf
-rw-r--r--. 1 root root 1.4K Mar  4 23:41 system_check.conf
</syntaxhighlight>
 
{{note|1=We're showing two databases here but, in theory, there is no set limit on the number of database servers that the nodes can use. Simply copy the configuration section for each additional server you wish to use, being sure to increment the id number for each section (i.e. <span class="code">db::X::name</span>, where <span class="code">X</span> is a unique integer for the additional server).}}
 
In this example, the two [[Striker]] dashboards with our databases have the [[BCN]] IPs <span class="code">10.20.4.1</span> and <span class="code">10.20.4.2</span>. Both use the database name <span class="code">scanner</span> owned by the database user <span class="code">striker</span> with the password <span class="code">secret</span>. So their configurations will be nearly identical.
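
Before editing the configuration, you can verify that both databases are reachable from the node with the credentials above (a minimal check; each command should prompt for the password and return a single row):

<syntaxhighlight lang="bash">
psql -h 10.20.4.1 -U striker -d scanner -c "SELECT 1;"
psql -h 10.20.4.2 -U striker -d scanner -c "SELECT 1;"
</syntaxhighlight>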
 
<syntaxhighlight lang="bash">
cp /etc/striker/Config/db.conf /etc/striker/Config/db.conf.original
vim /etc/striker/Config/db.conf
</syntaxhighlight>
<syntaxhighlight lang="text">
db::1::name      = scanner
db::1::db_type  = Pg
db::1::host      = 10.20.4.1
db::1::port      = 5432
db::1::user      = striker
db::1::password  = secret
 
db::2::name      = scanner
db::2::db_type  = Pg
db::2::host      = 10.20.4.2
db::2::port      = 5432
db::2::user      = striker
db::2::password  = secret
</syntaxhighlight>
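
As the note above describes, a third database server (at the hypothetical [[BCN]] IP <span class="code">10.20.4.3</span>, for illustration) would simply be appended as another section with the next id:

<syntaxhighlight lang="text">
db::3::name      = scanner
db::3::db_type   = Pg
db::3::host      = 10.20.4.3
db::3::port      = 5432
db::3::user      = striker
db::3::password  = secret
</syntaxhighlight>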
 
Now the node should be able to reach the databases. Let's test, though, to be sure. The nodes have IPMI, so we will test by manually calling the <span class="code">ipmi</span> agent.
 
<syntaxhighlight lang="bash">
/usr/share/striker/agents/ipmi --verbose --verbose
</syntaxhighlight>
<syntaxhighlight lang="text">
Program ipmi writing to DB '10.20.4.1'.
Program ipmi writing to DB '10.20.4.2'.
ipmi loop 1 at 01:16:08 ->  960.295 ms elapsed;  29039.705 ms pending.
 
----------------------------------------------------------------------
 
ipmi loop 2 at 01:16:38 -> 1005.016 ms elapsed;  28994.984 ms pending.
 
----------------------------------------------------------------------
</syntaxhighlight>
 
If all is well, it should record its values once every 30 seconds or so. Let it run a couple of loops, and then press <span class="code"><ctrl></span> + <span class="code">c</span> to stop the scan.
 
Now we can verify that the data was written to both dashboards' databases:
 
<syntaxhighlight lang="bash">
psql -h 10.20.4.1 -U striker scanner -c "SELECT * FROM ipmi_temperatures;"
</syntaxhighlight>
<syntaxhighlight lang="text">
id | node_id |      target      |      field      | value |  units  | status  |  message_tag  | message_arguments |          timestamp         
----+---------+------------------+-----------------+-------+-----------+---------+---------------+-------------------+-------------------------------
  1 |      1 | node1.alteeve.ca | Ambient        | 25    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.390891+00
  2 |      1 | node1.alteeve.ca | Systemboard 1  | 29    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.415248+00
  3 |      1 | node1.alteeve.ca | Systemboard 2  | 39    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.429477+00
  4 |      1 | node1.alteeve.ca | CPU1            | 35    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.4434+00
  5 |      1 | node1.alteeve.ca | CPU2            | 39    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.455114+00
  6 |      1 | node1.alteeve.ca | MEM A          | 32    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.466447+00
  7 |      1 | node1.alteeve.ca | MEM B          | 32    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.47765+00
  8 |      1 | node1.alteeve.ca | MEM C          | 35    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.489131+00
  9 |      1 | node1.alteeve.ca | MEM D          | 34    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.500622+00
10 |      1 | node1.alteeve.ca | MEM E          | 37    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.51189+00
11 |      1 | node1.alteeve.ca | MEM F          | 36    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.523267+00
12 |      1 | node1.alteeve.ca | MEM G          | 34    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.534761+00
13 |      1 | node1.alteeve.ca | MEM H          | 36    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.54614+00
14 |      1 | node1.alteeve.ca | PSU1 Inlet      | 29    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.557422+00
15 |      1 | node1.alteeve.ca | PSU2 Inlet      | 28    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.569362+00
16 |      1 | node1.alteeve.ca | PSU1            | 53    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.580696+00
17 |      1 | node1.alteeve.ca | PSU2            | 56    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.591993+00
18 |      1 | node1.alteeve.ca | BBU            | 30    | degrees C | OK      |              |                  | 2015-03-06 01:16:02.603261+00
19 |      1 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS  | Value crisis  | value=76          | 2015-03-06 01:16:02.614824+00
20 |      1 | node1.alteeve.ca | summary        | 1    |          | WARNING | Value warning | value=1          | 2015-03-06 01:16:02.64331+00
21 |      1 | node1.alteeve.ca | Ambient        | 25    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.400365+00
22 |      1 | node1.alteeve.ca | Systemboard 1  | 29    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.425598+00
23 |      1 | node1.alteeve.ca | Systemboard 2  | 39    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.439627+00
24 |      1 | node1.alteeve.ca | CPU1            | 35    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.453921+00
25 |      1 | node1.alteeve.ca | CPU2            | 39    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.468253+00
26 |      1 | node1.alteeve.ca | MEM A          | 32    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.482567+00
27 |      1 | node1.alteeve.ca | MEM B          | 32    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.496698+00
28 |      1 | node1.alteeve.ca | MEM C          | 35    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.508425+00
29 |      1 | node1.alteeve.ca | MEM D          | 34    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.522475+00
30 |      1 | node1.alteeve.ca | MEM E          | 37    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.536592+00
31 |      1 | node1.alteeve.ca | MEM F          | 36    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.548096+00
32 |      1 | node1.alteeve.ca | MEM G          | 34    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.559742+00
33 |      1 | node1.alteeve.ca | MEM H          | 36    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.573795+00
34 |      1 | node1.alteeve.ca | PSU1 Inlet      | 29    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.585372+00
35 |      1 | node1.alteeve.ca | PSU2 Inlet      | 28    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.599816+00
36 |      1 | node1.alteeve.ca | PSU1            | 53    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.613983+00
37 |      1 | node1.alteeve.ca | PSU2            | 56    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.628238+00
38 |      1 | node1.alteeve.ca | BBU            | 30    | degrees C | OK      |              |                  | 2015-03-06 01:16:32.642372+00
39 |      1 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS  | Value crisis  | value=76          | 2015-03-06 01:16:32.653909+00
40 |      1 | node1.alteeve.ca | summary        | 1    |          | WARNING | Value warning | value=1          | 2015-03-06 01:16:32.682502+00
(40 rows)
</syntaxhighlight>
 
We'll address the warnings in a moment. For now, this tells us that we are recording to dashboard 1 properly. Let's check dashboard 2:
 
<syntaxhighlight lang="bash">
psql -h 10.20.4.2 -U striker scanner -c "SELECT * FROM ipmi_temperatures;"
</syntaxhighlight>
<syntaxhighlight lang="text">
id | node_id |      target      |      field      | value |  units  | status  |  message_tag  | message_arguments |          timestamp         
----+---------+------------------+-----------------+-------+-----------+---------+---------------+-------------------+-------------------------------
  1 |      1 | node1.alteeve.ca | Ambient        | 25    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.689144+00
  2 |      1 | node1.alteeve.ca | Systemboard 1  | 29    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.708423+00
  3 |      1 | node1.alteeve.ca | Systemboard 2  | 39    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.722751+00
  4 |      1 | node1.alteeve.ca | CPU1            | 35    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.733944+00
  5 |      1 | node1.alteeve.ca | CPU2            | 39    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.74567+00
  6 |      1 | node1.alteeve.ca | MEM A          | 32    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.756925+00
  7 |      1 | node1.alteeve.ca | MEM B          | 32    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.768102+00
  8 |      1 | node1.alteeve.ca | MEM C          | 35    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.779549+00
  9 |      1 | node1.alteeve.ca | MEM D          | 34    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.791011+00
10 |      1 | node1.alteeve.ca | MEM E          | 37    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.802332+00
11 |      1 | node1.alteeve.ca | MEM F          | 36    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.813697+00
12 |      1 | node1.alteeve.ca | MEM G          | 34    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.825063+00
13 |      1 | node1.alteeve.ca | MEM H          | 36    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.836604+00
14 |      1 | node1.alteeve.ca | PSU1 Inlet      | 29    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.848219+00
15 |      1 | node1.alteeve.ca | PSU2 Inlet      | 28    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.859965+00
16 |      1 | node1.alteeve.ca | PSU1            | 53    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.870959+00
17 |      1 | node1.alteeve.ca | PSU2            | 56    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.88233+00
18 |      1 | node1.alteeve.ca | BBU            | 30    | degrees C | OK      |              |                  | 2015-03-06 01:16:08.893657+00
19 |      1 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS  | Value crisis  | value=76          | 2015-03-06 01:16:08.905299+00
20 |      1 | node1.alteeve.ca | summary        | 1    |          | WARNING | Value warning | value=1          | 2015-03-06 01:16:08.93407+00
21 |      1 | node1.alteeve.ca | Ambient        | 25    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.699395+00
22 |      1 | node1.alteeve.ca | Systemboard 1  | 29    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.718864+00
23 |      1 | node1.alteeve.ca | Systemboard 2  | 39    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.73341+00
24 |      1 | node1.alteeve.ca | CPU1            | 35    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.747455+00
25 |      1 | node1.alteeve.ca | CPU2            | 39    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.762113+00
26 |      1 | node1.alteeve.ca | MEM A          | 32    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.776163+00
27 |      1 | node1.alteeve.ca | MEM B          | 32    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.787508+00
28 |      1 | node1.alteeve.ca | MEM C          | 35    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.802058+00
29 |      1 | node1.alteeve.ca | MEM D          | 34    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.816296+00
30 |      1 | node1.alteeve.ca | MEM E          | 37    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.827444+00
31 |      1 | node1.alteeve.ca | MEM F          | 36    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.83877+00
32 |      1 | node1.alteeve.ca | MEM G          | 34    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.853383+00
33 |      1 | node1.alteeve.ca | MEM H          | 36    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.864927+00
34 |      1 | node1.alteeve.ca | PSU1 Inlet      | 29    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.879143+00
35 |      1 | node1.alteeve.ca | PSU2 Inlet      | 28    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.893541+00
36 |      1 | node1.alteeve.ca | PSU1            | 53    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.907655+00
37 |      1 | node1.alteeve.ca | PSU2            | 56    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.922028+00
38 |      1 | node1.alteeve.ca | BBU            | 30    | degrees C | OK      |              |                  | 2015-03-06 01:16:38.933201+00
39 |      1 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS  | Value crisis  | value=76          | 2015-03-06 01:16:38.947347+00
40 |      1 | node1.alteeve.ca | summary        | 1    |          | WARNING | Value warning | value=1          | 2015-03-06 01:16:38.976188+00
(40 rows)
</syntaxhighlight>
 
Excellent!
 
Now, note the <span class="code">RAID Controller</span> entries:
 
<syntaxhighlight lang="bash">
psql -h 10.20.4.1 -U striker scanner -c "SELECT * FROM ipmi_temperatures WHERE field='RAID Controller' ORDER BY timestamp ASC;"
</syntaxhighlight>
<syntaxhighlight lang="text">
id | node_id |      target      |      field      | value |  units  | status | message_tag  | message_arguments |          timestamp         
----+---------+------------------+-----------------+-------+-----------+--------+--------------+-------------------+-------------------------------
19 |      1 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS | Value crisis | value=76          | 2015-03-06 01:16:02.614824+00
39 |      1 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS | Value crisis | value=76          | 2015-03-06 01:16:32.653909+00
(2 rows)
</syntaxhighlight>
 
This tells us that the <span class="code">RAID Controller</span> is running at 76°C, which scanner thinks is dangerously hot. We know that, according to the manufacturer, the controller is rated for up to 95°C, so this is fine. To account for this, we'll update the <span class="code">/etc/striker/Config/ipmi.conf</span> file from:
 
<syntaxhighlight lang="text">
ipmi::RAID Controller::ok        = 60
ipmi::RAID Controller::warn      = 70
ipmi::RAID Controller::hysteresis =  1
ipmi::RAID Controller::units      = degrees C
</syntaxhighlight>
 
To:
 
<syntaxhighlight lang="text">
ipmi::RAID Controller::ok        = 80
ipmi::RAID Controller::warn      = 90
ipmi::RAID Controller::hysteresis =  1
ipmi::RAID Controller::units      = degrees C
</syntaxhighlight>
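
If you'd rather not edit the file by hand, a <span class="code">sed</span> pair like this makes the same change (a sketch, assuming the entries match the format shown above):

<syntaxhighlight lang="bash">
sed -i 's/^ipmi::RAID Controller::ok *=.*/ipmi::RAID Controller::ok         = 80/'   /etc/striker/Config/ipmi.conf
sed -i 's/^ipmi::RAID Controller::warn *=.*/ipmi::RAID Controller::warn       = 90/' /etc/striker/Config/ipmi.conf
</syntaxhighlight>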
 
Now, anything over 80°C will cause a warning and anything over 90°C will cause a critical alert. Let's test by running the <span class="code">ipmi</span> scan agent for one pass.
 
<syntaxhighlight lang="bash">
/usr/share/striker/agents/ipmi --verbose --verbose
</syntaxhighlight>
<syntaxhighlight lang="text">
Program ipmi writing to DB '10.20.4.1'.
Program ipmi writing to DB '10.20.4.2'.
ipmi loop 1 at 01:32:39 ->  937.201 ms elapsed;  29062.799 ms pending.
 
----------------------------------------------------------------------
 
^C
</syntaxhighlight>
 
Now let's look at the database again:
 
<syntaxhighlight lang="bash">
psql -h 10.20.4.1 -U striker scanner -c "SELECT * FROM ipmi_temperatures WHERE field='RAID Controller' ORDER BY timestamp ASC;"
</syntaxhighlight>
<syntaxhighlight lang="text">
id | node_id |      target      |      field      | value |  units  | status | message_tag  | message_arguments |          timestamp         
----+---------+------------------+-----------------+-------+-----------+--------+--------------+-------------------+-------------------------------
19 |      1 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS | Value crisis | value=76          | 2015-03-06 01:16:02.614824+00
39 |      1 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS | Value crisis | value=76          | 2015-03-06 01:16:32.653909+00
59 |      2 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS | Value crisis | value=76          | 2015-03-06 01:31:17.370253+00
79 |      3 | node1.alteeve.ca | RAID Controller | 76    | degrees C | OK    |              |                  | 2015-03-06 01:32:34.447756+00
(4 rows)
</syntaxhighlight>
 
Notice the last entry is '<span class="code">OK</span>' now? That tells us we're doing fine.
 
{{note|1=Be sure to update the configuration values on both nodes!}}
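
One way to keep the nodes in sync is to push the edited file to the peer (shown here from <span class="code">an-a05n01</span>; adjust the host name to suit):

<syntaxhighlight lang="bash">
rsync -av /etc/striker/Config/ipmi.conf root@an-a05n02:/etc/striker/Config/
</syntaxhighlight>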
 
Use cron to start scanner every five minutes (see the crontab entry under ''Enabling Scanner'' below). If another copy is already running, the new instance simply exits. If no other copy was running (due to an OS boot, a scanner crash, etc.), it will start up.
 
== Testing Automatic Shutdown ==
 
One of the features of Scanner is that it can safely shut down a node if it starts to get too hot, or if the UPSes have lost power and the batteries in the strongest UPS drop below a minimum hold-up time. To test this, you have two choices:
 
# Pull the power on the UPSes and watch their hold-up time. If all goes well, both nodes will power off when the minimum threshold is passed.
# Artificially set five or more thermal sensor limits low enough that normal temperatures trigger a shutdown.
 
{{warning|1=If you're testing option 2, '''do not''' configure scanner to run on boot or via cron! Your node will shut down within five minutes otherwise, requiring a boot to single-user mode to correct.}}
 
For time's sake, we'll drop the sensor limits.
 
First, we need to know what values would count as "too high", so let's see what our RAM and RAID controller are sitting at:
 
<syntaxhighlight lang="bash">
psql -h 10.20.4.1 -U striker scanner -c "SELECT * FROM ipmi_temperatures WHERE field='RAID Controller' or field LIKE 'MEM %' ORDER BY field ASC, timestamp ASC;"
</syntaxhighlight>
<syntaxhighlight lang="text">
id | node_id |      target      |      field      | value |  units  | status | message_tag  | message_arguments |          timestamp         
----+---------+------------------+-----------------+-------+-----------+--------+--------------+-------------------+-------------------------------
  6 |      1 | node1.alteeve.ca | MEM A          | 32    | degrees C | OK    |              |                  | 2015-03-06 01:16:02.466447+00
26 |      1 | node1.alteeve.ca | MEM A          | 32    | degrees C | OK    |              |                  | 2015-03-06 01:16:32.482567+00
46 |      2 | node1.alteeve.ca | MEM A          | 33    | degrees C | OK    |              |                  | 2015-03-06 01:31:17.222054+00
66 |      3 | node1.alteeve.ca | MEM A          | 33    | degrees C | OK    |              |                  | 2015-03-06 01:32:34.299146+00
  7 |      1 | node1.alteeve.ca | MEM B          | 32    | degrees C | OK    |              |                  | 2015-03-06 01:16:02.47765+00
27 |      1 | node1.alteeve.ca | MEM B          | 32    | degrees C | OK    |              |                  | 2015-03-06 01:16:32.496698+00
47 |      2 | node1.alteeve.ca | MEM B          | 33    | degrees C | OK    |              |                  | 2015-03-06 01:31:17.233122+00
67 |      3 | node1.alteeve.ca | MEM B          | 33    | degrees C | OK    |              |                  | 2015-03-06 01:32:34.310512+00
  8 |      1 | node1.alteeve.ca | MEM C          | 35    | degrees C | OK    |              |                  | 2015-03-06 01:16:02.489131+00
28 |      1 | node1.alteeve.ca | MEM C          | 35    | degrees C | OK    |              |                  | 2015-03-06 01:16:32.508425+00
48 |      2 | node1.alteeve.ca | MEM C          | 36    | degrees C | OK    |              |                  | 2015-03-06 01:31:17.244798+00
68 |      3 | node1.alteeve.ca | MEM C          | 36    | degrees C | OK    |              |                  | 2015-03-06 01:32:34.321981+00
  9 |      1 | node1.alteeve.ca | MEM D          | 34    | degrees C | OK    |              |                  | 2015-03-06 01:16:02.500622+00
29 |      1 | node1.alteeve.ca | MEM D          | 34    | degrees C | OK    |              |                  | 2015-03-06 01:16:32.522475+00
49 |      2 | node1.alteeve.ca | MEM D          | 35    | degrees C | OK    |              |                  | 2015-03-06 01:31:17.256127+00
69 |      3 | node1.alteeve.ca | MEM D          | 35    | degrees C | OK    |              |                  | 2015-03-06 01:32:34.333338+00
10 |      1 | node1.alteeve.ca | MEM E          | 37    | degrees C | OK    |              |                  | 2015-03-06 01:16:02.51189+00
30 |      1 | node1.alteeve.ca | MEM E          | 37    | degrees C | OK    |              |                  | 2015-03-06 01:16:32.536592+00
50 |      2 | node1.alteeve.ca | MEM E          | 38    | degrees C | OK    |              |                  | 2015-03-06 01:31:17.26758+00
70 |      3 | node1.alteeve.ca | MEM E          | 38    | degrees C | OK    |              |                  | 2015-03-06 01:32:34.34476+00
11 |      1 | node1.alteeve.ca | MEM F          | 36    | degrees C | OK    |              |                  | 2015-03-06 01:16:02.523267+00
31 |      1 | node1.alteeve.ca | MEM F          | 36    | degrees C | OK    |              |                  | 2015-03-06 01:16:32.548096+00
51 |      2 | node1.alteeve.ca | MEM F          | 36    | degrees C | OK    |              |                  | 2015-03-06 01:31:17.278884+00
71 |      3 | node1.alteeve.ca | MEM F          | 36    | degrees C | OK    |              |                  | 2015-03-06 01:32:34.356109+00
12 |      1 | node1.alteeve.ca | MEM G          | 34    | degrees C | OK    |              |                  | 2015-03-06 01:16:02.534761+00
32 |      1 | node1.alteeve.ca | MEM G          | 34    | degrees C | OK    |              |                  | 2015-03-06 01:16:32.559742+00
52 |      2 | node1.alteeve.ca | MEM G          | 35    | degrees C | OK    |              |                  | 2015-03-06 01:31:17.290446+00
72 |      3 | node1.alteeve.ca | MEM G          | 35    | degrees C | OK    |              |                  | 2015-03-06 01:32:34.367751+00
13 |      1 | node1.alteeve.ca | MEM H          | 36    | degrees C | OK    |              |                  | 2015-03-06 01:16:02.54614+00
33 |      1 | node1.alteeve.ca | MEM H          | 36    | degrees C | OK    |              |                  | 2015-03-06 01:16:32.573795+00
53 |      2 | node1.alteeve.ca | MEM H          | 37    | degrees C | OK    |              |                  | 2015-03-06 01:31:17.301801+00
73 |      3 | node1.alteeve.ca | MEM H          | 37    | degrees C | OK    |              |                  | 2015-03-06 01:32:34.378846+00
19 |      1 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS | Value crisis | value=76          | 2015-03-06 01:16:02.614824+00
39 |      1 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS | Value crisis | value=76          | 2015-03-06 01:16:32.653909+00
59 |      2 | node1.alteeve.ca | RAID Controller | 76    | degrees C | CRISIS | Value crisis | value=76          | 2015-03-06 01:31:17.370253+00
79 |      3 | node1.alteeve.ca | RAID Controller | 76    | degrees C | OK    |              |                  | 2015-03-06 01:32:34.447756+00
(36 rows)
</syntaxhighlight>
 
So the RAM is sitting around 35°C and the RAID controller around 76°C. To trigger a CRITICAL shutdown, we'll need five or more sensors in crisis. In this case, there are eight RAM modules, which is enough to trigger a shutdown, so we'll modify those.
 
To save time restoring after the test is done, let's copy our properly configured <span class="code">ipmi.conf</span> out of the way. We'll copy it back when the testing is done.
 
<syntaxhighlight lang="bash">
cp /etc/striker/Config/ipmi.conf /etc/striker/Config/ipmi.conf.good
</syntaxhighlight>
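
When testing is finished, restoring the good configuration is simply the reverse copy:

<syntaxhighlight lang="bash">
cp /etc/striker/Config/ipmi.conf.good /etc/striker/Config/ipmi.conf
</syntaxhighlight>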
 
Now we'll edit <span class="code">/etc/striker/Config/ipmi.conf</span>:
 
<syntaxhighlight lang="bash">
vim /etc/striker/Config/ipmi.conf
</syntaxhighlight>
 
The memory entries should look like this, normally:
 
<syntaxhighlight lang="text">
ipmi::MEM A::ok        = 45
ipmi::MEM A::warn      = 55
ipmi::MEM A::hysteresis =  1
ipmi::MEM A::units      = degrees C
</syntaxhighlight>
 
We'll change them all to:
 
<syntaxhighlight lang="text">
ipmi::MEM A::ok        = 20
ipmi::MEM A::warn      = 30
ipmi::MEM A::hysteresis =  1
ipmi::MEM A::units      = degrees C
</syntaxhighlight>
 
Once you've edited five or more values down, save the file.
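
If you'd rather script the edit, a loop like this lowers all eight memory sensors at once (a sketch, assuming the entries match the exact format shown above):

<syntaxhighlight lang="bash">
for dimm in A B C D E F G H
do
    sed -i "s/^ipmi::MEM ${dimm}::ok *=.*/ipmi::MEM ${dimm}::ok        = 20/"   /etc/striker/Config/ipmi.conf
    sed -i "s/^ipmi::MEM ${dimm}::warn *=.*/ipmi::MEM ${dimm}::warn      = 30/" /etc/striker/Config/ipmi.conf
done
</syntaxhighlight>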
 
Before we run the test, we need to tell Scanner how to shut down the [[Anvil!]]. In [[Striker]], there is a script called <span class="code">[https://github.com/digimer/striker/blob/master/tools/safe_anvil_shutdown safe_anvil_shutdown]</span> which can be found on the dashboards at <span class="code">/var/www/tools/safe_anvil_shutdown</span>. We need to copy this onto the nodes:
 
<syntaxhighlight lang="bash">
rsync -av /var/www/tools/safe_anvil_shutdown root@an-a05n01:/root/
rsync -av /var/www/tools/safe_anvil_shutdown root@an-a05n02:/root/
</syntaxhighlight>
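
Depending on how the file arrived, it may not be executable on the nodes. Making sure (an assumption; skip this if the permissions are already correct):

<syntaxhighlight lang="bash">
ssh root@an-a05n01 "chmod 755 /root/safe_anvil_shutdown"
ssh root@an-a05n02 "chmod 755 /root/safe_anvil_shutdown"
</syntaxhighlight>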
 
Now we need to configure Scanner to call it when a <span class="code">CRITICAL</span> state is reached. We do this by editing the <span class="code">scanner.conf</span> file.
 
<syntaxhighlight lang="bash">
vim /etc/striker/Config/scanner.conf
</syntaxhighlight>
 
There are two key entries to set:
 
<syntaxhighlight lang="text">
scanner::healthfile = /shared/status/.an-a05n01
scanner::shutdown  = /root/safe_anvil_shutdown
</syntaxhighlight>
 
The <span class="code">scanner::healthfile</span> '''MUST''' match the short host name of the node with a preceding '<span class="code">.</span>'. To determine the name to use, you can run:
 
<syntaxhighlight lang="bash">
clustat |grep Local |awk '{print $1}' | awk -F '.' '{print $1}'
</syntaxhighlight>
<syntaxhighlight lang="text">
an-a05n01
</syntaxhighlight>
 
If the cluster isn't running on the node, and provided you built the cluster using [[AN!Cluster_Tutorial_2#Node_Host_Names|proper host names]], you can get the name to use with this:
 
<syntaxhighlight lang="bash">
uname -n | awk -F '.' '{print $1}'
</syntaxhighlight>
<syntaxhighlight lang="text">
an-a05n01
</syntaxhighlight>
 
This is important because <span class="code">safe_anvil_shutdown</span> will look for the file <span class="code">/shared/status/.<peer's name></span>. If it finds that file, it will be able to determine the health of the peer. Assuming the peer is healthy, <span class="code">safe_anvil_shutdown</span> will assume the CRITICAL state is localized and so it will migrate servers to the peer before shutting down. However, if the peer is sick, it will gracefully shut down the servers before powering off.
 
So setting <span class="code">scanner::healthfile = /shared/status/.an-a05n01</span> allows our peer to check our state if we go critical, enabling this intelligence to work reliably.
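
A one-liner (a sketch, built from the <span class="code">uname</span> command above) that prints the exact line to use on whichever node it is run:

<syntaxhighlight lang="bash">
echo "scanner::healthfile = /shared/status/.$(uname -n | awk -F '.' '{print $1}')"
</syntaxhighlight>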
 
The second value is the program that Scanner will execute when it goes critical. This should always be <span class="code">/root/safe_anvil_shutdown</span> (or the path to the program, if you saved it elsewhere).
 
Save the changes and exit.
 
=== Testing one node going critical ===
 
For the first test, we're going to run a server on <span class="code">an-a05n01</span> and set its sensor limits low enough to trigger an immediate crisis. We'll leave the configuration on the second node as normal. This way, if all goes well, starting Scanner on the first node will cause the hosted server to be migrated, and then the node will withdraw from the cluster and shut down.
 
Edit <span class="code">an-a05n01</span>'s <span class="code">ipmi.conf</span> as discussed, start the cluster and run a test server on the node.
 
{{note|1=TODO: Show example output.}}
 
Start Scanner on <span class="code">an-a05n02</span> and verify that it wrote its status file and that we can read it from <span class="code">an-a05n01</span>.
 
On '''<span class="code">an-a05n02</span>''':
 
<syntaxhighlight lang="bash">
/usr/share/striker/bin/scanner
</syntaxhighlight>
<syntaxhighlight lang="text">
Replacing defective previous scanner: OLD_PROCESS_RECENT_CRASH
Starting /usr/share/striker/bin/scanner at Fri Mar  6 03:26:30 2015.
Program scanner reading from DB '10.20.4.1'.
scan 1425612390.18275 [bonding,ipmi,raid], [].
id na | 2015-03-06 03:26:30+0000: an-a05n02.alteeve.ca->scanner (22338); DEBUG: Old process crashed recently.; (0 : pidfile check)
</syntaxhighlight>
 
Wait a minute, and then check the status file:
 
From '''<span class="code">an-a05n01</span>''':
 
<syntaxhighlight lang="bash">
cat /shared/status/.an-a05n02
</syntaxhighlight>
<syntaxhighlight lang="text">
health = ok
</syntaxhighlight>
 
Make sure <span class="code">an-a05n01</span> is in the cluster and is hosting a server.
 
OK, before starting scanner on <span class="code">an-a05n01</span>, confirm the cluster state:
 
<syntaxhighlight lang="bash">
clustat
</syntaxhighlight>
<syntaxhighlight lang="text">
Cluster Status for an-anvil-05 @ Fri Mar  6 02:30:31 2015
Member Status: Quorate
 
Member Name                            ID  Status
------ ----                            ---- ------
an-a05n01.alteeve.ca                      1 Online, Local, rgmanager
an-a05n02.alteeve.ca                      2 Online, rgmanager
 
Service Name                  Owner (Last)                  State       
------- ----                  ----- ------                  -----       
service:libvirtd_n01          an-a05n01.alteeve.ca          started     
service:libvirtd_n02          an-a05n02.alteeve.ca          started     
service:storage_n01            an-a05n01.alteeve.ca          started     
service:storage_n02            an-a05n02.alteeve.ca          started     
vm:vm01-rhel6                  an-a05n01.alteeve.ca          started     
</syntaxhighlight>
 
OK, start scanner on node 1!
 
<syntaxhighlight lang="bash">
/usr/share/striker/bin/scanner
</syntaxhighlight>
<syntaxhighlight lang="text">
Previous scanner exited cleanly; taking over.
Starting /usr/share/striker/bin/scanner at Fri Mar  6 02:48:25 2015.
Program scanner reading from DB '10.20.4.1'.
scan 1425610105.36039 [bonding,ipmi,raid], [].
</syntaxhighlight>
 
There will be a delay while the first scan runs. This is normal; please be patient.
 
<syntaxhighlight lang="text">
Total crisis weight is 7.
id 60 | 2015-03-06 02:48:21.026042+00: an-a05n01->ipmi (8381); ( server node an-a05n01.alteeve.ca 10.20.10.1 ) CRISIS: value '37' in Crisis range; (0 : MEM C : 37 : degrees C)
id 58 | 2015-03-06 02:48:20.975518+00: an-a05n01->ipmi (8381); ( server node an-a05n01.alteeve.ca 10.20.10.1 ) CRISIS: value '34' in Crisis range; (0 : MEM A : 34 : degrees C)
id 64 | 2015-03-06 02:48:21.123492+00: an-a05n01->ipmi (8381); ( server node an-a05n01.alteeve.ca 10.20.10.1 ) CRISIS: value '35' in Crisis range; (0 : MEM G : 35 : degrees C)
id 62 | 2015-03-06 02:48:21.077744+00: an-a05n01->ipmi (8381); ( server node an-a05n01.alteeve.ca 10.20.10.1 ) CRISIS: value '39' in Crisis range; (0 : MEM E : 39 : degrees C)
id 63 | 2015-03-06 02:48:21.100352+00: an-a05n01->ipmi (8381); ( server node an-a05n01.alteeve.ca 10.20.10.1 ) CRISIS: value '37' in Crisis range; (0 : MEM F : 37 : degrees C)
id 59 | 2015-03-06 02:48:21.00336+00: an-a05n01->ipmi (8381); ( server node an-a05n01.alteeve.ca 10.20.10.1 ) CRISIS: value '34' in Crisis range; (0 : MEM B : 34 : degrees C)
id 61 | 2015-03-06 02:48:21.053743+00: an-a05n01->ipmi (8381); ( server node an-a05n01.alteeve.ca 10.20.10.1 ) CRISIS: value '36' in Crisis range; (0 : MEM D : 36 : degrees C)
</syntaxhighlight>
 
This shows the sensors in crisis and their cumulative weight. By default, the shutdown threshold is <span class="code">5</span>, and we got a score of <span class="code">7</span>. So, directly below this, we'll see the output from the <span class="code">safe_anvil_shutdown</span> run:
 
<syntaxhighlight lang="text">
Safe Anvil! Shutdown initiating!
- I am:    [an-a05n01.alteeve.ca]
- Peer is: [an-a05n02.alteeve.ca]
- Checking peer's health status:
- Peer was last listed as OK, I will migrate servers if possible.
- Storage replication state:
- Resource: [r0], Role (me/peer): [Primary/Primary], Disk state (me/peer): [UpToDate/UpToDate]
- Replicated storage is fully UpToDate, deep storage inspection not requires.
- Hosted server: [vm01-rhel6] is: [started]. The peer's storage is good, migration possible.
- Migrating: [vm01-rhel6] to: [an-a05n02.alteeve.ca]
- Output: [Trying to migrate vm:vm01-rhel6 to an-a05n02.alteeve.ca...Success]
- Migration successful!
- Done.
- Withdrawing from the cluster and shutting down:
] Output: [Stopping Cluster Service Manager:              [  OK  ]
- Output: [Stopping cluster: ]
] Output: [  Leaving fence domain...                      [  OK  ]
] Output: [  Stopping gfs_controld...                    [  OK  ]
] Output: [  Stopping dlm_controld...                    [  OK  ]
] Output: [  Stopping fenced...                          [  OK  ]
] Output: [  Stopping cman...                            [  OK  ]
] Output: [  Waiting for corosync to shutdown:            [  OK  ]
] Output: [  Unloading kernel modules...                  [  OK  ]
] Output: [  Unmounting configfs...                      [  OK  ]
 
Broadcast message from root@an-a05n01.alteeve.ca
(/dev/pts/0) at 2:50 ...
 
The system is going down for power off NOW!
I am dead.
Processing took a long time: 124.338072061539 seconds is more than expected loop rate of 30 seconds.
$self->shutdown is 1
At 2015-03-06_02:51:00 exiting run_timed_loop_forever() unknown reason
Connection to 10.20.10.1 closed by remote host.
Connection to 10.20.10.1 closed.
</syntaxhighlight>
 
BAM!
 
Now, let's cause the second node to go critical. This time, with node 1 gone, it should gracefully shut down the server.
 
We need to stop the scanner (<span class="code">ctrl</span> + <span class="code">c</span> the process), update the <span class="code">ipmi.conf</span> file as before, and then restart the scanner. Here, the lowered-threshold configuration was saved earlier as <span class="code">ipmi.conf.bad</span>, so it can simply be copied into place:
 
<syntaxhighlight lang="bash">
cp /etc/striker/Config/ipmi.conf.bad /etc/striker/Config/ipmi.conf
</syntaxhighlight>
<syntaxhighlight lang="text">
cp: overwrite `/etc/striker/Config/ipmi.conf'? y
</syntaxhighlight>
 
Now start it back up, and within a minute, it should stop the server, withdraw and power down.
 
<syntaxhighlight lang="bash">
/usr/share/striker/bin/scanner
</syntaxhighlight>
<syntaxhighlight lang="text">
Replacing defective previous scanner: OLD_PROCESS_RECENT_CRASH
Starting /usr/share/striker/bin/scanner at Fri Mar  6 04:00:49 2015.
Program scanner reading from DB '10.20.4.1'.
scan 1425614449.42107 [bonding,ipmi,raid], [].
id na | 2015-03-06 04:00:49+0000: an-a05n02.alteeve.ca->scanner (6296); DEBUG: Old process crashed recently.; (0 : pidfile check)
</syntaxhighlight>
 
{{note|1=The crash messages are because we closed the scanner with <span class="code">ctrl</span> + <span class="code">c</span> and are safe to ignore.}}
 
After a moment:
 
<syntaxhighlight lang="text">
Total crisis weight is 7.
id 109 | 2015-03-06 03:00:41.207857+00: an-a05n02->ipmi (6299); ( server node an-a05n02.alteeve.ca 10.20.10.2 ) CRISIS: value '34' in Crisis range; (0 : MEM C : 34 : degrees C)
id 107 | 2015-03-06 03:00:41.162384+00: an-a05n02->ipmi (6299); ( server node an-a05n02.alteeve.ca 10.20.10.2 ) CRISIS: value '33' in Crisis range; (0 : MEM A : 33 : degrees C)
id 113 | 2015-03-06 03:00:41.299164+00: an-a05n02->ipmi (6299); ( server node an-a05n02.alteeve.ca 10.20.10.2 ) CRISIS: value '34' in Crisis range; (0 : MEM G : 34 : degrees C)
id 111 | 2015-03-06 03:00:41.253448+00: an-a05n02->ipmi (6299); ( server node an-a05n02.alteeve.ca 10.20.10.2 ) CRISIS: value '36' in Crisis range; (0 : MEM E : 36 : degrees C)
id 112 | 2015-03-06 03:00:41.276414+00: an-a05n02->ipmi (6299); ( server node an-a05n02.alteeve.ca 10.20.10.2 ) CRISIS: value '35' in Crisis range; (0 : MEM F : 35 : degrees C)
id 108 | 2015-03-06 03:00:41.184866+00: an-a05n02->ipmi (6299); ( server node an-a05n02.alteeve.ca 10.20.10.2 ) CRISIS: value '33' in Crisis range; (0 : MEM B : 33 : degrees C)
id 110 | 2015-03-06 03:00:41.230574+00: an-a05n02->ipmi (6299); ( server node an-a05n02.alteeve.ca 10.20.10.2 ) CRISIS: value '34' in Crisis range; (0 : MEM D : 34 : degrees C)
Safe Anvil! Shutdown initiating!
- I am:    [an-a05n02.alteeve.ca]
- Peer is: [an-a05n01.alteeve.ca]
- Checking peer's health status:
- Peer is in 'Critical' state! It will very likely shut down shortly, too.
  I will shut down my servers.
- Shutting down: [vm01-rhel6]
- Output: [Local machine disabling vm:vm01-rhel6...Success]
- Shutdown successful!
- Done.
- Withdrawing from the cluster and shutting down:
</syntaxhighlight>
 
Fantastic!
 
=== Testing node recovery ===
 
After the nodes go critical and shut down, the dashboards will start watching the environmental and UPS sensors. Once they determine that things are safe, they will power the nodes back on. With both nodes off from our artificial over-temp tests, we'll now test their recovery by running ScanCore on one of the dashboards.
 
 
 
 
 
== Enabling Scanner ==

To enable Scanner, add a crontab entry that starts it every five minutes. As noted earlier, if another copy is already running, the new instance simply exits, so the cron job also serves to restart Scanner after an OS boot or a scanner crash.

<syntaxhighlight lang="text">
# Crontab
*/5 * * * * /usr/share/striker/bin/scanner
</syntaxhighlight>
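
One way to add that entry non-interactively (a sketch; editing with <span class="code">crontab -e</span> works just as well):

<syntaxhighlight lang="bash">
(crontab -l 2>/dev/null; echo "*/5 * * * * /usr/share/striker/bin/scanner") | crontab -
</syntaxhighlight>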
 
 
 
 
 
 
Test:
 
<syntaxhighlight lang="bash">
/usr/share/striker/agents/ipmi --verbose --verbose
</syntaxhighlight>
<syntaxhighlight lang="text">
ipmi loop 1 at 1421444884.53996 2378.437:27621.563 mSec.
^C
</syntaxhighlight>
 
Yay!
 
 
 
 
 
 
 