Discussion:
[opennms-discuss] Minion turned off. No Alarm and 100% Availability.
Marc Doesburg
2017-06-26 12:27:05 UTC
Hello All,

I'm still playing around with the Minions. But I noticed that when I turned
off a Minion "server", the Minion node stays online without any alarms. The
Minion-Heartbeat and all detected services stay at 100% availability. But
when I disable an individual service (e.g. SNMP) it does get detected and
set to down.

I need to double check the following, but I also think that nodes behind
the Minion stay online too, even though they can't be reached. I need to
double check that, because I've been playing with outage-paths as well.

Is there something that needs to be configured? Or is there a log that I
could check to see why the Minion-Heartbeat stays at 100% even though the
Minion is turned off?

Best regards,
--
Marc D.

Real is just a matter of perception
Jesse White
2017-06-27 18:41:28 UTC
Hi Marc,

The Minion-Heartbeat service should go DOWN on the (automatically provisioned) Minion node after the Minion has been
offline for a few minutes. If not, this warrants further investigation.

The Minions send a heartbeat message every 30 seconds, and the time-stamp of the last heartbeat gets recorded in the
database. You can view this time-stamp in the "Last Updated" column on the "Admin -> Manage Minions" page. This
time-stamp should not continue to increase when a Minion is offline.
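
The heartbeat check described above can be sketched roughly as follows. This is only an illustration, not the actual MinionHeartbeatMonitor code: the function name and the tolerance of two missed heartbeats are assumptions, and only the 30-second interval comes from this thread.

```python
from datetime import datetime, timedelta

# Heartbeat interval mentioned in this thread: 30 seconds (period 30000 ms).
PERIOD_MS = 30_000
# Assumption for illustration only: tolerate two missed heartbeats.
ALLOWED_MISSED = 2


def heartbeat_status(last_updated, now, period_ms=PERIOD_MS,
                     allowed_missed=ALLOWED_MISSED):
    """Return 'UP' if the last heartbeat is recent enough, else 'DOWN'."""
    max_age = timedelta(milliseconds=period_ms * allowed_missed)
    return "UP" if now - last_updated <= max_age else "DOWN"


# Hypothetical timestamps, not taken from the thread.
now = datetime(2017, 6, 28, 12, 0, 0)
print(heartbeat_status(now - timedelta(seconds=20), now))  # UP
print(heartbeat_status(now - timedelta(hours=1), now))     # DOWN
```

With a check of this shape, a Minion whose "Last Updated" value stopped advancing hours ago would be expected to show as DOWN, which is why a heartbeat that stays at 100% warrants investigation.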

As for the services at a location being reported as UP when the Minion is offline, the current behavior is to keep the
existing state (UP or DOWN) unless we can actively confirm otherwise. Since we are unable to execute the monitor (before
the TTL expires), due to the Minion being offline, the service maintains the state of the last check.
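
As a minimal sketch of that "keep the last known state" rule, assuming a poll that could not be executed is represented as None (the function and value names are hypothetical, not OpenNMS APIs):

```python
from typing import Optional


def next_state(previous: str, poll_result: Optional[str]) -> str:
    """Return the new service state.

    poll_result is 'UP' or 'DOWN' when the monitor actually ran, and
    None when it could not be executed (for example, the request to
    the Minion timed out before the TTL expired).
    """
    if poll_result is None:
        # Nothing was confirmed, so the previous state is kept.
        return previous
    return poll_result


print(next_state("UP", None))    # UP   (Minion offline, state retained)
print(next_state("UP", "DOWN"))  # DOWN (outage actively confirmed)
```

The trade-off is that an unreachable Minion produces no false outages for the nodes behind it, but also hides real ones until the Minion is back.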

Hope this helps.

Best,
Jesse
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
http://www.opennms.org/index.php/Mailing_List_FAQ
opennms-discuss mailing list
https://lists.sourceforge.net/lists/listinfo/opennms-discuss
Marc Doesburg
2017-06-28 08:06:08 UTC
Hi Jesse,

Thank you for your reply and help.

I've checked the "Last Updated" and it's currently set to Jun 26, 11:16:21
AM. So that is not being updated. This is to be expected since the Minion
is still turned off. But the heart-beat is still at 100% available. Other
services as well, but you explained why that is, and that explanation is
very good.

Node: *Testlab Netfirst Minion* 6 Minions
ac715340-54d4-11e7-9544-000c29ae9b15 TestLabTest

- View Events
<http://opennms01.lab.netfirst.nl:8980/opennms/event/list?filter=node%3D6>

- View Alarms
<http://opennms01.lab.netfirst.nl:8980/opennms/alarm/list.htm?filter=node%3D6>

- View Outages
<http://opennms01.lab.netfirst.nl:8980/opennms/outage/list.htm?filter=node%3D6>

- Asset Info
<http://opennms01.lab.netfirst.nl:8980/opennms/asset/modify.jsp?node=6>
- Hardware Info
<http://opennms01.lab.netfirst.nl:8980/opennms/hardware/list.jsp?node=6>
- Availability
<http://opennms01.lab.netfirst.nl:8980/opennms/element/availability.jsp?node=6>

- SSH
- Resource Graphs
<http://opennms01.lab.netfirst.nl:8980/opennms/graph/chooseresource.jsp?node=6&reports=all>

- Rescan
<http://opennms01.lab.netfirst.nl:8980/opennms/element/rescan.jsp?node=6>

- Admin
<http://opennms01.lab.netfirst.nl:8980/opennms/admin/nodemanagement/index.jsp?node=6>

- Update SNMP
<http://opennms01.lab.netfirst.nl:8980/opennms/admin/updateSnmp.jsp?node=6&ipaddr=127.0.0.1>

- Schedule Outage
<http://opennms01.lab.netfirst.nl:8980/opennms/admin/sched-outages/editoutage.jsp?newName=Testlab+Netfirst+Minion&addNew=true&nodeID=6>

- View in Topology
<http://opennms01.lab.netfirst.nl:8980/opennms/topology?provider=Enhanced+Linkd&szl=1&focus-vertices=6>

SNMP Attributes
Name mdominiontest.lab.netfirst.nl
sysObjectID .1.3.6.1.4.1.8072.3.2.10
Location %%%
Contact %%%
Description Linux mdominiontest.lab.netfirst.nl 3.10.0-514.21.1.el7.x86_64
#1 SMP Thu May 25 17:04:51 UTC 2017 x86_64
Availability
Availability (last 24 hours) 100.000%
127.0.0.1
<http://opennms01.lab.netfirst.nl:8980/opennms/element/interface.jsp?node=6&intf=127.0.0.1>
100.000%
ICMP
<http://opennms01.lab.netfirst.nl:8980/opennms/element/service.jsp?node=6&intf=127.0.0.1&service=4>
100.000%
Minion-Heartbeat
<http://opennms01.lab.netfirst.nl:8980/opennms/element/service.jsp?node=6&intf=127.0.0.1&service=15>
100.000%
SNMP
<http://opennms01.lab.netfirst.nl:8980/opennms/element/service.jsp?node=6&intf=127.0.0.1&service=1>
100.000%
SSH
<http://opennms01.lab.netfirst.nl:8980/opennms/element/service.jsp?node=6&intf=127.0.0.1&service=2>
100.000%
192.168.20.20
<http://opennms01.lab.netfirst.nl:8980/opennms/element/interface.jsp?node=6&intf=192.168.20.20>
100.000%
ICMP
<http://opennms01.lab.netfirst.nl:8980/opennms/element/service.jsp?node=6&intf=192.168.20.20&service=4>
100.000%
SSH
<http://opennms01.lab.netfirst.nl:8980/opennms/element/service.jsp?node=6&intf=192.168.20.20&service=2>
100.000%

Minion-Heartbeat service on 127.0.0.1

- View Events
<http://opennms01.lab.netfirst.nl:8980/opennms/event/list.htm?filter=node%3D6&filter=interface%3D127.0.0.1&filter=service%3D15>

- Delete
<http://opennms01.lab.netfirst.nl:8980/opennms/admin/deleteService>

General
Node Testlab Netfirst Minion
<http://opennms01.lab.netfirst.nl:8980/opennms/element/node.jsp?node=6>
Interface 127.0.0.1
<http://opennms01.lab.netfirst.nl:8980/opennms/element/interface.jsp?ipinterfaceid=219>
Polling Status Managed
Polling Package example1
Monitor Class org.opennms.netmgt.poller.monitors.MinionHeartbeatMonitor
Service Parameters
period 30000
Overall Availability
127.0.0.1 100.000%
Minion-Heartbeat 100.000%
Application Memberships (Edit
<http://opennms01.lab.netfirst.nl:8980/opennms/admin/applications.htm?edit&ifserviceid=220>
)
This service is not a member of any applications
Recent Events
<http://opennms01.lab.netfirst.nl:8980/opennms/event/list.htm?filter=node%3D6&filter=interface%3D127.0.0.1&filter=service%3D15>
687 <http://opennms01.lab.netfirst.nl:8980/opennms/event/detail.jsp?id=687>
6/19/17 13:39:08 Warning The Minion-Heartbeat service has been discovered
on interface 127.0.0.1.
More...
<http://opennms01.lab.netfirst.nl:8980/opennms/event/list.htm?filter=node%3D6&filter=interface%3D127.0.0.1&filter=service%3D15>
Recent Outages
<http://opennms01.lab.netfirst.nl:8980/opennms/outage/list.htm?filter=service%3D15>
There have been no outages on this service in the last 24 hours.

The OpenNMS server did generate a Warning for the Minion on the 27th:

Event 2295
Severity: Warning
Node: Testlab Netfirst Minion
<http://opennms01.lab.netfirst.nl:8980/opennms/element/node.jsp?node=6>
Event Source Location: Default (00000000-0000-0000-0000-000000000000)
Node Location: TestLabTest
Time: Jun 27, 2017 10:29:34 AM
Interface:
Service:
UEI: uei.opennms.org/internal/provisiond/nodeScanAborted
Log Message

The Node with Id: 6; ForeignSource: Minions;
ForeignId:ac715340-54d4-11e7-9544-000c29ae9b15 has aborted for the
following reason: Aborting node scan : Agent failed while scanning the
system table: org.opennms.core.rpc.api.RequestTimedOutException:
org.apache.camel.ExchangeTimedOutException: The OUT message was not
received within: 20000 millis due reply message with correlationID:
Camel-ID-opennms01-lab-netfirst-nl-45036-1497871823767-0-183145 not
received on destination:
temp-queue://ID:opennms01.lab.netfirst.nl-46021-1497871808006-4:5:1.
Exchange[ID-opennms01-lab-netfirst-nl-45036-1497871823767-0-183144][Message:
[Body is not logged]]
Description
A message from the Provisiond NodeScan lifecycle that a NodeScan has
Aborted:

The Node with Id: 6; ForeignSource: Minions;
ForeignId:ac715340-54d4-11e7-9544-000c29ae9b15 has aborted for the
following reason: Aborting node scan : Agent failed while scanning the
system table: org.opennms.core.rpc.api.RequestTimedOutException:
org.apache.camel.ExchangeTimedOutException: The OUT message was not
received within: 20000 millis due reply message with correlationID:
Camel-ID-opennms01-lab-netfirst-nl-45036-1497871823767-0-183145 not
received on destination:
temp-queue://ID:opennms01.lab.netfirst.nl-46021-1497871808006-4:5:1.
Exchange[ID-opennms01-lab-netfirst-nl-45036-1497871823767-0-183144][Message:
[Body is not logged]]
Operator Instructions
No instructions available.
I hope you can help me further with this information.
--
Marc D.

Real is just a matter of perception
Marc Doesburg
2017-06-28 15:35:18 UTC
Ok this is getting strange... I've done the following:

100% clean OpenNMS (Used CentOS 1.1 install script) and Minion. No nodes or
anything. Only Minion user added
- Minion-Heartbeat loss IS detected when Minion is stopped
- Minion-Heartbeat is detected when Minion is started
Edited Provisioning Requisitions
- Added ICMP, SSH
- Minion-Heartbeat loss IS detected when Minion is stopped
- Minion-Heartbeat stays DOWN even though Minion is started and sending
heartbeat (log:tail). Last Updated increases (Is current time)
- Restarted OpenNMS. Minion-Heartbeat is detected and UP again
- Stopped Minion again. Heartbeat DOWN detected
- Started Minion. Again Minion-Heartbeat stays DOWN.
- Restarted OpenNMS. Minion-Heartbeat DOWN, but ICMP poll is working
2017-06-28 16:20:45,397 | INFO | pool-35-thread-4 | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | waiting for ping to /127.0.0.1 to finish
2017-06-28 16:20:45,401 | INFO | llback-Processor | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | got response for address /127.0.0.1, thread 24506, seq 1 with a responseTime 1.408ms
2017-06-28 16:20:45,401 | INFO | pool-35-thread-4 | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | finished waiting for ping to /127.0.0.1 to finish

- Restarted OpenNMS server. Minion-Heartbeat DOWN, but ICMP, SSH poll is
working
- Added SNMP to the Minion. And it's detected even though Minion-Heartbeat
is DOWN...

2017-06-28 16:40:25,583 | INFO | pool-40-thread-5 | BasicDetector | 271 - org.opennms.opennms-provision-api - 20.0.0 | isServiceDetected: Checking address: 192.168.20.31 for SSH capability on port 22
2017-06-28 16:40:25,635 | INFO | pool-40-thread-5 | BasicDetector | 271 - org.opennms.opennms-provision-api - 20.0.0 | isServiceDetected: Attempting to connect to address: 192.168.20.31, port: 22, attempt: #0
2017-06-28 16:40:29,224 | WARN | Timer-17 | Snmp4JStrategy | 221 - org.opennms.core.snmp.implementations.snmp4j - 20.0.0 | processResponse: Timeout. Agent: SnmpAgentConfig[Address: 192.168.20.31, ProxyForAddress: 127.0.0.1, Port: 161, Timeout: 1800, Retries: 1, MaxVarsPerPdu: 10, MaxRepetitions: 2, MaxRequestSize: 65535, Version: v2c, ReadCommunity: XXXXXXXX, WriteCommunity: XXXXXXXX], requestID=532867420
2017-06-28 16:40:35,646 | INFO | Timer-2 | HeartbeatProducer | 240 - org.opennms.features.minion.heartbeat.producer - 20.0.0 | Sending heartbeat to Minion with id: 674f089e-5bfe-11e7-ac25-000c29b8695a at location: pfffff-who-cares
2017-06-28 16:40:46,953 | INFO | pool-35-thread-8 | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | waiting for ping to /127.0.0.1 to finish
2017-06-28 16:40:46,957 | INFO | llback-Processor | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | got response for address /127.0.0.1, thread 24506, seq 1 with a responseTime 2.007ms
2017-06-28 16:40:46,958 | INFO | pool-35-thread-8 | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | finished waiting for ping to /127.0.0.1 to finish

So OpenNMS thinks the Minion-Heartbeat is DOWN, but it's communicating with
the Minion and detecting service outage...
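
To confirm from the Minion log that heartbeats are really being sent, something like the following sketch could pull out the HeartbeatProducer entries. It assumes the pipe-separated layout visible in the excerpts in this thread (timestamp, level, thread, logger, bundle, message) and copes with entries wrapped over several lines, as happens when logs are pasted into mail. The function names are hypothetical.

```python
import re

# A new entry starts with a timestamp such as "2017-06-28 16:40:35,646 |".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}\s+\|")
FIELDS = ("timestamp", "level", "thread", "logger", "bundle", "message")


def join_wrapped(lines):
    """Merge wrapped continuation lines into one string per log entry."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def parse_entry(entry):
    """Split one entry into its six fields, or return None if malformed."""
    parts = [p.strip() for p in entry.split("|", 5)]
    return dict(zip(FIELDS, parts)) if len(parts) == len(FIELDS) else None


def heartbeat_times(lines):
    """Return the timestamps of all HeartbeatProducer entries."""
    parsed = (parse_entry(e) for e in join_wrapped(lines))
    return [p["timestamp"] for p in parsed
            if p and p["logger"] == "HeartbeatProducer"]


# Sample lines copied from the log excerpts in this thread.
SAMPLE = """\
2017-06-28 16:40:35,646 | INFO | Timer-2 | HeartbeatProducer
| 240 - org.opennms.features.minion.heartbeat.producer - 20.0.0
| Sending heartbeat to Minion with id: 674f089e-5bfe-11e7-ac25-000c29b8695a
at location: pfffff-who-cares
2017-06-28 16:40:46,953 | INFO | pool-35-thread-8 |
SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api -
20.0.0 | waiting for ping to /127.0.0.1 to finish
""".splitlines()

print(heartbeat_times(SAMPLE))  # ['2017-06-28 16:40:35,646']
```

Comparing these send times with the "Last Updated" value on the OpenNMS side would show whether the heartbeats arrive but are not reflected in the service state.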

2017-06-28 17:25:58,389 | WARN | Timer-51 | CamelRpcServerProcessor | 209 - org.opennms.core.ipc.rpc.camel-impl - 20.0.0 | An error occured while executing a call in SNMP.
java.util.concurrent.CompletionException: org.opennms.netmgt.snmp.SnmpAgentTimeoutException: Timeout retrieving 'SnmpCollectors for 127.0.0.1' for 127.0.0.1.
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)[:1.8.0_112]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)[:1.8.0_112]
    at java.util.concurrent.CompletableFuture.biApply(CompletableFuture.java:1095)[:1.8.0_112]
    at java.util.concurrent.CompletableFuture$BiApply.tryFire(CompletableFuture.java:1070)[:1.8.0_112]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)[:1.8.0_112]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)[:1.8.0_112]
    at org.opennms.netmgt.snmp.proxy.common.SnmpProxyRpcModule$4.complete(SnmpProxyRpcModule.java:143)[223:org.opennms.core.snmp.proxy.rpc-impl:20.0.0]
    at org.opennms.netmgt.snmp.SnmpWalker.finish(SnmpWalker.java:177)[220:org.opennms.core.snmp.api:20.0.0]
    at org.opennms.netmgt.snmp.SnmpWalker.processError(SnmpWalker.java:162)[220:org.opennms.core.snmp.api:20.0.0]
    at org.opennms.netmgt.snmp.SnmpWalker.handleTimeout(SnmpWalker.java:152)[220:org.opennms.core.snmp.api:20.0.0]
    at org.opennms.netmgt.snmp.snmp4j.Snmp4JWalker.access$1300(Snmp4JWalker.java:48)[221:org.opennms.core.snmp.implementations.snmp4j:20.0.0]
    at org.opennms.netmgt.snmp.snmp4j.Snmp4JWalker$Snmp4JResponseListener.onResponse(Snmp4JWalker.java:174)[221:org.opennms.core.snmp.implementations.snmp4j:20.0.0]
    at org.snmp4j.Snmp$PendingRequest.run(Snmp.java:1878)[221:org.opennms.core.snmp.implementations.snmp4j:20.0.0]
    at java.util.TimerThread.mainLoop(Timer.java:555)[:1.8.0_112]
    at java.util.TimerThread.run(Timer.java:505)[:1.8.0_112]
Caused by: org.opennms.netmgt.snmp.SnmpAgentTimeoutException: Timeout retrieving 'SnmpCollectors for 127.0.0.1' for 127.0.0.1.
    ... 6 more
2017-06-28 17:25:58,536 | WARN | Timer-52 | Snmp4JStrategy | 221 - org.opennms.core.snmp.implementations.snmp4j - 20.0.0 | processResponse: Timeout. Agent: SnmpAgentConfig[Address: 127.0.0.1, ProxyForAddress: null, Port: 161, Timeout: 1800, Retries: 1, MaxVarsPerPdu: 10, MaxRepetitions: 2, MaxRequestSize: 65535, Version: v2c, ReadCommunity: XXXXXXXX, WriteCommunity: XXXXXXXX], requestID=959575986

[image: Inline image 1]

[image: Inline image 2]

[image: Inline image 3]
--
Marc D.

Real is just a matter of perception