Ok, this is getting strange... I've done the following:
Started with a 100% clean OpenNMS (used the CentOS 1.1 install script) and Minion. No nodes or
anything. Only the Minion user was added.
- Minion-Heartbeat loss IS detected when Minion is stopped
- Minion-Heartbeat is detected when Minion is started
Edited Provisioning Requisitions
- Added ICMP, SSH
- Minion-Heartbeat loss IS detected when Minion is stopped
- Minion-Heartbeat stays DOWN even though Minion is started and sending
heartbeats (seen in log:tail). "Last Updated" keeps increasing (it shows the current time)
- Restarted OpenNMS. Minion-Heartbeat is detected and UP again
- Stopped Minion again. Heartbeat DOWN detected
- Started Minion. Again Minion-Heartbeat stays DOWN.
- Restarted OpenNMS. Minion-Heartbeat DOWN, but ICMP poll is working
2017-06-28 16:20:45,397 | INFO | pool-35-thread-4 | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | waiting for ping to /127.0.0.1 to finish
2017-06-28 16:20:45,401 | INFO | llback-Processor | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | got response for address /127.0.0.1, thread 24506, seq 1 with a responseTime 1.408ms
2017-06-28 16:20:45,401 | INFO | pool-35-thread-4 | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | finished waiting for ping to /127.0.0.1 to finish
- Restarted OpenNMS server. Minion-Heartbeat DOWN, but ICMP, SSH poll is
working
- Added SNMP to the Minion. And it's detected even though Minion-Heartbeat
is DOWN...
2017-06-28 16:40:25,583 | INFO | pool-40-thread-5 | BasicDetector | 271 - org.opennms.opennms-provision-api - 20.0.0 | isServiceDetected: Checking address: 192.168.20.31 for SSH capability on port 22
2017-06-28 16:40:25,635 | INFO | pool-40-thread-5 | BasicDetector | 271 - org.opennms.opennms-provision-api - 20.0.0 | isServiceDetected: Attempting to connect to address: 192.168.20.31, port: 22, attempt: #0
2017-06-28 16:40:29,224 | WARN | Timer-17 | Snmp4JStrategy | 221 - org.opennms.core.snmp.implementations.snmp4j - 20.0.0 | processResponse: Timeout. Agent: SnmpAgentConfig[Address: 192.168.20.31, ProxyForAddress: 127.0.0.1, Port: 161, Timeout: 1800, Retries: 1, MaxVarsPerPdu: 10, MaxRepetitions: 2, MaxRequestSize: 65535, Version: v2c, ReadCommunity: XXXXXXXX, WriteCommunity: XXXXXXXX], requestID=532867420
2017-06-28 16:40:35,646 | INFO | Timer-2 | HeartbeatProducer | 240 - org.opennms.features.minion.heartbeat.producer - 20.0.0 | Sending heartbeat to Minion with id: 674f089e-5bfe-11e7-ac25-000c29b8695a at location: pfffff-who-cares
2017-06-28 16:40:46,953 | INFO | pool-35-thread-8 | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | waiting for ping to /127.0.0.1 to finish
2017-06-28 16:40:46,957 | INFO | llback-Processor | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | got response for address /127.0.0.1, thread 24506, seq 1 with a responseTime 2.007ms
2017-06-28 16:40:46,958 | INFO | pool-35-thread-8 | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | finished waiting for ping to /127.0.0.1 to finish
So OpenNMS thinks the Minion-Heartbeat is DOWN, but it's communicating with
the Minion and detecting service outage...
2017-06-28 17:25:58,389 | WARN | Timer-51 | CamelRpcServerProcessor | 209 - org.opennms.core.ipc.rpc.camel-impl - 20.0.0 | An error occured while executing a call in SNMP.
java.util.concurrent.CompletionException: org.opennms.netmgt.snmp.SnmpAgentTimeoutException: Timeout retrieving 'SnmpCollectors for 127.0.0.1' for 127.0.0.1.
    at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)[:1.8.0_112]
    at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)[:1.8.0_112]
    at java.util.concurrent.CompletableFuture.biApply(CompletableFuture.java:1095)[:1.8.0_112]
    at java.util.concurrent.CompletableFuture$BiApply.tryFire(CompletableFuture.java:1070)[:1.8.0_112]
    at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)[:1.8.0_112]
    at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)[:1.8.0_112]
    at org.opennms.netmgt.snmp.proxy.common.SnmpProxyRpcModule$4.complete(SnmpProxyRpcModule.java:143)[223:org.opennms.core.snmp.proxy.rpc-impl:20.0.0]
    at org.opennms.netmgt.snmp.SnmpWalker.finish(SnmpWalker.java:177)[220:org.opennms.core.snmp.api:20.0.0]
    at org.opennms.netmgt.snmp.SnmpWalker.processError(SnmpWalker.java:162)[220:org.opennms.core.snmp.api:20.0.0]
    at org.opennms.netmgt.snmp.SnmpWalker.handleTimeout(SnmpWalker.java:152)[220:org.opennms.core.snmp.api:20.0.0]
    at org.opennms.netmgt.snmp.snmp4j.Snmp4JWalker.access$1300(Snmp4JWalker.java:48)[221:org.opennms.core.snmp.implementations.snmp4j:20.0.0]
    at org.opennms.netmgt.snmp.snmp4j.Snmp4JWalker$Snmp4JResponseListener.onResponse(Snmp4JWalker.java:174)[221:org.opennms.core.snmp.implementations.snmp4j:20.0.0]
    at org.snmp4j.Snmp$PendingRequest.run(Snmp.java:1878)[221:org.opennms.core.snmp.implementations.snmp4j:20.0.0]
    at java.util.TimerThread.mainLoop(Timer.java:555)[:1.8.0_112]
    at java.util.TimerThread.run(Timer.java:505)[:1.8.0_112]
Caused by: org.opennms.netmgt.snmp.SnmpAgentTimeoutException: Timeout retrieving 'SnmpCollectors for 127.0.0.1' for 127.0.0.1.
    ... 6 more
2017-06-28 17:25:58,536 | WARN | Timer-52 | Snmp4JStrategy | 221 - org.opennms.core.snmp.implementations.snmp4j - 20.0.0 | processResponse: Timeout. Agent: SnmpAgentConfig[Address: 127.0.0.1, ProxyForAddress: null, Port: 161, Timeout: 1800, Retries: 1, MaxVarsPerPdu: 10, MaxRepetitions: 2, MaxRequestSize: 65535, Version: v2c, ReadCommunity: XXXXXXXX, WriteCommunity: XXXXXXXX], requestID=959575986
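To check from the Minion log whether heartbeats are really going out, something like the rough Python sketch below could pull the HeartbeatProducer entries out of the log:tail output. It assumes every entry sits on one line with six pipe-separated fields, as in the lines pasted above. The names parse_log_line and heartbeat_times are made up for this example and are not part of OpenNMS.

```python
from datetime import datetime


def parse_log_line(line):
    """Split one pipe-delimited log line into its six fields.

    Returns None for lines that do not match the assumed layout
    (for example stack trace lines).
    """
    parts = [p.strip() for p in line.split("|", 5)]
    if len(parts) != 6:
        return None
    try:
        stamp = datetime.strptime(parts[0], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None
    return {
        "time": stamp,
        "level": parts[1],
        "thread": parts[2],
        "logger": parts[3],
        "bundle": parts[4],
        "message": parts[5],
    }


def heartbeat_times(lines):
    """Return the timestamps of all HeartbeatProducer entries, in log order."""
    entries = (parse_log_line(line) for line in lines)
    return [e["time"] for e in entries if e and e["logger"] == "HeartbeatProducer"]


# Two lines copied from the log output in this thread.
SAMPLE = [
    "2017-06-28 16:40:35,646 | INFO | Timer-2 | HeartbeatProducer | 240 - org.opennms.features.minion.heartbeat.producer - 20.0.0 | Sending heartbeat to Minion with id: 674f089e-5bfe-11e7-ac25-000c29b8695a at location: pfffff-who-cares",
    "2017-06-28 16:40:46,953 | INFO | pool-35-thread-8 | SinglePingResponseCallback | 264 - org.opennms.opennms-icmp-api - 20.0.0 | waiting for ping to /127.0.0.1 to finish",
]

for stamp in heartbeat_times(SAMPLE):
    print(stamp.isoformat())
```

With a longer capture, the differences between consecutive timestamps should come out close to the 30 second heartbeat period if the Minion is sending as expected.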
Post by Marc Doesburg
Hi Jesse,
Thank you for your reply and help.
I've checked the "Last Updated" and it's currently set to Jun 26, 11:16:21
AM. So that is not being updated. This is to be expected since the Minion
is still turned off. But the heart-beat is still at 100% available. Other
services as well, but you explained why that is, and that explanation is
very good.
Node: *Testlab Netfirst Minion* (ID 6, foreign source Minions, foreign ID ac715340-54d4-11e7-9544-000c29ae9b15, location TestLabTest)
SNMP Attributes
Name: mdominiontest.lab.netfirst.nl
sysObjectID: .1.3.6.1.4.1.8072.3.2.10
Location: %%%
Contact: %%%
Description: Linux mdominiontest.lab.netfirst.nl 3.10.0-514.21.1.el7.x86_64 #1 SMP Thu May 25 17:04:51 UTC 2017 x86_64

Availability
Availability (last 24 hours): 100.000%
127.0.0.1: 100.000%
  ICMP: 100.000%
  Minion-Heartbeat: 100.000%
  SNMP: 100.000%
  SSH: 100.000%
192.168.20.20: 100.000%
  ICMP: 100.000%
  SSH: 100.000%

Minion-Heartbeat service on 127.0.0.1
General
Node: Testlab Netfirst Minion
Interface: 127.0.0.1
Polling Status: Managed
Polling Package: example1
Monitor Class: org.opennms.netmgt.poller.monitors.MinionHeartbeatMonitor
Service Parameters
period: 30000
Overall Availability
127.0.0.1: 100.000%
Minion-Heartbeat: 100.000%
Application Memberships
This service is not a member of any applications
Recent Events
687 | 6/19/17 13:39:08 | Warning | The Minion-Heartbeat service has been discovered on interface 127.0.0.1.
Recent Outages
There have been no outages on this service in the last 24 hours.

Event 2295
Severity: Warning
Node: Testlab Netfirst Minion
Event Source Location: Default (00000000-0000-0000-0000-000000000000)
Node Location: TestLabTest
Time: Jun 27, 2017 10:29:34 AM
Interface:
Service:
UEI: uei.opennms.org/internal/provisiond/nodeScanAborted
Log Message
The Node with Id: 6; ForeignSource: Minions; ForeignId:ac715340-54d4-11e7-9544-000c29ae9b15
has aborted for the following reason: Aborting node scan : Agent failed
org.apache.camel.ExchangeTimedOutException: The OUT message was not
Camel-ID-opennms01-lab-netfirst-nl-45036-1497871823767-0-183145 not
received on destination: temp-queue://ID:opennms01.lab.netfirst.nl-46021-1497871808006-4:5:1.
[Body is not logged]]
Description
A message from the Provisiond NodeScan lifecycle that a NodeScan has
The Node with Id: 6; ForeignSource: Minions; ForeignId:ac715340-54d4-11e7-9544-000c29ae9b15
has aborted for the following reason: Aborting node scan : Agent failed
org.apache.camel.ExchangeTimedOutException: The OUT message was not
Camel-ID-opennms01-lab-netfirst-nl-45036-1497871823767-0-183145 not
received on destination: temp-queue://ID:opennms01.lab.netfirst.nl-46021-1497871808006-4:5:1.
[Body is not logged]]
Operator Instructions
No instructions available.
I hope you can help me further with this information.
--
Marc D.
Real is just a matter of perception
Post by Jesse White
Hi Mark,
The Minion-Heartbeat service should go DOWN on the (automatically
provisioned) Minion node after the Minion has been offline for a few
minutes. If not, this warrants further investigation.
The Minions send a heartbeat message every 30 seconds, and the time-stamp
of the last heartbeat gets recorded in the database. You can view this
time-stamp in the "Last Updated" column on the "Admin -> Manage Minions"
page. This time-stamp should not continue to increase when a Minion is offline.
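As a rough illustration of the idea (not the actual MinionHeartbeatMonitor code), a staleness check on the "Last Updated" time-stamp could look like the Python sketch below. The period of 30000 ms matches the heartbeat interval; the factor of 2 and the name heartbeat_is_up are assumptions made up for this example.

```python
from datetime import datetime, timedelta

# Assumption for illustration only: the heartbeat counts as missed once the
# last update is older than two heartbeat periods.
STALE_FACTOR = 2


def heartbeat_is_up(last_updated, now, period_ms=30000):
    """Return True if the last recorded heartbeat is recent enough to count as UP."""
    max_age = timedelta(milliseconds=period_ms * STALE_FACTOR)
    return now - last_updated <= max_age


now = datetime(2017, 6, 28, 16, 41, 0)
# Heartbeat seen 25 seconds ago.
print(heartbeat_is_up(datetime(2017, 6, 28, 16, 40, 35), now))
# Last heartbeat two days ago, as with a Minion that was switched off.
print(heartbeat_is_up(datetime(2017, 6, 26, 11, 16, 21), now))
```

The first call prints True and the second prints False. With a check along these lines, a "Last Updated" value that stopped moving days ago should show up as a DOWN heartbeat.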
As for the services at a location being reported as UP when the Minion is
offline, the current behavior is to keep the existing state (UP or DOWN)
unless we can actively confirm otherwise. Since we are unable to execute
the monitor (before the TTL expires), due to the Minion being offline, the
service maintains the state of the last check.
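In other words, the rule is roughly the following. This is a minimal Python sketch of the behavior described above, not the poller's real code; the name next_status and the use of None for "the poll could not be executed" are assumptions for illustration.

```python
from typing import Optional


def next_status(previous: str, poll_result: Optional[str]) -> str:
    """Decide the service state after a poll attempt.

    poll_result is "UP" or "DOWN" when the monitor actually ran, and None
    when it could not be executed, e.g. the request to the Minion expired
    before any answer came back.
    """
    if poll_result is None:
        # Nothing could be confirmed either way, so keep the last known state.
        return previous
    return poll_result


print(next_status("UP", None))    # Minion offline, the service stays UP
print(next_status("UP", "DOWN"))  # monitor ran and reported a failure
```

This is why services at a location keep showing UP while the Minion is offline: no poll completes, so nothing ever overrides the last recorded state.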
Hope this helps.
Best,
Jesse
Hello All,
I'm still playing around with the Minions. But I noticed that when I
turned off a Minion "server", the Minion node stays online without any
alarms. The Minion-Heartbeat and all detected services stay at 100%
availability. But when I disable an individual service (e.g. SNMP) it does
get detected and set to down.
I need to double check the following, but I also think that nodes behind
the Minion stay online too, even though they can't be reached. I need to
double check that, because I've been playing with outage-paths as well.
Is there something that needs to be configured? Or is there a log that I
could check to see why the Minion-Heartbeat stays at 100% even though the
Minion is turned off?
Best regards,
--
Marc D.
Real is just a matter of perception
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Please read the OpenNMS Mailing List FAQ:http://www.opennms.org/index.php/Mailing_List_FAQ
opennms-discuss mailing list
To *unsubscribe* or change your subscription options, see the bottom of this page:https://lists.sourceforge.net/lists/listinfo/opennms-discuss
--
Marc D.
Real is just a matter of perception