Monitor Alarm Events
Two event workflows, the Alarms card workflow and the Info card workflow, provide a view into the events occurring in the network. The Alarms card workflow tracks critical severity events, whereas the Info card workflow tracks all warning, info, and debug severity events.
To focus on events from a single device perspective, refer to Monitor Switches. To monitor informational alarms, refer to Monitor Informational Events.
Monitor All Alarms
The Alarms card workflow enables you to easily view and track critical severity alarms occurring anywhere in your network.
Alarms Card Workflow Summary
The small Alarms card displays:
- total number of alarms
- distribution of alarms
- performance indicator
<insert image>
The medium Alarms card displays:
- total number of alarms
- total number, distribution, and trend of alarms triggered by network protocols and services
- total number, distribution, and trend of alarms triggered by interfaces
- total number, distribution, and trend of alarms triggered by trace
- total number, distribution, and trend of alarms triggered by system
<insert image>
The large Alarms card contains two tabs.
- Network Services tab which displays:
  - total number, distribution, and trend of alarms triggered by the BGP service
  - total number, distribution, and trend of alarms triggered by the CLAG service
  - total number, distribution, and trend of alarms triggered by the EVPN service
  - total number, distribution, and trend of alarms triggered by the LLDP service
  - total number, distribution, and trend of alarms triggered by OSPF? LNV? VLAN, VXLAN, NTP, Sensors, MTU, PTM, licenses?
  - alarms by most recent
  - devices by most alarms
- System, Trace, and Interfaces tab which displays: ( no wires )
  - total number, distribution, and trend of alarms triggered by links
  - total number, distribution, and trend of alarms triggered by ports
  - total number, distribution, and trend of alarms triggered by ???
  - interface alarms by most recent
  - devices by most interface alarms
  - total number, distribution, and trend of traces with warnings
  - total number, distribution, and trend of failed traces
  - trace alarms by most recent
  - devices by most trace alarms
  - total number, distribution, and trend of alarms triggered by the NTP service
  - total number, distribution, and trend of alarms triggered by NetQ Agents
  - total number, distribution, and trend of alarms triggered by invalid licenses
  - total number, distribution, and trend of alarms triggered by device sensors
  - system alarms by most recent
<insert images>
The full screen Alarms card provides tabs for all events and all devices.
<insert image>
View Alarm Status Summary
A summary of the critical alarms in the network includes the number of alarms, a trend indicator, a performance indicator, and a distribution of those alarms. The trend indicator is based on the count of alarms that have occurred compared to the count in the last two time periods:
- Upward facing arrow: alarm count is higher than the last two time periods, an increasing trend
- Downward facing arrow: alarm count is lower than the last two time periods, a decreasing trend
- No arrow: alarm count is unchanged, a steady trend
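The trend rules above can be sketched as a small function. This is only an illustration of the comparison logic, not NetQ code: the function name, the argument shapes, and the handling of mixed cases (for example, a count that is above one prior period but below the other) are assumptions, with mixed cases treated as steady.

```python
def trend_indicator(current_count, previous_counts):
    """Classify the alarm trend against the last two time periods.

    current_count: alarm count for the current period.
    previous_counts: the counts for the two preceding periods.
    Assumption: anything that is neither strictly higher nor strictly
    lower than both prior periods is reported as "steady" (no arrow).
    """
    if all(current_count > count for count in previous_counts):
        return "up"      # upward facing arrow, increasing trend
    if all(current_count < count for count in previous_counts):
        return "down"    # downward facing arrow, decreasing trend
    return "steady"      # no arrow


print(trend_indicator(12, (8, 10)))
print(trend_indicator(3, (8, 10)))
print(trend_indicator(8, (8, 8)))
```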
The performance indicator is based on a set of pre-defined thresholds, where:
- Low: alarm count is less than x
- Med: alarm count is between x and y
- High: alarm count is more than y
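As a rough sketch of how such thresholds partition the alarm count, the function below takes x and y as parameters, because the actual pre-defined values are not given here. The function name is hypothetical, and treating a count exactly equal to x or y as Med is an assumption.

```python
def performance_indicator(alarm_count, x, y):
    """Map an alarm count to Low, Med, or High.

    x and y stand in for the pre-defined thresholds, whose real values
    are not stated in this document.
    Assumption: counts equal to x or y fall in the Med band.
    """
    if alarm_count < x:
        return "Low"
    if alarm_count > y:
        return "High"
    return "Med"


# Illustrative threshold values only; they are not NetQ defaults.
print(performance_indicator(5, x=10, y=50))
print(performance_indicator(51, x=10, y=50))
```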
To view the summary, open the small Alarms card.
View the Distribution of Alarms
It is helpful to know where and when alarms are occurring in your network. The Alarms card workflow enables you to see the distribution of alarms based on their source: network services, interfaces, traces, or other system services. You can also view the trend of alarms in each source category.
To view the alarm distribution, open the medium Alarms card. Scroll down to view all of the charts.
Monitor Network Services Alarms
The Alarms card workflow enables users to easily view and track critical severity alarms triggered by network services.
View All Network Services Alarms
You can view only the alarms associated with network services using the Alarms card workflow. Network services alarms are broken into the following categories: BGP, OSPF , EVPN, LNV , CLAG, and LLDP. You can sort alarms based on their occurrence or view devices with the most network services alarms.
To view network services alarms, open the large Alarms card. Network Services alarms are shown by default.
From this card, you can view the distribution of alarms for each of the categories over time. Scroll down to view any hidden charts. A list of the associated individual network services alarms is also displayed.
View Devices with the Most Network Services Alarms
By default, the list of alarms for all network services is displayed when viewing the large card. You can filter instead for the devices that have the most network services alarms.
To view devices with the most alarms, open the large Alarms card, and then select Devices by Most Issues from the dropdown.
From this card, you can:
- Hover over an individual chart to filter the list on the right to focus on only those devices associated with that category. Click on a chart to persist the table changes; the category is highlighted and a checkmark is shown next to its title, while other category charts are faded.
- Click and drag the vertical lines left and right on the charts to narrow the time period even further.
- Change the time period for the data to compare with a prior time. If the same devices are consistently indicating the most alarms, you might want to look more carefully at those devices using the Switches card workflow.
- Click Show All Events to investigate all events with Network Services alarms in the full screen card.
View All Alarms for a Specific Network Service
You can view the alarms for a given network service instead of alarms for all services.
To view all alarms for a given service:
- Open the large Alarms card.
- Hover over an individual service graph. This causes the alarm list to be filtered by the service type.
- Optionally, click the checkbox next to a given service to retain the filtered list.
You can select more than one service by clicking the checkbox next to multiple services.
View Devices with the Most Alarms for a Specific Network Service
You can view devices that have the most alarms associated with a given network service instead of alarms for all services.
To view devices with the most alarms for a given service:
- Open the large Alarms card.
- Select Devices by Most Issues from the dropdown.
- Hover over an individual service graph. This causes the device list to be filtered by the service type.
- Optionally, click the checkbox next to a given service to retain the filtered list.
You can select more than one service by clicking the checkbox next to multiple services.
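If alarm records are exported for offline analysis, the device ranking and per-service filtering described in this section can be approximated as follows. This is a minimal sketch under stated assumptions: the record layout (dictionaries with `hostname` and `type` keys) and the function name are hypothetical and do not describe a NetQ export format.

```python
from collections import Counter


def devices_by_most_alarms(alarms, services=None, limit=5):
    """Rank devices by alarm count, optionally restricted to some services.

    alarms: iterable of dicts with hypothetical "hostname" and "type" keys.
    services: optional set of service types (for example {"bgp", "clag"});
              None means all services are counted.
    Returns a list of (hostname, count) pairs, highest count first.
    """
    counts = Counter(
        alarm["hostname"]
        for alarm in alarms
        if services is None or alarm["type"] in services
    )
    return counts.most_common(limit)


# Made-up sample records for illustration.
sample = [
    {"hostname": "leaf01", "type": "bgp"},
    {"hostname": "leaf01", "type": "clag"},
    {"hostname": "leaf01", "type": "bgp"},
    {"hostname": "spine02", "type": "bgp"},
    {"hostname": "spine02", "type": "evpn"},
    {"hostname": "leaf03", "type": "lldp"},
]

print(devices_by_most_alarms(sample))
print(devices_by_most_alarms(sample, services={"bgp"}))
```

Passing more than one service in `services` mirrors selecting multiple checkboxes in the card.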
Monitor Interface Alarms
The Alarms card workflow enables users to easily view and track critical severity alarms triggered by interfaces.
No wires to work from.
View All Interface Alarms
View Interfaces with the Most Link Alarms
View Interfaces with the Most Port Alarms
Monitor Trace Alarms
The Alarms card workflow enables users to easily view and track critical severity alarms triggered by trace.
No wires to work from.
View All Trace Alarms
View Traces with the Most Warnings
View Traces with the Most Failures
Monitor System Alarms
The Alarms card workflow enables users to easily view and track critical severity alarms triggered by system services.
No wires to work from.
View All System Alarms
View NTP Service Alarms
View Devices with the Most NTP Service Alarms
View NetQ Agent Alarms
View Devices with the Most NetQ Agent Alarms
View License Alarms
View Device Sensor Alarms
View Devices with the Most Sensor Alarms
Alarms Reference
The following table lists alarm messages organized by message type, by default. Click the column header to sort the list by that characteristic. Click in any column header to toggle the sort order between A-Z and Z-A. Recommended actions suggest NetQ CLI commands and Cumulus Linux NCLU commands for further investigation.
The messages can be viewed in syslog or through third-party notification applications. For details about configuring notifications using the GUI, refer to Notification Management. For details about configuring notifications using the NetQ CLI, refer to the Deployment Guide, Configure Optional NetQ Capabilities.
| Type | Trigger | Severity | Message Format | Example |
|---|---|---|---|---|
| agent | NetQ Agent on device has not been heard from in over 15 seconds | Critical | Rotten Agent | Rotten Agent |
| bgp | BGP session with remote peer failed to establish due to reasons such as link down or peer not enabled | Critical | BGP session with peer @peerhost (@peer vrf @vrf) failed, reason: @reason | BGP session with peer spine02 (swp3 vrf default) failed, reason: link down |
| bgp | BGP session state changed from established to failed | Critical | BGP session with peer @peer @peerhost @neighbor vrf @vrf session state changed from established to failed | BGP session with peer swp3 leaf12 leaf13 vrf mgmt session state changed from established to failed |
| bgp | Address Family Identifiers/Subsequent AFIs enabled on remote peer but not local peer | Critical | BGP session with peer @peerhost @peer: AFI/SAFI @families not activated on node | BGP session with peer server3 swp6: AFI/SAFI EVPN not activated on node |
| bgp | Address Family Identifiers/Subsequent AFIs enabled on local peer but not on remote peer | Critical | BGP session with peer @peerhost @peer: AFI/SAFI @families not activated on peer | BGP session with peer leaf27 swp2: AFI/SAFI ipv6 not activated on peer |
| bgp | Address Family Identifiers/Subsequent AFIs not enabled on either the local or remote peers | Critical | BGP session with peer @peerhost @peer: AFI/SAFI @families not activated on session | BGP session with peer spine02 swp9: AFI/SAFI ipv4 not activated on session |
| bgp | Router id conflict detected between two hosts | Critical | Router id @router_id conflict detected between @sess_peer and @router | Router id 13467 conflict detected between router3 and router5 |
| cable | Local port is missing physical connector | Critical | Port cage empty on @ifname, peer @peer @peer_if | Port cage empty on swp16, peer leaf17 swp15 |
| cable | Peer port is missing physical connector | Critical | Peer port cage empty on @ifname, peer @peer @peer_if | Peer port cage empty on @ifname, peer @peer @peer_if |
| cable | Administrative state of remote peer does not match the state of local peer | Critical | @ifname admin state @state, mismatched with peer @peer @peer_if state @peer_state | swp3 admin state up, mismatched with peer spine04 swp2 state down |
| cable | Interface operational state on the two ends of the link is not the same | Critical | @ifname oper state @state, mismatched with peer @peer @peer_if state @peer_state | swp5 oper state up, mismatched with peer leaf11 swp29 state down |
| cable | Link speed is not the same on both ends of the link | Critical | @ifname speed @speed, mismatched with peer @peer @peer_if speed @peer_speed | swp2 speed 10, mismatched with peer server02 speed 40 |
| cable | Auto-negotiation setting on remote peer does not match setting on local peer | Critical | @ifname autoneg @autoneg, mismatched with peer @peer @peer_if autoneg @peer_autoneg | swp12 autoneg on, mismatched with peer spine01 swp04 autoneg off |
| cable | Link is flapping | Critical | @ifname @msg | swp8 Link flapped 6 times in last 5 mins |
| clag | CLAG backup IP address on local peer is not also an address on the remote peer of this CLAG session | Critical | Backup IP @ip does not belong to peer @peer | Backup IP 192.168.33 does not belong to peer leaf13 |
| clag | CLAG sysmac of current session is a duplicate across multiple nodes | Critical | Duplicate sysmac with @node_name | Duplicate sysmac with leaf01 |
| clag | MSTP (multiple spanning tree protocol) is not running | Critical | MSTP not running | MSTP not running |
| clag | Spanning Tree bridge ID is not the same on the local CLAG node and its remote peer | Critical | Bridge ID mismatch with peer | Bridge ID mismatch with peer |
| clag | Connectivity with CLAG peer failed | Critical | Session connectivity with peer failed | Session connectivity with peer failed |
| clag | CLAG peerlink is not part of Spanning Tree | Critical | Peerlink @peerlink not in MSTP | Peerlink swp4 not in MSTP |
| clag | CLAG peerlink is not a bridge member port | Critical | Peerlink @peerlink not in bridge | Peerlink swp2 not in bridge |
| clag | CLAG bond is in Conflicted state | Critical | Bond @bond in Conflicted state due to @reason | Bond 4 in Conflicted state due to peerlink down; Bond 3 in Conflicted state due to lacp partner mac mismatch |
| clag | CLAG bond is in protodown state | Critical | Bond @bond in protodown state due to @reason | Bond 2 in protodown state due to startup-delay; Bond 6 in protodown state due to isl-down |
| clag | MSTP daemon, mstpd, and the CLAG daemon, clagd, have different views of a bond's dual-connected state | Critical | Bond @bond dual-connected state mismatched with MSTP | Bond 7 dual-connected state mismatched with MSTP |
| clag | A CLAG bond on each switch of the CLAG pair has inconsistent maximum transmission unit (MTU) | Critical | Dually connected bond @bond MTU mismatch with peer @peer:@peer_if | Dually connected bond 3 MTU mismatch with peer leaf12:swp45 |
| clag | A CLAG bond on each switch of the CLAG pair has mismatched port VLAN ID (PVID) | Critical | Dually connected bond @bond PVID mismatch with peer @peer:@peer_if | Dually connected bond 5 PVID mismatch with peer spine04:swp2 |
| clag | A CLAG bond on each switch of the CLAG pair has mismatched VLAN membership | Critical | Dually connected bond @bond VLANs mismatch with peer @peer:@peer_if | Dually connected bond 23 VLANs mismatch with peer leaf02:swp4 |
| clag | A VXLAN interface has mismatched VXLAN ID (VNID) on each switch of the CLAG pair | Critical | VXLAN @vxlan_if VNI @vni mismatched with peer @peer:@peer_if | VXLAN xx VNI 12 mismatched with peer TOR-13:swp6 |
| clag | VXLAN anycast gateway IP address is mismatched between the two switches of a CLAG pair | Critical | VXLAN anycast address mismatched on peer @peer | VXLAN anycast address mismatched on peer leaf31 |
| clag | Local CLAG node role changed from primary to secondary or vice versa | Critical | Role changed from @old_role to @new_role | Role changed from primary to secondary |
| clag | CLAG remote peer role changed from secondary to primary or vice versa | Critical | Peer role changed from @old_role to @new_role | Peer role changed from secondary to primary |
| clag | CLAG remote peer state changed from up to down | Critical | Peer state changed to down | Peer state changed from up to down |
| configdiff | Configuration file deleted on a device | Critical | @hostname config file @type was deleted | spine03 config file /etc/frr/frr.conf was deleted |
| evpn | Advertise-All-VNI flag disabled | Critical | VNI @vni advertise-all-vni flag not enabled | VNI 3 advertise-all-vni flag not enabled |
| evpn | VTEP missing from replication list | Critical | VNI @vni VTEP @ip not in replication list | VNI 13 VTEP 192.168.22 not in replication list |
| evpn | Same MAC address appears on multiple hosts | Critical | Duplicate Mac @mac VLAN @vlan at @h1: @lk1 and @h2: @lk2 | Duplicate Mac A0:00:00:00:00:32 VLAN 13 at leaf02:swp3 and leaf04: swp24 |
| evpn | A VLAN MAC address is not the same for two remote destinations | Critical | Mac @mac VLAN @vlan remote dest @vtep1 inconsistent with @vtep2 | Mac A0:00:00:00:00:11 VLAN 4 remote dest 10.0.0.11 inconsistent with 10.0.0.8 |
| evpn | Requested VNI is not detected in the Cumulus Linux kernel | Critical | VNI @vni not in kernel | VNI 11 not in kernel |
| evpn | VTEP's IP address is either not reachable or a duplicate across multiple VTEPs | Critical | VTEP @vtep: @alert | VTEP 10.0.0.4: No route to VTEP; VTEP 10.0.0.4: IP claimed by more than 2 nodes {leaf04, leaf11, spine02, spine04}; VTEP 10.0.0.4: IP claimed by 2 unconnected VTEPs {10.0.0.4, 10.0.0.7} |
| evpn | A remote destination is unknown | Critical | Mac @mac VLAN @vlan unknown remote dest @vtep | Mac A0:00:00:00:00:33 VLAN 4 unknown remote dest 10.0.0.12 |
| license | License state is missing or invalid | Critical | License check failed, name @lic_name state @state | License check failed, name agent.lic state invalid |
| lnv | VXLAN service node daemon, vxsnd, is not running | Critical | vxsnd service not running | vxsnd service not running |
| lnv | vxsnd peer membership is inconsistent among two or more peers in a cluster | Critical | vxsnd peer membership inconsistent | vxsnd peer membership inconsistent |
| lnv | VNI database is inconsistent among peers in VXLAN service node cluster | Critical | vxsnd vni database inconsistent | vxsnd vni database inconsistent |
| lnv | VXLAN replication mode is inconsistent among peers in VXLAN service node cluster | Critical | vxsnd replication mode @mode inconsistent | vxsnd replication mode HER inconsistent; vxsnd replication mode SVC inconsistent |
| lnv | VXLAN registration daemon, vxrd, is not running | Critical | vxrd service not running | vxrd service not running |
| lnv | VXLAN registration daemon is configured to point to a VXLAN service node daemon IP address that is unknown | Critical | vxrd points to unknown vxsnd @snd_ip | vxrd points to unknown vxsnd 192.168.54 |
| lnv | VXLAN registration daemon's VNI database is inconsistent with database of the VXLAN service node daemon | Critical | VNI @vni database inconsistent with vxsnd | VNI 24 database inconsistent with vxsnd |
| lnv | A VNI in the VXLAN registration daemon's database is not found in the service node daemon's database | Critical | VNI @vni not in vxsnd database | VNI 5 not in vxsnd database |
| lnv | VXLAN interface is not in Up state | Critical | vxlan @vxlan vni @vni in @state state | vxlan 1003 vni 6 in down state |
| link | Link operational state changed from up to down | Critical | HostName @hostname changed state from @old_state to @new_state Interface:@ifname | HostName leaf01 changed state from up to down Interface:swp34 |
| mtu | MTU mismatch detected between device pair | Critical | Interface @link mtu @mtu mismatch with @peer interface @peer_if mtu @peer_mtu | Interface swp4 mtu 9600 mismatch with server02 interface swp3 mtu 1500 |
| mtu | Missing bond information on peer node | Critical | Bond @bond mtu @mtu, No peer bond info | Bond 3 mtu 1500, No peer bond info |
| mtu | Missing CLAG peerlink information on peer node | Critical | Clag bond @bond mtu @mtu, peer @peer, no peerlink info | Clag bond 4 mtu 9600, peer leaf13, no peerlink info |
| mtu | Missing link information on peer node | Critical | Link @link mtu @mtu peer @peer, no peer link info | Link swp35 mtu 1500 peer spine01, no peer link info |
| ntp | NTP is not synchronized on the device; protocol is not in Sync state | Critical | Sync state changed from @old_state to @new_state for @hostname | Sync state changed from not sync to in sync for leaf11 |
| ospf | OSPF router id of this host conflicts with another host | Critical | @ifname Router ID conflict with @id | swp5 Router ID conflict with leaf4 |
| ospf | OSPF HELLO time is not the same on the local host and its remote peer | Critical | @ifname hello time mismatch with peer @peer | swp16 hello time mismatch with peer leaf21 |
| ospf | OSPF DEAD time is not the same on the local host and its remote peer | Critical | @ifname dead time mismatch with peer @peer | swp13 dead time mismatch with peer spine02 |
| ospf | Link MTU is not the same for the local host and its remote OSPF peer | Critical | @ifname mtu mismatch with peer @peer | swp4 mtu mismatch with peer server04 |
| ospf | OSPF Area ID is not the same on the local host and its remote peer | Critical | @ifname area ID mismatch with peer @peer | swp14 area ID mismatch with peer leaf34 |
| ospf | OSPF Network type is not the same on the local host and its remote peer | Critical | @ifname network type mismatch with peer @peer | swp12 network type mismatch with peer leaf6 |
| ospf | OSPF service is not configured on the peer node | Critical | @ifname no OSPF config on peer @peer | swp2 no OSPF config on peer leaf7 |
| ospf | A particular peer interface does not have the OSPF service enabled | Critical | @ifname OSPF service not enabled on peer @peer | swp9 OSPF service not enabled on peer spine04 |
| ospf | OSPF service is in error state on the peer node | Critical | @ifname OSPF service error on peer @peer | swp4 OSPF service error on peer leaf11 |
| ospf | OSPF service is in shutdown state on the peer node | Critical | @ifname OSPF service shutdown on peer @peer | swp17 OSPF service shutdown on peer spine01 |
| sensor | A temperature, fan, or power supply unit sensor has passed a critical threshold | Critical | Sensor @sensor state @state value @value msg @msg | Sensor temp state critical value x °F msg @msg; Sensor fan state bad value x msg msg; Sensor psu state bad value x msg msg |
| sensor | A temperature, fan, or power supply unit sensor has changed from low or warning to critical | Critical | Sensor @sensor state changed from @old_s_state to @new_s_state | Sensor temperature state changed from low to critical |
| sensor | A temperature, fan, or power supply unit sensor has crossed the maximum threshold for that sensor | Critical | Sensor @sensor max value @new_s_max exceeds threshold @new_s_crit | Sensor fan max value some value exceeds the threshold some value |
| sensor | A temperature, fan, or power supply unit sensor has crossed the minimum threshold for that sensor | Critical | Sensor @sensor min value @new_s_lcrit fall behind threshold @new_s_min | Sensor psu min value some value fell below threshold some value |
| services | The process ID for a service changed and the service status changed from down to up (why is this critical and not info?) | Critical | Service @name with old pid @old_pid changed to @new_pid, status changed from @old_status to @new_status | Service bgp with old pid 12323 changed to 27651, status changed from down to up |
| services | The process ID for a service changed and the service status changed from up to down | Critical | Service @name with old pid @old_pid changed to @new_pid, status changed from @old_status to @new_status | Service lldp with old pid 32846 changed to 17493, status changed from up to down |
| trace | Unable to make connection along the trace path | Critical | Path incomplete, ends at node @hostname | Path incomplete, ends at node spine04 |
| trace | Unable to connect to destination device | Critical | No valid path to destination | No valid path to destination |
| trace | Interface on path is down | Critical | Link @hostname:@link is down | Link leaf03:swp6 is down |
| trace | Routing loop detected | Critical | Routing loop: node @hostname vrf @vrf visited twice | Routing loop: node leaf23 vrf default visited twice |
| trace | Bridging loop detected | Critical | Bridging loop: node @hostname vlan @vlan visited twice | Bridging loop: node spine02 vlan 11 visited twice |
| trace | Node along path was unreachable | Critical | Tracing stopped at rotten node @hostname | Tracing stopped at rotten node leaf17 |
| trace | Source and destination addresses are of different scope and there is no path between them | Critical | No valid paths between link local and non link-local IP addresses | No valid paths between link local and non link-local IP addresses |
| trace | There is no valid path between a pair of VTEPs | Critical | No underlay path from @src_ip to @dst_ip for vxlan @vxlan | No underlay path from 192.168.35 to 192.168.12 for vxlan 1005 |
| vlan | VLAN membership is not the same for both ends of the interface | Critical | @link VLAN set (@vlans) mismatch with peer @peerhost:@peer_ifname (@peer_vlans) | swp3 VLAN set (1002 1005 1230) mismatch with peer leaf06:swp4 (1002 1016 1230) |
| vlan | PVID is not the same for both ends of the interface | Critical | @link PVID (@pvid) mismatch with peer @peerhost:@peer_ifname (@peer_pvid) | swp7 PVID (10) mismatch with peer spine01:swp3 (9) |
| vxlan | Broadcast, unknown unicast, and multicast (BUM) replication list of a VNI is inconsistent among all the VTEPs in the network | Critical | VNI @vni replication list inconsistent | VNI 14 replication list inconsistent |
| vxlan | A VNI is associated with different VLANs on different VTEPs in the network | Critical | VNI @vni mapped to inconsistent VLAN @vlan1 | VNI 6 mapped to inconsistent VLAN 7 |
| vxlan | A VXLAN interface on a node has changed state from up to down | Critical | vxlan device @vxlan in @state state | vxlan device 1002 in down state |
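Because the message formats in the table use `@name` placeholders, a format string can be turned into a regular expression to pull field values out of a message seen in syslog. The sketch below shows one way to do that with the Python standard library only. The function names are hypothetical, and it assumes the logged message contains exactly the text of the format with placeholders substituted, which might not hold if syslog adds prefixes or timestamps.

```python
import re


def format_to_regex(message_format):
    """Compile an alarm message format with @placeholders into a regex.

    re.split with one capture group returns literal text at even indexes
    and placeholder names at odd indexes.
    """
    parts = re.split(r"@(\w+)", message_format)
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)        # literal text from the format
        else:
            pattern += f"(?P<{part}>.+?)"     # one named group per placeholder
    return re.compile(pattern)


def parse_alarm(message_format, message):
    """Return a dict of placeholder values, or None if the message does not fit."""
    match = format_to_regex(message_format).fullmatch(message)
    return match.groupdict() if match else None


fields = parse_alarm(
    "BGP session with peer @peerhost (@peer vrf @vrf) failed, reason: @reason",
    "BGP session with peer spine02 (swp3 vrf default) failed, reason: link down",
)
print(fields)
```

Formats that consist of a single free-text placeholder, such as `@ifname @msg`, split only at the first space, so the extracted fields are best treated as a starting point for investigation.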