Monitor Informational Events

Two event workflows, the Alarms card workflow and the Info card workflow, provide a view into the events occurring in the network. The Alarms card workflow tracks critical severity events, whereas the Info card workflow tracks all warning, info, and debug severity events.
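As a rough illustration of the split described above, the following Python sketch maps an event severity to the workflow that tracks it. The `severity` field name, the event dictionaries, and the `workflow_for` helper are assumptions made for this example; they are not part of any NetQ interface.

```python
# Hypothetical sketch: which card workflow tracks an event, based on severity.
ALARM_SEVERITIES = {"critical"}
INFO_SEVERITIES = {"warning", "info", "debug"}


def workflow_for(severity: str) -> str:
    """Return 'alarms' or 'info' for a severity name (case-insensitive)."""
    level = severity.strip().lower()
    if level in ALARM_SEVERITIES:
        return "alarms"
    if level in INFO_SEVERITIES:
        return "info"
    raise ValueError(f"unknown severity: {severity!r}")


# Made-up events for illustration only.
events = [
    {"type": "bgp", "severity": "Info"},
    {"type": "clag", "severity": "Warning"},
    {"type": "services", "severity": "Critical"},
]
routed = [(e["type"], workflow_for(e["severity"])) for e in events]
print(routed)
```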

To focus on events from a single device perspective, refer to Monitor Switches. To monitor alarms, refer to Monitor Alarm Events.

Info Card Workflow Summary

The Info card workflow enables you to easily view and track informational events occurring anywhere in your network.

The small Info card displays:

  • total number and distribution of info events

  • total number and distribution of alarms

  • timing and relative number of configuration changes

images/download/attachments/9012102/image2019-2-5_16_54_15.png

The medium Info card displays:

  • types of info events that occurred

  • total number and distribution of info events

  • total number and distribution of alarms

  • total number and distribution of configuration changes

<insert image>

The large Info card displays:

  • types of info events that occurred

  • total number and distribution of info events

  • total number and distribution of alarms

  • total number and distribution of configuration changes

  • info events by most recent

  • devices by most issues

<insert images>

The full screen Info card provides tabs for all events and all devices.

<insert image>

View Info Events Summary

A summary of the informational events occurring in the network can be found on the small, medium, and large Info cards. Additional details are available as you increase the size of the card.

To view the summary with the small Info card, simply open the card. This card gives you a high-level view in a condensed visual, including the number and distribution of the info events along with the alarms and configuration changes that have occurred during the same time period.

To view the summary with the medium Info card, simply open the card. This card gives you the same count and distribution of info and alarm events and configuration changes, but it also provides information about the sources of the info events and enables you to view a small slice of time using the distribution charts.

  • Use the chart at the top of the card to view the various sources, or types, of info events. Hover over the chart to view the number of each type. The four or so types with the most info events are called out separately, with all others collected together into an Other category. Hovering over the Other segment of the chart provides a listing of the additional types and their counts.

  • Hover over the distribution charts to view the count of info and alarm events and configuration changes during a smaller time slice. For example, in a card showing results for the last 24 hours, you can view the counts for a single hour. For a card showing results for the last week, you can view the counts for a single day. Hovering updates the data in the table. To persist the changes, click the charts.
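The time slices described above amount to grouping event timestamps into fixed-width buckets. The sketch below shows one way to do that in Python, assuming each event carries a Unix timestamp in seconds; the `counts_per_slice` helper and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def counts_per_slice(timestamps, slice_seconds):
    """Count events per time slice; keys are the slice start times (UTC)."""
    counts = Counter()
    for ts in timestamps:
        start = ts - (ts % slice_seconds)  # round down to the slice boundary
        counts[datetime.fromtimestamp(start, tz=timezone.utc)] += 1
    return dict(sorted(counts.items()))


# Made-up epoch timestamps (seconds) for three info events.
info_event_times = [1549382400, 1549384200, 1549386000]

hourly = counts_per_slice(info_event_times, 3600)   # one-hour slices
daily = counts_per_slice(info_event_times, 86400)   # one-day slices
for start, count in hourly.items():
    print(start.isoformat(), count)
```

Choosing a one-hour slice corresponds to the 24-hour card view, and a one-day slice to the one-week view.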

To view the summary with the large Info card, open the card. The left side of the card provides the same capabilities as the medium Info card.

Compare Timing of Info and Alarm Events

While you can see the relative relationship between info and alarm events on the small Info card, the medium and large cards provide considerably more information. Open either of these to view individual line charts for the events. Generally, alarms have some corresponding info events. For example, when a network service becomes unavailable, a critical alarm is often issued, and when the service becomes available again, an info event of severity warning is generated. For this reason, you might see some level of tracking between the info and alarm counts and distributions. Some other possible scenarios:

  • When a critical alarm is resolved, you may see a temporary increase in info events as a result.

  • When you get a burst of info events, you may see a follow-on increase in critical alarms, as the info events may have been warning you of something beginning to go wrong.

  • When you set logging to debug, you see a large number of info events of severity debug. You would not expect to see an increase in critical alarms.
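One way to put a number on the "tracking" between the two line charts is a Pearson correlation over per-slice counts. This is an illustrative technique chosen for this sketch, not something the cards compute; the `pearson` helper and the hourly counts are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length series of counts."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


# Made-up hourly counts for illustration only.
alarm_counts = [0, 2, 5, 1, 0, 0]
info_counts = [3, 6, 12, 4, 3, 2]
print(round(pearson(alarm_counts, info_counts), 2))
```

A value near 1 suggests the two series rise and fall together. A burst of debug-severity info events with no matching alarms would pull the value down.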

Compare Timing of Info and Alarm Events with Configuration Changes

While you can see the relative relationship between info and alarm events and configuration changes on the small Info card, the medium and large cards provide a better view. Open either the medium or large card. Focusing on the distribution charts, you can see the relationship between the info and alarm events and any configuration changes that have been made during this time period. Significant configuration changes, such as x, y, or z, are likely to create a number of info events. Smaller configuration changes, such as a, b, or c, are not as likely to create a large number of info events. (what are the config changes that we track, which category do they fit in?)

View the Source of Info Events

Using the medium or large Info card, you can view the source and count of info events by those sources. The Types of Info chart is divided into segments that represent each source type that generated info events. Hover over the segments to view the number of info events for each source type. The four or so types with the most info events are called out separately, with all others collected together into an Other category. Hovering over the Other segment of the chart provides a listing of the additional types and their counts.
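The grouping behind the Types of Info chart can be sketched as a "top N plus Other" count. The example below assumes N is exactly four (the text says "four or so") and uses a hypothetical `top_types_with_other` helper with made-up event types.

```python
from collections import Counter


def top_types_with_other(event_types, top_n=4):
    """Return (top_n most common types, remaining types grouped as 'Other')."""
    ranked = Counter(event_types).most_common()
    top = dict(ranked[:top_n])
    other = dict(ranked[top_n:])
    return top, other


# Made-up list of info event types for illustration only.
types = ["bgp"] * 5 + ["clag"] * 4 + ["link"] * 3 + ["ntp"] * 2 + ["lldp", "evpn"]
top, other = top_types_with_other(types)
print(top)
print("Other:", sum(other.values()), other)
```

The `other` dictionary plays the role of the listing shown when hovering over the Other segment.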

View All Info Events Sorted by Time of Occurrence

You can view all info events using the large Info card. Open the large card and confirm the Events By Most Recent option is selected in the dropdown above the table on the right. When this option is selected, all of the info events are listed with the most recently occurring event at the top. Scrolling down shows you the info events that occurred earlier within the selected time period for the card. You can also re-sort the table to show the oldest info events at the top, or sort it by message if you are looking for how often a particular info event is occurring. Hover over the relevant column header and click images/download/thumbnails/9012102/image2019-1-15_11_56_17.png . (only avail on full screen?)
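The sort orders described above can be sketched in a few lines of Python. The `timestamp` (epoch seconds) and `message` field names are assumptions for this example, and the events are made up.

```python
from collections import Counter

# Made-up events; 'timestamp' and 'message' are assumed field names.
events = [
    {"timestamp": 1549382400, "message": "Service bgp changed state from down to up"},
    {"timestamp": 1549386000, "message": "Peer state changed to up"},
    {"timestamp": 1549384200, "message": "Service bgp changed state from down to up"},
]

# Events By Most Recent: newest first.
most_recent_first = sorted(events, key=lambda e: e["timestamp"], reverse=True)
# Re-sorted with the oldest events at the top.
oldest_first = sorted(events, key=lambda e: e["timestamp"])
# Sorted by message text, which groups repeats of the same event together.
by_message = sorted(events, key=lambda e: e["message"])

# How often each message occurs, most frequent first.
message_counts = Counter(e["message"] for e in events).most_common()
print(most_recent_first[0]["message"])
print(message_counts[0])
```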

View Devices with the Most Info Events

By default, the list of all info events is displayed when viewing the large Info card. You can filter instead for the devices that have the most info events by selecting the Devices by Most Issues option from the dropdown above the table. To view all devices with info events, click Show All Devices to open the Device Inventory tab on the full screen Info card.
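Conceptually, the Devices by Most Issues view ranks devices by how many info events each one generated. A minimal sketch, assuming each event records a `hostname` field (a hypothetical name) and using made-up events:

```python
from collections import Counter


def devices_by_most_issues(events, limit=None):
    """Rank hostnames by number of info events, highest first."""
    counts = Counter(e["hostname"] for e in events)
    return counts.most_common(limit)  # limit=None returns every device


# Made-up events for illustration only.
events = [
    {"hostname": "leaf04", "type": "link"},
    {"hostname": "spine01", "type": "ntp"},
    {"hostname": "leaf04", "type": "bgp"},
    {"hostname": "leaf04", "type": "lldp"},
    {"hostname": "spine01", "type": "lldp"},
    {"hostname": "leaf02", "type": "clag"},
]
print(devices_by_most_issues(events, limit=2))
```

Calling the helper without a limit corresponds loosely to Show All Devices.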

View All Events

You can view all alarms and info events using the full screen Info card. Simply open the full screen Info card, and click the All Events tab.

If you already have the large Info card open, you can open the All Events tab by clicking the See Alarms link in the bottom right corner of the card.

To return to your workbench, click images/download/thumbnails/9012102/close.png next to the title of the full screen card.

View All Devices

You can view all devices using the full screen Info card. Simply open the full screen Info card, and click the All Devices tab.

Informational Events Reference

The following table lists informational event messages organized by message type and then severity, by default. Click the column header to sort the list by that characteristic. Click images/download/attachments/9012102/image2018-12-5_17_18_48.png in any column header to toggle the sort order between A-Z and Z-A. Recommended actions suggest NetQ GUI cards, CLI commands, and Cumulus Linux NCLU commands for further investigation.

| Type | Trigger | Severity | Message Format | Example |
| --- | --- | --- | --- | --- |
| bgp | BGP Session state changed from Failed to Established | Info | BGP session with peer @peer @peerhost @neighbor vrf @vrf session state changed from Failed to Established | BGP session with peer swp5 spine02 spine03 vrf default session state changed from Failed to Established |
| bgp | BGP Session state changed from Established to Failed | Info | BGP session with peer @peer @neighbor vrf @vrf state changed from Established to Failed | BGP session with peer leaf03 leaf04 vrf mgmt state changed from Established to Failed |
| bgp | The reset time for a BGP session changed | Info | BGP session with peer @peer @neighbor vrf @vrf reset time changed from @old_last_reset_time to @new_last_reset_time | BGP session with peer spine03 swp9: reset time changed from time to time (format of times?) |
| cable | The speed setting for a given port changed | Info | @ifname speed changed from @old_speed to @new_speed | swp9 speed changed from 10 to 40 (units?) |
| cable | The transceiver status for a given port changed | Info | @ifname transreceiver changed from @old_transareceiver to @new_transareceiver | swp4 transceiver changed from down to up/disabled to enabled/disconnected to connected? |
| cable | The vendor of a given port (transceiver?) changed | Info | @ifname vendor name changed from @old_vendor_name to @new_vendor_name | swp23 vendor name changed from Broadcom to Mellanox |
| cable | The part number of a given port (transceiver?) changed | Info | @ifname part number changed from @old_part_number to @new_part_number | swp7 part number changed from FP1ZZ5654002A to MSN2700-CS2F0 |
| cable | The serial number of a given port changed | Info | @ifname serial number changed from @old_serial_number to @new_serial_number | swp4 serial number changed from 571254X1507020 to MT1552X12041 |
| cable | The status of forward error correction (FEC) support for a given port changed | Info | @ifname supported fec changed from @old_supported_fec to @new_supported_fec | swp12 supported fec changed from supported to unsupported; swp12 supported fec changed from unsupported to supported |
| cable | The advertised support for FEC for a given port changed | Info | @ifname supported fec changed from @old_advertised_fec to @new_advertised_fec | swp24 supported fec changed from advertised to not advertised (or enabled/disabled?) |
| cable | The FEC status for a given port changed | Info | @ifname fec changed from @old_fec to @new_fec | swp15 fec changed from disabled to enabled |
| clag | Backup IP address has not been specified in case the peerlink goes down | Warning | Backup IP not configured | Backup IP not configured |
| clag | Backup IP address is not reachable | Warning | Backup IP connectivity failed | Backup IP connectivity failed |
| clag | Bond has only one connection | Warning | Bond @bond is singly connected | Bond 2 is singly connected |
| clag | Bond state changed from up to down | Warning | Bond @bond is down | Bond 1 is down |
| clag | CLAG remote peer state changed from down to up | Info | Peer state changed to up | Peer state changed to up |
| clag | Local CLAG host state changed from down to up | Info | Clag state changed from down to up | Clag state changed from down to up |
| clag | CLAG bond in Conflicted state was updated with new bonds | Info | Clag conflicted bond changed from @old_conflicted_bonds to @new_conflicted_bonds | Clag conflicted bond changed from swp7 swp8 to swp9 swp10 |
| clag | CLAG bond changed state from protodown to up state | Info | Clag conflicted bond changed from @old_state_protodownbond to @new_state_protodownbond | Clag conflicted bond changed from protodown to up |
| configdiff | Configuration file has been modified | Info | @hostname config file @type was modified | spine03 config file /etc/frr/frr.conf was modified |
| configdiff | Configuration file has been created | Info | @hostname config file @type was created | leaf12 config file /etc/lldp.d/README.conf was created |
| evpn | The type of VNI, layer 2 versus layer 3, is inconsistent across the network fabric | Warning | VNI @vni type inconsistent | VNI 12 type inconsistent |
| evpn | The forwarding database learning feature was enabled on the specified VNI | Warning | VNI @vni fdb learning enabled on network port | VNI 4 fdb learning enabled on network port |
| evpn | An incorrect VNI to VRF mapping has occurred | Warning | VNI @vni mapped to VRF @vrf1 instead of @vrf2 | VNI 21 mapped to VRF mgmt instead of default |
| evpn | An incorrect VNI to VLAN mapping has occurred | Warning | VNI @vni mapped to VLAN @vlan1 instead of @vlan2 | VNI 7 mapped to VLAN 1005 instead of 1003 |
| evpn | A VNI was configured and moved from the down state to the up state | Info | VNI @vni state changed from down to up | VNI 36 state changed from down to up |
| evpn | The kernel state changed from x to y on a VNI | Info | VNI @vni kernel state changed from @old_in_kernel_state to @new_in_kernel_state | VNI 3 kernel state changed from something to something else |
| evpn | A VNI state changed from not advertising all VNIs to advertising all VNIs | Info | VNI @vni vni state changed from @old_adv_all_vni_state to @new_adv_all_vni_state | VNI 11 vni state changed from something to something else |
| link | Link operational state changed from down to up | Info | HostName @hostname changed state from @old_state to @new_state Interface:@ifname | HostName leaf04 changed state from down to up Interface:swp11 |
| lldp | Local LLDP host has new neighbor information | Info | LLDP Session with host @hostname and @ifname modified fields @changed_fields | LLDP Session with host leaf02 swp6 modified fields leaf06 swp21 |
| lldp | Local LLDP host has new peer interface name | Info | LLDP Session with host @hostname and @ifname @old_peer_ifname changed to @new_peer_ifname | LLDP Session with host spine01 and swp5 swp12 changed to port12 |
| lldp | Local LLDP host has new peer hostname | Info | LLDP Session with host @hostname and @ifname @old_peer_hostname changed to @new_peer_hostname | LLDP Session with host leaf03 and swp2 leaf07 changed to exit01 |
| ntp | NTP sync state has been lost or never acquired | Warning | Sync state unknown for @hostname | Sync state unknown for spine01 |
| ntp | NTP sync state changed from not in sync to in sync | Info | Sync state changed from @old_state to @new_state for @hostname | Sync state changed from not sync to in sync for leaf06 |
| sensor | A temperature, fan, or power supply sensor state changed from a higher severity to a lower severity | Info | Sensor @sensor state changed from @old_state to @new_state | Sensor temperature state changed from critical to low |
| sensor | A temperature, fan, or power supply sensor state changed from a low or medium severity to a medium or low severity | Info | Sensor @sensor state changed from @old_state to @new_state | Sensor temperature state changed from medium to low |
| services | A service changed state from down to up | Info | Service @name changed state from down to up | Service bgp changed state from down to up; Service lldp changed state from down to up |
| trace | The underlay network MTU does not provide enough headroom to encapsulate (some message?) for a given interface | Warning | Underlay mtu @mtu at @hostname:@link not enough encap headroom | Underlay mtu 9600 at leaf23:swp6 not enough encap headroom |
| trace | Path MTU is inconsistent along trace path | Warning | Inconsistent PMTU among paths | Inconsistent PMTU among paths |
| trace | Tunnel MTU is different on the incoming and outgoing interfaces | Warning | @link tunnel mtu mismatch between ingress @hostname1 and egress @hostname2 | swp9 tunnel mtu mismatch between ingress spine6 and egress leaf31 |
| trace | Interface MTU is different on each end of the link | Warning | Link mtu mismatch between @hostname:@link1 (@mtu1) and @hostname2:@link2 (@mtu2) | Link mtu mismatch between leaf09:swp4 (9600) and leaf10:swp5 (1500) |
| vlan | Interface has no peer bond on remote switch | Info | @link No peer bond | swp49 No peer bond |
| vlan | Interface has no CLAG peer information | Info | @link no clag peer info | swp25 no clag peer info |
| vlan | Interface has no CLAG peerlink information | Info | @link no clag peerlink info | @link no clag peerlink info |
| vlan | Peer link on a given interface is down | Info | Peer link @ifname is down | Peer link swp17 is down |
| vlan | CLAG peerlink for a given interface is unknown | Info | @link clag peer link unknown | swp4 clag peer link unknown |
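The Message Format entries use @name placeholders that are filled in to produce messages like those in the Example entries. The sketch below shows one way such a substitution could work. The `render` helper and the regular expression are assumptions for illustration and do not describe how NetQ itself builds messages.

```python
import re

# An '@' followed by letters, digits, or underscores marks a placeholder.
PLACEHOLDER = re.compile(r"@(\w+)")


def render(message_format, fields):
    """Replace each @name placeholder with fields[name]; KeyError if missing."""
    return PLACEHOLDER.sub(lambda m: str(fields[m.group(1)]), message_format)


print(render(
    "@ifname speed changed from @old_speed to @new_speed",
    {"ifname": "swp9", "old_speed": 10, "new_speed": 40},
))
```

Because the placeholder pattern stops at the first character that is not a letter, digit, or underscore, formats such as `@hostname:@link1 (@mtu1)` are split into separate placeholders as expected.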