...

The System Health feature in the NetMRI Operations Center environment lists all issues associated with the Operations Center appliance and all of its associated Collectors.
The reported issues are the same as the alerts described in the previous topics; the main difference is that the System Health feature applies globally to all appliances and virtual appliances within the distributed Operations Center environment.

Note

...

Every message listed in the System Health page provides an Alert Code, similar to the following:

SOFT001

If you need to communicate with Customer Support for an issue, ensure that you provide this code to the support representative.
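
When collecting alert codes to pass along to Customer Support, it can help to pull them out of copied message text programmatically. The following is a minimal sketch, not part of NetMRI: it assumes that codes consist of four uppercase letters followed by three digits, a pattern inferred only from the examples SOFT001, NETW000 and NETW001 on this page. The names `AlertCode` and `find_alert_code` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumption: four uppercase letters plus three digits, inferred only from
# the examples SOFT001, NETW000 and NETW001. Real codes may differ.
ALERT_CODE_PATTERN = re.compile(r"\b([A-Z]{4})(\d{3})\b")


class AlertCode(NamedTuple):
    prefix: str  # letter part of the code, e.g. "SOFT"
    number: int  # numeric part of the code, e.g. 1


def find_alert_code(message: str) -> Optional[AlertCode]:
    """Return the first alert code found in a message, or None if absent."""
    match = ALERT_CODE_PATTERN.search(message)
    if match is None:
        return None
    return AlertCode(prefix=match.group(1), number=int(match.group(2)))


if __name__ == "__main__":
    print(find_alert_code("SOFT001"))
    print(find_alert_code("NETW000: High number of network errors on management port"))
```

If a message contains no text matching the assumed pattern, the function returns `None` rather than guessing.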

In this section, you will find descriptions for all alerts in the System Health category, descriptions of possible causes for the issue, and potential fixes for each alert.

...

Banner System Health messages appear only in yellow (Warning) and red (Critical). Click directly on the banner text to display the System Health page with its alert listings.
To hide the System Health banners from non-Admin NetMRI users, open the Settings icon > General Settings > Advanced Settings page and locate the Hide the system banners from non-admin users setting. (It is on the last page of Advanced Settings, under User Administration.) Click the Action icon and choose Edit, then choose Yes and click OK.

...

Health Alert Category

Alert Messages

Description

Hardware (see Details on Hardware Alerts for more information)

RAID Drive <X> Failed.

RAID Array Failed.

Fan <X> Failed.

Power Supply <X> Failed.

High Ambient Temperature.

High Internal Temperature.

RAID Battery Failed.

RAID Array Degraded.

This category applies only to hardware-based NetMRI systems and will not appear for virtual machine-based NetMRI instances.
RAID messages apply only to appliances that directly support RAID, including the NT-2200 and NT-4000 models.
NetMRI 1102-A models do not support hardware monitoring alerts.
NT-1400 and NT-2200 systems do not report Ambient Temperatures.
Double-clicking any hardware Issue that appears in this category opens the Settings icon > Notifications > Hardware Status page.

Network (see Details on Network Alerts for more information)

High rate of network errors on MGMT port.

Network link down on MGMT port.

High rate of network errors on SCAN port.

Network link down on SCAN port.

General network connectivity issues on the NetMRI appliance.

Errors related to sending jumbo frames are excluded from the triggers of the alert messages "NETW000: High number of network errors on management port" and "NETW001: High number of network errors on SCAN port".

Platform Capacity (see Details on Network Alerts for more information)

Number of interfaces <count> exceeds Platform Interface Limit of <limit>.
Number of end hosts <count> exceeds Platform SPM End Host Limit of <limit>.
Number of devices <count> exceeds Platform Total Device Limit of <limit>.

Reflects issues where the current number of discovered network devices, interfaces or end hosts exceeds the platform limits for the appliance. Does not apply to licensed limits. Platform limit values can be found on the Settings icon > Setup > Settings Summary page.

Processing (see Details on Processing Alerts for more information)

Processing Capacity is being exceeded.

Processing Alerts reflect Issues where the system processing capacity is being exceeded in the current system configuration.

Software (see Details on Software Alerts for more information)

A software problem was detected.
A software problem was detected during Weekly Maintenance.

In all cases, contact Customer Support for assistance.

Storage (see Details on Storage Alerts for more information)

Low on disk space

Critically low on disk space

Cannot Connect to remote archive storage

Could not save archive to remote storage <hostname>

Disk <X> Failed.

Low on Disk Space indicates that System Health recommends preventive action to increase available disk space in the appliance.
Critically Low on Disk Space indicates an impending failure due to insufficient disk space.

Collector Connectivity (see Details on Operation Center Collector Alerts for more information)

Connection to Collector <X> lost.
Collector <X> Reset.
Collector <X> is Rebooting.

Issues associated with Collector reachability and connectivity in an Operations Center deployment.

Configuration (see Details on Configuration Alerts for more information)

New unassigned VRF discovered.

Warning notification that a VRF network has been discovered and should be placed into a network view by the administrator.

...

Platform Capacity alerts do not necessarily reflect a problem in the NetMRI system. Each NetMRI appliance has an advisory limit on the number of discovered interfaces, discovered devices and discovered end host devices that it is expected to support, based on the disk space and system processing capabilities inherent in the appliance model. These values are called the Platform Capacity and are also reflected in the NetMRI Configuration values shown on the Settings icon > Setup > Settings Summary page.
Unlike other System Alert categories, Platform Capacity warnings will always appear when all three of the advisory system limits (number of managed interfaces, number of end host devices, number of discovered devices) are exceeded by the appliance. Note that the Processing category (also see Details on Processing Alerts) provides the same three warnings (along with others) in its alerts category. When any of these three limits is violated as the result of a processing issue, one of the Platform Capacity warnings will also appear in the notification. These limits are not enforced and the NetMRI appliance operates normally; excess devices continue to appear in the Discovered Devices table. (For related information, see Understanding Platform Limits, Licensing Limits and Effective Limits.)
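
The advisory comparison behind these warnings can be illustrated with a short sketch. This is not NetMRI code: it assumes that each limit is compared independently against its current count, and it reuses the message wording from the Platform Capacity row of the table above. The `CapacityCheck` type, the function name, and all numbers are hypothetical; real limit values come from the Settings Summary page.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CapacityCheck:
    entity: str      # "interfaces", "end hosts" or "devices"
    limit_name: str  # limit label as it appears in the alert text
    count: int       # current discovered count (hypothetical input)
    limit: int       # advisory platform limit (hypothetical input)


def platform_capacity_warnings(checks: List[CapacityCheck]) -> List[str]:
    """Build a warning for every count that exceeds its advisory limit.

    Nothing is blocked or truncated here, mirroring the fact that the
    limits are advisory and not enforced.
    """
    return [
        f"Number of {c.entity} {c.count} exceeds {c.limit_name} of {c.limit}."
        for c in checks
        if c.count > c.limit
    ]


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    sample = [
        CapacityCheck("interfaces", "Platform Interface Limit", 120, 100),
        CapacityCheck("end hosts", "Platform SPM End Host Limit", 40, 50),
        CapacityCheck("devices", "Platform Total Device Limit", 50, 50),
    ]
    for warning in platform_capacity_warnings(sample):
        print(warning)
```

In this sketch a count that merely equals its limit produces no warning, because the alert text speaks of a count that "exceeds" the limit.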

...

Double-clicking on any hardware alert opens the alert on the Settings icon > Notifications > Hardware Status page.

Alert Message

User Action

RAID Drive <X> Failed

Replace the hard disk with a replacement drive authorized by Infoblox.

RAID Array Failed

Contact Customer Support.

Fan <X> Failed

Replace the system fan. Appears only in systems where system fans are user-replaceable, as with the NetMRI NT-2200 and NT-4000 devices. Fan assemblies must be replaced with authorized Infoblox parts. Contact Customer Support if this message appears in systems where fans are not user-replaceable.

Power Supply <X> Failed

Check Power Supply operation. Message appears only for systems in which a redundant 1+1 power supply configuration is available and running in the device in question. (For a single-power-supply system, the appliance simply shuts down.) The alerts also allow for the possibility that a power supply is unplugged.

Ambient temperature is high. Internal temperature is high.

Both messages may appear for the same system, with the internal temperature being affected by the ambient temperature. Reduce the ambient temperature where possible; if the Internal temperature remains high, look for a Fan Failed error message along with the Internal Temperature message. Contact Customer Support if an Internal Temperature is High issue persists when conditions are otherwise optimal.

Critical — RAID Battery failed.

Contact Customer Support.

RAID Array Degraded.

The RAID array is not fully operational due to a disk in the process of rebuilding or a disk being removed. If a disk has been removed in preparation for replacement, this issue will also appear, and will clear when the replacement is finished rebuilding. If you know that no disk replacement operation has been started with the appliance and this issue appears, contact Customer Support.

...

Alert Message

User Action

System Processing capacity is being exceeded.

A number of causes may contribute to processing slowdowns on the appliance.
Some processing warnings reflect higher quantities of various network entities than can be supported by the hardware platform:

  • Number of interfaces <count> exceeds the recommended capacity of <limit>.
    Solution: Consider reducing discovery ranges.
  • Number of end hosts <count> exceeds the recommended capacity of <limit>.
    Solution: Consider reducing discovery ranges.
  • Number of devices <count> exceeds the recommended capacity of <limit>.
    Solution: Consider reducing discovery ranges.
    If any one of these three processing warnings appears, a Platform Capacity message of the same type also appears.
    Other processing warnings include the following:
  • Policy Rule deployment exceeds the recommended limit of <X>.
    Solution: Reduce the number of deployed Policy rules.
  • Executed jobs exceed the recommended limit of <X> per 24 hours.
    Solution: Reduce the number of scripted Jobs that execute over a 24-hour period.
    The following messages are enforced on current platforms and appear only on appliances that a) have a Processing Capacity alert present and b) are over-provisioned with discovered devices beyond the licensed limit:
  • Number of Licensed Devices exceeds licensed platform limit of <X> devices.
    Solution: Un-license some network devices.
    Appliances cannot have more licenses in use than the number of installed licenses; appliances can have more installed licenses than the maximum allowed if they are grandfathered in from older deployments with higher licensed levels. These messages appear only if the number of licenses exceeds the maximum number of licenses allowed for the hardware platform. For more information, see Understanding Platform Limits, Licensing Limits and Effective Limits.

...