About Automatic Failover
You can create a NetMRI failover pair using two NetMRI appliances, in which one acts as the primary appliance and the other as the secondary appliance. A failover pair provides a backup or redundant operational mode between the primary and secondary appliances so you can greatly reduce service downtime when one of them is out of service. You can configure two Operation Center (OC) appliances, collector appliances, or standalone appliances to form a failover pair.

...

5. For an Operation Center and collector failover, complete the following:

    • Log in to the administrative shell on the Operation Center and run the configure tunserver command. Enter the VIP address of the Operation Center when prompted for the IP address of the Operation Center server.
    • To register the collectors with the Operation Center, log in to the administrative shell on each collector and run the register tunclient command. Enter the VIP address of the Operation Center when prompted for the IP address of the Operation Center.

...

The steps required to migrate existing systems as failover pairs depend on whether your appliances use the old or new partition scheme. If your appliances use the old partition scheme, you must prepare them first. To determine which partition scheme an appliance uses, run the show diskusage command from the administrative shell and search the output for the “/drbd0” substring. If the substring is present, the appliance uses the new scheme.
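
The substring check described above can be sketched as a small script. This is only a minimal sketch, assuming the output of show diskusage has been saved as text. The function name and the sample lines are hypothetical and do not reproduce real appliance output.

```python
def uses_new_partition_scheme(diskusage_output: str) -> bool:
    """Return True if the saved output contains the "/drbd0" substring."""
    return "/drbd0" in diskusage_output


# Hypothetical sample lines for illustration only, not real appliance output.
sample_new = "/dev/drbd0  50G  12G  38G  24%  /var/local"
sample_old = "/dev/sda3   50G  12G  38G  24%  /var/local"

print(uses_new_partition_scheme(sample_new))
print(uses_new_partition_scheme(sample_old))
```

An appliance for which the check returns False is treated as using the old partition scheme and needs the additional preparation.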

...

Note
For the new partition scheme, configure three nodes—one with an OC license and two others with Stand Alone licenses—with the same version and licenses, and reset the admin password in the GUI to match the other system. Then proceed with the last three steps from the list above.

Preparing Secondary Appliances (Old Partition Scheme)

To prepare a secondary (B) device, complete the following:

...

  1. Follow the steps to prepare node B for HA OC as described in the section above, Preparing Secondary Appliances (Old Partition Scheme).
  2. To disable SNMP collection on node A, go to the Settings icon > Setup > Collection and Groups > Global > Network Polling and then clear the SNMP Collection checkbox.
  3. Generate a database archive of node A and restore it on node B. For more information, see NetMRI Database Management.
    • If data is restored successfully, proceed to the next step.
    • If the restore failed due to disk space exhaustion, try reducing data retention settings on your existing NetMRI system to reduce the archive size. It might take up to 24 hours for reduced data retention settings to take effect. For more information, see Data Retention or contact Infoblox Support for further assistance.
  4. Run the configure server command on node B.
  5. Run the config tunserver command with the new server IP (IP of node B).
  6. To re-enable SNMP collection after restoring the archive on node B, go to the Settings icon > Setup > Collection and Groups > Global > Network Polling and then select the SNMP Collection checkbox.
  7. Log in to the administrative shell on node A, enter the reset system command, and then enter the repartition command.
  8. After repartitioning is complete, run the configure server command on node A.
  9. Install the license.
  10. Reset the admin password in the NetMRI UI to match the other system.

The two nodes are now ready for failover configuration, where the primary node is node B with all OC data, but without connected collectors at this point.

...

  1. Follow the steps to prepare node B for HA Collector as described in the section above, Preparing Secondary Appliances (Old Partition Scheme).
  2. Log in to the administrative shell on the existing node, enter the reset system command, and then enter the repartition command.
  3. After repartitioning is complete, run the configure server command.
  4. Install the license, which should be identical to the license on the other node.
  5. Reset the admin password in the GUI to match the other system.
  6. Log in to the administrative shell on node A, enter the config tunclient command, and connect it to the node B OC.
  7. You now have two nodes ready for failover configuration where any of them can be primary.
  8. If you want to make node B primary, complete the following:
    • Log in to the NetMRI UI of node B (primary) OC and then go to the Settings icon > Setup > Tunnels and Collectors.
    • Choose the existing collector (A), select Collector Replacement, and then insert the Serial Number of node B.
    • Log in to the administrative shell on node B, run the config tunclient command, and then connect it to node B OC (primary).

...

  1. Log in to the Operation Center UI as admin.
  2. Go to the Settings icon > Setup > Failover Configuration. Here you can see the HA status of your scheme (OC and two collectors).
  3. Choose OC > Edit and configure the Operation Center HA pair. Wait until it is finished and the status is OK.
  4. Choose the first collector > Edit and configure the first collector HA pair. Wait until it is finished and the status is OK.
  5. Choose the second collector > Edit and configure the second collector HA pair. Wait until it is finished and the status is OK.

...

  1. Log in to the Operation Center CLI as admin.
  2. Run the reset tunserver command:

    oc (primary)> reset tunserver
    Notice: This operation will clear all Tunnel CA, server, and client
    configuration and shut down the Tunnel service.
    Continue? (y/n) [n]: y
    +++ Stopping OpenVPN Server ... OK
    +++ Configuring OpenVPN Service ... OK
    +++ Clearing Server Config ...OK
    +++ Clearing CA Config ... OK
    Launching "failover tunserver reset" on "172.19.2.59"...
    The server needs to be restarted for these changes to take effect.
    Do you wish to restart the server now? (y/n) [y]: y
    +++ Restarting Server ... OK

  3. Run the configure tunserver command and configure it with the OC VIP address (Server Public Virtual Name or VIP address):

    oc (primary)> config tunserver
    +++ Configuring CA Settings
    CA key expiry in days [5475]:
    CA key size in bits [2048]:
    +++ Configuring Server Settings
    Server key expiry in days [5475]:
    Server key size in bits [2048]:
    Server Public Virtual Name or VIP address [172.19.2.66]: <- By default it will be already oc VIP
    Select tunnnel IP protocol. 'udp' recommended for better performance.
    Protocol (udp, udp6, tcp) [udp]:
    Tunnel network /24 base [169.254.50.0]:
    Block cipher:
    0. None (RSA auth)
    1. Blowfish-CBC
    2. AES-128-CBC
    3. Triple DES
    4. AES-256-CBC
    Enter Choice [2]:
    Use compression [y]:
    Use these settings? (y/n) [n]: y
    +++ Initializing CA (may take a minute) ...
    +++ Creating Server Params and Keypair ...
    Generating DH parameters, 2048 bit long safe prime, generator 2
    This is going to take a long time

...

Note

If devices were discovered from both collectors, for example, when a device has a few interfaces, these devices are displayed in grey without a link to the device viewer on the collector that did not discover them first. After the system is migrated, the following issue is observed for one of the collectors (the second): On the Network Explorer > Discovery page, the devices that are listed on the left from the initial collection and were discovered by both collectors cannot be discovered or deleted using the Discover Now or Delete button. However, you can discover them from the Settings icon > Discovery Settings > Seed Routers/CIDR.

Manually Initiating Failover

...

  1. Log in to the primary system using your username and password.
  2. Go to the Settings icon > Setup > Failover Configuration tab.
  3. On the Failover Configuration page, click Become Secondary.

To initiate a manual failover using the NetMRI administrative shell, perform one of the following:

  • Log in to the administrative shell on the primary system, enter the failover role secondary command, and then press Enter.
  • Log in to the administrative shell on the secondary system, enter the failover role primary command, and then press Enter.

...

To monitor the current status of the failover pair, complete the following:

  1. Go to the Settings icon > Setup > Failover Configuration tab.
    The Failover Configuration page appears, listing all device interfaces that are used by the system.
  2. On the Failover Configuration page, the Status field displays the current status of the failover pair. The current status can be one of the following:

...

To view configuration details of the Operation Center (OC) and Collector pair:

  • Go to the Settings icon > Setup > Failover Configuration tab.

...

Note

For an OC collector setup, the first row of the Failover Configuration page displays the OC pair information and the other rows display the collector pair information.

...

In a failover pair, although the scan interfaces are enabled only on the primary system, the scan interface configurations are replicated on both the systems. When the primary fails, the secondary activates its scan interfaces (physical and virtual) using the same IP configurations. Both the primary and the secondary can access the network using the same scan interface configurations. After a failover, the NetMRI appliance continues to interact with the devices using the same scan interfaces.

...

Resolving Split Brain Issues

Generally speaking, split-brain is a term used to describe the undesirable state in which both members of a failover pair act as primary at the same time. This is a rare situation that can occur when both systems are up and running but completely disconnect from one another on both the MGMT and HA ports at the same time due to a network outage or a cable mishap. Split-brain can also occur due to an error in the failover software. In this case, the secondary system assumes that the primary system has failed and takes on the primary role. The primary system, which does not have any contact with the secondary system, continues to perform as the primary system. Having two primary systems introduces issues such as VIP contention and duplication of data.

To detect a split-brain issue, complete the following:

  1. When the "Lost connectivity via peer replication link" alert occurs, run failover status on both members.
  2. Check the failover status: if failover is enabled and both nodes are in the primary mode, you have a split-brain situation.

For example:

...
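
The detection rule from the steps above can also be expressed as a short check. This is a minimal sketch, assuming you have read the failover state and the role of each member from the failover status output by hand. The NodeStatus class and its field names are hypothetical and are not part of NetMRI.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical summary of one member, filled in by hand from failover status."""
    failover_enabled: bool
    role: str  # "primary" or "secondary"


def is_split_brain(a: NodeStatus, b: NodeStatus) -> bool:
    """Split-brain: failover is enabled and both members report the primary role."""
    both_enabled = a.failover_enabled and b.failover_enabled
    both_primary = a.role.lower() == "primary" and b.role.lower() == "primary"
    return both_enabled and both_primary


node_a = NodeStatus(failover_enabled=True, role="primary")
node_b = NodeStatus(failover_enabled=True, role="primary")
print(is_split_brain(node_a, node_b))
```

A healthy pair, with one primary and one secondary member, does not satisfy the rule.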

To resolve a split-brain issue using the NetMRI administrative shell, complete the following:

  1. Use a terminal program to connect to the management IP address of the victim system.
  2. Log in to the administrative shell using your username and password.
  3. At the administrative shell prompt, enter the failover role secondary command, and then press Enter.