Hyper Open Edge Cloud

NMS and BBU Redundancy

Explains how NMS (SlapOS Master), BBU Main, and BBU Backup redundancy works in a network deployment, including eCPRI 7.2/CPRI dual SFP failover, proxy instance tree config sync, and RU carrier aggregation fallback.
  • Last Update: 2026-03-24
  • Version: 001
  • Language: en

Overview

This document explains the redundancy architecture for the NMS (SlapOS Master), BBU (Baseband Unit), and RU (Remote Unit) layers in a network deployment based on the rapid.space vRAN stack (Amarisoft eNB/gNB + SlapOS).

The architecture uses three redundancy mechanisms that work independently:

  • NMS primary/backup with daily config sync
  • BBU Main/Backup with proxy instance tree and daily config sync
  • Dual SFP eCPRI 7.2/CPRI links per RU: one to BBU Main, one to BBU Backup


Architecture — three layers

The full stack has three layers. Each layer has its own redundancy mechanism.

Layer | Primary component | Backup component | Sync frequency
NMS | SlapOS Master (primary) | SlapOS Master (backup server) | Daily
BBU | BBU Main: SlapOS Node (active) | BBU Backup: SlapOS Node (standby, proxy instance tree) | Daily
RU | SFP-A → BBU Main (active eCPRI 7.2/CPRI) | SFP-B → BBU Backup (standby eCPRI 7.2/CPRI) | Instant takeover

Key property: SlapOS Node on each BBU operates independently. If the SlapOS Master (NMS) loses connectivity, the Amarisoft eNB/gNB keeps running without interruption. Radio service is never dependent on NMS availability.

NMS redundancy — SlapOS Master primary and backup

The NMS is implemented as SlapOS Master running on a dedicated server (on-premises at OCC/BOCC or hosted at panel.rapid.space). A second physical server runs a backup SlapOS Master.

Normal operation

  • SlapOS Master primary handles all Panel UI, instance tree management, alarm aggregation, and config deployment to BBU SlapOS Nodes.
  • The backup server receives a daily configuration sync from the primary.
  • All BBU SlapOS Nodes are registered and connected to the primary.

On primary NMS failure

  1. The backup SlapOS Master is promoted to primary.
  2. BBU SlapOS Nodes reconnect to the backup master.
  3. Radio services on all BBUs continue uninterrupted throughout — BBU Nodes never stop running the Amarisoft stack during NMS failover.
  4. Panel UI, alarms, and config management resume from the backup master.

Important: The daily sync covers configuration and node registration. Events and alarms generated between the last sync and the failure are lost. This loss is considered acceptable for NMS failover because radio continuity is not affected.
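The loss window described above can be illustrated with a short Python sketch. It is not part of SlapOS: the function name `events_lost_on_failover`, the event tuple format, and the sample timestamps are all hypothetical. The only assumption taken from the text is that anything recorded on the primary after the last sync is absent from the backup.

```python
from datetime import datetime

def events_lost_on_failover(events, last_sync, failure_time):
    """Return the events recorded after the last sync and up to the failure.

    `events` is a list of (timestamp, description) tuples; this format is
    an assumption made for the example, not a SlapOS data structure.
    """
    return [
        (ts, desc)
        for ts, desc in events
        if last_sync < ts <= failure_time
    ]

# Illustrative sample data only.
events = [
    (datetime(2026, 3, 23, 22, 0), "RU1 VSWR alarm"),
    (datetime(2026, 3, 24, 9, 30), "RU2 RSSI warning"),
]
last_sync = datetime(2026, 3, 24, 2, 0)
failure_time = datetime(2026, 3, 24, 11, 0)

lost = events_lost_on_failover(events, last_sync, failure_time)
for ts, desc in lost:
    print(f"lost on failover: {ts.isoformat()} {desc}")
```

Only the event recorded after the 02:00 sync falls inside the loss window; the earlier one was already copied to the backup.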

BBU redundancy — SlapOS Node with proxy instance tree

Each cell site has two BBU servers: BBU Main (active) and BBU Backup (standby). Both run SlapOS Node managed by SlapOS Master.

Instance trees on BBU Main (active)

  • BBUXX-enb or BBUXX-gnb: Amarisoft LTE eNB or NR gNB (only one active at a time)
  • BBUXX-core-network: local EPC/5GC or connection to external 5GC
  • BBUXX-conference: Galene PTT video conference
  • BBUXX-mail-server: DeltaChat mail server
  • BBUXX-health: SlapOS health monitoring promises (CPU temperature, process status, disk, RAM, network)
  • BBUXX-enb.cell, BBUXX-enb.ru, BBUXX-enb.x2peer: cell, RU, and peer configurations

Proxy instance tree on BBU Backup (standby)

  • BBU Backup runs a proxy instance tree that mirrors all configuration from BBU Main.
  • Config is synced daily from BBU Main to BBU Backup via SlapOS Master.
  • Under normal operation, no Amarisoft services are started on BBU Backup — it consumes no radio resources.

On BBU Main failure or eCPRI/CPRI cable failure

  1. SlapOS Master detects that BBU Main health promises have turned red (for example check_cpu_temperature or check_cpu_load, or failing service promises).
  2. Operator initiates takeover: on BBU Backup, click Request Start on the enb/gnb instance tree.
  3. BBU Backup starts Amarisoft eNB/gNB using the last synced configuration.
  4. BBU Backup activates its eCPRI 7.2/CPRI links to RU1 and RU2 via SFP-B ports.
  5. RUs re-attach to BBU Backup and resume radio service.

Cable failure is treated identically to BBU Main failure: if the eCPRI/CPRI fiber between BBU Main and the RUs is cut, SFP-A loses lock on both RUs. BBU Backup's SFP-B connections are on a separate fiber path (separate OFC cable routing), so the takeover path is unaffected by the cable failure.

Configuration gap: BBU Backup starts with the configuration from the last daily sync. Any parameter changed on BBU Main after that sync (eNB ID, AMF list, PLMN, frequency, TX gain, etc.) is missing on BBU Backup and must be re-applied there after takeover. To avoid this gap, keep the daily sync schedule and apply every change to BBU Backup immediately after making it on BBU Main.
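One way to spot such a gap before a takeover is to compare the two parameter sets. The following Python sketch assumes the parameters of each BBU have already been exported into plain dictionaries; the function name `config_drift`, the key names, and the sample values are hypothetical and do not reflect the actual SlapOS instance parameter schema.

```python
def config_drift(main_cfg, backup_cfg):
    """Return {parameter: (main_value, backup_value)} for every mismatch.

    A parameter that exists on only one side is reported with None
    for the side where it is missing.
    """
    keys = set(main_cfg) | set(backup_cfg)
    return {
        key: (main_cfg.get(key), backup_cfg.get(key))
        for key in sorted(keys)
        if main_cfg.get(key) != backup_cfg.get(key)
    }

# Illustrative values only; key names are assumptions.
main_cfg = {"enb_id": "0x1A2D0", "plmn": "00101", "tx_gain": 70}
backup_cfg = {"enb_id": "0x1A2D0", "plmn": "00101", "tx_gain": 65}

drift = config_drift(main_cfg, backup_cfg)
for name, (main_value, backup_value) in drift.items():
    print(f"{name}: BBU Main={main_value}, BBU Backup={backup_value}")
```

An empty result means the compared parameters match; any entry is a change that still has to be applied on BBU Backup.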

RU redundancy — dual SFP and carrier aggregation

Each RU has two SFP optical ports on its eCPRI 7.2/CPRI fronthaul interface:

  • SFP-A: connects to BBU Main via dedicated optical fiber
  • SFP-B: connects to BBU Backup via a separate optical fiber path (different OFC cable, opposite track side where possible)

Each BBU (Main and Backup) connects to at least two RUs, RU1 and RU2. The two RUs back each other up: if one fails, the other keeps the cell on air.

Normal operation — Carrier Aggregation over full spectrum

  • BBU Main controls both RU1 and RU2 through their SFP-A ports.
  • RU1 and RU2 operate as a Carrier Aggregation (CA) pair, delivering the full configured spectrum to trains in the cell.
  • SFP-B links on both RUs are connected to BBU Backup but remain on standby, because BBU Backup does not run Amarisoft in this state.

On RU failure (one RU offline)

  • SlapOS promise alarm triggers: RU*_cpri/ecpri_lock turns red on the failed RU.
  • The surviving RU continues operating with BBU Main via SFP-A.
  • Coverage continues at half spectrum (CA is disabled, single RU only).
  • No BBU takeover is needed — BBU Main is still healthy.

On BBU Main failure or eCPRI/CPRI cable cut

  • RU*_cpri/ecpri_lock alarms turn red on both RU1 and RU2 (SFP-A links lost).
  • Takeover to BBU Backup as described in the BBU redundancy section above.
  • BBU Backup activates SFP-B links to both RU1 and RU2.
  • CA resumes over full spectrum once both RUs re-attach to BBU Backup.
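The relationship between RU availability, Carrier Aggregation, and usable spectrum can be summarized in a few lines of Python. This is a simplified model, not vendor code: it assumes exactly two RUs that each carry half of the configured spectrum, and the function name `radio_service` is hypothetical.

```python
def radio_service(ru1_up, ru2_up):
    """Return (ca_active, spectrum_fraction) for a two-RU cell.

    Assumes each RU carries half of the configured spectrum, so CA
    over the full spectrum requires both RUs to be attached.
    """
    rus_up = int(ru1_up) + int(ru2_up)
    ca_active = rus_up == 2
    return ca_active, rus_up / 2

print(radio_service(True, True))    # both RUs attached
print(radio_service(True, False))   # one RU offline
print(radio_service(False, False))  # no RU attached to the serving BBU
```

The same model applies whether the RUs are attached to BBU Main through SFP-A or to BBU Backup through SFP-B.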

RU monitoring promises

SlapOS monitors each RU automatically through the BBUXX-enb instance tree. Promises are visible in the Panel via the monitor-setup-url connection parameter. Status is green (OK), orange (warning / no data), or red (error).

Promise | What it monitors
RU*_cpri/ecpri_lock | CPRI/eCPRI hardware lock (HW) and software frame sync (SW)
RU*_netconf_connection | NETCONF session between BBU and RU (over IPv6)
RU*_config_log | NETCONF configuration push success
RU*_sync | Frame synchronization status
RU*_lof | Loss of frame
RU*_firmware | Firmware version verification (SFTP from BBU)
RU*_pa_output_power | PA output power alarm
RU*_pa_current | PA overcurrent alarm
RU*_rssi | RSSI imbalance / RX diversity loss
RU*_rx_saturated | RX antenna saturation
RU*_vswr | VSWR antenna alarm (check antenna connection)
RU*_stats_log | NETCONF statistics subscription
amarisoft_stats_log | Amarisoft eNB/gNB stats collection

If RU*_cpri/ecpri_lock shows "HW Lock is missing", check the physical eCPRI/CPRI fiber connection. If "SW Lock is missing", check frame synchronization. If RU*_netconf_connection fails, verify that the RU has received an IPv6 address from the BBU.
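The first-line checks above can be written down as a small lookup helper, for example for an operator runbook script. This Python sketch is illustrative only: the function name `troubleshooting_hint` is hypothetical, the promise names assume that `RU*` expands to a prefix such as `RU1`, and the messages are matched as plain substrings.

```python
def troubleshooting_hint(promise, message=""):
    """Map a failing promise and its message to the check suggested above."""
    if promise.endswith("_cpri/ecpri_lock"):
        if "HW Lock is missing" in message:
            return "Check the physical eCPRI/CPRI fiber connection."
        if "SW Lock is missing" in message:
            return "Check frame synchronization."
    if promise.endswith("_netconf_connection"):
        return "Verify that the RU has received an IPv6 address from the BBU."
    # Anything else is outside the three cases covered in the text.
    return "See the full promise troubleshooting guide."

print(troubleshooting_hint("RU1_cpri/ecpri_lock", "HW Lock is missing"))
print(troubleshooting_hint("RU2_netconf_connection"))
```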

See How To Monitor and Access Logs on ORS for full promise troubleshooting.

Failure scenario summary

Failure | Detection | Response | Result
One RU offline | RU*_cpri_lock red on failed RU | No action needed; surviving RU stays active | Half spectrum; service continues
BBU Main hardware failure | BBUXX-health promises red; compute node unreachable | Start enb/gnb on BBU Backup; SFP-B links activate | Full spectrum restored via BBU Backup
eCPRI/CPRI fiber cut (BBU Main side) | RU*_cpri_lock red on both RU1 and RU2 | Same as BBU Main failure: takeover to BBU Backup; SFP-B unaffected | Full spectrum restored via BBU Backup
SlapOS Master primary failure | Panel unreachable; BBU Nodes continue independently | Promote NMS backup server; BBU Nodes reconnect | Radio uninterrupted; NMS management restored
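The summary can also be read as a decision procedure. The Python sketch below encodes it under simplifying assumptions: the four boolean inputs stand in for the promise and reachability checks, the function name `classify_failure` is hypothetical, and only the first matching scenario is returned. As in the table, a lost lock on both RUs is treated as a fiber cut on the BBU Main side.

```python
def classify_failure(ru1_lock_ok, ru2_lock_ok, bbu_main_healthy, nms_reachable):
    """Return (scenario, response) following the failure summary."""
    if not bbu_main_healthy:
        return ("BBU Main hardware failure", "Start enb/gnb on BBU Backup")
    if not ru1_lock_ok and not ru2_lock_ok:
        return ("eCPRI/CPRI fiber cut (BBU Main side)", "Take over to BBU Backup")
    if not ru1_lock_ok or not ru2_lock_ok:
        return ("One RU offline", "No action needed")
    if not nms_reachable:
        return ("SlapOS Master primary failure", "Promote NMS backup server")
    return ("Normal operation", "No action needed")

scenario, response = classify_failure(
    ru1_lock_ok=False, ru2_lock_ok=True, bbu_main_healthy=True, nms_reachable=True
)
print(f"{scenario}: {response}")
```

In practice the takeover itself stays an operator decision; the sketch only maps observed state to the response listed in the summary.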

Daily sync checklist

Both BBU and NMS redundancy depend on the daily sync. Work through the following checklist after every configuration change and as part of routine operations:

  • After changing any parameter on BBU Main (eNB ID, AMF/MME list, PLMN, frequency, bandwidth, TX gain, antenna count, carrier activation), apply the same change on BBU Backup immediately — do not wait for the daily sync.
  • After adding a new RU instance tree (ENB.RU) on BBU Main, create the corresponding ENB.RU on BBU Backup with SFP-B port assignment.
  • After adding a new Cell (ENB.CELL) on BBU Main, create the same cell on BBU Backup.
  • Verify the NMS backup server sync completed successfully each day — check the backup server's Panel for any sync errors.
  • Perform a quarterly takeover drill: start enb on BBU Backup with SFP-B links active, verify RUs attach and CA resumes, then restore normal operation.
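The daily verification item can be automated with a simple freshness check. This Python sketch assumes the timestamp of the last successful sync is available from some source, which the text does not specify; the function name `sync_is_stale` and the 24-hour default threshold are assumptions chosen to match the daily schedule.

```python
from datetime import datetime, timedelta

def sync_is_stale(last_sync, now, max_age=timedelta(days=1)):
    """Return True when the last successful sync is older than max_age."""
    return now - last_sync > max_age

# Illustrative timestamps only.
now = datetime(2026, 3, 24, 12, 0)
print(sync_is_stale(datetime(2026, 3, 24, 2, 0), now))   # synced this morning
print(sync_is_stale(datetime(2026, 3, 22, 2, 0), now))   # two missed syncs
```

A stale result is a prompt to check the backup server's Panel for sync errors before relying on the backup for takeover.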
