Hyper Open Edge Cloud

NMS: How To Handle Faults and Alarms

Handling Faults and Alarms
  • Last Update: 2024-10-16
  • Version: 001
  • Language: en

Agenda

  • Access Monitor and Logs
  • Alarms Explanation
  • Examples

This lecture is designed to help you understand how to use SlapOS Master for managing Baseband Units (BBUs) and Remote Units (RUs) via the Network Management System (NMS) integrated with the ors-amarisoft software release.

Access Monitor and Logs

Click the monitor-setup-url link from bbu0-health and from bbu0-ENB to access their monitor and logs.

Alarms Explanation

The following explains the promises most often found in error, with a focus on RUs. For each promise, it gives the promise source code, the common causes, and the recommended solution:

RU*_config_log
Promise source code: check_lopcomm_config_log.py
Common causes: Netconf connection lost; cu_config.xml improperly configured (e.g., out-of-range frequencies).
Solution: Refer to RU*-config.log. Check the CPRI locks and ensure the RU gets an IPv6 address. Adjust the user input if needed. Contact Rapid.Space for help if needed.

RU*_cpri_lock
Promise source code: check_cpri_lock.py
Common causes: Hardware issues (disconnected hardware, unplugged cables); software issues (failed frame synchronization).
Solution: For "HW Lock is missing", check the physical connection. For "SW Lock is missing", contact Rapid.Space.

RU*_firmware
Promise source code: Unavailable
Common causes: The RU is running unverified firmware.
Solution: Provide the correct SSH key to enable firmware download and upgrade from the BBU.

RU*_lof
Promise source code: check_lopcomm_lof.py
Common causes: Loss of frame.
Solution: Same steps as for RU*_cpri_lock when "SW Lock is missing".

RU*_netconf_connection
Promise source code: Unavailable
Common causes: Netconf connection lost.
Solution: Check the CPRI locks and ensure the RU gets an IPv6 address from the BBU.

RU*_netconf_socket
Promise source code: Unavailable
Common causes: Netconf connection lost; the RU is not listening for Netconf.
Solution: Check the CPRI locks and ensure the RU gets an IPv6 address from the BBU.

RU*_pa_current
Promise source code: check_lopcomm_pa_current.py
Common causes: Overcurrent on the RU's power amplifier (PA).
Solution: Contact Rapid.Space support.

RU*_pa_output_power
Promise source code: check_lopcomm_pa_output_power.py
Common causes: Excessive output power on the RU's PA.
Solution: Contact Rapid.Space support.

RU*_rssi
Promise source code: test_check_lopcomm_rssi.py
Common causes: RSSI imbalance on the RU; RX diversity lost.
Solution: Check the connection between the RU and the antenna, and check that the TX/RX ports are connected correctly. Contact Rapid.Space support.

RU*_rx_saturated
Promise source code: check_rx_saturated.py
Common causes: The RU's RX antennas are saturated.
Solution: Check whether another RU emitting on the same frequency is causing interference. Lower the RX gain. Contact Rapid.Space support.

RU*_sdr_busy
Promise source code: check_sdr_busy.py
Common causes: The eNB does not properly use the CPRI card.
Solution: Refer to enb-output.log. Check the trx_sdr kernel initialization, the CPRI card usage, the SFP port, and the LTEENB license. Make sure only one LTEENB process is running (for instance, only one enb or gnb service must be started). Check that GPS is working properly. Contact Rapid.Space for assistance.

RU*_stats_log
Promise source code: check_lopcomm_stats_log.py
Common causes: Netconf connection lost; the subscription to notifications from the RU failed.
Solution: Refer to RU*-stats.log. Check the CPRI locks and ensure the RU gets an IPv6 address from the BBU. Contact Rapid.Space for assistance if needed.

RU*_sync
Promise source code: check_lopcomm_sync.py
Common causes: Similar to RU*_cpri_lock.
Solution: Same as for RU*_cpri_lock.

RU*_vswr
Promise source code: check_lopcomm_vswr.py
Common causes: VSWR alarm on the RU.
Solution: Ensure the antennas are connected. Reboot the RU. Contact Lopcomm for further help if needed.

amarisoft_stats_log
Promise source code: check_amarisoft_stats_log.py
Common causes: The eNB does not properly use the CPRI card.
Solution: Refer to enb-output.log. Check the trx_sdr kernel initialization, the CPRI card usage, the SFP port, and the LTEENB license. Contact Rapid.Space for assistance.

buildout_slappart*_status
Promise source code: Unavailable
Common causes: Fault in the software's buildout.
Solution: Contact Rapid.Space for a patch.

check_baseband_latency
Promise source code: check_baseband_latency.py
Common causes: LTEENB gets insufficient processing time because other processes consume too much CPU on the server (ORS).
Solution: Identify and resolve the disturbing process. If you cannot access the server, contact Rapid.Space for assistance.

check_monitor_frontend_password
Promise source code: Unavailable
Common causes: The monitor-setup-url cannot be accessed with the username and password.
Solution: Ensure the server is online. Contact Rapid.Space for assistance.

monitor_bootstrap_status
Promise source code: monitor_bootstrap_status.py
Common causes: Fault in the software or in the request parameters.
Solution: Contact Rapid.Space for help.

sshd
Promise source code: Unavailable
Common causes: The sshd service on the BBU, which the RU uses to download firmware, is unavailable.
Solution: Contact Rapid.Space for help.

monitor_httpd_listening_on_tcp
Promise source code: Unavailable
Common causes: The server's IPv6 address is not accessible.
Solution: Check the server's connection.

monitor_http_frontend
Promise source code: Unavailable
Common causes: The monitor frontend URL is not ready.
Solution: Check the frontend server.

check_cpu_temperature
Promise source code: check_cpu_temperature.py
Common causes: CPU temperature too high.
Solution: Check the device's environment.

check_cpu_load
Promise source code: check_server_cpu_load.py
Common causes: CPU overload.
Solution: Check the processes running on the server.

check_free_disk_space
Promise source code: check_free_disk_space.py
Common causes: Insufficient disk space on the server.
Solution: Free up space if you have access. Otherwise, contact Rapid.Space for help.

check_network_errors
Promise source code: check_network_errors_packets.py
Common causes: Network packet loss.
Solution: Check the server's connection.

check_partition_space
Promise source code: Unavailable
Common causes: The server's IPv6 address is not accessible.
Solution: Check the server's partition usage. Contact Rapid.Space for help.

check_ram_usage
Promise source code: check_ram_usage.py
Common causes: High RAM usage.
Solution: Check the server's RAM usage. Contact Rapid.Space for help if needed.

check_re6stnet_certificate
Promise source code: Unavailable
Common causes: The re6stnet certificate has expired.
Solution: Contact Rapid.Space for help.

check_network_transit
Promise source code: check_network_transit.py
Common causes: Network congestion.
Solution: Check the server's connection. Contact Rapid.Space for help if needed.

check_disk_space
Promise source code: Unavailable
Common causes: Same as check_free_disk_space.
Solution: Same as check_free_disk_space.

Example: enb doesn't start

RU*_sdr_busy
Promise source code: check_sdr_busy.py
Common causes: The eNB does not properly use the CPRI card.
Solution: Refer to the related enb-output.log (see the "Access ORS log" section) in the private log to debug the cause. Possible issues include a trx_sdr kernel that is not initialized, a CPRI card used by another process, an incorrect SFP port, or a missing license to launch LTEENB. Contact Rapid.Space for assistance if needed.
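When enb-output.log covers several start attempts, only the latest run is usually relevant. The Python sketch below keeps the lines from the last "Starting eNB software..." line onward. It assumes that every (re)start writes that line, as in the log excerpts below. The helper names are hypothetical, and the log path is supplied by the caller because its location is not given here.

```python
from pathlib import Path

# Assumption: each eNB (re)start writes a line containing this text.
START_MARKER = "Starting eNB software..."


def last_run(log_text: str) -> list[str]:
    """Return the lines of the most recent eNB run.

    If no start marker is found, the whole log is returned.
    """
    lines = log_text.splitlines()
    start = 0
    for index, line in enumerate(lines):
        if START_MARKER in line:
            start = index  # keep the position of the last marker seen
    return lines[start:]


def last_run_from_file(path: str) -> list[str]:
    """Read an enb-output.log file (path chosen by the caller) and slice it."""
    return last_run(Path(path).read_text(errors="replace"))
```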

As indicated in the ticket, the error is "sdr_busy". We need to identify the cause.

Check the enb-output.log:

[2024/09/25 11:21:21.718190745] Starting eNB software...
/srv/slapgrid/slappart19/etc/enb.cfg:648: cell_id 11 is already used by another cell
Base Station version 2024-03-15, Copyright (C) 2012-2024 Amarisoft
This software is licensed to rapid.space.
Support and software update available until 2025-03-25.

The log reports that cell_id 11 is already used by another cell. Verify the cell_id in BBU.ENB.CELL. In this case, the input error stems from the fact that the panel expects the cell_id as a hexadecimal value beginning with "0x". After correcting the cell_id, check the enb-output.log again: the eNB should now start properly.
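The "0x" prefix changes the value: read as hexadecimal, "0x11" is 17, not 11. The Python sketch below uses hypothetical helper names to illustrate two checks. The first requires the "0x" prefix on a panel-style cell_id string. The second reports IDs that appear more than once, which is the condition the log complains about. The panel's actual validation and the eNB's own checks are not reproduced here.

```python
def parse_cell_id(value: str) -> int:
    """Parse a cell_id string, requiring the "0x" hexadecimal prefix."""
    text = value.strip()
    if not text.lower().startswith("0x"):
        raise ValueError(f"cell_id {value!r} must start with '0x'")
    # int() with base 16 accepts an optional "0x"/"0X" prefix.
    return int(text, 16)


def duplicate_cell_ids(values: list[str]) -> set[int]:
    """Return the parsed cell IDs that appear more than once."""
    seen: set[int] = set()
    duplicates: set[int] = set()
    for value in values:
        cell_id = parse_cell_id(value)
        if cell_id in seen:
            duplicates.add(cell_id)
        seen.add(cell_id)
    return duplicates
```

For example, parse_cell_id("0x11") returns 17, and parse_cell_id("11") raises ValueError.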

Example: enb doesn't start

As indicated in the ticket, the error is "sdr_busy". We need to identify the cause.

Check the enb-output.log:

[2024/09/25 15:57:01.694437726 ] Starting eNB software...

[2024-09-25 15:57:07.496] gtp: bind: Cannot assign requested address
Could not open GTP-U socket
Base Station version 2024-03-15, Copyright (C) 2012-2024 Amarisoft
This software is licensed to rapid.space.
Support and software update available until 2025-03-25.

RF0: sample_rate=30.720 MHz dl_freq=1865.200 MHz ul_freq=1770.200 MHz (band 3) dl_ant=4 ul_ant=4
RF1: sample_rate=30.720 MHz dl_freq=1865.200 MHz ul_freq=1770.200 MHz (band 3) dl_ant=4 ul_ant=4
RF2: sample_rate=30.720 MHz dl_freq=1865.200 MHz ul_freq=1770.200 MHz (band 3) dl_ant=4 ul_ant=4

The log shows that the GTP-U socket could not be bound ("Cannot assign requested address"), which points to a wrong gtp_address. Verify the gtp_address in the BBU.ENB instance tree: it must be an address actually configured on one of the BBU's network interfaces. After correcting the gtp_address, check the enb-output.log again: the eNB should now start properly.
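Before entering a gtp_address, you can check on the BBU whether the address is configured locally by trying to bind a UDP socket to it. This is a minimal Python sketch with a hypothetical function name, not a Rapid.Space tool. On Linux, a bind to an address held by no local interface fails with EADDRNOTAVAIL, which is the "Cannot assign requested address" error in the log above.

```python
import errno
import ipaddress
import socket


def is_local_address(address: str) -> bool:
    """Return True if a UDP socket can be bound to `address` on this host.

    Returns False when the bind fails with EADDRNOTAVAIL, i.e. no local
    interface carries the address; other socket errors are re-raised.
    """
    if ipaddress.ip_address(address).version == 6:
        family = socket.AF_INET6
    else:
        family = socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        try:
            # Port 0 lets the kernel pick any free port, so the check does
            # not collide with an eNB that may already hold the GTP-U port.
            sock.bind((address, 0))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise
    return True
```

For instance, is_local_address("127.0.0.1") is True on any host, while an address belonging to another machine returns False.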

Example: RU not connected

RU*_cpri_lock

Promise source code: check_cpri_lock.py

Common causes: Hardware issues (e.g., disconnected hardware, unplugged cables, RU powered off); software issues (e.g., failed frame synchronization).

Solution: If "HW Lock is missing", check the physical connection between the RU and the BBU. "SW Lock is missing" is rare and indicates frame loss; in that case, contact Rapid.Space for assistance.

This is a common failure when the RU is not properly connected. Check the RU's physical connection and verify that the promise turns green.
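The decision above can be written as a small Python function. It is a sketch only: the function name is hypothetical, and it assumes that the promise message contains the literal text "HW Lock is missing" or "SW Lock is missing". The exact message format of check_cpri_lock.py is not shown in this lecture. The hardware case is tested first because a physical fault is the common cause and is checked on site.

```python
def cpri_lock_action(promise_message: str) -> str:
    """Map an RU*_cpri_lock message to the first action from this lecture."""
    if "HW Lock is missing" in promise_message:
        return "Check the physical connection between the RU and the BBU."
    if "SW Lock is missing" in promise_message:
        return "Possible frame loss: contact Rapid.Space."
    return "No known lock error in the message."
```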