NetNumen U31 R22 (12.13.10P02) Routine Maintenance Guide
ZTE CORPORATION
No. 55, Hi-tech Road South, ShenZhen, P.R.China
Postcode: 518057
Tel: +86-755-26771900
Fax: +86-755-26770801
URL: http://ensupport.zte.com.cn
E-mail: [email protected]
LEGAL INFORMATION
Copyright 2013 ZTE CORPORATION.
The contents of this document are protected by copyright laws and international treaties. Any reproduction or
distribution of this document or any portion of this document, in any form by any means, without the prior written
consent of ZTE CORPORATION is prohibited.
Revision History
Revision No.    Revision Date    Revision Reason
R1.0            20130820         First edition
SJ-20130731095208-010|20130820(R1.0)
Contents
About This Manual
Chapter 1 Maintenance Overview
1.1 Equipment Maintenance
1.2 Maintenance Items
1.3 Routine Maintenance Flow
1.4 Routine Maintenance Precautions
Figures
Glossary
Intended Audience
This manual is intended for:
- Maintenance engineers
- Network monitoring engineers
Summary
1, Maintenance Overview
2, Daily Maintenance
3, Weekly Maintenance
4, Monthly Maintenance
5, Annual Maintenance
Conventions
This manual uses the following typographical conventions:
Typeface          Meaning
Italics           Variables in commands. It may also refer to other related manuals and documents.
Bold              Menus, menu options, function names, input fields, option button names, check boxes,
                  drop-down lists, dialog box names, window names, parameters, and commands.
Constant width    Text that you type, program codes, filenames, directory names, and function names.
Chapter 1
Maintenance Overview
Table of Contents
Equipment Maintenance
Maintenance Items
Routine Maintenance Flow
Routine Maintenance Precautions
Routine Maintenance
By maintenance period, routine maintenance is further divided into daily, weekly,
monthly and annual maintenance. Through routine maintenance, maintenance
personnel can learn about the system operation status, detect potential risks, prevent
accidents and troubleshoot faults quickly.
Troubleshooting
Troubleshooting means to analyze and correct a fault after the fault is reported.
Maintenance Items
Maintenance Period     Maintenance Items
Daily maintenance      Checking the time zone and time of the operating system.
Weekly maintenance
Monthly maintenance
Annual maintenance
Flow Description
1. Making routine maintenance schedules
Based on the cycle of each maintenance item, maintenance personnel draw up the routine
maintenance schedule for the next year at the end of the current year.
2. Checking routine maintenance items
Maintenance personnel check each routine maintenance item described in this manual
at a proper time.
Note:
It is recommended to check items during non-peak hours, for example, 02:00 to 06:00.
3. Troubleshooting
If a fault occurs, maintenance personnel take measures according to the severity and
urgency of the symptom:
a. If the fault affects services (for example, some services cannot be implemented),
handle the fault immediately because it is urgent and critical.
b. If the fault affects operation and maintenance rather than services (for example,
new service commissioning failures or configuration failures), handle the fault in
accordance with the general troubleshooting flow because it is critical but not
urgent.
c. If some alarms are raised, but services, operation, and maintenance are not
affected, pay attention to the fault, and handle it in accordance with the
general troubleshooting flow.
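The three cases above can be sketched as a small decision function. This is only an illustration: the names Priority and triage are hypothetical and are not part of the U31 R22 software.

```python
from enum import Enum


class Priority(Enum):
    """Hypothetical labels for the three cases a, b and c in the flow description."""
    IMMEDIATE = "urgent and critical: handle immediately"
    GENERAL_FLOW = "critical but not urgent: general troubleshooting flow"
    MONITOR = "alarms only: pay attention, then general troubleshooting flow"


def triage(affects_services: bool, affects_oam: bool) -> Priority:
    """Map the impact of a fault to the handling described in the guide."""
    if affects_services:
        # Case a: services are affected.
        return Priority.IMMEDIATE
    if affects_oam:
        # Case b: only operation and maintenance are affected.
        return Priority.GENERAL_FLOW
    # Case c: alarms are raised but nothing is affected.
    return Priority.MONITOR
```

For example, triage(False, True) returns Priority.GENERAL_FLOW, which matches a configuration failure that does not affect services.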
If a fault occurs, maintenance personnel handle the fault in accordance with the general
troubleshooting flow. If the fault persists, contact ZTE technical support. Some typical
troubleshooting methods are described below:
Chapter 2
Daily Maintenance
Table of Contents
Checking NE Statuses and Link Statuses
Checking Existing Alarms
Checking CPU Usage and Memory Usage
Checking the Operating Status of the NMS
Checking the Disk Array Operating State
Checking the Time Zone and Time of the Operating System
Steps
1. In the NMS main window, select Topology > View Topology. The Topology
Management window is displayed.
The link between the U31 R22 NMS and the NE is disconnected.
The NE IP address is incorrectly set.
The SNMP parameters are incorrectly set.
Steps
1. In the NMS main window, select Fault > Alarm Monitoring. The Alarm Monitoring
tab is displayed, see Figure 2-2.
Figure 2-2 Alarm Monitoring Tab
2. Query the current alarms and historical alarms in a day. Focus on critical alarms, link
disconnection alarms, and high CPU/memory/hard disk usage alarms.
3. Double-click an alarm to query the detailed information.
4. Click the Handling Suggestions tab to query the troubleshooting suggestions.
5. On the Alarm Monitoring tab, click the drop-down list next to the export icon and
select the option to export all rows of the alarm information, or click the icon itself
to export only the visible rows of the alarm information.
You can export files in text, Excel, PDF, HTML, and CSV formats.
6. In the displayed Save dialog box, enter the file name and select the file type, and
then click Save.
End of Steps
Steps
1. In the NMS main window, select Maintenance > System Monitoring. The System
Monitoring window is displayed, see Figure 2-3.
2. Select the corresponding NMS under Server. Click View in the View Server
Performance area.
The View Application Server Performance window is
displayed.
3. Query the CPU and memory usage.
An instantaneous CPU usage peak above 90% must not last longer than 10 seconds.
After an operation is completed, the CPU usage must return to the normal range.
The memory usage must be lower than 80%.
If the subscriber capacity exceeds the designed value, and the CPU usage and
memory usage are high for a long time, this indicates that the system load is high
and capacity expansion is required.
If the subscriber capacity does not reach the designed value, but the CPU usage and
memory usage are frequently high, contact ZTE technical support for assistance.
End of Steps
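The CPU and memory criteria in step 3 can be checked against a series of usage samples. The following is a minimal sketch, assuming the CPU usage has been sampled at a fixed interval by some external means; the function names and the sample_interval_s parameter are hypothetical and do not correspond to any NMS interface.

```python
from typing import List, Sequence

CPU_PEAK_PERCENT = 90.0      # guide: peaks above 90% ...
CPU_PEAK_MAX_SECONDS = 10.0  # ... must not last longer than 10 seconds
MEMORY_MAX_PERCENT = 80.0    # guide: memory usage must be lower than 80%


def longest_run_above(samples: Sequence[float], threshold: float) -> int:
    """Return the length of the longest run of consecutive samples above threshold."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value > threshold else 0
        longest = max(longest, current)
    return longest


def check_server_load(cpu_samples: Sequence[float], memory_percent: float,
                      sample_interval_s: float = 1.0) -> List[str]:
    """Return a list of findings; an empty list means both criteria are met."""
    findings = []
    peak_seconds = longest_run_above(cpu_samples, CPU_PEAK_PERCENT) * sample_interval_s
    if peak_seconds > CPU_PEAK_MAX_SECONDS:
        findings.append(f"CPU usage above 90% for {peak_seconds:.0f} s")
    if memory_percent >= MEMORY_MAX_PERCENT:
        findings.append(f"memory usage {memory_percent:.0f}% is not lower than 80%")
    return findings
```

Whether persistent findings indicate a need for capacity expansion or a call to technical support depends on the subscriber capacity, as described in step 3.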
Steps
1. In the U31 R22 client main window, run ems\ums-server\console.exe. The
NetNumen U31 Unified Network Management System - Console window is
displayed.
2. Select View > Record. The Detailed information pane is displayed.
3. Check whether NMS processes are normal, see Figure 2-4.
Steps
1. Check the disk space to ensure that sufficient space is available.
2. Check whether any fault occurs on the disks, for example, read or write error.
End of Steps
Result
- If the disk array space is not enough, add more disks, or back up alarm and
performance data and then delete the data to release some space.
- If a hard disk fault is found, contact ZTE technical support for assistance.
Note:
The time zone and time of the operating system can only be modified before the NMS is
started.
Steps
1. In the Windows operating system, select Start > Control Panel > Date and Time.
The Date and Time window is displayed.
2. Check the system time zone. If the system time zone and the local time zone are
different, click Change time zone to change the system time zone to the local time
zone.
The system time zone must be modified to the local time zone. Otherwise, alarm
generation time is different from that displayed on the NMS.
3. Synchronize the operating system time with the standard clock source.
End of Steps
Chapter 3
Weekly Maintenance
Table of Contents
Checking the Performance Data
Checking Hard Disk Space
Checking Database Space
Checking Automatic Backup Status
Checking the Operating Status of the Anti-Virus Software
Performing Alarm Statistics and Analysis
Analyzing NE Performance Data
Checking the Server (Solaris) Logs
Checking License Management Scale
Prerequisite
An NE performance task is created in the NMS.
Steps
1. In the NMS main window, select Performance > Measurement Task Management.
The Measurement Task Management window is displayed.
2. Select an NE. The NE measurement task is displayed in the right pane, see Figure
3-1.
3. Right-click the measurement task, and select Query PM Data by Task from the
shortcut menu. The History Performance Data Query window is displayed.
4. Click the Object Selection tab, and select Group by NE from the Location group
list, see Figure 3-2.
Figure 3-2 Object Selection Tab
5. Click the Time Selection tab, and set Query granularity and Time settings, see
Figure 3-3.
Note:
NE type and MO type must be the same as those set in the measurement task.
c. Click the Location Selection tab, and select an NE, see Figure 3-5.
Note:
The NE must be the same as that selected in the measurement task.
d. Click the Time Selection tab, and set the time range to check the performance
data integrity.
e. Click OK.
The Integrity Status column shows the data integrity of each NE.
- Have Data means that the performance data at each collection point (every
15 minutes by default) during the query period can be queried.
- No Data means that no performance data can be queried at any collection point
during the query period.
Suppose the query period is a day. If the system can query the performance
data at collection points during one period (00:00:00 to 12:00:00), but fails to
query performance data at collection points during the other period (12:00:00 to
00:00:00), the integrity query result will be displayed on two rows, with the Integrity
Status as Have Data and No Data respectively.
If No Data is displayed in the Integrity Status column, check whether:
iii. The start time and end time of the task are set correctly.
iv. The task granularity is set correctly.
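The way the integrity result is split into Have Data and No Data rows can be illustrated with a short sketch. It assumes that the set of collection points with data is already known; the function integrity_rows and its parameters are hypothetical and only mimic the behavior described above.

```python
from datetime import datetime, timedelta
from itertools import groupby
from typing import List, Set, Tuple


def integrity_rows(start: datetime, end: datetime, collected: Set[datetime],
                   granularity: timedelta = timedelta(minutes=15)
                   ) -> List[Tuple[datetime, datetime, str]]:
    """Group consecutive collection points with the same status into one row."""
    points = []
    t = start
    while t < end:
        status = "Have Data" if t in collected else "No Data"
        points.append((t, status))
        t += granularity
    rows = []
    for status, group in groupby(points, key=lambda p: p[1]):
        group = list(group)
        # A row runs from its first collection point to the end of its last one.
        rows.append((group[0][0], group[-1][0] + granularity, status))
    return rows
```

With a query period of one day and data present only for the collection points from 00:00:00 up to 12:00:00, the function returns two rows, one marked Have Data and one marked No Data, as in the example in the text.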
Steps
1. In the NMS main window, select Maintenance > System Monitoring. The System
Monitoring window is displayed, see Figure 3-6.
Figure 3-6 System Monitoring Window
2. Select Application Server. Click View in the View Server Performance area. The
View Application Server Performance window is displayed.
3. Query the hard disk usage, see Figure 3-7.
Figure 3-7 HD Information Pane
Pay particular attention to the usage of the hard disks where the operating system,
the U31 R22, and the database are installed.
4. Close the View Application Server Performance window. The System Monitoring
window is displayed.
5. In the Monitor Server Performance area, click Configure to set whether to monitor
hard disk space, see Figure 3-8.
If HD Monitoring is selected, the hard disk monitoring threshold must be set as follows:
- The remaining space on disk C must be at least 5 GB or 10% of the total hard disk
space, whichever is larger.
- For the SUN server, the usage of a file system does not exceed 85%, and the usage of
the root file system does not exceed 70%.
- The remaining space of the dual RAID is not less than 10% of the entire hard drive
capacity.
- The remaining space of the disk where the server data is stored should be at least
5 GB, or no less than 10% of the entire hard drive capacity.
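The rule that the larger of 5 GB and 10% applies can be written out as follows. This is a minimal sketch; the function names are hypothetical, and 1 GB is taken as 1024^3 bytes, which is an assumption because the guide does not define the unit.

```python
GB = 1024 ** 3  # assumption: binary gigabytes


def required_free_bytes(total_bytes: int) -> int:
    """Return the larger of 5 GB and 10% of the total disk size."""
    return max(5 * GB, total_bytes // 10)


def disk_has_enough_space(total_bytes: int, free_bytes: int) -> bool:
    """Check the remaining space against the required minimum."""
    return free_bytes >= required_free_bytes(total_bytes)
```

For a 40 GB disk the minimum is 5 GB (10% would be only 4 GB), while for a 100 GB disk it is 10 GB. On a live system, the total and free values could be taken from shutil.disk_usage(path) in the Python standard library.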
ii. If the hard disk where the data files and log files are saved does not have enough
space, delete unnecessary files from the hard disk to release space. In addition,
perform the required database operations, such as backing up the alarm database and
performance database and deleting historical data.
iii. If the alarm persists, contact ZTE technical support for assistance.
End of Steps
Steps
1. In the NMS main window, select Maintenance > System Monitoring. The System
Monitoring window is displayed, see Figure 3-9.
Figure 3-9 System Monitoring Window
2. Select the corresponding database under Database, and click View in the View
Database Resource area. The View Database Resources window is displayed.
Query the database information.
Data Space Free Percent of each database must be larger than 5%, and the database
size must be normal.
3. Close the View Database Resources window. The System Monitoring window is
displayed.
4. In the Monitor Server Performance area, click Configure to set whether to monitor
the database space, see Figure 3-10.
5. (Optional) If the database space is insufficient, release or extend it. The following
operations use Microsoft SQL Server 2008 as an example.
a. Clear unused alarms and performance data.
i. In the NMS main window, select Maintenance > System Backup and
Restore. The System Backup and Restore window is displayed.
ii. Select Backup and Deletion Log Data, Backup and Deletion Alarm Data,
and Backup and Deletion PM Data in Backup for data backup and deletion.
The MSSQL database space is not freed after mass data is cleared. The
database must be shrunk to free the occupied space.
Caution!
Before shrinking the database, you should terminate the network management
services.
In the Windows operating system, select Microsoft SQL Server 2008 R2 >
SQL Server Management Studio. The Microsoft SQL Server Management
Studio window is displayed.
ii.
Click Connect.
iii. Select UEP4X_CAF_FM from the resource manager, see Figure 3-11.
iv. Right-click UEP4X_CAF_FM, and select Properties from the shortcut menu.
The Database Properties window is displayed.
v. Select Options in the left navigation tree, and change Recovery model in the
right pane to Simple, see Figure 3-12. Note the original recovery model so that it
can be restored later.
Figure 3-12 Database Properties Window
viii. The Shrink Database window is displayed. Click OK to shrink the database.
ix. After shrinking the database, right-click UEP4X_CAF_FM, and select
Properties from the shortcut menu. The Database Properties window is
displayed. Check the database remaining space.
ii. Right-click UEP4X, and select Properties from the shortcut menu. The
Database Properties window is displayed. Select Files, see Figure 3-15.
Set the Autogrowth column where File Type is Log by the same method.
v. Store the data files to other disks. Click Add in the Database Properties
window to add a row in the table, see Figure 3-17.
Figure 3-17 Database Properties Window
viii. Set Logical Name of the data file to uep4x_data2, see Figure 3-19.
Figure 3-19 Setting Logical Name
Set Logical Name of the data file to uep4x_data2.dbf, see Figure 3-20.
Click OK.
Figure 3-20 Setting Logical Name of a Database File
Steps
1. In the NMS main window, select Maintenance > Task Management. The Task
Management window is displayed.
2. Select a task from the left navigation tree, for example, PM Data Backup and Deletion
Task, see Figure 3-21.
Figure 3-21 PM Data Backup and Deletion Task
3. Click the button to query the task logs, and check whether the task is operating
properly.
4. Click the
If a backup task is operating improperly, verify that the disk space is sufficient and the
network connection is normal. If the fault persists, contact ZTE technical support for
assistance.
5. Check the backup file.
The backup path is displayed on the NMS window, see Figure 3-22.
Figure 3-22 Backup Path Area
The above figure shows that the performance data is backed up in the
\ums-server\rundata\backup\pmbak directory.
End of Steps
Note:
Both the virus database and the anti-virus engine must be updated.
Steps
1. Check whether the anti-virus software is installed and updated automatically.
2. Verify that the virus database version is the latest.
End of Steps
Steps
1. Perform alarm statistics and analysis on existing alarms.
a. In the NMS main window, select Fault > Alarm Monitoring by NE. The Alarm
Monitoring by NE window is displayed.
The NMS collects statistics on the number of alarms of each NE by alarm
level. Pay particular attention to critical alarms and alarms that are raised
frequently.
b. Double-click the number of alarms to query the actual number.
c. In the left navigation tree, double-click Alarm Monitoring by NE Type to query
the number of alarms of each NE type.
c. Click the Condition tab, and set the parameters on the Location, Alarm Code,
and Others tabs.
d. Click OK. The number of historical alarms of each type is displayed. Pay particular
attention to the alarms that are raised frequently.
e. In the left navigation tree, select Fault > History Alarm Busy-Time Statistics.
The History Alarm Busy-Time Statistics window is displayed.
"Busy-Time" means the time when the NMS is busy processing a service. You
should pay particular attention to the busy-time alarms.
History Alarm Busy-Time Statistics enables you to collect statistics only on the
historical alarms in busy hours. As with basic statistics, you can set the mean
duration of alarm statistics or the alarm occurrence frequency.
f. On the Basic tab, set Statistic Type, Effective Time, and View Setting, see
Figure 3-24.
g. Click the Condition tab, and set the parameters on the Location, Alarm Code,
and Others tabs.
h. Click OK. The alarm frequency of each period is displayed in the window. Pay
particular attention to the alarms that are raised frequently.
End of Steps
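If alarm records are exported (for example, as CSV from the Alarm Monitoring tab), similar frequency statistics can be reproduced offline. The sketch below assumes each record has been reduced to an alarm code and a time at which the alarm was raised; this record layout, the sample codes, and the function name are hypothetical and do not reflect the actual export format.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def alarm_frequency(alarms: Iterable[Tuple[str, datetime]]) -> Tuple[Counter, Counter]:
    """Count alarms per alarm code and per hour of the day."""
    by_code: Counter = Counter()
    by_hour: Counter = Counter()
    for code, raised_at in alarms:
        by_code[code] += 1
        by_hour[raised_at.hour] += 1
    return by_code, by_hour


# Hypothetical sample records: (alarm code, time raised).
sample = [
    ("LINK_DOWN", datetime(2013, 8, 20, 9, 5)),
    ("LINK_DOWN", datetime(2013, 8, 20, 9, 40)),
    ("CPU_HIGH", datetime(2013, 8, 20, 14, 0)),
]
by_code, by_hour = alarm_frequency(sample)
```

Calling by_code.most_common(1) then gives the most frequently raised alarm code, and by_hour.most_common(1) gives the hour with the most alarms.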
Steps
1. Use the following commands to check whether error, warning or failure records exist
in messages.
- # more /var/adm/messages | grep error
- # more /var/adm/messages | grep warning
- # more /var/adm/messages | grep fail
2. Use the following commands to check whether error, warning or failure records exist
in Syslog.
- # more /var/log/syslog | grep error
- # more /var/log/syslog | grep warning
- # more /var/log/syslog | grep fail
End of Steps
Result
If an error or failure is found in messages or Syslog, check the operation state of
corresponding hardware or software according to the position where the error or failure
occurs.
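The same check can be scripted so that both log files are scanned in one pass. The following is a minimal sketch, not a replacement for the commands above. Note one deliberate difference: it matches the keywords case-insensitively, whereas grep without options is case-sensitive. The function name scan_log is hypothetical.

```python
from collections import Counter

KEYWORDS = ("error", "warning", "fail")


def scan_log(path: str) -> Counter:
    """Count the lines of a log file that contain each keyword (case-insensitive)."""
    counts: Counter = Counter()
    with open(path, encoding="utf-8", errors="replace") as log_file:
        for line in log_file:
            lowered = line.lower()
            for keyword in KEYWORDS:
                if keyword in lowered:
                    counts[keyword] += 1
    return counts
```

For example, scan_log("/var/adm/messages") returns the number of matching lines per keyword. A non-zero count only indicates that the corresponding lines should be reviewed as described in the result below the steps.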
U31 R22 License files control U31 R22 functions and manage the U31 R22 network scale.
If the network scale is larger than the License management scale, some NEs cannot be
managed.
Steps
1. In the NMS main window, select Help > License Information > Show License. The
License Information window is displayed. Query the License information.
The License information includes the U31 R22 functions and management scale
information. Different configurations are displayed based on different License files.
2. Query the current network status and network scale. If the current network scale is
larger than or close to the License management scale, apply for a new License file to
replace the old one.
End of Steps
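The comparison in step 2 can be sketched as a simple function. The guide does not define how close is "close", so the 90% ratio used here is purely an assumption, and the function name and return values are hypothetical.

```python
def license_status(network_scale: int, licensed_scale: int,
                   close_ratio: float = 0.9) -> str:
    """Compare the current network scale with the License management scale.

    close_ratio is an assumed threshold; the guide does not give a value.
    """
    if licensed_scale <= 0:
        raise ValueError("licensed_scale must be positive")
    if network_scale > licensed_scale:
        return "exceeded"  # some NEs cannot be managed
    if network_scale >= close_ratio * licensed_scale:
        return "close"     # apply for a new License file
    return "ok"
```

With a licensed scale of 1000, a network scale of 950 is reported as close, and 1200 as exceeded.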
Chapter 4
Monthly Maintenance
Table of Contents
Checking the Equipment Room Environment
Checking Log Information
Checking the Remote Maintenance Tool
Steps
1. Check the values of the thermometer and hygrometer.
For the temperature and humidity requirements, refer to the following table.
Item                 Long-term operation    Short-term operation
Temperature          5 to 40                -5 to 50
Relative humidity    5% to 85%              5% to 90%
Measure the temperature at a position 1.5 m above the floor and 0.4 m in front of the rack without
front and rear panels.
The short-term operating condition means that the continuous operating period does not exceed 96
hours and the accumulative total period within a year does not exceed 15 days.
If the equipment room temperature does not meet requirements, repair or replace the
air-conditioning system in the equipment room.
If the relative humidity in the equipment room is high, install dehumidification facilities.
If the relative humidity is low, install humidifying facilities.
Verify that there is no sewer pipeline (especially no pipeline connector) passing through
the equipment room.
2. Check the power distribution cabinet, cabinet, shelves, cables, wiring troughs and
other key components.
Fire prevention: All components should be protected against fire, and all fire-fighting
facilities in the equipment room should be in good condition.
Dust prevention: All components should be clean and tidy without apparent dust
attached.
End of Steps
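The temperature and humidity limits in step 1 can be checked with a short function. This sketch rests on two assumptions that the text does not state explicitly: the narrower ranges (5 to 40, 5% to 85%) are taken as the long-term condition and the wider ranges (-5 to 50, 5% to 90%) as the short-term condition, and temperatures are taken to be in degrees Celsius. All names are hypothetical.

```python
from typing import List

# Assumed mapping of the table values to operating conditions (see lead-in).
LIMITS = {
    "long_term": {"temperature": (5, 40), "humidity": (5, 85)},
    "short_term": {"temperature": (-5, 50), "humidity": (5, 90)},
}


def check_environment(temperature: float, humidity: float,
                      condition: str = "long_term") -> List[str]:
    """Return the corrective actions from the guide for out-of-range readings."""
    t_low, t_high = LIMITS[condition]["temperature"]
    h_low, h_high = LIMITS[condition]["humidity"]
    findings = []
    if not t_low <= temperature <= t_high:
        findings.append("temperature out of range: repair or replace the air-conditioning system")
    if humidity > h_high:
        findings.append("humidity too high: install dehumidification facilities")
    elif humidity < h_low:
        findings.append("humidity too low: install humidifying facilities")
    return findings


def qualifies_as_short_term(continuous_hours: float, days_this_year: float) -> bool:
    """Short-term: at most 96 continuous hours and at most 15 days in a year."""
    return continuous_hours <= 96 and days_this_year <= 15
```

A reading of 45 with 50% humidity fails the long-term check but passes the short-term one, provided the short-term duration limits are also met.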
Steps
1. In the NMS main window, select Security > Log Management. The Log
Management window is displayed.
2. Double-click a log type, for example, All System Log, and check whether any
information occurs frequently and whether there is a large amount of error information
or operation failure information, see Figure 4-1.
Figure 4-1 Log Management
End of Steps
Steps
1. Check whether remote maintenance software, such as TeamViewer or VNC, has been
installed on the maintenance terminal of the U31 R22. Check whether
the maintenance personnel can log in to the remote client or server.
2. Check whether a dedicated line is available in the equipment room for remote
maintenance.
End of Steps
Chapter 5
Annual Maintenance
Table of Contents
Checking the Cabinet and Cables
Removing Dust
Steps
1. Verify that all power cables, grounding cables and signal cables inside the cabinet are
in good condition without any defect such as damage, aging, corrosion or flash burn.
2. Verify that the cable labels are complete and correct.
3. Verify that there is no foreign matter inside or on top of the cabinet.
4. Verify that the rodent-resistant nets at the cable outlets are secured without damage.
End of Steps
There is no dust around the cabinet shell and air inlets of the cabinet. Ensure that the
air inlets of the cabinet are clean and the airflow path is unblocked.
The dust screen and the frame are clean without dust.
The fan shelf is clean without dust.
Note:
Antistatic measures must be taken, such as using an antistatic platform and wearing
antistatic clothing and antistatic wrist straps.
To reduce risks, it is recommended that maintenance personnel remove dust under the
guidance of ZTE technical support engineers. The dust removal operation should be
performed when the traffic is low (for example, between 2:00 and 4:00).
Steps
1. Perform the following steps to clean the dust screen:
i. Disassemble the dust screen from the cabinet, and clean it with water. Dry the
dust screen, and then reinstall it onto the cabinet.
ii. Wipe the cabinet shell with a clean and dry cotton cloth.
iii. Use a vacuum cleaner to remove dust around the air inlets of the cabinet.
2. Perform the following steps to clean the fan shelf:
i. Use a clean cotton cloth, an antistatic soft brush and a vacuum cleaner to remove
dust from fan blades and circuit boards of the backup fan shelf.
ii. Use the backup fan shelf to replace a fan shelf inside the cabinet.
iii. Remove dust from the replaced fan shelf in the same way. The replaced fan shelf
can be used as the backup fan shelf.
iv. Repeat step i. through step iii. to replace other fan shelves inside the cabinet, and
remove dust from these fan shelves.
3. Remove dust from the air conditioner by complying with air conditioner manuals.
End of Steps
Figures
Figure 1-1 Routine Maintenance Flow
Figure 2-1 Legend
Figure 2-2 Alarm Monitoring Tab
Figure 2-3 System Monitoring Window
Figure 2-4 Detailed Information Pane
Figure 3-1 Measurement Task Management Tab
Figure 3-2 Object Selection Tab
Figure 3-3 Time Selection Tab
Figure 3-4 Object Selection Tab
Figure 3-5 Location Selection Tab
Figure 3-6 System Monitoring Window
Figure 3-7 HD Information Pane
Figure 3-8 Monitor Server Performance Area
Figure 3-9 System Monitoring Window
Figure 3-10 Monitoring Item Selection Window
Figure 3-11 Object Explorer Source Manager
Figure 3-12 Database Properties Window
Figure 3-13 Shrinking the Database
Figure 3-14 UEP4X
Figure 3-15 Files
Figure 3-16 Change Autogrowth for UEP4X
Figure 3-17 Database Properties Window
Figure 3-18 Change Autogrowth Dialog Box
Figure 3-19 Setting Logical Name
Figure 3-20 Setting Logical Name of a Database File
Figure 3-21 PM Data Backup and Deletion Task
Figure 3-22 Backup Path Area
Figure 3-23 Basic Tab
Figure 3-24 Historical Alarm Busy-Time Statistics
Figure 3-25 History Performance Data Query
Figure 4-1 Log Management
Figure 4-2 Log Detail
Glossary
NMS - Network Management Server
RAID - Redundant Array of Independent Disks
SNMP - Simple Network Management Protocol