WSMS LLD Final Confirmed
Telecom Egypt
Rev PA1
Contents
1 Introduction
2 Network Architecture
3 Dimensioning
3.1 Hardware Specifications
3.1.1 SMS Welcome Cluster
3.1.2 Storage Capacity Requirement
3.2 High availability
3.2.1 SMS Welcome High Availability
3.2.2 Cluster status
3.2.3 SMS Welcome Cluster switch over
3.2.4 Manual switch
3.2.5 Logical connectivity
3.3 Network Planning
4 Integrations
4.1 UPG Integration
4.2 BSS Integration
5 Call Flow
6 Backup Procedure
6.1 Backup routine
6.2 Performance
6.3 Backup Types
6.4 Backup Retention Period
6.5 Backup Tool
6.6 Recoverability
6.7 Restoration Time
7 Recovery Procedure
8 Solution Software
8.1 SMS Welcome applications
9 Fault Management
1 Introduction
This document describes the SMS Welcome low‐level design including the solution architecture,
hardware (HW) specifications and dimensioning, network connectivity, Invigo requirements,
modules and software list installed on the SMS Welcome servers.
2 Network Architecture
Invigo's SMS Welcome solution is composed of two clusters: Capturing servers, responsible for
detecting roamers' traffic, and Application servers, responsible for sending the Welcome SMS and
tariff info. High availability is achieved with a fully redundant configuration to minimize service
downtime. The solution architecture is illustrated in the diagram below:
3 Dimensioning
3.1 Hardware Specifications
The SMS Welcome solution will be hosted on 4 physical servers at Telecom Egypt premises.
Automatic switchover will be ensured using Red Hat cluster.
3.1.1 SMS Welcome Cluster
The SMS Welcome cluster is connected to Telecom Egypt/Ericsson networks. Below is the
corresponding information based on HW standard specification:
Capturing Servers dimensioned to process 40 TPS:
HP Proliant DL380p Gen9
Two Intel Xeon Processor E5‐2630v4 10‐Core processors
16 GB RAM
4xHP 300GB Hard Disks
Redundant Power Supply
RHEL 7.5
RHEL Cluster Suite for High Availability
Applications Servers:
HP Proliant DL380p Gen9
Two Intel Xeon Processor E5‐2630v4 10‐Core processors
32 GB RAM
6xHP 300GB Hard Disks
Redundant Power Supply
RHEL 7.5
RHEL Cluster Suite for High Availability
3.1.2 Storage Capacity Requirement
The total size of the Capturing Server storage is around 600 GB (4x300GB disks after RAID 1
configuration) per server, and the total size of the Applications Server storage is around 900 GB
(6x300GB disks after RAID 1 configuration) per server.
The available disk space will be used for:
PostgreSQL Database datafiles
SMS Welcome applications
SMS Welcome CDR and debug log files
SMS Welcome archiving (database dump, daily CDR archiving, etc…)
Subscribers history for the last 6 months
3.2 High availability
The SMS Welcome is implemented in a highly available fully redundant configuration to minimize
service downtime.
3.2.1 SMS Welcome High Availability
Red Hat Cluster Suite packages are used to configure the Active/Standby cluster.
The cluster automatically moves the SMS Welcome service from the active node to the
standby node when a failure is detected. The cluster configuration monitors the
following cases:
1. The VAS network is no longer reachable.
2. Network connectivity (heartbeats) between the two servers is lost.
VAS network
One of the cluster's configured resources is responsible for detecting network connectivity
failures: a ping command is issued every 1 minute from the active server to the Traffic IP gateway to
make sure that the VAS network is still accessible from that node. If the ping receives no reply,
the cluster automatically switches the SMS Welcome service to the standby node.
Heartbeat
The heartbeat network is a private network that is shared only by the cluster nodes and is not
accessible from outside the cluster. The cluster uses it to exchange tokens between its
nodes in order to monitor their status and ensure correct communication between them.
When the cluster detects that the token from the active node, where the service is running, has been
lost for 20 seconds (active node no longer responding over the heartbeat IP), it stops the SMS
Welcome service on that node and fences it through the ILO interface to protect the data, then
relocates the SMS Welcome service to the standby node. This is only possible when the standby
node is able to reach the active node's ILO interface from its Traffic IP.
Resources relocation
VAS IPs will be added to the Red Hat cluster as a service resource that should be available only on
the active node.
When the cluster starts the service on server 1, it first stops the service on server 2 (applications
and database, and shuts down the VAS interface), then starts the service on server 1 and brings up
the VAS interface on that node. When the cluster needs to switch from one node to another
(e.g. power loss on the active node), the heartbeat exchanged by the cluster between both servers is
lost. The cluster manager therefore detects that node 1 (active node) has become unreachable,
relocates the service (application, database…) to node 2 (standby node) and brings up
the VAS interface there.
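The failover rules described in this section can be summarised as a small decision function. The following is only an illustrative sketch, not the Red Hat cluster implementation: the function name, the `Action` values and the inputs are assumptions introduced here, while the 1 minute ping interval and the 20 second token timeout are taken from the text above.

```python
from enum import Enum


class Action(Enum):
    STAY = "keep service on active node"
    SWITCH = "switch service to standby node"
    FENCE_AND_SWITCH = "fence active node through ILO, then switch"
    BLOCKED = "cannot fence: active node ILO unreachable from standby"


PING_INTERVAL_S = 60   # gateway ping period stated in this section
TOKEN_TIMEOUT_S = 20   # heartbeat token loss threshold stated in this section


def decide(gateway_ping_ok, seconds_without_token, ilo_reachable):
    """Return the action the cluster is described as taking (hypothetical model)."""
    # Heartbeat loss is checked first: it requires fencing before relocation.
    if seconds_without_token >= TOKEN_TIMEOUT_S:
        return Action.FENCE_AND_SWITCH if ilo_reachable else Action.BLOCKED
    # VAS network check: no reply from the Traffic IP gateway.
    if not gateway_ping_ok:
        return Action.SWITCH
    return Action.STAY
```

The `BLOCKED` case reflects the condition that relocation after a heartbeat loss depends on the standby node reaching the active node's ILO interface.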
3.2.2 Cluster status
The “pcs cluster status” command can be executed as the root user to check the cluster status and
identify the active server.
The cluster log messages can be viewed in the /var/log/messages system file.
3.2.3 SMS Welcome Cluster switch over
The cluster will automatically switch to the standby server in the following cases:
The VAS interface goes down. This interface handles all the traffic with Telecom Egypt
nodes.
Power failure. If the active server goes down, the heartbeat will be lost, and the cluster
manager will relocate the service to the standby node.
3.2.4 Manual switch
To manually switch and relocate the service to the standby server, the following command shall
be executed as the root user.
If service is currently running on SMSWelcome1 and must be switched to SMSWelcome2:
pcs resource move smswservice hbsmsw2
If service is currently running on SMSWelcome2 and must be switched to SMSWelcome1:
pcs resource move smswservice hbsmsw1
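As a minimal sketch, the choice between the two commands above can be expressed as a lookup. The resource name and heartbeat node names come from the commands in this section; the mapping of SMSWelcome1 to hbsmsw1 and SMSWelcome2 to hbsmsw2, and the helper name `move_command`, are assumptions made for illustration.

```python
# Assumed mapping between server names and cluster (heartbeat) node names.
NODES = {"SMSWelcome1": "hbsmsw1", "SMSWelcome2": "hbsmsw2"}
RESOURCE = "smswservice"


def move_command(current_server):
    """Build the pcs command that moves the service away from current_server."""
    if current_server not in NODES:
        raise ValueError(f"unknown server: {current_server}")
    # In a two-node cluster the target is simply the other node.
    target = next(node for server, node in NODES.items() if server != current_server)
    return ["pcs", "resource", "move", RESOURCE, target]


if __name__ == "__main__":
    print(" ".join(move_command("SMSWelcome1")))
```

Note that `pcs resource move` works by adding a location constraint for the resource; on a typical pcs installation that constraint is removed afterwards with `pcs resource clear smswservice`, which should be confirmed against the deployed pcs version.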
3.2.5 Logical connectivity
4 ILO IPs are used for management access:
Reset the server (in case the server does not respond anymore via the normal network
card), power‐up the server (even if the server is shut down), remote console and
mount remote physical CD/DVD drive or image (license required)
1 VAS IP is used to connect to SMSC/NTP/SMTP/SIEM/EMM/USSD/Tariff Info DB
1 VAS IP is used to connect to UPG/NTP/SMTP/SIEM/EMM
1 VAS IP is used to connect to BSS
Heartbeat IPs are used by the cluster configuration to monitor the servers’ availability.
O&M IPs are used to access all the servers for maintenance.
Bonding mode 1 is used (active/backup) where only one slave in the bond is active. A different slave
becomes active if the active slave fails.
Note: The O&M IPs are the only IPs that will be constantly present on all servers. All other IPs will
only be up on the active server.
3.3 Network Planning
INVIGO SMS Welcome IP Connectivity Matrix

| Source Node | Source IP | Source Port | Destination Node | Destination IP | Destination Port | Protocol | Traffic Details | Direction |
|---|---|---|---|---|---|---|---|---|
| Capturing servers | 10.99.49.160/28, 10.99.55.32/28 | TBD | UPG | xxx.xxx.xxx.xxx | TBD | SOAP | Triggers | Both |
| Capturing servers | 10.99.49.128/27 | any | SNMP server | 10.98.36.4, 10.98.36.5, 10.98.36.7, 10.98.36.8, 10.98.36.9, 10.98.36.16 | SNMP traps (udp 3302), (udp 3307) | SNMP | send alarms | Both |
| Capturing servers | 10.99.49.128/27 | any | NTP server | 172.20.24.20 | 123 | NTP | time synchronization | Both |
| TE's O&M/Invigo Gateway | xxx.xxx.xxx.xxx | any | Capturing servers | 10.99.49.128/27 | 22 | SSH | Monitoring and Support access | Both |
| Applications servers | 10.99.49.160/28, 10.99.55.32/28 | any | SMS-C testbed | 172.23.0.174, 172.23.0.177 | [9500-9510] | SMPP | Send/receive SMS | Both |
| Applications servers | 10.99.49.160/28, 10.99.55.32/28 | any | SMS-C | 172.23.0.165, 172.23.0.166, 172.23.0.167, 172.23.0.168, 172.23.0.169, 172.23.0.170, 172.23.0.171, 172.23.0.179, 172.22.0.165, 172.22.0.166, 172.22.0.167, 172.22.0.168, 172.22.0.169, 172.22.0.171, 172.23.0.179, 172.22.0.172, 172.22.0.173, 172.22.0.174, 172.23.0.180, 172.22.0.180, 172.22.0.181 | [9500-9510] | SMPP | Send/receive SMS | Both |
| TE's Customer Care/Admin/Marketing | xxx.xxx.xxx.xxx | any | Applications servers | 10.99.49.160/28, 10.99.55.32/28 | 443 | HTTPS | Web access | Both |
| Applications servers | 10.99.49.128/27 | any | SNMP server | 10.98.36.4, 10.98.36.5, 10.98.36.7, 10.98.36.8, 10.98.36.9, 10.98.36.16 | 10161, 10162 | SNMP | send alarms | Both |
| Applications servers | 10.99.49.128/27 | any | NTP server | 172.20.24.20 | 123 | NTP | time synchronization | Both |
| TE's O&M/Invigo Gateway | xxx.xxx.xxx.xxx | any | Applications servers | 10.99.49.128/27 | 22 | SSH | Monitoring and Support access | Both |
| Applications servers | 10.99.49.160/28, 10.99.55.32/28 | any | BSS-testbed | 10.20.129.109 | (17800-17815) | SOAP | getting customer info | Both |
| Applications servers | 10.99.49.160/28, 10.99.55.32/28 | any | BSS | 10.20.129.60 | (17800-17815) | SOAP | getting customer info | Both |
| EMM | 10.241.129.0/27, 10.241.129.64/27 | any | Capturing Server 1 (Probe 1), Capturing Server 2 (Probe 2), Application Server 1 (SMSW1), Application Server 2 (SMSW2) | 10.99.49.132, 10.99.49.133, 10.99.49.134, 10.99.49.135 | 21, 22 | sftp | collecting CDRs | Both |
| EMM | 10.241.73.64/27, 10.241.73.128/27 | any | Capturing Server 1 (Probe 1), Capturing Server 2 (Probe 2), Application Server 1 (SMSW1), Application Server 2 (SMSW2) | 10.99.49.132, 10.99.49.133, 10.99.49.134, 10.99.49.135 | 21, 22 | sftp | collecting CDRs | Both |
| Application servers | 10.99.49.134, 10.99.49.135 | any | SIEM | 10.99.10.10 | 514 | sftp | Security auditing | Both |
| Capturing servers | 10.99.49.132, 10.99.49.133 | any | SIEM | 10.99.10.10 | 514 | sftp | Security auditing | Both |
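A connectivity matrix like this one can be encoded as data and queried, for example when reviewing firewall requests. The sketch below is an assumption-laden illustration: the `Flow` type and `match_flow` helper are hypothetical names, and only three rows of the matrix (NTP, BSS and SMS-C testbed) are encoded.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Flow:
    source_nets: tuple   # source subnets from the matrix
    dest_nets: tuple     # destination hosts, written as /32 networks
    ports: range         # destination port range
    protocol: str


# Three rows copied from the matrix above; the remaining rows are omitted.
FLOWS = (
    Flow(("10.99.49.128/27",), ("172.20.24.20/32",), range(123, 124), "NTP"),
    Flow(("10.99.49.160/28", "10.99.55.32/28"), ("10.20.129.60/32",),
         range(17800, 17816), "SOAP"),
    Flow(("10.99.49.160/28", "10.99.55.32/28"),
         ("172.23.0.174/32", "172.23.0.177/32"), range(9500, 9511), "SMPP"),
)


def _in_any(address, networks):
    addr = ip_address(address)
    return any(addr in ip_network(net) for net in networks)


def match_flow(src, dst, port):
    """Return the protocol of the first matrix row covering this flow, else None."""
    for flow in FLOWS:
        if (_in_any(src, flow.source_nets)
                and _in_any(dst, flow.dest_nets)
                and port in flow.ports):
            return flow.protocol
    return None
```

For instance, `match_flow("10.99.49.161", "10.20.129.60", 17800)` matches the BSS row, while a port outside 17800-17815 returns `None`.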
4 Integrations
4.1 UPG Integration
The WSMS platform will be integrated with the UPG (i.e. a Customer Adaptation (CA) on top of
UPG) over the SOAP protocol, to cascade the trigger already received from the CUDB when the VLR
of a TE roaming subscriber changes.
The UPG applies the following logic to the trigger received from the CUDB:

| Old VLR Prefix | New VLR Prefix | UPG Action |
|---|---|---|
| +20 | +20 | No notification to WSMS platform |
| Any other combination | Any other combination | Notification to WSMS platform |

The SOAP notification from the UPG towards the WSMS platform will include the
below parameters:
Tag identifying this is VLR change notification
MSISDN
Old/New VLR
APN‐id
IMSI
IMEISV
Pdpcp‐id
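The UPG notification rule above reduces to a single predicate. This is a minimal sketch under assumptions: VLR addresses are taken to be strings in international format with an optional leading "+", and the function names are illustrative rather than part of the UPG Customer Adaptation.

```python
HOME_PREFIX = "20"  # VLR prefix +20 from the rule table above


def is_home_vlr(vlr):
    """True when the VLR address starts with the +20 prefix."""
    return vlr.strip().lstrip("+").startswith(HOME_PREFIX)


def should_notify_wsms(old_vlr, new_vlr):
    """Notify WSMS unless both the old and the new VLR carry the +20 prefix."""
    return not (is_home_vlr(old_vlr) and is_home_vlr(new_vlr))
```

With this rule a move from a +20 VLR to a foreign VLR triggers a notification, and so does the return from a foreign VLR to a +20 VLR.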
4.2 BSS Integration
The WSMS platform will be integrated with the BSS node in order to retrieve the subscriber profile
information over SOAP protocol. The WSMS will use the function “recivecustomerinfo” and will
parse the BSS response to extract the paid‐mode, rate‐plan and add‐on values for every
subscriber. The info will be stored in a specific table and displayed on the GUI where the user can
update any field.
In case the SMSW module is not able to fetch the subscriber’s profile from the BSS, a default pre‐
configured SMS will be sent to the roamer.
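The parsing and fallback behaviour described above could look roughly like the following. The BSS response schema is not given in this document, so the element names used here (paid-mode, rate-plan, add-on as XML tags) and the function name `extract_profile` are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names; the real BSS schema is not described here.
PROFILE_FIELDS = ("paid-mode", "rate-plan", "add-on")


def extract_profile(response_xml):
    """Return a dict with the three profile values, or None if any is missing."""
    try:
        root = ET.fromstring(response_xml)
    except ET.ParseError:
        return None
    profile = {}
    for field in PROFILE_FIELDS:
        # Compare local tag names so that XML namespaces are ignored.
        node = next(
            (el for el in root.iter() if el.tag.rsplit("}", 1)[-1] == field),
            None,
        )
        if node is None or not (node.text or "").strip():
            return None
        profile[field] = node.text.strip()
    return profile
```

A caller would send the default pre-configured SMS whenever `extract_profile` returns `None`, which covers both an unparseable response and a response with missing fields.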
5 Call Flow
1. The UPG node sends a SOAP trigger towards the WSMS platform containing the following
parameters: tag identifying this as a VLR change notification, MSISDN, Old/New VLR, APN-id, IMSI, IMEISV, Pdpcp-id.
2. The SMS Welcome server parses the SOAP request received from the UPG node.
3. The SMS Welcome server sends ReceiveCustomerInfo towards the BSS to get the following
parameters: paid-mode, rate-plan, add-on.
4. The SMS Welcome server performs internal logic based on configured rules to prepare the SMS.
5. The SMS Welcome server sends a Bon Voyage SMS via SMPP through the operator
SMSC. The Bon Voyage and information SMS are delivered to the outbound roamer.
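The call flow can be read as a short pipeline. The sketch below models steps 3 to 5 with injected callables so that it stays independent of SOAP and SMPP libraries; all names (`handle_trigger`, `fetch_profile`, `build_sms`, `send_sms`) and the dictionary layout of the trigger are hypothetical, and the default-SMS fallback follows the BSS integration section.

```python
def handle_trigger(trigger, fetch_profile, build_sms, send_sms, default_sms):
    """Process one parsed UPG trigger and return the SMS text that was sent.

    trigger       -- dict holding the parsed SOAP parameters (step 2 output)
    fetch_profile -- callable(msisdn) returning a profile dict or None (step 3)
    build_sms     -- callable(trigger, profile) returning SMS text (step 4)
    send_sms      -- callable(msisdn, text) submitting over SMPP (step 5)
    """
    msisdn = trigger["MSISDN"]
    profile = fetch_profile(msisdn)
    # Fall back to the pre-configured text when the BSS profile is unavailable.
    text = build_sms(trigger, profile) if profile else default_sms
    send_sms(msisdn, text)
    return text


if __name__ == "__main__":
    sent = []
    handle_trigger(
        {"MSISDN": "201000000000", "new_vlr": "+33612345678"},
        fetch_profile=lambda msisdn: {"paid-mode": "prepaid"},
        build_sms=lambda trig, prof: f"Bon Voyage ({prof['paid-mode']})",
        send_sms=lambda msisdn, text: sent.append((msisdn, text)),
        default_sms="Bon Voyage",
    )
    print(sent)
```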
6 Backup Procedure
6.2 Performance
The duration needed to run the full backup daily is less than 1 hour. It will run each day after
midnight, after analyzing the previous day’s logs (for statistical reasons).
6.3 Backup Types
The backup taken each day is a full backup rather than one containing only the updates. As a
result, the backup size grows from day to day.
6.4 Backup Retention Period
All the backups will be kept for 15 days, except for the CDRs, which will be kept for X days (the
number of days is configurable) based on the available storage.
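A retention rule of this kind can be sketched as a filter over dated backup file names. This is not the Invigo backup script: it assumes file names of the form PREFIX_YYYYMMDD.ext (such as exp_20150618.dmp.gz), and the function name `expired_backups` is introduced only for illustration. The `keep_days` parameter stands in for both the 15-day period and the configurable CDR period.

```python
import re
from datetime import date, datetime, timedelta

# Assumed naming pattern: a prefix, an underscore, then an 8-digit date.
NAME_RE = re.compile(r"^[A-Za-z_]+_(\d{8})\.")


def expired_backups(names, today, keep_days=15):
    """Return the file names whose embedded date is older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in names:
        match = NAME_RE.match(name)
        if not match:
            continue  # not a dated backup file, leave it alone
        try:
            stamp = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a valid date
        if stamp < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    files = ["exp_20150601.dmp.gz", "exp_20150618.dmp.gz", "notes.txt"]
    print(expired_backups(files, date(2015, 6, 20)))
```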
6.5 Backup Tool
The application responsible for the backup is a proprietary Invigo script.
6.6 Recoverability
In case of data corruption in the DB, the exported dumps can be imported, and CDRs can be
analyzed to apply needed changes that took place on the DB after taking the export.
If connectivity is lost due to changes in routing info or Ethernet interfaces configurations (maybe
due to a server reboot), the working configurations can be extracted from the backup of IP routes
or Ethernet interfaces.
The application and architecture backups can be used in case a modification took place causing
unwanted behavior. Older running versions can be extracted to replace the new ones.
7 Recovery Procedure
An initial cold backup will be taken directly after the SMS Welcome installation is completed. This
cold backup shall be copied to a safe location to be used if needed (e.g. complete loss of
data).
If data is lost on day D (at any time), the recovery process allows all data to be recovered up to
day D-1 at 23:59:59.
Depending on the issue faced, the database, applications and configuration files can be recovered
from the daily backup files. Standard PostgreSQL backup/restore commands are used for the
backup and restore.
Any application/page can be restored from the backup by extracting the archive with tar (tar xvf).
Database recovery can be done by:
1- Untar the export: tar zxf exp_yyyymmdd.dmp.gz
2- Access the db: psql smsw -U smsw
3- Execute \i exp_yyyymmdd.dmp
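The recovery point and the three restore steps above can be written down as two small helpers. This is a sketch only: the function names are hypothetical, and which export file corresponds to a given day is left to the caller because the text does not state how the file date relates to the backed-up day.

```python
from datetime import datetime, time, timedelta


def recovery_point(loss_time):
    """Latest recoverable instant for a loss on day D: day D-1 at 23:59:59."""
    previous_day = loss_time.date() - timedelta(days=1)
    return datetime.combine(previous_day, time(23, 59, 59))


def restore_steps(stamp):
    """Return the three documented restore commands for export date yyyymmdd."""
    dump = f"exp_{stamp}.dmp"
    return [
        f"tar zxf {dump}.gz",      # 1- untar the export
        "psql smsw -U smsw",       # 2- access the db
        f"\\i {dump}",             # 3- executed inside the psql session
    ]


if __name__ == "__main__":
    print(recovery_point(datetime(2015, 6, 18, 14, 30)))
    for step in restore_steps("20150618"):
        print(step)
```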
8 Solution Software
8.1 SMS Welcome applications
The SMS Welcome applications are responsible for handling the triggers and creating subscriber
accounts. Below is a list of applications that might be updated during installation:
Synapse receives roamers' traffic through the integration with the UPG node and sends it to
ocean2db
ocean2db inserts roamers' records into the WSMS DB
smswelcome_o reads outbound roamers' traffic records inserted into the DB, updates the
roaming tables including all information related to the roamer (IMSI, MSISDN, date of detection,
home country, visited country, visited VLR, etc…), and decides which Bon Voyage and Tariff messages
to send.
Db2sms reads SMS messages from the SMS Sending table and submits these messages to the
SMSC over SMPP for delivery to the roamer. It also receives delivery reports from the SMSC
stats_o reads the smswelcome_o CDRs and updates the outbound roamers statistics tables
call_stats_processes generates SMS Welcome daily statistics
BroadcastSend is responsible for generating the SMS to be sent for the targeted bulk campaign.
A start date for the campaign must be configured when uploading the list of MSISDN to which
we need to send an SMS.
BroadcastUpdate is responsible for updating the bulk campaign’s statistics.
Eventmanager is responsible for sending the generated alarms using the SMTP protocol. It also
displays the alarms on the web GUI page “SMS Welcome Alarms”
sysmon is responsible for continuous monitoring of the SMS Welcome system and generating
performance statistics every 5 minutes (CPU, memory, disk, and Ethernet interface usage)
Monitor is responsible for continuous monitoring of the applications. If an application has
stopped, it will attempt to restart it
Dailymgmt module will run daily to take a backup of today's LOG and CDR files and a database
dump, and to clear old history (>30 days). It uses tar and gzip commands to store yesterday's backups
under the backup directory.
LOG_20150618.tar.gz (LOGs generated by all modules are compressed)
CDR_20150618.tar.gz (CDRs generated by all modules are compressed)
exp_20150618.dmp.gz (database export)
remote_backup module will run daily to take a backup of the folder architecture,
applications, configurations, network interfaces/routes, and crontab entries, then clear old
history (>30 days) and send the backups to a remote server over SFTP. It uses tar and gzip
commands to store yesterday's backups under the backup directory.
folder_arch_20170712.tar.gz: folders architecture on the server
eth_20170712.tar.gz: network configurations on the server
routes_20170712.tar.gz: routing tables on the server
crontab_20170712.tar.gz: list of all cronjobs configured on the server
apps_20170712.tar.gz: all Perl, Shell and C applications
conf_20170712.tar.gz: configuration parameters used by the applications
htdocs_20170712.tar.gz: all php pages
9 Fault Management
The SMS Welcome server can be connected to the Telecom Egypt SNMP server. The SMS Welcome
SNMP agent is a Perl script (EventManager.pl) that can send SNMP traps (version 3) to the SNMP
manager, notifying it about errors generated on the system.
Alarms will also be sent to the Invigo Support team and TE operations team by Email over the SMTP
protocol.
The SMS Welcome errors have different error codes and severity levels.
Alarms include:
SMS Welcome Applications Error logs (e.g. connection loss)
SMS Welcome CDRs not updated
Disk Space Usage
High CPU load & Memory usage
SMS Welcome tables queuing (e.g. SMS queue waiting to be delivered to the
subscribers)
Server reboot
Apache Error Logs
Moreover, a script “sysmon.sh” generates performance stats every 5 minutes, logged in the CDR file.
Stats include information about:
Used and Free memory
CPU load
Disk space usage
Ethernet connections
SMS throughput
Used licenses
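To give an idea of what one periodic sample might contain, the following sketch collects two of the listed figures (disk space usage and CPU load) with the Python standard library. It is a stand-in for illustration and not the sysmon.sh script: the function name, field names and output layout are assumptions, and memory, Ethernet, SMS throughput and license figures are left out.

```python
import os
import shutil
import time

SAMPLE_INTERVAL_S = 300  # stats are generated every 5 minutes


def sample_system(path="/"):
    """Collect one snapshot of disk usage and 1-minute CPU load average."""
    usage = shutil.disk_usage(path)
    used_pct = round(100 * usage.used / usage.total, 1) if usage.total else 0.0
    try:
        load_1min = os.getloadavg()[0]
    except (AttributeError, OSError):
        load_1min = None  # load average is not available on this platform
    return {
        "timestamp": int(time.time()),
        "disk_used_pct": used_pct,
        "load_1min": load_1min,
    }


if __name__ == "__main__":
    print(sample_system())
```

A scheduler such as cron would call this once per interval and append the result to the statistics file.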