ISE503 Project Report
By:
Muhammad Dzulqarnain Al Firdausi
g202210120
Supervised by:
Dr. Syed Mujahid
1. Introduction
This term project reviews the algorithm proposed by Zeng et al. (2008) for fast training of support vector machines (SVMs) using parallel sequential minimal optimization (SMO). Training an SVM amounts to solving a quadratic programming (QP) problem subject to linear constraints. With the massive data sets available today, training an SVM on all available data requires storing the kernel matrix, whose memory footprint grows quadratically, O(N²), where N is the number of training points. Zeng et al. (2008) parallelized SMO on a Symmetric Multiprocessor (SMP) machine and obtained substantial speed-ups on problems with large training sets and high-dimensional input spaces without reducing the generalization performance of the SVM.
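To make the quadratic memory growth concrete, the short sketch below (my own illustration, not part of the reviewed paper) estimates the size of a dense N × N kernel matrix under the assumption that each entry is stored as a 64-bit float.

```python
# Illustration of the O(N^2) kernel-matrix memory requirement mentioned above.
# Assumption: each kernel entry is stored as a 64-bit (8-byte) float.
def kernel_matrix_gib(n_points: int, bytes_per_entry: int = 8) -> float:
    """Size of a dense n_points x n_points kernel matrix in GiB."""
    return n_points ** 2 * bytes_per_entry / 2 ** 30

for n in (10_000, 100_000, 1_000_000):
    print(f"N = {n:>9,} -> {kernel_matrix_gib(n):10.1f} GiB")
```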
Several algorithms exist for training SVMs quickly, and SMO (Keerthi et al., 2001; Lin, 2002) is the most popular because it is fast, simple, and scales well. Although SMO reduces the difficulty of the QP problem through a decomposition strategy, the memory problem of training an SVM on a very large number of training points still exists and needs to be addressed further.
2. SMO review
For a given input, a linear SVM produces the output

u = wᵀx + b
where x ∈ ℝᵐ is the input, and the objective is to find w and b that give the maximum margin. The support vector method finds the maximum margin by solving the optimization problem (1):
min  (1/2)‖w‖²                              (1)
s.t. yᵢ(wᵀxᵢ + b) ≥ 1   ∀i
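As a small illustration (my own sketch, not taken from the paper), the decision function above and the margin constraints of problem (1) can be written directly in a few lines of Python:

```python
import numpy as np

def svm_output(w: np.ndarray, b: float, x: np.ndarray) -> float:
    # u = w^T x + b, the linear SVM output for a single input x
    return float(w @ x + b)

def margins_satisfied(w: np.ndarray, b: float, X: np.ndarray, y: np.ndarray) -> bool:
    # Constraints of problem (1): y_i (w^T x_i + b) >= 1 for every training point.
    # X has shape (N, m); y holds labels in {-1, +1}.
    return bool(np.all(y * (X @ w + b) >= 1.0))
```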
To solve problem (1), we introduce a vector of Lagrange multipliers α and form the Lagrangian:
ℒ(w, b, α) = (1/2)‖w‖² − Σᵢ αᵢ (yᵢ(wᵀxᵢ + b) − 1)
where N is the number of training data points. Expanding the Lagrangian, taking partial derivatives with respect to w and b, and setting them to zero, we obtain:
ℒ(w, b, α) = (1/2) wᵀw − Σᵢ αᵢ yᵢ wᵀxᵢ − b (αᵀy) + eᵀα        (2)

∂ℒ/∂w = w − Σᵢ αᵢ yᵢ xᵢ = 0   ⟹   w = Σᵢ αᵢ yᵢ xᵢ              (3)

∂ℒ/∂b = −αᵀy = 0              ⟹   αᵀy = 0                      (4)
where y ∈ ℝᴺ is the label vector with yᵢ ∈ {1, −1}, and e ∈ ℝᴺ is a vector of all ones.
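The stationarity conditions (3) and (4) are easy to check numerically; the snippet below is a small sketch of my own (the function names are not from the paper) showing how w is recovered from the multipliers and how the equality constraint is verified.

```python
import numpy as np

def recover_w(alpha: np.ndarray, y: np.ndarray, X: np.ndarray) -> np.ndarray:
    # Equation (3): w = sum_i alpha_i * y_i * x_i, with X of shape (N, m).
    return (alpha * y) @ X

def equality_constraint_holds(alpha: np.ndarray, y: np.ndarray, tol: float = 1e-8) -> bool:
    # Equation (4): alpha^T y = 0 (up to a small numerical tolerance).
    return abs(float(alpha @ y)) <= tol
```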
SMO solves the smallest possible optimization problem at every iteration. The smallest possible sub-problem involves two α's, because the multipliers must obey the linear equality constraint αᵀy = 0. SMO therefore has two components: an analytic method for solving the two-multiplier sub-problem, and a heuristic for choosing which two multipliers to optimize. Substituting (3) into the Lagrangian and using (4) yields the dual QP problem solved by SMO (in the soft-margin case each multiplier is additionally bounded above by the penalty parameter C):

max   eᵀα − (1/2) Σᵢ Σⱼ yᵢ yⱼ αᵢ αⱼ xᵢᵀxⱼ        (5)
s.t.  0 ≤ αᵢ ≤ C  ∀i,   αᵀy = 0                  (6)

where C is the upper bound of all the α variables (Zeng et al., 2008).
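For reference, the dual objective (5) can be evaluated directly; the sketch below is my own illustration, assuming a linear kernel xᵢᵀxⱼ, and is useful for checking that an SMO implementation actually increases the dual objective.

```python
import numpy as np

def dual_objective(alpha: np.ndarray, y: np.ndarray, X: np.ndarray) -> float:
    # W(alpha) = e^T alpha - 1/2 * sum_ij y_i y_j alpha_i alpha_j x_i^T x_j  (equation (5))
    K = X @ X.T                      # linear kernel matrix
    v = alpha * y
    return float(alpha.sum() - 0.5 * v @ K @ v)
```

The serial SMO procedure is then summarized in the following steps; a code sketch of the complete loop is given after step 11.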
1. Initialization: set C as the upper bound of all α variables, ε as the error threshold, the index i = 0 for the first α, b = 0, and k = 0. Select αᵏ as the initial feasible solution, i.e., set α⁰ = 0, where α ∈ ℝᴺ contains one multiplier per training point.
2. If αᵏ is an optimal solution of (5) subject to (6), i.e., it satisfies the KKT conditions, stop. Otherwise, go to step 3.
3. For the given input xᵢ, calculate the output error Eᵢ using:
   u = wᵀxᵢ + b,   with w = Σₗ αₗ yₗ xₗ
   Eᵢ = u − yᵢ = Σₗ αₗ yₗ xₗᵀxᵢ + b − yᵢ
4. If (αᵢ > 0 and |Eᵢ| < ε) or (αᵢ < C and |Eᵢ| > ε), go to step 5, where αᵢ is the i-th element of αᵏ. Otherwise, set i = i + 1 and go to step 3.
5. Select the index j of the second α at random, i.e., draw j uniformly from the training indices with j ≠ i. For the given input xⱼ, calculate the output error Eⱼ using:
   Eⱼ = Σₗ αₗ yₗ xₗᵀxⱼ + b − yⱼ
6. Set the bounds (L, H) for the new αⱼ:
   (L, H) = (max(0, αⱼ + αᵢ − C), min(C, αⱼ + αᵢ))       if yᵢ = yⱼ
   (L, H) = (max(0, αⱼ − αᵢ), min(C, αⱼ − αᵢ + C))        otherwise
7. Set η = 2 xᵢᵀxⱼ − xᵢᵀxᵢ − xⱼᵀxⱼ.
8. If η ≥ 0, go to step 3. Otherwise, compute the new value of the second multiplier, αⱼⁿᵉʷ = αⱼ − yⱼ(Eᵢ − Eⱼ)/η, and clip it to the interval [L, H] from step 6. If |αⱼⁿᵉʷ − αⱼ| ≥ min_α, go to step 9. Otherwise, go to step 3.
9. Update the first multiplier, αᵢⁿᵉʷ = αᵢ + yᵢ yⱼ(αⱼ − αⱼⁿᵉʷ), and compute the two threshold candidates:
   b₁ = b − Eᵢ − yᵢ(αᵢⁿᵉʷ − αᵢ) xᵢᵀxᵢ − yⱼ(αⱼⁿᵉʷ − αⱼ) xᵢᵀxⱼ
   b₂ = b − Eⱼ − yᵢ(αᵢⁿᵉʷ − αᵢ) xᵢᵀxⱼ − yⱼ(αⱼⁿᵉʷ − αⱼ) xⱼᵀxⱼ
10. Set b = b₁ if 0 < αᵢ < C; b = b₂ if 0 < αⱼ < C; and b = (b₁ + b₂)/2 otherwise.
11. 𝑘 = 𝑘 + 1 and go to step 2.
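The sketch below is a compact serial implementation of the loop in steps 1–11, written as an illustration rather than a reproduction of the authors' code. It assumes a linear kernel and the random choice of the second index j from step 5, and it uses the standard simplified-SMO KKT-violation test in place of the first-index check in step 4; the names (smo_train, eps, min_alpha) are my own.

```python
import numpy as np

def smo_train(X, y, C=1.0, eps=1e-3, min_alpha=1e-4, max_iter=50, seed=0):
    """Minimal serial SMO sketch (linear kernel), following steps 1-11 above."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    alpha = np.zeros(N)                       # step 1: alpha^0 = 0
    b = 0.0
    K = X @ X.T                               # cache x_i^T x_j

    def error(i):
        # E_i = sum_l alpha_l y_l x_l^T x_i + b - y_i   (step 3)
        return float((alpha * y) @ K[:, i] + b - y[i])

    for _ in range(max_iter):                 # outer iterations (index k)
        changed = 0
        for i in range(N):
            E_i = error(i)
            # Step 4 (standard KKT-violation variant): pick i only if it violates KKT.
            if not ((y[i] * E_i < -eps and alpha[i] < C) or (y[i] * E_i > eps and alpha[i] > 0)):
                continue
            j = int(rng.integers(N - 1))      # step 5: random j != i
            j = j + 1 if j >= i else j
            E_j = error(j)
            if y[i] == y[j]:                  # step 6: bounds (L, H)
                L, H = max(0.0, alpha[j] + alpha[i] - C), min(C, alpha[j] + alpha[i])
            else:
                L, H = max(0.0, alpha[j] - alpha[i]), min(C, alpha[j] - alpha[i] + C)
            if L >= H:
                continue
            eta = 2 * K[i, j] - K[i, i] - K[j, j]            # step 7
            if eta >= 0:                                      # step 8
                continue
            a_j = float(np.clip(alpha[j] - y[j] * (E_i - E_j) / eta, L, H))
            if abs(a_j - alpha[j]) < min_alpha:
                continue
            a_i = alpha[i] + y[i] * y[j] * (alpha[j] - a_j)   # step 9
            b1 = b - E_i - y[i]*(a_i - alpha[i])*K[i, i] - y[j]*(a_j - alpha[j])*K[i, j]
            b2 = b - E_j - y[i]*(a_i - alpha[i])*K[i, j] - y[j]*(a_j - alpha[j])*K[j, j]
            alpha[i], alpha[j] = a_i, a_j
            if 0 < a_i < C:                                   # step 10: threshold update
                b = b1
            elif 0 < a_j < C:
                b = b2
            else:
                b = (b1 + b2) / 2
            changed += 1
        if changed == 0:                      # step 2: no KKT violations left
            break
    w = (alpha * y) @ X                       # recover w via equation (3)
    return w, b, alpha
```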
3. Experiment
We implemented the SMO algorithm on three data sets: the first is a simulated interview-score data set, and the second and third are clustering data sets. The experiments were carried out on an HP Pavilion 14 laptop with an Intel i5-8250U CPU @ 1.60 GHz (8 logical CPUs) and 8 GB of RAM, running Windows 11 Pro 64-bit. The training parameters were: maximum number of iterations = 50, ε = 0.001, C = 1, and min_α = 0.0001. The training results of the algorithm are shown in Table 1 and Table 2.
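For completeness, the following hypothetical call shows how the smo_train sketch from Section 2 would be invoked with the parameters listed above; the data here are randomly generated for illustration only and are not the data sets used in the experiment.

```python
import numpy as np

# Hypothetical usage of the smo_train sketch (defined after step 11 in Section 2)
# with the parameters reported above, on synthetic two-class data.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1.0, 0.5, size=(50, 2)),
               rng.normal(+1.0, 0.5, size=(50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])

w, b, alpha = smo_train(X, y, C=1.0, eps=0.001, min_alpha=0.0001, max_iter=50)
accuracy = np.mean(np.sign(X @ w + b) == y)
print(f"training accuracy: {accuracy:.2f}, support vectors: {np.sum(alpha > 1e-6)}")
```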
References
Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K., 2001. Improvements to Platt’s
SMO Algorithm for SVM Classifier Design. Neural Computation 13, 637–649.
https://doi.org/10.1162/089976601300014493
Lin, C.-J., 2002. Asymptotic convergence of an SMO algorithm without any assumptions. IEEE
Transactions on Neural Networks 13, 248–250. https://doi.org/10.1109/72.977319
Zeng, Z.-Q., Yu, H.-B., Xu, H.-R., Xie, Y.-Q., Gao, J., 2008. Fast training Support Vector Machines using parallel sequential minimal optimization, in: 2008 3rd International Conference on Intelligent System and Knowledge Engineering (ISKE 2008). IEEE, Xiamen, China, pp. 997–1001. https://doi.org/10.1109/ISKE.2008.4731075