Mirzahosseini Ramin 201711 PhD Thesis(1)

FPGA-based Real-Time Simulation Platform For Power
Grids Including Multiple Converters
by
Ramin Mirzahosseini
A thesis submitted in conformity with the requirements

for the degree of Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
c Copyright 2017 by Ramin Mirzahosseini

Abstract
FPGA-based Real-Time Simulation Platform For Power Grids Including Multiple
Converters
Ramin Mirzahosseini
Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto
2017
This thesis develops an FPGA-based real-time simulation platform for power grids
including multiple Power Electronic Converters (PECs) and other classical components,
which in contrast to the classical systems do not include long transmission lines for
mathematical decoupling of systems for efficient simulation. Presence of short transmis-
sion lines creates a large set of network equations that can not be decoupled. Handling
the large admittance matrix, representing the large set of network equations, requires
floating-point numerical representation which is associated with large latencies and high
resource utilization in FPGAs. In addition, the presence of high switching frequency
PECs imposes small simulation time-step requirement and therefore high computational
burden. Accurate representation of PECs, using two-value resistor model, results in
time-variant admittance matrix which also increases computational burden.
This thesis proposes Reformulated Modified Nodal Analysis (RMNA) for solution of
network equations to enable (i) parallelism between the simulation task of solving network
equations and those of other power system component models to achieve small simula-
tion time-steps and (ii) design of a low-latency floating-point matrix-vector multiplication
hardware module to efficiently solve relatively large set of network equations associated
with systems targeted in this work. An FPGA-based partitioning method is proposed
to enable simulation of multiple PECs and representing power switches by two-value re-
sistor model without significantly increasing the computational burden. This overcomes
the issues associated with the existing real-time Associate Discrete Circuit (ADC) switch
ii
model, i.e., artificial switching loss and numerical oscillation. A new FPGA-based Elec-
trical Machine (EM) Constant-Parameter-Voltage-Behind-Reactance (CPVBR) model is
proposed to enable real-time simulation of different types of EMs, i.e., Induction Ma-
chine (IM), Synchronous Machine (SM) and Permanent Magnet Synchronous Machine
(PMSM), under all test scenarios and regardless of their dynamic saliency and system con-
figuration. Moreover, this thesis develops Inter-Hardware Universal Line Model (IHULM)
to enable parallel simulation of distribution system, which includes short transmission
lines, in the FPGA-based platform and the rest of the system in commercially available
simulation platforms, e.g., RTDS.
The platform developed in this work accurately simulates power grids with up to 160
nodes/outputs, four PECs, three EMs and one IHULM using a small real-time time-step
of 2.55µs. Moreover, it features fixed-hardware design to avoid FPGA code compilation
time. A model preparation script is developed to calculate the study system parameters
based on the modeling approaches devised in this work.
iii
To my parents, Maryam and Aliakbar
iv
Acknowledgements
I would like to express my sincere appreciation to my advisor, Professor Reza Iravani,
for his valuable and insightful supervision. Throughout my Ph.D. his knowledge and
professionalism inspired me and I am grateful to have had the opportunity to learn from
him and further my career with his guidance.
In addition, I would like to thank my Ph.D. examination committee members, Pro-
fessors Zeb Tate, Joshua A. Taylor, Olivier Trescases and Alireza Bakhshai for their
thoughtful feedback.
I acknowledge the generous financial support I received from Professor Iravani and
the University of Toronto.
v
Contents
1 Introduction 1
1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Statement of the Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Power System Analysis . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.2 Analyses of EMTs in Power System . . . . . . . . . . . . . . . . . 3
1.2.3 Real-Time Simulation of EMTs . . . . . . . . . . . . . . . . . . . 4
1.2.4 Existing Digital Real-Time Simulator (DRTS) platforms . . . . . 6
1.2.5 Motivation of this Work . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Research Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4 Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2 FPGA-Based High Performance Computing 13

2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 FPGA Hardware Resources . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.1 Configurable Logic Blocks (CLBs) . . . . . . . . . . . . . . . . . 14
2.2.2 DSP Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.3 Memory resources . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Numerical Representations . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Parallelism Exploitation Techniques . . . . . . . . . . . . . . . . . . . . 18
2.4.1 Unrolling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.2 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4.3 Floating-Point Accumulation . . . . . . . . . . . . . . . . . . . . 20
2.5 FPGA Programming Method . . . . . . . . . . . . . . . . . . . . . . . . 22
2.6 Summary and Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3 FPGA-Based Solution of Network Equations for EMT Simulation 25

3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2 Network Formulations for EMT Simulation . . . . . . . . . . . . . . . . . 25
3.2.1 State-Space Formulation . . . . . . . . . . . . . . . . . . . . . . . 26
vi
3.2.2 Nodal Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.2.3 Combined Nodal and State-Space Formulation . . . . . . . . . . . 30
3.3 Proposed DRTS Architecture . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 FPGA-based Solution of Network Equations . . . . . . . . . . . . . . . . 33
3.4.1 Anti-parallel Patterns and Proposed Reformulation . . . . . . . . 35
3.4.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.5 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4 FPGA-Based Real-Time Simulation of Power Electronic Converters

(PECs) for EMT Analysis 45
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.2 Linear Multi-Step(LMS) Integration Methods . . . . . . . . . . . . . . . 46
4.3 Power Switch Models for EMT studies . . . . . . . . . . . . . . . . . . . 49
4.3.1 Two-value Resistor Model . . . . . . . . . . . . . . . . . . . . . . 50
4.3.2 Associated Discrete Circuit (ADC) Model . . . . . . . . . . . . . 51
4.4 Partitioning Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.4.1 Proposed FPGA-based Partitioning Method . . . . . . . . . . . . 55
4.4.2 Parallel Computation of Network Equivalent . . . . . . . . . . . . 55
4.4.3 PEC Simulation Task . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.4.4 Partitioning Method for Multiple PECs . . . . . . . . . . . . . . . 61
4.5 Hardware Design of PEC Module . . . . . . . . . . . . . . . . . . . . . . 63
4.6 Performance evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.6.1 Case I: Start-up . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.6.2 Case II: Rectifier Mode . . . . . . . . . . . . . . . . . . . . . . . . 68
5 FPGA-Based Simulation of Electrical Machines (EMs) for EMT Anal-

ysis 72
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.2 EM Models for EMT Analysis . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2.1 PD Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.2.2 QD Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.2.3 Voltage Behind Reactance (VBR) Model . . . . . . . . . . . . . . 76
5.3 Proposed EM CPVBR Model . . . . . . . . . . . . . . . . . . . . . . . . 78
5.3.1 CPVBR Continuous-time Model . . . . . . . . . . . . . . . . . . . 79
5.3.2 Proposed LMS Integration Method for EMs with Dynamic Saliency 84
5.3.3 EM Simulation task . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.4 Hardware Design of EM Module . . . . . . . . . . . . . . . . . . . . . . 90
vii
5.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
5.5.1 Case I: IM Start-up . . . . . . . . . . . . . . . . . . . . . . . . . . 93
5.5.2 Case II: IM Transient Response . . . . . . . . . . . . . . . . . . . 94
5.5.3 Case III: PMSM Transient Response . . . . . . . . . . . . . . . . 95
5.5.4 Case IV: PMSM and Rectifier . . . . . . . . . . . . . . . . . . . . 97
5.5.5 Case V: SM and Rectifier . . . . . . . . . . . . . . . . . . . . . . 99
6 Real-time Simulation Platform and Performance Verification 103

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
6.2 Inter-Hardware Universal Line Model (IHULM) . . . . . . . . . . . . . . 104
6.3 IHULM Hardware Module . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6.4 Proposed real-time simulation platform . . . . . . . . . . . . . . . . . . . 112
6.4.1 Hardware Platform . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.4.2 Model Preparation Script . . . . . . . . . . . . . . . . . . . . . . 116
6.5 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
6.5.1 Case I: Transmission line step response . . . . . . . . . . . . . . . 118
6.5.2 Case II: Grid connected DER . . . . . . . . . . . . . . . . . . . . 119
6.5.3 Case III: Microgrid . . . . . . . . . . . . . . . . . . . . . . . . . . 120
6.6 Summary and Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 125
7 Summary and Conclusion 129

7.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
7.2 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
7.3 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
7.4 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
A EM Matrices 133
B Transmission Line Parameters and Fitting Results 135
Bibliography 139
viii
List of Tables
2.1 Hardware Rsources available on VC7VX485T . . . . . . . . . . . . . . . 16

2.2 Characteristics of Floating-Point Accumulator Designs . . . . . . . . . . 22
3.1 Discrete admittance and history update coefficients for different branches 28
3.2 Resource utilization of differnet modules . . . . . . . . . . . . . . . . . . 44
4.1 Multi-step numerical integration methods . . . . . . . . . . . . . . . . . . 46

4.2 ADC switch model parameters . . . . . . . . . . . . . . . . . . . . . . . . 52
4.3 Resource utilization of the PEC module . . . . . . . . . . . . . . . . . . 66
4.4 Parameters of the PEC station . . . . . . . . . . . . . . . . . . . . . . . 68
5.1 Resource utilization of the EM module . . . . . . . . . . . . . . . . . . . 92

5.2 Parameters of the PEC and IM . . . . . . . . . . . . . . . . . . . . . . . 94
5.3 Parameters of the PEC and PMSM . . . . . . . . . . . . . . . . . . . . . 97
5.4 Parameters of the SM and Rectifier system . . . . . . . . . . . . . . . . . 101
6.1 Resource utilization of IHULM Module . . . . . . . . . . . . . . . . . . . 112

6.2 Resource utilization of modules in the FPGA-based DRTS . . . . . . . . 114
6.3 DER unit parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6.4 Microgrid 50hp IM parameters [1] . . . . . . . . . . . . . . . . . . . . . . 124
B.1 Conductor and ground wire data . . . . . . . . . . . . . . . . . . . . . . 136
ix
List of Figures
1.1 Power System Transients Time-Scale. . . . . . . . . . . . . . . . . . . . . 3

1.2 Block diagram of Hardware-In-The-Loop (HIL) platform. . . . . . . . . . 5
1.3 CIGRE benchmark system for renewable energy integration. . . . . . . . 8
2.1 CLB resource in Virtex-7 family. . . . . . . . . . . . . . . . . . . . . . . . 15

2.2 Basic DSP48E1 Slice Diagram . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Block RAM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Numerical Representations (a) Fixed-Point, (b) Floating-Point . . . . . . 17
2.5 Resource utilization of floating-point adder IP core optimized for speed [2] 18
2.6 Timing of floating-point vector addition (a) without and (b) with pipelin-
ing. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.7 Timing diagram for calculating partial accumulations . . . . . . . . . . . 21
3.1 Continuous-time and discretized model of L, C, series RL and series RC

branches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Transformer Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.3 Nodal equivalent of the state-space subsystem . . . . . . . . . . . . . . . 32
3.4 Discretized transmission line Model . . . . . . . . . . . . . . . . . . . . . 33
3.5 Emerging power systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.6 Independent source calculation module . . . . . . . . . . . . . . . . . . . 38
3.7 History current update module . . . . . . . . . . . . . . . . . . . . . . . 39
3.8 Matrix-vector multiplication module . . . . . . . . . . . . . . . . . . . . 40
3.9 Computation engine for solution of network equations . . . . . . . . . . . 42
3.10 Timing diagram of the proposed computation engine . . . . . . . . . . . 42
4.1 Region of absolute stability for multi-step methods . . . . . . . . . . . . 48

4.2 Inductor switch example . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.3 Trapezoidal ringing phenomenon . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 Two-value resistor Switch- and ideal switch model . . . . . . . . . . . . . 50
4.5 Positive current and voltage convention for switches . . . . . . . . . . . . 51
x
4.6 ADC switch model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.7 (a) A system embedding one PEC (b) The circuit seen from the PECSN 57
4.8 (a) The model of the PEC in the PEC simulation task, (b) The third step
of the proposed partitioning method . . . . . . . . . . . . . . . . . . . . 58
4.9 (a) Boost converter and (b) its discretized model . . . . . . . . . . . . . . 62
4.10 PEC Module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.11 Computation engine for simulation of case studies embedding multiple PECs 65
4.12 Timing diagram of (a) computation engine modules (b) PUs in the PEC
module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
4.13 (a) Case study system embedding two PEC units (b) PEC unit model . 67
4.14 Simulation results of Case I, m1 = 0.4∠60◦ and m2 = 0.4∠0◦ , 1st (left)
column includes the FPGA-based DRTS real-time simulation results using
the proposed partitioning method, 2nd column contains the PSIM off-line
simulation results using the two-value resistor reference method and 3rd
column has the RTDS-based real-time simulation results using the ADC
method. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.15 Simulation results of Case II, m1 = 0.4∠60◦ , m2 = 0.4∠0◦ and the gate
signals of both PECs are blocked at t=0.1s, 1st (left) column includes
the FPGA-based DRTS real-time simulation results using the proposed
partitioning method, 2nd column contains the PSIM off-line simulation
results using the two-value resistor reference method and 3rd column has
the RTDS-based real-time simulation results using the ADC method. . . 70
5.1 General VBR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.2 The model of different types of EMs in qd coordinates, (a) Synchronous
Machine, (b) Permanent-Magnet Synchronous Machine including two damper
windings, (c) Single-cage Induction Machine, (d) Double-cage Induction
Machine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
5.3 The general EM model in qd coordinates . . . . . . . . . . . . . . . . . . 81
5.4 The constant parameter interface for CPVBR model [3] . . . . . . . . . . 84
5.5 Region of absolute stability requirement for a stiffly stable LMS integration
method [4] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.6 Region of absolute stability of the proposed LMS for different values of
parameter a . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5.7 Inductor discretized model circuit equivalent . . . . . . . . . . . . . . . 87
5.8 (a) The computational engine including the EM modules, (b) The PUs in
the EM module . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
5.9 Induction machine case study system . . . . . . . . . . . . . . . . . . . . 93
xi
5.10 Simulation results of Case I, i.e., IM start-up, m = 0.4. The left column
includes the results from FPGA-based method, based on the proposed
CPVBR model and using the proposed FPGA-based DRTS platform of
Fig. 5.8 with the real-time time-step of 4µs and the right column contains
the results from the reference method, i.e., based on VBR model and using
PLECs toolbox of MATLAB Simulink environment with ode23tb solver
and the relative error parameter set to 10−4 . . . . . . . . . . . . . . . . 95
5.11 Simulation results of Case II, i.e., IM transient response, a step change
from m = 0.4 to m = 0.5 is applied at t = 0.12s. The left column includes
the results from FPGA-based method, based on the proposed CPVBR
model and using the proposed FPGA-based DRTS platform of Fig. 5.8
with the real-time time-step of 4µs and the right column contains the
results from the reference method, i.e., based on VBR model and using
PLECs toolbox of MATLAB Simulink environment with ode23tb solver
5.12 PMSM Machine case study system . . . . . . . . . . . . . . . . . . . . . 96
5.13 Simulation results of Case III, i.e., PMSM transient response, a mechan-
ical torque of τm = 10N.m is applied the PMSM shaft at t = 3s. The
left column includes the results from FPGA-based method, based on the
proposed CPVBR model and using the proposed FPGA-based DRTS plat-
form of Fig. 5.8 with the real-time time-step of 4µs and the right column
contains the results from the reference method, i.e., based on VBR model
and using PLECs toolbox of MATLAB Simulink environment with ode23tb
solver and the relative error parameter set to 10−4 . . . . . . . . . . . . . 98
5.14 Simulation results of Case IV, i.e., PMSM and rectifier, a mechanical
torque of τm = −15N.m is applied to the PMSM shaft. The left column
includes the results from FPGA-based method, based on the proposed
CPVBR model and using the proposed FPGA-based DRTS platform of
Fig. 5.8 with the real-time time-step of 4µs and the right column contains
the results from the reference method, i.e., based on VBR model and us-
ing PLECs toolbox of MATLAB Simulink environment with ode23tb solver
5.15 Synchronous machine case study system . . . . . . . . . . . . . . . . . . 100
xii
5.16 Simulation results of Case V, i.e., SM and Rectifier. The SM speed is
locked at synchronous speed and at t = 0s the field voltage is applied to
the filed winding. The left column includes the results from FPGA-based
method, based on the proposed CPVBR model and using the proposed
FPGA-based DRTS platform of Fig. 5.8 with the real-time time-step of
4µs and the right column contains the results from the reference method,
i.e., based on VBR model and using PLECs toolbox of MATLAB Simulink
environment with ode23tb solver and the relative error parameter set to
10−4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
6.1 Frequency-domain transmission line model . . . . . . . . . . . . . . . . . 104

6.2 (a) IHULM concept, (b) Discretized IHULM . . . . . . . . . . . . . . . . 106
6.3 IHULM receiving-end representation in RTDS . . . . . . . . . . . . . . . 108
6.4 (a) IHULM FPGA module containing 5 PUs to carry out steps 1 to 5
of the IHULM simulation task. (b) FP Multiply and Accumulate (FP-
MACC) units similar to the first design in Table 2.2 (c) Matrix-vector
multiplication containing three FPMACC units. (d) The input select for
matrix-vector multiplication unit based on the order shown Fig. 6.5 . . . 111
6.5 The order of inputs for the vector-matrix multiplications in FPMACC1,
shown in red arrow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.6 Proposed DRTS Platform . . . . . . . . . . . . . . . . . . . . . . . . . . 113
6.7 Placement of the implemented FPGA-based DRTS on VC7VX485T chip 115
6.8 Timing diagram of parallel operation of simulation modules in FPGA-
based DRTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
6.9 Model preparation algorithm diagram including three steps, 1st step (col-
umn) is to prepare the intermediate netlist file, 2nd step is to find module
parameters and RMNA matrix and 3rd step is to obtain the network equiv-
alent parameter of PECSNs. The process required for each component
model is shown in the corresponding row. . . . . . . . . . . . . . . . . . 117
6.10 Overhead three-phase transmission line . . . . . . . . . . . . . . . . . . . 118
6.11 Simulation results of overhead line energization. The left column includes
the results from FPGA-based method, based on the proposed real-time
simulation platform with the real-time time-step of 2.6µs and the right
column contains the results from the reference method, using PSCAD,
based on its phase domain model, with the time-step of 2.6µs. . . . . . . 119
6.12 (a) Grid connected DER study system (b) DER unit model . . . . . . . . 120
xiii
6.13 Start up simulation results of Case II, grid connected DER. The receiving
end and sending end voltages of the transmission line are shown in first
and second row, respectively, and the DER three-phase currents are shown
in the third row. The left column includes the results from FPGA-based
method, based on the proposed real-time simulation platform with the
real-time time-step of 2.6µs and the right column contains the results from
the reference method, using PSCAD with the time-step of 2.6µs. . . . . . 121
6.14 Fault response simulation results of Case II, grid connected DER. The
receiving end and sending end voltages of the transmission line are shown
in first and second row, respectively, and the DER three-phase currents
are shown in the third row. A three-phase to ground fault is applied to
the grid voltage at t=0.5s and VSC gate signals are blocked at t=0.51s.
The left column includes the results from FPGA-based method, based on
the proposed real-time simulation platform with the real-time time-step of
2.6µs and the right column contains the results from the reference method,
using PSCAD with the time-step of 2.6µs. . . . . . . . . . . . . . . . . . 122
6.15 (a) Microgrid case study system embedding four DER units . . . . . . . 123
6.16 Simulation results of DER currents in the microgrid start-up. DER1 to
DER4 three-phase currents are shown in the first to fourth row, respec-
tively. The left column includes the results from FPGA-based method,
based on the proposed real-time simulation platform with the real-time
time-step of 2.6µs and the right column contains the results from the ref-
erence method, using PSIM with the time-step of 2.6µs. . . . . . . . . . . 124
6.17 Simulation results of IMs in the microgrid start-up. IM1 to IM3 three-
phase currents are shown in the first to third row, respectively. Fourth and
fifth row are respectively the electromagnetic torque and the mechanical
speed of the IMs. The left column includes the results from FPGA-based
the reference method, using PSIM with the time-step of 2.6µs. . . . . . . 125
6.18 Simulation results of DER currents in the microgrid fault study. DER1 to
DER4 three-phase currents are shown in the first to fourth row, respec-
tively. The left column includes the results from FPGA-based method,
based on the proposed real-time simulation platform with the real-time
time-step of 2.6µs and the right column contains the results from the ref-
erence method, using PSIM with the time-step of 2.6µs. . . . . . . . . . . 126
xiv
6.19 Simulation results of IMs in the microgrid fault study. IM1 to IM3 three-
phase currents are shown in the first to third row, respectively. Fourth and
fifth row are respectively the electromagnetic torque and the mechanical
speed of the IMs. The left column includes the results from FPGA-based
the reference method, using PSIM with the time-step of 2.6µs. . . . . . . 127
B.1 Tower Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

B.2 Results from fitting of characteristic admittance Y c (s). Top is Y c (1, 1)(s),
middle Y c (1, 2)(s) and bottom Y c (1, 3)(s). . . . . . . . . . . . . . . . . . 136
B.3 Results from fitting of mode functions. Top is h1 (s), middle h2 (s) and
bottom h3 (s). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
B.4 Results from fitting of propagation matrix H (s). Top is H (1, 1)(s), middle
H (1, 2)(s) and bottom H (1, 3)(s). . . . . . . . . . . . . . . . . . . . . . 138
xv
List of Abbreviations
AB: Adam Bashforth
ADC: Associate Discrete Circuit
BE: Backward Euler
BRAM: Block RAM.
CHIL: Control Hardware-In-the-Loop
CLB: Configurable Logic Blocks
CPU: Central Processing Unit
CPVBR: Constant-Parameter-Voltage-Behind-Reactance
DER: Distributed Energy Resource
DRTS: Digital Real-Time Simulator
DSP: Digital Signal Processing
EM: Electrical Machine
EMF: Electromagnetic Force
EMT: Electromagnetic Transients
FE: Forward Euler
FEM: Finite Element Model
FIFO: First In First Out
FPGA: Field Programmable Gate Array
GPU: Graphical Processing Unit
HDL: Hardware Description Language
HF: High Frequency
HIL: Hardware-In-the-Loop
HLS: High-Level Synthesis
HPC: High Performance Computing
HuT: Hardware under Test
IHULM: Inter-Hardware Universal Line Model
IM: Induction Machine
LMS: Linear Multi-Step
LUT: Look-Up Tables
MB: MicroBlaze
MMC: Modular Multilevel Converter
MNA Modified Nodal Analysis
MUX: Multiplexer
NA: Nodal Analysis
PCI-E Peripheral Component Interconnect Express
xvi
PD: Phase Doamin
PEC: Power Electronic Converter
PHIL: Power-Hardware-In-the-Loop
PMSM: Permanent Magnet Synchronous Machine
PU: Process Unit
PV: Photovoltaic
QD: Quadrature-Direct
RMNA: Reformulated Modified Nodal Analysis
SIL: Software-In-the-Loop
SM: Synchronous Machine
SoC: System on Chip
TR: Trapezoidal
TS: Transient Stability
ULM: Universal Line Model
VBR: Voltage-Behind-Reactance
xvii
Chapter 1
Introduction
1.1 Background
Analysis of power system transients is a required step for design verification and perfor-
mance evaluation of power systems. Power system transients are divided into electromechanical-
and Electromagnetic Transients (EMTs). EMTs are changes in voltages and currents due
to small-signal and large-signal disturbances in the power system and are characterized
by their time-scales and/or natural frequencies of current and/or voltage waveforms.
Power system EMTs cover a wide frequency range, e.g., 0-2MHz, and different studies
target different span of this range. Control/protection related studies need to consider
frequencies from approximately 0.1Hz up to several tens of kHz.
Time-domain simulation is the most convenient way of studying EMTs for power
systems since analytical solution of the power system nonlinear equations is practically
impossible. Real-time simulation of EMTs is a class of time-domain simulation in which
the (voltage/current) waveforms during the study time-frame are reproduced at the same
speed as the actual phenomenon occurs.
Real-time simulation of the power system is a requirement for Hardware-In-the-Loop
(HIL) simulation where actual control/protection hardware platform and/or power de-
vices need to be tested before being installed in the power system. Emerging trends in
power systems, e.g., integration of renewable resources and energy storage systems into
the power system through Power Electronic Converters (PECs) have introduced many
challenges in real-time simulation, e.g., the need for small simulation time-step.
The Field Programmable Gate Array (FPGA) technology presents features to over-
come such challenges. This work aims to develop a real-time FPGA-based simulator
platform for real-time simulation with emphasis on the characteristics of emerging power
systems which embed large number of high switching frequency PECs. This Chapter
explains the basic concepts and highlights the existing shortcomings and methodology to
1
CHAPTER 1. INTRODUCTION
address them.
The remainder of this chapter is as follows. Section 1.2 elaborates on the statement of
the problem. First, power system transient analysis is briefly explained and simulation of
EMTs is described. Next, different types of real-time simulation of EMTs are explained.
Then, the characteristics/properties of the existing real-time simulators are summarized.
At the end of this section, the motivation for this thesis is presented. Section 1.3 explains
the research objectives based on the identified limitations. Section 1.4 provides the thesis
outline.
1.2 Statement of the Problem

Power systems are continuously subjected to different types of disturbances including
load changes, faults and apparatus outage. Following a disturbance inception, the power
system is expected to retain stability and settle to a viable steady state operating point.
During this transition, the power system might be subjected to excessive stress, e.g.,
overcurrents and/or overvoltages. Therefore, analyses of power system transients are
essential for design and verification of control and protection schemes, designing new
power systems, determining power system apparatus ratings and identifying cause of
failures.
1.2.1 Power System Analysis

Power system analysis is divided into static analysis and dynamic analysis. In static
power system analysis, the algebraic equations determining the steady-state operating
point of the power system are solved. Static analysis also can be used to analyse and
identify slow-to-develop dynamics, e.g., voltage (in)stability phenomenon (typically tens
of minutes) [5].
Power system dynamic analysis are used to study system transients which cover phe-
nomena in a wide frequency range. Fig 1.1 shows the time scale of different power system
transients. The time scale of these phenomena is from several nanoseconds, e.g., light-
ning transients, to hours for long-term transients. In a power system transient study, the
duration of the transient phenomenon determines the time-frame of the study and the
mathematical models needed to represent the power system components.
Power system transients are divided into electromechanical transients and electromag-
netic transients. The former is related to the energy exchanges between the mechanical
system of rotating machines and the electrical energy of the electrical network at frequen-
cies approximately within the bandwidth of 0.1 to 2Hz. The latter covers other types of
2
Long-term Dynamics
Transient Stability
Subsynchronous resonance
Switching
Lightning
10-7 10-5 10-3 10-1 101 103

Time (s)
Figure 1.1. Power System Transients Time-Scale.
transients under the umbrella of EMTs [6]. This thesis focuses on electromagnetic tran-
sients that are considered in control/protection related studies and are at frequencies
approximately within the bandwidth of 0.1Hz up to several tens of kHz.
The electromechanical transient studies are usually referred to as Transient Stability
(TS) studies, although TS is a subcategory of electromechanical transients. TS primarily
deals with angle stability of the power system which is the ability of the power system
to preserve synchronous operation of synchronous generators. For TS studies the power
system model is combination of algebraic and differential equations. Since in this class of
dynamics the frequency deviations are small, the phasor concept is used to represent the
power network by algebraic equations. Moreover, per-phase representation is adopted.
There is a large number of commercially available TS simulation software packages, e.g.,
PSS/E and DigSilent.
In contrast to TS analysis, for EMTs, each power system component is represented by
differential equations. For instance, long transmission lines are represented by frequency
dependent models and each PEC is modeled as a repetitive switching apparatus. Thus
instantaneous values of the system variables are obtained by solving the set of differen-
tial/algebraic equations representing the overall power system. The system model for
EMT studies is dominated by differential equations.
1.2.2 Analyses of EMTs in Power System

The waveforms of different EMT phenomena have different time-scales, therefore, each
EMT phenomenon can be characterized by the frequency bandwidth of its waveform.
This determines the appropriate mathematical model and time frame of the EMT study.
Each time-scale requires specific class of mathematical models representing power system
components for the corresponding frequency range. For instance, in case of lightning the
3
time-scale of the phenomenon is short and parasitic elements are of significant importance
and must be considered in the model [6].
Verifying performance of control and protection algorithms is one of the most impor-
tant applications of EMT analysis methods. In this context, instantaneous voltages and
currents are required to determine overvoltage and overcurrent stresses. Moreover, the
system behaviour in several cycles after the transient inception is required to investigate
the system recovery. The time-frame of such studies in the presence of PECs is within
tens of nanoseconds to several tens of milliseconds. This time-frame requires appropriate
component models to properly capture the associated physical phenomena.
Since analytical solution of the differential algebraic equations of a realistic-size power
system for the EMT study is practically impossible, therefore, the equations are solved
based on numerical approaches. This requires the component models to be discretized
using a numerical integration method. This approach is known as digital time-domain
simulation method and hereinafter we simply refer to it as simulation. Selection of
simulation time-step depends on the frequency of the phenomenon under investigation.
The simulation time-step to verify performance of control and protection algorithms is
approximately in the order of 0.5 to 50 microseconds.
If the actual elapsed time for performing the simulation at each time-step is longer
than the time-step, the simulation inherently only can be carried out in an off-line man-
ner. Otherwise it can be conducted in real-time. There are several commercially available
offline EMT simulation software packages, i.e., PSCAD, EMTP-RV, DigSilent and Sim-
PowerSystem. In addition to tailored software environment, real-time simulation also
requires specialized hardware platform. Real-time simulation and the existing real-time
simulation hardware and software platforms and their limitations are briefly explained
in the following sections.
1.2.3 Real-Time Simulation of EMTs

Based on the existing processor technologies, real-time simulation requires specialized
computation platform which includes both dedicated software environment and hardware
platform and we refer to it as Digital Real-Time Simulator (DRTS).
To simulate power system EMTs, in addition to the power system apparatus model,
the control/protection model must be numerically solved. If the overall system model,
i.e., power circuits and control/protection models are solved by the DRTS, then the
simulation platform is referred to as Software-In-the-Loop (SIL) platform. The main
application of real-time simulation, as of now, has been to test performances of actual
control/protection hardware-based systems, i.e., the Hardware under Test (HuT) [7]. In
such a study the rest of the power system model, except the HuT, is solved in DRTS
4
and interfaced to the HuT by a signal conditioning hardware and the overall platform
called Control Hardware-In-the-Loop (CHIL). Fig. 1.2 shows the CHIL block diagram.
For CHIL tests the signal conditioning hardware mostly consists of Analog to Digital
converter (A/D) and Digital to Analog converter (D/A). However, additional amplifiers
might be needed in some CHIL applications, e.g., for protection relays that operate at
high current/voltage levels.
Digital Real-Time Signal Conditioning Hardware under Test

Simulator Hardware
DRTS HuT
Figure 1.2. Block diagram of Hardware-In-The-Loop (HIL) platform.
Performance evaluation of power apparatus, e.g., a PEC, also can be carried out
by DRTS particularly in view of the advent of new converter topologies/configurations
/applications and interfacing DRTS and power apparatus is gaining more attention. In
this scenario, the HuT is a power apparatus and the simulation platform is referred
to as Power-Hardware-In-the-Loop (PHIL) platform. For PHIL simulations, the signal
conditioning hardware is current and/or voltage amplifiers. It should be noted that in
a PHIL simulation the amplifiers should be able to sink or source power since there is a
real power exchange between the amplifier and the HuT.
To conduct all types of real-time simulation, i.e., SIL, CHIL and PHIL, the DRTS
must be capable of solving the overall system mathematical model, including the power
system models and the control/protection models. The solution of power system models is
more challenging than the solution of the control/protection models and the limitations
of the existing DRTSs is primarily in solution of the power system models. In this
thesis, a high performance DRTS architecture is devised to address the limitations of the
existing DRTS technologies. The envisioned DRTS has the capability to cooperate with
an existing real-time simulation platform to use its capability to solve control/protection
models and its signal conditioning hardware. In the following sections the existing DRTSs
are briefly explained and their limitations, particularly for the emerging power systems,
are highlighted.
5
1.2.4 Existing Digital Real-Time Simulator (DRTS) platforms

Real-Time Digital Simulator (RTDS) [8] is the first commercially available DRTS which
uses a customised computation hardware. An arbitrary number of RTDS racks can be
connected to achieve the desired computational power for a targeted study systems. Each
rack can accommodate up to 6 CPU cards. RSCAD software package [8] is used to create
the power system model based on Dommel’s EMT algorithm [9] using nodal formulation.
RSCAD includes two real-time power system model solvers, i.e., large time-step solver
with the time-step of 50µs and small time-step solver with the time-step of around 1−4µs.
The 50µs solver is used for simulation of traditional power system case studies while the
small time-step solver is mainly used for simulation of switch-mode PECs [10]. The
small time-step solver is based on the Associate Discrete Circuit (ADC) power switch
model [11].
The power system co-simulation on FPGA and RTDS is developed for simulation of
high switching frequency PECs [12]. To enable co-simulation on two hardware platforms,
a fictitious Bergeron transmission line model is used to interface each converter to the host
power system. Since this transmission line model is accurate only for the fundamental
system frequency [13], the method is prone to numerical oscillations.
Another commercially available real-time simulator platform is developed by Opal-RT
and uses off-the-shelf multicore CPU platforms [14]. eMEGAsim is the simulator soft-
ware developed by Opal-RT. It is based on ARTEMiS-SSN formulation which combines
nodal and state-space formulations to leverage advantages of both formulations [15] [16].
Hypersim which is the real-time simulator software developed by IREQ also can run
on the same hardware platform as eMEGAsim and uses nodal formulation method [17].
eMEGAsim and Hypersim simulators are mainly used for real-time simulation of classical
power systems.
The eHS FPGA-based solver is also developed by OPAL-RT for real-time simulation
of PECs and EMs [18]. Off-the-shelf FPGA boards are used as the hardware platform
for this solver. The solver also models PECs based on the ADC approach. eHS is de-
signed as a general-purpose FPGA-based solver using state-space formulation. It converts
the discrete-time nodal formulation of the power switches and RLC components into a
discrete-time state-space formulation.
Another commercially available real-time simulator platform is Plexim [19]. Plexim
is developed based on the PLECS simulation software package which is a toolbar for
MATLAB Simulink [20]. Plexim uses Zynq System on Chip (SoC) boards from Xilinx [2].
Zynq family consists of a processor part and an FPGA part. Plexim uses the processor
part as DRTS and the FPGA part for handling IO signals for HIL applications and
conceptually is a CPU-based DRTS.
6
The other simulators described in the following parts are the research-oriented devel-
opments which are not commercially used.
• The developments in [21] and [22] have achieved tens of nanoseconds computa-
tion time for simulation of Electrical Machiens (EMs) and PECs, based on the
ADC modeling approach for power switches. These developments are based on
fixed-point data representation. Fixed-point arithmetic does not provide adequate
accuracy especially for large study systems [10] [18].
• For motor drive applications there are several FPGA-based DRTS trials. In this
category, switching loss effect of the power semiconductors with linear approxima-
tion [23] or lookup tables, obtained based on actual experimental results of the
switch devices [24] are included in the simulation. The work done in [25] imple-
ments the EM model with different numerical representation is direct-quadrature
reference frame.
• There are also dedicated FGPA-based simulator algorithms to simulate specific

components, e.g., the Modular Multilevel Converter (MMC) [26], [27]. These algo-
rithms use the partitioning method proposed by [28] which suits well for the MMC
topology.
• The FPGA-based DRTS in [29] implements partitioning algorithm for PEC simula-
tion. It utilizes customized floating-point representation to achieve small simulation
time-step. This has been applied to a case study which includes one PEC.
• References [30] and [31] provide FPGA-based platform for the transmission line
model and the universal EM model, however, no PEC representation is considered.
1.2.5 Motivation of this Work

There is an ongoing transition toward integration of renewable resources into power
systems using PECs, e.g., the system of Fig. 1.3 which illustrates the CIGRE benchmark
[32] for renewable energy integration for the medium-voltage distribution network. The
system of Fig. 1.3 includes multiple PECs operating close to each other. This trend makes
(i) the future power system larger in terms of number of nodes and components and (ii)
more complicated for EMT simulation studies due to the larger number of differential-
algebraic equations and large number of repetitive-switching devices. Such power systems
also often include multiple EMs. Moreover, it can be noticed that the transmission lines
are very short and therefore travelling wave transmission line models cannot be used in
7
Primary Transmission HV system
220/20kV 220/20kV 1
1
2
2.8 km
2
4.9km
4.4 km
3
0.6 km 1
3
4 8
3.0km
9 1
0.6 km CHP 4
Fuel 2.0km
1 Cell
1
0.3 km
7
10
5 PEC
transformer
0.2 km bus
1.5 km Fuel Fuel 6
Cell Cell load
Figure 1.3. CIGRE benchmark system for renewable energy integration.
the correspondng simulation studies to partition the system and reduce the simulation
burden [13].
Furthermore, CHIL real-time simulation study is becoming an indispensable and es-
sential step to verify performance of control/protection algorithms prior to installation
in the power system, e.g., in the system of Fig. 1.3. The reason is that emerging control
and protection algorithms are more complicated than those of the established classical
power systems. Thus, to perform real-time simulation of power systems which include
switch-mode PECs, new requirements are imposed on DRTS platforms. These include:
• The need for a small simulation time-step to simulate switching transients of PECs.
Considering the switching frequency of the PECs in power system, e.g., 10 kHz, µs
range time-step is appropriate.
• Computational capability to accurately simulate multiple PECs.
• Proper emulation of the EMs under different test scenarios and system configura-
tions with a small time-step. Classical Quadrature-Direct (QD) EM models fail to
8
efficiently emulate the EM behaviour in all test scenarios, e.g., rectifier mode of a
drive system [33].
• Capability to simulate systems with large number of nodes. The emerging power
systems, particularly distribution voltage-class systems, feature short transmission
lines which do not lend themselves to the travelling wave models. This results in
one set, rather than multiple partitioned sets, of equations describing the power
system model for EMT studies.
Commercial real-time simulation platforms, e.g., RTDS, use FPGAs for input/output
signal conditioning while Central Processing Unit (CPU) cards for computation [8] which
is a proven and mature approach for simulation of classical power systems. However, it
has major limitations to simulate emerging power systems at the presence of a large
number of switch-mode PECs due to the requirement for small simulation time-step,
e.g., 1µs. The reason is that the communication delay between the processing units is
usually larger than this time-step.
Geraphical Processing Unit (GPU) has also been used for high performance compu-
tation in power system studies, e.g, TS simulation [34]. However, the latency of the
Peripheral Component Interconnect Express (PCI-E) link of the GPU is larger than the
required time-step for simulation of emerging power systems. Therefore, GPU is not
considered as a computation device for DRTS applications.
Recently, FPGA has received significant attention for DRTS applications. FPGA pro-
vides reconfigurable hardware programming which enables development of application-
specific parallel computation engines. Moreover, all the data storage and computation is
done in one chip which enables use of small simulation time-steps.
FPGA features have resulted in multiple attempts for development of FPGA-based
DRTS to simulate power systems embedding PECs and EMs. However, such develop-
ments suffer from a set of basic shortcomings, as follows:
• It is not computationally feasible to change and solve new admittance matrix of
study system for every switching event, particularly when the simulation time-
step is small. Thus most real-time simulators use ADC approach that represent
a switch as fixed admittance configuration with a small capacitor or inductor for
on-off operation, respectively [10] [21] [35]. This leads to i) numerical oscillations in
simulation results and ii) unacceptably large switching loss, e.g., 3.7% of nominal
power for a 21-Pulse STATCOM with 1.4µs simulation time-step [10].
• All developments use classical direct-quadrature reference frame for EMs. This
model exhibits deficiency for simulating EMs for specific case studies and/or test
scenarios [33].
9
• The existing FPGA-based developments are dependent on system configuration and

require changing the bitstream of the FPGA for every change in the configuration of
the study system [36] [37]. Changing the bitstream necessitates hardware code re-
compilation which is a time consuming task. This can take several tens of minutes
to hours depending on the size of the code. It should be noted that eHS solver
in [18] solves networks with switching elements independent of the topology based
on ADC model for power switches and tableau approach [38] for forming network
equations. This results in larger matrix dimension and thus extra FPGA hardware
resource consumption.
• The existing FPGA-based DRTSs do not target power systems that include a large
number of PECs and EMs. Reference [31] reports simulation of a large classical
power system in real-time, however, it does not include PECs.
Considering these limitations of the existing DRTS development efforts, one can con-
clude that the existing models and implementations lack the ability to accurately and
computationally-efficiently simulate the emerging power systems. Therefore, there is a
need to enhance both modeling approaches and FPGA-based implementation of the mod-
els. The main focus of this work is to address these shortcomings of the FPGA-based
DRTS, i.e., (i) development of PEC model for FPGA-based implementation, (ii) devel-
opment of proper FPGA-based EM models, and (iii) development of the FPGA-based
real-time simulator platform for the emerging power systems in the context of CHIL.
1.3 Research Objectives

Considering the aforementioned limitations of the existing DRTS simulators for power
systems, objectives of this work are as follows:
• Development of an FPGA-based DRTS platform for the power systems that include
multiple PECs and classical components. This requires:
– Presenting an efficient FPGA-based general nodal network solver that can

handle different system topologies independent of the connectivity and device
parameters with efficient FPGA resources utilization.
– Developing a proper partitioning method to enable parallel processing for PEC
models.
– Devising a general model for the PECs to enable simulation of multiple con-
verters
10
• Developing an FPGA-based EM model capable of emulating the machine behaviour

under all practical/possible operational scenarios based on the use of small simu-
lation time-step.
• Implementing the final design of the FPGA-based real-time simulator in a hardware

platform.
• FPGA-based implementation of transmission line model to interface the FPGA-

based DRTS with the production-grade RTDS environment.
1.4 Thesis Outline

This thesis contains seven chapters including this introductory chapter. The next chap-
ters are organized as follows:
Chapter 2: FPGA-Based Computation
This chapter presents a targeted introduction to FPGA-based high performance com-
puting. Different options for numerical representation are discussed. The parallelism ex-
ploitation techniques and the best practice to utilize them for this research are explained.
Moreover, a proper FPGA programming method to enable short development time and
efficient design is introduced.
Chapter 3: FPGA-Based Solution of Network Equations for EMT Simulation
This chapter presents an FPGA-based computation engine for solving power system
network equations based on floating-point numerical representation. This chapter intro-
duces a reformulation of the nodal equations to enable small time-step simulation of large
systems while resource-efficient implementation is adopted.
Chapter 4: FPGA-Based Simulation of Power Electronic Converters (PECs)
for EMT Analysis
This chapter elaborates on different real-time and off-line PEC models. It presents a
novel partitioning method to separate the solution of PEC model from that of the rest of
the system. It also shows that equivalent circuit, seen by the PEC, can be calculated in
parallel with the computation engine of chapter 3. An FPGA-based PEC module based
on the proposed method is designed and added to the computation engine of Chapter 3.
Simulation case studies are carried out to evaluate performance of the proposed method
and its implementation.
Chapter 5: FPGA-Based Simulation of Electrical Machines (EMs) for EMT
Analysis
This chapter elaborates on different EM models. A general EM model that can
represent different types of EMs based on Voltage-Behind-Reactance models is proposed.
11
A discretized model is deduced and the method to enable parallel processing of the EM
simulation task with other tasks in the computation engine is presented. An FPGA-based
EM module based on the proposed method is designed. Based on enhancement of the
computation engine with this module, simulation test cases are carried out to evaluate
performance of the proposed model and its implementation.
Chapter 6: Hardware Platform for Real-time Simulation of the Emerging
Power Systems
This chapter presents the hardware platform of the real-time simulator including
the FPGA-based computation engine and RTDS. This chapter also explains the ap-
proach, based on the use of transmission line model, to enable simulation of the emerging
power systems on the FPGA-based computation engine and the classical ones on RTDS.
Simulation-based case studies are carried out to evaluate the performance of the proposed
real-time simulator.
Chapter 7: Conclusions and Future Work
This chapter summarizes the work, highlights the conclusions, identifies contributions
and provides suggestions for future research.
12
Chapter 2
FPGA-Based High Performance

Computing
2.1 Introduction
The Field Programmable Gate Array (FPGA) was originally developed for digital design
prototyping purposes in late 1980s. Designs could be prototyped and debugged before
mass production. As the FPGA technology improved it became the popular comput-
ing device for applications to various fields, e.g., communication. However, in most of
the computational-intensive applications CPU technology was the main choice until it
was revealed that the clock speed of the processor would not be significantly increased
due to power consumption considerations. Since transistor count increase is achievable,
parallel architecture became the solution to increase computational power. FPGA gener-
ally provides a large amount of reconfigurable resources that can be configured to create
application-specific parallel computational engines to achieve performance goals that can
not be achieved by other parallel processing technologies, e.g., multi-core coprocessors.
This makes FPGA the main computation engine in many High Performance Computing
(HPC) applications [39]. Most of these applications feature low latency and high com-
putation power requirement. Power system and power electronics real-time simulation
applications share the same features [18] [40].
Unlike a processor, in which architecture of the Arithmetic Logic Unit (ALU) is fixed
and designed in a general-purpose manner to execute various operations, the FPGA
hardware resources can be configured to perform just the operations needed by the specific
application. This flexibility results in increased performance, however, it comes at the
cost of FPGA resource utilization. Thus an implementation of a desired algorithm is
evaluated based on both performance and resource utilization.
13
CHAPTER 2. FPGA-BASED HIGH PERFORMANCE COMPUTING
In this chapter FPGA resources and FPGA-based techniques to exploit parallelism

in an algorithm to enable its high performance implementation are described. Section
2.2 briefly explains the FPGA hardware resources. In section 2.3 different numerical
representation are discussed and appropriate choice for DRTS applications is introduced.
Section 2.4 briefly reviews different FPGA-based design techniques to exploit parallelism.
Moreover, multiple designs of the floating-point accumulator and their performances are
presented. The design of the floating-point accumulator is a key factor in the performance
and resource utilization of the DRTS implementation, as described in chapter 3. Section
2.5 briefly describes different FPGA programming methods and the approach taken in
this work. Section 2.6 summarizes this chapter.
2.2 FPGA Hardware Resources
Different FPGA families from different vendors provide different resources available on
the FPGA chip. For example Virtex-7 is a family of Xilinx FPGAs accommodating
large amount of resources which are usually used for compute-intensive purposes. In this
work Xilinx’s VC7VX485T chip from this family is selected, therefore, the resources of
Virtex-7 family is explained in this chapter. The main FPGA resources are Configurable
Logic Blocks (CLBs), Digital Signal Processing (DSP) blocks and Block RAMs (BRAMs).
CLBs and DSPs, similar to a processor’s ALU, can be programmed to perform arithmetic
and logic operations including integer add, multiply, subtract and compare. Utilization
of BRAMs, CLBs and DSPs and performance indices, e.g., latency, are used to evaluate
the implementations of an algorithm. These three type resources are briefly described as
follows.
2.2.1 Configurable Logic Blocks (CLBs)
CLBs are the main FPGA resources for implementing sequential and combinational logic.
Fig. 2.1 illustrates the CLB arrangement in the FPGA. Each CLB element is connected
to a switch matrix for access to the general routing matrix. This provides routing to
the other FPGA resources. In addition, each CLB is connected to the other CLBs
through carry chain which is designed for implementing arithmetic logic functions with
high speed performance due to its dedicated hardware. A CLB element contains a pair
of slices. Every slice contains four 6-input look-up tables (LUT), eight storage elements
or Flip-Flops (FFs), wide-function multiplexers and carry logic.
14
Cout
CLB
Clock
Slice
Switch Matrix
Switch Matrix
Cin
Slice
Data
Data
Figure 2.1. CLB resource in Virtex-7 family.
2.2.2 DSP Resources

Digital signal processing (DSP) units are hardened units on the FPGA that can be
configured to execute variety of arithmetic and logic functions including: multiply, add
and compare. The DSP slices in Virtex 7 family are called DSP48E1. The functional
diagram of this DSP is shown in Fig. 2.2. In HPC applications the DSP units are
mainly used to carry out arithmetic functions, e.g., add and multiply. The DSP slices
are originally used for implementing fixed-point arithmetic functions, however, floating-
point arithmetic functions can be carried out using floating-point IP cores which use
CLBs and DSPs.
A
+
B P
25*18
Multiplier
+ =
C
Pre-adder
Figure 2.2. Basic DSP48E1 Slice Diagram
2.2.3 Memory resources

Almost every FPGA-based design requires the use of internal memory resources for va-
riety of purposes including storage of coefficients, buffering of data and finite state ma-
chines. BRAM resources on the FPGA are mainly used as data storage and also as First
15
In First Outs (FIFOs). Fig. 2.3 depicts the Xilinx FPGA block RAM. Each block can
have up to two ports which share the same data memory. Each BRAM contains 36Kb
of memory and can be segmented into two 18Kb blocks. This is the smallest block that
can be achieved. Larger blocks are obtained by simply cascading multiple BRAMs. Each
block can be either configured as dual port or single port RAM. Table 2.1 contains the
number of BRAM resources as well as other resources available on the VC7VX485T chip.
Data In
Address
Port A Data Out
Write Enable
Clock
Memory
Data In
Address
Port B Data Out
Write Enable
Clock
Figure 2.3. Block RAM
Table 2.1
Hardware Rsources available on VC7VX485T
Slices DSP Slices BRAM 36 Kb

75,900 2,800 1,030
2.3 Numerical Representations

In a digital system each number is stored as a string of bits. These bits represent
the number value according to an arrangement which is called numerical representation.
There are two major categories of numerical representation in digital systems, i.e., binary
fixed-point and binary floating-point representations. Hereinafter, these two are simply
referred to as fixed-point and floating-point.
The value of a fixed-point number is an integer value inferred by the two’s com-
plement representation scaled by a factor that is a power of 2. Fig. 2.4(a) depicts a
fixed-point representation. The scaling factor is 2−N in which N is the number of bits
in the fraction part. Fixed-point representation is mainly used in low-cost computers
due to simple hardware implementation of arithmetic functions based on this represen-
tation. However, it fails to accurately represent real numbers and is not widely used for
computation purposes. As an alternative to fixed-point representation with fixed range
and precision, floating-point representation offers a trade-off between range and preci-
sion. Floating-point representation is capable of representing very large and very small
numbers. 32-bit single-precision and 64-bit double-precision are the two main standard
16
floating-point representations according to IEEE-754. The floating-point representation

is depicted in Fig. 2.4 (b). The value of the number in a floating-point representation is:
p
X
(−1)S × (1 + biti × 2−i ) × 2e , (2.1)
i=1
where biti is the ith bit of the fraction, p is the number of bits in the fraction and
e = Exponent − 2N −1 , (2.2)
where N in (2.2) is the number of bits in the Exponent.
MSB N bits LSB

(a) Fraction
Single: 8bits Single: 23bits

Sign:1 bit Double:11 bits Double:52 bits
(b) S Exponent Fraction
Figure 2.4. Numerical Representations (a) Fixed-Point, (b) Floating-Point
The FPGA was originally used to implement fixed-point algorithms. With the im-
provements of the Computer Aided Design (CAD) tools for FPGA programming and
increase in the hardware resources available on FPGA chips, implementation of floating-
point arithmetic on FPGA is becoming of more interest. However, understanding of the
floating-point arithmetic IP cores available from the FPGA vendors is essential for an
efficient compute-intensive design. This is due to the fact that inefficient utilization of
these IP cores will consume large amount of the FPGA resources and therefore limits the
performance.
Fig. 2.5 depicts the resource utilization and maximum clock frequency of the single
and double precision floating-point adder for Xilinx Virtex 7 family as a function of
latency [2]. Latency is the number of clock cycles it takes for the IP core to produce the
results. This is equal to the number of register stages in the data path which is referred
to as number of pipeline stages. The lines with circle marker are for double precision
adder IP core while the ones with square marker are for single precision adder IP core.
One can observe that the more the number of pipeline stages is, the more the maximum
clock frequency will be. Also more pipeline stages result in more FF utilization, however,
LUT utilization is nearly unchanged for IPs with different latencies. One can conclude
that double precision adder consumes approximately three times more LUTs and FFs
than a single-precision adder.
17
In real-time simulation of power systems it is recommended to avoid fixed-point arith-

metic [10] [18] . This is mainly due to the fact that small integration time-step results
in very large and very small matrix entries requiring dynamic numerical range. In this
work single precision floating-point representation is used. The reason is that it provides
adequate dynamic range and precision for the envisioned application while consuming
reasonable amount of FPGA resources.
1000
Speed(MHz)
900 FFs
LUTs
800
700
600
Resource
500
400
300
200
100
0
2 4 6 8 10 12 14
Latency(clock cycles)
Figure 2.5. Resource utilization of floating-point adder IP core optimized for speed [2]
2.4 Parallelism Exploitation Techniques

The FPGA technology is chosen to achieve a desired performance that is not achievable
using other computing technologies. However, as mentioned earlier, performing floating-
point arithmetic on FPGA requires large amount of FPGA resources, therefore, it is
necessary to design efficient application-specific digital hardware to reach the performance
goals with the available resources. To enable real-time simulation, it is necessary to
maximize the performance of the execution of the simulation algorithms. The approach
to do so is to exploit the parallelism existing in these algorithms. Moreover, it might be
necessary to reformulate the existing algorithms or propose new ones to create parallelism.
In an FPGA-based computation, each task in the algorithm is executed by a specific
hardware unit. Pipelining and unrolling are the techniques that improve the performance
of the hardware by exploiting the parallelism inside the corresponding task. These tech-
niques are as follows.
18
2.4.1 Unrolling
Unrolling a parallel algorithm in the context of FPGA-based HPC is to create multiple
instances of the same hardware unit and assign the same task to each hardware unit.
For instance, consider a floating-point vector addition of two vectors of length l. In a
completely unrolled case, the same number of floating-point adders can be instantiated.
Consider all the floating-point adder cores are configured to have 6 clock cycles latency.
Therefore, the latency of the completely unrolled vector addition will be 6 cycles. How-
ever, l times the FPGA resources of a floating-point adder IP core is required.
2.4.2 Pipelining
Pipelining is a technique to divide an algorithm into sequential steps, with each step
being performed in a special dedicated segment that operates concurrently with the other
segments. Each segment executes a part of the overall algorithm and passes the data
to the next one. The algorithm is finished when data is outputted by the last segment.
This enables the first segment to input a data set while the other data sets are being
processed in the middle segments.
Floating-point arithmetic IP core provided by FPGA vendors are pipelined. Consider
the vector addition example using one adder IP core. Each addition takes 6 clock cycles
to finish. Fig. 2.6(a) shows the timing of a simple vector addition of a + b = c, using one
floating-point adder IP instance. The hardware waits until each addition is completed
to start the next one. This approach is not efficient and doesn’t take advantage of the
existing parallelism in the vector addition and pipelining capability of the hardware.
Using a pipelined design, two vectors of length l can be added more efficiently. The
timing of the pipelined design is shown in Fig. 2.6(b). At each clock cycle a new data
set is inputted by the adder and after 6 cycles the corresponding addition result is ready.
The latency in this case is 6 + l. The increase in performance obtained from pipelining is
approximately proportional to the number of pipeline stages. However, this holds only
if data moves through the pipeline stages without any stalling. This is referred to as
stall-free pipeline.
Pipelining and unrolling are methods to exploit the parallelism in an algorithm which
can be respectively limited by data dependency between the tasks and the FPGA resource
limitation. Data dependency implies that each task cannot be started until the other task
has finished. An algorithms with data dependency might require to be reformulated to
overcome its anti-parallel pattern. For instance, there is data dependency in the floating-
point accumulation of a vector since each addition is not started until the previous
addition has been completed. This can be solved by reformulating the algorithm as
19
Clock
Input 1 a0 a1 a2 al
Input 2 b0 b1 b2 bl
6 clock cycles
Results c0 c1 cl
(a)
Clock
Input 1 a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 al-6 al-5 al-4 al-3 al-2 al-1
Input 2 b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 b11 bl-6 bl-5 bl-4 bl-3 bl-2 bl-1
6 clock cycles
Results c0 c1 c2 c3 c4 c5 cl-12 cl-11 cl-10 cl-9 cl-8 cl-7 cl-6 cl-5 cl-4 cl-3 cl-2 cl-1
(b)
Figure 2.6. Timing of floating-point vector addition (a) without and (b) with pipelining.
follows.
2.4.3 Floating-Point Accumulation

Most of the developments done in the area of FPGA-based small time-step real-time sim-
ulation of power systems either use fixed point representation or floating point numbers
with fixed point accumulation [21] [25] [31] . Custom floating-point representation has
been also implemented to compromise the accuracy to reach higher speeds for more time-
consuming simulation algorithms [29]. However, proper reformulation removes the data
dependency and enables pipelining to achieve low latencies using standard floating point
adder IP. Floating-point accumulator is used in many algorithms, e.g., matrix-vector
multiplication. It will be shown in the next chapter that matrix-vector multiplication
module is the key module that determines both performance and resource utilization of
the FPGA-based DRTS.
Vector accumulation is expressed by
l−1
X
Acc = x[i], (2.3)
i=0
where x is the input vector and Acc is the accumulation result. The drawback is that
it takes a certain number of clock cycles for the floating-point adder IP core to execute
each addition while the next addition requires the results from the previous ones. This
20
dependency prevents parallel execution of additions using the same IP core.

Consider an adder IP with 6 pipeline stages. The input vector is divided into 6 sub-
vectors each with the length of l/6. The number of sub-vectors should be equal or greater
than the number of the adder pipeline stages. The accumulation can be expressed by
5
X
Acc = P acc[j], (2.4)
j=0
where P acc[j]s are called partial accumulations and obtained by
(l/6)−1
X
P acc[j] = x[6i + j]. (2.5)
i=0
This approach divides the accumulation process into two steps, first is the calculation of
six partial accumulations and second is the calculation of the final results by adding up
the partial accumulations. The first step can be executed in parallel with throughput of 1
data per clock cycle. Fig. 2.7 depicts the timing of this step. Every each six clock cycles,
the new data corresponding to each sub-vector is inputted. Based on this method in every
clock cycle that a new x[i] is inputted by the adder IP core, the corresponding P acc[j]
is ready to be added to it. Therefore, the step of calculating the partial accumulations
takes 6 × (l/6) = l cycles. The next step is to accumulate P acc[j]s. It takes 22 clock
cycles for the adder IP core to finish the second step. Therefore, the accumulation of a
vector of length l takes l + 22 clock cycles using one adder IP instance with 6 clock cycles
latency.
Clock
Input 1 x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 xl-6 xl-5 xl-4 xl-3 xl-2 xl-1
Input 2
Pacc0 Pacc5 Pacc0 Pacc5 Pacc0 Pacc5
Results l-6 clock cycles Pacc0 Pacc1 Pacc2 Pacc3 Pacc4 Pacc5
Figure 2.7. Timing diagram for calculating partial accumulations
Other designs are possible to enable trade-off between performance and resource uti-
lization. Consider the case where two vector accumulations are carried out using one
floating-point adder IP core. The accumulations can be expressed by
l−1
X l−1
X
Acc1 = x1 [i] , Acc2 = x2 [i]. (2.6)
i=0 i=0
21
The same reformulation can be applied by dividing the algorithm into two steps. The
first step, i.e., calculation of the six partial accumulation terms for the two vectors, is
equivalent to calculating six partial accumulation of a vector x of length 2l given by
x[2i] = x1 [i], x[2i + 1] = x2 [i], i = 0, .., l − 1. (2.7)
However, in the second step for each output, only three partial accumulations are added
to produce Acc1 and Acc2 . It takes 14 clock cycles for the adder IP core to finish the
second part. This results in total latency of 2l + 14 clock cycle. Table 2.2 contains
the latency and number of adder IP cores per vector accumulation for different designs.
The first design has the latency of 6l and every adder IP core can be used to execute
six vector accumulations. This design compromises speed to save hardware resources.
Conversely, the last design in which three adder IP instances are utilized to execute one
vector accumulation has lower latency but higher resource utilization. These designs offer
a trade-off between performance and resource utilization. Depending on the speed goals
and resource utilization constraints, throughout this work different accumulator designs
are used in different algorithms.
Table 2.2
Characteristics of Floating-Point Accumulator Designs
Solution number Number of adder IP cores Latency

1 1/6 6l
2 1/3 3l + 9
3 1/2 2l + 14
4 1 l + 22
5 2 (l/2) + 28
6 3 (l/3) + 34
2.5 FPGA Programming Method

There are three main categories of FPGA programming languages, i.e., Hardware Descrip-
tion Languages (HDL), high-level graphical languages and high-level C-based languages.
In this section, these programming languages are briefly compared and the appropriate
choice for this work is explained.
The HDL programming languages, e.g, Verilog and VHDL are the original digital
hardware design languages. These languages do not impose any restriction in the digital
hardware design. However, in case of complicated designs they impose very long devel-
opment time. This is usually a significant issue and limits design exploration which itself
22
might result in inefficient designs.

Graphical languages, e.g., Labview FPGA [41] enable designers to program FPGAs
without much of the knowledge of digital hardware design. This type of FPGA program-
ming is used for control applications and algorithm verifications but it is rarely used in
computationally intensive applications. Besides, graphical languages inherently limit the
flexibility and design exploration.
The are two main approaches in high-level C-based FPGA programming languages.
First category includes C-based programming packages based on OpenCL framework,
e.g., Altera’s SDK for OpenCL [42] and Xilinx SDAccel [43]. In the OpenCL framework
an accelerator (FPGA board in this case) is connected to the host which is usually a
Personal Computer (PC). Part of the code which is computationally expensive is executed
on the FPGA. OpenCL for FPGA is under development and the connection of FPGA
to other devices in OpenCL framework is not as flexible as HDL languages. This is
important for development of an FPGA-based DRTS since it usually requires that the
simulator interface with other devices, e.g., another simulator or Input/Output (IO)
cards. Therefore, OpenCL is not the preferred programming language for development
of an FPGA-based DRTS.
The second approach in high-level C-based FPGA programming languages, is the
IP-based design approach in Xilinx’s Vivado High Level Synthesis (HLS). The C-based
programming is used to design modules which are also called IPs. This higher level of
abstraction enables more aggressive design exploration. This results in achieving better
performance than HDL languages in a much shorter time. However, achieving efficient
designs using HLS requires understanding of digital hardware design. This is even more
important in case of digital hardware design based on floating-point representation. The
C-based designed IPs and IPs designed using HDL languages can be integrated into one
design using Vivado IP integrator environment. Using this approach one can exploit all
the advantages of both the HDL and the high level C-based programming. Soft-processors
and hard-processors can also be integrated into the design in Vivado IP integrator envi-
ronment.
In this project different modules are designed for different tasks using Vivado HLS
and HDL languages. However, most of the modules that carry out the simulation tasks,
based on floating-point representation, are designed using Vivado HLS. Each task may
contain several processes. Each process is carried out using a Process Unit (PU). Within
a module each PU can be designed using either a loop or a function in Vivado HLS.
When designing PUs using functions dedicated hardware containing dedicated floating-
point cores is created. However, PUs designed using loops share the same floating-point
cores. These two methods of implementation enable design flexibility to achieve better
23
performance by exploiting the floating-point cores.

Finally, the computation engine, i.e., digital hardware of the FPGA-based power sys-
tem EMT simulator is built by interconnecting these different modules in Vivado IP
Integrator environment. A MicroBlaze (MB) soft-processor is instantiated in the com-
putation engine to control different modules. A transceiver module is also instantiated
to connect the computation engine to the RTDS.
2.6 Summary and Conclusions

This chapter provides basic and necessary information of the FPGA hardware resources
for high performance design purposes. Numerical representations and their characteris-
tics are explained. Standard single-precision representation is chosen to provide adequate
precision and dynamic range while utilizing reasonable FPGA resources. Two major tech-
niques of exploiting the parallelism, i.e., unrolling and pipelining are described. Vector
accumulation is a bottleneck in low latency floating-point matrix-vector multiplication.
It is shown how parallelism can be enabled in the vector accumulation by reformula-
tion. Moreover, the FPGA programming methods are briefly explained and Vivado HLS
and Vivado IP Integrator are introduced for designing the hardware modules inside the
FPGA-based computation engine.
24
Chapter 3
FPGA-Based Solution of Network

Equations for EMT Simulation
3.1 Introduction
For simulation of power system EMTs, the power network is represented by set of differen-
tial and algebraic equations. The main course of this chapter is to design a floating-point
computation engine to solve such differential equations. The proposed computation en-
gine features (i) a fixed digital hardware independent of system topology, (ii) parallelism
between simulation tasks and (iii) efficient utilization of FPGA hardware resources.
Section 3.2 describes different formulation for power network representation for EMT
simulation. The features and drawbacks of different formulations are explained and the
one for FPGA-base implementation is introduced. Section 3.3 explains the structure of
the real-time simulation platform, the FPGA-based solution of the power system equa-
tions and its interface with the classical CPU-based RTDS real-time simulator. Section
3.4 describes further details of solving of the network equations using FPGA, explains
anti-parallel patterns and elaborates on the proposed reformulations to enable paral-
lelism. The design of the pipelined computation engine that exploits the enabled paral-
lelism is also provided.
3.2 Network Formulations for EMT Simulation

To numerically solve differential equations, either variable- or fixed time-step numerical
integration methods can be employed. For power system EMT studies, the variable time-
step method is a viable approach for fast transients that damp out. This allows the solver
to increase the time-step after the transients vanish. However, in case of a power system
25
CHAPTER 3. FPGA-BASED SOLUTION OF NETWORK EQUATIONS FOR EMT
SIMULATION
embedding PECs, the repetitive switching of PECs prevents the solver from increasing
the time-step. Moreover, in real-time simulation fixed time-step method is preferred over
variable time-step method. Therefore, fixed time-step method is considered in this work.
The differential equations representing a power system model for EMT studies consti-
tute a stiff system. Therefore, implicit stiff numerical integration methods are employed
to obtain the simulation results at each time-step [44]. This leads to the solution of a
set of (possibly time-variant) linear equations at each time-step. Different formulations ,
i.e., state-space, nodal formulation and combined nodal and state-space can be adopted
for this purpose.
3.2.1 State-Space Formulation

The standard state-space representation of a set of linear differential equation is
ẋ = Ac x + Bc u,
(3.1)
y = Cc x + Dc u.
Vector x contains state variables, i.e., capacitor voltages and inductor currents and u is
the vector of inputs. Subscript c indicates the continuous-time equations.
One feature of state-space formulations is that it can be solved by a general integration
method. A Linear Multi-Step (LMS) integration method can be used to carry out such
a task. For power system EMT analysis usually an implicit one-step LMS integration
method is used. Applying an implicit one-step LMS integration method, the discrete-time
equations are
Md xn+1 = Ad xn + Bd1 un + Bd2 un+1 ,

(3.2)
yn+1 = Cd xn+1 + Dd un+1 ,
where Md , Ad , Bd1 , Bd2 , Cd and Dd are discrete-time state-space matrices. In the case
of the commonly used Trapezoidal (TR) LMS integration method [4] these matrices are
∆tAc ∆tAc ∆tBc

Md = (I + ), Ad = (I − ), Bd1 = Bd2 = ,
2 2 2 (3.3)
Cd = Cc , Dd = Dc ,
where d denotes the discrete-time and ∆t is the simulation time-step.

The steps to solve the network equations, using state-space formulation and a one-step
implicit integration method, e.g., TR include:
1. Update sources un+1 .
26
SIMULATION
2. Calculate discrete-time system matrices from (3.3) (if there is any change).
3. Solve (3.2) to find the state variables xn+1 .
4. Calculate the outputs yn+1 .
The main feature of the state-space formulation is that it enables eigenvalue analysis
of both continuous-time and the resulting discretized equations after applying the LMS
integration method. This is advantageous in selection of proper LMS integration method
and simulation time-step for a specific problem. State-space approach allows evaluation of
simulation stability and usage of iterative methods for real-time EMT simulation of PECs
[45]. The reason is that in the state-space formulation, the vector of unknowns is the
vector of state-variables, e.g., capacitor voltages and inductor currents. This quantities
do not dramatically change from one time-step to the next. This allows employing the
state variable vector from previous time-step xn as the initial guess for the iterations to
obtain xn+1 .
However, the state-space formulation suffers from complex process of assembling the
network equations. The reason is that organizing the network equations in solely dif-
ferential form is not straightforward. If a parameter changes during the simulation, it
will be a time consuming task to obtain the new set of differential equations. This has
consequently prevented state-space formulation to be employed in simulation of power
system EMTsm, specially in presence of PECs. SimpowerSystem [46] is the main off-line
EMT simulation software package that uses state-space formulation.
3.2.2 Nodal Formulation

Unlike the state-space formulation where the integration method is applied to the sys-
tem of (3.1), in the nodal formulation the differential equation describing each branch is
discretized separately. The companion model [4] of the dynamic branches, e.g., inductors
and capacitors as well as the non-linear apparatus, e.g., EMs are deduced separately.
Then, the network discretized model is obtained by replacing each branch by its com-
panion model. The discretized network equations are deduced by KCLs and KVLs for
the system discretized model.
The differential equation describing an inductor branch with inductance L is
diL
L = vL . (3.4)
dt
Discretized form of (3.4), based on TR integration method, is
∆t n+1
in+1
L − inL = (v + vLn ), (3.5)
2L L
27
SIMULATION
Table 3.1
Discrete admittance and history update coefficients for different branches
Branch Type cv ch g
∆t ∆t
Inductor L
1 2L
−4C 2C
Capacitor ∆t
−1 ∆t
4L∆t 2L−R∆t 1
Series RL (2L+R∆t)2 2L+R∆t 2L
R+ ∆t
−4C∆t 2RC−∆t 1
Series RC (2RC+∆t)2 (2RC+∆t) ∆t
R+ 2C
which can be rewritten as

∆t n+1 ∆t n ∆t n
in+1 =( )vL + (inL + vL ) = gd vLn+1 + (inL + v )
L
2L 2L 2L L (3.6)
= gd vLn+1 + in+1
h ,
where gd is the discrete-time admittance and in+1

h is the so called history current source
which at time-step n + 1 is only a function of solution from the previous time-step, i.e.,
n. Therefore, at each time-step in+1
h is updated according to
∆t n
in+1
h = inh + v
L (3.7)
= ch inh + cv v n ,
where ch and cv are the coefficients relating the magnitude of history current at time-
step n + 1 to inh and v n , i.e., the history current and the branch voltage at time-step n,
respectively.
For capacitors, series RL and series RC branches an expression analogous to (3.7) can
be obtained. The equivalent circuit of (3.7) is depicted in Fig. 3.1. Table 3.1 summarizes
the history current updating coefficients and the discrete admittances for different type
of dynamic branches.
v
vn+1
i R R
L C g ihn+1
L C
Continuous-time branch model Discretized branch model
Figure 3.1. Continuous-time and discretized model of L, C, series RL and series RC branches
28
SIMULATION
Using companion models of dynamic elements, the network model at each time-step is
a resistive network including voltage/current sources. The available methods to develop
the equations of such a network are Sparse Tableau [38], Nodal Analysis (NA) and Mod-
ified Nodal Analysis (MNA) [47]. NA and MNA result in smaller size matrices relative
to Sparse Tableau . Moreover, the matrices in NA and MNA are easily constructed by
inspection of the case study system netlist. Therefore, most of the power system EMT
analysis tools and circuit simulators use NA or MNA.
A feature of NA is that the resulting matrix is symmetric and its drawback is that it
can not include voltage sources in the formulation. Therefore, NA needs to introduce an
artificial series resistance for each voltage source. MNA enables inclusion of independent
and dependent voltage sources. Therefore, MNA is selected to form the network equations
in this work.
The network mathematical model based on MNA is [4]
" #" # " #
G Ag vn+1 un+1
i
n+1
= n+1 , (3.8)
Bg 0 ivs vs
| {z } | {z } | {z }
Y xn+1
node un+1
node
where
un+1
i = in+1
h + in+1
s . (3.9)
un+1
i is the nodal current injection vector. in+1
h and in+1
s are vectors of history and in-
dependent current sources entering the nodes, respectively. vn+1
s is the vector of voltage
n+1
sources, vn+1 and ivs are the vectors of unknown node voltages and voltage source cur-
rents, respectively. It is assumed that the dependent sources are either voltage-controlled
voltage sources or current-controlled current sources. Therefore, the bottom right sub-
matrix of the MNA matrix is 0. G is the conductance matrix, Ag and Bg are matrices
defining node incidence of the independent voltage sources and dependent sources. Equa-
tion (3.8) can be rewritten as
Yxn+1 n+1
node = unode , (3.10)
where Y is the admittance matrix, un+1 n+1

node is the nodal injection vector and xnode is the
vector of unknowns.
Transformers are widely present in the power network. For power system EMT anal-
ysis, within the bandwidth of several kHz, transformers are represented by lumped ele-
ments. Using this method three-phase transformer can be modeled using three single-
phase transformers [13]. Therefore, only the single-phase transformer is explained here.
Figure 3.2 (a) depicts the lumped model of a single-phase transformer which includes
an ideal transformer, magnetizing inductance, primary and secondary winding resistances
29
SIMULATION
and leakage inductances. Fig. 3.2 (b) illustrates the corresponding discretized model.
The ideal transformer is modeled by dependent sources. The history current sources are
modeled by independent sources. The reason is that the each history current source at
each time-step is found from the data from the previous time-step. The ideal transformer
turns ratio and node incidences appear in matrices Ag and Bg .
3
1:n 1
1 3
v1 is
Rp Lp Ls Rs
Lm nis nv1
2 4 2 4
Transformer lumped element model Transformer discretized model
(a) (b)
Figure 3.2. Transformer Model
Based on the discussion of this section, the steps to solve the network equation, based
on nodal formulation, include:
1. Calculate independent voltage and current sources sources vn+1

s , in+1
s .
n+1
2. Update each entry in the history current source vector ih using (3.7) and construct
uin+1 and therefore un+1
node .
3. Update admittance matrix Y (if there is any change).
4. Solve (3.10) to determine the vector of unknowns xn+1node .
The main reason that nodal formulation is widely accepted for simulation of power
system EMTs, especially in the presence of switches where the admittance values in
the network model are repetitively changing, is that it easily lends itself to systematic
preparation of the network equations by inspection of the netlist file containing the con-
figuration of the study system. PSCAD, EMTP-RV and real-time RSCAD and HyperSim
EMT simulation software packages use nodal formulation. This thesis uses a combined
nodal and state-space formulation as described in the following section.
3.2.3 Combined Nodal and State-Space Formulation

In the combined nodal and state-space formulation [15] [16] often a smaller part of the
system is represented based on state-space formulation and the rest (main part) is rep-
30
SIMULATION
resented by nodal formulation. The nodal equivalent of the state-space formulation is

deduced and integrated into the nodal part.
Consider the discretized state-space equations of a subsystem with two input vectors,
i.e.,
" # " #
h i un h i un+1
int int
Md xn+1 = Ad xn + Bd1int Bd1ext + Bd2int Bd2ext ,
unext un+1
ext
" # (3.11)
h i un+1
int
yn+1 = Cd xn+1 + Ddint Ddext ,
un+1
ext
where unint is the vector of internal independent sources of the subsystem. unext is the
vector of independent voltage sources connected to the subsystem terminals and yn+1
is the vector of current outputs of the subsystem terminals. The vector of new state-
variables xn+1 can be deduced from the first row of (3.11) and substituted in the second
row to obtain yn+1 , i.e.,
history term
z }| {
yn+1 = Cd M−1 n n n n+1 n+1

d A d x + B u
d1int int + B u
d1ext ext + B u
d2int int + D u
dint int +
(3.12)
Cd M−1 n+1

d Bd2ext + Ddext uext .
| {z }
admittance matrix
Consider (3.12) is representing a subsystem in the main system which is represented

by nodal formulation. Then, the external input vector un+1 ext is the vector of subsystem
n+1
terminal voltages vss and y n+1
is the vector of subsystem terminal currents in+1
ss . Then
(3.12) in terms of the nodal formulation variables is
in+1 n+1 n+1

ss = Yss vss + ihss . (3.13)
Subsystem equivalent admittance matrix Yss and history current source vector in+1 hss can
be obtained from (3.12). In this formulation the main part of the system uses nodal
formulation. The subsystem admittance matrix Yss and the history current source vector
in+1 n+1
hss are respectively included in the admittance matrix Y and nodal injection vector unode
of the main system, i.e., (3.10). The equivalent circuit of (3.13) is depicted in Fig. 3.3.
The solution steps to solve network equation, using combined nodal and state-space
formulation, include:
1. Calculate independent voltage and current sources vn+1

s , in+1
s .
2. Update each entry in the history current source vectors ihss and ih using (3.7) and
construct un+1
i therefore un+1
node .
31
SIMULATION
yssn+1=issn+1
ihssn+1 uextn+1=vssn+1
Yss
Figure 3.3. Nodal equivalent of the state-space subsystem
3. Update admittance matrix Yss (if there is any change).

4. Update admittance matrix Y (if there is any change).
5. Solve (3.10) to find the vector of unknowns xn+1
node .
The combined nodal and state-space formulation exhibits two features. First, it
can be employed to overcome the limitations of the pure state-space based algorithms
to simulate case studies containing variable parameters, e.g., circuit breakers [15] [16].
Second, in some cases the state-space model of the subsystem is already available. This
method enables the integration of such subsystem models into a software package that
is based on nodal formulation. Moreover, using this approach and separating parts of
the system into groups reduces the main system matrix size. This generally reduces
the computational burden since most of the algorithms in the solution of the network
equations are O(n2 ) or O(n3 ) where n is the dimension of the admittance matrix.
Frequency dependent of long transmission line models, i.e., those of overhead lines
and underground cables, are usually obtained in state-space form. In this work the
combined state-space and nodal formulation is employed to include the transmission line
frequency-dependent model into the main system model.
3.3 Proposed DRTS Architecture

Figure 3.4 illustrates the circuit equivalent of the transmission line frequency-dependent
discretized model [48]. At each end of the line the model provides an admittance matrix
Ytr and two current sources with no connection between the admittance matrices of the
two ends. Therefore, using frequency-dependent transmission line model in the power
system model can be used to decouple the admittance matrix of the two end subsystems.
This enables parallelism in the EMT simulation algorithm and has been widely exploited
in the architecture of the commercially available CPU-based DRTSs.
In emerging power system configurations, e.g., that of Fig. 3.5(a), the renewable
energy resources are integrated into the grid mostly at distribution level where the trans-
32
SIMULATION
i2n+1 i1n+1
iw2n+1= iw1n+1=
ih2n+1 ih1n+1
v2n+1 Ytr f(v1n-nτ ,i1n-nτ) f(v2n-nτ ,i2n-nτ) Ytr v1n+1
Figure 3.4. Discretized transmission line Model
mission lines are short. Therefore, in simulation of EMTs, lumped element models, e.g.,
PI section, are used [13] which does not provide parallelism property.
Figure 3.5(b) depicts the structure of the system equations for simulation of power
system EMTs. Short transmission line and high switching frequency PECs result in a
single large set of equations which requires to be solved with a small time-step. Figure
3.5(c) illustrates the envisioned hardware architecture for the DRTS platform in this
work. The single large set of equations representing the distribution level part of the
power system is solved in the FPGA. Transmission line travelling-wave model is employed
to interface this part of the power system model with the model of the host grid (often
classical part) which is simulated in the RTDS.
Section 3.4 describes the solution of single large set of equations representing the
network of the emerging power system. The line model and the corresponding FPGA-
base implementation are explained in chapter 6.
3.4 FPGA-based Solution of Network Equations

This section explains a method to solve the network equations in FPGA. One approach
is based on generating VHDL code for each study system [36]. This requires digital
hardware code compilation for each study system which is time-consuming. The goal is
to design a fixed hardware computation engine to solve the system equations for arbitrary
network configuration without the need for FPGA code recompilation.
In FPGA-based real-time simulation, a specific hardware “module” is designed for
each specific “task ” in the simulation algorithm. Within the design of each task, potential
parallelism is sought or even created by reformulation. Then, the performance of the
corresponding module is improved by exploiting this parallelism. Moreover, if the data
exchange between different modules is sequential, then potential task-level parallelism
is exploited by task-level pipelining. This section identifies the tasks in simulation of
networks are and their corresponding modules are designed. The computation engine
33
SIMULATION
Transmission
Generation
Distribution
(a) Emerging power system
(b) Structure of the equations for

EMT simulation
Fiber Optic
FPGA CPU based Simulator

(c) Hardware
Figure 3.5. Emerging power systems
is constructed using the simulation modules and other digital modules to control and
interconnect them. These tasks are explained as follows.
Each simulation time-step needs to to solve (3.10). If matrix Y is time-variant, ma-
trix inversion is not used since it is computationally expensive . Instead, the two-step
LU factorization is used, i.e., (i) computing the L and U factors and (ii) forward- and
backward substitution [4]. However, in the small time-step real-time FPGA-based simu-
lation of EMTs, on-the-fly LU factorization, particularly in the case of large admittance
matrices, must be avoided [35]. If Y is time-invariant, the pre-calculated inverse matrix
or LU factors can be used. Time-invariant Y is generated by modeling techniques that
avoid admittance matrix variations during simulation which can introduce inaccuracies,
e.g., ADC power switch model [11]. The proposed techniques to maintain the admittance
matrix of the network invariant while maintaining the accuracy for simulation of PECs
and EMs are presented later in chapters 4 and 5, respectively.
Assuming time-invariant admittance matrix, the inverse matrix approach is employed
in this thesis. The reason is that forward- and backward substitution are inherently
sequential procedures and include division operation, therefore, FPGA-based low latency
34
SIMULATION
execution of these algorithms, i.e., small simulation time-step is not achievable. However,
matrix-vector multiplication contains parallelism and can be exploited using unrolling
and pipelining. More importantly, task level parallelism can be enabled by reformulation
of matrix vector multiplication as explained in the following sections.
Considering a constant admittance matrix and a combined state-space and nodal
formulation, the simulation tasks are (i) independent current and voltage source calcu-
lation, (ii) updating the branch history current sources, (iii) updating the subsystem
history current sources, and (iv) solving the nodal equation using (3.10). Since matrix-
vector multiplication is chosen over forward- and backward substitution, task (iv) is to
carry out
−1 n+1
xn+1
node = Y unode . (3.14)
3.4.1 Anti-parallel Patterns and Proposed Reformulation
The structure of the nodal injection vector un+1node is not desirable for FPGA-based im-
plementation for two reasons. First, the structure of un+1
i part of un+1
node in (3.8) depends
on the network configuration. From the above four tasks, the first three contribute
to un+1
i . Thus the construction time of this vector is not deterministic and can be
a time-consuming task depending on system configurations, considering the latency of
floating-point arithmetic operations in FPGA environment. Second, the structure of
un+1
node prevents any parallelism between the first three tasks and the fourth. The reason
is that all the first three tasks must be completed before the fourth task can be started.
As described in the previous chapter, potential parallelism in an algorithm can be
enabled by reformulating the algorithm. To overcome the above two issues, the algorithm
is reformulated as follows. A new nodal injection vector bn+1
node is defined, i.e.,
 n+1 
bs
bn+1 =  ih  = KSI un+1
 n+1 
node node , (3.15)
n+1
bss
where KSI is the source incidence matrix, bn+1s is the vector of independent sources, in+1
h
is the vector of branch history currents and bn+1
ss is the vector of subsystem history current
sources. KSI provides a desirable order in vector bn+1 node . It should be noted that each
n+1
entry of vector bnode is associated with only one voltage or current source, however, each
entry of vector un+1
i is a summation of current sources connected to the corresponding
node which is configuration-dependent.
For the new reformulation, the vector of unknowns xn+1
node is calculated by
35
SIMULATION
n+1
xn+1
node = HRM N A bnode , (3.16)
where HRM N A is the proposed Reformulated Modified Nodal Analysis (RMNA) matrix
and defined by
HRM N A = KSI Y−1 . (3.17)
Based on (3.17), contribution of each source bn+1 n+1

node (i) to the final results xnode is calculated
by superposition. bn+1 th
node (i) is either a current source or a voltage source and the i column
of HRM N A calculates the contribution of this source in the final solution of the network
equations.
The first three tasks respectively calculate bn+1
s , in+1
h and bn+1
ss which are parts of
n+1 n+1 n+1
bnode (i). The fourth task uses bnode and calculates xnode using (3.16). Hereinafter, vector
bn+1
node (i) is called RMNA injection vector.
One of the objectives of this thesis is to design the DRTS independent of the com-
ponents connectivity to avoid hardware code re-compilation for each study system. The
second and third task require the voltage across each branch to carry out (3.7). How-
ever, the connectivity and therefore the positive- and negative node of each branch vary
depending on the system configuration.This can be resolved by including the history cur-
rent updating task in the matrix-vector multiplication task [35]. However, such method
results in larger matrix dimensions. Matrix-vector multiplication task is the main FPGA
resource consuming task as shown in the next section and therefore larger matrix dimen-
sions increases the overall FPGA resource consumption.
In this work the flexibility of FPGA-based design is leveraged to provide sequential
input data for a separate small history current updating module, irrespective of the
connectivity of the system under consideration. This is enabled by using Multiplexers
(MUXs) that contain data associated with the connectivity of the system.
The dimension of matrix HRM N A in (3.16) is nx × nb as compared with that of the
state-of-the-art method [35], i.e., (nx + nb ) × nx where nx and nb are the length of vectors
n+1
xn+1
node and bnode , respectively. Therefore, the proposed technique reduces the FPGA
resource consumption and consequently larger size systems can be simulated.
The proposed reformulation offers three advantages. First, it enables fixed hardware
architecture design irrespective of the connectivity of the study system. The reasons are
(i) the structure of vector bn+1
node is not dependent on the network configuration and (ii)
MUXs (as it will be in the next part) provide the branch voltages for the history current
updating module based on connectivity of the system. Second, the dimension of the
matrix-vector multiplication is kept relatively small, therefore, larger size systems can
be can be accommodated by the same FPGA, as compared to other approaches. Third,
36
SIMULATION
it considers a proper order between the three vectors bn+1 s , in+1

h and bn+1
ss in the vector
bn+1
node . The order between these vectors is determined by their data requirement from the
previous time-step and the computation time of each vector. This enables “stall-free”
task-level pipelining between the first three tasks and the fourth task. Since vector bn+1
node
is sequentially produced by the first three tasks and sequentially consumed by the last
task, task-level parallelism can be exploited using task-level pipelining to achieve minimal
time-step.
3.4.2 Implementation
This part presents the implementation of the modules to carry out the independent source
calculation, history current updating and matrix-vector multiplication tasks.
3.4.2.1 Independent Source Calculation module
Each entry of vector bn+1

s is either a DC or sinusoidal AC voltage/current source and is
obtained by
bn+1
s (i) = as (i) × scn+1 (i), (3.18)
where
sin(φn+1 (i)) stype (i) = 1
scn+1 (i) = (3.19)
1 stype (i) = 0,
and
0 < i < l1 + 1. (3.20)
where l1 is the maximum number of independent sources in the system. For DC sources
stype (i) = 0 and for AC sinusoidal sources stype (i) = 1. The phase for the ith source,
φn+1 (i), is obtained by
φn+1 (i) = 2πfs ∆t + φn (i). (3.21)
Calculation of trigonometric function in (3.21) on FPGA is time- and resource con-

suming whereas memory resources, i.e., BRAMs are abundant. Therefore, instead of a
dedicated floating-point core, a BRAM containing 1200 sample points of one period of
a sine wave is implemented. This provides adequate precision to calculate the sine wave
in real-time. To obtain sine wave values from the points saved in the BRAM, φn+1 (i) is
scaled and converted into an integer, i.e.,
1200 n+1
φn+1
int (i) = [ φ (i)]. (3.22)
2π
sin(φn+1 (i)) is calculated by using φn+1

int (i) as the address for the BRAM containing the
37
SIMULATION
sine wave sample points.

The hardware design for independent source calculation module is depicted in 3.6.
For each system, the parameters are calculated prior to the simulation by inspecting the
system netlist file using a MATLAB script. Each system parameter is written into the
corresponding module before the simulation starts. The parameters of the independent
source calculation module are time-step ∆t, the source amplitude vector as , the source
type vector stype , the source frequency fs .
bsn+1(i)
(Δt,as,stype,φ0,fs)
0
Δt φn+1(i) φn+1int(i)
-2π <x < float to
2πfs(i)
2π int
Sin(φn+1(i))
φn(i)
scn+1(i)
1199 bsn+1(i)
1.0
stype(i)
as(i)
Figure 3.6. Independent source calculation module
The latency of the designed module is 47 clock cycles with the throughput of one data
per clock cycle. This implies that 47 clock cycles after the module is activated, it outputs
bn+1
s (i) at the rate of one data per clock cycle.
3.4.2.2 History current update module
Based on (3.7) and Table 3.1 for each dynamic branch, at time-step n + 1 the history
current update module calculates in+1
h (i) by
in+1 n n n
h (i) = cv (i)(vp (i) − vn (i)) + ch (i)ih (i), (3.23)
where
vnp (i) = xnnode (p1 (i)),

(3.24)
vnn (i) = xnnode (p2 (i)).
p1 (i) and p2 (i) are respectively positive and negative node number and of the ith dynamic
branch.
38
SIMULATION
History current update module inputs xnnode (p1 (i))s and xnnode (p2 (i))s from the pre-
vious time-step calculation results, i.e., time-step n, in a sequential pattern. This is to
enable pipelining in this module which requires sequential access pattern to data. More-
over, vectors p1 (i) and p2 (i) vary for different systems based on the connectivity of the
network. In the matrix-vector multiplication module design, xnnode (i)s are produced in a
parallel pattern. Therefore, a data bus is considered for xnnode .
Vector xnnode has a parallel write- and random-access read pattern. To handle the
random access read- and parallel write-pattern two Multiplexers (MUXs) are considered.
Figure 3.7 depicts the MUXs and the history current update module. These multiplexers
output xnnode (p1 (i))s and xnnode (p2 (i))s. Vectors p1 and p2 are constant parameters of the
study system which are calculated and written in the MUXs prior to simulation. Figure
3.7(b) depicts the pipelined structure of the history current update module that carries
out (3.23). Vectors cv , ch and i0h are the module parameters that are calculated and
written in it prior to the start of the simulation.
MUXs
xnoden(0)
xnoden(1)
xnoden(p1(i))
ihn+1(i)
p1 , p2 xnoden(p2(i)) (cv,ch,ih0)
ihn(i)
ch(i)
xnoden(p1(i)) ihn+1(i)
n
v (i)
xnoden(p2(i))
cv(i)
Figure 3.7. History current update module
3.4.2.3 Matrix-vector multiplication module
Based on (3.16), matrix-vector multiplication carries out the simulation task and consists
of nx vector dot product processes that can be conducted in parallel. The ith process is
the dot product of vector bn+1
node and i
th
row of matrix HRM N A , i.e.,
39
SIMULATION
nb
X
xn+1
node (i) = HRM N A (i, j) × bn+1
node (j). (3.25)
j=1
Dot product processes can be carried out using floating point multiplier IP core and
floating-point accumulator. Depending on selected accumulator design from Table 2.2,
one or multiple numbers of vector dot product processes can be assigned to a PU, con-
sisting of a floating-point accumulator and a floating-point multiplier. This enables a
trade-off between resource consumption and module latency. In this design, each PU is
assigned to perform one dot-product process.
Fig. 3.8 shows the hardware design of the matrix-vector multiplication module. Ma-
trix HRM N A is calculated prior to the simulation and each row HRM N A (i, :) is saved in a
separate BRAM. As mentioned before, each xn+1 node (i) is calculated using one PU.
PU0
xnoden+1(0)
HRMNA(:,0)
PUj
n+1
bnode (i) xnoden+1(i)
HRMNA(:,j)
PUnx-1
xnoden+1(nx-1)
HRMNA(:,nx-1)
HRMNA(i,j)
xnoden+1(i)
bnoden+1(i)
Partial Acc 0 (9K)

Demux
Partial Acc 1 (9K+1)

Mux
Partial Acc 9 (9K+8)
Figure 3.8. Matrix-vector multiplication module
The floating-point adder and multiplier IP cores provided by Xilinx can be configured
to use combination of DSP and CLB resources. For instance, the single precision floating-
point adder IP core can be configured to be (i) solely implemented using CLBs, (ii)
40
SIMULATION
implemented using CLBs and two DSPs or (iii) implemented using CLBs and three DSPs.
More DSP utilization in an IP core reduces the CLB utilization, however, it increases
the number of pipeline stages and therefore latency of the IP core. This is an important
consideration since matrix-vector multiplication is the most resource consuming module.
It should be noted that maximizing the DSP utilization for this module uses a very small
portion of the available DSP resources on the FPGA. All multipliers and adders of this
module are configured for maximum DSP usage.
The accumulators designed in Table 2.2 are based on floating-point adder IP core
with 6 clock cycles latency. However, the latency of the Xilinx floating-point adder and
multiplier IP cores with maximum DSP utilization are 9 and 7 clock cycles, respectively.
Using these adder and multiplier IP core configuration, the latency of the floating-point
accumulator design with throughput of one data per clock cycle, similar to the fourth
design in the table 2.2, is nb + 44 where nb is the input vector size. Considering 7 clock
cycles latency of the multiplier, the latency of dot-product calculation PU is nb + 51.
Since each dot product is assigned to a separate PU, as shown in Fig. 3.8, the latency of
the matrix-vector multiplication module is also nb + 51.
3.4.2.4 Computation engine
To interconnect, initialize, control and change the parameters of the modules which carry
out simulation tasks, accessories are also required. Fig. 3.9 depicts the diagram of the
computation engine hardware proposed to solve the network equations. A MicroBlaze
(MB) soft processor is instantiated in the design to initialize and change parameters of the
modules. The dashed arrows are connections for the communication between the modules
and MB. The solid arrows are data streams. Vectors bn+1 s and in+1
h are concatenated to
n+1
form RMNA injection vector bnode .
Timing diagram of the computation engine is shown in Fig. 3.10. At t = tn+1 0 time-
step n + 1 starts. Since calculation of independent source values do not require any
information from the vector of unknowns from the previous time-step n, the indepen-
dent source calculation module can start before the time-step n + 1 starts. This module
starts at t = tn+1
−1 . After τ1 clock cycles and at the beginning of the time-step n + 1, i.e.,
t = t0 , this module starts streaming its output bn+1
n+1
s . At t = tn+1
0 the history current
source update module and the matrix-vector multiplication module, due to the enabled
parallelism, start operating in parallel. At t = tn+1
1 independent source calculation mod-
n+1
ule completes outputting bs and the history source update module starts streaming its
output in+1 n+1
h . When this module completes outputting ih , i.e., t = t2
n+1
, matrix-vector
n+1
multiplication module has completely processed input vector bnode and starts adding the
partial floating-point accumulation terms. It takes the matrix-vector multiplication mod-
41
SIMULATION
Soft Processor
MicroBlaze
MUXs
Matrix Vector xnode(0)
Multiplication
1
Independent source bs bnode
calculation p1 , p2
xnode(p1(i)) HRMNA b
History current update
ih
2
xnode(p2(i)) Concat
2 4
Figure 3.9. Computation engine for solution of network equations
ule τM ACC = 51 clock cycles to perform this final stage of the floating-point accumulation
as described in part 2.4.3.
time
l1 l2
bs(0) bs(1) bs(26)

τ1
bsn+1 bh(0) bh(1) bh(109)
bhn+1 τ2
nb
bs(0) bh(0) bh(135) τMACC

bnoden+1
τ4
t-1n+1 t0n+1 t1n+1 t2n+1 t4n+1

t-1n+2 t0n+2
Minimal time-step
Figure 3.10. Timing diagram of the proposed computation engine
The minimal time-step ∆tmin is the minimum number of clock cycles in which the
computation engine can finish the simulation. Timing diagram shows that in the proposed
scheduling of the simulation tasks, the minimal time-step is the latency of the matrix-
42
SIMULATION
vector multiplication module, i.e.,
∆tmin = τ4 = nb + τM ACC . (3.26)
However, this requires stall-free pipelining of the computation engine. stall in pipelining
is when the the module that consumes data is ready to receive the data but the producer
module has not produced it yet. It should be mentioned that the opposite is not an issue
and can be handled by implementing First-In-First-Out (FIFO) digital hardware. To
have a stall-free pipeline, the latency of the modules should satisfy:
τ1 ≤ τ4 ,
τ2 ≤ l1 , (3.27)
l1 + l2 = nb ,
where l1 and l2 are respectively length of vectors bn+1 s and in+1

h . τ4 is the latency of
matrix-vector multiplication module.
The size of vectors bn+1 n+1
node and xnode are considered to be nb = 135 and nx = 70,
respectively. This means the computation engine can simulate case study systems up
to 70 nodes with the minimal time-step of 186 clock cycles according to (3.26). The
independent source calculation module latency is 47 clock cycles, therefore, the first
condition in (3.27) is satisfied. The latency of the history current update module is 26
clock cycles, therefore, to satisfy the second condition in (3.27) the maximum number of
independent sources is considered to be 26, i.e., l1 = 26. l2 is obtained from the third
condition and is 109.
The design meets 140MHz timing constraints, therefore, the minimum time-step is
1.1328µs. The computation engine is capable of simulating networks containing up to
109 dynamic branches and 26 independent source with the time-step of 1.1328µs. The
resource utilization and the latency of different modules in the proposed computation
engine are reported in Table 3.2. It is evident that matrix-vector multiplication module is
the dominant resource consuming one and determines the minimal time-step.. Therefore,
performance of the overall computation engine highly depends on the performance of the
matrix-vector multiplication module.
3.5 Summary and Conclusion

This chapter explains different formulations to represent network equations in EMT stud-
ies. Then describes the envisioned hardware platform of the real-time simulator and the
associated network mathematical model based on the combined nodal and state-space
43
SIMULATION
Table 3.2
Resource utilization of differnet modules
Module FF(%) LUT(%) DSP(%) Latency Throughput

Independent Source Calculation 1,532(≤ 1%) 1928(≤ 1%) 5(≤ 1%) 47 1
History Current Update 2,114(≤ 1%) 1,109(≤ 1%) 10(≤ 1%) 26 1
Multiplexers 4,696(≈ 1%) 3,276(≈ 1%) 0(0%) - 1
Matrix-vector Multiplication 227,930(37%) 203,220(67%) 500(18%) 186 1
Available 607,200 303,600 2,800 - -
formulation. Then, the concept of “distributed-line model” to decouple models of a dis-

tribution power grid and host main grid, for FPGA/RTDS-based simulation studies is
presented.
The tasks in the simulation algorithm, based on the combined nodal and state-space
formulation for the FPGA-based implementation, are identified and the anti-parallel
patterns between these tasks are explained. A reformulation is proposed to enable par-
allelism between the tasks. Hardware design of the module corresponding to each task
is presented. Moreover, the hardware design of the proposed computation engine, in-
cluding these modules, is explained. The procedure that the computation engine can
solve network equations up to 70 outputs, i.e., maximum 70 nodes, in 186 clock cycles,
is presented. The features of the proposed reformulation and computation engine are:
• The structure of the computation engine is fixed, i.e., for a new study system the
parameters of the system, corresponding to each module, are calculated and written
in the module by the MB in the computation engine
• The matrix-vector multiplication module is the dominant resource consuming mod-
ule. The proposed reformulation and the computation engine design result in rela-
tively small matrix dimensions, compared to the existing fix-hardware computation
engines, and therefore enables simulation of larger power systems.
• The scheduling between tasks enables design of a stall-free pipeline. Therefore, the
minimal time-step is only determined by the latency of the matrix-vector multipli-
cation module.
44
Chapter 4
FPGA-Based Real-Time Simulation

of Power Electronic Converters
(PECs) for EMT Analysis
4.1 Introduction
Power switches are represented by behavioral models for simulation of power system
EMTs, thus, presence of High Frequency (HF) PECs in the power system introduces
challenges in terms of real-time simulation of system EMTs. The main purpose of this
chapter is to devise efficient simulation methods and design the corresponding floating-
point hardware modules to enable simulation of multiple PECs. The envisioned simu-
lation method (i) enables simulation of multiple PECs without the need to drastically
increase the simulation time-step or resource utilization, (ii) enables embedding PEC
models into the network model by modifying the RMNA matrix described in the previous
chapter, (iii) does not impose unnecessarily small simulation time-steps to compensate
for errors introduced by the existing real-time modelling methods, (iv) enables parallel
operation of its corresponding digital hardware module, with other hardware modules,
in the FPGA-based computation engine, as devised in the previous chapter.
This chapter is organised as follows. Section 4.2 explains the proposed approach and
a method to avoid “ringing” phenomenon in this work. Section 4.3 explains implemen-
tation of the two-value resistor model in terms of the proposed FPGA-based DRTS in
this work and highlights issues associated with (i) variable admittance matrix and (ii)
iterative process of identifying switch states. Section 4.4 proposes a novel FPGA-based
partitioning method for simulation of PECs to overcome these challenges and applies the
method to multiple PECs. Section 4.5 presents the design of the proposed PEC module,
45
CHAPTER 4. FPGA-BASED REAL-TIME SIMULATION OF POWER
ELECTRONIC CONVERTERS (PECS) FOR EMT ANALYSIS
Table 4.1
Multi-step numerical integration methods
Multi-step method Formula

Trapezoidal(TR) xn+1 − xn = ∆t(0.5f (xn+1 ) + 0.5f (xn ))
Backward Euler (BE) xn+1 − xn = ∆t(f (xn+1 ))
Forward Euler (FE) xn+1 − xn = ∆t(f (xn ))
Adam Bashforth (AB) xn+1 − xn = ∆t(1.5f (xn ) − 0.5f (xn−1 )
based on the proposed partitioning method using FP numerical representation, which is

used to enhance the FPGA-based DRTS in Chapter 3 for simulation of PECs. Based
on the proposed partitioning method and using the enhanced FPGA-based DRTS, real-
time simulation case studies are carried out in Section 4.6 and the results are compared
against those of off-line and existing real-time simulation methods where it is shown that
the proposed method outperforms the existing real-time simulation method. Section 4.7
summarizes and highlights the contribution of this chapter.
4.2 Linear Multi-Step(LMS) Integration Methods

In this section important characteristics of Linear Multi-Step(LMS) methods for simu-
lation of power systems and PECs, i.e., implicit- or explicitness, accuracy, stability and
ringing phenomenon are described. Then, the approach of this work to mitigate ringing
phenomenon while maintaining accuracy is introduced.
The general form of a LMS integration method is [4]
k−1
X k−1
X
αj xn−j = ∆t βj f (xn−j , tn−j ), (4.1)
j=−1 j=−1
where αj and βj are constant coefficients of the method, k is the number of steps of the
integration method, x and f (xn−j , tn−j ) are the state variable and the input function of
the system of differential equations, respectively. The LMS integration method of (4.1) is
called explicit if β1 = 0 and implicit otherwise. Trapezoidal (TR), Backward Euler (BE),
Forward Euler (FE) and Adam- Bashforth two-step (AB) widely adopted LMS integration
methods. TR and BE are implicit while FE and AB are explicit methods. Table 4.1
provides equations defining these methods. Implicit methods have higher computational
burden since they impose solution of multiple sets of linear equations at each time-step.
However, they provide higher accuracy and stability of simulation results.
The accuracy of each LMS integration method is defined by its order, i.e., the or-
der of local truncation error as a function of time-step. TR and AB are second-order
46
methods while FE and BE are first-order methods [4]. Higher order provides smaller lo-
cal truncation error and under certain conditions, including numerical stability, smaller
accumulated error and more accurate simulation results [49].
The region of absolute stability criterion relates the stability of the simulation results
to the time-step for a specific LMS integration method. The region of absolute stability
regarding each of the four aforementioned methods is shown in Fig. 4.1. This is usually
presented for ∆t = 1. The area is expanded by the factor of 1/∆t for time-steps different
than 1. The simulation result of an LMS integration method, at a specific time-step, is
stable if the time-step is adequately small , therefore, the region of absolute stability is
adequately large to encompass the eigenvalues of the system in case of a linear system [4].
In case of a non-linear system, eigenvalues of the Jacobian matrix should locate inside
this region.
If the region of absolute stability of a method covers the left-half-plane, then that
method is Absolutely stable (A-stable). This means the simulation results are stable re-
gardless of the time-step size. The region of absolute stability for BE also covers part of
the right-half-plane which implies BE produces stable results for some differential equa-
tions with unstable poles and is referred to as “artificial damping” of BE [4]. However,
TR produces stable simulation results if and only if the differential equations contain
stable poles since the region for TR only covers the left-half-plane. However, the regions
of FE and AB fail to encompass the left-half-plane [4]. This implies that for differential
equations with high frequency eigenvalues, e.g., a large resistor in series with an inductor,
explicit methods result in instability which is the reason explicit methods are not used
for simulation of stiff systems, e.g., those of power system EMTs.
Most simulation software packages use TR because of its accuracy and stability prop-
erties. However, TR suffers from ringing problem [4]. Consider switch S and inductor L
combination of Fig. 4.2(a) and assume S should be turned off due to either natural- or
forced commutation process. Fig. 4.3 depicts the inductor current, history current and
voltage obtained from TR and BE. Solid and dashed lines are responses based on the
ideal and the two-value resistor switch models, respectively. The inductor voltage and
the history current source exhibit oscillatory behaviour, i.e., the ringing phenomenon,
subsequent to the switch turn-off based on TR, however, BE provides damping in one
time-step.
To solve the ringing issue one approach is to use a more damped method, e.g., BE, for
a limited number of time-steps when ringing is detected and then use TR [4]. However,
this procedure is not computationally affordable in real-time simulation since it requires
handling variable admittance matrices. Therefore, another solution is proposed and
adopted in this thesis as follows.
47
Figure 4.1. Region of absolute stability for multi-step methods
t=tn
t=tn
RON ROFF
S L
gL
ROFF IhL
Rest of the system Rest of the system
(b) (a)
Figure 4.2. Inductor switch example
This work employs TR as the main integration method for most of the models except
those of the inductors connected to switching nodes. These inductor models are dis-
cretized based on BE. The discretized model of an inductor L based on BE is the same
2L
as the discretized model of the inductor based on TR in parallel with a resistance of ∆t .
48
Time-step
Inductor Current
n n+1 n+2 n+3 n+4 n n+1 n+2
Inductor History
Current Source
Inductor Voltage
(TR) (BE)
Figure 4.3. Trapezoidal ringing phenomenon
2L
Therefore, the proposed solution is the same as adding resistance of ∆t in parallel with
each inductor that is connected to a switching node and discretizing the overall system
model based on TR. Since TR is A-stable this approach does not introduce instability
problems.
4.3 Power Switch Models for EMT studies
Detailed non-linear models of power switches are employed in general-purpose non-linear

circuit simulation software packages, e.g., SPICE. However, detailed models are neither
necessary nor computationally affordable to represent PECs in power system EMT stud-
ies. In such class of studies, behavioral model s are implemented to efficiently emulate
behavior of power switches. Behavioral models represent characteristic of a power switch
with respect to its external system. These models can emulate switch behaviour in all
modes, as opposed to the switch function model [50] which is not capable of simulat-
ing natural commutations, e.g., when the gate signals of a two-level VSC are blocked.
Moreover, the switch function model requires prior knowledge of the PEC operation. Be-
havioural switch models include: (i) two-value resistor model and (ii) ADC model which
are described as follows.
49
4.3.1 Two-value Resistor Model

The ideal switch and two-value resistor switch models are shown in Fig 4.4. The on/off
states of the ideal switch are modeled as short- and open-circuit condition. The ideal
switch model repetitively changes the dimension of the admittance matrix and thus rarely
used.
Switch off Roff O.C.
Switch on Ron= RonSwitch S.C.
Two-value resistor Ideal Switch
Figure 4.4. Two-value resistor Switch- and ideal switch model
The two-value resistor model represents an open status by a large resistance and
the closed status by the switch equivalent on-state resistance. This approach retains
the dimension of the admittance matrix unchanged. However, each time a switch state
changes, the admittance matrix changes. Most of circuit simulation software packages,
including power system EMT simulation software packages, use LU decomposition for
solution of linear equations. The change of admittance matrix results in new set of linear
equations and therefore requires on-the-fly LU factorization. Repetitive switch state
changes of PECs, and particularly those of HF PECs, require small simulation time-
step. This implies that real-time simulator is required to carry out LU factorization in a
short interval, i.e., shorter than the simulation time-step which drastically increases the
computation burden. This is the first reason that two-value resistor model for real-time
simulation of systems including HF PECs is not the preferred model.
The two-value resistor model, requires identifying the switch states at each time-step.
Identifying the switch state for forced commutations is straight forward, however, finding
the state for natural commutations involves evaluation of a set of variables. The switch
state , considering the convention shown in Fig.4.5, is
sn+1 n+1 n n+1 n n+1

SW = gSW ||(s̄SW (vSW < 0) + sSW (iSW < 0)), (4.2)
n+1 n+1
where gSW is the switch gate signal at the current time-step. For a diode switch gSW is
50
always zero. One observes from (4.2) that the switch state at time-step n + 1 depends
on the switch current and voltage of the same time-step. The switch current and voltage
depend on its state and also the state of the other switches. This implicit relationship
requires several iterations to determine switch states after a switching event. These
iterations must be carried out within the short time-step of real-time simulation process
and is the second reason that two-value resistor model is not a viable option for real-time
simulation.
This thesis, in contrast to the above arguments, (i) uses the two-value resistor model
to represent switches of HF PECs for real-time simulation and (ii) proposes a novel
partitioning method to overcome the above highlighted issues associated with the variable
admittance matrix and the switch state.
iSW
vSW
Figure 4.5. Positive current and voltage convention for switches
4.3.2 Associated Discrete Circuit (ADC) Model
The ADC model [11] represents switch on and off states by an inductor and a capacitor,
respectively. A series resistor is added to the switch off-state capacitor to damp out
the oscillations due to the artificial RLC circuit constructed by the on- and off-state
representation of the switches. The discrete admittance of the on-state inductance and
off-state series RC branch are the same. Thus there is no admittance matrix change due
to the switch state change.
Fig. 4.6 depicts the ADC switch model. The model parameters are listed in Table 4.2
for the Trapezoidal (TR) and the Backward Euler (BE) integration methods. The ADC
model capacitor and inductor are larger than the actual switch parasitic elements and
result in artificial oscillations. Therefore, BE is usually employed to damp the artificial
oscillations caused by the non-realistic elements since BE features artificial damping [21].
The switch history current source in the ADC switch model, at time-step n + 1, is [11]
in+1 n+1 n n n+1 n n

hSW = sSW (ch ihSW + cv vSW ) + s̄SW (c̄h ihSW + c̄v vSW ), (4.3)
51
Table 4.2
ADC switch model parameters
Parameter Trapezoidal(TR) Backward Euler(BE)

gSW ∆t
2LSW
= 2RSW2CCSW
SW +∆t
∆t
LSW
= RSWCCSW
SW +∆t
RSW RSW
σSW 2LSW 2LSW
chON 1 1
2RSW CSW −∆t RSW CSW
chOF F 2RSW CSW +∆t RSW CSW +∆t
∆t ∆t
cvON LSW LSW
−4CSW ∆t −CSW ∆t
cvOF F (2RSW CSW +∆t)2 (RSW CSW +∆t)2
where the switch state, sn+1 , is [18]
sn+1 n+1 n n n n
SW = gSW ||(s̄SW (vSW < 0) + sSW (iSW < 0)). (4.4)
Based on (4.4), the switch state at time-step n + 1 only depends on the switch current
n
and voltage from the previous time-step, i.e., vSW and inSW . Therefore, identifying the
switch states in ADC model is relatively straightforward as compared to that of two-value
n
resistor model. However, vSW and inSW and the oscillations caused by the artificial RLC
circuit can cause switch state oscillations.
ihSW
RSW
Switch off gSW
CSW
ihSW
Switch on LSW gSW
Switch representation Discretized model
Figure 4.6. ADC switch model
Furthermore, the ADC model requires selection of (i) the switch admittance gSW and
(ii) damping ratio σSW . For a specific time-step, selecting gSW is equivalent to selecting
the on-state inductance. To prevent artificial oscillations, one approach is to increase
σSW . Increasing σSW , corresponding to a selected inductance value, increases the off-
state capacitance and consequently the artificial switching loss. After each transition the
history current source is reset, therefore, the energy stored in the capacitor or inductor is
ignored. This results in artificial switching loss. The off/on and on/off artificial switching
52
losses are respectively proportional to the capacitance and inductance of the ADC model.
Therefore, increasing the capacitance and inductance of the ADC model increase the net
artificially-induced switching loss.
Based on the discussion above, there is a trade-off between minimizing the artificial
switching loss and preventing oscillations. One approach to damp the oscillations while
keeping the artificial switching loss whithin an acceptable range is to determine the
switch parameters by setting the energy loss of the on/off and off/on transitions to be
the same for the damping ratio of 0.85 < σ < 1.333 [10]. This range ensures damping
out the HF oscillations caused by the RLC circuit constructed by the on- and off-state
representation of switches in a PEC. The energy loss at each transition depends on
the corresponding voltage and current. Since it is impractical to consider currents and
voltages at all transitions, rated current and voltage of the PEC are considered. This
results in artificial switching loss of 3.7% of the rated power for a 21-pulse STATCOM
simulated with the time-step of 1.4µs [10]. Although 1.4µs is a relatively small time-
step, this loss is associated with unreal transients [29] and is not acceptable for many
case studies, e.g., those of Photovoltaic (PV) systems, especially for higher switching
frequencies.
In some applications, the ADC model is employed for switches that are not being
repetitively switched on and off, e.g., circuit breakers. Therefore, the artificial switching
loss is not a concern and thus the switch parameters are selected only to minimise oscilla-
tions [51]. Moreover, the status of a circuit breaker is determined by a separate external
signal and there is no need to account for natural commutation process. Therefore, in
this work ADC model is only used to represent non-repetitive switching events.
4.4 Partitioning Method
Partitioning methods are used to divide the power system mathematical model into
subsystems so they can be solved in parallel at each time-step and thus to reduce the
computational time. Since most algorithms for solving linear equations are O(n2 ) to
O(n3 ), where n is the size of the system matrix, thus, partitioning the equations into
m subsets of size n/m reduces the computational burden for solving each subset and
therefore that of the overall system. Based on evaluation of the partitioning methods for
simulation of PECs this section proposes a novel FPGA-based partitioning method.
Partitioning can be devised at modeling, discretization or solution stages. Therefore,
we categorize partitioning methods for simulation of PECs into three groups.
53
Modeling-level Partitioning
Modeling-level partitioning inserts artificial-element models or use alternative models for

power system elements to enable parallelism in the solution of the partitioned model.
A physical transmission line features natural delay between both ends and has been
used to develop multiple partitioning algorithms based on the use of short line (stub
line) [52] [53] [54]. Naturally, for a power system covering a vast geographic span, each
physical line provides inherent partitioning property. Otherwise, fictitious transmission
line models might be used. This method introduces parasitic elements and can lead to
oscillatory modes, therefore, modeling-level partitioning, based on the use of artificial
line, is not used in this work.
Discretization-level Partitioning
The differential equations describing power system components are discretized based on
stiff implicit LMS integration methods. If an explicit LMS method is used for a specific
branch, then the branch discretized model has zero element stamp in the admittance
matrix, e.g., that of a capacitor branch is only a voltage source. In this case, if the
capacitor is grounded, then the capacitor node voltage is a known variable at each time-
step. Moreover, if all the terminal nodes of a subsystem are connected to ground by
capacitor branches, then applying an explicit LMS method to these capacitor branches
partitions the subsystem equations from those of the rest of the system. This method
depends on the system configuration, since presence of capacitors is required.
Forward Euler (FE) explicit method can be used for discretization of the differential
equation of the shunt capacitors and series inductors for partitioning [55]. However, FE
as explained in section 4.2, suffers from instability and inaccuaracy issues and can result
in artificial oscillatory behaviour. Moreover, discretizing three-phase inductor branches,
using explicit methods, creates a current source cutset which violates the unique solv-
ability criteria of the network equations [4]. Therefore, discretization-level partitioning
is not suitable for three-phase PEC systems and not used in this work.
Solution-level Partitioning
Discretizing the differential equations of a system without transmission lines, using im-
plicit LMS methods, creates a single set of equations. Solution-level partitioning seeks
tearing this set of equations into two or more subsystems of equations as follows. At
each simulation time-step an equivalent circuit of each envisioned subsystem is found.
Then, all subsystem equivalent circuits are integrated and an equivalent of the overall
system is created. By solving the latter equivalent the voltage/current variables at the
54
terminals of each subsystem are calculated. Then, these voltage/current variables are fed
back to their corresponding subsystems as independent sources. Finally, the subset of
equations describing each subsystem is solved. GENE [28], MATE [56] , Diakoptics [57]
are examples of solution-level partitioning and share the same conceptual methodology
in solving a single set of linear equations.
In this thesis we propose a new method, for simulation of PECs, based on the solution-
level partitioning concept in which switches are represented by two-value resistor model.
The novelty of this method is in (i) the definition of the subsystems which facilitates
overcoming the difficulties associated with implementing the two-value resistor model for
systems that embed multiple PECs, (ii) being well-suited for FPGA-based implementa-
tion and (iii) the method of calculating the network equivalent for each subsystem which
enables parallelism between all tasks in the devised DRTS. This method is explained in
the following subsection.
4.4.1 Proposed FPGA-based Partitioning Method
The proposed method is based on defining each PEC Switching Network (PECSN) as a
subsystem. The rest of the system model, including the PEC energy storage elements,
i.e., capacitors and inductors, are considered as part of the “system network ”. The
proposed partitioning method comprises three steps, i.e.,
• Obtaining the equivalent circuit of the system external to each PECSN subsystem.
• Solving the equations of each PECSN, using the equivalent circuit, to obtain the
PECSN subsystem terminal variables.
• Using the resulting terminal variables of all PECSNs to solve network equations.
The following subsections describe the procedure to carry out these steps. The first
and third steps are carried out by matrix-vector multiplication task/module defined in
Chapter 3. However, the second step requires definition of a new task and therefore the
design of a new hardware module. We call this task and its corresponding hardware
module PEC simulation task and PEC module, respectively.
4.4.2 Parallel Computation of Network Equivalent
Assume a network which includes only one PEC, linear branches and independent sources.
Fig. 4.7(a) and (b) illustrate an example of such a system and its network discretized
55
model. The network equivalent, external to PECSN subsystem can be expressed as

" # " # " #
n+1
vn+1 vn+1 i
eq
= n+1 + Y−1
N eq
o
, (4.5)
in+1 ieq vn+1
o
where ion+1 and vn+1

o are the subsystem terminal variables, in+1 and vn+1 are the network
variables, vn+1
eq and in+1
eq are the equivalent sources and matrix YN eq is the equivalent
h iT
n+1
circuit admittance matrix. The equivalent source vector vn+1eq i eq can be obtained by
" #
vn+1
eq
= bn+1
eq = Heq bn+1
node . (4.6)
in+1
eq
Matrices Heq and YN eq are constant matrices and are determined by applying node
elimination to the network admittance matrix. bn+1 node is the RMNA injection vector
defined by (3.15). The equivalent of the network external to the PECSN is characterized
by its equivalent admittance matrix and the equivalent source vector ( YN eq , bn+1
eq ).
Since there is only linear branches in the circuit external to the subsystem, matrix
YN eq is constant, however, bn+1
eq must be determined in each time-step. To do so, ac-
n+1
cording to (4.6), vector bnode is required. This vector is produced only during time-step
n + 1. However, if only dynamic branches exist in the network, it can be calculated one
time-step ahead, i.e., at time-step n, as follows.
If only dynamic branches exist in the network, then based on (3.7) vector bn+1
node can
be expressed in a matrix form as
n+1
bnode = Ch bnnode + Cv xnnode , (4.7)
where i) Ch is a diagonal matrix containing history current source update coefficients ch

and ii) Cv is the branch incidence matrix containing current source update coefficients
cv from Table 3.1. Based on (4.6) and (4.7), bn+1
eq is expressed as
bn+1
eq = HeqRM N A bnnode , (4.8)
where matrix HeqRM N A is
HeqRM N A = Heq (Ch + Cv HRM N A ). (4.9)
Since bn+1eq can be calculated as a function of nodal injection vector at time-step n, i.e.,
n
bnode , it can be computed one time-step ahead. This is enabled by incorporating matrix
56
PEC switching network
Network
(a)
io1
v1n+1 vo3n+1
vo2n+1 i1n+1
vo1n+1 i2n+1
i3n+1
Discretized model of the network
(b)
Figure 4.7. (a) A system embedding one PEC (b) The circuit seen from the PECSN
HeqRM N A in the existing matrix-vector multiplication task, i.e.,

" # " #
xnnode HRM N A
n+1
= bnnode . (4.10)
beq HeqRM N A
Independent sources also can be included as follows. In the construction of matrices

Ch and Cv , ch s and cv s corresponding to independent sources are assumed to be 1 and
0, respectively. This results in one time-step delay in calculating the contribution of the
independent sources to bn+1
eq . However, this delay is negligible, comparing the simulation
time-step with the time period associated with the frequency of sources in power systems.
Performing (4.10) requires no modification to the digital hardware of the FPGA-based
DRTS of Chapter 3. It requires only to write the matrix of (4.10) in the BRAMs of the
matrix-vector multiplication module. Moreover, one can observe from (4.10) that vector
bn+1
eq is calculated at time-step n. Thus it enables calculating the network equivalent
sources bn+1
eq external to PECSN one time-step ahead. The second step in the proposed
partitioning method, i.e., PEC simulation task, requires vector bn+1
eq to calculate PECSN
57
h iT
n+1 n+1
terminal variables veq ieq as described in the following subsection. One time-step
ahead calculation of bn+1
eq enables parallelism between PEC simulation task and other
tasks defined in Chapter 3.
4.4.3 PEC Simulation Task
PEC simulation task is to use network equivalent parameters ( YN eq , bn+1

eq ) to solve
n+1 n+1
PECSN equations for terminal variables io and vo as shown in Fig. 4.8(a). MNA
SW5
v1n+1 SW1n+1 SW3
isw1
io1n+1 vo3n+1
vo2n+1
i3n+1 i2n+1 i1n+1

vo1n+1 SW2 SW4 SW6
(YNeq ,beqn+1)
(a)
v1n+1
io1n+1
vo2n+1 v n+1
vo1n+1 o3
Discretized model of the rest of the System
(b)
Figure 4.8. (a) The model of the PEC in the PEC simulation task, (b) The third step of the proposed
partitioning method
58
equations of PECSN are

" # " #
n+1 n+1
i v
Y0Conv (sn+1 ) on+1 = n+1 , (4.11)
vo i
where Y0Conv (s) is the admittance matrix corresponding to the PEC switch state vector
s. If a switch is on (or off), then its corresponding entry in the vector sn+1 is 1 (or 0).
Based on (4.5) and (4.11), PECSN terminal variables, considering switch state sn+1 , is
obtained by " #
in+1
o
= HConv (sn+1 )bn+1
eq , (4.12)
vn+1
o
where
HConv (sn+1 ) = (Y0Conv (sn+1 ) − Y−1 −1
N eq ) . (4.13)
To find switch state vector sn+1 , switch currents are also required. The vector of switch
currents is " #
n+1
i
in+1
SW = MSW (s
n+1
) on+1 , (4.14)
vo
where MSW (s) is the switch incidence matrix in the switching network containing switch
admittances. From (4.14) and (4.12) the switch current vector is
in+1
SW = HSW (s
n+1
)bn+1
eq , (4.15)
where
HSW (sn+1 ) = MSW (sn+1 )HConv (sn+1 ). (4.16)
Equations (4.12) and (4.15) relate the network equivalent source vector bn+1
eq to PEC
terminal variable vector and PEC switch current vector, respectively.
First step within the PEC simulation task is to determine switch states, using (4.15),
as follows. In general, individual switch states are determined based on (4.2). This
usually requires more than one iteration since a switch state transition, e.g., a change
in a gate signal, causes changes in the state of other switches. One of the advantages
of defining each PECSN as a separate subsystem is that each switch only affects the
switches internal to the same subsystem and not switches in other subsystems. The
approach taken in this thesis is to determine the state of PEC switches as one vector,
rather than examining each individual switch state.
PEC switch state vector sn+1 is affected by two factors, i.e., (i) switch gate signals
(only for the forced commutated switches) and (ii) state variables of the system, i.e.,
inductor currents and capacitor voltages. Vector bn+1
eq represents the net effect of inductor
59
currents and capacitor voltages of the overall network on PECSN. Thus we determine
the state of PEC switches as follows.
A combined switch and diode is assumed as one switch pair, therefore, asserting a
gate signal implies that the switch pair is on regardless of its current direction. At the
beginning of time-step n + 1 the gate signal vector gn+1 is used as the initial guess for the
state of PEC switches. The equivalent vector sources bn+1 eq and gn+1 are used to obtain
switch currents. To determine a switch state, signs of both switch current and voltage
are required. Since a switch is represented by the two-value resistor model, current and
voltage of each switch are proportional. Therefore, switch current sign is the same as
switch voltage sign. Considering the switch voltage and current convention in Fig. 4.5,
PEC switch state vector is determined by
sn+1 = (in+1
SW < 0)||g
n+1
. (4.17)
Once the switch states are known, PEC terminal variable vector can be obtained based
on (4.12), i.e., " #
n+1
io
bn+1
Conv = n+1
= HConv (sn+1 )bn+1
eq . (4.18)
vo
This completes PEC simulation task, i.e., the second step in the proposed solution-level
partitioning. To summarise, steps within PEC simulation task are:
1. calculate switch current vector in+1

SW from (4.15);
2. determine switch states from (4.17);
3. determine PEC output variables bn+1Conv from (4.18).
The last step in the proposed partitioning method is to use bn+1

Conv in the solution of
the main network as shown in Fig. 4.8(b). To take into account the effect of this vector
on the solution of the network, it is considered as part of the RMNA injection vector
defined in Chapter 3, i.e.,
h iT
n+1 n+1 n+1 n+1
bnode = bs ih bConv . (4.19)
Since execution of PEC simulation task takes longer than those of the other two tasks,
i.e., independent source calculation task and history current update task, its outputs are
scheduled as the last entries of the RMNA injection vector.
60
4.4.4 Partitioning Method for Multiple PECs
This section explains generalization of the proposed partitioning method for applications
to multiple PECs. For each PECSN, the equivalent circuit is characterized by an equiv-
alent admittance matrix and an equivalent source vector, i.e., YN eq and bn+1
eq . For each
n+1
PECSN, vector beq is calculated using matrix HeqRM N A by (4.10). Matrices HeqRM N A
and YeqN , corresponding to each PEC, are theoretically functions of other PEC switch
states. In this section we introduce provisions to provide constant YN eq and HeqRM N A
for a system that includes multiple PECs.
For the analysis of power system EMTs, simulation time-step must be about 50-100
times smaller than the smallest switching period, i.e., for 10kHz switching frequency,
this requires simulation time-steps in the range of 1µs. Discretizing the PEC inductor
and capacitor branch equations using a time-step imposed by this constraint satisfies
∆t 2C
gLconv = << 1, gCconv = >> 1. (4.20)
L ∆t
For example consider a 10kHz boost converter and its discretized model of Fig. 4.9.
The discrete admittance seen from the output of the converter for on- and off-states of
the diode are
L ∆t 2C
goutON = (( + RON )|| )−1 = 1000.01, goutOF F = = 1000.00. (4.21)
∆t 2C ∆t
One can observe this admittance is negligibly changed. Therefore, the admittance of the
equivalent circuit of the system external to another PECSN is also negligibly changed
by this switch status change in the boost converter. During each time-step each network
variable interacts with the other variables in direct way, i.e., affecting their values of
the same time-step, which is represented by admittance matrix, and indirect way, i.e.,
affecting their values of the next time-steps, which is represented by history current
sources. In this work it is assumed that the changes in the equivalent circuit of the
system external to each PECSN, due to a switch status change in another PECSN, can be
considered indirect. The main reason is that PECs are designed such that the switching
process of each PEC is independent of the other PECs. This enables use of constant
YeqN i , Heqi and therefore HeqRM N Ai matrices for each PECSN which is explained as
follows.
Consider a system embedding multiple PECs. Each PECSN is replaced by its terminal
voltage/current sources. Then node elimination for each PECSN is carried out while the
other switching networks are replaced by their corresponding voltage/current sources.
61
gout
D
L
C 100µH
500µF
(a) (b)
Figure 4.9. (a) Boost converter and (b) its discretized model
For a system including m PECs, the RMNA injection vector is defined as

h iT
bn+1
node = n+1T n+1T n+1T
bs ih bConv , (4.22)
where bn+1
Conv is h iT
bn+1
Conv = bn+1 n+1
Conv1 ...bConvm . (4.23)
bn+1
Convi is the i
th
PEC terminal variable vector. Vector of equivalent sources for ith PEC,
bn+1
eqi , is
bn+1 n
eqi = HeqRM N Ai bnode , (4.24)
where matrix HeqRM N Ai is defined by
HeqRM N Ai = Heqi (Chi + Cv HRM N A ). (4.25)
Matrix Chi is a diagonal matrix which is defined similar to Ch of (4.7). The only difference
in the m PEC case as compared with the single PEC case is that vector bnnode contains
other PECs terminal variables bnConvj s. Chi entries corresponding to bn+1 n+1
Convi and bConvj
(j 6= i) are set to 0 and 1, respectively. This implies that to obtain constant HeqRM N Ai
and YeqN i for each bn+1 n
eqi , other PECSNs are represented by bConvj (j 6= i) and other
branches are represented by their updated history current sources at time-step n + 1.
Based on the above development, the generalized partitioning method for a system
including m PECs is deduced. First the equivalent source vector for each PECSN can be
obtained, i.e.,
 n   
xnode HRM N A
 bn+1   H 
 eq1   eqRM N A1  n
 =  bnode . (4.26)
 :   : 
bn+1
eqm HeqRM N Am
62
Second, the PECSN terminal variables for each PEC are obtain by carrying out PEC
simulation task. The third step is to use these PEC switching terminal variables to solve
the network solutions. This is carried out by considering RMNA injection vector of (4.22)
and (4.23).
To carry out the PEC simulation task for each PEC, a separate PEC module is added
to DRTS as explained in Chapter 3 and the hardware design of this module is explained
in the following section.
4.5 Hardware Design of PEC Module

This section presents the PEC hardware module design that carries out PEC simulation
task. To enhance the FPGA-based DRTS proposed in Chapter 3, m modules are instan-
tiated to simulate a system embedding up to m PECs. The PEC simulation task, as
explained in the previous section, has three steps and a separate PU is designed to carry
out each step.
First and third steps of the PEC simulation task are to solve a set of linear equations
with a time-varying matrix. If the number of permutations of the matrix is not large, then
the first possible attempt is to pre-calculate the inverse matrices for all the permutations
and save them in the memory. However, if the matrix dimension and/or the number of
permutations is large, then the pre-calculation process is not feasible due to high memory
resource requirement. An alternative approach is to assemble and solve the equations
on-the-fly using LU factorization, or equivalently Gaussian elimination.
The dimensions of matrices HConvi and HSW i , which are used for steps one and three,
are not large since they only describe one PECSN. Moreover, since YN eqi is constant
for each PEC, changes in HConvi and HSW i are only due to switch state changes in the
PEC. The number of permutations of these matrices for each PEC is 2nSW i where nSW i
is the number of switches in the ith PEC. Therefore, matrices HConvi and HSW i are
pre-calculated for different permutations of switch states and saved in BRAMs for each
PEC. This highlights another advantage of the proposed partitioning method, i.e., the
memory requirement increases linearly with the number of PECs which is in contrast to
a general approach where the memory requirement exponentially grows with respect to
the number of net switches.
For instance, for a two-level VSC, the dimensions of matrices HConv and HSW are
4 × 4 and 6 × 4, respectively. Thus for all permutation of HConv and HSW , it requires
to save 1024 and 1536 32-bit floating-point numbers which only requires 1 and 2 36Kb
BRAMs, respectively. Considering that 1024 36Kb BRAMs are available on VC7VX485T
chip, the memory requirements for each PEC is negligible.
63
The design of PEC module is shown in Fig. 4.10. Steps one to three of PEC simulation
task, as defined in Section 4.4.3, are respectively assigned to PU1 to PU3. Since the first
and third steps are matrix-vector multiplications and inherently sequential, thus PU1 and
PU3 are designed to share the same floating-point cores. This reduces hardware resource
consumption and improves design efficiency. To save HConvi (si ) and HSW i (si ) for all
permutations of si , they are unrolled to vectors hConvi and hSW i , respectively. These
vectors are parameters of the ith PEC and depend on both the PEC topology/parameters
and system configuration and therefore are calculated prior to simulation by a MATLAB
script and written in BRAMs in the PEC module by MB soft-processor.
PECi
beqin+1
bconvin+1
g n+1 (hSWi,hconvi)
1 3
n+1
beqi iSW n+1
s n+1
bconvin+1
n+1 n+1 2 n+1 n+1
HSWi(g )beqi Hconvi(s )beqi
n+1
g
Figure 4.10. PEC Module
PU2 conducts the second step of the PEC simulation task based on (4.17) and uses a
dedicated floating-point comparison IP core. All floating-point comparisons are carried
out by the same core, using pipelining approach.
Fig. 4.11 depicts the computation engine which embeds four PEC modules. The
simulation task for each PEC is carried out based on one module per PEC. A MUX
is implemented to stream vector bn+1
eqi to the i
th
PEC module from the results of the
matrix-vector multiplication.
The overall latency of the PEC module, τ5 , is 120 clock cycles for PEC topologies up
to 6 switches. Fig. 4.12 shows the timing diagram of the PEC module and other modules
in the computation engine. For a stall-free pipeline, the latency of the modules should
64
Soft Processor
MicroBlaze
MUXs
Matrix Vector
Independent source bs Multiplication
calculation
vn , vp bnode
ih
History current update HRMNA b x
2
HRMNA beq
beqi bconvi
PECi
PEC
bconv4
PEC4 Concat
Figure 4.11. Computation engine for simulation of case studies embedding multiple PECs
satisfy
τ1 ≤ τ4 ,
τ2 ≤ l1 ,
(4.27)
τ5 ≤ (l1 + l2 ),
l1 + l2 + l5 = nb ,
where l5 is the length of vector bn+1 conv . To integrate the PEC modules into the design
presented in chapter 3 and satisfy the stall-free conditions, the same independent source
calculation and matrix-vector multiplication modules are used. This indicates the size of
bn+1 n+1 n+1
node is 135, as described in chapter 3. To incorporate bconv in vector bnode , instead the
length of vector in+1 h is shortened by l5 . The maximum number of PECs is considered
to be four and each PEC is represented by the maximum of four sources, therefore, the
length of bn+1 n+1
conv , l5 , is 16. Thus the length of the new vector ih , l2 , is 93.
The minimal time-step is:
∆tmin = τ4 = nb + τM ACC = 186 clock cycles. (4.28)
The design meets 160MHz timing constraints after implementation and bitstream gen-
eration. Therefore, the minimum time-step is 1.1625µs. Any larger time-step, e.g., 2µs
can be achieved by keeping the simulation modules halted for adequate number of clock
cycles.
Hardware utilization of each PEC module which is designed for simulation of a PEC,
65
l1 l2 l5
bs(1) bs(2) bs(26)

bsn+1
τ1 ih(1) ih(2) ih(93)
τ2
ihn+1
bco(1) bco(2) bco(16)
bconvn+1 τ5
bs(1) b(135)
τMACC
bnoden+1
τ4
a)
t-1n+1 t0n+1 t1n+1 t2n+1 t3n+1 t4n+1
Minimal time-step t-1n+2 t0n+2
τPU1 τPU2 τPU3

b)
Figure 4.12. Timing diagram of (a) computation engine modules (b) PUs in the PEC module
Table 4.3
Resource utilization of the PEC module

One PEC Module 2142(≤ 1%) 2009 (≈ 1%) 5(≤ 1%) 120 1
Available 607,200 303,600 2,800 - -
with the maximum of 6 switches, is shown in Table 4.3. Since PU1 and PU2 share
floating-point cores to maximize design efficiency, separate resource utilization for each
PU is not possible. One can observe that resource utilization for each PEC module is
fairly small. The bottleneck for resource utilization of this module, similarly to other
modules, is LUT utilization. However, this module consumes only about 1% of the total
available LUTs on a VC7VX485T chip.
4.6 Performance evaluation

To investigate feasibility and evaluate performance of the proposed partitioning method
and enhanced FPGA-based DRTS, simulation studies are conducted on the system of
Fig. 4.13(a) which includes two nominally identical PECs. The system can be a repre-
sentation of two nominally identical solar-PV units which are connected to a distribution
substation. Each unit model is shown in Fig. 4.13(b) and consists of a three-phase,
two-level VSC, a DC voltage source connected to a DC-link capacitor by an RL branch,
a second-order filter on the AC side and a ∆Y transformer. The system parameters are
66
given in Table 4.4.
VSC Ldc
DC 300V
AC System
Ldc
370V
DC 300V
(a)
LDC RDC
Llp Lls
CDC
rf Cf LConv
(b)
Figure 4.13. (a) Case study system embedding two PEC units (b) PEC unit model
The simulation case studies are conducted based on three approaches, i.e., (i) the
proposed partitioning method with the time-step of 2µs using the FPGA-based DRTS
platform, (ii) based on two-value resistor switch model with time-step of 2µs using PSIM
off-line simulation package to provide a reference solution and (iii) based on ADC switch
model with the time-step of 2.9µs using the RTDS-based real-time simulation platform.
Two case studies are conducted as follows.
4.6.1 Case I: Start-up

In the first case the system is at zero initial condition. Gate signals for two PECs are
generated using modulation indices of m1 = 0.4∠60◦ and m2 = 0.4∠0◦ , with respect to
the AC system voltage.
Figure 4.14 depicts the simulation results of the three-phase AC- and DC-side currents
of the PECs, for all three methods. The first column contains the PSIM-based simulation
results using the two-value resistor model, the second column includes the FPGA-based
DRTS real-time simulation results using the proposed partitioning method, and the third
67
Table 4.4
Parameters of the PEC station
Description Symbol value

Switching frequency FSW 5kHz
Dc-link inductance LDC 20µH
Dc-link resistance RDC 20mΩ
Dc-link capacitance CDC 10mF
PEC inductance LCconv 400µH
Filter capacitance Cf 100µF
Filter resistance rf 100Ω
Primary inductance Lp 1mH
Primary resistance rp 0.4Ω
Secondary inductance Ls 1mH
Secondary resistance rs 0.4Ω
Transformer turn ratio n1 : n2 1:1
column shows the RTDS-based real-time simulation results using ADC model. The first
and second rows are respectively three-phase AC-side currents of PEC1 and PEC2 and
the last row includes DC-side currents.
The three-phase AC-side currents, i.e., the first row of Fig. 4.14, experience tran-
sients and then reach steady state. Figure 4.14 shows that these currents have identical
behaviour as obtained by the three simulation platforms. However, the DC-side current
simulation results reveals deficiency of the ADC method. The active power exchange for
PEC2 should be close to zero since m2 = 0.8∠0◦ . Therefore, the DC-side current of PEC2
should be close to zero. It is evident that this is the case in the PSIM-based reference
results and those of the proposed partitioning method. However, the ADC-based results
indicate around 50A which is due to the artificial switching loss of the ADC method. For
PEC2 the DC-side current in the PSIM-based reference results and those of the proposed
partitioning method is around 60A. However, this current is around 100A from the ADC-
based method which is also due to the artificial switching loss. Figure 4.14 also shows
that the DC-side current from the proposed partitioning method exhibit similar transient
and steady state behaviour as those of the PSIM-based reference results, however, results
from the ADC method exhibit different dynamic and steady state behaviour.
4.6.2 Case II: Rectifier Mode

In Case II the system of Fig. 4.13 is initially operating under the steady-state loading
conditions described in Case I and the gate signals of both PEC units are blocked at
t = 0.1s. Therefore, both PECs enter the rectifier mode of operation through anti-
68
Proposed Partitioning Reference ADC

150 150 150
iabcPEC1(A) 100 100 100
50 50 50
0 0 0
−50 −50 −50
−100 −100 −100
−150 −150 −150
0 0.05 0.1 0 0.05 0.1 0 0.05 0.1
200 200 200
100 100 100

iabcPEC2(A)
0 0 0
−100 −100 −100
−200 −200 −200

0 0.05 0.1 0 0.05 0.1 0 0.05 0.1
150 150 150
100 100 100

idc1 & idc2(A)
50 50 50
0 0 0
−50 −50 −50
−100 −100 −100

0 0.05 0.1 0 0.05 0.1 0 0.05 0.1
Time(s) Time(s) Time(s)
Figure 4.14. Simulation results of Case I, m1 = 0.4∠60◦ and m2 = 0.4∠0◦ , 1st (left) column includes the
FPGA-based DRTS real-time simulation results using the proposed partitioning method, 2nd column
contains the PSIM off-line simulation results using the two-value resistor reference method and 3rd
column has the RTDS-based real-time simulation results using the ADC method.
parallel diodes of the IGBT switches.

The simulation results of the three methods for this case are shown in Fig 4.15 which
show the three-phase AC-side currents of the ADC method exhibit oscillations subsequent
to the gate-blocking instant. These oscillations are due to the artificial capacitors and
inductors of the ADC method. However, the results from the proposed partitioning
method do not suffer from such oscillations.
One can observe that the DC-side currents of all three methods have same the average
of −50A. The reason is that the artificial switching loss of the ADC method is negligible in
this case and therefore active power exchange or equivalently the DC-side average currents
for all three methods are the same. Figure 4.15 also shows the DC-side current transient
and steady state behaviour from the proposed partitioning method closely match those
of the PSIM-based reference results. However, the DC-side current transient and steady
state behaviour from the ADC method differ from the PSIM-based reference results due
to the oscillations caused by the artificial inductors and capacitors of the method.
69
Proposed Partitioning Reference ADC

100 100 100
50 50 50
iabcPEC1(A)
0 0 0
−50 −50 −50
−100 −100 −100

0.08 0.1 0.12 0.14 0.16 0.08 0.1 0.12 0.14 0.16 0.08 0.1 0.12 0.14 0.16
100 100 100

iabcPEC2(A)
0 0 0
−100 −100 −100
0.08 0.1 0.12 0.14 0.16 0.08 0.1 0.12 0.14 0.16 0.08 0.1 0.12 0.14 0.16
100 100 100

50 50 50
idc1 & idc2(A)
0 0 0
−50 −50 −50
−100 −100 −100
−150 −150 −150
0.08 0.1 0.12 0.14 0.16 0.08 0.1 0.12 0.14 0.16 0.08 0.1 0.12 0.14 0.16
Time(s) Time(s) Time(s)
Figure 4.15. Simulation results of Case II, m1 = 0.4∠60◦ , m2 = 0.4∠0◦ and the gate signals of both
PECs are blocked at t=0.1s, 1st (left) column includes the FPGA-based DRTS real-time simulation
results using the proposed partitioning method, 2nd column contains the PSIM off-line simulation results
using the two-value resistor reference method and 3rd column has the RTDS-based real-time simulation
results using the ADC method.
This chapter presents the approach to avoid ringing phenomenon by discretizing the
model of the inductors connected to switching nodes using BE integration method. It
also proposes a novel FPGA-based partitioning method to overcome the challenges as-
sociated with the implementation of two-value resistor model for FPGA-based real-time
simulation which is achieved by considering each PECSN as one subsystem. The method
is generalized for application to systems including multiple PECs and features parallel
processing capability for the PEC simulation task and the other tasks in the simulation
of the network, as explained in the previous chapter. Based on this method, a PEC hard-
ware module is designed, using FP numerical representation, and replicated to enhance
the FPGA-based DRTS for real-time simulation of systems embedding multiple PECs.
70
Simulation case studies are conducted based on three approaches, i.e., (i) the approach
based on the proposed partitioning method using the FPGA-based DRTS platform, (ii)
the approach based on two-value resistor switch model using PSIM off-line simulation
package to provide a reference solution and (iii) the approach based on ADC power switch
model using the RTDS-based real-time simulation platform and it is shown that the
proposed partitioning method and FPGA-based DRTS outperform the approach based
on ADC model. The advantages of the proposed partitioning method and FPGA-based
DRTS are:
• The proposed partitioning method maintains constant network admittance matrix,

while using two-value resistor model to represent power switches in PECs.
• The proposed partitioning method is generalized for application to multiple PECs
which enables real-time simulation of multiple PECs by only instantiating multiple
PEC modules in the FPGA-based DRTS.
• The proposed partitioning method enables parallelism between the PEC modules
and the existing modules in the FPGA-based DRTS, therefore, small simulation
time-step is achieved despite using FP numerical representation.
• The proposed PEC module requires relatively small FPGA logic and memory re-
sources to save the parameters and matrices.
• The enhanced FPGA-based DRTS is designed such that it does not require hard-
ware code re-compilation for new systems.
71
Chapter 5
FPGA-Based Simulation of
Electrical Machines (EMs) for EMT
Analysis
5.1 Introduction
Electrical Machines (EMs) are indispensable power components of power systems and
there are multiple EM models, with different degrees of details, for different types time-
domain analysis of power systems. For EMT simulation studies lumped element models
adequately emulate system-level behaviour of EMs. The main course of this chapter is
to develop an FPGA-based EM model/module to enable real-time simulation of systems
that embed multiple EMs, regardless of configuration and parameters of the system. The
proposed model/module (i) is compatible with the partitioning algorithm and DRTS de-
scribed in the previous chapter to accommodate EM hardware module into the devised
DRTS, (ii) can operate in parallel with the rest of modules in the devised DRTS to enable
small simulation time-step, (iii) can accurately emulate behaviour of EMs regardless of
EM parameters, system configuration and under all operating scenarios and (iv) is ca-
pable of representing different types of EMs, i.e., Induction Machine (IM), Synchronous
Machine (SM), and Permanent Magnet Synchronous Machine (PMSM), using the same
hardware with appropriate parameters to avoid hardware code re-compilation and effi-
cient FPGA resource utilization.
Section 5.2 describes different EM models and interfacing techniques for power system
EMT simulation and identifies advantages and disadvantages of each model in terms of
FPGA-based real-time simulation and concludes that (i) Constant-Parameter-Voltage-
Behind-Reactance (CPVBR) model suits FPGA-based real-time simulation and (ii) is
72
CHAPTER 5. FPGA-BASED SIMULATION OF ELECTRICAL MACHINES (EMS)
FOR EMT ANALYSIS
compatible with the RMNA formulation and the proposed partitioning method described
in Chapters 3 and 4, respectively. Section 5.3 proposes a CPVBR-based EM model that
is tailored for FPGA-based real-time simulation and adopts a novel LMS integration
method for EMs with dynamic saliency, i.e., L”d 6= L”q . Based on the proposed model,
discretized EM model is developed and EM simulation task defined. Section 5.4 presents
the corresponding EM hardware module to carry out this task. This module is used to
enhance the FPGA-based DRTS for real-time simulation of EMs. Section 5.5 presents
simulation case studies using the enhanced FPGA-based DRTS. Section 5.6 concludes
and highlights contributions of this chapter.
5.2 EM Models for EMT Analysis
Depending on the objectives, different EM models are utilized for EMT analysis. Finite
Element Models (FEMs) are used for core loss analysis and/or evaluation of new EM
designs [58]. This level of details is neither necessary nor computationally feasible for
system-level EMT studies where the focus is mostly on the behaviour of EMs in terms
of their interactions with other power system components.
System-level EM models for EMT studies are categorized as Phase Doamin (PD)
models, Quadrature-Direct (QD) models and Voltage-Behind-Reactance (VBR) models.
These models are mathematically equivalent, however, due to the different choices of state
variables, input/output variables and structure, they feature different characteristics for
digital time-domain simulation, e.g., accuracy, input-output compatibility with the other
compnenet models and computational efficiency [59]. PD and VBR are direct models and
QD is an indirect model which requires an interfacing technique to enable its integration
into the host power system model. This section evaluates these models and interfacing
methods to devise a suitable modeling approach to represent EMs in the DRTS in this
thesis.
5.2.1 PD Model
PD models represent EM using a magnetically-coupled inductance matrix. PD model of

a synchronous machine is [60] [61] expressed by
" # " # " #
vabcs iabcr d λabcs
=R + (5.1)
vqdr iqdr dt λqdr
73
FOR EMT ANALYSIS
where subscript abcs denotes stator coordinate and subscript qdr denotes rotor coordinate
and flux linkages are " # " #" #
λabcs Lss Lsr (θ) iabcs
= . (5.2)
λqdr LTsr (θ) Lrr iqdr
| {z }
L(θ)
One can observe that inductance matrix L(θ) is a function of rotor position θ. The
electromagnetic torque in the PD model is expressed by
" #T " #
1 iabcs ∂ iabcs
τe = L(θ) , (5.3)
2 iqdr ∂θ iqdr
and mechanical dynamics of the rotor are represented by
dθ
= ω, (5.4)
dt
dω P Bf
= (τe − τm ) − ω. (5.5)
dt 2J J
The main advantage of PD model is its direct interface to the network. This implies
no interfacing algorithm is required for a PD model since the stator variables, the same
as those of the other power system component models, are directly in abcs coordinates.
Equations (5.2) and (5.3) indicate that EM model is mathematically non-linear.
Therefore, the differential equations describing EM electrical and mechanical dynam-
ics are a set of coupled and non-linear differential equations. Thus their direct solution
requires an iterative method which necessitates multiple LU factorization at each time-
step. Since discretized admittance matrix of network contains admittance matrix of EM
model, at each time-step a change in rotor position changes the PD model inductance
matrix and consequently the network admittance matrix. This requires on-the-fly LU
factorization and is a major drawback for real-time analysis since it significantly increases
the computational burden.
Another drawback of PD model is that this model makes the system more numeri-
cally more stiff and therefore necessitates smaller time-step to achive same accuracy as
compared to other EM models [59] [62]. Due to these drawbacks, PD model is not used
in this thesis.
5.2.2 QD Model
The QD model [1] resolves the rotor-position-dependent admittance matrix limitation of
the PD model by transferring the stator variables from abcs coordinates to qd0 coordi-
74
FOR EMT ANALYSIS
nates using transformation

xqd0s = T(θ)xabcs , (5.6)
where  
cos(θ) cos(θ − 32 ) cos(θ + 23 )
2
T(θ) = sin(θ) sin(θ − 32 ) sin(θ + 23 ) . (5.7)

3 1 1 1
2 2 2
It should be noted that QD model considers q-, d - and also zero sequence components.
Based on this transformation, (5.1), (5.2) and (5.3) respectively become [1]
" # " # " # " #
vqd0s iqd0s d λqd0s λr
=R + + ωr , (5.8)
vqdr iqdr dt λqdr 0
" # " #
λdq0s iabcs
= LQD , (5.9)
λqdr iqdr
and
3P
τe = (λmd iqs − λmq ids ). (5.10)
4
In general the differential equations of a system containing EMs are non-linear, there-
fore, a general Newton iterative method [4] can be used to solve these equations. EMTP-
RV uses such an approach [33], however, iterative methods are not feasible for real-time
simulation due to their excessive computational burden and unpredictable number of
iterations. There are three approaches to interface a QD model with that of the rest of
the system in abcs coordinate, without iterative process, as follows.
5.2.2.1 Current Injection-Based Approach
Based on this approach, at each time-step EM terminal voltages from the previous time-
step simulation results (in abc coordinates) are transformed into the dq0 coordinates,
using predicted mechanical variables and used to calculate the machine three-phase cur-
rents in the QD model. Then, these currents are injected into the network. This method
is simple yet it is prone to instability and inaccuracy [33].
To enhance numerical stability, it uses additional parallel resistive or capacitive snub-
ber branches [33]. The effect of snubber branches can be compensated using another
current source. The numerical instability issue is more significant when EM terminals
are almost open circuit [33], e.g., the EM is connected to a diode rectifier bridge or
equivalently a two-level VSC with blocked gate signals.
The existing FPGA-based real-time simulators use the QD model with current injec-
tion interface approach for its simplicity [18] [22] [31]. However, QD model with current
75
FOR EMT ANALYSIS
injection interface fails to simulate EMs under all system configurations and operational
scenarios, therefore, is not used in this work.
5.2.2.2 Prediction-Based Approach
This approach interfaces the equivalent circuit of discretized EM equations with dis-
cretized model of the rest of the system [33]. Since the power system is represented in
abc frame, this approach requires the EM QD model in the abc frame. If the q- and d -
axis resistances of the discretized EM QD model are the same, then the EM discretized
admittance in the abc frame is time-invariant. However, if they are different, e.g., due to
rotor saliency, the resulting admittance is time-variant. To address the time-varying ad-
mittance issue the average of the two resistances is considered on both axes [63]. On each
axis a voltage source is used to compensate the difference caused by this approximation.
The values of the voltage sources depend on the stator currents. Predicted stator current
values are used to calculate these voltage sources which adversely affects accuracy of the
simulation results and numerical stability [33]. Therefore, prediction-based interface is
also avoided in this work.
5.2.2.3 Compensation-Based Approach
Compensation method [64] interfaces the machine model with that of the rest of the
system in a similar way a non-linear circuit model is interfaced with a linear circuit. At
each time-step the network equivalent in abc frame is calculated and transferred to qd0.
Then, the network equivalent model is interfaced with the EM model and the stator
currents are calculated. These currents are injected into the network, as ideal current
sources, in the same time-step. This interfacing method is mathematically precise, how-
ever, calculation of the network equivalent circuit in qd0 frame can be computationally
expensive, especially in the presence of PECs which repetitively change the network
equivalent admittance matrix. This process can not be computationally afforded in real-
time simulation and therefore is not used in this work.
5.2.3 Voltage Behind Reactance (VBR) Model

VBR models [59] are direct, i.e., do not need any interfacing method. Figure 5.1 shows
VBR schematic structure where L” and e” are the so-called sub-transient inductance
matrix and sub-transient voltages, respectively, and related by
d
v abcs = Rs iabcs + L” (θ) iabcs + e”abcs . (5.11)
dt
76
FOR EMT ANALYSIS
VBR models [59] [65] [66] [67] separate dynamics of fast and slow state variables, i.e.,
stator current and rotor flux linkage variables, respectively. Since e”abcs is a function of
slow rotor variables, i.e., flux linkages and mechanical variables, the use of one time-step
delay in calculation of the sub-transient voltages is applicable.
vabcs e"abc
iabcs RS L"(θ)
Figure 5.1. General VBR Model
VBR and PD models are direct methods, however, VBR models impose less compu-
tational burden and provides better eigen-structure [59], i.e., can use a larger time-step
as compared to PD models for the same degree of accuracy.
The main drawback of VBR models is that inductance matrix L” (θ) is rotor-position-
dependent. The reason is that sub-transient inductances of d- and q-axis, due to dynamic
saliency, are not necessarily the same.. In case of an IM, the sub-transient inductances are
equal, therefore, matrix L” is not rotor-position-dependent. This reduces computational
burden and constitutes Constant-Parameter-VBR (CPVBR) model. CPVBR model is
not readily applicable to EMs with unequal sub-transient inductances. There are two
existing approaches to obtain CPVBR model for EMs, as follows.
The first approach adds an extra fictitious winding in either d or q axes [68]. The wind-
ing inductance is selected such that the resulting sub-transient inductances are the same.
The winding resistance must be considered large enough not to impact low-frequency
dynamics of the EM [3]. However, a large winding resistance makes the model more
numerically stiff and therefore reduces simulation accuracy. More importantly, there are
high frequency discrepancies between the actual EM model and its CPVBR model as
a result of fictitious winding [68]. This approach is not used in this work since high
frequency dynamics of EMs are also of importance, especially in system configurations
where EM is interfaced with high switching frequency PECs.
The second approach moves time-varying part of the sub-transient inductance to
the sub-transient voltage terms which are calculated by one time-step delay [69] [66].
The time-step delay applied to stator voltage/current variables is equivalent to using an
explicit LMS integration method and thus can lead to numerical instability, especially
for system configuration with numerically stiff equations, e.g., an EM with large dynamic
77
FOR EMT ANALYSIS
saliency connected to a three-phase diode rectifier. Therefore, this approach is also not
used in this thesis since one of the objectives is to emulate EM behaviour regardless of
the EM parameters and system configurations.
5.3 Proposed EM CPVBR Model

The envisioned EM model in this work should fulfil the following requirements.
• Enable parallelism to achieve small time-step real-time simulation of systems that

embed EMs. The EM model/module also needs to be compatible with the structure
of the formulation/DRTS developed in Chapter 3 which requires the EM module
to operate in parallel with other simulation modules in DRTS.
• Be compatible with the partitioning method devised for simulation of PECs in

Chapter 4, since in the emerging power systems most EMs can be closely coupled
with PECs.
• Provide a general and consistent model structure such that its fixed hardware mod-
ule design is able to represent different types of EMs, i.e., IM, PMSM and SM just
by changing the module parameters and without the need for re-compilation of the
hardware code.
• Be capable of emulating an EM behaviour under all possible steady-state and dy-

namic scenarios regardless of the system connectivity and EM parameters, e.g.,
dynamic saliency.
To fulfil the first requirement, i.e., parallel operation of EM module and other modules
in DRTS, the EM module must start its computation at the beginning of each time-step.
Since electrical inputs of a CPVBR model, i.e., stator currents from previous time-step
and field voltage are available at the beginning of each time-step, therefore, CPVBR
models also fulfil the first requirement.
The PEC simulation algorithm of Section 4.4 is based on the assumption that the
network connecting the PECSNs has constant parameters. Since a CPVBR model does
not contradict with constant admittance matrix representation of the network, this as-
sumption is not violated. Thus a CPVBR model also meets the second requirement.
Stator equations of a EM based on CPVBR model are represented by constant-
parameters which in this work are a part of the network. The fixed hardware matrix-
vector multiplication module, designed in Chapter 3 to solve the network equations,
accommodates these equations, however, to fulfil the third requirement a fixed hardware
78
FOR EMT ANALYSIS
module to solve the rotor equations is required. This necessitates a general form of rotor
equations capable of representing different types of EMs, which is explained in the next
subsection.
To fulfil the fourth requirements, a novel approach based a new numerically stiff
LMS integration method is adopted to obtain CPVBR model for system configuration
with numerically stiff equations, e.g., an EM with large dynamic saliency connected to a
three-phase diode rectifier. This approach is presented in Subsection 5.3.2.
5.3.1 CPVBR Continuous-time Model

CPVBR model is derived from EM equations in qd0 coordinates and Fig. 5.2 depicts the
model for different types of EMs in q- and d - axis. The circuit regarding zero sequence
equation is not included in this figure, however, it is considered in the equations as
explained later in this section. Fig. 5.2(a) represents an SM with two damper windings
on q-axis, one damper winding and a field winding on d-axis. Fig. 5.2(b) represents a
PMSM including one damper winding on each axis, Fig. 5.2(c) represents a single cage
IM and Fig. 5.2(d) represents a double cage IM.
Figure 5.3 illustrates the EM model adopted in this thesis. The rotor is represented
by one field winding, one damper winding and a permanent magnet on d -axis and two
damper windings on q-axis. It should be noted that this model does not represent a
practical EM since its corresponding hardware module design is intended to represent
different types of EMs of Fig. 5.2. In this work, the EM of Fig. 5.3 is referred to as
“generalized EM”.
CPVBR models separate the differential equations describing the dynamics of the
fast varying state-variables of the stator windings, i.e., the stator currents and the slowly
varying state-variables of the rotor windings, i.e., rotor flux linkages. To do so the
rotor differential equations must be expressed as a function of the EM state-variables.
The course of this subsection is to obtain stator and rotor differential equations of the
generalized EM as a function of state-variables and in a general form to enable fixed
hardware EM module design for FPGA-based real-time simulation.
Stator voltages, including zero sequence component, and rotor flux linkages of the
generalized EM are
d
vsq = rs isq + ωλsd + λsq
dt
d
vsd = rs isd − ωλsq + λsd (5.12)
dt
d
vs0 = rs is0 + λs0 ,
dt
79
FOR EMT ANALYSIS
Llrq1 rrq1 Llf rf

ωλ ds iqs ωλ qs
rs Lls ids rs Lls
Llrq2 rrq2 Llrd2 rrd2
vqs Lmq vds Lmd vf
q axis d axis
(a) SM
ωλ ds iqs ωλ qs
rs Lls Llrq1 rrq1 ids rs Lls Llrq1 rrq1
imag
vqs Lmq vds Lmd
q axis d axis
(b) PMSM
ωλ ds iqs ωλ qs
rs Lls Llrq1 rrq1 ids rs Lls Llrq1 rrq1
vqs Lmq vds Lmd
q axis d axis
(c) Single- cage IM

ωλ ds iqs ωλ qs
rs Lls ids rs Lls
vqs Lmq vds Lmd
q axis d axis
(d) Double- cage IM
Figure 5.2. The model of different types of EMs in qd coordinates, (a) Synchronous Machine, (b)
Permanent-Magnet Synchronous Machine including two damper windings, (c) Single-cage Induction
Machine, (d) Double-cage Induction Machine
and
d
0 = rrq1 iqr1 + λqr1
dt
d
0 = rrd1 idr1 + λdr1
dt (5.13)
d
0 = rrq2 iqr2 + λqr2
dt
d
vf = rrd2 idr2 + λdr2 ,
dt
80
FOR EMT ANALYSIS

ωλ ds iqs ωλ qs
rs Lls ids rs Lls
imag
vqs Lmq vds Lmd vf
q axis d axis
Figure 5.3. The general EM model in qd coordinates
respectively. Magnetization fluxes, as functions of the EM state variables, are
λrq1 λrq2
λmq = L”mq (iqs + + ),
Lrq1 Lrq2
(5.14)
λrd1 λrd2 λmag
λmd = L”md (ids + + + ),
Lrd1 Lrd2 Lmd
where L”mq and L”md are sub-transient magnetizing inductances, i.e.,
L”mq = (Lmq ||Lrq1 ||Lrq2 )

(5.15)
L”md = (Lmd ||Lrd1 ||Lrd2 ).
Stator flux linkages are expressed by
λsq = L”q isq + λ”q ,

λsd = L”d isd + λ”d , (5.16)
λs0 = Lls is0 ,
where sub-transient flux linkages λ”q and λ”d , as functions of rotor flux linkages, are
λrq1 λrq2
λ”q = L”mq ( + ),
Lrq1 Lrq2
(5.17)
λrd1 λrd2 λmag
λ”d = L”md ( + + ),
Lrd1 Lrd2 Lmd
and sub-transient inductances L”q and L”d are
L”q = L”mq + Lls ,

(5.18)
L”d = L”md + Lls .
81
FOR EMT ANALYSIS
The rotor differential equations must be written as a function of the EM state-

variables. To do so, rotor currents of (5.13) are expressed in terms of rotor flux linkages
and magnetizing fluxes, i.e.,
d λrq1 − λmq
λrq1 = − rrq1 ,
dt Lrq1
d λrd1 − λmd
λrd1 = − rrd1 ,
dt Lrd1
(5.19)
d λrq2 − λmq
λrq2 = − rrq2 ,
dt Lrq2
d λrd2 − λmd
λrd2 = − rrd2 + vf .
dt Lrd2
λmq and λmd are not state-variables, therefore, they are expressed as functions of the EM
state-variables based on (5.14), i.e,
d rrq1 L”mq rrq1 L”mq rrq1 L”mq

λrq1 = − (1 − )λrq1 + λrq2 + isq ,
dt Lrq1 Lrq1 Lrq1 Lrq2 Lrq1
d rrd1 L” rrd1 L”md rrd1 L”md rrd1 L”md
λrd1 = − (1 − md )λrd1 + λrd2 + isd + λmag ,
dt Lrd1 Lrd1 Lrd1 Lrd2 Lrd1 Lrd1 Lmd
(5.20)
d rrq2 L”mq rrq2 L”mq rrq2 L”mq
λrq2 = λrq1 − (1 − )λrq2 + isq ,
dt Lrq2 Lrq1 Lrq2 Lrq2 Lrq2
d rrd2 L”md rrd2 L”md rrd2 L”mq rrd2 L”md
λrd2 = λrd1 − (1 − )λrd2 + isd + λmag + vf .
dt Lrd1 Lrd2 Lrd2 Lrd2 Lrd2 Lrd2 Lmd
Equation (5.20) is written in the standard state-space form, i.e.,
λr = Arλ r + Br ur
λ̇
(5.21)
yλ = Crλ r + Dr ur ,
where h iT
λ r = λqr1 , λdr1 , λqr2 , λdr2 , (5.22)
h iT
¯
ur = isq , isd , λmag , vf , isd , (5.23)
h iT
” ”
yλ = λmq λmd λq λd . (5.24)
Matrices Ar , Br ,Cr and Dr are given in Appendix A. Equation (5.21) enables designing
a fixed hardware EM module to represent different types of EMs for FPGA-based real-
time simulation. The magnetizing flux linkages λmq and λmd , and the sub-transient flux
linkages λ”q and λ”q , are considered as output variables of (5.21) to facilitate calculation
82
FOR EMT ANALYSIS
of electromagnetic torque and the sub-transient voltages.
Stator equations of (5.12) using the flux linkages of (5.16) are
d ”
vsq = rs isq + ω(L”d isd + λ”d ) + (L isq + λ”q ),
dt q
d
vsd = rs isd − ω(L”q isq + λ”q ) + (L”d isd + λ”d ), (5.25)
dt
d
vs0 = rs is0 + (Lls is0 ),
dt
which can be expressed in a voltage-behind-reactance form, i.e.,
d ”
vsq = rs isq + ωL”d isd + L isq + e”q ,
dt q (5.26)
d
vsd = rs isd − ωL”q isq + L”d isd + e”d ,
dt
where e”q and e”d are sub-transient voltages defined by
d ”
e”q = ωλ”d + λ
dt q (5.27)
d
e”d = −ωλ”q + λ”d .
dt
Then stator voltage/current variables are transferred to the abc coordinates by
xabcs = T−1 (θ)xqd0s , (5.28)
where  
cos(θ) sin(θ) 1
T−1 (θ) = cos(θ − 23 ) sin(θ − 32 ) 1 . (5.29)
 
cos(θ + 23 ) sin(θ + 32 ) 1
Based on (5.28), the voltage-behind-reactance expression of (5.26) can be transferred to
abcs coordinates as
d
v abcs = rs iabcs + L” (θ) iabcs + e”abcs , (5.30)
dt
83
FOR EMT ANALYSIS
where
 
L” +L” L” +L” L” +L”
Ls + mq 3 md − mq 3 md − mq 3 md
L”mq +L”md L” +L” L” +L”
 
L” (θ) =
 − 3
Ls + mq 3 md − mq 3 md  
L” +L” L” +L” L”mq +L”md
− mq 3 md − mq 3 md Ls + 3
  (5.31)
2π 2π
cos(2θ) cos(2θ − ) cos(2θ + )
L”d − L”q  2π
3
4π
3
+ cos(2θ − 3 ) cos(2θ − ) cos(2θ)  .

3 3
cos(2θ − 2π
3
) cos(2θ) cos(2θ + 4π
3
)
If EM does not have dynamic saliency, i.e., L”q = L”d , then the rotor-position-depenent
term of (5.31) is zero and therefore (5.30) can be rewritten as [3]
d d
vabcs = rs iabcs + L”q iabcs + L0 3is0 + e”abcs , (5.32)
dt dt
which can be represented by the constant-parameter circuit of Fig. 5.4 [3]. However,
this is not possible for EMs with dynamic saliency, i.e, L”q 6= L”d . A method to obtain
CPVBR model for such EMs is explained in the next subsection.
ea” Lq” rs
L0 eb”
ec”
Figure 5.4. The constant parameter interface for CPVBR model [3]
5.3.2 Proposed LMS Integration Method for EMs with Dy-

namic Saliency
In this work CPVBR model is obtained using a new implicit LMS integration method.
Without loss of generality it is assumed that L”q ≥ L”d and EM stator equations of (5.26)
are rewritten as
d ”
vsq = rs isq + ωL”q isd + L isq + e”q2 ,
dt q (5.33)
d
vsd = rs isd − ωL”q isq + L”d isd + e”d2 ,
dt
84
FOR EMT ANALYSIS
where new sub-transient voltages are defined as
d ”
e”q2 = ωλ”d + λ + ω(L”d − L”q )isd
dt q (5.34)
d
e”d2 = −ωλ”q + λ”d .
dt
The proposed LMS integration method is expressed as
xn+1 + (a − 2)xn + (1 − a)xn−1 = ∆t a f (xn+1 , tn+1 ), 0 < a ≤ 1, (5.35)
where a is the parameter of the method and its first and second characteristic polynomial
[4] are respectively defined by
ρ(z) = z 2 + (a − 2)z + (1 − a) and δ(z) = az 2 . (5.36)
The method is zero-stable since ρ(z) satisfies the root condition, i.e., the roots of ρ(z),
z1 = 1 and z2 = (1 − a), are either inside or on the unit circle with multiplicity of 1. This
method is also consistent [4] since
ρ(1) = 0 and ρ0 (1) = δ(1). (5.37)
Satisfying the root condition and consistency indicates the proposed method is conver-
gent, i.e., it produces accurate results as ∆t → 0 [4].
An LMS integration method for numerically stiff differential equations, e.g., those of
power system EMTs, must be stiffly stable. An LMS integration method is stiffly stable
if its region of absolute stability covers a region similar to that of Fig. 5.5 [4]. The
region of absolute stability for the proposeed LMS integration method is obtained by
investigating its third characteristic polynomial [4] π(z, ĥ), i.e.,
π(z, ĥ) = ρ(z) − ĥδ(z). (5.38)
The region of absolute stability is obtained by solving for parameter ĥ corresponding to

z inside the unit circle, i.e., |z| ≤ 1. Fig. 5.6 depicts regions of absolute stability of the
proposed method for different values of parameter a which cover areas similar to that of
Fig. 5.5. This indicates the method is applicable to numerically stiff equations. It should
be noted that for a = 32 , this method is the same as the second-order backward difference
method [4] and the same as BE method for a = 1. This method is used to obtain EM
CPVBR model as follows.
85
FOR EMT ANALYSIS
jb
-c
-jb
Figure 5.5. Region of absolute stability requirement for a stiffly stable LMS integration method [4]
−2
−1 −1
0 0
1 1
2
−1 0 1 2 3 −1 0 1 2 3
a=0.5 a=0.66
−1
−1
−0.5
0 0
0.5
1
1
−1 0 1 2 −1 0 1 2
a=0.8 a=0.99
Figure 5.6. Region of absolute stability of the proposed LMS for different values of parameter a
Discretized differential equation of an inductor L, using the proposed method, is

a
in+1
L + (a − 2)inL + (1 − a)in−1
L = ∆t vLn+1 . (5.39)
L
86
FOR EMT ANALYSIS
The sub-transient inductor on the d - axis L”d , is discretized based on (5.39) using a =
L”d
L”
< 1, i.e.,
q
L”q n+1 L”d − L”q n n−1

n
(isd − isd ) + (isd − isd ) = vLn+1 , (5.40)
∆t ∆t
and Fig. 5.7 depicts the discretized model of this inductor. The model includes an
admittance, a history current source and a history voltage source. The history current
source and the parallel admittance are the same as those of the discretized model of L”q ,
using BE integration method. The history current source and the parallel admittance
correspond to the first term in the left side of (5.40) and the history voltage source
corresponds to the second term. This concludes if inductor L”d is disctretized, using the
proposed method, the resulting model is the same as that of an inductor L”q , disctretized
n+1
using BE method, in series with the history voltage source vhLd , which is defined by
n+1
L”d − L”q n n−1
vhLd = (isd − isd ). (5.41)
∆t
ihLdn+1
iL vL vLdn+1
L isdn+1
gLd vhLdn+1
Continuous-time inductor Inductor discretized model circuit equivalent
model using the proposed LMS
Figure 5.7. Inductor discretized model circuit equivalent
n+1
The above discussion shows that, using the proposed method while vhLd is considered
n+1
as part of the sub-transient voltage ed2 , then the inductor L”d in (5.33) can be replaced by
L”q . Therefore, the rotor-position-dependent matrix term in (5.31) is zero, i.e., CPVBR
model is obtained for EMs with dynamic saliency.
The proposed CPVBR model enables the explicit calculation of the sub-transient
voltages en+1
abc since they are only functions of the state-variables, i.e., these voltages can
be calculated only using the data from the previous time-step, i.e., n. Therefore, the EM
simulation task can be carried out in parallel with the other tasks in RMNA formulation
which is explained in the following section.
87
FOR EMT ANALYSIS
5.3.3 EM Simulation task

This section explains steps of EM simulation task to implement the proposed CPVBR
model. The stator model of Fig. 5.4 is considered as part of the network to represent
the stator equations of the CPVBR model. Therefore, the EM simulation task is to solve
rotor- mechanical and electrical equations, i.e., (5.21), and determine en+1
abc . However,
there’s coupling between the rotor mechanical and electrical equations which prevents
parallel solution of these equations. Since mechanical variables are relatively slow as
compared to electrical ones, one approach is to extrapolate the mechanical variables to
enable parallelism between these equations.
Based on the above discussion, rotor speed and position are extrapolated, i.e.,
ω 0n+1 = 2ω n − ω n−1 ,
(5.42)
θ0n+1 = 2θn − θn−1 .
To solve the state-space equation of (5.21) the input variables of (5.23) must be found in
n+1
which λmag and vf are input variables of the EM model, however, insq , insd and i¯sd are
n
not and must be obtained from iabc , based on
inqd0 = T(θ0n+1 )inabc , (5.43)
and
n+1
i¯sd = insd − in−1
sd , (5.44)
respectively. Equation (5.21) is discretized using TR method, i.e.,
λ n+1
r = ADrλ nr + BDr unr ,
(5.45)
yn+1
λ = CDrλ nr + DDr unr ,
where ADr , BDr , CDr and DDr are
Ar ∆t −1 Ar ∆t
ADr = (I − ) (I + ),
2 2
Ar ∆t −1
BDr = (I − ) Br ∆t, (5.46)
2
CDr = Cr ADr ,
DDr = Cr BDr + Dr .
From flux linkages we obtain electromagnetic torque by
3P n+1 n
τen+1 = (λ i − λn+1 n
mq isd ), (5.47)
4 md sq
88
FOR EMT ANALYSIS
which is used to obtain the rotor speed ω n+1 , by discretizing (5.5), using TR method,
i.e.,
ω n+1 = cω1 ω n + cω2 (τen+1 − τm
n+1
), (5.48)
where
∆tBf −1 ∆tBf
cω1 = (1 + ) (1 − ),
2J 2J (5.49)
∆tBf −1 ∆tP
cω2 = (1 + ) ,
2J 2J
and rotor angle by
θn+1 = θn + ∆tω n+1 . (5.50)
Since floating-point arithmetic implementation in Vivado HLS environment is more

efficient in matrix forms, to find sub-transient voltages, (5.34) is expressed in a matrix
form, i.e.,
e ”n+1 n+1
qd2 = Ce1λ r + Ce2 (ω n+1 )yn+1
λ + De (ω n+1 )unr , (5.51)
where matrices Ce1 , Ce2 (ω) and De (ω) are given in Appendix A. It should be noted that
n+1
matrix De (ω) is defined such that the term vhLd in (5.41) is added to e”d2 , as described
in Subsection 5.3.2. Finally, we obtain sub-transient voltages in abc coordinates by
”n+1
e abc = T−1 (θn+1 )eeqd2
”n+1
. (5.52)
EM simulation task is to carry out (5.42) to (5.52) which include the following steps:
1. Perform the Park transform to find qd variables id , iq using (5.43)

n+1
2. Find i¯sd using (5.44).
3. Solve rotor flux linkage dynamic equations of (5.45) .
4. Find the electrical torque τen+1 and solve for ωrn+1 , θrn+1 using (5.47), (5.48) and
(5.50).
5. Update the matrices Ce2 (ω n+1 ), De (ω n+1 ) and calculate the sub-transient voltages
in qd frame en+1
qd2 using (5.51).
6. Conduct (5.52) to find sub-transient voltages in abc frame en+1abc .
0 n+2 0 n+2
7. Extrapolate the speed and angle ω r , θ r using (5.42).
It should be mentioned that carrying out (5.42) does not require any data from the
solution of network equations in the previous time-step, therefore, it is carried out at
the end of the previous time-step which enhances the parallelism between the EM task
and the other task in the FPGA-based DRTS. The next section presents the EM module
89
FOR EMT ANALYSIS
designed to carry out the EM task.
5.4 Hardware Design of EM Module

Fig. 5.8 (a) depicts hardware design of the DRTS, including EM modules. A separate EM
module is assigned to carry out the simulation task regarding each each EM. A multiplexer
selects the input data from the solution of the network equations from the previous time-
step. The input data contains stator terminal currents in the abc coordinates. Field
voltage vf , and mechanical torque τm are obtained from designated input ports of each
EM module.
There are seven steps in the EM simulation task and each step is carried out using
a specific PU, therefore, each EM module comprises seven PUs. Execution of each step
requires results from the previous steps, i.e., these steps are inherently sequential. There-
fore, the PUs are designed to share floating-point IP cores to avoid excessive hardware
resource utilization. Parameters of each EM model are written to the corresponding
PU BRAMs and registers from Microblaze soft-processor through AXI Interconnect [2].
Design of PU1 to PU7 are described as follows.
PU1 is designed to carry (5.43) transformation for stator currents. This transform re-
quires calculation of trigonometric functions and two 2 × 3 matrix-vector multiplications.
Execution of trigonometric functions on the FPGA comes at a high cost of both area and
latency while BRAMs are abundant. Therefore, six BRAMs are used to save 1200 sam-
ples of one period of each of the six trigonometric functions of the transformation matrix,
i.e., sin(t), sin(t − 2π/3), sin(t + 2π/3), cos(t), cos(t − 2π/3) and cos(t + 2π/3). This
can calculate the trigonometric functions with the required precision. The integer-scaled
0 n+1
angle θint is
0 n+1 1200 0 n+1
θint =[ θ ], (5.53)
2π
which is calculated by PU7 at previous time-step n and used to read the six trigonometric
function values from BRAMs and multiplied by constant 2/3. The calculated values and
the stator three-phase currents are used to obtain inq and ind , based on matrix-vector
multiplication.
PU2 is the implementation of (5.44 ) subtraction and PU2 inputs stator current insd
n+1
from PU1 and outputs variable i¯sd .
PU3 implements a 8 × 9 matrix vector multiplication to solve (5.45). Its input vector
unr is determined by (5.23), using stator currents inq and ind , filed voltage vfn+1 , magnet
n+1
flux linkage λmag and variable i¯sd .
PU4 solves mechanical equations. It inputs stator currents inq and ind , magnetization
90
FOR EMT ANALYSIS
AXI Interconect Soft Processor

MicroBlaze
MUXs
Matrix Vector
calculation
vn , vp bnode
bh & ihSW
History current update HRMNA
2 b x
HeqRMNA beq
Subsystem history bss
calculation
bconv
PEC
bem Concat
iabc
EM τe ω θ
a)
1 vqd n 2 5
vabcn 3
Ce1 eqd n+1 6
subtract eabcn+1
n+1
iabcn T(θ ) urn yrn+1
iqd n
ADr BDr
n+1
Ce2(ω ) T (θn+1)
-1
De(ωn+1)
CDr DDr θn+1
vfn+1 ωn+1
λrn+1
z-1
λmag
4
τen+1
(cω1,cω2) θn+1
τmn+1
P
ωn+1
7
Extrapolate
θ' n+1 θ' n+2

z-1
b)
Figure 5.8. (a) The computational engine including the EM modules, (b) The PUs in the EM module
n+1
flux linkages λn+1
mq and λmd from PU3 to calculate electromagnetic torque τe
n+1
from
n+1
(5.47). Mechanical torque is read from EM module port and mechanical speed ω and
n+1 n+1 n+1
angle θ are calculated based on (5.48) and (5.50). PU4 also writes ω , θ and
τen+1 directly to the outputs of EM module.
PU5 calculates sub-transient voltages in qd coordinates. First, speed-dependent en-
tries of matrices Ce2 and De are updated using mechanical speed from PU4. Then, the
2 × 13 matrix-vector multiplication of (5.51) is performed.
91
FOR EMT ANALYSIS
Table 5.1
Resource utilization of the EM module

One EM Module 6887(≈ 1%) 7688(≈ 2%) 14(≤ 1%) 267 1
Available 607,200 () 303,600 2,800 - -
PU6 conduct transformation of (5.52) and uses the pre-saved values of the six trigono-
metric functions, similar to PU1. PU6 receives mechanical angle θn+1 and obtains
n+1 1200 n+1

θint =[ θ ], (5.54)
2π
which is used to read six trigonometric function values from BRAMs. PU6 uses these
values to carry out a 3×2 matrix-vector multiplication to calculate sub-transient voltages,
en+1
a , ebn+1 and en+1
c .
0
PU7 is the mechanical angle predictor. PU7 predicts mechanical angle θ n+2 for time-
0 n+2
step n + 2 and then determines the integer scaled predicted angle θint using the same
method as PU6.
Table 5.1 shows the resource utilization and the latency of the EM module. It shows
that only about 2% of FPGA resources is required for each EM module. The reason
is that all PUs share the same floating-point IP cores. Using separate floating-point IP
cores for each PU would result in high resource utilization. EM module is replicated
three times to enable real-time simulation of up to three EMs.
To include the EM modules in the FPGA-based DRTS outputs of each EM module,
i.e., sub-transient voltages, are considered as entries of RMNA injection vector, by the
concat module of Fig. 5.8, i.e.,
h iT
bn+1
node = n+1 n+1 n+1 n+1
bs bh bConv bEM , (5.55)
where h iT
bn+1
EM = en+1 n+1 n+1
abc(1) , eabc(2) , eabc(3) . (5.56)
Since EM modules latency is larger than other modules, contributing to bn+1

node , their
n+1
outputs en+1
abc(i) s are scheduled as the last entries of bnode .
5.5 Performance Evaluation

To evaluate performance of the proposed CPVBR model and EM module five simulation
case studies corresponding to different EM types are conducted. Each case is carried
92
FOR EMT ANALYSIS
out based on two methods, i.e., (i) reference method, based on VBR model and using
PLECs toolbox of MATLAB Simulink environment with ode23tb solver for which the
relative error parameter is set to 10−4 , and (ii) real-time FPGA-based method, using the
proposed CPVBR model and the proposed FPGA-based DRTS platform of Fig. 5.8 with
the time-step of 4µs.
In the second method for each case study a separate netlist file is created and read
by a MATLAB script. All system parameters, e.g., network matrices, PEC matrices and
EM parameters are calculated by the script and written to their corresponding module
by MB soft-processor. It should be noted that same FPGA-based DRTS hardware is
used for all case studies and only the contents of BRAMs in modules of the DRTS are
changed by MB soft-processor. The above settings for the two methods are kept the
same for all case studies and the results are presented in the following subsections.
5.5.1 Case I: IM Start-up

Figure 5.9 illustrates the system considered for this case which includes a DC voltage
source connected to the DC-link capacitor of a two-level three-phase VSC to drive an
IM. Parameters of IM and VSC are given in Table 5.2.
LDC RDC
CDC IM
VDC
Figure 5.9. Induction machine case study system
The system initial condition and input mechanical torque of the IM are set to zero. At
t = 0s PWM gate signals of the VSC are enabled with the modulation index of m = 0.4
at 50Hz to drive the IM.
Figure 5.10 depicts the simulation results based on the two methods described above
and includes three-phase stator currents, speed, electromagnetic torque and DC current.
The right column contains the results from the reference method, i.e., based on VBR
model and using PLECs toolbox of MATLAB Simulink environment and the left column
includes the results from the FPGA-based method, i.e., based on the proposed CPVBR
model and using the proposed FPGA-based DRTS platform of Fig. 5.8.
93
FOR EMT ANALYSIS
Table 5.2
Parameters of the PEC and IM

PEC Switching frequency FSW 5kHz
Switch on resistance ron 10mΩ
DC-link resistance RDC 0.1Ω
DC-link inductance LDC 20µH
DC-link capacitance CDC 10mF
DC-link Voltage VDC 400V
IM stator resistance rs 87mΩ
IM stator leakage inductance Ls 0.801mH
IM magnetizing inductance Lm 34.7mH
IM equivalent rotor resistance rr0 0.228Ω
IM equivalent rotor inductance L0r 0.801mH
IM Friction coefficient Bf 0.0N ms
IM moment of inertia J 0.166N ms2
IM number of poles P 4
The system experiences start-up transients. The three-phase current amplitudes in-
crease to provide electromagnetic torque for the IM shaft to accelerate. The IM shaft
speed reaches steady state and at the same time the electromagnetic torque settles to
zero. The DC current also exhibits a transient behaviour to provide the power require
from the IM to accelerate its shaft and settles back to a zero at steady state.
Figure 5.10 illustrates that the results from the proposed FPGA-based method closely
match those from the reference method for all electrical and mechanical variables and
indicates that the proposed CPVBR model and EM module can accurately simulate IM
transient and steady state behaviour in real-time. Moreover, it shows the proposed EM
CPVBR model and EM module are compatible with the PEC model developed in this
thesis. The DC current from the FPGA-based method has the same transient and steady
state behaviour as that of the reference method and indicates that the PEC model does
not suffer from artificial power losses that are induced by ADC model.
5.5.2 Case II: IM Transient Response

This case uses the same system as that of Case I under the same initial conditions. After
the start-up procedure at t = 0.12s a modulation index step change of m = 0.1 is applied
to the VSC.
Figure 5.11 demonstrates the simulation results from the FPGA-based method and
reference method, and includes stator three-phase currents, rotor speed, electromagnetic
94
FOR EMT ANALYSIS
FPGA based method Reference
500 500
iabcIM(A)
0 0
−500 −500
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2
400 400
Speed (rad/sec)
300 300
200 200
100 100
0 0
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2
1000 1000
800 800
Torque (N.m)
600 600
400 400
200 200
0 0
−200 −200
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2
300 300
200 200
Idc (A)
100 100
0 0
−100 −100
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2
Figure 5.10. Simulation results of Case I, i.e., IM start-up, m = 0.4. The left column includes the results
from FPGA-based method, based on the proposed CPVBR model and using the proposed FPGA-based
DRTS platform of Fig. 5.8 with the real-time time-step of 4µs and the right column contains the results
from the reference method, i.e., based on VBR model and using PLECs toolbox of MATLAB Simulink
environment with ode23tb solver and the relative error parameter set to 10−4 .
torque and DC current. Subsequent to the modulation index step change the electrical
and mechanical variables of the system experience transients. The step change in modu-
lation index, or equivalently increase in the IM terminal voltage amplitude, increases the
three-phase currents and therefore electromagnetic torque.
Figure 5.11 shows that the IM transient response from the proposed FPGA-based
method closely match those from the reference method for all electrical and mechanical
variables and further confirms accurate performance of the proposed CPVBR model and
the proposed FPGA-based DRTS for simulation of the IM.
5.5.3 Case III: PMSM Transient Response

Fig. 5.12 depicts the system for this case which includes a DC voltage source connected
to the DC-link capacitor of a two-level three-phase VSC which drives a PMSM. The
system parameters are listed in Table 5.3.
To start-up the system, modulation index frequency and amplitude are ramped up
95
FOR EMT ANALYSIS
FPGA−based method Reference
200 200
100 100
iabcIM(A)
0 0
−100 −100
−200 −200
0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2
325 325
320 320
Speed (rad/sec)
315 315
310 310
305 305
300 300
0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2
200 200
100 100
Torque (N.m)
0 0
−100 −100
−200 −200
−300 −300
0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2
100 100
50 50
Idc (A)
0 0
−50 −50
−100 −100
0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2 0.1 0.11 0.12 0.13 0.14 0.15 0.16 0.17 0.18 0.19 0.2
Figure 5.11. Simulation results of Case II, i.e., IM transient response, a step change from m = 0.4 to
m = 0.5 is applied at t = 0.12s. The left column includes the results from FPGA-based method, based
on the proposed CPVBR model and using the proposed FPGA-based DRTS platform of Fig. 5.8 with
the real-time time-step of 4µs and the right column contains the results from the reference method, i.e.,
based on VBR model and using PLECs toolbox of MATLAB Simulink environment with ode23tb solver
and the relative error parameter set to 10−4 .
LDC RDC
VDC CDC PMSM
Figure 5.12. PMSM Machine case study system
to 157.07rad/s and 0.4 in 2 seconds, respectively. After the PMSM speed reaches
157.07rad/s, a step mechanical torque of τm = 10N.m is applied to the PMSM shaft
at t = 3s.
Figure 5.13 depicts the PMSM behaviour obtained from the FPGA-based method on
96
FOR EMT ANALYSIS
Table 5.3
Parameters of the PEC and PMSM

PEC Switching frequency FSW 5kHz
DC-link Voltage VDC 400V
PMSM stator resistance rs 0.4Ω
PMSM d-axis inductance Lmd 10mH
PMSM q-axis inductance Lmq 15mH
IM moment of inertia J 10−3 N ms2
the left column and those from the reference method on the right column. Subsequent to
the mechanical torque step change, the system exhibits an oscillatory transient response
to compensate for the increased mechanical torque. Figure 5.13 shows that the simulation
results from the FPGA-based method matches those from the reference method for all
electrical and mechanical variables. This indicates that the proposed CPVBR model, EM
module and therefore the proposed FPGA-based DRTS can precisely simulate PMSM in
real-time. Moreover, this case study confirms that the EM module can represent different
types of EMs, e.g., IM and PMSM, using the same hardware module only by changing
the parameters in the BRAMs of the module.
5.5.4 Case IV: PMSM and Rectifier

This case uses the system of Case III used. It should be noted that the PMSM in this
system does not have damper windings on d- and q-axis and according to (5.15) the
sub-transient inductances L”q and L”d are respectively equal to Lmq and Lmd and therefore
L”q
dynamic saliency is L”d
= 1.5.
The gate signals of the VSC are blocked and therefore the VSC operates as a three-
phase diode rectifier. The PMSM is at zero initial conditions and mechanical torque of
τm = −15N.m is applied to the shaft of the PMSM at t = 0s. Fig. 5.14 illustrates the
simulation results obtained from the FPGA-based method and those from the reference
method, as described in the beginning of this section, and includes stator three-phase
currents, rotor speed, electromagnetic torque and DC current.
97
FOR EMT ANALYSIS
20 20
iabcIM(A) 10 10
0 0
−10 −10
−20 −20
2.98 3 3.02 3.04 3.06 3.08 3.1 3.12 3.14 2.98 3 3.02 3.04 3.06 3.08 3.1 3.12 3.14
250 250
Speed (rad/sec)
200 200
150 150
100 100
50 50
0 0
2.98 3 3.02 3.04 3.06 3.08 3.1 3.12 3.14 2.98 3 3.02 3.04 3.06 3.08 3.1 3.12 3.14
30 30
Torque (N.m)
20 20
10 10
0 0
2.98 3 3.02 3.04 3.06 3.08 3.1 3.12 3.14 2.98 3 3.02 3.04 3.06 3.08 3.1 3.12 3.14
20 20
10 10
Idc (A)
0 0
−10 −10
2.98 3 3.02 3.04 3.06 3.08 3.1 3.12 3.14 2.98 3 3.02 3.04 3.06 3.08 3.1 3.12 3.14
Time(s) Time(s)
Figure 5.13. Simulation results of Case III, i.e., PMSM transient response, a mechanical torque of
τm = 10N.m is applied the PMSM shaft at t = 3s. The left column includes the results from FPGA-based
method, based on the proposed CPVBR model and using the proposed FPGA-based DRTS platform of
Fig. 5.8 with the real-time time-step of 4µs and the right column contains the results from the reference
method, i.e., based on VBR model and using PLECs toolbox of MATLAB Simulink environment with
ode23tb solver and the relative error parameter set to 10−4 .
The PMSM shaft speeds up until the back electromagnetic force (EMF) of the PMSM
reaches the level that can inject three-phase currents to the DC-link capacitor through
the anti-parallel diodes of the VSC. The three-phase currents produce a counter electro-
magnetic torque and therefore prevent the shaft speed from further speed-up. The shaft
speed reaches final value of 421rad/s at steady state.
Figure 5.14 shows that the PMSM transient simulation results from the FPGA-based
method matches those from the reference method for all electrical and mechanical vari-
ables. This case study also indicates that the CPVBR model and EM module can rep-
resent EMs with relatively high degree of dynamic saliency connected to a three-phase
diode rectifier.
98
FOR EMT ANALYSIS
iabcIM(A) 10 10
0 0
−10 −10
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
500 500
Speed (rad/sec)
400 400
300 300
200 200
100 100
0 0
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
10 10
Torque (N.m)
0 0
−10 −10
−20 −20
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
10 10
0 0
Idc (A)
−10 −10
−20 −20
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
Time(s)
Time(s)
Figure 5.14. Simulation results of Case IV, i.e., PMSM and rectifier, a mechanical torque of τm =
−15N.m is applied to the PMSM shaft. The left column includes the results from FPGA-based method,
based on the proposed CPVBR model and using the proposed FPGA-based DRTS platform of Fig. 5.8
with the real-time time-step of 4µs and the right column contains the results from the reference method,
i.e., based on VBR model and using PLECs toolbox of MATLAB Simulink environment with ode23tb
solver and the relative error parameter set to 10−4 .
5.5.5 Case V: SM and Rectifier

Figure 5.15 illustrates the system for this case which includes a DC voltage source con-
nected to the DC-link capacitor of a three-phase diode rectifier and a four pole SM. The
system parameters are listed in Table 5.4. SM has one damper winding on each axis and
L”
its dynamic saliency is relatively high, i.e., L”q = 1.79.
d
In this case the SM speed is locked at synchronous speed and the other initial con-
ditions, including those of SM flux linkages, are set to zero. At t = 0s filed voltage is
applied to the field winding.
Fig. 5.16 illustrates the simulation results obtained from the FPGA-based method
and those from the reference method, as described in the beginning of this section, and
includes stator three-phase currents, electromagnetic torque and DC current. As the
SM filed current and therefore flux linkage increases the back EMF increases and injects
current to the the DC link through the three-phase diode rectifier.
99
FOR EMT ANALYSIS
LDC RDC
vf' SM
VDC
Figure 5.15. Synchronous machine case study system
Fig. 5.16 indicates that the SM and rectifier system simulation results obtained from
FPGA-based method closely match those from the reference method. In addition, it
confirms that the the CPBVR model and EM module can represent SMs regardless of
its parameter, e.g., relatively high dynamic saliency and system configuration, e.g., a
three-phase diode rectifier connected to SM terminals.

200 200
100 100
iabcIM(A)
0 0
−100 −100
−200 −200
0 0.05 0.1 0.15 0.2 0 0.05 0.1 0.15 0.2
0 0
Torque (N.m)
−200 −200
−400 −400
−600 −600
−800 −800
0 0.05 0.1 0.15 0.2 0 0.05 0.1 0.15 0.2
0 0
−50 −50
Idc (A)
−100 −100
−150 −150
−200 −200
0 0.05 0.1 0.15 0.2 0 0.05 0.1 0.15 0.2
Time(s) Time(s)
Figure 5.16. Simulation results of Case V, i.e., SM and Rectifier. The SM speed is locked at synchronous
speed and at t = 0s the field voltage is applied to the filed winding. The left column includes the results
from FPGA-based method, based on the proposed CPVBR model and using the proposed FPGA-based
DRTS platform of Fig. 5.8 with the real-time time-step of 4µs and the right column contains the results
from the reference method, i.e., based on VBR model and using PLECs toolbox of MATLAB Simulink
environment with ode23tb solver and the relative error parameter set to 10−4 .
100
FOR EMT ANALYSIS
Table 5.4
Parameters of the SM and Rectifier system

DC-link voltage VDC 400V
SM stator resistance rs 0.4Ω
SM stator Inductance Ls 0.83mH
SM q-axis Magnetizing inductance Lmq 13.5mH
SM d-axis Magnetizing inductance Lmd 39.27mH
SM field inductance L0f 2.54mH
SM field resistance rf0 135mΩ
SM rotor q-axis damper inductance L0rq1 3.4mH
0
SM rotor q-axis damper resistance rrq1 0.923Ω
SM rotor d-axis damper inductance L0rd1 3.68mH
0
SM rotor d-axis damper resistance rrd1 1.31Ω
SM number of poles P 4

In this chapter EM models for simulation of power system EMTs were investigated and it
was deduced that CPVBR model can fulfil the requirements for FPGA-based small time-
step real-time simulation of systems including EMs and other power system components,
especially PECs. A new CPVBR model was proposed to enable simulation of EMs under
all test scenarios, regardless of system configuration and EM parameters, e.g., dynamic
saliency. A general EM was introduced to enable fixed hardware module design and
EM module was designed based on the proposed model using floating-point numerical
representation. The EM module was used to enhance the FPGA-based DRTS devised in
the previous chapter to enable real-time simulation of EMs.
Five real-time simulation case studies were carried out based on the proposed CPVBR
model and using the proposed FPGA-based DRTS for different types of EMs under
different scenarios. The results were compared against the results from the reference
method based on VBR model and using PLECs MATLAB toolbox. It was shown that
the designed EM module can precisely emulate the transient and steady state behaviour
of different types of EMs with different parameters and under different test scenarios. In
summary, the advantages of the proposed EM model and EM module include:
• The proposed EM model keeps the network admittance matrix constant which
101
FOR EMT ANALYSIS
enables including the EM module in the FPGA-based DRTS.

• The proposed EM model is compatible with the partitioning method devised in
this work for simulation of PECs, therefore, it enables the real-time simulation of
case study systems embedding both PECs and EMs.
• The EM model enables parallelism between the EM module and the existing mod-
ules in the FPGA-based DRTS, i.e., at the beginning of the time-step the EM
module starts operating. This enables small time-step simulation of case studies
embedding EMs as well as other power system components.
• The designed EM module can simulate different types of EMs with different pa-
rameters without hardware code re-compilation.
102
Chapter 6
Real-time Simulation Platform and

Performance Verification
6.1 Introduction
Previous chapters proposed FPGA-based modules for PECs, EMs and electrical network
which collectively form an FPGA-based DRTS. The modules are based on modeling
approaches that are tailored to overcome the challenges associated with integrating re-
newable energy sources into power system. This chapter proposes a platform to conduct
real-time simulation studies of such power systems. The platform includes a model prepa-
ration script and a hardware platform. The model preparation script is a MATLAB code
that reads a netlist file and prepares parameters of each module in the FPGA-based
DRTS and the hardware platform includes the FPGA-based DRTS, RTDS and a PC
that runs the script. FPGA-based DRTS provides capability to simulate distribution
system and Inter-Hardware Universal Line Model (IHULM) enables connection of this
system to the rest of power system that is simulated in RTDS.
This chapter is organized as follows. Section 6.2 introduces IHULM concept, de-
fines the associated IHULM task in the context of FPGA-based DRTS and identifies
parallelism among the task steps. Section 6.3 presents the corresponding FPGA-based
hardware design of the IHULM module. Section 6.4 presents the real-time simulation
platform including the hardware platform and the model preparation script. Section 6.5
presents results of real-time simulation studies using the proposed platform. Section 6.6
highlights contributions and summarises this chapter.
103
CHAPTER 6. REAL-TIME SIMULATION PLATFORM AND PERFORMANCE
VERIFICATION
6.2 Inter-Hardware Universal Line Model (IHULM)

Short transmission lines are represented using lumped element models, e.g, PI sections.
However, to accurately emulate EMTs of long transmission lines, i.e., overhead lines
and underground cables, frequency-dependent models which include modal-domain [70]
and phase-domain models [48] [71] are employed [13]. The former adequately represents
transposed overhead lines while in case of untransposed overhead lines and underground
cables the latter should be used. Universal Line Model (ULM) [48] is the most accurate
phase-domain line model. This thesis considers transposed and untransposed overhead
lines and this section presents IHULM which is a transmission line model based on ULM
where one end is represented in the FPGA-based DRTS and the other end is represented
in RTDS and the line serves as the interface media between two platforms.
A transmission line voltage and currents are related by [48]
YcV s − I s = H(II r + YcV r ),

(6.1)
YcV r − I r = H(II s + YcV s ),
where subscripts s and r denote sending-end and receiving-end, respectively, and all
quantities are in frequency-domain. Figure 6.1 shows transmission line model and Y c (s)
is the characteristic admittance matrix defined by
p
Y c (s) = Z (s)−1 Z (s)Y
Y (s), (6.2)
and H (s) is the propagation matrix of the line defined by

√
Z (s))l
(− Y (s)Z
H (s) = e , (6.3)
where Z (s) and Y (s) are the series impedance and shunt admittance per unit length,
respectively. l is the length of the line.
Is(s) Ir(s)
Vs(s) YC(s) Iws(s) Iwr(s) YC(s) Vr(s)
Figure 6.1. Frequency-domain transmission line model
Matrices Y c (s) and H (s) are not rational transfer functions and thus prevent step-
104
VERIFICATION
by-step time-domain simulation of the line model, therefore, rational approximations of

these transfer functions are obtained and employed for time-domain simulation. ULM is
the most accurate and widely used transmission line model in which approximation of
Y c (s) can be directly obtained by applying Vectro Fitting (VF) [72] [70], i.e.,
nY
X RY i
Y c (s) ' Y f (s) = G 0 + , (6.4)
i=1
s − aY i
where aY i s are the poles and R Y i s are their corresponding residue matrices.
Approximation of H (s), however, is more complicated due to the presence of multiple
modes. Therefore, H c (s) is decomposed into mode functions using modal decomposition,
i.e., [70]
T −1 (s),
H (s) = T (s)diag([eλ1 (s)l ... eλnM (s)l ])T (6.5)
where λi s and T (s) are eigenvalues and matrix of right eigenvectors of Y Z (s), respectively,
and nM is the number of modes. An approximation of the propagation matrix is defined
by [48]
nM X nH
X R Hji −τj s
H (s) ' H f (s) = ( e ), (6.6)
j=1 i=1
s − aHji
where τj is the j th mode time-delay and aHji is the ith pole of the j th mode function,
defined by
hj (s) = eλj (s)l+τj s . (6.7)
aHji s are obtained by applying VF to each mode function, i.e.,

nH
X cji
hj (s) ' . (6.8)
i=1
s − aHji
To simulate the line model in time-domain, based on (6.1), one needs to implement
is (t) + hf (t) ∗ (iiY r (t) + ir (t)) = iY s (t),

(6.9)
i r (t) + hf (t) ∗ (iiY s (t) + i s (t)) = i Y r (t),
where
i Y s (t) = hY f (t) ∗ v s (t),

(6.10)
i Y r (t) = hY f (t) ∗ v r (t),
and sign * represents time-domain convolution. hY f (t) and hf (t) are respectively the
impulse responses of Y f (s) and H f (s). Implementation of the convolutions, i.e., hY f
105
VERIFICATION
and hf , requires different formulations. In the case of hf since it contains time-delays,

the relationship between its input and output variables is explicit, therefore, state-space
formulation is applicable. However, hY f is a subsystem of the network, therefore, com-
bined state-space and nodal formulation must be used. Implementations of hY f and hf
convolutions, in the context of IHULM, are explained as follows.
Figure 6.2 (a) illustrates the IHULM concept in which one end of the line is represented
in RTDS and the other end is represented in FPGA-based DRTS. The line end represented
in FPGA-based DRTS is referred to as sending-end and the one in RTDS is considered
as the receiving-end. It should be noted this is only a terminology and the line model is
symmetric. A parallel approach to implement hY f and hf convolutions for both ends of
line are as follows.
Distribution Transmission Generation
is ir
iYs ihs=hf*(iYr+ir) iYr
vs hYf hYf vr
ihr=hf*(iYs+is)
a) IHULM Concept
irn+1
isn+1
ihsn+1 ihYrn+1
ihYsn+1 hf
vsn+1 vrn+1
GY hf GY
ihrn+1
Represented in Represented in
FPGA-based DRTS RTDS
b) Discretized IHULM
Figure 6.2. (a) IHULM concept, (b) Discretized IHULM
To conduct hY f convolution, each differential term of (6.4) can be expressed in time-
106
VERIFICATION
domain, i.e.,
diiY i
= aY ii Y i + R Y iv s , (6.11)
dt
which can be discretized using TR, i.e.,
i n+1 n n n+1
Y i = aY dii Y i + G Y div s + G Y div s , (6.12)
where
∆taY i RY i
∆tR
1+ 2 2
aY di = ∆taY i
, G Y di = . (6.13)
1− 2
1 − ∆ta2 Y i
Rewriting (6.12) in the form of a history current source and a parallel admittance results
in
i n+1 n+1 n+1
Y i = i hY i + G Y div s , (6.14)
and
n+1 n n
i hY i = aY dii hY i + D Y div s , (6.15)
where
GY di .
D Y di = (1 + aY di )G (6.16)
Therefore, the discrete-time model of (6.4) is
in+1 n+1
Y s = GY v s + in+1
hY s , (6.17)
where nY
X
G0 +
G Y = (G G Y di ), (6.18)
i=1
and nY
X
i n+1
hY s = i n+1
hY i . (6.19)
i=1
Matrix G Y is the admittance of hY f and must be included in the admittance matrix.

For sending-end of the line, represented in FPGA-based DRTS, G Y is considered in the
network admittance matrix and for the receiving-end in RTDS, this matrix is represented
by the resistive network of Fig. 6.3 which constitutes a similar admittance matrix as GY ,
107
VERIFICATION
g13
g12
g23
g11 g22 g33
iurn+1(1) iurn+1(2) iurn+1(3)
Figure 6.3. IHULM receiving-end representation in RTDS
using
g11 = G Y (1, 1) + G Y (1, 2) + G Y (1, 3),

g22 = G Y (2, 2) + G Y (1, 2) + G Y (2, 3),
g33 = G Y (3, 3) + G Y (1, 3) + G Y (2, 3),
(6.20)
g12 = −G
GY (1, 2),
g13 = −G
GY (1, 3),
g23 = −G
GY (2, 3).
Both i n+1 n+1

hY s and i hY r , are calculated in the FPGA-based DRTS as shown in Fig. 6.2(b)
since implementation of such convolution processes in the small time-step solver of RTDS
[10] is not available and can be efficiently conducted in FPGA.
The hf convolution processes, regarding both ends of the line, are also carried out
in FPGA-based DRTS as shown in Fig. 6.2 (b). Time-domain realization of the term
corresponding to each pole in (6.6) is
diiHji
= aHjii Hji + R Hjiu s (t − τj ), (6.21)
dt
where
u s (t) = i r (t) + i Y r (t). (6.22)
Using TR, (6.21) can be discretized as
in+1 n n−τDj
Hji = aHdjii Hji + D Hjiu s , (6.23)
108
VERIFICATION
where
∆taHji
1+ 2 R Hji
∆tR τj
aHdji = ∆taHji
, D Hji = ∆taHji
, τDj = . (6.24)
1− 1− ∆t
2 2
Considering all the poles in (6.6), the hf convolution is obtained by

nM
X
in+1
hs = in+1
hsj , (6.25)
j=1
where nH
X
i n+1
hsj = i n+1
Hji . (6.26)
i=1
Presence of the time delay τj in (6.21) facilitates calculation of hf convolution since

there is no term with superscript n + 1 on the right hand side of (6.23). Therefore, this
enables representing hf by a current source, i.e., i hs , and can be compared to (6.17) in
which G Y v n+1 requires a parallel admittance matrix in representation of hY f . Currents
at the sending and receiving ends of the line are respectively obtained by
n+1 n+1
in+1
us = i hs − i hY s ,
n+1 n+1
(6.27)
i n+1
ur = i hr − i hY r .
Based on the above discussion, the steps for IHULM simulation task include:
1. update i n+1 n+1

hY i corresponding to each ai for the sending end and calculate i hY s using
(6.15) and (6.19),
2. update i n+1 n+1
hY i corresponding to each ai for the receiving end and calculate i hY r using
(6.15) and (6.19),
3. update i n+1
Hji corresponding to each aji of j
th
mode of the sending end and calculate
n+1
i hsj using (6.23) (j = 1 to nM ),
4. update i n+1
Hji corresponding to each aji of j
th
mode of the receiving end and calculate
i n+1
hrj using (6.23) (j = 1 to nM ),
5. calculate i n+1 n+1
us and i ur using (6.25) and (6.27).
The first four steps are convolution processes and can be conducted in parallel since they
only require data from previous time-step, i.e., the IHULM simulation task can be carried
out in parallel with other tasks in the FPGA-based DRTS. The result of the first four
steps are used by step 5 to obtain output currents. A hardware module to implement
this task is designed based on floating-point numerical representation and explained in
Section 6.3.
109
VERIFICATION
6.3 IHULM Hardware Module
The first four steps of the IHULM task are matrix-vector convolutions, i.e., two hf and
two hY f convolutions. hf convolution imposes more computational burden than that of
hY f since the number of modes, nM , is more than one and in this work considered to be
nM = 3, i.e., the same as the number of conductors. The maximum number of poles in
fitting process of Yc (s) and hj (s) is considered to be 6, i.e., nH = nY = 6. These indicate
that hf convolution requires 3 times more computational burden than hY f convolution.
Figure 6.4(a) depicts the IHULM hardware module in which a PU is considered for each
step in IHULM task.
Steps one and three are similar to steps two and four, respectively, and therefore have
identical PUs. Moreover, step three/four comprises three convolutions corresponding to
three mode functions hj (s) (j = 1 to 3) which are similar to step one/two. The first
four steps can be conducted in parallel and their outputs are inputs to the fifth step.
Therefore, the latency of PU1 and PU3 should be the same to reduce the overall latency
of the IHULM module. Thus PU3 is designed to have three hardware units, identical to
PU1, so it has the same latency as PU1.
To enable efficient implementation of PU1, using floating-point arithmetic, (6.15) is

expressed in a matrix form
i n+1
hY i = [aY diI D Y di ][iinhY i v ns ]T . (6.28)
Calculation of each i n+1hY i requires a matrix-vector multiplication which can be conducted

using FPMACCs, i.e., an FP multiplier and an FP accumulator. Calculation of each two
i hY i s are assigned to one FPMACC1 in which FP accumulator is based on the first design
in Table 2.2 and Fig. 6.5 shows the order of inputs for FPMACC1 that calculates i hY 1
and i hY 2 . This order facilitates low latency calculation of two i hY i s using one FPMACC1
unit. Therefore, calculation of 6 i n+1hY i s requires three FPMACC1s as shown in Fig. 6.4(c).
PU2 has similar design as PU1.
PU3 inputs the current u ns and saves it in a BRAM. A separate hardware unit, similar
n−τ
to PU1, is considered for each mode which inputs u s Dj and calculates i n+1
hsj . PU4 has
n n+1
an identical design to PU3 and inputs u r and outputs i hsj s.
PU5 is an FP accumulator and calculates i n+1

us and i n+1
ur . Resource utilization and
latency of IHULM module is reported in Table 6.1. Latency of this module is 220 clock-
cycles and utilizes around 5% of available LUTs.
110
VERIFICATION
FPMACC1
Matrix-Vector Mult
FPADD
DYd0(i,j) DYd1(i,j) FPMACC1
Mux
aYd0(i) aYd1(i) FPMULT
FPMACC1
Partial Acc 0
vsn(i)
Demux
Partial Acc 1
Mux
ihY0n(i), ihY1n(i) FPMACC1
Partial Acc 6
(d) (c) (b)
PU1 Convolution PU5
BRAMs
ihYs n+1
vsn
BRAMs Matrix-Vector Mult
BRAMs
PU3 Convolution
usn-τD1
6 ihs1n+1 ius n+1
e-sτD1 Input Select Matrix-Vector Mult FPACC2
Mode 1
usn 6
e-sτD2 Input Select Matrix-Vector Mult
Mode 2
6 ihs2n+1
Mode 3
PU2 Convolution
BRAMs
ihYs n+1
vrn
BRAMs Matrix-Vector Mult
BRAMs
PU4 Convolution
usn-τD1
6 ihs1n+1 iur n+1
e-sτD1 Input Select Matrix-Vector Mult FPACC2
Mode 1
urn 6
Mode 2
6 ihs2n+1
Mode 3
(a)
Figure 6.4. (a) IHULM FPGA module containing 5 PUs to carry out steps 1 to 5 of the IHULM
simulation task. (b) FP Multiply and Accumulate (FPMACC) units similar to the first design in Table
2.2 (c) Matrix-vector multiplication containing three FPMACC units. (d) The input select for matrix-
vector multiplication unit based on the order shown Fig. 6.5
111
VERIFICATION
DYd1(i,j)
aYd1 0 0 ihY1n
0 aYd1 0
ihY1n+1
0 0 aYd1
vsn
ihY2n
aYd2 0 0
0 aYd2 0 ihY2n+1
0 0 aYd2
vsn
DYd2(i,j)
Figure 6.5. The order of inputs for the vector-matrix multiplications in FPMACC1, shown in red arrow
Table 6.1
Resource utilization of IHULM Module

IHULM Module 29209 (≈ 8%) 27699 (≈ 5%) 56(≈ 2%) 220 1
Available 607,200 303,600 2,800 - -
6.4 Proposed real-time simulation platform
Conducting real-time simulation studies requires monitoring capability and calculating

control variables. In this thesis these two are enabled by RTDS and the FPGA-based
DRTS is connected to RTDS to exchange control and monitoring variables. The FPGA-
based DRTS is based on fixed hardware to avoid hardware code re-compilation and,
for each case study system, is configured using the system parameters. We developed a
MATLAB script to calculate these parameters and call it model preparation script. A PC
runs this script and its output is used to configure the FPGA-based DRTS. Therefore,
hardware platform comprises the FPGA-based DRTS, a PC and RTDS. The hardware
platform and model preparation script are explained as follows.
112
VERIFICATION
6.4.1 Hardware Platform

Figure 6.6 illustrates a block diagram of the proposed hardware platform and includes
RTDS and FPGA-based DRTS and a PC. The FPGA-based DRTS includes simulation
modules, i.e., modules that emulate component models and accessory modules to ex-
change data with RTDS. Accessory modules include RTDS interface module, transceiver
module, PWM modules, time-step control module, MB soft-processor and AXI intercon-
nection [2] and are explained as follows.
From PC
JTAG MicroBlaze
Soft- Processor
AXI
MUXs
Interconnect Matrix Vector
calculation Module
vn , vp bnode
bh & ihSW
History current update HRMNA b
2 x
Module
HeqRMNA beq
bss
PWM Module PEC Module
bconv
IH-ULM module
EM
bem
EM Module
θ & ω & τel
AXI
Data Stream
Control Signals Time-step Control

120MHz 100MHz
Time-step Control
v'f & τm To & From RTDS
RTDS FIFO GTFPGA
Interface FIFO
m1 - m4
USB JTAG Cable

Fiber Optic RTDS
PC running Model FPGA-based

Preparation Script DRTS
Figure 6.6. Proposed DRTS Platform
AXI interconnection provides a memory-mapped access to each module BRAMs and

registers for MB soft-processor to set parameters of the associated component in the
module.
Transceiver module is a proprietary IP of RTDS Inc which is used to send control-
and receive monitoring variables and the IHULM signals from RTDS. Since this IP has
its own clock domain, two asynchronous FIFOs are used to ensure fail-safe clock domain
crossing.
113
VERIFICATION
Table 6.2
Resource utilization of modules in the FPGA-based DRTS
Module LUT FF Pairs DSP BRAM Latency

Independent source 1047 1348 5 16 τ1 =48
History current 1041 1807 10 0 τ2 =26
PECs 5852 6636 20 0 τ3 =110
IHULM 29209 27699 62 112 τ4 =220
EMs 16461 16980 42 42 τ5 =267
Matrix-Vector Mult. 101888 126392 240 320 τ6 =305
MUXs 33130 6322 0 0 -
PWM 1109 1237 0 0 -
MB 1118 946 0 512 -
AXI Interconnections 17102 2740 0 262 -
GTFPGA 1740 4078 3 3 -
RTDS Interface 1167 2098 3 0 -
Other 10718 43354 0 0 -
Total 221402 241615 385 1464 305
Available 303600 303600 2800 2060 -
RTDS interface module receives data from the transceiver module and sends it to
the corresponding modules. It sends the modulation indices to the PWM modules,
mechanical torques and field voltages to EM modules and unr to IHULM module. It
sends the receiving-end IHULM current source in+1
r and selectable output variables, i.e.,
node voltages or branch currents, to RTDS at each time-step for monitoring and/or
control purposes.
Control algorithms are carried out in RTDS rather than FPGA since it is nearly
impossible to devise a general structure for control algorithms to enable fixed hardware
module design. However, PWM generation can be efficiently carried out in hardware,
i.e., FPGA, with less burden and lower latency than in RTDS. Therefore, modulation
indices, are calculated in RTDS and sent to the PWM module in the FPGA-based DRTS
to generate PWM signals.
Module resource utilization and latency of the modules are given in Table 6.2. Ta-
ble 6.2 shows that the matrix-vector multiplication is the dominant resource consuming
module among all simulation and accessory modules and uses 33% and 41% of available
LUTs and FF paris, respectively.
AXI interconnection enables access to BRAMs and registers that contain a module
parameters for MB. In Table 6.2 these shared BRAMs are reported as part of AXI
interconnection, rather than the module. Therefore, for instance, the number of BRAMs
for PEC module is zero because all the BRAMs in this module are shared by AXI
114
VERIFICATION
interconnection.
Total LUT and FF pairs utilizations are 73% and 79%. However, 96% (70806 out
of 73694) of all available slices are utilized and, as Fig. 6.7 illustrates, the implemented
design is placed across the entire VC7VX485T chip.
Figure 6.7. Placement of the implemented FPGA-based DRTS on VC7VX485T chip
Latency is only defined for simulation modules and the total latency of the FPGA-
based DRTS is determined by the matrix-vector multiplication module which is designed
to carry out a 138 × 160 matrix vector multiplication in which FP accumulator design
issimilar to the fifth design in Table 2.2.
The FPGA-based DRTS can simulate a power system with up to 160 outputs/nodes,
four PECs, three EMs, one IHULM, 96 dynamic branches and 14 independent sources in
real-time and operates at 120MHz, therefore, minimum real-time time-step is 2.54167µs,
regardless of size of the system. Eight independent sources can be sent from RTDS to
enable representing power components that can be modeled as independent sources, e.g.,
Solar-PV units and Batteries.
Time-step control module coordinates operation of six simulation modules, i.e., inde-
pendent source-, history current source-, PEC-, IHULM-, EM- and Matrix-vector multi-
plication module to ensure their parallel stall-free operation. Figure 6.8 illustrates timing
diagram of parallel operation of simulation modules where minimum time-step is
∆tmin = τ6 = 2nb + τACC . (6.29)
115
VERIFICATION
To ensure stall-free parallel operation of simulation modules, latency of theses modules

τ1 to τ5 and the size of their output vectors l1 to l5 must satisfy
τ1 ≤ τ6 ,
τ2 ≤ 2l1 ,
τ3 ≤ 2(l1 + l2 ),
(6.30)
τ4 ≤ 2(l1 + l2 + l3 ),
τ5 ≤ 2(l1 + l2 + l3 + l4 ),
l1 + l2 + l3 + l4 + l5 = nb ,
where nb = 138. Equation (6.30) enforces sufficient condition to have a stall-free pipeline
design, i.e., the data consumer module should not wait for the output(s) of data pro-
ducer(s).
tn
τ6=Δtmin tn+1
Independent Source Module
τ1
History Current Source Module
τ2
PEC Module
τ3
IH_ULM Module
τ4
CPVBR-based EM Module Processing data
τ5 Inputting data
Matrix-Vector Mult Module Outputting data
time τACC
Figure 6.8. Timing diagram of parallel operation of simulation modules in FPGA-based DRTS
6.4.2 Model Preparation Script

Figure 6.9 illustrates a diagram of model preparation script which has three steps, i.e.,
(i) creating an intermediate netlist from system netlist based on PEC and EM modeling
approaches, (ii) calculating admittance matrix and each component model parameters
and (iii) calculating system matrix of (4.26). These steps are shown in the left, middle
and right column of Fig. 6.9, respectively, and explained as follows.
116
VERIFICATION
Order of Execution
Calculation of RMNA matrix
and Module parameters
Preparation of Intermediate Netlist Calculating system matrix
Independent source Find:

(Δt,as,stype,φ0,fs)
History current Find:

source update (cv,ch,ih0)
Find
IH-ULM (GY , aHdijs , DHijs, aYdis, DYdjs)
Replace each PECSN by its terminal For each PEC For ith PEC find:
voltage/current sources find:
PEC Matrices Yeqi, HCONVi(s),
Y’conv(s)
HSWi(s)
and HeqRMNAi
Replace each EM by its For each EM find:

CPVBR interface Matrices Ce1, Ce2, De,
EM
EM ADr, BDr, CDr, DDr,
and cω1, cω2.
Construct system matrix
Constitute admittance matrix Y
Matrix-Vector HRMNA
And obtain matrix HRMNA HeqRMNA1
Multiplication
.
HeqRMNA4
Figure 6.9. Model preparation algorithm diagram including three steps, 1st step (column) is to prepare
the intermediate netlist file, 2nd step is to find module parameters and RMNA matrix and 3rd step is to
obtain the network equivalent parameter of PECSNs. The process required for each component model
is shown in the corresponding row.
The first step reads the system netlist file and creates an intermediate netlist by re-
placing each PECSN by its voltage/current source representation, as explained in Section
4.4, and each EM by its CPVBR model interface circuit, as described in Section 5.3.
The second step provides discretized models of the dynamic branches from the in-
termediate netlist, including those of CPVBR model interface circuit and transmission
line and constitutes matrices Y and HRM N A . Moreover, it calculates parameters of each
module, e.g., EM matrices and parameters corresponding to different PUs in each EM
module of Fig. 5.8. For each PEC, the admittance matrix of the corresponding PECSN
Y0Convi (s) is calculated for all permutation of its switch states.
Third step obtains matrices Heqi and Yeqi and therefore HeqRM N Ai for each PEC.
These matrices and HRM N A s are used to form the system matrix of (4.26). Finally, for
each PEC, it calculates HConvi (s), HSW i (s), using Heqi , Yeqi and Y0Convi (s) , according
to Section 4.4.
The module parameters are used by MB soft-processor to run a software application,
117
VERIFICATION
developed in Xilinx SDK environment, and configure each module using its corresponding
parameters via AXI interconnection.
6.5 Performance Evaluation

This section evaluates performance of the proposed IHULM module and real-time simula-
tion platform by conducting three simulation case studies. Each case study is performed
based on two method, i.e., (i) the FPGA-based method using the real-time simulation
platform of Fig. 6.6 with the time-step of 2.6µs and (ii) reference method, using PSCAD
and PSIM with the same time-step of 2.6µs.
6.5.1 Case I: Transmission line step response

This case investigates performance of the proposed IHULM module. Figure 6.10 shows
a three-phase 220kV untransposed transmission line. The length of the line is 300km
and Appendix B provides the tower configuration, VF procedure and the corresponding
results for admittance matrix and propagation function of the line.
Represented RTDS Represented in FPGA-based DRTS

t=0s
300km as
ar
br bs
cr cs
220kV
Figure 6.10. Overhead three-phase transmission line
The receiving end is open and a step DC voltage of 220kV is applied to phase a of the
sending end at t = 0s. Figure 6.11 illustrates simulation results from the FPGA-based
method using the proposed IHULM and those of the reference method, using PSCAD.
Time-delays of this line is around t = 1ms, as shown in Appendix B. The wave from
sending end reaches the receiving end after around 1ms and causes voltage transients.
This wave goes back and reaches the sending end at around t = 2ms and cause voltage
transients on phases b and c. Figure 6.11 shows that results from the FPGA-based
method closely matches those of the reference method.
118
VERIFICATION
Reference method
Receiving end voltages(kV)

FPGA−based method
600
600
400 400
200 200
0 0
−200 −200
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
Sending end voltages(kV)
300 300
200 200
100 100
0 0
−100 −100
−200 −200
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
Time(s) Time(s)
Figure 6.11. Simulation results of overhead line energization. The left column includes the results from
FPGA-based method, based on the proposed real-time simulation platform with the real-time time-step
of 2.6µs and the right column contains the results from the reference method, using PSCAD, based on
its phase domain model, with the time-step of 2.6µs.
6.5.2 Case II: Grid connected DER

Figure 6.12(a) depicts a 500kVA Distributed Energy Resource (DER) unit connected
to 27kV distribution network which is supplied by a 220kV transmission system via a
transmission line. The 220kV line is identical to that of Case I. In the FPGA-based
method, the transmission line sending end is represented in the FPGA-based DRTS and
the receiving end is in RTDS. The rest of the system is represented in RTDS by a voltage
source behind an impedance. Figure 6.12(b) shows diagram of the DER unit which
contains a three-phase two-level VSC, a three-phase AC filter and a ∆Y transformer and
Table 6.3 gives the parameters.
Table 6.3
DER unit parameters
Description Symbol Value

DER AC filter inductor Lf 570µH
DER AC filter resistance rf 21mΩ
DER AC filter capacitance Cf 87µF
DER DC voltage VDC 1200V
DER DC-link capacitor CDC 57mF
DER DC-side resistor rDC 1mΩ
DER DC-link inductance LDC 10µH
Switching Frequency FSW 5kHz
119
VERIFICATION
Represented RTDS Represented in FPGA-based DRTS 500kVA DER

10MVA 0.5MVA
AC System
DER1
0.5MVA
T5 27kV 600V
220kV 27kV
(a)
10MW
LDC rDC
Lls rTs Lls rTs Lf rf
CDC
Cf
(b)
Figure 6.12. (a) Grid connected DER study system (b) DER unit model
The modulation index of the VSC is set to m = 0.5∠15◦ with respect to the AC
system. The initial voltage of the DC-link capacitor is set to 1200V and the other initial
conditions are set to zero.
Figure 6.13 presents the start-up transients of the system using the FPGA-based
method and the reference method. Voltages at the receiving end, i.e., the end represented
in RTDS, start from zero, however, it takes around 1ms for the voltage waves to reach
the sending end due to the transmission line delay.
A three-phase to ground fault is applied to the grid terminal at t=0.5s to study the
DER current transients and the simulation results are presented in Fig. 6.14. Subse-
quent to the fault inception, the DER three-phase currents increase and at t=0.51s the
gate signals of the VSC are blocked. Figure 6.14 shows that, after the gate signals are
blocked, the DER currents go to zero through the anti-parallel diodes of the VSC. Figure
6.14 indicates that the results from the FPGA-based method closely match those of the
reference method.
6.5.3 Case III: Microgrid

Figure 6.15(a) depicts a 60Hz microgrid system which includes four 500kVA DER units,
similar to that of Case II, three 50hp IM units and five 150kW loads which are connected
to a 27kV distribution network including five identical transmission lines that are repre-
120
VERIFICATION
Receiving end voltages(kV)

FPGA−based method Reference method
400 400
200 200
0 0
−200 −200
−400 −400
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
Sending end voltages(kV)
400 400
200 200
0 0
−200 −200
−400 −400
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
2 2
DER currents(kA)
1 1
0 0
−1 −1
−2 −2
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
Time(s) Time(s)
Figure 6.13. Start up simulation results of Case II, grid connected DER. The receiving end and sending
end voltages of the transmission line are shown in first and second row, respectively, and the DER
three-phase currents are shown in the third row. The left column includes the results from FPGA-based
method, based on the proposed real-time simulation platform with the real-time time-step of 2.6µs and
the right column contains the results from the reference method, using PSCAD with the time-step of
2.6µs.
sented by their lumped element models. T1 to T4 are identical, 500kV A transformers

at voltage rating of 27kV /600V . T5 to T7 are also identical units at power ratings of
200kV A and voltage ratings of 27kV /460V . Primary and secondary leakage inductance
of T1 and T5 are 0.1pu with respect to their rated values. Parameters of each 50hp IM
are listed in Table 6.4.
The DC-link capacitors initial voltage is set to 1200V and all other initial conditions
are set to zero and two scenarios are simulated for this case which are explained as follows.
Microgrid start-up
In this scenario the microgrid of Fig. 6.15 experiences start-up transients. The modula-
tion indices of DER1 to DER4 are set at m1 = 0.4∠30◦ , m2 = 0.4∠45◦ , m3 = 0.4∠30◦
and m4 = 0.4∠45◦ and angles are with respect to the AC system voltage. It should be
noted that 30◦ offset is considered because of the ∆Y transformer phase-shift.
121
VERIFICATION
Sending end voltages(kV) Receiving end voltages(kV)

FPGA−based method Reference method
400 400
200 200
0 0
−200 −200
−400 −400
0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55
400 400
200 200
0 0
−200 −200
−400 −400
0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55
4 4
DER currents(kA)
2 2
0 0
−2 −2
−4 −4
0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55 0.48 0.49 0.5 0.51 0.52 0.53 0.54 0.55
Time(s) Time(s)
Figure 6.14. Fault response simulation results of Case II, grid connected DER. The receiving end and
sending end voltages of the transmission line are shown in first and second row, respectively, and the
DER three-phase currents are shown in the third row. A three-phase to ground fault is applied to the
grid voltage at t=0.5s and VSC gate signals are blocked at t=0.51s. The left column includes the results
from FPGA-based method, based on the proposed real-time simulation platform with the real-time time-
step of 2.6µs and the right column contains the results from the reference method, using PSCAD with
the time-step of 2.6µs.
Figure 6.16 shows the three-phase currents of four DERs. The left column is the
results from FPGA-based method and the right column is the results from the reference
method. Four DER three-phase currents reach steady-state after about 100ms. DER1
and DER3 generate output voltages with the same fundamental frequency and angle,
however, DER1 steady-stat current amplitude is less than DER3 as it is electrically
further away from the grid voltage. The same conclusion can be drawn comparing DER2
and DER4 currents. Figure 6.16 also shows that DER currents from FPGA-based method
favourably match those of the reference method.
Figure 6.17 depicts start-up transients of IMs. The left column includes the results
from FPGA-based method and the right column shows the results from the reference
method. It illustrates that each IM draws around 400A (peak) three-phase currents to
provide start-up torque. All IMs have identical mechanical and electrical parameters
and it takes around 1.3s to reach their steady-state, however, IM3 draws slightly larger
122
VERIFICATION
200kVA
150kW
IM1 50hp 500kVA

T5
27kV 460V
DER1
500kVA
T1
27kV/600V
DER2
500kVA
T2
200kVA 27kV/600V
150kW 150kW
AC System
IM2 50hp
T6 DER3
27kV 460V 500kVA
T3
27kV/600V
27kV
DER4
200kVA 500kVA
150kW T4
27kV/600V
IM3 50hp 150kW

T7
27kV 460V
Figure 6.15. (a) Microgrid case study system embedding four DER units
currents and reaches the steady state speed slightly faster than the other two IMs as it
is electrically closer to the grid. Figure 6.17 indicates the results from the FPGA-based
method closely matches those of the reference results.
Three-phase fault
In this scenario the microgrid experiences a fault disturbance. A temporary three-phase

to ground fault is applied to the grid terminal at t = 1.5s and removed at t = 1.532s.
Figure 6.18 illustrates transients of DER currents subsequent to the fault incident.
One can observe that during fault, i.e., 1.5 < t < 1.532, DER1 and DER2 currents
reach similar peak values (' 1150A) since they have the same electrical distance from
the faulted point. DER3 and DER4 currents also reach similar peak current values
(' 1350A) which are greater than the first two DER currents peak values as they are
relatively closer to the faulted point. Figure 6.18 also shows that results from the FPGA-
based method closely matches those from the reference method.
Figure 6.19 shows transients of the IMs subsequent to the fault incident. IM2 has the
least peak current during fault since (i) it is electrically further from the fault point as
compared to IM3 and (ii) it has the voltage support from DER1 and DER2 as compared
to IM1. Figure 6.19 also illustrates that the effect of fault disturbance on IM2 speed and
torque is less than those of the other two IMs. It can be observed that the result from
FPGA method closely match those of the reference method.
123
VERIFICATION
Table 6.4
Microgrid 50hp IM parameters [1]
Description Symbol Value

IM stator resistance rs 87mΩ
IM stator leakage inductance Ls 0.801mH
IM magnetizing inductance Lm 34.7mH
IM equivalent rotor resistance rr0 0.228Ω
IM equivalent rotor inductance L0r 0.801mH
IM moment of inertia J 1.662N ms2

1000 1000
DER1 currents(A)
500 500
0 0
−500 −500
−1000 −1000
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
2000 2000
DER2 currents(A)
1000 1000
0 0
−1000 −1000
−2000 −2000
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
1000 1000
DER3 currents(A)
500 500
0 0
−500 −500
−1000 −1000
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
2000 2000
DER4 currents(A)
1000 1000
0 0
−1000 −1000
−2000 −2000
0 0.02 0.04 0.06 0.08 0.1 0 0.02 0.04 0.06 0.08 0.1
Time(s) Time(s)
Figure 6.16. Simulation results of DER currents in the microgrid start-up. DER1 to DER4 three-phase
currents are shown in the first to fourth row, respectively. The left column includes the results from
of 2.6µs and the right column contains the results from the reference method, using PSIM with the
time-step of 2.6µs.
124
VERIFICATION

IM1 currents(A) 500 500
0 0
−500
0 0.5 1 1.5 −500
0 0.5 1 1.5
500 500
IM2 currents(A)
0 0
−500
0 0.5 1 1.5 −500
0 0.5 1 1.5
500 500
IM3 currents(A)
0 0
−500 −500
−1000
0 0.5 1 1.5 −1000
0 0.5 1 1.5
1000 1000
IM3
EM torque
500 IM2 500

IM1
0 0
−500
0 0.5 1 1.5 −500
0 0.5 1 1.5
Time(s)
200 200
Speed(rad/sec)
IM3
150 IM2 150
100 IM1
100
50 50
0
0 0.5 1 1.5 0
0 0.5 1 1.5
Time(s)
Time(s)
Figure 6.17. Simulation results of IMs in the microgrid start-up. IM1 to IM3 three-phase currents are
shown in the first to third row, respectively. Fourth and fifth row are respectively the electromagnetic
torque and the mechanical speed of the IMs. The left column includes the results from FPGA-based
the right column contains the results from the reference method, using PSIM with the time-step of 2.6µs.
6.6 Summary and Conclusion

This chapter presented IHULM, i.e., a transmission line model for which one end of
the line is represented in the FPGA-based DRTS and the other end in the RTDS. A
low latency IHULM hardware module was designed, based on floating-point numerical
representation, which operates in parallel with the rest of the modules in the FPGA-based
DRTS.
This chapter also presented the real-time simulation platform which includes the
hardware platform and model preparation script. The hardware platform includes RTDS
and the FPGA-based DRTS. The FPGA-based DRTS can simulate systems with up to
160 nodes/outputs, 96 dynamic branches, four PECs, three EMs and one IHULM with
125
VERIFICATION

2000 2000
DER1 currents(A)
1000 1000
0 0
−1000 −1000
−2000 −2000
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.48 1.5 1.52 1.54 1.56 1.58 1.6
2000 2000
DER2 currents(A)
1000 1000
0 0
−1000 −1000
−2000 −2000
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.48 1.5 1.52 1.54 1.56 1.58 1.6
2000 2000
DER3 currents(A)
1000 1000
0 0
−1000 −1000
−2000 −2000
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.48 1.5 1.52 1.54 1.56 1.58 1.6
2000 2000
DER4 currents(A)
1000 1000
0 0
−1000 −1000
−2000 −2000
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.48 1.5 1.52 1.54 1.56 1.58 1.6
Time(s) Time(s)
Figure 6.18. Simulation results of DER currents in the microgrid fault study. DER1 to DER4 three-phase
currents are shown in the first to fourth row, respectively. The left column includes the results from
of 2.6µs and the right column contains the results from the reference method, using PSIM with the
time-step of 2.6µs.
the real-time time-step of 2.55µs. The model preparation script is a MATLAB code that
reads the netlist file of a study system and prepares parameters required by the FPGA-
based DRTS, using component modeling approaches explained in this thesis. Three real-
time simulation studies were conducted using the proposed real-time simulation platform
and results were compared against those of an off-line simulation package.
In summary, the main features of the proposed IHULM and real-time simulation
platform include:
• IHULM enables integration of the proposed FPGA-based DRTS into an industry-

standard real-time simulator, i.e., RTDS, by enabling modeling-level partitioning
of power systems, using natural delay of transmission lines.
126
VERIFICATION

IM1 currents(A) 400 400
200 200
0 0
−200
−200
−400
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.62 1.64 −400
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.62 1.64
IM2 currents(A)
400 400
200 200
0 0
−200
−200
−400
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.62 1.64 −400
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.62 1.64
IM3 currents(A)
400 400
200 200
0 0
−200
−200
−400
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.62 1.64 −400
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.62 1.64
500 500
IM3
EM torque
0 IM2 0
IM1
−500 −500
−1000
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.62 1.64 −1000
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.62 1.64
Time(s)
Speed(rad/sec)
190 190
IM3
188 IM2 188
IM1
186 186
184
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.62 1.64 184
1.48 1.5 1.52 1.54 1.56 1.58 1.6 1.62 1.64
Time(s)
Time(s)
Figure 6.19. Simulation results of IMs in the microgrid fault study. IM1 to IM3 three-phase currents are
shown in the first to third row, respectively. Fourth and fifth row are respectively the electromagnetic
torque and the mechanical speed of the IMs. The left column includes the results from FPGA-based
the right column contains the results from the reference method, using PSIM with the time-step of 2.6µs.
• IHULM task/module is fully in parallel with other simulation task/modules which

enables achieving small real-time simulation time-step, i.e, 2.55µs.
• To the best of our knowledge the proposed IHULM FPGA-based implementation
is the first phase-domain line model, using standard floating-point numerical rep-
resentation, with a 2µs-range time-step.
• The FPGA-based DRTS stall-free pipelined design enables parallel operation of the
modules and simulation of systems up to 160 nodes/outputs, four PECs and three
EMs and one IHULM, with a real-time time-step of 2.55µs.
• The proposed platform is based on a fixed-hardware FPGA-based DRTS design
which eliminates any requirement for time-consuming hardware code compilation.
127
VERIFICATION
• The proposed platform is scalable for conducting real-time simulation studies of

power systems including multiple microgrids.
128
Chapter 7
Summary and Conclusion
7.1 Summary
This dissertation introduces an FPGA-based real-time simulation platform for real-time
simulation studies of large-size power systems which include short transmission lines and
multiple PECs. Presence of short transmission lines creates a large set of network equa-
tions that can not be decoupled. The real-time simulation platform should solve these
equations with a small time-step, due to presence of high switching frequency PECs.
Handling the large admittance matrix, representing the large set of network equations,
requires floating-point numerical representation which is associated with larger latencies
in FPGAs, as compared to fixed-point representation. These latencies prevent achieving
small simulation time-steps for large systems. Moreover, the real-time simulation plat-
form should accurately simulate EMs, as components of power systems, under various
test scenarios and system configurations.
The existing work, for simulation of systems targeted in this thesis, suffers from the
following limitations. The existing FPGA-based simulators (i) are not capable of simulat-
ing networks with large number of nodes using a small time-step, (ii) employ ADC model
to represent PEC switches which introduces substantial errors, e.g., artificial switching
loss and numerical oscillations which are unacceptable for most of practical applications,
(iii) use QD EM model that fails to emulate EM behaviour under specific test scenarios
and system configurations and (iv) are mainly based on fixed-point representation or use
intermediate fixed-point accumulation stage.
The FPGA-based real-time simulation platform of this thesis (i) can simulate large
networks with a small time-step of 2.55µs and up to 160 outputs/nodes, (ii) employs two-
value resistor switch model to accurately represent PECs, (iii) accurately simulates EMs
under all test scenarios and system configurations, (iv) is based on standard floating-point
numerical representation, (v) can be interfaced with commercially available simulation
129
CHAPTER 7. SUMMARY AND CONCLUSION
platform, e.g.,RTDS, to enable parallel simulation of distribution system and the rest of
the system (in RTDS) and (vi) features fixed-hardware design, therefore, time-consuming
hardware code re-compilation is avoided. This platform also includes a model preparation
script that calculates study system parameters based on the modeling approaches devised
in this work.
This thesis also presents RMNA formulation of network equations to solve relatively
large set of equations associated with networks featuring short transmission lines and
enable parallelism between the simulation task of solving network equations and those
of other power system component models. A low latency floating-point matrix-vector
multiplication hardware module is designed to carry out the simulation task of solving
network equations. A novel FPGA-based partitioning method is proposed for accurate
real-time simulation of multiple PECs using two-value resistor model. Based on the par-
titioning method, a PEC hardware module is designed and replicated to simulate systems
embedding multiple PECs. A new EM CPVBR model is proposed to enable real-time
simulation of EMs, regardless of their dynamic saliency and system configuration. An
EM hardware module is designed, based on the proposed EM model, that can represent
different types of EMs, i.e., IM, SM and PMSM. This thesis also presents IHULM which
enables partitioning of power system model, using transmission line natural delays based
on ULM, to enable parallel simulation of distribution system in the FPGA-based DRTS
and the rest of the system in RTDS.
7.2 Conclusions
This thesis shows that, although FPGAs are powerful computation devices, the exist-
ing power system EMT simulation algorithms are not readily applicable for time- and
resource-efficient FPGA-based DRTS design. However, reformulation of the existing
equations and/or devising new component models, tailored for FPGA-based implemen-
tation, can facilitate achieving such design. This thesis also investigates and shows that:
• In contrast to the existing works in the literature, FPGA-based DRTS design, with
small real-time simulation time-step, based on standard floating-point numerical
representation, can be enabled to simulate large networks using a small time-step
by reformulation of floating-point accumulation process explained in Chapter 2;
• Fixed-hardware FPGA-based DRTS design can avoid time-consuming hardware
code recompilation;
• Using High Level Synthesis (HLS) FPGA programming method enables more ag-
gressive design exploration and achieving a higher FPGA-based DRTS performance,
130
i.e., simulating a large system in a small time-step, than using solely HDL languages;
• In partitioning of model of a system, including multiple PECs, considering each
PECSN as a subsystem enables representing power switches by two-value resistor
model. This also enables retaining constant system admittance matrix to avoid
on-the-fly LU factorization;
• CPVBR EM model suits the structure of the proposed FPGA-based DRTS, since
it enables direct interface with the model of the rest of the system and retains
constant admittance matrix;
• In FPGA-based real-time simulation of systems targeted in this work, i.e., systems
with large number of nodes and short transmission lines, the module associated
with the solution of network equations imposes highest computational burden and
therefore is the dominant resource-consuming module.
7.3 Contributions
The main contribution of this dissertation is designing a fixed-hardware FPGA-based
DRTS to avoid hardware code recompilation, using standard floating-point numerical
representation. The FPGA-based DRTS can simulate networks up to 160 nodes/outputs,
four PECs and three EMs with the real-time time-step of 2.55µs. Moreover, the simula-
tor is equipped with an IHULM module that enables parallel simulation of distribution
system in the FPGA-based DRTS and the rest of the system in RTDS. The detailed
technical innovations of this thesis include:
• Devising RMNA formulation to enable (i) parallelism between the task associated
with the solution of the network and those of other component models and (ii)
appropriate scheduling of the tasks based on their associated latency to achieve a
stall-free pipeline design and therefore small real-time simulation time-step.
• Designing a low-latency matrix-vector multiplication module to enable solution
of a large set of network equations, based on standard floating-point numerical
representation.
• Developing a solution-level partitioning method for simulation of PECs which
(i) maintains constant network admittance matrix, while using two-value resistor
model to represent power switches in PECs, (ii) is generalized for application to
multiple PECs which enables real-time simulation of systems embedding multiple
PECs.
• Designing an FPGA-based PEC module, based on the proposed partitioning method,
that can represent PECs with up to 6 switches and requires relatively small FPGA
131
logic and memory resources to save the PEC parameters and matrices.
• Devising a CPVBR EM model which (i) enables direct interface of the EM model
to the rest of the system, (ii) is compatible with RMNA formulation and the de-
vised partitioning method for simulation of PECs, (iii) enables representing EMs
regardless of their dynamic saliency and (iv) enables parallelism between the EM
simulation task and the rest of the tasks in the FPGA-based DRTS.
• Developing an FPGA-based EM module, based on the proposed CPVBR model,
that can represent different types of EMs, i.e., IM, SM and PMSM.
• Designing IHULM module that (i) enables integration of the proposed FPGA-
based DRTS into an industry-standard real-time simulator, i.e., RTDS, based on
modeling-level partitioning of power systems using the natural delay of the trans-
mission lines and (ii) operates in parallel with the other simulation task/modules
which enables achieving small real-time simulation time-step. This module enables
parallel simulation of distribution system in the FPGA-based DRTS and the rest
of the system in RTDS.
• Developing a model preparation script that automatically calculates the parameters
and matrices of a study system using its describing netlist file, based on RMNA
formulation and component modeling approaches devised in this dissertation.
7.4 Future Work

In continuation of this thesis, the following directions are suggested for future studies:
• Developing a constant parameter technique for modeling magnetic saturation to

enable representation of saturated transformers.
• Extending the FPGA-based CPVBR model to consider EM magnetic flux satura-
tion to enable representation of saturated EMs.
• Employing a higher bandwidth transceiver module to exchange larger number of
control and monitoring variables which is 24 for the current transceiver. This will
enable simulation of larger number of PECs, since the resource utilization of the
PEC module is insignificant and the number of modulation index signals currently
limits the maximum number of PECs in the proposed platform.
• Extending IHULM to represent underground cables to enable real-time simulation
studies of cable systems that lend themselves to the structure of the proposed
real-time simulation platform, e.g., off-shore wind farms.
132
Appendix A
EM Matrices
This section presents the matrices of the proposed generalized EM and the CPVBR EM
model which are defined by
h iT
λ r = λqr1 , λdr1 , λqr2 , λdr2 , (A.1)
h iT
0
ur = isq , isd , λmag , vf , vsq , (A.2)
h iT
yλ = λmq λmd λ”q λ”d , (A.3)
 
L”mq rrq1 L”mq
− Lrrq1 (1 − Lrq1
) 0 Lrq1 Lrq2
0
 rq1 L”md rrd1 L”md

rrd1
 0 − Lrd1 (1 − Lrd1 ) 0 Lrd1 Lrd2

Ar =  , (A.4)
 
r L” L”
rq2 mq rrq2 mq

 Lrq2 Lrq1
0 − Lrq2
(1 − Lrq2
) 0 

”
rrd2 Lmd ”
Lmd
rrd2
0 Lrd1 Lrd2
0 − Lrd2 (1 − Lrd2 )
 
rrq1 L”mq
0 0 0 0
 Lrq1 rrd1 L”md rrd1 L”md

 0 0 0
Lrd1 Lrd1 Lmd
Br =  rrq2 L”mq , (A.5)
 

 Lrq2 0 0 0 0
rrd2 L”mq rrd2 L”md
0 Lrd2 Lrd2 Lmd
1 0
 ” 
Lmq L”mq
L
0 Lrq2
0
 rq1 L”md L”md 

 0 0
Lrd1 Lrd2 
Cr =  L”mq , (A.6)

L”mq

 Lrq1 0 Lrq2
0 

L”md ”
Lmd
0 Lrd1
0 Lrd2
133
APPENDIX A. EM MATRICES
 
L”mq 0 0 0 0
L”md
 0 L”md
 
Lmd
0 0
Dr = 
 0
, (A.7)
 0 0 0 0
L”md
0 0 Lmd
0 0
 
L” rrq1 L” rrq2
− mq2 0 − mq
L2rq2
0
Ce1 =  Lrq1 , (A.8)
rrd1 L”md rrd2 L”md
0 − L2rd1
0 − L2rd2
 ” 
L rrq2 L”mq rrq1
( mq
L2rq2
+ L2rq1
) 0 0 ω
Ce2 (ω) =  rrd2 L”md rrd1 L”md
, (A.9)
0 ( L2rd2
+ L2rd1
) −ω 0
and " #
0 ω(L”d − L”q ) 0 0 0
De (ω) = L”md L”d −L”q . (A.10)
0 0 0 Lrd2 ∆t
134
Appendix B
Transmission Line Parameters and

Fitting Results
This section presents the parameters and fitting results of the transmission line used
in this thesis. The tower configuration of an untransposed overhead transmission line
is given in Fig. B.1. The values of the characteristic admittance matrix Y c (s), prop-
agation matrix H (s), impedance per unit length Z (s) and admittance per unit length
Y (s) at sample frequencies ωs s are obtained using PSCAD and the fitting results of the
transmission line is explained as follws.
G1 10m G2
10m
C2
5m
10m
C1 C3
30m
Ground Resistivity: 100 Ω.m

Relative Permeability: 1
Figure B.1. Tower Configuration
135
APPENDIX B. TRANSMISSION LINE PARAMETERS AND FITTING RESULTS
Table B.1
Conductor and ground wire data
Description Value
Conductor radius 0.020345 Ω
Conductor DC resistance 0.032060 Ω/km
Ground wire radius 0.005524 m
Ground wire resistance 2.86450 Ω/km
The fitting of Y c (s) is carried out using vector fitting for matrix transfer functions
[72] [73] and Fig.B.2 shows the results from fitting of Y c (s).
−3
x 10
2.5
2 data
fitted
(s)
c(1,1)
1.5
Y
0.5
−1 −4 0 1 2 3 4 5 6
x 10
3.5
2.5
(s)
c(1,2)
2
Y
1.5
0.5
−1 0 1 2 3 4 5 6
−4
x 10
3.5
2.5
(s)
c(1,3)
2
Y
1.5
0.5
−1 0 1 2 3 4 5 6
Log(f)
Figure B.2. Results from fitting of characteristic admittance Y c (s). Top is Y c (1, 1)(s), middle Y c (1, 2)(s)
and bottom Y c (1, 3)(s).
To fit H (s) the mode functions hi (s)s are obtained by (6.5) at all sample frequency
136
ωs s. The time delays of the transmission line are
[τ1 τ2 τ3 ]T = [1002µs µs1024µs 1002µs]T . (B.1)
Figure B.3 shows the fitting results of hi (s)s.
1
data
0.8 fitted
0.6
|h1(s)|
0.4
0.2
0
−1 0 1 2 3 4 5 6
1
0.8
0.6
|h2(s)|
0.4
0.2
0
−1 0 1 2 3 4 5 6
1
0.8
0.6
|h3(s)|
0.4
0.2
0
−1 0 1 2 3 4 5 6
Log(f)
Figure B.3. Results from fitting of mode functions. Top is h1 (s), middle h2 (s) and bottom h3 (s).
The poles of hi (s)s are used to obtain residue matrices of (6.6) using
A Rx R = B R (B.2)
where x R is an unknown vector, containing entries of R Hji s, and each row of A R corre-
sponds to (6.6) at a specific sample frequency point [48]. Equation (B.2) is an overde-
termined set of linear equations, i.e., the number of unknowns is less than number of
137
equations, therefore,
min ||A
ARx R − B R ||2 (B.3)
is solved by [48]
xR = A+
R BR (B.4)
where
A+ ATRA R )−1A TR
R = (A (B.5)
Figure B.4 shows fitting results of H (s).
1
data
0.8 fitted
H(1,1)(s)
0.6
0.4
0.2
0
−1 0 1 2 3 4 5 6
0.4
0.3
(s)
(1,2)
0.2
H
0.1
0
−1 0 1 2 3 4 5 6
0.4
0.3
H(1,3)(s)
0.2
0.1
0
−1 0 1 2 3 4 5 6
Log(f)
Figure B.4. Results from fitting of propagation matrix H (s). Top is H (1, 1)(s), middle H (1, 2)(s) and
bottom H (1, 3)(s).
138
Bibliography
[1] Paul C. Krause, Oleg Wasynczuk, Scott D. Sudhoff, and Steven Pekarek. Introduc-
tion to the Design of Electric Machinery, pages 608–. Wiley-IEEE Press, 2013.
[2] [online]. available: http://www.xilinx.com/.
[3] M. Chapariha, L. Wang, J. Jatskevich, H. W. Dommel, and S. D. Pekarek. Constant-

parameter rl -branch equivalent circuit for interfacing ac machine models in state-
variable-based simulation packages. IEEE Transactions on Energy Conversion,
27(3):634–645, Sept 2012.
[4] Farid N. Najm. Circuit simulation. Wiley-IEEE Press.
[5] M. Pavella, D. Ernst, and D. Ruiz-Vega. Transient Stability of Power Systems.

Springer.
[6] N. Watson and J. Arrillaga. Power Systems Electromagnetic Transients Simulation.
[7] M. D. Omar Faruque, T. Strasser, G. Lauss, V. Jalili-Marandi, P. Forsyth, C. Dufour,

V. Dinavahi, A. Monti, P. Kotsampopoulos, J. A. Martinez, K. Strunz, M. Saeedi-
fard, Xiaoyu Wang, D. Shearer, and M. Paolone. Real-time simulation technologies
for power systems design, testing, and analysis. IEEE Power and Energy Technology
Systems Journal, 2(2):63–73, June 2015.
[8] RTDS Technology Inc. Winnipeg Canada. Real-time simulation. [online]. available:
http://www.rtds.com.
[9] H. W. Dommel. Digital computer solution of electromagnetic transients in single-

and multiphase networks. IEEE Transactions on Power Apparatus and Systems,
PAS-88(4):388–399, April 1969.
[10] T. Maguire and J. Giesbrecht. Small time-step vsc model for the real time digital
simulator. IPST, Montreal, QC, Canada,, 2005.
139
BIBLIOGRAPHY
[11] P. Pejovic and D. Maksimovic. A method for fast time-domain simulation of networks
with switches. Power Electronics, IEEE Transactions on, 9(4):449–456, Jul 1994.
[12] R. Meka, M. Sloderbeck, M.O. Faruque, J. Langston, M. Steurer, and L.S. DeBrun-
ner. Fpga model of a high-frequency power electronic converter in an rtds power
system co-simulation. In Electric Ship Technologies Symposium (ESTS), 2013 IEEE,
pages 71–75, April 2013.
[13] Pscad help [online] available: https://hvdc.ca/pscad/.
[14] OPAL-RT Technology Inc. Opal-rt website. [online]. available: http://opal-rt.com.
[15] C. Dufour, J. Mahseredjian, and J. Belanger. A combined state-space nodal method

for the simulation of power system transients. Power Delivery, IEEE Transactions
on, 26(2):928–935, April 2011.
[16] C. Dufour, J. Mahseredjian, J. Belanger, and J.L. Naredo. An advanced real-time

electro-magnetic simulator for power systems with a simultaneous state-space nodal
solver. In Transmission and Distribution Conference and Exposition: Latin America
(T D-LA), 2010 IEEE/PES, pages 349–358, Nov 2010.
[17] G. Sybille and P. Giroux. Simulation of facts controllers using the matlab power
system blockset and hypersim real-time simulator. In Power Engineering Society
Winter Meeting, 2002. IEEE, volume 1, pages 488–491 vol.1, 2002.
[18] C. Dufour, S. Cense, T. Ould-Bachir, L. Gregoire, and J. Belanger. General-purpose

reconfigurable low-latency electric circuit and motor drive solver on fpga. In IECON
2012 - 38th Annual Conference on IEEE Industrial Electronics Society, pages 3073–
3081, Oct 2012.
[19] Plexim: Electrical Engineering Software. [online]. available:

http://www.plexim.com/.
[20] [online]. available: https://www.mathworks.com/products/simulink/.
[21] M. Matar and R. Iravani. Fpga implementation of the power electronic converter
model for real-time simulation of electromagnetic transients. Power Delivery, IEEE
Transactions on, 25(2):852–860, April 2010.
[22] M. Matar and R. Iravani. Massively parallel implementation of ac machine models

for fpga-based real-time simulation of electromagnetic transients. Power Delivery,
IEEE Transactions on, 26(2):830–840, April 2011.
140
BIBLIOGRAPHY
[23] G.G. Parma and V. Dinavahi. Real-time digital hardware simulation of power elec-
tronics and drives. Power Delivery, IEEE Transactions on, 22(2):1235–1246, April
2007.
[24] Aung Myaing and V. Dinavahi. Fpga-based real-time emulation of power electronic
systems with detailed representation of device characteristics. Industrial Electronics,
IEEE Transactions on, 58(1):358–368, Jan 2011.
[25] N. Roshandel Tavana and V. Dinavahi. A general framework for fpga-based real-time
emulation of electrical machines for hil applications. Industrial Electronics, IEEE
[26] Can Wang, Wei Li, and J. Belanger. Real-time and faster-than-real-time simulation
of modular multilevel converters using standard multi-core cpu and fpga chips. In
Industrial Electronics Society, IECON 2013 - 39th Annual Conference of the IEEE,
pages 5405–5411, Nov 2013.
[27] H. Saad, T. Ould-Bachir, J. Mahseredjian, C. Dufour, S. Dennetiere, and S. Ngue-

feu. Real-time simulation of mmcs using cpu and fpga. Power Electronics, IEEE
Transactions on, 30(1):259–267, Jan 2015.
[28] K. Strunz and E. Carlson. Nested fast and simultaneous solution for time-domain
simulation of integrative power-electric and electronic systems. Power Delivery,
IEEE Transactions on, 22(1):277–287, Jan 2007.
[29] T. Ould-Bachir, H.F. Blanchette, and K. Al-Haddad. A network tearing technique

for fpga-based real-time simulation of power converters. Industrial Electronics, IEEE
Transactions on, 62(6):3409–3418, June 2015.
[30] Yuan Chen and V. Dinavahi. Fpga-based real-time emtp. Power Delivery, IEEE
[31] Yuan Chen and V. Dinavahi. Digital hardware emulation of universal machine and
universal line models for real-time electromagnetic transient simulation. Industrial
Electronics, IEEE Transactions on, 59(2):1300–1309, Feb 2012.
[32] Internationa council on large electric systems. [online]. available:

http://www.cigre.org/.
[33] L. Wang, J. Jatskevich, V. Dinavahi, H. W. Dommel, J. A. Martinez, K. Strunz,

M. Rioual, G. W. Chang, and R. Iravani. Methods of interfacing rotating machine
141
BIBLIOGRAPHY
models in transient simulation programs. IEEE Transactions on Power Delivery,

25(2):891–903, April 2010.
[34] V. Jalili-Marandi and V. Dinavahi. Simd-based large-scale transient stability sim-

ulation on the graphics processing unit. Power Systems, IEEE Transactions on,
25(3):1589–1599, Aug 2010.
[35] T. Ould-Bachir, C. Dufour, J. Blanger, J. Mahseredjian, and J. P. David. Effective

floating-point calculation engines intended for the fpga-based hil simulation. In
Industrial Electronics (ISIE), 2012 IEEE International Symposium on, pages 1363–
1368, May 2012.
[36] J. C. G Pimentel. Implementation of simulation algorithms in fpga for real time

simulation of electrical networks with power electronics devices. In 2006 IEEE In-
ternational Conference on Reconfigurable Computing and FPGA’s (ReConFig 2006),
pages 1–8, Sept 2006.
[37] G. Lauss, M.O. Faruque, K. Schoder, C. Dufour, A. Viehweider, and J. Langston.

Characteristics and design of power hardware-in-the-loop simulations for electrical
power systems. Industrial Electronics, IEEE Transactions on, 63(1):406–417, Jan
2016.
[38] G. Hachtel, R. Brayton, and F. Gustavson. The sparse tableau approach to network
analysis and design. IEEE Transactions on Circuit Theory, 18(1):101–113, Jan 1971.
[39] W. Vanderbauwhede and K. Benkrid. High-Performance Computing Using FPGAs.

Springer.
[40] J. P. David. Low latency solver for linear equation systems in floating point arith-
metic. In 2015 International Conference on ReConFigurable Computing and FPGAs
(ReConFig), pages 1–7, Dec 2015.
[41] [online]. available: http://www.ni.com/fpga/.
[42] [online]. available: https://www.altera.com/support/literature/lit-opencl-sdk.html.
[43] [online]. available: https://www.xilinx.com/products/design-tools/software-

zone/sdaccel.html.
[44] S. D. Pekarek, O. Wasynczuk, E. A. Walters, J. V. Jatskevich, C. E. Lucas, Ning Wu,

and P. T. Lamm. An efficient multirate simulation technique for power-electronic-
based systems. IEEE Transactions on Power Systems, 19(1):399–409, Feb 2004.
142
BIBLIOGRAPHY
[45] H. F. Blanchette, T. Ould-Bachir, and J. P. David. A state-space modeling approach

for the fpga-based real-time simulation of high switching frequency power converters.
IEEE Transactions on Industrial Electronics, 59(12):4555–4567, Dec 2012.
[46] Simscape Power Systems. [online]. available:

https://www.mathworks.com/products/simpower/.
[47] Chung-Wen Ho, A. Ruehli, and P. Brennan. The modified nodal approach to network
analysis. IEEE Transactions on Circuits and Systems, 22(6):504–509, Jun 1975.
[48] A. Morched, B. Gustavsen, and M. Tartibi. A universal model for accurate calcula-
tion of electromagnetic transients on overhead lines and underground cables. IEEE
Transactions on Power Delivery, 14(3):1032–1038, Jul 1999.
[49] Randall J. LeVeque. Finite Difference Methods for Ordinary and Partial Differential.
[50] H. Jin. Behavior-mode simulation of power electronic circuits. IEEE Transactions

on Power Electronics, 12(3):443–452, May 1997.
[51] R. Razzaghi, C. Foti, M. Paolone, and F. Rachidi. A novel method for the op-
timal parameter selection of discrete-time switch model. Proceedings of the 10th
International Conference on Power Systems Transients, pages 1–7, July 2013.
[52] K. K. Fung and S. Y. R. Hui. Fast simulation of multistage power electronic sys-
tems with widely separated operating frequencies. IEEE Transactions on Power
Electronics, 11(3):405–412, May 1996.
[53] S. Y. R. Hui, K. K. Fung, and C. Christopoulos. Decoupled simulation of dc-linked

power electronic systems using transmission-line links. IEEE Transactions on Power
Electronics, 9(1):85–91, Jan 1994.
[54] S. Y. R. Hui and C. Christopoulos. A discrete approach to the modeling of power

electronic switching networks. IEEE Transactions on Power Electronics, 5(4):398–
403, Oct 1990.
[55] T. Kato, K. Inoue, T. Fukutani, and Y. Kanda. Multirate analysis method for
a power electronic system by circuit partitioning. IEEE Transactions on Power
Electronics, 24(12):2791–2802, Dec 2009.
[56] J. R. Marti, L. R. Linares, J. A. Hollman, and F. A. Moreira. Integrated soft-

ware/hardware solution for real-time simulation of large power systems. PSCC,
Sevilla, Spain,, Jun 2003.
143
BIBLIOGRAPHY
[57] H. H. Happ. Diakoptics:the solution of system problems by tearing. Proceedings of

the IEEE, 62(7):930–940, July 1974.
[58] E. Dlala. Comparison of models for estimating magnetic core losses in electrical ma-
chines using the finite-element method. IEEE Transactions on Magnetics, 45(2):716–
725, Feb 2009.
[59] S.D. Pekarek, O. Wasynczuk, and H.J. Hegner. An efficient and accurate model
for the simulation and analysis of synchronous machine/converter systems. Energy
Conversion, IEEE Transactions on, 13(1):42–48, Mar 1998.
[60] J. R. Marti and K. W. Louie. A phase-domain synchronous generator model includ-

ing saturation effects. IEEE Transactions on Power Systems, 12(1):222–229, Feb
1997.
[61] P. J. Lagace, M. H. Vuong, and K. Al-Haddad. A time domain model for transient
simulation of synchronous machines using phase coordinates. In 2006 IEEE Power
Engineering Society General Meeting, pages 6 pp.–, 2006.
[62] L. Wang and J. Jatskevich. A voltage-behind-reactance synchronous machine model

for the emtp-type solution. IEEE Transactions on Power Systems, 21(4):1539–1549,
Nov 2006.
[63] H. W. Dommel. EMTP Theory Book.
[64] H. K. Lauw and W. S. Meyer. Universal machine modeling for the representation of
rotating electric machinery in an electromagnetic transients program. IEEE Trans-
actions on Power Apparatus and Systems, PAS-101(6):1342–1351, June 1982.
[65] D. C. Aliprantis, O. Wasynczuk, and C. D. Rodrguez Valdez. A voltage-behind-

reactance synchronous machine model with saturation and arbitrary rotor network
representation. IEEE Transactions on Energy Conversion, 23(2):499–508, June
2008.
[66] N. Amiri, M. Chapariha, S. Ebrahimi, J. Jatskevich, and L. Wang. Constant param-

eter vbr model of permanent magnet synchronous machine wind generation system.
In 2015 IEEE Power Energy Society General Meeting, pages 1–5, July 2015.
[67] L. Wang, J. Jatskevich, and S. D. Pekarek. Modeling of induction machines using

a voltage-behind-reactance formulation. IEEE Transactions on Energy Conversion,
23(2):382–392, June 2008.
144
BIBLIOGRAPHY
[68] S. D. Pekarek and E. A. Walters. An accurate method of neglecting dynamic saliency

of synchronous machines in power electronic based systems. IEEE Transactions on
Energy Conversion, 14(4):1177–1183, Dec 1999.
[69] M. Chapariha, F. Therrien, J. Jatskevich, and H. W. Dommel. Explicit formulations

for constant-parameter voltage-behind-reactance interfacing of synchronous machine
models. IEEE Transactions on Energy Conversion, 28(4):1053–1063, Dec 2013.
[70] B. Gustavsen and A. Semlyen. Simulation of transmission line transients using

vector fitting and modal decomposition. IEEE Transactions on Power Delivery,
13(2):605–614, Apr 1998.
[71] O. Ramos-Leanos, J. L. Naredo, J. Mahseredjian, C. Dufour, J. A. Gutierrez-Robles,

and I. Kocar. A wideband line/cable model for real-time simulations of power system
transients. IEEE Transactions on Power Delivery, 27(4):2211–2218, Oct 2012.
[72] B. Gustavsen and A. Semlyen. Rational approximation of frequency domain re-

sponses by vector fitting. IEEE Transactions on Power Delivery, 14(3):1052–1061,
Jul 1999.
[73] B. Gustavsen. Computer code for rational approximation of frequency dependent

admittance matrices. IEEE Transactions on Power Delivery, 17(4):1093–1098, Oct
2002.
145

Mirzahosseini Ramin 201711 PhD Thesis(1)

Uploaded by

Copyright:

Available Formats

Mirzahosseini Ramin 201711 PhD Thesis(1)

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Mirzahosseini Ramin 201711 PhD Thesis(1)

Uploaded by

Copyright:

Available Formats

FPGA-based Real-Time Simulation Platform For Power

Grids Including Multiple Converters

A thesis submitted in conformity with the requirements

c Copyright 2017 by Ramin Mirzahosseini

2 FPGA-Based High Performance Computing 13

3 FPGA-Based Solution of Network Equations for EMT Simulation 25

4 FPGA-Based Real-Time Simulation of Power Electronic Converters

5 FPGA-Based Simulation of Electrical Machines (EMs) for EMT Anal-

6 Real-time Simulation Platform and Performance Verification 103

7 Summary and Conclusion 129

B Transmission Line Parameters and Fitting Results 135

2.1 Hardware Rsources available on VC7VX485T . . . . . . . . . . . . . . . 16

4.1 Multi-step numerical integration methods . . . . . . . . . . . . . . . . . . 46

5.1 Resource utilization of the EM module . . . . . . . . . . . . . . . . . . . 92

6.1 Resource utilization of IHULM Module . . . . . . . . . . . . . . . . . . . 112

B.1 Conductor and ground wire data . . . . . . . . . . . . . . . . . . . . . . 136

1.1 Power System Transients Time-Scale. . . . . . . . . . . . . . . . . . . . . 3

2.1 CLB resource in Virtex-7 family. . . . . . . . . . . . . . . . . . . . . . . . 15

3.1 Continuous-time and discretized model of L, C, series RL and series RC

4.1 Region of absolute stability for multi-step methods . . . . . . . . . . . . 48

5.1 General VBR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

5.12 PMSM Machine case study system . . . . . . . . . . . . . . . . . . . . . 96

5.15 Synchronous machine case study system . . . . . . . . . . . . . . . . . . 100

6.1 Frequency-domain transmission line model . . . . . . . . . . . . . . . . . 104

B.1 Tower Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135

1.2 Statement of the Problem

1.2.1 Power System Analysis

10-7 10-5 10-3 10-1 101 103

Figure 1.1. Power System Transients Time-Scale.

1.2.2 Analyses of EMTs in Power System

1.2.3 Real-Time Simulation of EMTs

Digital Real-Time Signal Conditioning Hardware under Test

Figure 1.2. Block diagram of Hardware-In-The-Loop (HIL) platform.

1.2.4 Existing Digital Real-Time Simulator (DRTS) platforms

• There are also dedicated FGPA-based simulator algorithms to simulate specific

1.2.5 Motivation of this Work

Primary Transmission HV system

Figure 1.3. CIGRE benchmark system for renewable energy integration.

• Computational capability to accurately simulate multiple PECs.

• The existing FPGA-based developments are dependent on system configuration and

1.3 Research Objectives

– Presenting an efficient FPGA-based general nodal network solver that can

• Developing an FPGA-based EM model capable of emulating the machine behaviour

• Implementing the final design of the FPGA-based real-time simulator in a hardware

• FPGA-based implementation of transmission line model to interface the FPGA-

1.4 Thesis Outline

FPGA-Based High Performance

In this chapter FPGA resources and FPGA-based techniques to exploit parallelism

2.2 FPGA Hardware Resources

2.2.1 Configurable Logic Blocks (CLBs)

2.2.2 DSP Resources

Figure 2.2. Basic DSP48E1 Slice Diagram

2.2.3 Memory resources

Figure 2.3. Block RAM

Slices DSP Slices BRAM 36 Kb

2.3 Numerical Representations

floating-point representations according to IEEE-754. The floating-point representation

where N in (2.2) is the number of bits in the Exponent.

MSB N bits LSB

Single: 8bits Single: 23bits