Parallel-Vector Equation Solvers For Finite Element Engineering Applications
Parallel-Vector Equation Solvers For Finite Element Engineering Applications
Parallel-Vector Equation Solvers For Finite Element Engineering Applications
Nguyen, Duc T.
Parallel-vector equation sol vers for finite element engineering applicationslDuc Thai Nguyen.
p. cm.
lncludes bibliographical references and index.
ISBN 978-1-4613-5504-5 ISBN 978-1-4615-1337-7 (eBook)
DOI 10.1007/978-1-4615-1337-7
1. Finite element method. 2. Parallel processing (Electronic computers) 3. Differential
equations-Numerical solutions. I. Title.
ISBN 978-1-4613-5504-5
© 2002 Springer Science+Business Media New York
Originally published by Kluwer Academic /Plenum Publishers, New York in 2002
Softcover reprint ofthe hardcover Ist edition 2002
10 9 8 7 6 5 432
A C.I.P. record for this book is available from the Library of Congress.
No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any
means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written
permission from the Publisher
To Dac K. Nguyen
Thinh T. Thai
Hang N. Nguyen
Eric N. D. Nguyen and Don N. Nguyen
Preface
In spite ofthe fact that parallel-vector computational (equation solution) algorithms have
been receiving a lot of attentions for over 20 years, and a large number of research
articles have been published during this period, a limited number of texts and research
books have been written on the subject. Most (if not all) existing texts on this subject
have been written by computer scientists, and/or applied mathematicians. Majority of
existing texts have been over-emphasizing on theoretical developments of new, and/or
promising parallel (equation solution) algorithms (for varieties of applications in
Engineering and Science disciplines). Materials presented in most existing texts are
either too condense (without enough important detailed explanations), or too advance
for the typical senior undergraduate and/or graduate engineering students. It should be
emphasized here that while many important theoretical developments, which have
significant impacts on highly efficient existing parallel-vector (equation solution)
algorithms, have been carefully discussed and well-documented in current texts,
important detailed computer implementations of the developed algorithms, however,
have been usually omitted. Furthermore, it should be kept in minds that while few
existing texts in this subject (direct equation solution algorithms for parallel and/or
vector computers) have been written by computer scientists, and/or applied
mathematicians, truly large-scale models (which require parallel and vector capabilities
offered by modem high-performance computers) are often generated, solved, and
interpreted by the engineering communities.
This book is written to address the concerns mentioned above and is intended
to serve as a textbook for senior undergraduate, and graduate "engineering" students. A
number of state-of-the-art FORTRAN codes, however, have been developed and
discussed with great details in this textbook. Special efforts have been made by the
author to present the materials in such a way to minimize the mathematical background
requirements for typical senior undergraduate, and graduate engineering students. Thus,
compromises between rigorous mathematics and practical simplicities are sometimes
necessary.
This book has several unique features that distinguish it from other books:
I. Simplicity: The book has been written and explained in simple
fashion, so that senior undergraduate and first year graduate students (in Civil,
Mechanical, Aerospace, Electrical, Computer Science and Mathematic departments) can
understand the presented materials with minimum background requirements. A working
(undergraduate) knowledge in FORTRAN codings is helpful to understand the "detailed
codings" of the presented materials. Some (undergraduate) linear algebra background
should be useful, although it is NOT a requirement for reading and understanding the
materials in the book. Undergraduate background in Matrix Structural Analysis and/or
Finite Element Analysis should be useful, only for the materials presented in Chapter 3.
Graph theories have not been traditionally introduced in the undergraduate/graduate
engineering curriculums and therefore graph theories are not required to understand the
materials presented in this book.
2. Algorithms are discussed for different parallel and/or vector computer platforms:
Parallel and/or vectorized algorithms for various types ofdirect equation solvers are
vii
viii Preface
presented and discussed for both "shared memory" (such as the Cray-2, Cray-YMP,
Cray-C90, Convex) and "distributed memory" (such as the Intel i860, Intel Paragon,
IBM-SP2, Meiko) computer platforms. The vectorized algorithms can also be
"efficiently" executed on IBM-R6000/590 workstations. The vectorized algorithms
and their associated FORTRAN codes can also be executed (with less efficiency)
on other workstations and/or personal computers (P.c.) without having vectorized
capabilities.
3. More emphasis on important detailed FORTRAN computer implementations:
Efforts have been made to explain to the readers on important detailed FORTRAN
computer implementations of various algorithms presented in the book. Thus, the
readers should be able to incorporate the presented computer codes, subroutines
into hislher application codes.
4. Several state-of-the-art FORTRAN equation solvers are discussed: While great
amounts of effort have been spent to explain the detailed algorithms in a "simple
fashion," many state-of-the-art equation solvers have been developed and presented
in the book. Many of the presented solvers have been used by universities, large
aerospace corporations and government research laboratories in the U.S., Europe
and Asia.
5. Large-scale practical engineering finite element models are used: For derivations
and explanations of various algorithms described in the book, small-scale examples
are used to simplify and to facilitate the discussions. However, several medium to
large-scale, practical engineering finite element models are used to demonstrate the
efficiency and accuracy of the presented algorithms.
6. Algorithms are available for different types of linear equations: Different types of
algorithms for the solutions of various types of system of simultaneous linear
equations are presented in the book. Symmetrical/unsymmetrical, positive
definite/negative definite/indefinite, incore/out-of-core, skyl ine/variab Ie
bandwidth/sparse/tridiagonal system of equations have all been treated in great
detail by the author.
The book contains II chapters. Chapter I presents a brief review of some basic
descriptions of shared and distributed parallel-vector computers. Measurements for
algorithms' performance, and commonly "good practices" to achieve vector speed are
also discussed in this chapter. Different storage schemes for the coefficient (stiffness)
matrix (of system of linear equations) are discussed with great details in Chapter 2.
Efficient parallel algorithms for generation and assembly of finite element coefficient
(stiffness) matrices are explained in Chapter 3. Different parallel-vector "skyline"
algorithms for shared memory computers (such as Cray-YMP, Cray-C90 etc ... ) are
developed and evaluated in Chapter 4. These algorithms have been developed in
conjunction with the skyline storage scheme, proposed earlier in Chapter 2. Parallel-
vector "variable bandwidth" equation solution algorithms (for shared memory
computers) are presented and explained in Chapter 5. These algorithms have been
derived based upon the variable bandwidth storage scheme, proposed earlier in Chapter
2. Out-of-core equation solution algorithms on shared memory computers are considered
in Chapter 6. These algorithms are useful for cases where very large-scale models need
to be solved, and there are not enough core-memories to hold all arrays in the in-core
Preface ix
Norfolk, Virginia
Acknowledgments
During preparation of this book, I have received (directly and indirectly) help from many
people. First, I would like to express my sincere gratitude to my colleagues at NASA
Langley Research Center: Dr. Olaf O. Storaasli, Dr. Jerrold M. Housner, Dr. James
Starnes, Dr. Jaroslaw S. Sobieski, Dr. Keith Belvin, Dr. Peigman M. Maghami, Dr. Tom
Zang, Dr. John Barthelemy, Dr. Carl Gray Jr., Dr. Steve Scotti, Dr. Kim Bey, Dr. Willie
R. Watson, and Dr. Andrea O. Salas for their encouragement and support on the subject
of this book during the past 13 years.
The close collaborative works with Dr. OlafO. Storaasli and Dr. Jiangning Qin, in
particular, have direct impacts on the writing of several chapters in this textbook.
I am very grateful to Dr. Lon Water (Maui, Hawaii), Professors Pu Chen (China),
S.D. Rajan (Arizona), B.D. Belegundu (Pennsylvania), J.S. Arora (Iowa), Dr. Brad
Maker (California), Dr. Gerald Goudreau (California), Dr. Esmond Ng (Lawrence
Berkeley Laboratory, California) and Mr. Maurice Sancer (California) for their
enthusiasm and supports of many topics discussed in this book.
My appreciation also goes to several of my former doctoral students, such as Dr.
T.K. Agarwal, Dr. John Zhang, Dr. Majdi Baddourah, Dr. AI-Nasra, and Dr. H.
Runesha who have worked with me for several years. Some of their research
contributions have been included in this book.
In addition, I would like to thank my colleagues at Old Dominion University (ODU)
and Hong Kong University of Science and Technology (HKUST) for their support,
collaborative works, and friendship. Among them, Professor A. Osman Akan, Professor
Isao Ishibashi, Professor Chuh Mei, and Professor Zia Razzaq at ODU, Professor T. Y.
Paul Chang, and Professor Pin Tong at HKUST. Substantial portions of this textbook
have been completed during my sabbatical leave period (January 1 - December 30,
1996) from O.D.U. to work at HKUST (during February 22 - August 22, 1996).
The successful publication and smooth production of this book are due to ODU
skillful office supported staff: Mrs. Sue Smith, Mrs. Mary Carmone, Mrs. Deborah L.
Miller, and efficient management and careful supervision ofMr. Tom Cohn, Ms. Ana
Bozicevic, and Mr. Felix Portnoy, Editors ofKluwer/Plenum Publishing Corporation.
Special thanks go to Ms. Catherine John, from Academic Press (AP) Ltd., London,
UK for allowing us to reproduce some materials from the AP textbook "Sparse Matrix
Technology," (by Sergio Pissanetzky) for discussions in Chapter 10 (Tables 10.2 and
10.5) of our book.
Last but not least, I would like to thank my parents (Mr. Dac K. Nguyen, and Mrs.
Thinh T. Thai), my family (Hang N. Nguyen, Eric N. D. Nguyen and Don N. Nguyen),
whose encouragement has been ever present.
Duc T. Nguyen
Norfolk, Virginia
Xl
Disclaimer of Warranty
We make no warranties, express or implied, that the programs contained in this
distribution are free of error, or that they will meet your requirements for any particular
application. They should not be relied on for solving a problem whose incorrect solution
could result in injury to a person or loss of property. The author and publisher disclaim
all liability for direct, indirect, or consequential damages resulting from use of the
programs or other materials presented in this book.
xiii
Contents
1. Introduction 0000000000000000000000000000000000000000000000000000
xv
xvi Contents
4.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 51
4.2 Choleski-based Solution Strategies .......................... 51
4.3 Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 52
4.3.1 Basic sequential skyline Choleski factorization: computer
code (version 1) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 55
4.3.2 Improved basic sequential skyline Choleski factorization:
computer code (version 2) . . . . . . . . . . . . . . . . . . . . . . . . . . .. 59
4.3.3 Parallel-vector Choleski factorization (version 3) . . . . . . . . .. 60
4.3.4 Parallel-vector (with "few" synchronization checks)
Choleski factorization (version 4) .. . . . . . . . . . . . . . . . . . . .. 64
4.3.5 Parallel-vector enhancement (vector unrolling) Choleski
factorization (version 5) . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 66
4.3.6 Parallel-vector (unrolling) skyline Choleski factorization
(version 6) ........................................ 69
4.4 Solution of Triangular Systems ............................. 72
4.4.1 Forward solution ................................... 72
4.4.2 Backward solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 78
4.5 Force: A Portable, Parallel FORTRAN Language ............... 81
4.6 Evaluation of Methods on Example Problems . . . . . . . . . . . . . . . . .. 82
4.7 Skyline Equation Solver Computer Program ................... 86
4.8 Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 86
4.9 Exercises. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 87
4.10 References.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 88
5.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 91
5.2 Data Storage Schemes .................................... 91
5.3 Basic Sequential Variable Bandwidth Choleski Method. . . . . . . . .. 96
5.4 Vectorized Choleski Code with Loop Unrolling ............... 101
5.5 More on Force: A Portable, Parallel FORTRAN Language ...... 103
5.6 Parallel-Vector Choleski Factorization ...................... 103
5.7 Solution of Triangular Systems ............................ 108
5.7.1 Forward solution .................................. 109
5.7.2 Backward solution. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .. 112
5.8 Relations Amongst the Choleski, Gauss and LDU Factorizations . 115
5.8.1 Choleski (UTU) factorization . . . . . . . . . . . . . . . . . . . . . . . .. 115
5.8.2 Gauss (with diagonal terms Ljj =l) LU factorization ....... 117
5.8.3 Gauss (LU) factorization with diagonal terms U jj =1 ....... 118
5.8.4 LDLT factorization with diagonal term Ljj =1 ............ 120
5.8.5 Similarities of Choleski and Gauss methods . . . . . . . . . . . .. 122
5.9 Factorization Based Upon "Look Backward" Versus "Look Forward"
Strategies ............................................. 123
Contents xvii